Risk quantification, in any field, is not an end in itself. It exists to compel some action. That action might be to drive decisions or simply to inform other analysis which in turn leads to some action. Before I wrote this post I happened to look at the Daily Stoic and by an interesting coincidence today’s quote was quite apt.
For as long as I’ve been involved in security, and then more broadly in enterprise risk management, I’ve been thinking about better means to model and measure information security, cybersecurity and technology risk. My conclusions are simple:
1. Risk quantification and risk communication are two different disciplines
Most criticism of risk quantification is actually criticism of risk communication techniques that have been dressed up as, or misinterpreted as, risk quantification. Pick your tool and use it in the right way. There is a variety of quantification mechanisms, ranging from basic counting metrics through Bayesian network models and game-theoretic analysis to scenario-based loss modeling. In that last technique you model, for a particular scenario, the frequency distribution of potential events and the severity distribution if those events occur, combine the two to compute a loss distribution, and then use that to make more formal risk tolerance decisions. FAIR is a good example of this latter technique, with the added advantage of a well-worked ontology and supporting practice to aid analytical rigor.
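To make the frequency-and-severity idea concrete, here is a minimal Monte Carlo sketch. It is not FAIR itself and not anyone's production model: the Poisson frequency, the lognormal severity, every parameter value and the 1,000,000 tolerance threshold are assumptions I picked purely for illustration.

```python
import math
import random
import statistics

def sample_poisson(rng, lam):
    """Draw one Poisson-distributed event count (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    count, running_product = 0, rng.random()
    while running_product > threshold:
        count += 1
        running_product *= rng.random()
    return count

def simulate_annual_losses(rng, event_rate, severity_mu, severity_sigma, trials):
    """Return one simulated total annual loss per trial."""
    losses = []
    for _ in range(trials):
        n_events = sample_poisson(rng, event_rate)  # frequency draw
        # Severity draw for each event, summed into the year's total loss.
        total = sum(rng.lognormvariate(severity_mu, severity_sigma)
                    for _ in range(n_events))
        losses.append(total)
    return losses

def percentile(sorted_values, q):
    """Nearest-rank percentile for q in [0, 1] on a pre-sorted list."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]

rng = random.Random(42)  # fixed seed so the run is repeatable
# All parameters below are illustrative assumptions, not calibrated values.
losses = sorted(simulate_annual_losses(
    rng, event_rate=2.0, severity_mu=11.0, severity_sigma=1.0, trials=10_000))

tolerance = 1_000_000  # hypothetical annual loss tolerance
p_exceed = sum(loss > tolerance for loss in losses) / len(losses)

print(f"mean annual loss:   {statistics.mean(losses):,.0f}")
print(f"median annual loss: {percentile(losses, 0.50):,.0f}")
print(f"95th percentile:    {percentile(losses, 0.95):,.0f}")
print(f"P(loss > tolerance): {p_exceed:.3f}")
```

The point is that the output is a distribution, not a single number, so a tolerance decision can be framed as "how often do we expect to exceed this threshold" rather than as one point estimate.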
One thing is common to all these techniques: the results have to be interpreted and communicated, even to sophisticated audiences. It’s fine for the results of such quantification to be overlaid on grids or translated into ratings (High, Medium, Low), as long as the medium doesn’t lose the message. I would argue that simple pseudo-quantification techniques like Risk = Threat x Vulnerability are flawed, quite simply because the inputs to such a simple equation can never accurately encapsulate what is going on in a particular situation, and the result presents an overly simplistic view of risk. For example: Risk (0.3) = Threat (0.5) x Vulnerability (0.6). What does this even mean? (I’ve seen this actual example in an industry report.) The other problem to watch out for is the naive assignment of potential monetary losses to metrics developed without using an appropriate loss distribution model.
In terms of risk communication, I would recommend Peter Sandman’s excellent body of work. While it is more about crisis communication and precaution advocacy, it has a myriad of lessons on how to communicate risk to drive the right outcomes.
2. Risk is managed by experienced people with judgement using data, not by the data alone
The world is littered with failures across many disciplines where the numbers (models or other techniques) suggested a course of action that was, on balance, wrong or at best ill-conceived. The mature uses of risk quantification, from safety and hazard analysis to pharmacology to financial risk, have one theme in common: they rarely rely on just one number. Most often they build decision-making processes that are informed by multiple streams of data. Risk managers spend a lot of time looking for contradictions in the data that would indicate some underlying problem that needs further investigation. They also spend a lot of time looking for common themes across the data that increase confidence in a subset of potential courses of action. Consider a very simple example. Say you have a model that aims to predict security incidents in your supply chain, and it asserts a 0.2 likelihood of a security breach among a group of 20 of your most critical vendors in the next 12 months, while an adjacent model predicts a 0.8 likelihood of a reliability/error event in the same group. Both of those might be correct (or reasonable enough), but the gap certainly warrants a deeper dive into the underlying data to see why the two diverge when, using your judgement, you know that the underlying controls for both outcomes are often correlated. So, when working with risk quantification, think about how it is used and challenged inside a decision-making process.
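The vendor example above can be sketched as a small cross-check that surfaces contradictions for a human to investigate. Everything here is hypothetical: the `Estimate` record, the model and outcome names, and especially the 0.3 gap threshold, which is a judgement call rather than a standard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Estimate:
    model: str         # which model produced the number (hypothetical names)
    outcome: str       # what is being predicted
    likelihood: float  # probability over the same population and time horizon

def divergent_pairs(estimates, correlated_outcomes, max_gap=0.3):
    """Flag pairs of estimates whose outcomes are expected to move together
    (per analyst judgement) but whose likelihoods differ by more than max_gap."""
    flagged = []
    for i, first in enumerate(estimates):
        for second in estimates[i + 1:]:
            pair = frozenset({first.outcome, second.outcome})
            gap = abs(first.likelihood - second.likelihood)
            if pair in correlated_outcomes and gap > max_gap:
                flagged.append((first, second))
    return flagged

# The two numbers from the example: same 20 vendors, same 12 months.
estimates = [
    Estimate("breach_model", "security_breach", 0.2),
    Estimate("reliability_model", "reliability_event", 0.8),
]
# Analyst judgement: the controls behind these outcomes are often correlated.
correlated = {frozenset({"security_breach", "reliability_event"})}

flagged = divergent_pairs(estimates, correlated)
for first, second in flagged:
    print(f"Investigate: {first.model} says {first.likelihood}, "
          f"{second.model} says {second.likelihood}")
```

Note that the code does not decide which model is right. It only queues the disagreement for someone with judgement to look at, which is the whole point of this section.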
3. Risk quantification has to exist in feedback loops (positive and negative)
Risk quantification approaches need to be constantly refined so that you can either build increasing confidence in the approach or discard it and start over. There are well-developed bodies of knowledge and practice for doing this across a range of risk disciplines, covering sense-making, comparability, managing model complexity, validating and back-testing models, and stating and managing ALWs (Assumptions, Limitations and Weaknesses). But the key to all of this is making sure there are one or more feedback loops in place to analyze how accurate the model is in the face of reality and under what conditions the risk quantification approach breaks down. This will guide where the approach should be used and in what way.
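One simple way to close such a feedback loop is to back-test probabilistic forecasts against what actually happened. The sketch below uses the Brier score (mean squared error between forecast probability and the 0/1 outcome) and compares it with a naive base-rate forecast. This is one option among many, and the forecasts and outcomes are made-up illustrative data.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.
    Lower is better; 0.0 is a perfect forecast."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical back-test: last year's predicted likelihoods vs. what occurred.
forecasts = [0.2, 0.8, 0.1, 0.6]
outcomes = [0, 1, 0, 0]  # 1 = the event happened, 0 = it did not

model_score = brier_score(forecasts, outcomes)

# Naive baseline: forecast the observed base rate for every case.
base_rate = sum(outcomes) / len(outcomes)
baseline_score = brier_score([base_rate] * len(outcomes), outcomes)

print(f"model Brier score:    {model_score:.4f}")
print(f"baseline Brier score: {baseline_score:.4f}")
if model_score >= baseline_score:
    print("Model is no better than the base rate; revisit or discard it.")
```

If the model cannot beat the base rate over repeated periods, that is a strong hint to rework it or start over, as described above.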
4. All risk quantification is wrong, but some is useful (paraphrased)
When approaching risk quantification, another mistake I’ve seen is people trying to get too sophisticated too fast. In reality, some basic metrics, like a key control indicator that simply measures what you expect, can be remarkably effective, but only if you build the process around them to hold the environment to that. In many organizations I’ve seen, simply coming up with 20 or so top metrics that are emblematic of, or can act as a proxy for, risk in the environment is good enough. For example, if you can pick 20 metrics that encapsulate a number of the CIS Critical Controls and work like crazy to keep your environment to those, then you’re likely to get more benefit than from spending your time on more sophisticated approaches.
Avoid cute but inscrutable index-based metrics that aggregate counts but don’t reveal the best courses of action. Instead, think about bringing counts together in different ways. For example, rather than some index of control contribution to mitigating the risk of an external threat acting on some sensitive asset, simply come up with a control pressure metric that records how many layers of controls it took to stop an attack. If attacks are mostly getting through the first few layers, then that reveals that some course of action is needed.
When you do need more advanced methods, because using basic counts would be too complex and insufficiently forward-looking, be careful how you use them. Methods like FAIR are good, but they are best used in a macro way to influence broad decisions of resource allocation or prioritization rather than for every micro decision you might make. If the cost of doing the risk analysis exceeds the cost of implementing the control, then just implement the damn control (especially in a regime where you are raising the baseline to reduce the cost of control). Bayesian network models are remarkably effective at simulating risk and control path analysis, especially when paired with bow-tie approaches, and can reveal some counter-intuitive results about which of your controls contribute most to mitigating a specific risk.
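A control pressure metric of this kind can be sketched in a few lines. The layer names and the attack records below are hypothetical, and how you would actually attribute a stopped attack to a layer depends on your own telemetry.

```python
from collections import Counter

# Hypothetical defensive layers, outermost first.
LAYERS = ["email filter", "endpoint protection",
          "network segmentation", "data access control"]

def control_pressure(stop_layers, total_layers):
    """Share of attacks stopped at each layer (1-based layer numbers)."""
    counts = Counter(stop_layers)
    n_attacks = len(stop_layers)
    return {layer: counts.get(layer, 0) / n_attacks
            for layer in range(1, total_layers + 1)}

def mean_depth(stop_layers):
    """Average number of layers an attack reached before being stopped."""
    return sum(stop_layers) / len(stop_layers)

# Made-up records: the layer number that finally stopped each observed attack.
stop_layers = [1, 1, 2, 3, 3, 4, 3, 2, 1, 3]

distribution = control_pressure(stop_layers, len(LAYERS))
for layer, share in distribution.items():
    print(f"layer {layer} ({LAYERS[layer - 1]}): stopped {share:.0%} of attacks")

# Attacks that got past the first two layers before being stopped.
past_first_two = sum(share for layer, share in distribution.items() if layer > 2)
print(f"mean depth: {mean_depth(stop_layers):.1f} layers")
print(f"got past the first two layers: {past_first_two:.0%}")
```

Unlike an opaque index, the numbers point straight at an action: if half of the attacks are only being stopped at layer three or later, the outer layers need attention.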
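To give a flavor of the Bayesian network idea, here is a deliberately tiny sketch with three nodes solved by brute-force enumeration. A real model would have far more nodes and would normally use a dedicated library. The structure (a threat, a preventive control, and a detective control whose failure depends on the first) and every probability are assumptions made up for illustration.

```python
from itertools import product

P_THREAT = 0.3    # P(threat event occurs); illustrative
P_C1_FAILS = 0.2  # P(preventive control fails); illustrative
# The detective control fails more often when the preventive control has
# already failed (for example, a shared root cause). Illustrative values.
P_C2_FAILS_GIVEN_C1 = {True: 0.6, False: 0.1}

def p_loss(p_c1_fails=P_C1_FAILS, p_c2_fails_given_c1=P_C2_FAILS_GIVEN_C1):
    """P(loss) by enumerating every joint state of the three nodes.
    A loss occurs only when the threat occurs and both controls fail."""
    total = 0.0
    for threat, c1_fails, c2_fails in product([True, False], repeat=3):
        p = P_THREAT if threat else 1 - P_THREAT
        p *= p_c1_fails if c1_fails else 1 - p_c1_fails
        p_c2 = p_c2_fails_given_c1[c1_fails]
        p *= p_c2 if c2_fails else 1 - p_c2
        if threat and c1_fails and c2_fails:
            total += p
    return total

baseline = p_loss()
# "Remove" each control in turn by forcing it to always fail.
without_c1 = p_loss(p_c1_fails=1.0)
without_c2 = p_loss(p_c2_fails_given_c1={True: 1.0, False: 1.0})

print(f"baseline P(loss):          {baseline:.3f}")
print(f"without preventive control: {without_c1:.3f}")
print(f"without detective control:  {without_c2:.3f}")
```

Comparing the baseline with each "control removed" case gives a rough view of how much each control contributes on this path, and the dependency between the controls is exactly the kind of thing a simple multiplication of independent failure rates would miss.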
5. Risk quantification is a multi-disciplinary activity
Cyber risk, or technology risk, quantification approaches should not exist in isolation. Some of the best inputs for cyber come from other risk disciplines in your organization, whether it’s something common to all organizations, like data from your SRE or similar process, or more industry-specific measures emitted by your compliance, safety, quality, financial risk, or scenario planning units.
Speaking of interdisciplinary work, below is a great discussion I had with Dr. Jack Freund as part of this year’s New York University Volatility and Risk Institute, an institute founded precisely to look at quantitative methods for understanding how various risks interact, as well as at risk transmission through businesses, supply chains and the global financial system.
Bottom line: we need to apply more quantitative risk analysis methods to cyber, but to think there will be one unifying approach is naive. As in every other discipline, you will need to match the method to the task at hand and then iterate. Don’t confuse risk communication techniques with risk quantification techniques.