Risk Appetite and Risk Tolerance - A Practical Approach

If you work for a large organization, especially a public or otherwise regulated company, then you may well have faced the prospect of developing a risk appetite statement. You might have been enthusiastic about this, or you might have been compelled to do it by a Board member, a regulator, or an auditor.

This can end up being a “check the box” exercise that produces some abstract statement no one really uses or values. But it doesn’t have to be this way. Risk appetite, or more specifically, definitions of risk tolerance, can be a very useful risk management tool. This is true if (and only if) you relentlessly focus on the specific and hold the line against the abstract. The true test of anything you do here is being brutally honest about whether it actually adds value to the organization’s business or mission objectives.

Let’s look at some practical ways to do this. Any risk management purists who might want to criticize some of my points as heresy are welcome to offer alternative views. In my defense, I will point out that I have been not only a CISO but also a Chief Risk Officer (of enterprise risks). I have led teams that developed risk appetite statements and tolerance, limit, and threshold frameworks, and I have myself struggled (but won through) in moving from the abstract to the specific and getting some value out of this, though only in some of the ways I’m going to talk about.

First, we need to get some definitions out of the way.

ISO Guide 73:2009 Risk Management – Vocabulary defines risk appetite as the “amount and type of risk that an organization is willing to pursue or retain.” In other words, it is a way for organizations to determine how much risk they are willing to take (including financial and operational impacts) in pursuit of business or mission objectives.
ISO Guide 73:2009 Risk Management – Vocabulary defines risk tolerance as “an organization’s or stakeholder’s readiness to bear the risk after risk treatment in order to achieve its objectives.” It is more granular than the aggregated risk appetite and covers individual risks.
ISACA’s Risk IT Framework, 2nd Edition, defines risk tolerance as “the acceptable deviation from the level set by the risk appetite and business objectives.” Risk tolerance is usually communicated in quantitative terms, for example: systems should be 99.9% available, with isolated deviations to 99.5% tolerated.
The COSO ERM Framework defines risk appetite as “the amount of risk, on a broad level, an organization is willing to accept in pursuit of stakeholder value.” COSO’s Strengthening Enterprise Risk Management for Strategic Advantage report explains: “An entity should also consider its risk tolerances, which are levels of variation the entity is willing to accept around specific objectives.”
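To make the ISACA availability example concrete, here is a minimal sketch that converts an availability percentage into a downtime budget. It assumes a 30-day measurement window, which is an illustrative choice rather than part of the ISACA definition.

```python
# Convert availability targets into downtime budgets for a measurement window.
# Assumption: a 30-day window; real SLAs may use calendar months or quarters.

WINDOW_MINUTES = 30 * 24 * 60  # 43,200 minutes in a 30-day window


def downtime_budget_minutes(availability_pct: float,
                            window_minutes: int = WINDOW_MINUTES) -> float:
    """Minutes of downtime allowed at a given availability percentage."""
    return window_minutes * (1 - availability_pct / 100)


target = downtime_budget_minutes(99.9)     # the level set by the appetite
tolerated = downtime_budget_minutes(99.5)  # the tolerated isolated deviation

print(f"99.9% target allows {target:.1f} min; "
      f"99.5% tolerance allows {tolerated:.1f} min")
```

In this framing the tolerated deviation allows five times the downtime of the target (216 versus 43.2 minutes), which is the kind of concrete number that makes a tolerance discussable.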

All of this is about as much use as a chocolate teapot. In fact, if you read the risk appetite statements of some major organizations, you will be even more perplexed about how they are useful in any way. I’d go so far as to say some statements are actively harmful. For example, many organizations’ risk appetite statements say things like, “We have no appetite for security breaches that will impact customer data.” OK, so we sort of know what they mean: all things being equal, they’d rather not have a breach happen. But “no appetite” should really mean that every choice the company makes is about making sure such a breach doesn’t happen. All choices. No breaches of any form. While many organizations intensively prioritize the protection of customer data, they still take some risks, often in the very service of providing appropriate functionality to those same customers. Pretending otherwise creates an illusion of defense that forestalls the risk transparency needed to actually reduce the risk of a bad outcome. Worse, it creates the notion in the minds of the inexperienced that it is possible to fully mitigate any risk. If only it were so.

Instead, what we need is a framework that helps management pursue goals to deliver value to customers, grow the business, and benefit stakeholders, all without blowing things up along the way. The extent to which risk will be taken varies with the situation. For example, a startup needs to take a lot of risk to establish a position in the market. Even a large corporation expanding into a new market or geography might need to take on risk that it wouldn’t in its more established markets. Therefore, whatever risk management framework we have needs to help management and the risk and control teams strike this balance effectively.

This is easier for some risks than others. If you’re a bank and want to set risk tolerances for credit risk, in the context of an overall risk appetite, then you will describe actual limits on the loans that can be extended. These will relate to the credit scoring of the person or company you’re lending the money to. If you are too restrictive, you will lose the upside; if you are too lax, you will suffer losses upon inevitable defaults. The limit you pick for particular scoring criteria is developed over time based on loss history, industry data, external scoring services (e.g. FICO, S&P, Fitch, Moody’s), and so on. The credit risk team will constantly adjust this, often with Board Risk Committee oversight. This also works well for other financial risks like market or liquidity risk. It can also work well for some of the more quantitatively expressed operational risks in industries beyond finance. Risks like fraud, retail shrinkage, payment reversals, and trade finance losses can be managed quantitatively. This can include trading off inconvenience for legitimate, well-behaved customers against defending yourself against some level of criminal or reckless losses.
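The trade-off in that last sentence can be made concrete. The sketch below picks a cutoff on a fraud score by minimizing expected cost: friction imposed on legitimate customers who get blocked, plus losses from fraud that gets through. The scores, labels, and per-event costs are made-up illustrative values, and a real fraud model would be far richer.

```python
# Choose a fraud-score cutoff that minimizes total cost on a sample.
# All data and costs below are hypothetical, for illustration only.

FRICTION_COST = 5.0   # assumed cost of blocking one legitimate transaction
FRAUD_LOSS = 200.0    # assumed loss from one undetected fraudulent transaction

# (score, is_fraud) pairs; a higher score means more likely fraud.
transactions = [
    (0.05, False), (0.10, False), (0.20, False), (0.30, False),
    (0.40, False), (0.55, True), (0.60, False), (0.70, True),
    (0.85, True), (0.95, True),
]


def total_cost(cutoff: float) -> float:
    """Block every transaction with score >= cutoff and price the outcome."""
    cost = 0.0
    for score, is_fraud in transactions:
        blocked = score >= cutoff
        if blocked and not is_fraud:
            cost += FRICTION_COST   # inconvenienced a good customer
        elif not blocked and is_fraud:
            cost += FRAUD_LOSS      # let a fraud through
    return cost


candidates = [0.0, 0.25, 0.5, 0.65, 0.8, 1.0]
best = min(candidates, key=total_cost)
print(best, total_cost(best))
```

Raising the assumed friction cost, or lowering the assumed loss per fraud, pushes the best cutoff upward. That is the dial a fraud team turns, and the resulting loss level is what a tolerance would bound.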

It gets trickier for harder-to-quantify risks, such as many other operational risks, reputation risks, or business/mission strategic risks. This is where risk appetite statements fall down, especially on cyber, technology, or information risk topics.

So, if you’ve been tasked with developing, or feel the need to develop, something like a risk appetite statement, then do this:

1. Define the Enterprise Risk Management Framework

Yes, this is a bit of a formality, but it is still useful to define terms and explain the roles and responsibilities of everyone involved in assessing and managing risk. It’s important to state that risk management is in the service of business or mission outcomes, not the other way round, and how the framework is to be used to inform management choices. Constantly ask: “Does this process, measurement, limit, threshold, or artifact actually bring clarity to a decision-making process?” In other words, does it pass the “So what?” test? This framework should be just a few pages; any more and you’re inevitably going to drift into the abstract.

2. Develop a Risk Taxonomy

You need to list the risks you care about and consistently use the language that describes them. That’s it. Don’t obsess over taxonomical purity. Yes, try to make your list MECE (mutually exclusive and collectively exhaustive), but remember that the path to madness lies in obsessing over making it perfect. I’ve seen some organizations take quarters (even years) to develop an initial taxonomy until some executive ran out of patience and ordered “pencils down”, and, you know what, the imperfect taxonomy was nevertheless perfectly serviceable.
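One cheap way to test a taxonomy against the MECE goal, without obsessing over it, is to tag a sample of real incidents and see which fit no category (a gap) or several (an overlap). The category names and keyword rules below are hypothetical; in practice people would do the tagging rather than a keyword match.

```python
# Check a draft risk taxonomy against sample incidents for gaps and overlaps.
# Categories and keyword rules are hypothetical placeholders.

taxonomy = {
    "Cyber": {"phishing", "malware", "intrusion"},
    "Technology": {"outage", "capacity", "malware"},  # "malware" overlaps Cyber
    "Third Party": {"supplier", "vendor"},
}

incidents = [
    "phishing led to credential theft",
    "datacenter outage during upgrade",
    "malware disrupted build servers",
    "regulator fine for late filing",
]


def classify(text: str) -> list[str]:
    """Return every category whose keywords appear in the incident text."""
    words = set(text.lower().split())
    return [name for name, keywords in taxonomy.items() if words & keywords]


gaps = [i for i in incidents if not classify(i)]
overlaps = {i: classify(i) for i in incidents if len(classify(i)) > 1}

print("Gaps:", gaps)
print("Overlaps:", overlaps)
```

Neither result means the taxonomy is unusable: the gap might simply become a new category, and the overlap a note on which owner takes malware. The aim is a serviceable list, not a perfect one.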

3. Define Risk Limits and Thresholds

Here’s the real meat of the work. For the set of risks you especially care about, define some quantitative measures. If you don’t have good measures of risk, then you will need to measure control adherence as a proxy.

Some examples might be:

Financial losses associated with fraud due to account takeover or other abuse.

Financial losses associated with platform abuse from free access tiers.

Conformance to vulnerability resolution SLOs by severity.

Percentage of software that can be reproducibly built.

Percentage of systems deemed stagnant (legacy systems that are unmaintainable or otherwise end of life).

Measures of specific business processes’ ability to meet business resilience goals, expressed as RTO/RPO (recovery time objective / recovery point objective).

Number of people whose privileges exceed a defined potential blast radius.
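As an illustration of how precise these measures need to be, here is a sketch of the vulnerability SLO conformance measure: the share of resolved vulnerabilities at each severity that were fixed within their SLO. The SLO durations and the sample records are assumptions for the example, not recommended values.

```python
# Compute vulnerability SLO conformance by severity.
# SLO windows and sample records are hypothetical.
from collections import defaultdict

SLO_DAYS = {"critical": 7, "high": 30, "medium": 90}

# (severity, days_to_resolve) for resolved vulnerabilities
resolved = [
    ("critical", 3), ("critical", 9), ("critical", 5),
    ("high", 20), ("high", 45),
    ("medium", 60), ("medium", 80), ("medium", 100), ("medium", 30),
]


def conformance_by_severity(records):
    """Return {severity: percentage resolved within its SLO}."""
    met = defaultdict(int)
    total = defaultdict(int)
    for severity, days in records:
        total[severity] += 1
        if days <= SLO_DAYS[severity]:
            met[severity] += 1
    return {sev: 100 * met[sev] / total[sev] for sev in total}


print(conformance_by_severity(resolved))
```

A production version would also need to count open items already past their SLO; otherwise a growing backlog can hide behind a good conformance figure.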

The list could go on, but be careful about having too many: managing the resulting thresholds and limits can be hard work. Apply the 80/20 rule: which 20% of your measures act as a proxy for 80% of the enterprise risk? All of them should be precisely measurable. It doesn’t have to be a monetary value, but it has to be a value. Once you have the measures, you can do two things:

Set risk thresholds. These are values that, if crossed, require some action. It might be a management escalation, a diversion of resources to resolve the problem, a simple alert, or triggering an operational process for further monitoring.
Set risk limits. These are values that, if crossed, require more stringent action to immediately bring the value back below the limit. This might be ceasing a business activity, triggering an extensive event response, convening a Board discussion, and so on. Typically each measure will have both a threshold and a limit, so action can be taken on a threshold breach well before a limit breach is in sight.
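The relationship between a threshold and a limit can be sketched as a simple status check. This minimal version assumes that higher values are worse (as with losses or the percentage of stagnant systems) and that each measure has one threshold set below its limit; the measure name and numbers are illustrative.

```python
# Classify a risk measure against its threshold and limit.
# Assumes "higher is worse"; the measure name and values are hypothetical.
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    WITHIN = "within tolerance"
    THRESHOLD_BREACH = "threshold crossed: escalate and act"
    LIMIT_BREACH = "limit crossed: act immediately to get back under"


@dataclass
class RiskMeasure:
    name: str
    threshold: float
    limit: float

    def __post_init__(self):
        # A threshold only gives early warning if it sits below the limit.
        if self.threshold >= self.limit:
            raise ValueError("threshold must be below limit")

    def status(self, value: float) -> Status:
        if value > self.limit:
            return Status.LIMIT_BREACH
        if value > self.threshold:
            return Status.THRESHOLD_BREACH
        return Status.WITHIN


stagnant = RiskMeasure("pct_stagnant_systems", threshold=4.0, limit=5.0)
for observed in (3.2, 4.5, 6.1):
    print(observed, stagnant.status(observed).value)
```

For measures where lower is worse, such as the percentage of software that can be reproducibly built, you would flip the comparisons or negate the value.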

Organizations that do this well often differentiate between Board limits and thresholds and those set at other levels of management. One benefit of all this is the ability to formally agree on those goals and communicate them to the wider organization, so other management processes can be built around adhering to them. More important, though, is using these measures to drive two sets of regular conversations:

What are the appropriate thresholds and limits in the first place? Doing this well involves multiple levels of management and the Board deciding whether the cost and other impact of implementing and sustaining adherence is worth it, in the context of the opportunity cost or business limitations such adherence might create. This can, and possibly should, be a vigorous debate. But in the end there should be no surprises. To pick a credit risk example that makes the illustration more stark: suppose the Board of a bank has told management to be more aggressive in growing a loan portfolio to make more profit. If the risk team gets the Board to approve a projected credit loss limit of $X, then the Board should not complain when losses rise to that amount, or be surprised if some new lending has to stop once that limit has been hit. One example I experienced in a prior role was setting a limit on end-of-life systems. The Board wanted this to be 0%, but the current reality was much higher. Business unit CIOs wanted to do better, but there was insufficient funding to achieve that. So the Board set a ratchet-down approach, moving from 20% to 15% and then settling on a steady state of 5% (with some tighter variances for a sub-class of especially critical systems). This debate with business leadership led to the allocation of budget and ongoing preventative maintenance funding to hit and then sustain these goals.
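A ratchet-down approach amounts to a limit that changes on a schedule. Here is a minimal sketch that looks up the limit in force on a given date. The effective dates are invented for the example; only the 20%, 15%, and 5% steps come from the story above.

```python
# Look up the end-of-life systems limit in force on a given date.
# The 20% -> 15% -> 5% steps come from the text; the dates are hypothetical.
from bisect import bisect_right
from datetime import date

# (effective_from, max_pct_end_of_life), sorted by date
RATCHET = [
    (date(2024, 1, 1), 20.0),
    (date(2024, 7, 1), 15.0),
    (date(2025, 1, 1), 5.0),  # steady state
]


def limit_on(day: date) -> float:
    """Return the limit in force on `day` (the latest step on or before it)."""
    dates = [effective for effective, _ in RATCHET]
    idx = bisect_right(dates, day) - 1
    if idx < 0:
        raise ValueError("no limit defined before the first step")
    return RATCHET[idx][1]


print(limit_on(date(2024, 3, 15)))  # 20.0
print(limit_on(date(2024, 9, 1)))   # 15.0
print(limit_on(date(2026, 6, 30)))  # 5.0
```

The tighter variances for especially critical systems could be handled as a second schedule keyed by system class.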

When a threshold has been crossed or a limit hit, what do we do? I’ve seen plenty of cases where a threshold or even a limit had to be crossed in the service of a particular goal, due to important business growth requirements or time-sensitive market opportunities. The existence of the threshold prompts a better management discussion and a time-bound deviation decision that ensures resources are available to correct the situation. This resolution usually happens more quickly than if the deviation had just been buried in layers of management. One example I remember, also in a prior organization, was during the initial Spectre/Meltdown vulnerabilities. We had a limit on the time and coverage expectations for endpoint security patches. We also had a clearly defined limit on endpoint availability and recoverability. The nature of some of the early OS and browser patches to mitigate the potential effects of these hardware vulnerabilities raised a lot of reliability concerns. So we were in a situation where we couldn’t assure our ability to stay within both limits. We went to the Board Risk Committee and requested a one-month increase to the patching timeliness limit. They approved it, appreciated the discussion, and said that if it went on longer they would prefer to take a reliability hit rather than a security hit. They also asked us to inform them if exploits occurred that would prompt a change in stance. This was a great use of defined risk tolerances to drive an executive and Board-level business decision, even on a technically intricate topic.
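A time-bound deviation like that one-month relaxation can be modeled as a temporary override that lapses automatically, so the standard limit returns without anyone having to remember to restore it. The sketch is illustrative only: the limit values (maximum days to patch) and dates are hypothetical, not the ones used in that organization.

```python
# A limit with an approved, time-bound deviation that expires automatically.
# Limit values and dates are hypothetical.
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Deviation:
    relaxed_value: float
    approved_by: str
    expires: date


@dataclass
class Limit:
    name: str
    value: float
    deviation: Optional[Deviation] = None

    def effective_value(self, today: date) -> float:
        """Use the relaxed value only while the approved deviation is live."""
        if self.deviation and today <= self.deviation.expires:
            return self.deviation.relaxed_value
        return self.value


patch_days = Limit("endpoint_patch_max_days", value=14)
patch_days.deviation = Deviation(
    relaxed_value=45, approved_by="Board Risk Committee", expires=date(2025, 2, 28)
)

print(patch_days.effective_value(date(2025, 2, 10)))  # 45 (deviation live)
print(patch_days.effective_value(date(2025, 3, 5)))   # 14 (deviation lapsed)
```

Recording who approved the deviation alongside its expiry keeps the decision auditable after the fact.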

4. Define Levels of Approval for Policy Deviations

There are many situations where some unquantifiable aspect of risk management still needs to be handled in a more formal way. Examples might be a new business or product launch that has a crunch date and is deficient in some way, an acquisition that you know needs a lot of remediation, or bringing on a new supplier that has issues but is your only choice in a specific market.

Now, in each of these situations you could try to contort the potential risk from these issues into some value, perhaps even a monetary value, but that rarely survives scrutiny. Even when the value is reasonable, it can focus attention so narrowly that the resulting risk decision is itself too narrow. For example, let’s say your business needs to launch in a new geography, with a projected revenue opportunity of $X over a specific period. You might project $Y of losses from launching before certain controls are in place. Classically, you would say that if $X > $Y, it’s a good business choice. It might be, but you might also need to consider the wider reputational impact of such losses: a loss of confidence (and revenue) in more established markets, or even governments or regulators withdrawing licenses in other markets as a result.
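A toy calculation shows how the classic comparison can mislead. All figures are invented, and the spillover estimate is exactly the kind of number that rarely survives scrutiny; the point is only that leaving such a term out can flip the answer.

```python
# Naive launch decision vs. one that includes a rough reputational spillover.
# Every figure here is invented for illustration.

revenue_opportunity = 10_000_000  # $X: projected revenue from the new market
projected_losses = 4_000_000      # $Y: losses from launching before controls

# Rough, contestable estimate of knock-on effects in established markets
# (lost confidence, regulatory action elsewhere), weighted by likelihood.
spillover_impact = 30_000_000
spillover_probability = 0.25

naive_go = revenue_opportunity > projected_losses
adjusted_go = revenue_opportunity > (
    projected_losses + spillover_probability * spillover_impact
)

print(naive_go, adjusted_go)  # True False
```

Change the assumed spillover probability from 0.25 to 0.15 and the answer flips back, which is why this is ultimately a leadership judgment rather than a formula.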

The answer to this is a leadership discussion, perhaps informed by some numbers, but ultimately it’s a more complex and nuanced business decision. I’ve seen plenty of situations where the decision was to go full steam ahead and fix things later. I’ve also seen many others where it was to slam on the brakes and delay the launch. All were subject to much discussion and debate. In some cases where the right answer was to stop, the resulting management decision was instead to add significantly more people and resources to the work so as to hit the deadline and implement the controls. As for acquisitions, I don’t think I’ve ever been involved in one that didn’t need post-completion remediation, so most of that part of the process is not about seeking to delay completion but, rather, making sure the work that needs to be done is clear and budgeted and, in extreme situations, actually taken off the deal’s closing price.

So, back to the risk appetite / tolerance topic. Here, instead of numeric thresholds and limits, you are defining a risk escalation framework that triggers these discussions. Typically this specifies what type of risk or issue, in what situation, requires what level of leadership to be involved in the decision. In other words, can a decision be made by a business leader for their division, or does it need to be an enterprise-wide decision, perhaps by the CEO or the Board?
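An escalation framework of this kind is essentially a lookup table from the type and reach of an issue to the level of leadership that must decide. The levels, scopes, and severity labels below are hypothetical placeholders; each organization would define its own.

```python
# Route a policy deviation to the level of leadership that must decide it.
# Levels, scopes, and severities are hypothetical placeholders.
from enum import IntEnum


class Approver(IntEnum):
    BUSINESS_LEADER = 1
    CEO = 2
    BOARD = 3


# (scope, severity) -> minimum approver
ESCALATION = {
    ("division", "low"): Approver.BUSINESS_LEADER,
    ("division", "high"): Approver.CEO,
    ("enterprise", "low"): Approver.CEO,
    ("enterprise", "high"): Approver.BOARD,
}


def required_approver(scope: str, severity: str) -> Approver:
    """Return the minimum approver; unknown cases escalate to the top."""
    return ESCALATION.get((scope, severity), Approver.BOARD)


def is_sufficient(actual: Approver, scope: str, severity: str) -> bool:
    """IntEnum ordering lets a higher level approve anything below it."""
    return actual >= required_approver(scope, severity)


print(required_approver("division", "low").name)     # BUSINESS_LEADER
print(required_approver("enterprise", "high").name)  # BOARD
```

Defaulting unknown combinations to the Board is a deliberate fail-safe choice in this sketch: a gap in the matrix produces an escalation rather than a silent approval.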

5. Establish a Governance Framework

You need to define a short governance framework for how all this will be managed and tracked: how decisions on threshold breaches and the associated action plans are maintained, how policy deviations are recorded and tracked, and who calls whom if a limit breach is imminent.
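Recording and tracking deviations does not need elaborate tooling to start. This sketch of a deviation register assumes each entry has an owner, an approver, and an expiry date, and it flags entries that have lapsed without being closed or renewed. The field names and sample entries are hypothetical.

```python
# A minimal policy deviation register that flags lapsed, still-open entries.
# Field names and sample entries are hypothetical.
from dataclasses import dataclass
from datetime import date


@dataclass
class DeviationRecord:
    ref: str
    description: str
    owner: str
    approver: str
    expires: date
    closed: bool = False


def overdue(register: list[DeviationRecord], today: date) -> list[str]:
    """Return refs of open deviations whose approval has expired."""
    return [r.ref for r in register if not r.closed and r.expires < today]


register = [
    DeviationRecord("DEV-1", "Launch without full logging", "Payments",
                    "CEO", date(2025, 3, 31)),
    DeviationRecord("DEV-2", "Supplier with open audit findings", "Procurement",
                    "CRO", date(2025, 1, 31), closed=True),
    DeviationRecord("DEV-3", "Acquired entity on legacy systems", "Integration",
                    "CISO", date(2025, 2, 15)),
]

print(overdue(register, date(2025, 3, 1)))  # ['DEV-3']
```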

Behind the scenes of all this there needs to be a person or team looking for themes and taking action. For example, is there one team or business unit that is always breaching a threshold, skirting too close to a limit, or trying to launch products that are missing key controls and seeking a policy exception? It might be that the thresholds and limits are set wrong, that more investment is needed in their tooling, or just a plain old change of leadership focus (or leadership).
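Spotting those themes can start with something as simple as counting breaches by business unit over a review period and flagging repeat offenders. The breach log and the cutoff of three are assumptions for illustration.

```python
# Flag business units with repeated threshold or limit breaches.
# The breach log and the repeat cutoff are hypothetical.
from collections import Counter

breach_log = [
    ("Retail", "threshold"), ("Retail", "threshold"), ("Retail", "limit"),
    ("Wholesale", "threshold"),
    ("Markets", "threshold"), ("Markets", "threshold"),
]

REPEAT_CUTOFF = 3  # breaches per review period worth a closer look


def repeat_breachers(log, cutoff=REPEAT_CUTOFF):
    """Return {unit: count} for units at or above the cutoff."""
    counts = Counter(unit for unit, _kind in log)
    return {unit: n for unit, n in counts.items() if n >= cutoff}


print(repeat_breachers(breach_log))  # {'Retail': 3}
```

A flag is the start of a conversation, not a verdict: the cause might be a mis-set threshold, under-investment in tooling, or leadership focus.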

6. Measure Effectiveness and Adjust

To stay useful over time, the framework needs its effectiveness measured and adjusted accordingly. Thresholds and limits may need to change: either be redefined or have their values changed according to how stringent you want to be. Thresholds and limits might be set with quite wide bands initially, and then, as you implement control programs, you can rein them in.

Overall, I’ve found the best way to do this is to use triggers, either formally (best) or informally (still pretty useful). Such triggers could include:

Incident or loss levels that contradict what the risk / control thresholds and limits indicate they should be.

Look for events more broadly that would signal some deficiency in your risk taxonomy (risks falling between the cracks or just outright omitted).

Constant threshold or limit breaches.

Accumulating risk issue backlogs in the risk register: a growing risk inventory is a signal that you are finding lots of risk but not resolving it.

Concentrations of approvals with certain executives who always approve every issue.

External triggers like threat intelligence, the regulatory outlook, geopolitical shifts, or other situations that would make you want to tighten thresholds and limits, perhaps just temporarily (remember that some triggers might instead justify relaxing them).

Another useful external trigger is looking at close calls or incidents at other organizations as part of your incident learning and seeing what that says about your thresholds and limits.

Look for contradictory limits emerging, for example where a security limit might contradict a resiliency limit. Sometimes these become apparent only in crises. In one organization, we had a tight limit on significant admin privileges that proved too restrictive during a natural disaster: we nearly found ourselves without enough admins available to enact a change requiring multi-party approval, and had that change not been made, we would have blown the availability limit. In the after-action review we slightly relaxed the privilege limit and took some other measures to increase the resiliency of admin access. In hindsight, we could have foreseen this by doing more formal work to cross-reference limits. Subsequently we did look for that pattern and corrected some other tensions.
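Some of these triggers can be checked mechanically. As one example, the sketch below looks for executives who approve nearly every deviation request that reaches them. The 95% rate, the minimum of five decisions, and the decision log are all arbitrary assumptions.

```python
# Flag approvers who approve (almost) every deviation request they receive.
# The rate cutoff, minimum volume, and decision log are hypothetical.
from collections import defaultdict

decisions = [
    ("exec_a", True), ("exec_a", True), ("exec_a", True),
    ("exec_a", True), ("exec_a", True), ("exec_a", True),
    ("exec_b", True), ("exec_b", False), ("exec_b", True),
    ("exec_b", False), ("exec_b", True),
    ("exec_c", True), ("exec_c", True),
]


def rubber_stamps(log, min_decisions=5, rate_cutoff=0.95):
    """Return approvers with enough decisions and approval rate >= cutoff."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for approver, was_approved in log:
        total[approver] += 1
        approved[approver] += int(was_approved)
    return sorted(
        a for a in total
        if total[a] >= min_decisions and approved[a] / total[a] >= rate_cutoff
    )


print(rubber_stamps(decisions))  # ['exec_a']
```

A high approval rate is not proof of a problem; it may mean only well-prepared requests reach that executive. It is a prompt to look.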

7. Keep the Formalists Happy (if you have to)

When you’ve done all of this and it’s working well, someone may still ask, “Where’s your risk appetite statement?” If you have to, you can write a conventional risk appetite statement simply by putting sections 1 through 6 together in a document and calling it a risk appetite statement. If you absolutely have to write vacuous statements like “We have zero appetite for X”, then just file them away and don’t let them distract you from the actual business of using risk management to support the controlled risk taking that is vital for business and mission outcomes.

Bottom line: defining risk appetite is of little value if it doesn’t support business decision making. That should include balancing upside and downside, ensuring risks are taken for strategic objectives while capping the downsides. Above all, the expressions of risk tolerance should allow actual choices to be made, give measurements meaning, and trigger useful escalations when there are deviations. Finally, there must be a process that tunes the limits and thresholds of specific risk measures based on actual outcomes, the current risk profile, and business / mission opportunities.


Thoughts from the Field