On the SIRA mailing list, we are discussing the age-old risk equation “Risk = Threats x Vulns x Impact (or Consequences).” A number of folks think it is nonsense. Here’s why I don’t. (Email to SIRA mailing list).
Before I get into this, I should re-acknowledge that I believe there are better methods to measure and evaluate risk, and I fully support their development. However, I am looking for evolution, not revolution – Geoffrey Moore pointed out the challenges of disruptive innovation in “Crossing the Chasm” many years ago, and I agree wholeheartedly. Evolution, to me, means modifying existing approaches in small but beneficial ways. That is why a few of us are developing the Tech Risk Mgt Maturity Model.
So my goal is, essentially, to be “better than existing practices in techrisk mgt” – I am looking for marginal utility.
I also believe that resources are scarce, and that every time infosec/techrisk folks make decisions about allocating them, they are revealing preferences that are measurable, if only in very coarse ways. Even though the existing models are seen as “qualitative,” we can create control horizons and conduct breakeven analysis in ways that tease out at least some thresholds.
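To illustrate the kind of breakeven analysis I mean, here is a minimal sketch. The function name and all dollar figures are hypothetical – the point is the reasoning: if a control costs C per year and the event it prevents would cost I, then a preference for buying the control implies a belief that the annual event probability exceeds C/I.

```python
# Hedged sketch of an expected-loss breakeven threshold for a control
# decision. All numbers are made up for illustration.

def breakeven_probability(control_cost: float, impact: float) -> float:
    """Annual event probability above which the control pays for itself.

    Derived by setting control_cost = probability * impact (expected
    loss) and solving for probability.
    """
    return control_cost / impact

# Example: a $50k/yr control against an event with a $2M impact.
threshold = breakeven_probability(50_000, 2_000_000)
print(threshold)  # 0.025 -> worthwhile only if Pr(event) > 2.5%/yr
```

Run in reverse, the same arithmetic turns an observed spending decision into a revealed probability threshold, which is the “coarse measurement” I have in mind.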
Now, to answer the questions:
- Pr(t): Yes, “the probability that a (sufficiently capable) threat actor will attack the system of interest” characterizes my belief well. And since I go out of my way to remind liability-minded folks that the intelligent adversary makes our situation much different from the “acts of god” kinds of hazard, I should acknowledge the non-randomness of the threat… but I am not ready to do that, exactly… for the same reasons as the “random walk down Wall Street” problem – easy to assert non-randomness yet hard to show otherwise.
Here is my thought process:
a) If it isn’t random, it should be predictable; and
b) if it isn’t predictable, then it approximates randomness (especially in the aggregate).
c) Since we can’t predict threat (afaik) then we should be evaluating any model compared to random, so
d) random is a good place to start.
There are many ways to approach determining Pr(t) – it could be degrees of belief, public data (real-time blacklists, etc.), historical data, or something else. My favorite application is a simple comparison of two scenarios. I don’t even quantify – I just look at the accessibility of the two “systems of interest” and determine which one is higher (compare, say, a bluesnarf attack that requires local proximity to a SQL injection that can happen from anywhere; or assess the difference in wi-fi attacks between being in the city and in the country). I come up with a higher Pr(t) for the SQL injection in the first comparison and for the city in the second. (It may also be useful to factor in the attacker’s costs in the first example.)
- Pr(v): This is difficult to characterize, but I think of it more as “the probability that a system of interest will be attacked, and that the attack will succeed [within some time period].” While I agree that any non-trivial system is vulnerable in a theoretical sense, it does not appear that every system is compromised (and I think that “there are two kinds of orgs – those that have been compromised and those that don’t know it yet” *is* closer to nonsense than r=t*v*i). Whether there is an over-abundance of targets, the attacker costs are too high, the control environment is sufficiently strong, or some other reason, not all systems are in a compromised state, so this probability is worthwhile to measure. It is especially important since the bulk of our defensive efforts revolve around reducing it.
Again, estimating Pr(v) can be done in ways similar to Pr(t). In my comparative analysis, I look at things like number of users (as vulns), size of code base, number of open ports, RASQ, etc…
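The comparative approach above can be sketched without committing to absolute probabilities at all – only ordinal judgments per factor. This is a hypothetical illustration: the scenario names, factor names, and scores are mine, not a standard scoring method.

```python
# Ordinal comparison of two attack scenarios, a sketch of the
# "compare, don't quantify" approach. Scores are illustrative ranks
# (higher = more exposed / cheaper for the attacker), not calibrated
# probabilities.

scenarios = {
    "bluesnarf (local proximity)": {"accessibility": 1, "attacker_cost": 3},
    "sql injection (remote)":      {"accessibility": 3, "attacker_cost": 1},
}

def exposure_rank(factors: dict) -> int:
    # Higher accessibility and lower attacker cost -> higher Pr(t) rank.
    return factors["accessibility"] - factors["attacker_cost"]

ranked = sorted(scenarios, key=lambda s: exposure_rank(scenarios[s]),
                reverse=True)
print(ranked[0])  # the scenario judged to have the higher Pr(t)
```

The same shape works for Pr(v) by swapping in factors like user count, code-base size, or open ports – the output is only an ordering of scenarios, which is all the comparative method claims to deliver.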
- It is worth discussing why breaking down Pr(event) into Pr(t)*Pr(v) is beneficial. For the most part, I would actually prefer to simply use Pr(event) if we have enough information (historical data). For example, I think we have pretty good data on email-borne attacks, so I wouldn’t work too hard on assessing ‘t’ and ‘v’ there – though the McColo takedown shows how much impact a change in ‘t’ can have.
Maybe the biggest reason is that the respective populations are different and can change drastically. Here are some use cases:
a) One of the better uses is to compare two scenarios/architectures. Banking from smartphone vs. laptop; moving to cloud from internal; determining risk btwn WEP vuln and remote Windows vuln; etc…
b) Acknowledging that if ‘t’ or ‘v’ is 0, then Pr(event) is 0. Though it is hard to conceive of a case where ‘v’ is 0, we can see ‘t’ approaching it in lots of proof-of-concept attacks.
c) Showing the significance of ‘t’ or ‘v’ as its partner approaches 1. I agree that ‘v’ is essentially 1.0 so why do we spend all our time on it? Maybe we should be doing other things… this is also why I think the move towards threat intel is so important.
d) To help folks see how changes in populations of either ‘t’ or ‘v’ might affect each other, and ultimately risk. Like the McColo takedown, bounties (on malware writers and bugs), etc. My favorite use may be pointing out that vuln disclosure does nothing to ‘v’ since it was already there; the impact is on ‘t.’
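To make use cases (b) and (c) concrete: with Pr(event) = Pr(t) * Pr(v), either factor at zero zeroes the whole thing, and when ‘v’ sits near 1.0 the event probability is driven almost entirely by ‘t’. A minimal sketch with made-up numbers:

```python
def pr_event(pr_t: float, pr_v: float) -> float:
    """Pr(event) under the (simplifying) independence assumption
    implicit in the t*v decomposition."""
    return pr_t * pr_v

# Use case (b): if either factor is 0, the event probability is 0.
print(pr_event(0.0, 0.9))   # 0.0

# Use case (c): when Pr(v) ~ 1.0, Pr(event) tracks Pr(t) almost
# exactly -- which is the argument for shifting effort toward
# threat intel rather than chasing 'v'.
print(pr_event(0.3, 0.98))  # ~0.294: effectively just Pr(t)
```

Treating the two factors as independent is itself an assumption; the McColo and vuln-disclosure examples above are exactly the cases where a shift in one population changes the other.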
To round things out, all this is “good enough” at the level of precision we are working at, and “better than” existing practices, IMO.