One of the papers at the Workshop on the Economics of Information Security (WEIS '09) last month was "Modelling the Security Ecosystem – the Dynamics of (In)Security" by Frei, et al. The paper does an excellent job of defining the state changes and interval periods throughout the vulnerability and exploit lifecycles. The authors also create scatterplots for discovery, exploit, and patch intervals relative to disclosure date.
I think it is worth clarifying one aspect of the paper that is (unintentionally) deceptive. In their attempt to compare availability of exploits and patches over time, the authors use different sampling techniques and populations that are not representative of the whole pool of vulnerabilities. This skews the data and ultimately leads to an incorrect conclusion.
On the surface, it makes sense to use vulnerabilities with known exploit dates as the sample population when analyzing exploit timing. And the authors correctly point out that the number of exploits they use is a minimum number. However, when they turn that number into a ratio using the smaller "known-exploit" population as the denominator, rather than the overall population of vulnerabilities, it is no longer conservative. In fact, because the ratio necessarily reaches 100% over time (an artifact of the sampling technique), it becomes a maximum, not a minimum.
To illustrate: by my estimate, approximately 40% of the vulnerabilities in the time period being analyzed had known exploits. So, conservatively, any cumulative distribution of exploit availability should top out around 40% over time. Instead, the authors discuss numbers reaching "94% 30 days after disclosure."
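To make the arithmetic concrete, here is a minimal sketch using made-up numbers (not the paper's data): 1,000 disclosed vulnerabilities, 400 of which ever get a known exploit. The only thing that changes between the two ratios is the denominator.

```python
# Hypothetical numbers, not the paper's data: 1,000 disclosed vulnerabilities,
# 400 of which (40%) ever get a known exploit. For simplicity, assume every
# known exploit appears within 30 days of disclosure.
total_vulns = 1000
exploited_vulns = 400
exploits_within_30_days = 400  # all of the exploited ones, in this toy example

# Denominator = only vulnerabilities with known exploits (the paper's sample):
ratio_sample = exploits_within_30_days / exploited_vulns      # 1.00 -> "100%"
# Denominator = all disclosed vulnerabilities:
ratio_population = exploits_within_30_days / total_vulns      # 0.40 -> "40%"

print(f"Exploit availability, known-exploit sample: {ratio_sample:.0%}")
print(f"Exploit availability, full population:      {ratio_population:.0%}")
```

The first ratio climbs toward 100% no matter what the real exploitation rate is; the second can never exceed the actual fraction of vulnerabilities that get exploited.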
The fatal flaw here is that the authors assume that there is an exploit for every single vulnerability. This is true for the sample they used, but it is unlikely to be true for the total population of vulnerabilities being assessed. More importantly, it is certainly not a minimum number.
The problem is exacerbated when the authors perform a similar analysis on the patch data but select a different sample (the top ten vendors). I think it is reasonable to assume that these vendors are either more responsive or less responsive than the overall population of vendors; either way, I have no reason to believe the sample is representative.
The final issue here is that, after conducting similar analyses on exploits and patches using different, non-random sample populations, the authors then compare the two results: "We found that exploit availability has consistently exceeded patch availability since 2000." Given the weaknesses in the methodology, this is a faulty conclusion.
The work is important and useful, but it needs to be redone using appropriate populations if it is going to be accurate.
It might be worth defining what you mean by “exploit” in this context. It seems that the definition of a vulnerability is that it is a bug or flaw that can be exploited in a certain manner, but that the ease of exploitation can range from very easy to very hard. If I’m reading correctly, you’re talking more about “easily” or “readily” exploited vulnerabilities, or maybe even that a working exploit is in the wild (as opposed to one being academic). fwiw.
@Ben -
I am using the definitions (explicit or implicit) directly from the paper. Here is what the paper said: “An exploit is a piece of software, a virus, a set of data, or sequence of commands that takes advantage of a vulnerability in order to cause unintended or unanticipated behavior to occur in software or an embedded device. Proof-of-concept code or exploits provided within security research and analysis tools are also deemed exploits.”
The way they chose the population was simply to use all listings from their data sources that had exploit data.