Estimate: Lines of Code in the World

I am curious how many lines of code (LOC) there are in the world’s Internet-accessible codebase. (I am deliberately ignoring systems that aren’t connected, because (1) including them makes the estimate even harder, and (2) that code is, by definition, not currently exposed to Internet worms.) Here are three ideas for estimating this number:

  1. Guess about 3 trillion LOC. I think there was an article around Y2K that put the total at something like a trillion LOC, so I am going to guess it has tripled since then.
  2. Estimate the average lines of code under management at the top 200 software manufacturers (my guess: 1 billion each, for a total of 200 billion LOC), then add the custom software owned or used by the Global 2000 (my estimate: 1 billion each, for another 2 trillion LOC), for a total of 2.2 trillion lines of code in the world.
  3. Estimate the number of software developers in the world and multiply by the average number of lines of code "under management" per developer. I recall seeing somewhere that there were something like 600,000 developers in the U.S. alone. If we estimate 100,000 LOC per developer, we end up with 60 billion LOC; quadrupling that to incorporate the rest of the world gives 240 billion. Obviously, that is very different from the 2.2 trillion, but it is perhaps a more accurate count of "active" code rather than all existing code.
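
The arithmetic behind the three ideas can be sketched in a few lines of Python. Every input is one of the guesses above, not measured data, and the variable names are purely illustrative. One assumption to flag: the Global 2000 figure is taken as 1 billion LOC per company, since that is the per-company value that reproduces the 2 trillion subtotal in idea 2.

```python
# Back-of-the-envelope versions of the three estimates.
# All inputs are guesses from the text, not measurements.

BILLION = 10**9
TRILLION = 10**12

# Idea 1: roughly a trillion LOC around Y2K, guessed to have tripled.
estimate_1 = 3 * TRILLION

# Idea 2: commercial vendors plus custom enterprise software.
vendor_loc = 200 * (1 * BILLION)        # top 200 vendors, 1 billion LOC each
# Assumption: 1 billion LOC per Global 2000 company, the value that
# yields the 2 trillion subtotal quoted in the text.
enterprise_loc = 2000 * (1 * BILLION)
estimate_2 = vendor_loc + enterprise_loc

# Idea 3: developers times LOC "under management" per developer.
us_developers = 600_000
loc_per_developer = 100_000
world_multiplier = 4                    # U.S. figure quadrupled for the world
estimate_3 = us_developers * loc_per_developer * world_multiplier

for label, loc in [("Idea 1", estimate_1),
                   ("Idea 2", estimate_2),
                   ("Idea 3", estimate_3)]:
    print(f"{label}: {loc / TRILLION:.2f} trillion LOC")
```

Changing any single guess and rerunning shows how sensitive each total is to its inputs; the spread between ideas 2 and 3 is roughly a factor of nine.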

So, why do I care? Well, to estimate the number of vulnerabilities that exist in the world, of course. If we know the size of the world’s codebase and apply a ratio of vulnerabilities per line of code, we can estimate the total. After that, we can subtract the vulnerabilities that have been found to date.

It gets even more interesting if you factor in the amount of code being added to the world’s codebase every day. Let’s back into it to illustrate: in 2003, we found about 10 vulnerabilities per day. If there are 0.1 vulnerabilities per KLOC (thousand lines of code), those 10 findings correspond to only 100 KLOC, so we would need to limit worldwide development to 100,000 LOC per day just to find vulnerabilities as fast as we create them. The alternative, which seems far more likely, is that we are creating many more vulnerabilities than we are finding, so we are losing ground every day.
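
Here is a rough sketch of that break-even calculation, under the same assumed ratio of 0.1 vulnerabilities per KLOC (written as 100 per million LOC to keep the arithmetic in integers). The function names and the 1 million LOC per day figure in the last example are hypothetical, chosen only to illustrate the shape of the argument.

```python
# Assumed ratio from the text: 0.1 vulnerabilities per KLOC,
# i.e. 100 vulnerabilities per million LOC.
VULNS_PER_MILLION_LOC = 100
MILLION = 1_000_000

# Figure from the text: about 10 vulnerabilities found per day in 2003.
FOUND_PER_DAY = 10


def latent_vulnerabilities(total_loc: int) -> int:
    """Vulnerabilities implied by a codebase of total_loc lines."""
    return total_loc * VULNS_PER_MILLION_LOC // MILLION


def net_new_per_day(loc_added_per_day: int) -> int:
    """Vulnerabilities created per day minus vulnerabilities found per day."""
    return latent_vulnerabilities(loc_added_per_day) - FOUND_PER_DAY


# Daily LOC at which creation exactly matches the discovery rate.
break_even_loc_per_day = FOUND_PER_DAY * MILLION // VULNS_PER_MILLION_LOC
print(break_even_loc_per_day)            # 100000

# Hypothetical: if the world wrote 1 million LOC per day, the backlog
# would grow by 90 vulnerabilities daily.
print(net_new_per_day(1_000_000))        # 90
```

Under the same ratio, `latent_vulnerabilities` applied to the 2.2 trillion and 240 billion LOC estimates gives 220 million and 24 million vulnerabilities respectively, before subtracting those already found.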