Rocket science and big money - a cautionary tale of math gone wrong

Morendil

The 2006 report from NASA's "Independent Verification and Validation Facility" makes some interesting claims. Turning to page 6, we learn that thanks to IV&V, "NASA realized a software rework risk reduction benefit of $1.6 Billion in Fiscal Year 2006 alone". This is close to 10% of NASA's overall annual budget, roughly equal to the entire annual budget of the International Space Station!

If the numbers check out, this is an impressive feat for IV&V (the more formal big brother of "testing" or "quality assurance" departments that most software development efforts include). Do they?

Flaubert and the math of ROI

Back in 1841, to tease his sister, Gustave Flaubert invented the "age of the captain problem", which ran like this:

A ship sails the ocean. It left Boston with a cargo of wool. It grosses 200 tons. [...] There are 12 passengers aboard, the wind is blowing East-North-East, the clock points to a quarter past three in the afternoon. It is the month of May. How old is the captain?

Flaubert was pointing out one common way people fail at math: you can only get sensible results from a calculation if the numbers you put in are related in the right ways. (Unfortunately, math education tends to be excessively heavy on the "manipulate numbers" part and to skimp on the "make sense of the question" part, a trend dissected by French mathematician Stella Baruk who titled one of her books after Flaubert's little joke on his sister.)

Unfortunately, NASA's math turns out on inspection to be "age-of-the-captain" math. (This strikes me as a big embarrassment to an organization literally composed mainly of rocket scientists.)

The $1.6 billion claimed by NASA's document is derived by applying an ROI calculation: NASA spent $19 million on IV&V services in 2006, and the Report further claims that IV&V can be shown to have an 83:1 ROI (Return on Investment) ratio. Thus, $19M times 83 gives us the original $1.6 billion. (The $19M is pure personnel cost, and does not include e.g. the costs of the building where IV&V is housed.)

What is Return on Investment? Economics defines it as the gain from an investment, minus the cost of investment, divided by (again) the cost of investment. An investment is something you spend so as to obtain a gain, and a gain is something caused by the investment. This isn't rocket science but basic economics.
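As a minimal sketch, the definition can be written as a one-line function. The function name and the $100/$150 figures are made up for illustration and do not come from the NASA report.

```python
def roi(gain: float, cost: float) -> float:
    """Return on investment: (gain - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost of investment must be positive")
    return (gain - cost) / cost


# Hypothetical example: invest $100, get $150 back.
example_roi = roi(gain=150.0, cost=100.0)
print(example_roi)  # 0.5, i.e. a 50% return
```

Note that both inputs are amounts of money: the gain has to be something caused by the spending in the denominator.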

But how does NASA arrive at this 83:1 figure?

NASA IV&V's math

NASA relies on the widespread claim that in software efforts, "fixing a bug later costs more". Specifically, it focuses on the costs of fixing software defects (as they're more formally known) at the various "phases" often said to compose a project: requirements, design, coding, unit test, system test, or "in the field". For instance, it supposedly costs on average 200 times as much to fix a defect in the field as it does at the Requirements stage. (I have debunked that claim elsewhere, but it still enjoys a relatively robust status within academic software engineering, so we can't fault NASA for relying on it back in 2006.)

NASA counted 490 "issues" that IV&V discovered at the requirements stage of the Space Shuttle missions, during some unspecified period between 1993 (the founding of the IV&V Facility) and 2006. (An "issue" is not the same as a defect, but for the time being we will ignore this distinction.) To this, NASA adds 304 issues found between 2004 and 2006 in other ("Science") missions. (We are also told that this analysis includes only the most "severe" issues, i.e. ones for which a work-around cannot be found and which impair a mission objective.)

We can verify that (490+304)*200 = 158,800, which NASA counts as the "weighed sub-total" for Requirements; adding up the somewhat smaller totals from other phases, NASA finds a total of 186,505.

NASA also adds up the number of issues found during all phases, which is 2,239. We can again verify that 186,505 / 2,239 = 83 and some change.
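The arithmetic so far can be replayed in a few lines of Python. This is only a sanity check on the figures quoted from the report; the variable names are mine, not NASA's.

```python
# Issue counts at the Requirements stage, as quoted from the report.
shuttle_issues = 490
science_issues = 304
field_vs_requirements = 200  # claimed cost multiplier, field vs. Requirements

requirements_subtotal = (shuttle_issues + science_issues) * field_vs_requirements
print(requirements_subtotal)  # 158800

# Totals across all phases, as quoted from the report.
weighted_total = 186_505
total_issues = 2_239
ratio = weighted_total / total_issues
print(round(ratio, 1))  # 83.3

# The headline figure: the 2006 IV&V budget times the rounded ratio.
ivv_budget_2006 = 19_000_000
print(ivv_budget_2006 * 83)  # 1577000000, i.e. roughly $1.6 billion
```

Every input to the 83 is a count of issues or a unitless multiplier; dollars only appear in the very last step.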

How old is the captain?

Now, the immediate objection to this procedure is that an ROI calculation involves dollars, not numbers of "issues". ROI is a ratio of money gained (or saved) over money invested, and while you can reasonably say you've "saved" some number of issues it's silly to talk about "investing" some number of issues.

We will want to "steel-man" NASA's argument. (This is the opposite of a "straw man", an easily knocked down argument that your interlocutor is not actually advancing, but that you make up to score easy points.) We will be as generous with this math as we can and see if it has even a small chance of holding up.

To rescue the claim, we need to turn issues into dollars. Let us list the assumptions that need to hold for NASA's calculations to be valid:

there is some determinate average cost to detecting an issue
there is some determinate average cost to fixing an issue
if an issue is not detected at the earliest opportunity, it always ends up being detected "in the field" and its repair cost is the maximum

The first two assumptions give our steelman attempt some leeway; not all issues need to cost the same to detect, but it has to make sense to talk about the "average cost of detecting an issue". Mathematically, this implies that the cost of fixing an issue obeys some well-behaved function such as the famous "bell curve". (However, there are some functions for which it makes no sense, mathematically, to speak of an average: for instance some "power law" curves. These are distributions often found to describe, for instance, the size of catastrophes such as avalanches or forest fires; no one would be very surprised to find that defect costs in fact follow a power law.)
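To make the parenthetical concrete, here is the textbook case of a Pareto (power law) distribution. The symbols x_m (minimum cost) and alpha (tail exponent) are introduced only for this illustration; nothing in NASA's data tells us what distribution defect costs actually follow.

```latex
% Pareto density for x >= x_m, with tail exponent \alpha > 0
f(x) = \frac{\alpha \, x_m^{\alpha}}{x^{\alpha + 1}}

% Its mean exists only when the tail is thin enough
\mathbb{E}[X] = \int_{x_m}^{\infty} x \, f(x) \, dx =
\begin{cases}
  \dfrac{\alpha \, x_m}{\alpha - 1} & \text{if } \alpha > 1 \\[1ex]
  \infty & \text{if } \alpha \le 1
\end{cases}
```

If costs behaved like the second case, the average over any finite sample would be dominated by its few largest values, and "average cost per issue" would not settle down to a meaningful number.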

The third assumption makes things even more problematic. NASA's calculations are based on hypotheticals: what if we used different assumptions, for instance that an "issue" in Requirements has a good likelihood of being found by NASA's diligent software engineers in the design phase? If all issues detected by IV&V in Requirements had been fixed in Design, then the ratio would only be about 5:1 (that is, the ratio between 200:1 and 40:1). Using a similar procedure for the other phases, we would find a "ROI" of less than 3:1. This isn't to say that my assumption is better than NASA's, but merely to observe that the final result is very sensitive to this kind of assumption.
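A small sketch of that sensitivity, for the Requirements phase only. It assumes, as the paragraph above does, that an issue caught in Design earns a multiplier of 5 instead of 200; the fraction caught in Design is a hypothetical knob of mine, not a number from the report.

```python
FIELD_MULTIPLIER = 200   # issue survives all the way to the field
DESIGN_MULTIPLIER = 5    # issue is caught one phase later, in Design


def blended_multiplier(fraction_caught_in_design: float) -> float:
    """Average multiplier when some issues are caught in Design, the rest in the field."""
    p = fraction_caught_in_design
    if not 0.0 <= p <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return p * DESIGN_MULTIPLIER + (1.0 - p) * FIELD_MULTIPLIER


for p in (0.0, 0.5, 1.0):
    print(p, blended_multiplier(p))
# 0.0 200.0
# 0.5 102.5
# 1.0 5.0
```

The result swings by a factor of 40 depending on a parameter the report never measures.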

However, we may grant that it is in NASA's culture to always assume the worst case. And anyway "up to $1.6 billion" is almost as impressive as "$1.6 billion", isn't it?

Eighty-three! For some value of eighty-three.

If we do accept all of NASA's claims, then an "issue" costs on average about $8.5K to detect ($19M divided by 2,239 issues). (As a common-sense check, note that this is on the order of one person-month, assuming a yearly loaded salary in the $100K range. That seems a bit excessive; not a slur on NASA's competence, but definitely a bad knock for the notion that "averages" make sense at all in this context.)
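The common-sense check can be spelled out as follows. The $100K loaded salary is the rough assumption stated above, not a figure from the report, and the variable names are mine.

```python
ivv_budget_2006 = 19_000_000   # dollars spent on IV&V in 2006
total_issues = 2_239           # issues counted across all phases
loaded_salary = 100_000        # assumed yearly loaded salary, in dollars

cost_per_issue = ivv_budget_2006 / total_issues
person_month_cost = loaded_salary / 12

print(round(cost_per_issue))     # 8486
print(round(person_month_cost))  # 8333
```

Under these assumptions, detecting one issue costs about as much as one person working for a month.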

However, note that NASA's data is absolutely silent on how much the same issues cost to fix. Detecting is IV&V's job, but fixing is the job of the software engineers working on the project.¹

NASA is therefore reporting on the results of the following calculation...

ROI = (Savings from IV&V - Actual cost of IV&V) / Actual cost of IV&V

where

Savings from IV&V = Hypothetical cost of fixing defects without IV&V - Actual cost of fixing defects

...and the above cannot be derived from the numbers used in the calculation - which are 1) counts of issues and 2) actual IV&V budget. Even if we do grant an 83:1 ratio between the hypothetical cost of fixing defects (had IV&V not been present to find them early) and the actual cost of fixing, we are left with an unknown variable - an unbound x in the equation - which is the average cost of fixing a defect.
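To see how much hangs on that unknown, here is a sketch that grants NASA the 83:1 ratio and then plugs in different values for the average cost of fixing a defect. The three candidate costs are purely hypothetical, and the sketch reuses the report's issue count and 2006 budget as if they covered the same period, which is itself generous.

```python
IVV_COST = 19_000_000    # actual cost of IV&V (2006 budget, in dollars)
TOTAL_ISSUES = 2_239     # issues counted in the report
GRANTED_RATIO = 83       # hypothetical fix cost / actual fix cost, as granted


def roi_given_fix_cost(avg_fix_cost: float) -> float:
    """ROI of IV&V as a function of the unknown average cost of fixing a defect."""
    actual_fix_cost = TOTAL_ISSUES * avg_fix_cost
    hypothetical_fix_cost = GRANTED_RATIO * actual_fix_cost
    savings = hypothetical_fix_cost - actual_fix_cost
    return (savings - IVV_COST) / IVV_COST


for x in (100, 1_000, 10_000):   # hypothetical average fix costs, in dollars
    print(x, round(roi_given_fix_cost(x), 2))
# 100 -0.03
# 1000 8.66
# 10000 95.63
```

Depending on x, the same issue counts yield anything from a net loss to a return above 95:1, so the counts alone cannot establish 83:1.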

This, then, is the fatal flaw in the argument, the one that cannot be steel-manned and that exposes NASA's math for what it is - Flaubert-style, "age of the captain" math, not rocket science.

¹ - Relatedly, an "issue" is just an observation that something is wrong, whereas a "defect" is the thing software developers fix; it's entirely possible for several "issues" related to one "defect" to be corrected simultaneously by the same fix; NASA's conceptual model grossly oversimplifies the working relationship between those who "validate and verify" and those who actually write the software.

Acknowledgements

Thanks to Aleksis Tulonen, a reader of my book, for finding the NASA document in the first place, and spotting the absurdity of the ROI calculation.

and by an elementary reasoning known in physics as "dimensional analysis", dividing a number of issues by another number of issues cannot give us an ROI

This is just being nit-picky, but from a dimensional analysis point of view, both "dollars per dollar" and "issues per issue" are dimensionless figures, and are thus in fact the same dimension.

While writing I was wondering if I should clarify that or if my meaning would come through even if I was somewhat imprecise - thanks for settling that.

My point here is that ROI is a ratio of something gained (or saved) over something invested, and while you can reasonably say you've "saved" some number of issues it's silly to talk about "investing" some number of issues.

It doesn't really matter for the rest of the argument, since the steelman tries to reconstruct investments and gains from the numbers given, but I've amended my sentence to say "something similar to dimensional analysis" instead.

What you said in the above comment is not what you wrote in the article. I'd encourage you to rewrite that section to be what you said here, as it's a valid argument but one that's very different from what you wrote. And for me at least, your dimensional analysis point made me stop and go "huh?" and now I'm reading comments instead of the rest of your otherwise quite interesting article.

Thanks for the additional prod towards clarity. Removed mention of dimensional analysis altogether and updated with the content of the comment I wrote to defend the weak spot. (It's galling, but this is a technique I actually try to teach others from time to time - when you feel the need to write something in defense of your writing, put that into the original piece instead.)

There are cases in which you can relate dimensionless units. For instance, the mole is a dimensionless unit; it just means times 6.022*10^23. But you can relate moles to moles in some cases, for instance with electrolysis. If you know how many electrons are being pumped into a reaction and you want to know how much Fe(II) becomes Fe, then you can compare moles of electrons to moles of iron, even though neither moles, elements, nor electrons can be related directly to one another in the conventional sense of m/s. In the same way one can relate dollars of one thing to dollars of another and get a meaningful answer.

You are right to point this out though, it is skirting very close to the gray areas of dimensional analysis without being explicitly mentioned as doing so.

"Issues" are kind of dimensionless already.

Bugs incorporated early have a high chance of being rendered moot because the functionality they were supposed to implement ends up being cut altogether.

Put differently (and, to my ear, more intuitively), time spent fixing bugs early has a high chance of being wasted because the part you fixed may not be needed after all.

Which is my experience, too, but then again I'm not writing software for space ships. I had the impression that space programs tend to be very well[1] planned in advance, and requirements change little once stuff actually starts to get built. (I'm saying this pretty much as an ignorant outsider, hoping to be corrected if I'm wrong.)

[1: Or at least meticulously, though looking at the apparent difficulty of the problems and the success rate I'd think "well" is deserved more often than in most endeavors.]

Granting that there is a "Flaubert's captain problem" in one of NASA's 2006 budget reports... now what? Is there some personally applicable upshot we can derive from it? What was your larger rhetorical point?

I could imagine someone making all the factual claims you've raised in order to prove the larger point that math and budgets are flexible and learning to plan and reason precisely are not that big a deal because no one who matters bothers to check the actual details and thus what really matters is something like getting along with influential and powerful people... Is that what you wanted to show here?

I could also imagine someone making all the factual claims you've raised here to show that corruption and/or incompetence was rampant in a science oriented public institution seven years ago and thus that citizens who contribute to that institution have a moral duty to respond somehow... but if that was your rhetorical goal then the call to action seems to be missing? And in the meantime the sense of moral outrage has been fanned somewhat, and now there's no outlet. Was inducing generalized angst against NASA your rhetorical goal?

I'm moderately friendly to the basic point being made that "someone in a high place was wrong at one time!" if there is admission of limits, but this article seems to be presented as something self contained but feels to me like half the story at best. If all you want to say is that something is wrong then I guess I'm cool with that being all you're saying, but I'd like you to admit it at the end so that, as a reader, I'm not left waiting for the other shoe to drop.

Thanks for the feedback. Here's an attempt below at responding; please let me know what you think, I might incorporate it into the piece.

What was your larger rhetorical point?

"Big bucks" claims like NASA's are the intellectual equivalent of schoolyard bullying: they use their reputation and bluster to grab your lunch money, that is, your assent to the claim that Independent Validation and Verification has very high expected value.

Always fact-check and logic-check claims, even when the source has a formidable reputation; certain domains, such as software engineering, are particularly rife with bogus claims and failures of critical thinking from smart people; quantitative claims in particular are often easy to fact-check and logic-check, so that even a self-taught smartass like me (the unathletic weakling in the schoolyard) can stand up to the biggest bullies. "If I can do it, so can you".