Causal diagrams and software engineering

32 Morendil 07 March 2012 06:23PM

Fake explanations don't feel fake. That's what makes them dangerous. -- EY

Let's look at "A Handbook of Software and Systems Engineering", which purports to examine the insights from software engineering that are solidly grounded in empirical evidence. Published by the prestigious Fraunhofer Institut, this book's subtitle is in fact "Empirical Observations, Laws and Theories".

Now "law" is a strong word to use - the highest level to which an explanation can aspire to reach, as it were. Sometimes it's used in a jokey manner, as in "Hofstadter's Law" (which certainly seems often to apply to software projects). But this definitely isn't a jokey kind of book, that much we get from the appeal to "empirical observations" and the "handbook" denomination.

Here is the very first "law" listed in the Handbook:

Requirement deficiencies are the prime source of project failures.

Previously, we observed that in the field of software engineering, a last name followed by a year, surrounded by parentheses, seems to be a magic formula for suspending critical judgment in readers.

Another such formula, it seems, is the invocation of statistical results. Brandish the word "percentage", assert that you have surveyed a largish population, and whatever it is you claim, some people will start believing. Do it often enough and some will start repeating your claim - without bothering to check it - starting a potentially viral cycle.

As a case in point, one of the most often cited pieces of "evidence" in support of the above "law" is the well-known Chaos Report, according to which the first cause of project failure is "Incomplete Requirements". (The Chaos Report isn't cited as evidence by the Handbook, but it's representative enough to serve in the following discussion. A Google Search readily attests to the wide spread of the verbatim claim in the Chaos Report; various derivatives of the claim are harder to track, but easily verified to be quite pervasive.)

Some elementary reasoning about causal inference is enough to show that the same evidence supporting the above "law" can equally well be suggested as evidence supporting this alternative conclusion:

Project failures are the primary source of requirements deficiencies.
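To make the symmetry concrete, here is a toy simulation of my own - purely illustrative, with invented probabilities not drawn from any actual survey. Two data-generating processes, one for each causal direction, are parameterized to produce the same joint distribution of "deficient requirements" and "project failure"; a survey that only records the correlation cannot tell them apart.

```python
import random

def survey(requirements_cause_failure, n=10_000, seed=0):
    """Simulate n projects under one of two causal directions.

    True:  deficient requirements raise the odds of failure.
    False: failure produces deficient requirements (e.g. a floundering
           project churns and degrades its own requirements).
    The probabilities are chosen so that both directions yield the
    same joint distribution of (deficient, failed).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        if requirements_cause_failure:
            deficient = rng.random() < 0.5
            failed = rng.random() < (0.6 if deficient else 0.2)
        else:
            failed = rng.random() < 0.4
            deficient = rng.random() < (0.75 if failed else 1 / 3)
        rows.append((deficient, failed))
    return rows

def failure_rate_given_deficient(rows):
    """The statistic a survey would report: P(failed | deficient)."""
    failed = [f for d, f in rows if d]
    return sum(failed) / len(failed)
```

With these numbers, both directions report roughly the same failure rate among projects with deficient requirements (about 60%), so the survey statistic alone cannot favor the "law" over its inversion.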


Diseased disciplines: the strange case of the inverted chart

47 Morendil 07 February 2012 09:45AM

Imagine the following situation: you have come across numerous references to a paper purporting to show that the chances of successfully treating a disease contracted at age 10 are substantially lower the later the disease is detected: somewhat lower at age 20, down to very poor at age 50. Every author draws more or less the same bar chart to depict this situation: the picture below, showing rising mortality from left to right.

Rising mortality, left to right

You search for the original paper, which proves a long quest: the conference publisher has lost some of its archives in several moves, several people citing the paper turn out to no longer have a copy, and so on. You finally locate a copy of the paper (let's call it G99) thanks to a helpful friend with great scholarly connections.

And you find out some interesting things.

The most striking is what the author's original chart depicts: for the disease detected at age 50, the chances of successful treatment become substantially lower as a function of the age at which it was contracted; mortality is highest if the disease was contracted at age 10 and lowest if contracted at age 40. The chart showing this is the picture below, showing decreasing mortality from top to bottom, for the same ages on the vertical axis.

Decreasing mortality, top to bottom

Not only is the representation topsy-turvy; the two diagrams can't be about the same thing, since what is constant in the first (age disease detected) is variable in the other, and what is variable in the first (age disease contracted) is constant in the other.

Now, as you research the issue a little more, you find out that authors prior to G99 have often used the first diagram to report their findings; reportedly, several different studies on different populations (dating back to the eighties) have yielded similar results.

But when citing G99, nobody reproduces the actual diagram in G99, they all reproduce the older diagram (or some variant of it).

You are tempted to conclude that the authors citing G99 are citing "from memory"; they are aware of the earlier research, and they have a vague recollection that G99 contains results that are not totally at odds with it. Same difference, they reason: G99 is one more confirmation of the earlier research, which is adequately summarized by the standard diagram.

And then you come across a paper by the same author, but from 10 years earlier. Let's call it G89. There is a strong presumption that the study in G99 is the same as the one described in G89, for the following reasons: a) the researcher who wrote G99 was by then already retired from the institution where they obtained their results; b) the G99 "paper" isn't in fact a paper, it's a PowerPoint summarizing previous results obtained by the author.

And in G89, you read the following: "This study didn't accurately record the mortality rates at various ages after contracting the disease, so we will use average rates summarized from several other studies."

So basically everyone who has been citing G99 has been building castles on sand.

Suppose that, far from some exotic disease affecting a few individuals each year, the disease in question was one of the world's major killers (say, tuberculosis, the world's leader in infectious disease mortality), and the reason why everyone is citing either G99 or some of the earlier research is to lend support to the standard strategies for fighting the disease.

When you look at the earlier research, you find nothing to allay your worries: the earlier studies are described only summarily, in broad overview papers or secondary sources; the numbers don't seem to match up, and so on. In effect you are discovering, about thirty years later, that what was taken for granted as a major finding on one of the principal topics of the discipline in fact has "sloppy academic practice" written all over it.

If this story were true, and this were medicine we were talking about, what would you expect (or at least hope for, if you haven't become too cynical) should it come to light? In a well-functioning discipline, a wave of retractions, public apologies, general embarrassment and a major re-evaluation of public health policies concerning this disease would follow.

 

The story is substantially true, but the field isn't medicine: it is software engineering.

I have transposed the story to medicine, temporarily, as an act of benign deception, to which I now confess. My intention was to bring out the structure of this story, and if, while thinking it was about health, you felt outraged at this miscarriage of academic process, you should still feel outraged upon learning that it is in fact about software.

The "disease" isn't some exotic oddity, but the software equivalent of tuberculosis - the cost of fixing defects (a.k.a. bugs).

The original claim was that "defects introduced in early phases cost more to fix the later they are detected". The misquoted chart says this instead: "defects detected in the operations phase (once software is in the field) cost more to fix the earlier they were introduced".

Any result concerning the "disease" of software bugs counts as a major result, because it affects very large fractions of the population, and accounts for a major fraction of the total "morbidity" (i.e. lack of quality, project failure) in the population (of software programs).

The earlier article by the same author contained the following confession: "This study didn't accurately record the engineering times to fix the defects, so we will use average times summarized from several other studies to weight the defect origins".

Not only is this one major result suspect, but the same pattern of "citogenesis" turns up investigating several other important claims.

 

Software engineering is a diseased discipline.

 

 


The publication I've labeled "G99" is generally cited as: Robert B. Grady, An Economic Release Decision Model: Insights into Software Project Management, in proceedings of Applications of Software Measurement (1999). The second diagram is from a photograph of a hard copy of the proceedings.

Here is one typical publication citing Grady 1999, from which the first diagram is extracted. You can find many more via a Google search. The "this study didn't accurately record" quote is discussed here, and can be found in "Dissecting Software Failures" by Grady, in the April 1989 issue of the "Hewlett Packard Journal"; you can still find one copy of the original source on the Web, as of early 2013, but link rot is threatening it with extinction.

A more extensive analysis of the "defect cost increase" claim is available in my book-in-progress, "The Leprechauns of Software Engineering".

Here is how the axes were originally labeled; first diagram:

  • vertical: "Relative Cost to Correct a Defect"
  • horizontal: "Development Phase" (values "Requirements", "Design", "Code", "Test", "Operation" from left to right)
  • figure label: "Relative cost to correct a requirement defect depending on when it is discovered"

Second diagram:

  • vertical: "Activity When Defect was Created" (values "Specifications", "Design", "Code", "Test" from top to bottom)
  • horizontal: "Relative cost to fix a defect after release to customers compared to the cost of fixing it shortly after it was created"
  • figure label: "Relative Costs to Fix Defects"

Hacking Less Wrong made easy: Vagrant edition

28 Morendil 30 January 2012 06:51PM

The Less Wrong Public Goods Team has already brought you an easy-to-use virtual machine for hacking Less Wrong.

But virtual boxes can cut both ways: on the one hand, you don't have to worry about setting things up yourself; on the other hand, not knowing how things were put together, and having to deal with a "black box" that doesn't let you use your own source code editor or pick an OS, can be off-putting. To me at least, these were trivial inconveniences standing in the way of updating my copy of the source and making some useful tweaks.

Enter Vagrant - and a little work I've done today for LW hackers and would-be hackers. Vagrant is a recent tool that allows you to treat virtual machine configurations as source code.

Instead of being something that someone possessed of arcane knowledge has put together, a virtual machine under Vagrant results from executing a series of source code instructions - and this source code is available for you to read, review, understand or change. (Software development should be a process of knowledge capture, not some hermetic discipline where you rely on the intransmissible wisdom of remote elders.)

Preliminary (but tested) results are up on my Github repo - it's a fork of the official LW code base, not the real thing. (Once this is tested by someone else, and if it works well, I intend to submit a pull request so that these improvements end up in the main codebase.) The following assumes you have a Unix or Mac system, or, if you're using Windows, that you're command-line competent.

Hacking on LW is now done as follows (compared to using the VM):

  • The following prerequisites are unchanged: git, Virtualbox
  • Install the following prerequisites: Ruby, rubygems, Vagrant
  • Download the Less Wrong source code as follows: git clone git@github.com:Morendil/lesswrong.git
  • Enter the "lesswrong" directory, then build the VM with: vagrant up (may take a while)
  • Log into the virtual box with: vagrant ssh
  • Go to the "/vagrant/r2" directory, and copy example.ini to development.ini
  • Change all instances of "password" in development.ini to "reddit"
  • You can now start the LW server with: paster serve --reload development.ini port=8080
  • Browse the URL http://localhost:8080/
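Condensed into a single session, the steps above look like this - a sketch, not a substitute for the list: the interactive `vagrant ssh` step is shown inline, and the `sed` one-liner (which assumes GNU sed) is just one way of doing the password substitution:

```shell
git clone git@github.com:Morendil/lesswrong.git
cd lesswrong
vagrant up       # builds the VM from source instructions; may take a while
vagrant ssh      # log into the virtual box; the rest runs inside it

cd /vagrant/r2
cp example.ini development.ini
sed -i 's/password/reddit/g' development.ini   # change all "password" to "reddit"
paster serve --reload development.ini port=8080
# now browse http://localhost:8080/ on the host
```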

The cool part is that the "/vagrant" directory on the VM is mapped to where you checked out the LW source code on your own machine: it's a shared directory, which means you can use your own code editor, run grep searches and so on. You've broken out of the black box!

If you try it, please report your experience in the thread below.

 

[Link] How doctors die

20 Morendil 06 December 2011 01:22PM

I'm reposting this from HN's front page, because it brought up a non-cached thought on cryonics:

The patient will get cut open, perforated with tubes, hooked up to machines, and assaulted with drugs. All of this occurs in the Intensive Care Unit at a cost of tens of thousands of dollars a day. What it buys is misery we would not inflict on a terrorist. I cannot count the number of times fellow physicians have told me, in words that vary only slightly, “Promise me if you find me like this that you’ll kill me.” [...] I’ve had hundreds of people brought to me in the emergency room after getting CPR. Exactly one, a healthy man who’d had no heart troubles (for those who want specifics, he had a “tension pneumothorax”), walked out of the hospital.

In short, end-of-life medical care is often pointless, painful and costly; doctors and ER personnel know this so well that they go to great lengths to ensure it doesn't happen to them.

It seems as if our systems and conventions around end of life are designed to not let people have a say in how they spend their final moments, even when letting them have their way would result in significant savings (note the dollar figures quoted above). I've already speculated on why that might be, but I keep seeing that turn up in unexpected ways.

I suspect that this is the bigger obstacle to cryonics, not so much e.g. the lack of scientific proof. "Freeze me cheaply instead of spending insane amounts of money on brutal attempts at keeping me alive" sounds like a sensible thing to tattoo on your chest, but the evidence suggests that it wouldn't be honored any more than "DNR" tattoos.

[Link] Awesome interactive visualization article

17 Morendil 13 October 2011 12:46PM

"Up and Down the Ladder of Abstraction"

Have you seen something similar to explain Bayesian updating? If not, how would one go about doing that?
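For reference, the computation such a visualization would animate is just Bayes' rule for a binary hypothesis; a minimal sketch (the function and parameter names are mine):

```python
def bayes_update(prior, likelihood_h, likelihood_not_h):
    """Posterior probability of hypothesis H after observing evidence E.

    prior:            P(H)
    likelihood_h:     P(E | H)
    likelihood_not_h: P(E | not H)
    """
    numerator = likelihood_h * prior
    return numerator / (numerator + likelihood_not_h * (1 - prior))
```

An interactive version would let you drag the prior and the two likelihoods and watch the posterior move - the "ladder of abstraction" idea applied to this one formula.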

The rest of the site and in particular the "Kill Math Project" may also be of interest to LWers. Author Bret Victor, whose CV includes "designed the initial user interface concepts for the iPad", comes across overall as a particularly awesome fellow.

Book trades with open-minded theists - recommendations?

8 Morendil 29 August 2011 05:23AM

In an Open Thread comment beriukay mentioned that he's reading C.S. Lewis' Mere Christianity. I've been reading it too, for interesting reasons.

In my case it so happened that I started discussing faith with a long-time online friend whose spiritual views I didn't yet know, and he turned out to be a Christian with a high regard for the Bible, who also has an interest in science. As our discussion turned to our readings on spirituality, I acknowledged (I think it was me) that I probably spent more time on books that reinforce my point of view than on books that challenge it, perhaps a case of confirmation bias. (I've been exposed to many poor arguments for Christianity, and dismissed them; but possibly that was largely a function of having started out with that bottom line already written and picking arguments I wouldn't have much trouble refuting.)

In the spirit of experiment we agreed to a "trade" - he would read (thoughtfully and with an open mind) a book of my choosing on reasons to doubt faith, and I'd do the same with a book he chose on Christianity.

So the idea here is to pick a book that's the "best argument from the other side" (as in quote 3 here).

I recommended The God Delusion - I'm not sure if that's the best choice given the above intent, but it's what came to mind on the spot.

Would you make a different choice? If so, what?

[Link] The Myth of the Three Laws of Robotics

2 Morendil 10 May 2011 05:44PM

At SingularityHub. Promising title; disappointing content. Author proceeds by pure perceptual analogy with the Asimovian Three Laws alluded to; argues that the mere possibility of self-modification renders AI uncontrollable - without considering the possibility of fixed points in the goal computation. ("Do you really think it can be constrained?" - i.e. argument from limited imagination.)

[Link] John Baez interviews Eliezer

15 Morendil 07 March 2011 07:49AM

[Link] Cool or creepy?

2 Morendil 31 January 2011 05:31PM

Why is this collection of vat-brains described as "cool" when cryonics - frozen, severed heads - is described as "creepy"?

Rationality quotes: October 2010

4 Morendil 05 October 2010 11:38AM

This is our monthly thread for collecting these little gems and pearls of wisdom, rationality-related quotes you've seen recently, or had stored in your quotesfile for ages, and which might be handy to link to in one of our discussions.

  • Please post all quotes separately, so that they can be voted up/down separately.  (If they are strongly related, reply to your own comments.  If strongly ordered, then go ahead and post them together.)
  • Do not quote yourself.
  • Do not quote comments/posts on LW/OB.
  • No more than 5 quotes per person per monthly thread, please.
