IlyaShpitser comments on Question about application of Bayes - Less Wrong Discussion

0 Post author: RolfAndreassen 31 October 2012 02:35AM

Comments (30)

Comment author: IlyaShpitser 01 November 2012 05:51:46AM * 1 point

If you are talking about what "causes" what, maybe you should think about the problem causally first, not as a standard hypothesis testing problem (although hypothesis testing may reappear later). What does it mean for a line to be a cause of bad behavior? What is "a causal effect"? Often people formalize causal effect as a difference of means between the "control group" and a "test group." In your case the control group is the original program, and the test group is the program where you intervened to comment out the offending line, say line 500.

You have a program output O, which is either good (o1) or bad (o2). In the original, unmodified program you sometimes get crashes, so P(O = o2) > 0. You also have an altered program, in which you intervened to comment out line 500, and in which, say, P(O = o2 | do(line500 = noop)) = 0.

The statistic you want is, for example, E(O) - E(O | do(line500 = noop)), with O coded numerically (say o1 = 0 and o2 = 1, so that E(O) = P(O = o2)). If this statistic is not zero, we say line 500 has a causal effect on the crash.

Since you can just intervene directly in your system, you can just gather enough samples of this statistic to figure out whether there is a causal effect or not. In systems that are not computer programs, people often cannot intervene directly, and so resort to "trickery" to get statistics like the above mean difference.
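To make the sampling idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `run_program` is a stand-in for actually running your program, the 20% crash rate is made up, and I am assuming the runs behave like independent samples. It only illustrates estimating the mean difference with outputs coded as 0 (good) and 1 (bad).

```python
import random


def run_program(line500_enabled, rng, crash_rate=0.2):
    """Hypothetical stand-in for one program run.

    Returns 1 for a bad output (o2) and 0 for a good output (o1).
    Assumption: with line 500 commented out, the crash never happens.
    """
    if not line500_enabled:
        return 0
    return 1 if rng.random() < crash_rate else 0


def estimate_effect(n_runs, seed=0):
    """Estimate E(O) - E(O | do(line500 = noop)) from n_runs of each version."""
    rng = random.Random(seed)
    # Control group: the original program, nothing done to it.
    control = [run_program(True, rng) for _ in range(n_runs)]
    # Test group: the program with line 500 commented out.
    treated = [run_program(False, rng) for _ in range(n_runs)]
    return sum(control) / n_runs - sum(treated) / n_runs


if __name__ == "__main__":
    print("estimated causal effect:", estimate_effect(10_000))
```

In a real setting you would replace `run_program` with something that launches the actual build (original or patched) and records whether it crashed. An estimate clearly away from zero is what you are looking for.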


If this seems simple, that's because it is. This setup mirrors how people actually debug: they intervene in the system and compare the original against "the test group," sometimes doing multiple runs if the bug is a "Heisenbug."
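As a rough illustration of why a "Heisenbug" calls for multiple runs: if the original program crashes with probability p on each run, and you assume the runs are independent, then a patch that did nothing would still give n clean runs in a row with probability (1 - p)^n. The sketch below computes how many clean runs you need before that probability drops below a threshold. The function name `runs_needed` and the threshold `alpha` are my own illustrative choices, not anything standard.

```python
import math


def runs_needed(p, alpha=0.05):
    """Smallest n with (1 - p)**n <= alpha.

    p: assumed per-run crash probability of the original program (0 < p < 1).
    alpha: how small the chance of n clean runs "by luck" should be.
    Assumes runs are independent, which may not hold in practice.
    """
    if not (0 < p < 1) or not (0 < alpha < 1):
        raise ValueError("p and alpha must be strictly between 0 and 1")
    return math.ceil(math.log(alpha) / math.log(1 - p))


if __name__ == "__main__":
    for p in (0.5, 0.1, 0.01):
        print(f"crash rate {p}: need {runs_needed(p)} clean runs")
```

The rarer the crash, the more clean runs of the patched program you need before "it stopped crashing" means much.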


There is also the issue of whether you can really treat the outputs of repeated program runs as i.i.d. samples. Sometimes you can, but often you cannot, as other posters pointed out.