Irgy comments on Question about application of Bayes - Less Wrong Discussion
Well, the problem is that you have uncertainty over the probability of the code crashing with or without fixing the particular bug you're looking for. What you need, in order to apply Bayes, is:
P(bug on that line | E1, E2) = P(E1, E2 | bug on line) P(bug on that line) / P(E1, E2)
P(bug on that line) = b
P(E1, E2) = P(E1, E2 | bug on line) b + P(E1, E2 | bug elsewhere) (1 - b)
P(E1, E2 | bug on line) = P(E1)P(E2 | bug on line) (since the 2 crashes out of 7 are independent of the bug location)
P(E1, E2 | bug elsewhere) does not factor the same way: E1 and E2 both depend on the same unknown crash frequency, so they are only independent once that frequency is fixed (see the integral below)
P(E2 | bug on line) = 1
Given a frequency of crashing 'f', P(E2 | f, bug elsewhere) = (1-f)^10
P(E1|f) = f^2 * (1-f)^5
So, then you need to integrate over all possible values of 'f': P(E1) = Integral over [0,1] of: P(E1|f)pr(f)df
P(E1, E2 | bug elsewhere) = Integral over [0,1] of: P(E1|f)P(E2 | f, bug elsewhere)pr(f)df
That's everything you need, the rest is just picking those priors and integrating back up the line. Of course the results are only as good as the priors. A much easier solution is: "The chance of a crash appears to be about 2/7. The chance of getting 10 non-crashes is (5/7)^10 ~= 3.5% " Note that this is not the same as the above, it's an approximation, but it's probably going to be just as good as doing it the hard way.
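To make the "hard way" concrete, here is a minimal Python sketch of the calculation. Several things in it are my assumptions rather than anything given in the thread: E1 is read as 2 crashes in 7 runs and E2 as 10 crash-free runs with the line commented out (inferred from the formulas), pr(f) is taken to be a Beta prior that defaults to uniform, b is set to 1/2, and the function names are made up for illustration. The sketch integrates P(E1|f) P(E2 | f, bug elsewhere) over f as a single product, since both factors depend on the same f. With integer Beta parameters the integrals have exact closed forms, so no numerical integration is needed.

```python
from fractions import Fraction
from math import factorial


def beta_int(a: int, b: int) -> Fraction:
    """Exact Beta function B(a, b) for positive integers a and b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def posterior_bug_on_line(b, crashes=2, clean=5, clean_after=10, alpha=1, beta=1):
    """P(bug on line | E1, E2), assuming pr(f) = Beta(alpha, beta).

    b is the prior P(bug on that line); alpha = beta = 1 is a uniform prior on f.
    """
    norm = beta_int(alpha, beta)
    # P(E1) = integral over [0,1] of f^crashes (1-f)^clean pr(f) df
    p_e1 = beta_int(alpha + crashes, beta + clean) / norm
    # P(E1, E2 | bug elsewhere)
    #   = integral over [0,1] of f^crashes (1-f)^(clean + clean_after) pr(f) df
    p_joint_elsewhere = beta_int(alpha + crashes, beta + clean + clean_after) / norm
    # P(E1, E2 | bug on line) = P(E1), because P(E2 | bug on line) = 1
    numerator = p_e1 * b
    posterior = numerator / (numerator + p_joint_elsewhere * (1 - b))
    # Chance of the clean runs if the bug is elsewhere, given what E1 says about f
    p_clean_given_e1 = p_joint_elsewhere / p_e1
    return posterior, p_clean_given_e1


prior_b = Fraction(1, 2)  # assumed prior that the bug is on that line
posterior, p_clean_given_e1 = posterior_bug_on_line(prior_b)
quick_estimate = (5 / 7) ** 10  # the shortcut: treat f as exactly 2/7

print("posterior P(bug on line | E1, E2):", posterior, float(posterior))
print("P(10 clean runs | E1, bug elsewhere):", float(p_clean_given_e1))
print("shortcut (5/7)^10:", quick_estimate)
```

Under these assumed priors the posterior comes out to 102/109, about 94%, and the chance of 10 clean runs with the bug elsewhere is 7/102, about 6.9%, against roughly 3.5% from the shortcut. The gap is there because the full version keeps the uncertainty in f instead of pinning it at 2/7; a different prior on f or a different b will move both numbers.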
Incidentally, you need to be aware that, particularly with intermittent bugs, just because commenting out a line stops the crash (even when you're 100% sure of the correlation), that doesn't mean the line itself is the problem. Bugs can be absolutely pathological. For example, if the problem is, say, freeing the same memory twice, then any line that triggers a lot of memory allocations will increase the frequency of crashes even if the problem is actually earlier in the code. Also, if the bug is overrunning the end of an array, taking out any line can have a chaotic effect on the optimiser, moving the relative locations of things in memory around and causing the bug to disappear without fixing it (only for it to reappear later). On a simpler level, taking a line out might change the execution path, avoiding the bug without fixing it. There seems to be no end to the ways in which impossible-seeming things can happen in computer code.
This is very true. I simplified the information a bit because I was posting about the math as a matter of intellectual curiosity, not to get help debugging. I have a model of what was causing the crash that I find reasonably convincing, which I outlined in my response to jimrandomh, below. So, while it's a real-world problem, for purposes of math we can assume that the effect of commenting out the line is an indication of a point bug, as it were. I found the Bayes confusing anyway, so there's no need to complexify further. :)
Thanks for the math answer; I need to think about it carefully to absorb it fully, but I thought I'd respond to the programming answer first.