I have successfully confused myself about probability again. 

I am debugging an intermittent crash; it doesn't happen every time I run the program. After much confusion I believe I have traced the problem to a specific line (activating my debug logger, as it happens; irony...). I have tested my program with and without this line commented out. I find that when the line is active, I get two crashes in seven runs; without the line, I get no crashes in ten runs. Intuitively this seems like evidence in favour of the hypothesis that the line is causing the crash, but I'm confused about how to set up the equations. Do I need a probability distribution over crash frequencies? That was the solution the last time I was confused over Bayes, but I don't understand what it means to say "the probability of having the line, given crash frequency f", which it seems I need to know to calculate a new probability distribution.

I'm going to go with my intuition and code on the assumption that the debug logger should be activated much later in the program to avoid a race condition, but I'd like to understand this math. 


Well, the problem is that you have uncertainty over the probability of the code crashing with or without fixing the particular bug you're looking for. What you need, in order to apply Bayes, is:

  • A prior model 'pr(f)' of the frequency of the code crashing with the bug present, i.e. a distribution over crash frequencies. The (improper) Haldane ignorance prior 1/(f(1-f)) might do the job.
  • I'd assume the probability of the code crashing with the bug fixed is 0, even if that's not strictly true (since there could be other bugs).
  • A prior probability 'b' of the bug being on that line of code. This could depend (for instance) on how you identified that particular line to comment out in the first place. I'll call your evidence from running the program E1 (2 crashes out of 7 runs with the line) and E2 (no crashes in 10 runs without it).

P(bug on that line | E1, E2) = P(E1, E2 | bug on line) P(bug on that line) / P(E1, E2)

P(bug on that line) = b

P(E1, E2) = P(E1, E2 | bug on line) b + P(E1, E2 | bug elsewhere) (1 - b)

P(E1, E2 | bug on line) = P(E1) P(E2 | bug on line) (since the 2 crashes out of 7 don't depend on where the bug is)

P(E1, E2 | bug elsewhere) is trickier: E1 and E2 are only independent given the crash frequency f (E1 tells you something about f, which in turn bears on E2), so we condition on f and integrate it out below

P(E2 | bug on line) = 1

Given a frequency of crashing 'f', P(E2 | f, bug elsewhere) = (1-f)^10

P(E1|f) = f^2 * (1-f)^5 (up to the binomial coefficient 21, which cancels out of the final ratio)

So, then you need to integrate over all possible values of 'f': P(E1) = Integral over [0,1] of: P(E1|f)pr(f)df

P(E1, E2 | bug elsewhere) = Integral over [0,1] of: P(E1|f) P(E2 | f, bug elsewhere) pr(f) df

That's everything you need; the rest is just picking those priors and integrating back up the line. Of course the results are only as good as the priors. A much easier solution is: "The chance of a crash appears to be about 2/7. The chance of getting 10 non-crashes is (5/7)^10 ~= 3.5%." Note that this is not the same as the above; it's an approximation, but it's probably going to be just as good as doing it the hard way.
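For concreteness, here is a minimal numerical sketch of that calculation (my code, not part of the original recipe). Two assumptions of mine: a proper Jeffreys Beta(1/2, 1/2) prior stands in for the improper 1/(f(1-f)) prior, and b = 0.5.

```python
# Minimal sketch of the calculation above. Assumptions (mine, not the
# comment's): a proper Jeffreys Beta(1/2, 1/2) prior in place of the
# improper 1/(f(1-f)) prior, and prior probability b = 0.5.
import numpy as np
from scipy.stats import beta

f = np.linspace(1e-6, 1 - 1e-6, 200_000)   # grid over crash frequencies
df = f[1] - f[0]
prior = beta.pdf(f, 0.5, 0.5)              # pr(f)

lik_E1 = f**2 * (1 - f)**5                 # P(E1 | f): 2 crashes in 7 runs
lik_E2_elsewhere = (1 - f)**10             # P(E2 | f, bug elsewhere)

# P(E1, E2 | bug on line) = integral of P(E1|f) pr(f) df  (E2 is certain)
p_on_line = (lik_E1 * prior).sum() * df
# P(E1, E2 | bug elsewhere) = integral of P(E1|f) P(E2|f, elsewhere) pr(f) df
p_elsewhere = (lik_E1 * lik_E2_elsewhere * prior).sum() * df

b = 0.5  # assumed prior probability that the bug is on that line
posterior = b * p_on_line / (b * p_on_line + (1 - b) * p_elsewhere)
print(f"P(bug on that line | E1, E2) = {posterior:.3f}")   # ~0.917
```

(The binomial coefficient 21 is omitted because it multiplies both terms and cancels. The ~0.92 here agrees with the ~0.0906 likelihood ratio DanielLC computes further down.)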

Incidentally, you need to be aware that, particularly with intermittent bugs, just because commenting out a line stops the crash (even when you're 100% sure of the correlation), that doesn't mean the line itself is the problem. Bugs can be absolutely pathological. For example, if the problem is, say, freeing the same memory twice, then any line that performs a lot of memory allocations will increase the frequency of crashes even if the problem is actually earlier in the code. Also, if the bug is overrunning the end of an array, taking out any line can have a chaotic effect on the optimiser, moving the relative locations of things in memory around and causing the bug to disappear without fixing it (only for it to reappear later). On a simpler level, taking a line out might change the execution path, avoiding the bug without fixing it. There seems to be no end to the ways in which impossible-seeming things can happen in computer code.

Bugs can be absolutely pathological.

This is very true. I simplified the information a bit because I was posting about the math as a matter of intellectual curiosity, not to get help debugging. I have a model of what was causing the crash that I find reasonably convincing, which I outlined in my response to jimrandomh, below. So, while it's a real-world problem, for purposes of math we can assume that the effect of commenting out the line is an indication of a point bug, as it were. I found the Bayes confusing anyway, so there's no need to complexify further. :)

Thanks for the math answer; I need to think about it carefully to absorb it fully, but I thought I'd respond to the programming answer first.

Is this program written in C or C++, by any chance? In that case, you can't conclude anything from probabilistic data - because you're probably hunting memory corruption, commenting out lines affects memory layout, and memory layout affects the symptoms of memory corruption. Actually, this problem is pretty general; anything that looks like a probabilistic software crash needs a source of entropy to randomize it, and usually this "entropy" will be in the form of a fragile dependency on timing or memory arrangement which interacts with attempts to measure it in impossibly confusing ways. (This is why using C and C++ is usually a terrible idea, and why threads are so feared.)


(This is why using C and C++ is usually a terrible idea, and why threads are so feared.)

I was with you up to this parenthetical. The same problems affect every other programming language, though perhaps with different root causes. Race conditions are hardly language-dependent.

The probability of getting a race condition, given that you're programming in Haskell, is orders of magnitude lower than the probability of getting a race condition, given that you're programming in C or C++.

That may well be true. I do have some other constraints, though. For example, does Haskell have a well-supported GUI library that works on Windows and talks to OpenGL?

Haskell requires bending your brain in ways that are often uncomfortable to non-mathematicians. I, personally, use Python, which strikes a balance between race avoidance and library support. I meant more to argue that race conditions are language-dependent than to recommend Haskell as a perfect programming language for any purpose.

Not different root causes, fewer root causes. Threading problems and race conditions exist in all languages, but memory corruption does not; the majority of languages won't let you do things like write out of bounds or use after free.


Yes, memory corruption in particular is no longer an issue in most higher languages, but memory management in general has become more complicated and less transparent. The Law of Leaky Abstractions ruins everything.

In any case this is reasonably off-topic for this post, so I won't belabor the point further.

Yes, it's C++. I'm reasonably convinced the problem was a race condition in initialising the GUI; it appears that the debug logger would try to write to a text field that might or might not exist, probably because a different thread was creating it. (I don't have full control of the threading since I'm using Qt for my GUI stuff, so exactly when my code is called is unfortunately a bit black-box-ish.) I believe I resolved the issue by introducing a special debug stream for use during startup, which doesn't try to write to the GUI but instead writes to a log file.

Please note that there are tools to track such memory corruption issues, like Valgrind. Using C and C++ isn't such a bad idea in itself (for some tasks they are arguably the most appropriate languages), but using them without the related tools to track memory corruption is.

In a Unix environment that would have been my first thought. However this is Windows. (Yes, yes, C++ on Windows, my geek penis just shrank two inches, whatever.) As far as I could tell from a quick Google, valgrind isn't supported on Windows. (I'll be delighted if anyone tells me this is wrong.)

gdb is supported on Windows, and I did use that; but the program obstinately refused to crash in the debugger, presumably because the timing changed. Race conditions for the win...

If you are talking about what "causes" what, maybe you should think about the problem causally first, not as a standard hypothesis testing problem (although hypothesis testing may reappear later). What does it mean for a line to be a cause of bad behavior? What is "a causal effect"? Often people formalize causal effect as a difference of means between the "control group" and a "test group." In your case the control group is the original program, and the test group is the program where you intervened to comment out the offending line, say line 500.

You have program output O, which is either good (o1) or bad (o2). You have the original program, where nothing was done to it, and where you sometimes get crashes: P(O = o2) > 0. You also have an altered program, where you intervened to comment out line 500, say, for which P(O = o2 | do(line500 = noop)) = 0.

The statistic you want is, for example, E(O) - E(O | do(line500 = noop)), coding a good run as 0 and a crash as 1. If this statistic is not zero, we say line 500 has a causal effect on the crash.

Since you can just intervene directly in your system, you can just gather enough samples of this statistic to figure out whether there is a causal effect or not. In systems that are not computer programs, people often cannot intervene directly, and so resort to "trickery" to get statistics like the above mean difference.
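As a toy illustration of that statistic, here it is on the run counts from the post (the ordering of outcomes within each group is arbitrary):

```python
# Difference-of-means estimate of the causal effect of line 500 on
# crashing, using the counts from the post: 2 crashes in 7 runs with the
# line, 0 crashes in 10 runs with it commented out.
control = [1, 1, 0, 0, 0, 0, 0]   # original program: estimates E(O)
treated = [0] * 10                # estimates E(O | do(line500 = noop))

effect = sum(control) / len(control) - sum(treated) / len(treated)
print(f"Estimated E(O) - E(O | do(line500 = noop)) = {effect:.3f}")  # ~0.286
```

Whether an estimate of ~0.286 from only 17 runs is distinguishable from zero is, of course, exactly what the Bayesian machinery elsewhere in this thread is trying to answer.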


If this seems simple, that's because it is. This setup mirrors how people actually debug -- they intervene in systems and compare with "the test group," sometimes doing multiple runs if the bug is a "Heisenbug."


There is also the issue of whether you can really treat the outputs of repeated program runs as iid samples. Sometimes you can, often you cannot, as other posters pointed out.

If the example is intended to be concrete rather than abstract... First understand what that line does. Shotgun debugging is seldom a good idea.

Sure. Although I was inspired to think about this Bayesian problem by an issue I was having with coding, I'm not asking about the coding problem, which I have got under control. I'm asking about the math. So it's an abstracted version of a problem I was concretely having, but the concrete problem is solved.

I'd recommend using the beta distribution, with the Jeffreys prior (beta(1/2, 1/2)), though I don't fully understand it.

I've only ever used it to figure out the probability of getting heads on the next step, but you can just multiply those predictive probabilities together to get the probability of the whole sequence. If that line is where the bug is, the seven runs with it are five passes and two crashes (order doesn't really matter), and clearly it will pass the next ten, so the probability is 0.5/1 x 1.5/2 x 2.5/3 x 3.5/4 x 4.5/5 x 0.5/6 x 1.5/7. If you didn't find it, all seventeen runs share one crash frequency, so the probability is 0.5/1 x 1.5/2 x 2.5/3 x 3.5/4 x 4.5/5 x 0.5/6 x 1.5/7 x 5.5/8 x 6.5/9 x ... x 14.5/17. The first seven terms cancel, so you just get that the data are 5.5/8 x ... x 14.5/17 = (14.5!/4.5!)/(17!/7!) ~= 0.0906 times as likely if the bug is elsewhere.
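If it helps, here is a small sketch of that product (my code, not DanielLC's), with each factor computed as the rule-of-succession predictive probability of the next outcome under a Beta(1/2, 1/2) prior:

```python
# Sketch of the sequence-probability product above, assuming a
# Beta(1/2, 1/2) (Jeffreys) prior over the crash frequency.
def sequence_prob(outcomes, a=0.5, b=0.5):
    """Probability of an ordered crash(1)/pass(0) sequence under Beta(a, b)."""
    p, crashes, passes = 1.0, 0, 0
    for crash in outcomes:
        n = crashes + passes
        p *= (crashes + a) / (n + a + b) if crash else (passes + b) / (n + a + b)
        crashes += crash
        passes += 1 - crash
    return p

with_line = [0] * 5 + [1] * 2     # 5 passes, 2 crashes (order cancels out)
without_line = [0] * 10           # 10 passes

# Bug on the line: the 10 line-free runs then pass with certainty.
p_bug_on_line = sequence_prob(with_line)
# Bug elsewhere: all 17 runs share one unknown crash frequency.
p_bug_elsewhere = sequence_prob(with_line + without_line)

print(p_bug_elsewhere / p_bug_on_line)   # ~0.0906, as above
```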

I'm afraid I don't quite see how to apply this to the problem. The beta distribution is presumably a probability, but what is it a probability of? Is there an interpretation to its two parameters that I'm not seeing?

It's a probability distribution of probabilities. You don't know how likely it is that the program crashes given that there's no bug. You just know that it crashed two out of seven times you ran it. If you start with a beta distribution for how likely it is to have different probabilities of crashing, you'll have an easy-to-calculate beta distribution for how likely it is after, and it's easy to calculate exactly how likely it is to crash given that distribution of probabilities.

Let me see if I understand what you are saying. You seem to be analogising the problem to detecting whether a coin or die is biased? That is, a biased die has some frequency of coming up 6 which is not one-sixth; and you are proposing beta(0.5, 0.5) as my prior distribution for that frequency. I roll the die 100 times, getting some number of sixes; for each point in the [0, 1] space of possible frequencies, I do Bayes. This presumably concentrates my probability around some particular frequency, and if that frequency is different from 0.16667 I say the die is biased.

Now, the analogy is that commenting out the line is equivalent to removing a piece of gum from the die. So I perform the test twice, once with and once without the gum; and if the results differ then I say that the gum changed the bias, and if the un-gummed die is fair then the gum caused the bias. Or, going back to the program, the mean of my probability distribution for the crash frequency in the presence of the line may be high relative to the mean without the line, and that's what we mean by "We think this line is causing the crash". Right?

So, if I understand correctly, you are proposing the same math as gjm did below, but suggesting a specific prior

P(p, q) = beta(p; 0.5, 0.5) beta(q; 0.5, 0.5).

Is that correct?

No. My way was assuming that it either crashes exclusively because of that line or exclusively because of something else. Furthermore, I only gave priors conditional on which of the two it's doing.

Let A = that line is the cause of the crash.

Let q!=x mean that this is true for any value of q besides x.

P(p,0|A) = beta(0.5,0.5)(p)

P(p,q!=0|A) = 0

P(p,p|!A) = beta(0.5,0.5)(p)

P(p,q!=p|!A) = 0

You need a probability distribution over something like pairs (crash probability without this line, crash probability with this line). So the likelihood for (p,q) is Pr(2 crashes of 7 with the line | p) Pr(0 crashes of 10 without the line | q) = 21 p^2 (1-p)^5 (1-q)^10, and presumably you've got a prior that prefers small p, small q, and p ~= q.

Then you could look deeper into the kind of thing that might be going on and consider hypotheses like "there's a race condition that happens early in the program when debug logging is active, because of the extra time taken writing to the log" but it's going to be really tough assigning the relevant conditional probabilities. In practice, if a lot of your probability in the initial (p,q) thing ends up in regions where p<q by a substantial margin, it's probably reasonable to say that adding the line causes the program to misbehave.
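To make that concrete, here is a grid-based sketch of the (p, q) posterior. The prior is an illustrative assumption of mine, not anything canonical: independent Jeffreys Beta(1/2, 1/2) marginals times a factor exp(-|p-q|/0.1) that mildly prefers p ~= q.

```python
# Grid posterior over (p, q) = (crash prob with the line, without it).
# The prior here is an illustrative assumption, not anything canonical.
import numpy as np
from scipy.stats import beta

grid = np.linspace(1e-4, 1 - 1e-4, 400)
p, q = np.meshgrid(grid, grid, indexing="ij")

prior = beta.pdf(p, 0.5, 0.5) * beta.pdf(q, 0.5, 0.5) * np.exp(-np.abs(p - q) / 0.1)
likelihood = 21 * p**2 * (1 - p)**5 * (1 - q)**10   # from the comment above
posterior = prior * likelihood
posterior /= posterior.sum()

print("P(p > q | data)       ~=", posterior[p > q].sum())
print("P(p > q + 0.1 | data) ~=", posterior[p > q + 0.1].sum())
```

Statements like "most of the posterior mass has p substantially above q" are then the formal version of "adding the line causes the program to misbehave".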

So, if I understand correctly, my posterior probability is a distribution over the (p, q) space; and presumably, the particular data I've got becomes more likely as p approaches 2/7 and q approaches 0. So if I had, say, a uniform prior, or a prior that was uniform in the lower-left corner where both p and q are small, then I'd move towards zero q and nonzero p, or in other words "The line causes the crash". That seems to make sense.

The first step is to be clear about what you're asking and what you're trying to accomplish.

A whole bunch of things could be said to "cause" the crash, such as the power supplied to the system. You want a program that does what you want, and doesn't crash. You could immediately exit the program and likely always avoid a crash. The lack of such an exit could be said to be "the cause" of the crash. But what good does identifying such a cause do you?

A cause isn't necessarily "the thing that needs to be changed", though people often treat it that way.

Do I need a probability distribution over crash frequencies?

I think that with the data you have, some ignorance prior, and an independence assumption across crash trials (which I think is likely a false assumption), you have Bernoulli trials and can assign a probability distribution to each of the two states - with and without the line.

But I don't know what good those distributions do you.

Why are you assigning probability distributions instead of doing more debugging? What do you expect to do with those distributions? What does the line do? Why not just remove it?

Basically, what's the problem you're trying to solve?

I'm reasonably confident I solved my actual coding issue; I have a mental model of what the race condition was and how I resolved it, and in many runs of the modified program I have not seen the crash. So the problem for this thread is just that I was confused on how to use Bayes in such a case, and would like to learn some math.

For the math on Bernoulli trials, see Jaynes, and Wikipedia for the Rule of Succession (a one-line worked example follows the links):

http://en.wikipedia.org/wiki/Rule_of_succession

http://www-biba.inrialpes.fr/Jaynes/cc18i.pdf

And this looks like a pretty good paper on setting priors by maximum entropy and transformation groups. http://bayes.wustl.edu/etj/articles/prior.pdf
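(For instance, by the Rule of Succession with a uniform prior, 2 crashes in 7 runs give a predictive crash probability of (2+1)/(7+2) = 1/3 for the next run; the Jeffreys-prior version used elsewhere in this thread gives (2+0.5)/(7+1) ~= 0.31.)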

As a relatively new member of this community I was wondering what implications you could foresee because of this:

http://understandinguncertainty.org/court-appeal-bans-bayesian-probability-and-sherlock-holmes

I know this might not be the right place to post, but Karma disallowed me from posting anywhere else, and I was hoping to get some thoughts on this.

I would kind of strongly suggest that a months-old thread is not the right place for questions about current events; if nothing else, nobody but me is likely to see the discussion. Additionally, although both are in some sense tagged 'Bayes', the discussions are very different. I suggest you take this to the most recent open thread.

Can you suggest where a karma-less (new) account can post, then? Everything ends up as a draft for me.

Clearly you can post comments in threads, right? We have a biweekly open thread; dig down a bit in discussion, or wait for the one that'll be posted around March 1st. Alternatively, pick a post on the front page, not this ancient one. I only saw this because I get notifications of responses to my threads; nobody else is going to be seeing anything this deeply buried. You may still get some objections to off-topicness, but at least you won't be thread-necromancing in addition.

Do I need a probability distribution over crash frequencies? That was the solution the last time I was confused over Bayes, but I don't understand what it means to say "The probability of having the line, given crash frequency f", which it seems I need to know to calculate a new probability distribution.

What you want to find is P(line causes crash | observed crashes).

let's call

L: "line is cause of crash"
R: "crash is random" (so basically R = ~L)
O: "observed crash distribution"

p(L|O) = p(O|L)p(L)/p(O) = p(O|L)p(L)/(p(O|L)p(L) + p(O|R)p(R))

If your prior is that L and R are equally likely, this becomes

p(L|O) = 1/(1 + p(O|R)/p(O|L))

... so you just need the ratio between p(O|R) and p(O|L).

To get that, you may want to compare two models: one where the crash occurs randomly with probability pr, and one where the crash occurs only when the line is present, with probability pl (pr and pl each given some simple prior distribution like the Jeffreys prior, as DanielLC recommends).
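(For what it's worth, with Jeffreys priors on both pl and pr, the ratio p(O|R)/p(O|L) works out to the same ~0.0906 computed in DanielLC's comment above, so even prior odds give p(L|O) ~= 0.92.)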