I have successfully confused myself about probability again.
I am debugging an intermittent crash; it doesn't happen every time I run the program. After much confusion I believe I have traced the problem to a specific line (activating my debug logger, as it happens; irony...). I have tested my program with this line active and with it commented out. When the line is active, I get two crashes in seven runs. Without the line, I get no crashes in ten runs. Intuitively this seems like evidence in favour of the hypothesis that the line is causing the crash, but I'm confused about how to set up the equations. Do I need a probability distribution over crash frequencies? That was the solution the last time I was confused about Bayes, but I don't understand what it means to say "The probability of having the line, given crash frequency f", which it seems I need to know to calculate a new probability distribution.
I'm going to go with my intuition and code on the assumption that the debug logger should be activated much later in the program to avoid a race condition, but I'd like to understand this math.
The first step is to be clear about what you're asking and what you're trying to accomplish.
A whole bunch of things could be said to "cause" the crash, such as the power supplied to the system. You want a program that does what you want, and doesn't crash. You could immediately exit the program and likely always avoid a crash. The lack of such an exit could be said to be "the cause" of the crash. But what good does identifying such a cause do you?
A cause isn't necessarily "the thing that needs to be changed", though people often treat it that way.
I think that with the data you have, some ignorance prior, and an assumption that the crash trials are independent (which I think is likely false), you have Bernoulli trials and can assign a probability distribution over the crash frequency in each of the two states: with and without the line.
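As a rough sketch of what that looks like, assume a uniform Beta(1, 1) prior on each crash frequency and independent runs (both are assumptions, and the function names are only for illustration). Each posterior is then a Beta distribution, and because the no-line posterior has the form Beta(1, b), the probability that the crash frequency is higher with the line has a closed form:

```python
from fractions import Fraction
from math import factorial
import random


def beta_fn(a, b):
    """Beta function B(a, b) for positive integers, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def posterior_params(crashes, runs, prior_a=1, prior_b=1):
    """Beta posterior parameters after observing `crashes` in `runs` trials.

    Assumes a Beta(prior_a, prior_b) prior (uniform by default) and
    independent Bernoulli trials.
    """
    return prior_a + crashes, prior_b + (runs - crashes)


def prob_with_exceeds_without(with_ab, without_ab):
    """Exact P(f_with > f_without) when the 'without' posterior is Beta(1, b0).

    For Beta(1, b0) the CDF is 1 - (1 - x)**b0, and the expectation of
    (1 - x)**b0 under Beta(a, b) is B(a, b + b0) / B(a, b).
    """
    a, b = with_ab
    a0, b0 = without_ab
    if a0 != 1:
        raise ValueError("closed form here only covers a 'without' posterior Beta(1, b0)")
    return 1 - beta_fn(a, b + b0) / beta_fn(a, b)


# Data from the post: 2 crashes in 7 runs with the line, 0 in 10 without.
with_ab = posterior_params(crashes=2, runs=7)      # Beta(3, 6)
without_ab = posterior_params(crashes=0, runs=10)  # Beta(1, 11)

mean_with = Fraction(with_ab[0], sum(with_ab))
mean_without = Fraction(without_ab[0], sum(without_ab))
p_exact = prob_with_exceeds_without(with_ab, without_ab)

# Monte Carlo cross-check of the closed form, using a fixed seed.
rng = random.Random(0)
n_samples = 20_000
hits = sum(
    rng.betavariate(*with_ab) > rng.betavariate(*without_ab)
    for _ in range(n_samples)
)
p_sampled = hits / n_samples

print("posterior mean with line:   ", mean_with)
print("posterior mean without line:", mean_without)
print("P(f_with > f_without), exact:  ", p_exact, "~", round(float(p_exact), 4))
print("P(f_with > f_without), sampled:", p_sampled)
```

Under these assumptions the posterior means are 1/3 with the line and 1/12 without it, and the probability that the crash frequency is higher with the line is 913/969, about 0.94. A different prior, or dropping the independence assumption, would change these numbers.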
But I don't know what good those distributions do you.
Why are you assigning probability distributions instead of doing more debugging? What do you expect to do with those distributions? What does the line do? Why not just remove it?
Basically, what's the problem you're trying to solve?
I'm reasonably confident I solved my actual coding issue; I have a mental model of what the race condition was and how I resolved it, and in many runs of the modified program I have not seen the crash. So the problem for this thread is just that I was confused about how to use Bayes in such a case, and would like to learn some math.
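For the math itself, one way to set it up is as a comparison between two hypotheses: the line makes no difference (one shared crash frequency for all seventeen runs) versus the line matters (a separate crash frequency for each condition). The sketch below assumes uniform priors on every frequency and independent runs, and its function names are illustrative only. It computes the Bayes factor exactly:

```python
from fractions import Fraction
from math import factorial


def beta_fn(a, b):
    """Beta function B(a, b) for positive integers, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def marginal_likelihood(crashes, runs):
    """Probability of one specific crash/no-crash sequence, averaged over f.

    Assumes a uniform prior on the crash frequency f, so the integral of
    f**crashes * (1 - f)**(runs - crashes) over [0, 1] is a Beta function.
    """
    return beta_fn(crashes + 1, runs - crashes + 1)


# Data from the post.
crashes_with, runs_with = 2, 7
crashes_without, runs_without = 0, 10

# H0: the line is irrelevant, so all 17 runs share one crash frequency.
evidence_same = marginal_likelihood(
    crashes_with + crashes_without, runs_with + runs_without
)

# H1: the line matters, so each condition has its own crash frequency.
evidence_different = (
    marginal_likelihood(crashes_with, runs_with)
    * marginal_likelihood(crashes_without, runs_without)
)

bayes_factor = evidence_different / evidence_same

# Posterior probability of H1 if both hypotheses start at 50/50 (an assumption).
posterior_h1 = bayes_factor / (1 + bayes_factor)

print("Bayes factor (line matters : line irrelevant):", bayes_factor)
print("P(line matters | data), from even prior odds: ", posterior_h1)
```

With these priors the data favour "the line matters" by a factor of 102/77, roughly 1.3, which moves even prior odds to about 0.57. So the intuition points in the right direction, but seventeen runs on their own are modest evidence.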