I have successfully confused myself about probability again.
I am debugging an intermittent crash; it doesn't happen every time I run the program. After much confusion I believe I have traced the problem to a specific line (activating my debug logger, as it happens; irony...). I have tested my program both with this line active and with it commented out. With the line active, I get two crashes in seven runs; with it commented out, I get no crashes in ten runs. Intuitively this seems like evidence in favour of the hypothesis that the line is causing the crash.

But I'm confused about how to set up the equations. Do I need a probability distribution over crash frequencies? That was the solution the last time I was confused over Bayes, but I don't understand what it means to say "the probability of having the line, given crash frequency f", which it seems I need to know in order to calculate a new probability distribution.
I'm going to go with my intuition and code on the assumption that the debug logger should be activated much later in the program to avoid a race condition, but I'd like to understand this math.
I'm afraid I don't quite see how to apply this to the problem. The beta distribution is presumably a probability, but what is it a probability of? Is there an interpretation of its two parameters that I'm not seeing?
It's a probability distribution over probabilities. You don't know how likely the program is to crash given that there's no bug; you just know that it crashed two out of seven times you ran it. If you start with a beta distribution over the possible crash probabilities, then updating on those runs gives you another beta distribution with easy-to-calculate parameters, and from that distribution it's easy to calculate exactly how likely the next run is to crash.
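To make that concrete, here's a rough sketch in Python. I'm assuming a uniform Beta(1, 1) prior, which is just the simplest choice, not something forced by the problem. Under that assumption the two parameters act roughly like counts: after seeing k crashes in n runs, the posterior is Beta(1 + k, 1 + (n − k)), i.e. "crashes seen plus one" and "non-crashes seen plus one". The second half compares "the line matters" against "the line is irrelevant" by integrating over the unknown crash probabilities, which is one way of turning your 2-of-7 versus 0-of-10 runs into a single number:

```python
from math import lgamma, exp

def log_beta_fn(a, b):
    # log of the Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

# Observed runs: with the suspect line active, 2 crashes in 7 runs;
# with it commented out, 0 crashes in 10 runs.
k_with, n_with = 2, 7
k_without, n_without = 0, 10

# Conjugate update: uniform Beta(1, 1) prior over the crash probability,
# so after k crashes in n runs the posterior is Beta(1 + k, 1 + n - k).
alpha, beta = 1 + k_with, 1 + (n_with - k_with)       # Beta(3, 6)

# Posterior mean = probability that the *next* run (with the line) crashes.
p_next_crash = alpha / (alpha + beta)
print(f"P(next run crashes | line active) = {p_next_crash:.3f}")   # 0.333

# Hypothesis comparison: "line matters" (two separate crash probabilities)
# vs "line irrelevant" (one shared crash probability), each with a uniform
# prior. The marginal likelihood of k crashes in n runs under a uniform
# prior is proportional to B(k + 1, n - k + 1); the binomial coefficients
# are identical on both sides and cancel out of the ratio.
log_ml_diff = (log_beta_fn(k_with + 1, n_with - k_with + 1)
               + log_beta_fn(k_without + 1, n_without - k_without + 1))
log_ml_same = log_beta_fn(k_with + k_without + 1,
                          (n_with + n_without) - (k_with + k_without) + 1)

bayes_factor = exp(log_ml_diff - log_ml_same)
print(f"Bayes factor (line matters : line irrelevant) = {bayes_factor:.2f}")
# about 1.3 -- these runs favour "the line matters", but only weakly
```

A factor of about 1.3 only measures what those seventeen runs say on their own; whatever prior suspicion you have about the debug logger and a race condition multiplies with it.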