This is a coin.

It might be biased.

This is Bayes’ theorem.

P(A|B) = P(B|A) × P(A) / P(B)

Bayes’ theorem tells us how we ought to update our beliefs given evidence.
It involves the following components:
P(A|B), called the posterior, is the probability of A given B. In the case of the coin, this is the probability that the coin is biased given the result of an experiment (i.e., a sequence of flips).
P(B|A), called the likelihood, is the probability of B given A. For our coin example, this is the probability of seeing some particular ratio of heads to tails, given that the coin is biased.
P(A), called the prior, is the probability that the coin is biased, before we consider any evidence.
P(B), called the marginal, is the probability of getting some particular sequence of heads and tails, before considering any evidence.
The overall shape of the theorem is this:
Posterior ∝ likelihood × prior.
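To make that shape concrete, here is a minimal sketch in Python. The grid of candidate biases, the uniform prior, and the flip sequence are all invented for illustration:

```python
# Hypotheses: candidate values for the coin's probability of heads
# (an invented grid, for illustration).
biases = [0.1, 0.3, 0.5, 0.7, 0.9]

# Prior: before seeing any flips, treat every bias as equally likely.
prior = [1 / len(biases)] * len(biases)

# Evidence: an observed sequence of flips, H for heads, T for tails.
flips = "HHTHHHTH"
heads = flips.count("H")
tails = flips.count("T")

# Likelihood: probability of this sequence under each hypothetical bias.
likelihood = [b**heads * (1 - b)**tails for b in biases]

# Posterior ∝ likelihood × prior; dividing by the total (the marginal
# P(B)) turns the proportionality into an equality.
unnormalized = [l * p for l, p in zip(likelihood, prior)]
marginal = sum(unnormalized)
posterior = [u / marginal for u in unnormalized]

for b, p in zip(biases, posterior):
    print(f"P(bias = {b} | {flips}) = {p:.3f}")
```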
If you were to explain this to a high-school student, they might ask this naïve question:
Why should we bother to go through the process of calculating the likelihood and prior at all? Why can’t we just try and directly calculate the posterior? We have a formula for P(A|B), namely P(A∩B) / P(B).
Maybe you'll say, "That formula is fine, but it's not much use in real life. It's usually more tractable to go via conditional updates than via the high-school definition."
But if conditionals are easy to get, why not just go directly to the posterior? What's even the difference between A and B? Aren't they just symbols? We could easily rearrange the theorem to calculate P(B|A) as a function of P(A|B): P(B|A) = P(A|B) × P(B) / P(A).
What is it that makes using strings of coin flips to calculate biases more natural or scientific?
Perhaps it is ease. If calculating P(B|A) is, for some reason, easier, what makes it easier?
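One candidate answer: the likelihood falls straight out of the generative story. Fix a bias and the probability of any flip sequence is a one-line product; going the other way, from a sequence to a bias, has no such formula until you supply a prior. A sketch of this asymmetry (the 0.7 bias and the function name are invented for illustration):

```python
# Computing P(B|A) is easy because hypothesis A specifies a generative
# model: given the bias, the probability of a sequence is a product of
# per-flip probabilities.
def likelihood(flips: str, bias: float) -> float:
    return bias ** flips.count("H") * (1 - bias) ** flips.count("T")

print(likelihood("HHTHHHTH", 0.7))  # one line, no prior needed

# There is no analogous posterior(flips) -> bias function: to write one,
# you must first choose a prior over biases, then normalize.
```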
Perhaps it is usefulness. If likelihoods, not posteriors, are what's worth publishing, why are they worthier?
How do you spot a likelihood in the wild?