One sort of answer is that we often want the posterior, and we often have the likelihood. Slightly more refined: we often find the likelihood easier to estimate than the posterior, so Bayes' Rule is useful.
Why so?
I think one reason is that we make it the "responsibility" of hypotheses to give their likelihood functions. After all, what is a hypothesis? It's just a probability distribution (not a probability distribution that we necessarily endorse, but one which we are considering as a candidate). As a probability distribution, its job is to make predictions; that is, to give us probabilities for possible observations. These are the likelihoods.
We want the posterior because it tells us how much faith to place in the various hypotheses -- that is, it tells us whether (and to what degree) we should trust the various probability distributions we were considering.
So, in some sense, we use Bayes' Rule because we aren't sure how to assign probabilities, but we can come up with several candidate options.
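Here's a minimal sketch of that picture in code (the candidate distributions, the uniform prior, and the observation sequence are all invented for illustration):

```python
# Sketch: each hypothesis is just a probability distribution over observations
# (here: 'sun' or 'rain'). The hypothesis's "responsibility" is to supply
# likelihoods; Bayes' Rule then tells us how much to trust each hypothesis.

hypotheses = {
    "mostly-sunny": {"sun": 0.9, "rain": 0.1},
    "even-odds":    {"sun": 0.5, "rain": 0.5},
    "mostly-rainy": {"sun": 0.2, "rain": 0.8},
}

# Prior credence in each hypothesis (we aren't sure which distribution to use).
prior = {"mostly-sunny": 1/3, "even-odds": 1/3, "mostly-rainy": 1/3}

observations = ["rain", "rain", "sun", "rain"]

# Likelihood of the whole observation sequence under each hypothesis.
likelihood = {h: 1.0 for h in hypotheses}
for h, dist in hypotheses.items():
    for obs in observations:
        likelihood[h] *= dist[obs]

# Posterior ∝ likelihood × prior, then normalize by the evidence P(data).
unnormalized = {h: likelihood[h] * prior[h] for h in hypotheses}
evidence = sum(unnormalized.values())
posterior = {h: u / evidence for h, u in unnormalized.items()}

print(posterior)  # how much to trust each candidate distribution
```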
One weak counterexample to this story is regression, i.e., curve-fitting. We can interpret regression in a Bayesian way easily enough. However, the curves don't come with likelihoods baked in. They only tell us how to interpolate/extrapolate with point-estimates; they don't give a full probability distribution. We've got to "soften" these predictions, layering probabilities on top, in order to apply the Bayesian way of thinking.
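Here's a sketch of that softening step, assuming a Gaussian noise model with a hand-picked scale sigma (the candidate curves, the data, and sigma are all invented; the noise model is our addition, not something the curves supply):

```python
import math

# Sketch: curve-fitting viewed Bayesianly. Each candidate "curve" only gives
# point predictions; to get likelihoods we layer an assumed Gaussian noise
# model on top of those predictions.

def gaussian_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Two candidate curves (slope-only linear fits), as point-prediction rules.
curves = {
    "slope=1.0": lambda x: 1.0 * x,
    "slope=2.0": lambda x: 2.0 * x,
}

data = [(1.0, 1.1), (2.0, 2.3), (3.0, 2.9)]  # (x, y) pairs, made up
sigma = 0.5                                  # assumed noise scale: the "softening"

prior = {name: 0.5 for name in curves}

# Likelihood of the data under each curve-plus-noise model.
likelihood = {}
for name, f in curves.items():
    L = 1.0
    for x, y in data:
        L *= gaussian_pdf(y, mean=f(x), sigma=sigma)
    likelihood[name] = L

evidence = sum(likelihood[n] * prior[n] for n in curves)
posterior = {n: likelihood[n] * prior[n] / evidence for n in curves}
print(posterior)
```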
This is a coin.
It might be biased.
This is Bayes’ theorem.
P(A|B) = P(B|A) × P(A) / P(B)
Bayes’ theorem tells us how we ought to update our beliefs given evidence.
It involves the following components: the posterior P(A|B), the likelihood P(B|A), the prior P(A), and the evidence P(B).
The overall shape of the theorem is this:
Posterior ∝ likelihood × prior.
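To make that shape concrete with the coin above, here's a sketch over a grid of candidate biases (the grid, the uniform prior, and the 7-heads-in-10-flips data are all chosen for illustration):

```python
# Sketch: Posterior ∝ likelihood × prior, for the coin's unknown bias.
# Candidate hypotheses: a grid of possible heads-probabilities.

from math import comb

biases = [i / 10 for i in range(11)]     # candidate biases 0.0, 0.1, ..., 1.0
prior = [1 / len(biases)] * len(biases)  # uniform prior over the candidates

heads, flips = 7, 10                     # illustrative data: 7 heads in 10 flips

# Likelihood: the probability each candidate bias assigns to the observed data.
likelihood = [comb(flips, heads) * b**heads * (1 - b)**(flips - heads) for b in biases]

# Posterior ∝ likelihood × prior; dividing by the evidence P(data) normalizes it.
unnormalized = [L * p for L, p in zip(likelihood, prior)]
evidence = sum(unnormalized)
posterior = [u / evidence for u in unnormalized]

for b, post in zip(biases, posterior):
    print(f"bias {b:.1f}: posterior {post:.3f}")
```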
If you were to explain this to a high-school student, they might ask this naïve question:
Why should we bother to go through the process of calculating the likelihood and prior at all? Why can’t we just try and directly calculate the posterior? We have a formula for P(A|B), namely P(A∩B) / P(B).
Maybe you'll say, "That formula is fine but not useful in real life. It's usually more tractable to go via conditional updates than via the high-school definition."
But if conditionals are easy to get, why not just go directly to the posterior? What's even the difference between A and B? Aren't they just symbols? We could easily rearrange the theorem to calculate P(B|A) as a function of P(A|B).
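To see that the two routes really are interchangeable when you have everything, here's a toy joint distribution (the numbers are invented) computed both ways; the practical difference is which ingredients you can actually estimate:

```python
# Sketch: on a fully specified toy joint distribution, the "high school" route
# P(A|B) = P(A and B) / P(B) and the Bayes route P(A|B) = P(B|A) * P(A) / P(B)
# give the same answer. Invented joint: A = "coin is biased", B = "flip is heads".

joint = {
    ("biased", "heads"): 0.28,
    ("biased", "tails"): 0.12,
    ("fair",   "heads"): 0.30,
    ("fair",   "tails"): 0.30,
}

P_B = joint[("biased", "heads")] + joint[("fair", "heads")]    # P(heads)
P_A = joint[("biased", "heads")] + joint[("biased", "tails")]  # P(biased)

# Route 1: straight from the joint.
direct = joint[("biased", "heads")] / P_B

# Route 2: via the likelihood P(B|A) and the prior P(A).
likelihood = joint[("biased", "heads")] / P_A                  # P(heads | biased)
via_bayes = likelihood * P_A / P_B

print(direct, via_bayes)  # identical; the question is which pieces we can estimate
```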
What is it that makes using strings of coin flips to calculate biases, rather than the reverse, more natural or scientific?
Perhaps it is ease. If calculating P(B|A) is for some reason easier, what makes it easier?
Perhaps it is usefulness. If likelihoods are what's worth publishing, not posteriors, why are they worthier?
How do you spot a likelihood in the wild?