[Epistemic status: QED.]

Recently someone posed an oddly constructed exercise on Bayes' Theorem, where instead of the usual given information P(A|B), P(A|¬B), P(B) they gave P(A|B), P(A|¬B), P(A). I won't link to the source, because they intended to pose a problem in the usual form and I don't want to draw attention to their mistake. But the problem itself is an interesting one, and it does have a solution (in most cases) which turns out to be quite neat.
Problem Statement
Given P(A|B)=x,P(A|¬B)=y,P(A)=a, find P(B|A),P(B|¬A).
Why is this interesting?
(Putting this section here to separate the problem statement from my solution, since I don't know of a way to add spoiler tags in Markdown.)
Suppose you want to evaluate the performance of a heuristic, B, for predicting the underlying A, which is accurately determined later. You know the base rate of A since it is (say) recorded in public statistics, and you know how common B-errors in either direction are, because the same statistics report P(A|B) and P(A|¬B) for their obvious use in decision-making.
But you are interested not just in the general case, but in some subcategory C. In the absence of further information your default prior hypothesis is that P(B|A∧C)=P(B|A) and similarly P(B|¬A∧C)=P(B|¬A) (i.e., the heuristic's reliability is independent of C), but you know that within C the base rate of A, P(A|C), is some value different from P(A).
Now under that hypothesis, knowing P(B|A),P(B|¬A) would allow you to determine your prior for P(A|B∧C) and P(A|¬B∧C) by an ordinary application of Bayes' Theorem. If you then have a small sample drawn from C, you could for instance denote those priors as p and q and take as your prior distributions that P(A|B∧C)∼β(p,1−p) and P(A|¬B∧C)∼β(q,1−q), which you then update from your sample. (I don't know whether this places the 'right' amount of credence in P(B|A∧C)=P(B|A) etc.; one could just as easily take β(2p,2(1−p)) to treat the ¬C results as more 'relevant'.)
But we don't have P(B|A), we just have P(A|B). Hence the problem-as-stated.
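(To make the intended use concrete, here is a minimal sketch of the Beta update described above, in Python, with purely hypothetical numbers: the prior p would come from the calculation in the rest of this post, and the sample counts are made up.)

```python
# Minimal sketch of the Beta-prior update described above, with made-up numbers.
# p is a hypothetical prior for P(A | B ∧ C); the Beta(p, 1-p) prior is then
# updated on a small sample drawn from C.

p = 0.8                      # hypothetical prior estimate of P(A | B ∧ C)
alpha, beta_ = p, 1 - p      # Beta(p, 1-p) prior, as suggested in the text

# hypothetical small sample from C: among cases where B held, A turned out
# to be true 7 times and false 1 time
successes, failures = 7, 1

# conjugate update: add successes to alpha and failures to beta
alpha_post = alpha + successes
beta_post = beta_ + failures
posterior_mean = alpha_post / (alpha_post + beta_post)

print(f"posterior for P(A | B ∧ C): Beta({alpha_post:.1f}, {beta_post:.1f}), "
      f"mean ≈ {posterior_mean:.3f}")
```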
Solution
We first expand P(A|B) and P(¬A|B) using Bayes' Theorem:
$$P(A|B)=\frac{P(A\wedge B)}{P(B)}\qquad P(\neg A|B)=\frac{P(\neg A\wedge B)}{P(B)}$$
Now we can re-arrange these into two expressions for P(B), and set them equal:
$$\frac{P(A\wedge B)}{P(A|B)}=P(B)=\frac{P(\neg A\wedge B)}{P(\neg A|B)}$$
Cross-multiplying and substituting in our given x=P(A|B),
$$(1-x)\,P(A\wedge B)=x\,P(\neg A\wedge B)$$
$$x\,P(\neg A\wedge B)+(x-1)\,P(A\wedge B)=0$$
We can do the same thing with y=P(A|¬B) to obtain:
$$y\,P(\neg A\wedge\neg B)+(y-1)\,P(A\wedge\neg B)=0$$
We then use the exhaustivity of {B,¬B} to expand P(A) and P(¬A):
$$P(A\wedge B)+P(A\wedge\neg B)=P(A)=a$$
$$P(\neg A\wedge B)+P(\neg A\wedge\neg B)=P(\neg A)=1-a$$
At this point we have four simultaneous equations in the four unknowns P(A∧B), P(A∧¬B), P(¬A∧B), P(¬A∧¬B), which we can solve by matrix inversion, Gaussian elimination, or some similar method. Our matrix is

$$M=\begin{pmatrix}x-1 & 0 & x & 0\\ 1 & 1 & 0 & 0\\ 0 & y-1 & 0 & y\\ 0 & 0 & 1 & 1\end{pmatrix}$$

and our linear system is

$$M\,v=M\begin{pmatrix}P(A\wedge B)\\ P(A\wedge\neg B)\\ P(\neg A\wedge B)\\ P(\neg A\wedge\neg B)\end{pmatrix}=\begin{pmatrix}0\\ a\\ 0\\ 1-a\end{pmatrix}$$
The determinant is det(M)=y−x, so our condition for a solution is that x≠y. This makes sense: if x=y then the heuristic B tells us nothing about A, so there is no way to link our givens (all probabilities of A) to joint probabilities involving B. So let us assume that x≠y. In that case the solution is

$$v=\frac{1}{y-x}\begin{pmatrix}x(y-a)\\ y(a-x)\\ (1-x)(y-a)\\ (1-y)(a-x)\end{pmatrix}$$
as can be verified by multiplying out Mv.
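If you would rather not multiply out Mv by hand, a short sympy sketch can do the check, using the matrix and solution exactly as written above:

```python
# Quick symbolic check of the linear system above (a verification sketch,
# not part of the derivation itself).
import sympy as sp

x, y, a = sp.symbols('x y a')

# M and the right-hand side as written above, with unknowns ordered
# (P(A∧B), P(A∧¬B), P(¬A∧B), P(¬A∧¬B)).
M = sp.Matrix([
    [x - 1, 0,     x, 0],
    [1,     1,     0, 0],
    [0,     y - 1, 0, y],
    [0,     0,     1, 1],
])
c = sp.Matrix([0, a, 0, 1 - a])

# The claimed solution vector v.
v = sp.Matrix([x*(y - a), y*(a - x), (1 - x)*(y - a), (1 - y)*(a - x)]) / (y - x)

assert sp.simplify(M.det() - (y - x)) == 0        # det(M) = y - x
assert sp.simplify(M * v - c) == sp.zeros(4, 1)   # M v = c, so v solves the system
```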
From here the rest is trivial:

$$P(B|A)=\frac{P(A\wedge B)}{P(A)}=\frac{x(y-a)}{a(y-x)}\qquad P(B|\neg A)=\frac{P(\neg A\wedge B)}{P(\neg A)}=\frac{(1-x)(y-a)}{(1-a)(y-x)}$$
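As a numerical sanity check, here is a small sketch with made-up values of x, y, a (any values with x≠y and a strictly between y and x give a valid joint distribution): it rebuilds the joint probabilities from the recovered P(B) and confirms the two closed-form expressions.

```python
# Numerical sanity check of the closed-form result, using hypothetical values.
x, y, a = 0.9, 0.2, 0.3   # P(A|B), P(A|¬B), P(A)

# P(B) recovered from the givens, then the four joint probabilities.
p_B = (y - a) / (y - x)
joint = {
    ('A', 'B'):   x * p_B,
    ('A', '¬B'):  y * (1 - p_B),
    ('¬A', 'B'):  (1 - x) * p_B,
    ('¬A', '¬B'): (1 - y) * (1 - p_B),
}

# The quantities the problem asks for, straight from the joint table...
p_B_given_A    = joint[('A', 'B')] / a
p_B_given_notA = joint[('¬A', 'B')] / (1 - a)

# ...agree with the closed-form expressions derived above.
assert abs(p_B_given_A    - x * (y - a) / (a * (y - x))) < 1e-12
assert abs(p_B_given_notA - (1 - x) * (y - a) / ((1 - a) * (y - x))) < 1e-12

print(p_B_given_A, p_B_given_notA)
```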
Analysis
To see what this result means, let's write it out with our givens substituted back in:

$$P(B|A)=\frac{P(A|B)\,\bigl(P(A|\neg B)-P(A)\bigr)}{P(A)\,\bigl(P(A|\neg B)-P(A|B)\bigr)}\qquad P(B|\neg A)=\frac{\bigl(1-P(A|B)\bigr)\,\bigl(P(A|\neg B)-P(A)\bigr)}{\bigl(1-P(A)\bigr)\,\bigl(P(A|\neg B)-P(A|B)\bigr)}$$
which holds as long as P(A|¬B)≠P(A|B).
We could also have determined this by summing the first and third rows of v, $\frac{1}{y-x}\bigl(x+(1-x)\bigr)(y-a)=\frac{y-a}{y-x}$, to find that

$$P(B)=\frac{y-a}{y-x}=\frac{P(A|\neg B)-P(A)}{P(A|\neg B)-P(A|B)}$$
under the same condition, and then substituting this into Bayes' Theorem.
Now let's consider b=P(A|B)−P(A), b̄=P(A|¬B)−P(A) as, in some sense, measures of the "excess A" under B and ¬B respectively. This now tells us that

$$P(B)=\frac{\bar{b}}{\bar{b}-b}\qquad\text{or equivalently}\qquad\frac{P(B)}{P(\neg B)}=-\frac{\bar{b}}{b},$$
which I think is a good place to stop.