One person's modus ponens is another's modus tollens.

- Common saying among philosophers and other people who know what these terms mean.

If you believe A => B, then you have to ask yourself: which do I believe more? A, or not B?

- Hal Daume III, quoted by Vladimir Nesov.

Summary: Rules of logic have counterparts in probability theory. This post discusses the probabilistic analogue of modus tollens (the rule that if A=>B is true and B is false, then A is false), which is the inequality P(A) ≤ P(B)/P(B|A). What this says, in ordinary language, is that if A strongly implies B, then proving A is approximately as difficult as proving B. 

The appeal trial for Amanda Knox and Raffaele Sollecito starts today, and so to mark the occasion I thought I'd present an observation about probabilities that occurred to me while studying the "motivation document"(1), or judges' report, from the first-level trial.

One of the "pillars" of the case against Knox and Sollecito is the idea that the apparent burglary in the house where the murder was committed -- a house shared by four people, namely Meredith Kercher (the victim), Amanda Knox, and two Italian women -- was staged. That is, the signs of a burglary were supposedly faked by Knox and Sollecito in order to deflect suspicion from themselves. (Unsuccessfully, of course...)

As the authors of the report, presiding judge Giancarlo Massei and his assistant Beatrice Cristiani, put it (p.44):

What has been explained up to this point leads one to conclude that the situation of disorder in Romanelli's room and the breaking of the window constitute an artificially created production, with the purpose of directing investigators toward someone without a key to the entrance, who would have had to enter the house via the window whose glass had been broken and who would then have perpetrated the violence against Meredith that caused her death.

Now, even before examining "what has been explained up to this point", i.e. the reasons that Massei and Cristiani (and the police before them) were led to this conclusion, we can pretty easily agree that if it is correct -- that is, if Knox and Sollecito did in fact stage the burglary in Filomena Romanelli's room -- then it is extremely likely that they are guilty of participation in Kercher's murder. After all, what are the chances that they just happened to engage in the bizarre offense of making it look like there was a burglary in the house, on the very same night as a murder occurred, in that very house? Now, one could still hypothetically argue about what their precise role was (e.g. whether they actually physically caused Kercher's death, or merely participated in some sort of conspiracy to make the latter happen via the actions of known burglar and undisputed culprit Rudy Guede), and thus possibly about how severely they should be treated by the justice system; but in any case I think I'm on quite solid ground in asserting that a faked burglary by Knox and Sollecito would very strongly imply that Knox and Sollecito are criminally culpable in the death of Meredith Kercher.

...which is in fact quite a problem for Massei and Cristiani, as I'll now explain.

Probability theory can and should be thought of as a quantitative version -- indeed, a generalization -- of the "rules of logic" that underpin Traditional Rationality. (Agreement with the previous sentence is essentially what it means to be a Bayesian.) One of these rules is this:

(1) If A implies B, then not-B implies not-A.

For example, all squares are in fact rectangles; which means that if something isn't a rectangle, it can't possibly be a square. Likewise, if "it's raining" implies "the sidewalk is wet", and you know the sidewalk isn't wet, then you know it's not raining.

The rule that gets you from "A implies B" and "A" to "B" is called modus ponens, which is Latin for "method that puts". The rule that gets you from "A implies B" and "not-B" to "not-A" is called modus tollens, which is Latin for "method that takes away". As the saying goes, and as we have just seen, they are really one and the same. 

If, for a moment, we were to think about the Meredith Kercher case as a matter of pure logic -- that is, where inferences were always absolutely certain, with zero uncertainty -- then we could say that if we know that "burglary is fake" implies "Knox and Sollecito are guilty", and we also know that the burglary was in fact fake, then we know that Knox and Sollecito are guilty.

But, of course, there's another way to say the same thing: if we know that "burglary is fake" implies "Knox and Sollecito are guilty", and we also know that Knox and Sollecito are innocent, then we know that the burglary wasn't fake. (And that to the extent Massei and Cristiani say it was, they must be mistaken.)

In other words, so long as one accepts the implication "burglary fake => Knox and Sollecito guilty", one can't consistently hold that the burglary was fake and that Knox and Sollecito are innocent, but one can consistently hold either that the burglary was fake and Knox and Sollecito are guilty, or that Knox and Sollecito are innocent and the burglary was not fake.

The question of which of these two alternatives to believe thus reduces to the question of whether, given the evidence in the case, it's more believable that Knox and Sollecito are guilty, or that the burglary was "authentic". Massei and Cristiani, of course, aim to convince us that the latter is the more improbable.

But notice what this means! This means that the proposition that the burglary was fake assumes, or inherits, the same high burden of proof as the proposition that Knox and Sollecito committed murder! Unfortunately for Massei and Cristiani, there's no way to "bootstrap up" from the mundane sort of evidence that seemingly suffices to show that a couple of youngsters engaged in some deception, to the much stronger sort of evidence required to prove that two honor students(2) with gentle personalities suddenly decided, on an unexpectedly free evening, to force a friend into a deadly sex game with a local drifter they barely knew, for the sake of a bit of thrill-seeking(3).

You may have noticed that, two paragraphs ago, I left the logical regime of implication, consistency, and absolute certainty, and entered the probability-theoretic realm of belief, uncertainty, and burdens of proof. So to make the point rigorous, we'll have to switch from pure logic to its quantitative generalization, the mathematics of probability theory.  

When logical statements are translated into their probabilistic analogues, a statement like "A is true" is converted to something like "P(A) is high"; "A implies B" becomes "A is (strong) evidence of B"; and rules such as (1) above turn into bounds on the probabilities of some hypotheses in terms of others.

Specifically, the translation of (1) into probabilistic language would be something like:

(2) If A is (sufficiently) strong evidence of B, and B is unlikely, then A is unlikely.

or

(2') If A is (sufficiently) strong evidence of B, then the prior probability of A can't be much higher than the prior probability of B.

Let's prove this:

Suppose that A is strong evidence of B -- that is, that P(B|A) is close to 1. We'll represent this as P(B|A) ≥ 1-ε, where ε is a small number. Then, via Bayes' theorem, this tells us that

P(A|B)P(B)/P(A) ≥ 1-ε

or

P(A|B)P(B) ≥ (1-ε)P(A)

so that

P(A) ≤ P(A|B)P(B)/(1-ε)

and thus

P(A) ≤ P(B)/(1-ε)

since P(A|B) ≤ 1. Hence we get an upper bound of P(B)/(1-ε) on P(A). For instance, if P(B) is 0.001, and P(B|A) is at least 0.95, then P(A) can't be any larger than 0.001/0.95 = 0.001052...

Actually, there's a simpler proof, direct from the definition of P(B|A), which goes like this: P(B|A) = P(A&B)/P(A), whence P(A) = P(A&B)/P(B|A) ≤ P(B)/P(B|A). (Note the use of the conjunction rule: P(A&B) ≤ P(B).)
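
Here is a minimal computational sketch of this bound (the numbers are just the example above, and the random spot-check is only a sanity check, not part of the proof):

```python
# Illustrative sketch: the bound P(A) <= P(B)/P(B|A), evaluated for the example
# numbers above and spot-checked on random joint distributions of A and B.
import random

def upper_bound_on_p_A(p_B, p_B_given_A):
    """Upper bound on P(A) implied by P(A) <= P(B) / P(B|A)."""
    return p_B / p_B_given_A

# Example from the post: P(B) = 0.001, P(B|A) >= 0.95.
print(upper_bound_on_p_A(0.001, 0.95))  # ~0.0010526

# Spot-check: for any joint distribution over (A&B, A&~B, ~A&B, ~A&~B),
# P(A) never exceeds P(B)/P(B|A).
random.seed(0)
for _ in range(100_000):
    raw = [random.random() for _ in range(4)]
    z = sum(raw)
    p_ab, p_anb, p_nab, p_nanb = (x / z for x in raw)
    p_A, p_B = p_ab + p_anb, p_ab + p_nab
    p_B_given_A = p_ab / p_A
    assert p_A <= upper_bound_on_p_A(p_B, p_B_given_A) + 1e-12
```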

The statement

(3) P(A) ≤ P(B)/P(B|A)

is a quantitative version of modus tollens, just as the equivalent statement

(4) P(B) ≥ P(B|A)P(A)

is a quantitative version of modus ponens. Assuming P(B|A) is high, what (4) says is that if P(A) is high, so is P(B); what (3) says is that if P(B) is low, so is P(A).

Or, in other words, that the improbability -- burden of proof -- of B is transferred to, or inherited by, A.

...which means you cannot simultaneously believe that (1) Knox and Sollecito's staging of the burglary would be strong evidence of their guilt; (2) proving their guilt is hard; and (3) proving they staged the burglary is easy. Something has to give; hard work must be done somewhere.

Of their 427-page report, Massei and Cristiani devote approximately 20 pages (mainly pp. 27-49) to their argument that the burglary was staged by Knox and Sollecito rather than being the work of known burglar Rudy Guede (including a strange section devoted to refuting the hypothesis that the burglary was staged by Guede). But think about it: if they were really able to demonstrate this, they would scarcely have needed to bother writing the remaining 400-odd pages of the report! For, if it is granted that Knox and Sollecito staged the burglary, then, in the absence of any other explanation for the staging (like November 1 being Annual Stage-Burglary Day for some group to which Knox or Sollecito belonged), it easily follows with conviction-level confidence that they were involved in a conspiracy that resulted in the death of Meredith Kercher. You would hardly need to bother with DNA, luminol, or the cell phone traffic of the various "protagonists".

Yet it doesn't appear that Massei and Cristiani have much conception of the burden they face in trying to prove something that would so strongly imply their hugely a-priori-improbable ultimate thesis. Their arguments purporting to show that Knox and Sollecito faked the burglary are quite weak -- and, indeed, are reminiscent of those used time and again by their lower-status counterparts, conspiracy theorists of all types, from 9/11 "truthers" to the-Moon-landing-was-faked-ists. Here's a sample, from p.39:

Additionally, the fragments of broken glass were scattered in a homogeneous manner on the internal and external windowsill, without any noticeable displacement and without any piece of glass being found on the surface below the window. This circumstance...rules out the possibility that the stone was thrown from outside the house to allow access inside via the window after the glass was broken. The climber, in leaning his hands and then his feet or knees on the windowsill, would have caused some of the glass to fall, or at least would have had to move some of the pieces lest they form a trap and cause injury. However, no piece of glass was found under the window and no sign of injury was discovered on the glass found in Romanelli's room.

(The question to ask, when confronted with an argument like this, is: "rules out" with what confidence? If Massei and Cristiani think this is strong evidence against the hypothesis that the stone was thrown from outside the house, then that means they have a model that makes highly specific predictions about the behavior of glass fragments when a stone is thrown from inside, versus when it is thrown from outside. Predictions which can be tested(4). This is one reason why I advocate using numbers in arguments; if Massei and Cristiani had been required to think carefully enough to give a number, that would have forced them to examine their assumptions more critically, rather than stopping on plausible-sounding arguments consistent with their already-arrived-at bottom line.)

The impression one gets is that Massei and Cristiani thought, on some level, that all they needed to do was make the fake-burglary hypothesis sound coherent -- and that if they did so, that would count as a few points against Knox and Sollecito. They could then do the same thing with regard to the other pieces of evidence in the case, each time coming up with an explanation of the facts in terms of an assumption that Knox and Sollecito are guilty, and each time thereby scoring a few more points against them -- points which would presumably add up to a substantial number by the end of the report.

But, of course, the mathematics of probability theory don't work that way. It's not enough for a hypothesis, such as that the apparent burglary in Filomena Romanelli's room was staged, to merely be able to explain the data; it must do so better than its negation. And, in the absence of the assumption that Knox and Sollecito are guilty -- if we're presuming them to be innocent, as the law requires, or assigning a tiny prior probability to their guilt, as epistemic rationality requires -- this contest is rigged. The standards for "explaining well" that the fake-burglary hypothesis has to meet in order to be taken seriously are much higher than those that its negation has to meet, because of the dependence relation that exists between the fake-burglary question and the murder question. Any hypothesis that requires the assumption that Knox and Sollecito are guilty of murder inherits the full "explanatory inefficiency penalty" (i.e. prior improbability) of the latter proposition.

If A implies B, then not-B implies not-A. It goes both ways.

 



Notes

(1) Some pro-guilt advocates have apparently produced a translation, but I haven't looked at it and can't vouch for it. Translations of passages appearing in this post are my own.

(2) One of whom, incidentally, is known to be enjoying Harry Potter and the Methods of Rationality -- so take that for whatever it's worth.

(3) From p. 422 of the report: 

The criminal acts turned out to be the result of purely accidental circumstances which came together to create a situation which, in the combination of the various factors, made the crimes against Meredith possible: Amanda and Raffaele, who happened to find themselves without any commitments, randomly met up with Rudy Guede (there is no trace of any planned appointment), and found themselves together in the house on Via Della Pergola where, that very evening, Meredith was alone. A crime which came into being, therefore, without any premeditation, without any animosity or rancorous feeling toward the victim...

(4) And sure enough, during the trial, the defense hired a ballistics expert who conducted experiments showing that a rock thrown from the outside would produce patterns of glass, etc. similar to what was found at the scene -- results which forced the prosecutors to admit that the rock was probably thrown from the outside, but which were simply ignored by Massei and Cristiani! (See p. 229 of Sollecito's appeal document, if you can read Italian.)

Comments

I'm not sure if I understand the post completely. Is the following a fair translation?

"If our prior against Knox's guilt is 1:1000000, and a staged burglary would imply with 99% certainty that Knox is guilty, and we have 1000:1 evidence that the burglary was staged, then mathematically this isn't enough to convict Knox. You need more evidence."

(For some reason the post is much longer than that, and makes all those arguments whose purpose I don't understand...)

(For some reason the post is much longer than that, and makes all those arguments whose purpose I don't understand...)

Such as....?

Yes, the point is a mathematical triviality. For that matter, so is Bayes' theorem itself. That doesn't mean that everybody grasps its implications at once, so that it isn't worth writing detailed posts on.

Pretty much, I think.

If the prior P(guilty) is 1:1000000 and P(guilty|staged) is really high, a consistent prior requires that P(staged) is around 1:1000000 as well. Therefore 1000:1 evidence isn't enough.
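
Here's that arithmetic as a quick sketch (0.99 is just a stand-in for "really high", and the 1000:1 figure is the hypothetical likelihood ratio from this thread -- none of these numbers come from the actual case):

```python
# Illustrative sketch of the hypothetical above; none of the numbers are from the case.
p_guilty = 1 / 1_000_000           # hypothetical prior probability of guilt
p_guilty_given_staged = 0.99       # stand-in for "staging strongly implies guilt"

# The bound from the post: P(staged) <= P(guilty) / P(guilty|staged)
p_staged_prior_max = p_guilty / p_guilty_given_staged
print(p_staged_prior_max)          # ~1.01e-06

# Update that prior with 1000:1 likelihood-ratio evidence of staging:
prior_odds = p_staged_prior_max / (1 - p_staged_prior_max)
posterior_odds = prior_odds * 1000
posterior = posterior_odds / (1 + posterior_odds)
print(posterior)                   # ~0.001 -- nowhere near conviction-level confidence
```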

Great analysis. I am a little bit worried about adopting the idea of Bayesian logic in the criminal justice system, though, since it seems like it will just give people an incentive to commit a priori improbable crimes!

This amounts to saying that we should crank up the innocent/guilty conviction ratio for things that are improbable, which doesn't make much sense. The only way we'd catch more low a priori criminals is by lowering our standards of evidence, which necessarily means convicting more innocents.

That's no more helpful than saying "I'm worried that there's going to be an incentive not to get caught because we only punish the criminals we think are guilty". We still can't punish someone we don't think is guilty- but as an aside, it does mean we should punish effort spent on not getting caught.

We could also catch more low a priori criminals by improving our methods of dealing with the evidence, like using Bayesian logic.

Good post, though I'm not sure the whole discussion about MP and MT was necessary -- isn't the relationship between the break-in and the murder biconditional (which I guess probabilistically speaking means that the specificity and sensitivity are about equally high)?

I think part of the problem we see is people either unfamiliar with or incapable of reasoning probabilistically. Yes, the fundamental error the prosecutors are making applies to a deductive interpretation of their position -- but my sense is the prosecutors (and probably most people involved) don't realize that a convincing argument for a guilty verdict could begin by showing that the probability of a staged break-in was, say, 1/3, independent of all the other evidence. Saying that doesn't even sound like a point in favor of guilt. But of course 1/3 is well above a reasonable prior and requires a fair amount of evidence. And given enough additional evidence against Knox and Sollecito, that estimate can go up to the .999 you want it at. But traditional rationality doesn't give people a good way of thinking about how low-probability sub-hypotheses can provide evidence for high-probability hypotheses.

If A is strong evidence of B

I would word this as "If A is sufficiently strong evidence of B to overcome B's prior improbability". Simply saying "A is strong evidence of B" feels to me like a statement about the likelihood ratio, not the posterior probability.

This concern occurred to me; but consider: just how large does the likelihood ratio have to be for the evidence to be considered "strong"? Arguably, this depends on the prior probability (and thus the desired posterior probability) in the first place.

In any event, my hope is that the meanings of these vague verbal mnemonics are sufficiently clarified by the formulas.

ETA: Word "sufficiently" added to post.

One person's modus ponens is another's modus tollens.

I found a citation for this in Here is a hand to 'Dretske, Fred (1995), Naturalizing the Mind, Cambridge, Mass.: The MIT Press. ISBN 0-262-04149-9'. Let's give credit where credit is due; it's a truth worth remembering about logically valid arguments.

It's much older than that. A quick Google found a mention from 1973, but I would not be surprised to find it is at least a century older.

Summary added:

Rules of logic have counterparts in probability theory. This post discusses the probabilistic analogue of modus tollens (the rule that if A=>B is true and B is false, then A is false), which is the inequality P(A) ≤ P(B)/P(B|A). What this says, in ordinary language, is that if A strongly implies B, then proving A is approximately as difficult as proving B.

I'd like to add that the high "burden of proof" in this case comes from both

(1) the low prior probability of guilt in this case; and

(2) the high probability threshold that the court generally demands ("beyond a reasonable doubt") before it will condemn the defendants. If we wanted to bring in decision theory, we would assign a lot of disutility to a wrongful conviction. This determines what "likely" and "unlikely" mean in this context.

The former dwarfs the latter.

The prior for any given person being guilty is on the 'one in a million' order of magnitude, but the courts are probably closer to 1 in 10 on the margin (wild ass guess). If you translate "beyond reasonable doubt" to 99% or 99.9%, that still might translate down to 90% once you take into account overconfidence.

From looking at this example, it certainly doesn't look like the algorithm used by the court system has an innocent to guilty ratio of anywhere near as low as 1 in a million on the marginal cases.

It's a bit of an 'Einstein's arrogance' thing

Yes, if you want to argue P1 from P2, you must show P2. And whatever standard of proof you demand for acting on P1, you should also demand for P2 if you're using it this way: lowering the bar for proving P2 and then arguing P1 from P2 implicitly lowers the bar for proving P1.

And that's no less true if P1 = Knox and Sollecito are guilty and P2 = the burglary was fake. This much ought to be uncontroversial.

Of course, what evidence the prosecution actually has for P1 and P2 in this case is a different question.

I must sheepishly admit that the elaborate explanation actually made it more difficult for me to understand the argument; I had to reconstruct it myself in order to see it.

I must sheepishly admit that the elaborate explanation actually made it more difficult for me to understand the argument; I had to reconstruct it myself in order to see it.

I'm sorry to hear that. Unfortunately, adopting a general policy of eschewing elaborate explanations, and just stating the distilled main point and expecting everyone to understand its significance, won't work either (it's been tried).

I'm happy you at least did end up understanding the argument.

It might be more helpful if you identified particular passages that seemed to cloud your understanding.

Of course, what evidence the prosecution actually has for P1 and P2 in this case is a different question.

Do you suspect them of having strong evidence for either?

Agreed that just distilling the main point would do no better, and quite possibly worse.

It might be more helpful if you identified particular passages that seemed to cloud your understanding.

I didn't mean to suggest it was particularly cloudy; it was more meant as an admission of personal failing.

That said, most of my confusion stemmed from thinking you were introducing a new claim when you were actually introducing an alternate framing of the same claim. Were I editing this for publication I would recommend clearly labeling and separating those framings -- e.g., the probability-mathematical discussion vs. the real-world background -- and adding a brief introduction summarizing the basic argument.

Aka "tell 'em what you're going to tell 'em, tell 'em, and then tell 'em what you told 'em."

Do you suspect them of having strong evidence for either?

Nope. I know absolutely nothing about this case other than what you present here, and what you present doesn't suggest any such evidence.

(2') If A is (sufficiently) strong evidence of B, then the prior probability of A can't be much higher than the prior probability of B.


The logic and math of this post seem very confused. It feels like you are saying "If the sun rises tomorrow, I will kill you. The probability of me being a murderer is 1:10^8, therefore the probability of the sun rising tomorrow cannot be much higher than 1:10^8"

First off, there's some very crucial evidence you are forgetting in evaluating this case. The key element here is that numerous small bits of evidence are cumulative. This is a very important point, and one which jsteinhardt touched on already.

First, we have a very major piece of evidence: A murder did in fact occur, and the murderer must have been in Perugia at the time they committed this murder. At this point, we have approximately 10^5 possible suspects (Perugia has a population of 166,253), and we know, factually, that one of them is the guilty party. If we had no other evidence, we could reasonably assign a probability of 1:10^5 that each one is guilty. You'll notice that this is vastly higher than the normal probability of someone being a murderer, because we already have quite a few bits of evidence.

If the burglary was faked with odds of 10^4:1, then we can assume that everyone that had a motive to do so now has a guilt probability of 10^4:10^5, or approximately 1:10. A 10% chance of Amanda Knox being guilty is certainly poor evidence, and I don't see any reason to favor her over other people who have been demonstrated to have equal motive, but I'm also basing this entirely on this specific post.

The consequences of the burglary being faked do not change based on the probability that it occurred, any more than my threat to kill you tomorrow will prevent the sun from rising. If we're dealing with probability, then there is some factual probability that the burglary was faked, based on its own evidence, and this probability is entirely independent of the consequences. Further, this probability, and the probability that (Burglary Faked => Amanda is Guilty), cannot be 100%, despite your post assuming such. You cannot include impossible numbers and then expect a firm conclusion to arise.

P.S. If your point was simply "The judge is assuming impossible numbers", then I'd feel you are probably wrong on this point. I'd be happy to elaborate if that is in fact the case.

P.P.S. You can argue that a "higher standard of evidence" for proving that may be required, based on legal and moral principles, but that has nothing at all to do with probabilities.

First of all, Welcome to Less Wrong!

The logic and math of this post seems very confused. It feels like you are saying "If the sun rises tomorrow, I will kill you. The probability of me being a murderer is 1:10^8, therefor the probability of the sun rising tomorrow cannot be much higher than 1:10^8"

Well, if you knew that

(1) if the sun will rise tomorrow, then I am a murderer,

and you also knew that

(2) I am not a murderer,

then you would indeed know that

(3) the sun will not rise tomorrow.

First off, there's some very crucial evidence you are forgetting in evaluating this case.

There is very little -- certainly very little of importance -- that I have forgotten about this case. And I have pretty much all of the publicly available information that exists about it at my fingertips, in case I do forget anything. So, no.

What I am aware of and what I explicitly mention in a particular post are not the same thing.

The key element here is that numerous small bits of evidence are cumulative.

While this is mathematically as beyond dispute as (say) the formulas I presented in the post, it's worth noting that approaching something like a murder case in this way is highly dangerous, due to various cognitive biases (which of course are our subject matter here on LW). There is a serious risk of misjudging the strength of such small pieces of evidence, and compounding the error by missing dependence relations, so that you end up double-counting evidence.

But anyway, this doesn't have much to do with this post.

The consequences of the burglary being faked do not change based on the probability that it occurred, any more than my threat to kill you tomorrow will prevent the sun from rising. If we're dealing with probability, then there is some factual probability that the burglary was faked, based on its own evidence, and this probability is entirely independent of the consequences

The intuition you're describing here is exactly the one that my post aims to refute.

It might seem, as it no doubt did to Massei and Cristiani, that you should be able to establish whether the burglary was fake independently of whether Knox and Sollecito killed Kercher. After all, there isn't much physical connection between the events in Romanelli's room and the events in Kercher's, is there? But this is a mistake -- or at least, it is so long as you believe that establishing the burglary was fake would imply that Knox and Sollecito killed Kercher.

In principle, you certainly could establish that the burglary was fake without making any tacit assumption that Knox and Sollecito have a substantial probability of being guilty of murder; but the type of evidence you would need to do that would have to be very strong -- around as strong as the evidence needed to show their guilt independently of the burglary question.

P.S. If your point was simply "The judge is assuming impossible numbers", then I'd feel you are probably wrong on this point. I'd be happy to elaborate if that is in fact the case.

I'm not sure what you mean here, but it sounds like you perhaps think that Massei and Cristiani's reasoning is sound. (Do you think that Knox and Sollecito are likely guilty? If so, I'd be happy to discuss that, but this post wouldn't be the place to do it.)

P.P.S. You can argue that a "higher standard of evidence" for proving that may be required, based on legal and moral principles, but that has nothing at all to do with probabilities.

If you read the post, you'll see that it's pretty much entirely about probabilities.

I feel you can demonstrate quite amply that A is not sufficient proof of B, and that A=>B has not been sufficiently proven either.

However, neither of these assertions seems to be your point. You seem to be insisting that you can't prove A, and I see absolutely no evidence of that, unless you take as given the assumption A=>B. I would certainly challenge that assumption.

Am I mistaken in this understanding of your point?

P.S. I feel the evidence suggests Knox is guilty at around a 10% chance, based solely on the evidence in this post. I do not feel a 10% chance of guilt is sufficient. I have not considered any evidence outside this post, as my interest is in the probability math, and not in the actual case itself.

P.P.S. A discussion of the dangers of cognitive biases is, I feel, entirely orthogonal to a discussion on probabilities and mathematics. Given my interest is in the math, not the case, I am going to skip over discussion of such biases.

So you don't agree that if Knox and Sollecito faked the burglary, then they are likely guilty of murder?

I feel the evidence suggests Knox is guilty at around a 10% chance, based solely on the evidence in this post

There isn't much evidence presented in this post -- hardly any at all. (Plenty of information is linked to, of course...)

A discussion of the dangers of cognitive biases is, I feel, entirely orthogonal to a discussion on probabilities and mathematics.

Well, then I must say you're on the wrong website!

But if your interest is more in the math than in the case, I'm not sure what you're disagreeing with me about. It's kind of hard to dispute the inequality

P(A) ≤ P(B)/P(B|A)

isn't it?

Your post is entangling three separate issues, and I think that's making it confusing to discuss (it was certainly confusing to read!)

Mathematics: "P(A) <= P(B) / P(B|A)."

No argument here.

Probability: How does the evidence A impact the probability of conclusion B?

I feel you are using entirely incorrect math for the situation, as stated in my previous posts. Just because the formula is correct, does not mean it is applicable to the problem you are trying to solve.

If A is proven, and A=>B is proven, then B is proven. The prior probability of B cannot negate the proof of A, nor the proof of A=>B, and thus has absolutely no bearing on the situation. Prior probability matters if, and only if, we are discussing p(A) and p(A=>B), at which point we still have new evidence (A, A=>B) that requires us to update to a new probability of B.

You cannot continue to assert the prior probability of B, despite new evidence that suggests a higher or lower chance of B.

Cognitive Bias: Is the judge properly evaluating p(A) and p(A=>B)?

I feel that there is insufficient information to draw a firm conclusion here. However, based on what you have said, I feel rather strongly that you have misinterpreted his evaluations, because you are assuming that common language and logical language are the same.

If A is proven, and A=>B is proven, then B is proven

Agreed.

The prior probability of B cannot negate the proof of A, nor the proof of A=>B, and thus has absolutely no bearing on the situation

This sentence doesn't make sense as written. I don't know what it means for a probability to "negate" a proof, and so I don't know what you're trying to say when you assert that this can't happen.

My best guess is that you're trying to say that "even if P(A) is small on account of P(B) being small, some finite amount of evidence will still suffice to prove A, and therefore B." Which is obviously true, and nothing I have written says otherwise.

You cannot continue to assert the prior probability of B, despite new evidence that suggests a higher or lower chance of B.

This sounds like our previous discussion, where you said, and I agreed, that other evidence that Knox and Sollecito killed Kercher could raise the probability of their having faked the burglary. I've never disputed this, but have pointed out that this isn't Massei and Cristiani's reasoning. They attempted to prove the fake burglary without invoking the other murder evidence.

However, based on what you have said, I feel rather strongly that you have misinterpreted his evaluations, because you are assuming that common language and logical language are the same.

You'll have to be more specific here.

Ahhh, you make so much more sense when you phrase it this way!

"other evidence that Knox and Sollecito killed Kercher could raise the probability of their having faked the burglary"

But my point is, this is backwards. It only works if you assume with near-100% certainty that faking the burglary and being the murderer are correlated. Otherwise "faked the burglary" IS simply evidence that Knox is the murderer.

If we prove that Knox killed Kercher, it proves that any 100% correlation is true. It does NOT prove any less-than-100% correlation. It's even entirely possible for a correlation to be one-directional (A implies B, but B does not imply A).

Thus, Knox killed Kercher is only proof of a faked burglary if you already assume the correlation is proven and two-directional.

In probability, "correlations" are always bidirectional. Bayes theorem:

P(A|B) = P(B|A)P(A)/P(B)

If P(B|A) > P(B), then P(A|B) > P(A). By the same factor even:

P(A|B)/P(A) = P(B|A)/P(B)

The analogy to biconditionality in deductive logic would be P(A|B) = P(B|A), which obviously isn't always true.

I'm just trying to understand your point a bit better. Hopefully you don't mind the late reply (I've been on vacation for a while)

"In probability, "correlations" are always bidirectional."

Can't there be three separate, equally valid points which, if proven, would prove she was the murderer? Even if those three equally valid proofs of her guilt are contradictory? Once we know she is guilty, they can't all three be true, can they?

I'm not sure how one would accurately express this, given what you're saying. The probability that A implies Guilt, B implies Guilt, and C implies Guilt can all be 100%, yes? Obviously, the probability that guilt implies all of A+B+C is 0%, since they are contradictory. Therefore, how can it be correct to assume the opposite correlation, that Guilt implies A at 100% certainty?

It isn't!

In general it is not true that P(A|B) = P(B|A). P(A|Guilt) depends on the prior probabilities of A and Guilt, as well as P(Guilt|A). For example, say we have four possible proofs A, B, C, D, and P(Guilt|A or B or C) = 1, and P(Guilt|D) = 0. Our prior is all four are equally likely: P(A) = P(B) = P(C) = P(D) = 0.25. P(Guilt) is then 0.75 = P(Guilt|A)P(A) + P(Guilt|B)P(B)...

Given this, we have:

P(A|Guilt) = P(Guilt|A)P(A)/P(Guilt) = (1.0 × 0.25)/0.75 = 1/3

P(A|Guilt) isn't 1. But it's 33%, which is still higher than the prior 25%: that is, Guilt is evidence for A.
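
Here's the same example as a quick sketch (assuming, as the total-probability sum above does, that A, B, C, D are mutually exclusive and exhaustive):

```python
# Sketch of the four-proof example above, assuming A, B, C, D are mutually
# exclusive and exhaustive (which the total-probability sum for P(Guilt) requires).
priors = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
p_guilt_given = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 0.0}

p_guilt = sum(priors[x] * p_guilt_given[x] for x in priors)     # 0.75
p_A_given_guilt = p_guilt_given["A"] * priors["A"] / p_guilt    # 0.333...
print(p_guilt, p_A_given_guilt)
```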

By the way I think it might help if you avoid talking in proofs and implication and 100% certainty. In hypothetical examples it's useful to set things to P(X) = 1, but in the real world evidence is always probabilistic; nothing's ever 100%.

Ahhh, that helps clear things up. For some reason I'd been understanding you as saying that, given P(Guilt|A) = 1, P(A|Guilt) was also 1. It looks like what you meant was just that Guilt is evidence for, but not necessarily 100% proof of, A. Am I getting that all correct?

Yes.

P(Guilt|A) = P(A|Guilt) only when P(A) = P(Guilt). In which case it would be 100% proof. But that is a rare situation.

Nitpick: the two conditionals would also be equal if A and Guilt were mutually exclusive. (In that case, of course, both conditionals would be zero.)

Theorem: If A is evidence of B, then B is also evidence of A.

Proof: To say that A is evidence of B means that P(A|B) > P(A|~B), or in other words that P(A&B)/P(B) > P(A&~B)/P(~B), which we may write as P(A&B)/P(B) > (P(A)-P(A&B))/(1-P(B)). Algebraic manipulation turns this into P(A&B) > P(A)P(B), which is symmetric in A and B; hence we can undo the manipulations with the roles of A and B reversed to arrive back at P(B|A) > P(B|~A). QED.

Hence, if A implies B, then B also implies A!
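
A brute-force sanity check of this equivalence, for anyone who prefers simulation to algebra (random joint distributions over A and B; a spot-check, not a substitute for the proof):

```python
# Spot-check of the symmetry above: on random joint distributions,
# P(A|B) > P(A|~B), P(B|A) > P(B|~A), and P(A&B) > P(A)P(B) all agree.
import random
random.seed(0)

for _ in range(100_000):
    raw = [random.random() for _ in range(4)]
    z = sum(raw)
    p_ab, p_anb, p_nab, p_nanb = (x / z for x in raw)  # A&B, A&~B, ~A&B, ~A&~B
    p_a, p_b = p_ab + p_anb, p_ab + p_nab
    if abs(p_ab - p_a * p_b) < 1e-9:
        continue  # skip numerical near-ties
    a_ev_of_b = (p_ab / p_b) > (p_anb / (1 - p_b))     # P(A|B) > P(A|~B)
    b_ev_of_a = (p_ab / p_a) > (p_nab / (1 - p_a))     # P(B|A) > P(B|~A)
    positively_correlated = p_ab > p_a * p_b           # P(A&B) > P(A)P(B)
    assert a_ev_of_b == b_ev_of_a == positively_correlated
```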

Now of course, the strengths of these implications might be vastly different. But that's a separate matter.

Here, the point is that A implies B with near certainty (where A is "K&S faked burglary" and B is "K&S killed Kercher"); I'm not terribly concerned with how strongly B implies A. I don't need for B to imply A very strongly to make my point, but Massei and Cristiani would definitely need that in order to enable any charitable reading of their burglary section at all.

But, of course, the mathematics of probability theory don't work that way. A hypothesis, such as that the apparent burglary in Filomena Romanelli's room was staged -- doesn't get points for its ability to explain the data unless it does so better than its negation. And, in the absence of the assumption that Knox and Sollecito are guilty -- if we're presuming them to be innocent, as the law requires, or assigning a tiny prior probability to their guilt, as epistemic rationality requires -- this contest is rigged. The standards for "explaining well" that the fake-burglary hypothesis has to meet in order to be taken seriously are much higher than those that its negation has to meet, because of the dependence relation that exists between the fake-burglary question and the murder question.

This isn't quite true. If the prior probability of being a murderer is 1 in 10^6, and I can find 30 things that are explained twice as well by the murder hypothesis as the non-murder hypothesis, then the posterior probability of being a murderer is 99.9%, in the absence of mitigating factors (since 2^30/10^6 is about 1000.) So, many pieces of weak evidence for an unlikely proposition can still establish that proposition.
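
Spelled out as odds arithmetic (same hypothetical numbers as above):

```python
# The arithmetic of the comment above: prior odds of 1 in 10^6, updated by
# 30 independent pieces of evidence, each with likelihood ratio 2.
prior_odds = 1 / 10**6
posterior_odds = prior_odds * 2**30              # ~1073.7 : 1
posterior = posterior_odds / (1 + posterior_odds)
print(posterior_odds, posterior)                 # ~1073.7, ~0.99907 (about 99.9%)
```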

You'd also need those 30 things to be independent.

Probably superfluous nitpicking: you can build a strong case even with partially-interdependent pieces of evidence, you will just need more of them since you have to work off their conditional probabilities (which is mathematically equivalent to "splitting" them into independent pieces of evidence).

So, many pieces of weak evidence for an unlikely proposition can still establish that proposition.

This doesn't contradict anything in the paragraph you quoted. (If you don't mind, tell me where you thought the contradiction was, so that I can explain further.)

The sentence in particular that I objected to was

The standards for "explaining well" that the fake-burglary hypothesis has to meet in order to be taken seriously are much higher than those that its negation has to meet, because of the dependence relation that exists between the fake-burglary question and the murder question.

My impression was that you were claiming that, since the fake burglary hypothesis would imply murder, evidence must be extremely strong to be counted in favor of fake burglary. But I may have misunderstood you. At any rate, you elsewhere state that

Of their 427-page report, Massei and Cristiani devote approximately 20 pages (mainly pp. 27-49) to their argument that the burglary was staged by Knox and Sollecito [...] if they were really able to demonstrate this, they would scarcely have needed to bother writing the remaining 400-odd pages of the report!

If you agree with my point, then I don't see how you can find it odd that they would feel obliged to include more in their report than just the claim that the burglary was faked. Like you said, even if the evidence is fairly strong in favor of this assertion, far more evidence would be needed to convict those two of murder, which is presumably the point of the remaining 400 pages.

My impression was that you were claiming that, since the fake burglary hypothesis would imply murder, evidence must be extremely strong to be counted in favor of fake burglary

My claim is that to reach a desired level of certainty about the burglary being faked, you would need evidence of approximately the same strength required to reach the same level of certainty about murder. (In other words, that the prior probability of fake burglary is roughly the prior probability of murder.)

Like you said, even if the evidence is fairly strong in favor of [fake burglary], far more evidence would be needed to convict those two of murder

This is the very opposite of what I said! What I said was that if you knew with high confidence that the burglary was fake, then you would need almost no additional evidence to convict of murder.

Okay, so we seem to be in complete agreement about how the math works out. If so, then I'm confused as to why you object so strongly to the prosecution's argument on purely mathematical grounds; I haven't read their argument myself, so it's entirely possible that the argument itself is weak in some way, but I think that right now we're just talking about the math.

If we ignore their specific language, the plan of coming up with ~20 pieces of moderate evidence is a perfectly reasonable strategy for correctly establishing guilt, assuming that there is absolutely no mitigating evidence. Your complaint seems to be that they use different language/notation than you and I do to talk about evidence, which seems hardly fair.

Although I would also note that since humans are bad at intuitively distinguishing between moderate evidence for and moderate evidence against a hypothesis, trying to find many pieces of weak evidence is probably not a good strategy if the goal is to get humans to correctly decide the accuracy of an assertion.

ETA: By the way, I've been working under the assumption, based on the tone of the original post, that you think there are serious mathematical flaws in the prosecutions argument. If that's not the case, and you just wanted to use this case as a point of illustration, then I apologize for the confusion.

What I gather is that the prosecution concludes, after the first twenty pages of the brief that discuss the break-in exclusively, that the break-in was almost certainly staged by Knox and Sollecito. But if they really thought that, they would have already more or less made the case that Knox and Sollecito are guilty, and the remaining 380 pages would be unnecessary. So the prosecution can't be weighing the evidence correctly.

Okay, so we seem to be in complete agreement about how the math works out. If so, then I'm confused as to why you object so strongly to the prosecution's argument on purely mathematical grounds; I haven't read their argument myself, so it's entirely possible that the argument itself is weak in some way, but I think that right now we're just talking about the math.

If I may presume to diagnose your confusion, it seems that you're compartmentalizing between "mathematical" aspects of an argument and "other" aspects. But I'm not. I'm taking it for granted that "the math" is the argument. Probability theory is a mathematical formalization of the process of argument and inference. It isn't just a cool gadget that one throws in on special occasions.

So, I don't object to Massei and Cristiani's argument on "purely mathematical grounds". I simply object to it, period -- and in this post I have used mathematical language to describe, in precise terms, what my objection is.

(And I expected readers to assume, given my previous writing on the case, that this particular point was far from my only objection to Massei and Cristiani's 427-page argument that Knox and Sollecito killed Kercher; hence I was not expecting replies of the form "well, but they might have other good evidence that Knox and Sollecito are guilty". They don't; we've already covered that.)

If we ignore their specific language, the plan of coming up with ~20 pieces of moderate evidence is a perfectly reasonable strategy for correctly establishing guilt, assuming that there is absolutely no mitigating evidence. Your complaint seems to be that they use different language/notation than you and I do to talk about evidence, which seems hardly fair.

I honestly have no idea where you're getting this from. I don't know of any passage in the post where I complained about Massei and Cristiani's choice of language; and nor did I attempt to argue (as several people seem to have thought I did) against a strategy of proving one's case by adducing a large amount of weak evidence in one's favor (although as a matter of fact I do believe that is the wrong type of argument to expect for a proposition of this sort, and that people have probably been misled by detective stories and the like into thinking it a reasonable strategy, when it would actually be very difficult to make work in practice -- that however would be the topic of a separate post, and isn't addressed in this one).

My criticism of Massei and Cristiani in this post is really quite simple, or so I thought: the type of evidence that they cite to prove that the burglary was faked suggests that they did not realize how high the burden of proof for this proposition was -- that, just to prove the burglary was faked, they needed evidence of the same level of strength as would be required to directly prove Knox and Sollecito guilty of murder.

Quite frankly, I'm baffled at how this point seems to have gotten lost, because I thought I was emphatic and indeed repetitious about it in the post.

If we ignore their specific language, the plan of coming up with ~20 pieces of moderate evidence is a perfectly reasonable strategy for correctly establishing guilt, assuming that there is absolutely no mitigating evidence. Your complaint seems to be that they use different language/notation than you and I do to talk about evidence, which seems hardly fair.

I think the assertion is that they appear to be coming up with ~20 pieces of evidence and then trying to say that each piece is very strong - or at least, they have done so for the burglary hypothesis, so they might be doing so for the other pieces of evidence too. Naturally, their methods of making each piece look very strong are flawed.

You almost pinpointed the reason why this is happening here:

trying to find many pieces of weak evidence is probably not a good strategy if the goal is to get humans to correctly decide the accuracy of an assertion.

Humans are bad at intuitively handling evidence in general. There is a possibility that this case suffers from a serious malady: presiding judge Massei has decided the correct, accurate decision in this matter is that Knox and Sollecito are guilty, and has strategically prepared the judge's report to get people to decide this way. This hypothesis explains why the judge has produced such a weighty document when 20 pages of it would have sufficed.

"my claim is that to reach a desired level of certainty about the burglary being faked, you would need evidence of approximately the same strength required to reach the same level of certainty about murder."

This assumes that the burglary being faked is the only piece of evidence. If we have three sets of evidence, and each one suggests a 90% chance of guilt, and each is independent of the other, then we have probability (10:1) x (10:1) x (10:1) = (1000:1). No one set of evidence needs to have (1000:1) odds of guilt in order to reach a final conclusion that the odds are (1000:1). Arguing via modus tollens about a single piece of evidence tells us only that that evidence, in and of itself, is insufficient proof. It tells us nothing about how that evidence may act cumulatively with other pieces of evidence.
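
If we read the three 10:1 figures as independent likelihood ratios (rather than posterior odds), the combination works like this -- whatever the prior odds are, they get multiplied by 1000 (the 1:1,000,000 prior below is just the hypothetical figure used earlier in this thread):

```python
# Sketch: three independent pieces of evidence, each a 10:1 likelihood ratio
# in favor of guilt, combined with a hypothetical prior kept separate.
combined_likelihood_ratio = 10 * 10 * 10       # 1000:1, as stated above

prior_odds = 1 / 1_000_000                     # hypothetical prior from this thread
posterior_odds = prior_odds * combined_likelihood_ratio
print(posterior_odds / (1 + posterior_odds))   # ~0.001 with that particular prior
```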

This assumes that the burglary being faked is the only piece of evidence

No; I fully grant that other evidence that Knox and Sollecito are guilty, if it exists, would be evidence of the burglary being fake, which would lower the burden of proof on that hypothesis.

However, that isn't how Massei and Cristiani reason. They don't say, in the section on the burglary (which is at the beginning of the report), "and since we know from all the other evidence that Knox and Sollecito are guilty, we can therefore easily use these arguments about glass patterns to confirm that they did in fact stage the burglary, in case you were wondering about that". And it's easy to see why they don't say that: there wouldn't be much point, because if they've already shown that Knox and Sollecito are guilty, their work is done! (*)

Instead, what they say is "these arguments about glass patterns etc. prove that the burglary was staged. Now, having established that piece of evidence against them (i.e. the staging of the burglary), let us now consider the other evidence, which, in combination with the burglary, will show how really guilty they are."

(*) Technically, staging a burglary is itself an offense, so there may actually have been reason for them to proceed this way. But in that case the burglary issue would have come at the end of the report, not the beginning.

"these arguments about glass patterns etc. prove that the burglary was staged. Now, having established that piece of evidence against them (i,e. the staging of the burglary), let us now consider the other evidence, which, in combination with the burglary, will give us an accurate probability on whether they are guilty"

I've bolded a single change to your quote. With that change made, do you feel this is a reasonable assertion?

No. The error is in the first sentence

these arguments about glass patterns etc. prove that the burglary was staged.

They only (conceivably) prove the burglary was staged if you're already taking into account the rest of the evidence of murder.

That's only true if you assume p(A=>B) is 1

...or approximately 1.

(And by P(A=>B), I think you meant P(B|A), didn't you?)

P(Someone faked the burglary) != P(Amanda Knox faked the burglary). The report asserts the first, not the second, from my reading.

Given that "someone faked" is true, I think assigning an approximately 100% chance that Amanda Knox is guilty is rather seriously unfounded. What am I missing?

What am I missing?

That "burglary was faked" is shorthand for "burglary was faked by Knox and Sollecito" throughout this post and discussion. The latter is what Massei and Cristiani argue, and is what would most strongly imply that Knox and Sollecito are guilty of murder.

The evidence you quoted merely suggests the burglary was faked. I'd assume there are more people with a motive to do that than just Knox and Sollecito? Why would we assume, with high enough certainty to convict, that it was certainly them and not a roommate, or someone who knew them?

Look, I'm not saying Massei and Cristiani's argument that Knox and Sollecito staged the burglary is convincing, by any means!

That said, their argument that if the burglary was staged, the staging was done by Knox and Sollecito is probably the most convincing part of it. At the very least, they would have a highish prior, since they had access to the house and were "available" that night to do the staging if they wanted to.

I figured this out but it threw me when I got to this part of the post. I'm not sure the convenience of the shorthand justifies throwing your readers off.

Exactly. The problem for Knox and Sollecito is that there is so much evidence that even if it was all weak (and it isn't), just the number of items is sufficient to arrive at a high certainty of guilt, because they are all independent events.

http://themurderofmeredithkercher.com/The_Evidence

That is a lot of evidence. Some items are so strong that they in isolation would be sufficient to reach the level of certainty required to convict, and others are strong evidence but not enough. I count 24 items.

Only works if each piece of evidence is independent. They're clearly not.