I am sorry I did not manage to comment on this earlier; I did not suspect it would get promoted.
In short, your treatment of hearsay, and how the legal system addresses it, is simply wrong. Most of what you talk about is actually about the Confrontation Clause. I don't know if this is due to an intentional simplification of your examples, but the cases you use just don't work that way.
The main case you talk about, Davis v. Washington, is not a case about hearsay; just look at the Wikipedia summary. It is a case about the Confrontation Clause. This is a clause that says that those accused of crimes have the right to confront the witnesses against them; if someone talks to the police under certain circumstances, that testimony may not be entered. It does not matter how reliable it is. See Crawford v. Washington. The "indicia of reliability test" was abandoned in Crawford, because it was completely circular - it was compared to doing away with a jury trial because the defendant was obviously guilty.
More generally, there is almost never a balancing test in hearsay. Hearsay is a series of rules that are applied systematically. Out of court statements are considered unreliable ...
Is there any evidence that the American or any other legal system is significantly better than chance at what it does? Or even not significantly worse than chance (by being biased instead of just random)?
That's the first question we should be asking, before concerning ourselves with minor issues about admissibility of evidence.
That's an excellent question. The answer depends on exactly what you mean by "better than chance." If you mean "more than half of those convicted of a crime are guilty of that crime," then I'd say yes, there is excellent reason to think that they are. Prosecutors usually have access to several times more reports of crime than they can afford to go out and prosecute. Prosecutors are often explicitly or implicitly evaluated on their win ratio -- they have strong incentives to pick the 'easy' cases where there is abundant evidence that the suspect is guilty. Most defense lawyers will cheerfully concede that the vast majority of their clients are guilty -- either the clients admit as much to their lawyers, or the clients insist on implausible stories that don't pass muster, which the lawyers have to disguise in order to get their clients to go free. Although as a matter of law and rhetoric people are presumed innocent until proven guilty, as a matter of cold statistics, someone who has been lawfully indicted in America is probably more likely to be guilty than innocent. In fact, there are probably so many guilty suspects in Court that the legal system does strictly ...
I think the intended question is whether the legal system adds anything beyond a pure chance element. Somehow we'd need a gold standard of actually guilty and innocent suspects, then we'd need to measure whether p(guilty|convicted) > 80%. You could also ask if p(innocent|acquitted) > 20%, but that's the same question.
It proves that mistakes have been made, but in the end, no, I don't think it's terribly useful evidence for evaluating the rate of wrongful convictions. Why not? There have been 289 post-conviction DNA exonerations in US history, mostly in the last 15 years. That gives a rate of under 20 per year. Suppose 10,000 people a year are incarcerated for the types of crime that DNA exoneration is most likely to be possible for, namely murder and rape (I couldn't find exact figures, but I suspect the real number is at least this big). Then considering DNA exonerations gives us a lower bound of something like 0.2% on the error rate of US courts.
That is only useful evidence about the error rate if your prior estimate of the inaccuracy was less than that, and I mean, come on, really? Only one conviction in 500 is a mistake?
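The arithmetic behind that lower bound can be sketched as follows. The 289 exonerations and the guessed 10,000 incarcerations per year come from the comment above; the 15-year window is a rounding of "mostly in the last 15 years," and the function name is purely illustrative.

```python
def exoneration_lower_bound(exonerations, years, convictions_per_year):
    """Lower bound on the wrongful-conviction rate implied by exonerations alone."""
    exonerations_per_year = exonerations / years
    return exonerations_per_year / convictions_per_year

# Figures taken from the comment; the 10,000 is the commenter's own guess.
rate = exoneration_lower_bound(289, 15, 10_000)
print(f"{rate:.2%}")
```

This is only a floor: it counts the errors that happened to be caught, not the errors that were made.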
DNA exoneration happens when one is innocent and a combination of extremely lucky circumstances makes retesting of evidence possible. The latter I would be shocked to find at higher than a 1:100 chance.
Literally 100% of people who ever lived have done multiple things which an unfriendly legal system might treat as crimes.
Entirely for the sake of being pedantic, I'll point out that many people have avoided this, if only by dying very shortly after being born.
Unless you believe that young black men in US are the most criminal group in history of the world, most of them who are in prisons must be innocent by pure statistics.
This is by far your weakest point.
Men commit more crime than women. Young people commit more crime than the elderly. Black people commit more crime than white people.
Ergo, yes, young black males are probably one of the most criminal groups in the developed world. I bet Japanese American grandmothers are among the least; I can't imagine why, but somehow it just seems overwhelmingly likely.
If you want to quibble that government is more likely to make the things that men, the young, and black people do illegal, feel free to, but consider that all three opening statements are also true for violent crimes, and that victim reports basically match arrest ratios on all of them. I think all three are blatantly obviously true, but somewhat impolite to state.
Violent crime is something that for understandable reasons catches attention more than say white collar crime. It causes greater psychological distress to notice it or suspect you are vulnerable to it. This translates into greater pressure on politicians to make laws against it and provide law enforcement more resources to target it.
Unless you believe that young black men in US are the most criminal group in history of the world, most of them who are in prisons must be innocent by pure statistics.
Or there is a significant selection bias in the quasi-random selection process known as "arresting people."
By the way, have you read Bernard Harcourt's research, which suggests that the "imprisonment" rate in the United States has been constant over time, provided that you count commitment for mental illness as imprisonment? Thus, the recent growth in prison population in the United States reflects the shrinkage in long-term involuntary commitment of the mentally ill. In other words, a lot of the restrictions on the extremely mentally ill that used to be "provided" by dedicated institutions (i.e. mental hospitals) are now "provided" by jails and prisons.
An adult with the behavior of a very young child would be declared irresponsible by psychiatrists, and thus couldn't be tried (but could be arrested and committed). It seems reasonable to apply this to children, and they are automatically committed to restrictive institutions already.
By "better than chance" do you mean whether when investigating e.g. a murder, the American police and legal system have more than P(1/population of America) of locating and punishing the actual guilty party?
How does the legal system normally deal with cases where someone has a chain of logic where each link seems strong but there are a dangerously large number of links? This seems like a special case of a more general issue that the court must face regularly.
Would an argument to the judge like "even if each of these reports comes from a person trying to do a good job in passing along the truth, there are too many places where any of these people could have made a simple error" stand a chance?
A few comments:
It is somewhat confusing (at least to legal readers) that you use legal terms in non-standard ways. Conflating confrontation with hearsay issues is confusing because making people available for cross-examination solves the confrontation problem but not always the hearsay one.
I like your emphasis on the filtering function of evidentiary rules. Keep in mind, however, that these rules have little effect in bench trials (which are more common than jury trials in state courts of general jurisdiction). And relatively few cases reach trial.
I like the idea of capping the length of an admissible chain of hearsay, but whenever I hear about a rule like that, I always think of the risk that you'll miss an obviously true conclusion just because the evidence wasn't admissible. Of course, that's a silly argument, since we have lots of such limits and they're not something I disagree with.
The obvious solution to this entire debate is to teach people a basic understanding of practical probability, but I guess you work with what you've got...
Incidentally, is the title a deliberate play on "Lies, damn lies, and statistics"? I couldn't work it out.
Fascinating article! I have to confess that I don't know a lot about the legal system or how it works. It strikes me as the kind of field that would be both useful to know in some detail, and interesting to learn about. So "study the modern legal system" is somewhere on my list of "random personal research projects."
The explanation of the current system, and how to view it in a rationalist manner was really interesting.
The problem as you state it seems to be that the court (and people in general) have a tendency to evaluate each link in a chain separately. For instance, if there was one link with an 80% chance of being valid, both a court and a Bayesian would say "ok, let's accept it provisionally for now", but if there are three or four links, a court might say "each individual link seems ok, so the whole chain is ok" but a Bayesian would say "t...
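A minimal sketch of the chain arithmetic this comment gestures at, assuming the links are independent so that their probabilities simply multiply. Independence is an assumption added here for illustration; the 80% figure is the commenter's.

```python
def chain_probability(link_probs):
    """Probability that every link in the chain holds, assuming independence."""
    p = 1.0
    for q in link_probs:
        p *= q
    return p

# One to four links, each with an 80% chance of being valid.
for n in range(1, 5):
    print(n, round(chain_probability([0.8] * n), 4))
```

Under that assumption, three 80% links leave the chain at roughly even odds, and four links drop it below 50%, even though each link looks acceptable on its own.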
Thanks for this article. I now finally start to understand the sense behind the judge/jury system, which I always found a little strange (compared to just a qualified judge making the whole decision).
The "legal system" is concerned, above all else, that citizens regard its workings as legitimate. The appearance of inevitability promotes the sense of legitimacy, and any procedures that appear arbitrary interfere with it. Thus, the law would exclude all "hearsay within hearsay" before it would impose a three-level limit. Statistical evidence might show that three levels is optimal (or that some other cutoff is), but the provision's artificiality is patent. "I was treated unjustly because my evidence consisted of four levels of hearsay" sounds unjust because "arbitrary" limitations denude the law of evidence of the sense that it's natural.
Use A <-> (A ^ B ^ C) are reliable.
This threw me a little. Those ^ characters look a lot like the logical conjunction ("and") operator ∧, but they also look like the exclusive-or operator in C-like programming languages. For clarity, maybe spell this out in plain English: "Use A if and only if A and B and C are reliable."
Why can't they just say that each additional layer makes it weaker evidence? For example, hearsay is 50% as strong as seeing it, double hearsay is 25% as strong, triple hearsay is 12.5% as strong, etc.
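The proposal in this comment amounts to a geometric discount per layer. A short sketch, where the 50% factor is the commenter's illustrative number and the function name is hypothetical:

```python
def hearsay_weight(layers, discount=0.5):
    """Relative strength of evidence after `layers` levels of hearsay.

    layers=0 is direct observation; each extra layer multiplies by `discount`.
    """
    return discount ** layers

for layers, name in enumerate(["direct", "hearsay", "double hearsay", "triple hearsay"]):
    print(f"{name}: {hearsay_weight(layers):.1%}")
```

A single shared discount is the simplest possible choice; nothing in the comment says every layer should be discounted equally.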
http://hanson.gmu.edu/extraord.pdf
"Extraordinary claims require extraordinary evidence. But on uninteresting topics, surprising claims usually are surprising evidence; we rarely make claims without sufficient evidence. On interesting topics, however, we can have interests in exaggerating or downplaying our evidence, and our actions often deviate from our interests. In a simple model of noisy humans reporting on extraordinary evidence, we find that extraordinary claims from low noise people are extraordinary evidence, but such claims from high noise people are not; their claims are more likely unusual noise than unusual truth. When people are organized into a reporting chain, noise levels grow exponentially with chain length; long chains seem incapable of communicating extraordinary evidence."
Lawyers don't calculate probabilities, juries don't understand them, so exact numerical values are irrelevant.
Also I would say people don't like probabilistic arguments used in justice. Punishing someone for a high probability that they did something, feels very unfair. But in this universe, this is all we can have.
Would it feel fair to imprison someone because there is a 50% probability they did something wrong? How about 80%? 90%? 99%? People like to pretend that there are some magical values called "reasonable doubt" and "beyond a shadow of doubt" where probabilities stop being probabilities and become a separate magisterium.
We are not good at dealing with probabilities and what we intuitively seek is probably a social consensus -- if everyone important thinks the guy is guilty, then it is safe to punish him. We are trying to be more fair than this, and partially we are succeeding, and partially we are trying to do the impossible, because we can never get beyond probabilities. But there are huge inferential gaps that prevent explaining this in the court.
It seems the rule ensures that at each link no one has a motivation to report wrongly, rather than that no one would mess up.
to see if normal people can handle it
"Evidence? You can't handle the evidence!"
Application of: How Much Evidence Does It Take?
(trigger warning: some description of domestic violence)
Summary: I discuss the strengths and weaknesses of one way that the American legal system tries to assess and cope with the unreliability of certain kinds of evidence. After explaining the relevant rules with references to a few recent famous cases and a non-notable case that I'm working on now, I briefly consider whether this part of the evidence code is above or below the sanity waterline, and suggest an incremental improvement.
Recently, I got to the point in my legal career where people are trusting me to write evidentiary briefs, i.e., to argue in front of a judge about what kinds of evidence are reliable enough to be safely presented to a jury. There is an odd division of epistemological labor in the American court system: judges are thought to be better than juries at resisting passionate or manipulative oratory, and juries are thought to be better than judges at resisting bribery and (pre-existing) personal hatred. As a result, potentially inflammatory or unreliable evidence is presented first to a judge, who (much like one of Eliezer's Confessors) is supposed to sift the exhibit to see if normal people can handle it without losing their tenuous grip on sanity. If and only if the evidence seems safe for ordinary human consumption, the judge will allow the lawyers to argue about that evidence in front of the jury. Otherwise, the evidence sits in a cardboard box in an unheated warehouse, safely away from the eyes of the jury, until it's time for an appeal.
The Hearsay Rule
By way of a concrete example, one famous recent case featured a recorded 911 call made by a domestic violence victim to the emergency phone operator. The operator asked questions about the location and identity of the person who was accused of beating the caller. The caller answered the questions on tape, explicitly identifying her abuser as Mr. Adrian Martell Davis, and the answers were used first to find and arrest the suspect, and ultimately to convict him. The victim was apparently too intimidated to testify in open court, and so her recorded statement as to the name of her abuser was absolutely necessary to support a conviction -- no recording, no conviction. Under the 400-year-old hearsay rule, recorded testimony typically is not allowed to be presented to a jury -- courts are concerned that the person giving the recorded statement might be pressured by the police in ways that wouldn't show up on tape, and that allowing a witness to testify without showing up in court unfairly deprives the defendant of a chance to (a) cross-examine the witness, and (b) have the jury see any facial tics, body language, etc. that undercut the witness's credibility. In the 911 case, though, the Court faced a straight choice between finding an exception to the hearsay rule and letting an apparent abuser go free.
In making this choice, the US Supreme Court managed to ignore a variety of emotionally salient but epistemologically irrelevant distractions, such as the seriousness of the crime, the relative helplessness of the victim, and the respectability of the 911 operator. Instead, the Court focused on the purpose for which the 911 statements were obtained. If the statements were obtained to help gather information needed to safely resolve an ongoing emergency, they could be used at trial. If the statements, however, were obtained to gather information about a past event, they could *not* be used at trial.
The theory supporting this distinction seems to have been that the right to cross-examine and the right to have the jury see body language are fungible elements of a more general reliability test. A stranger's assertion, without more, could be true or could be false. It doesn't count as very much evidence. To turn an assertion into enough evidence to convict someone beyond a reasonable doubt, you need to show that the assertion comes with "indicia of reliability." Two of these indicia are cross-examination and body language -- if a story checks out despite a vigorous unfriendly interview and the peer pressure of having to tell the story while physically in the room with other people from your community, then that's pretty good evidence. But you might have reasons to believe a story even if you don't get cross-examination or body language. In the case of the 911 call, one might think that the caller had a strong motive to tell the truth, because if she didn't, then the police would go looking for the wrong guy, and her abuser would come find her and continue hurting her. Similarly, one might think that the operators had a strong motive to ask fair, non-leading questions, because if they didn't get the right answer, then the police might show up in the wrong neighborhood or with the wrong expectations, and there could be an unnecessary firefight. Finally, one could argue that a recorded statement made as events were unfolding is inherently more reliable (in some ways) than a narrative given months or years after the event; human memory gets corrupted faster than 8-track tapes.
Some combination of these factors convinced the Court to admit the evidence. Other, very similar cases have been decided differently. Whether they got that particular decision right or wrong, though, the framework of "indicia of reliability" is hard-coded into American evidence law, especially for civil cases. If you want to present evidence to a jury based on a statement that was made outside of court, you have to give at least one reason why the statement is nevertheless reliable.
Double and Triple Hearsay
Here's where things really get interesting: if your out-of-court statement quotes another out-of-court statement, the evidence is called "double hearsay," and you need to independently verify each statement. If any link in the chain breaks, the whole document gets excluded. For example, in the case I'm working on now, the defendants want to show the jury a report filled out by California's Occupational Health and Safety Administration ("OSHA"). The OSHA report is based almost entirely on an accident report form filled out by a private corporation. That report form, in turn, is based almost entirely on an informal interview of the only eyewitness to an accident. So the defendants can use the OSHA report if and only if the OSHA report, the accident report, and the informal interview are all reliable. Use A ↔ (A ∧ B ∧ C) are reliable.
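As a rough sketch, the all-or-nothing rule for a hearsay chain is just a logical conjunction over its links. The `Statement` class and the `reliable` flags below are hypothetical, chosen only to illustrate the rule; they are not a prediction about how any court would rule on these documents.

```python
from dataclasses import dataclass

@dataclass
class Statement:
    label: str
    reliable: bool  # hypothetical: has some showing of reliability been accepted?

def chain_admissible(chain):
    """The outermost document comes in only if every link in the chain is reliable."""
    return all(link.reliable for link in chain)

chain = [
    Statement("OSHA report", True),
    Statement("accident report form", True),
    Statement("informal interview", False),  # illustrative value only
]
print(chain_admissible(chain))
```

One unreliable link anywhere in the list is enough to exclude the whole document, which is what makes long chains fragile.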
To try to qualify the OSHA report, the defendants are arguing that the OSHA report is reliable under the public record exception to the hearsay rule, meaning that the public officials who prepared it had a stronger interest in accurately reporting public information than they did in the outcome of the accident victim's private case. To get the accident report form in, the defendants are arguing that it is reliable under the business record exception to the hearsay rule, meaning that the corporate officials who prepared it had a stronger interest in making sure their company had access to accurate information about safety risks than they did in the outcome of any one customer's lawsuit. As for the informal interview...well, I honestly have no idea how they plan to justify its reliability. But, then again, I'm biased. My professional interest lies in making sure that the whole string of unhelpful quotations stays in a cardboard box in a dank garage, far away from any juries.
Do the Rules Work?
So far, I've been pleasantly surprised at how well the American legal system handles some of these challenges. The fact that we have a two-tiered system of evaluating evidence at all is a cut above average -- imagine, e.g., the doctor who examines you taking notes on your condition, filtering out any subjective comments you make about how you're sure it's just a cold, and reporting only your objective symptoms to a second doctor, who then renders a diagnosis. Or imagine a team of business consultants who interview a Fortune 500 company's leadership team, and then pass their written notes back to a team at HQ (who has never met the executives) so that HQ can catch any obvious mistakes in reasoning before sending out recommendations. We know, intellectually, that meeting people tends to make us friendlier toward them and more likely to adopt their point of view even if we encounter no Bayesian evidence that increases the plausibility of their opinions, but our institutions rarely take steps to guard against that bias.
I think my biggest criticism of the American evidence code is that it doesn't account for uncertainty in the model. For instance, if I read the headline on a piece of science journalism saying that (e.g.) coffee consumption reduces the risk of prostate cancer, or that receiving spankings in childhood is negatively correlated with conscientiousness as an adult, there are at least six layers of 'hearsay' -- I might have misunderstood the headline, the headline might have mis-summarized the article, the article might have misquoted the scientist, the scientist might have misinterpreted the recorded data, the recorded data might not faithfully reflect what actually happened during the experiment, and the experiment might not faithfully replicate the real-world conditions that interest us.
Even if I can articulate plausible reasons why each step in the transmission of information was "reliable," I should be very skeptical that my *model* of the transmission is accurate. I only have to be wrong about one of the six steps for my estimate of the information's plausibility to be untrustworthy. If the information would only provide a few decibels of evidence even if it were perfectly reliable, then trying to calculate how many points a semi-reliable piece of evidence is worth can fail because of a low signal-to-noise ratio. E.g., suppose I learn that neither the suspect nor the actual criminal were redheads - I might be absolutely certain of this new piece of information, but that's still nowhere near enough evidence to support a conviction. If instead I learn that there is probably something like a 60% chance that neither the suspect nor the criminal had red hair, that datum really doesn't tell me anything at all -- the info shouldn't shift my prior enough for my posterior to be noticeably different from it.
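To make the signal-to-noise point concrete, here is one simple way to model it, with every number invented for illustration. Assume a report is accurate with probability `reliability` and is otherwise noise that is equally likely whether or not the hypothesis is true. Evidence is measured in decibels as 10 times the base-10 log of the likelihood ratio.

```python
import math

def decibels(likelihood_ratio):
    """Evidence strength in decibels: 10 * log10 of the likelihood ratio."""
    return 10 * math.log10(likelihood_ratio)

def discounted_ratio(p_given_h, p_given_not_h, reliability, p_noise):
    """Likelihood ratio of a report that is accurate with probability
    `reliability` and otherwise uninformative noise (an assumed mixture model)."""
    num = reliability * p_given_h + (1 - reliability) * p_noise
    den = reliability * p_given_not_h + (1 - reliability) * p_noise
    return num / den

# Hypothetical datum: twice as likely under the hypothesis as under its negation.
full = discounted_ratio(0.2, 0.1, reliability=1.0, p_noise=0.15)
shaky = discounted_ratio(0.2, 0.1, reliability=0.6, p_noise=0.15)
print(round(decibels(full), 2), round(decibels(shaky), 2))
```

In this toy model, a datum worth about 3 decibels when certain is worth well under 2 once it is only 60% reliable. Any error in the reliability estimate itself then matters about as much as the remaining signal.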
Although courts are allowed to consider the extent to which an unduly long chain of inferences makes evidence less "trustworthy," I think that on balance decisions would be more accurate if there were a firm limit -- say, three layers -- beyond which evidence was simply inadmissible as a matter of law. If A says that B says that C says that D shot someone, then no matter how reliable we think A, B, and C are, we should probably keep that evidence away from the jury unless we can haul at least one of B, C, or D into court to answer cross-examination.