From the original article:
A peer-reviewed, journal-published, replicated report is worth far more than what you see with your own eyes.
From the wiki page summary, quoted in this rerun post:
Publications in peer-reviewed scientific journals are more worthy of trust than what you detect with your own ears and eyes.
The word "replicated" seems to have disappeared from this paraphrasing, and that flips the paraphrased statement from true to false.
On the other hand, the paraphrase also changed 'far more' to 'more', so technically it scrapes through. Even though the peer review process is only slightly better than chance, it does add some value.
A single peer-reviewed paper is still not more evidence than one's own experiences. If a paper were published tomorrow, no matter how well peer-reviewed, saying, say, "chocolate is immediately lethal to humans", I would have ample reason to dismiss it, as I have seen many examples of people eating chocolate and not immediately dying. Were that paper replicated many times over, however, I'd have to start wondering about what was causing the discrepancy. But with one paper? Defy the data.
If the claim of the paper is as extraordinary as your example, then it is very probably bullshit. But on average papers tend to be more reliable than one's own experiences. After all, there aren't many peer-reviewed papers about the immediate lethality of chocolate out there, so the analogy is somewhat stretched. If a single paper claimed that chocolate increases the risk of dying from colon cancer by 5%, but all the chocolate lovers you personally knew were absolutely healthy, would you also defy the data?
No, because without knowing how likely people are to die of colon cancer without eating chocolate, I would have no idea whether that contradicted or confirmed my own experience. Which suggests to me that rather than being more reliable on average than one's own experience, the average paper is, in fact, talking about things that are outside the normal person's day-to-day experience. But in those rare cases when a single paper contradicts something I've seen myself, I would have no problem at all in saying it's wrong.
It seems that we are using the phrase "one's own experience" in different ways. If I knew 100 people, 20 of whom ate much more chocolate than the rest, and out of those 20 no one had colon cancer while five of the rest did, I would say that my personal experience tells me that chocolate consumption is anticorrelated with colon cancer. While you use "one's own experience" only to denote things which are really obvious.
The problem is that most people are far less cautious when creating hypotheses from their own experience than you probably are. I have heard lots of statements roughly analogous to "although my doctor says otherwise, chocolate in fact cures the common cold; normally it takes a week to get rid of it, but last year I ate a lot of chocolate and was healthy in six days". Which is what the original article tries to warn against.
"While you use "one's own experience" only to denote things which are really obvious."
No, I use it to denote things I have experienced. For example, there is disagreement over whether vitamin C megadoses can help certain kinds of cancers. I've actually seen papers on both sides. However, had I only seen a single paper that said vitamin C doesn't help with cancer, I would have perfectly good grounds for dismissing it - because I have seen two people gain a significant number of QALYs from taking vitamin C when diagnosed with terminal, fast-acting, painful cancers. That's not a 'really obvious' statement - it's very far from an obvious statement - but "my grandfather is still alive, in no pain and walking eight miles a day, when six months ago he was given two months to live" is stronger evidence than a single unreplicated paper.
Is "my grandfather is still alive, in no pain and walking eight miles a day, when six months ago he was given two months to live" stronger evidence for vitamin C effectivity than a reviewed paper saying "we have conducted a study on 1000 patients with terminal cancer; the survival rate in the group treated with large doses of vitamin C was not greater than the survival rate in the control group"? If so, why?
It would depend on the methodology used. I have seen enough examples of horribly bad - not to say utterly fraudulent - science in medical journals that I would actually take publication in a medical journal of a single, unreplicated study as being slight evidence against the conclusion it comes to. (As an example, the Mayo Clinic published a study with precisely those results, claiming to be 'unable to replicate' a previous experiment, in the early 80s. Except that where the experiment they were trying to 'replicate' had used intravenous doses, they used oral ones. And used a lower dose. And spaced the doses differently. And ended the trial after a much shorter period.)
So my immediate conclusion would be "No they didn't" if I saw that result from a single paper. Because when you've seen people in agony, dying, and you've seen them walking around and healthy a couple of months later, and you see that happen repeatedly, then that is very strong evidence. And something you can test yourself is always better than taking someone else's word for it.
However, if that study were replicated, independently, and had no obviously cretinous methodological flaws upon inspection, then it would be strong evidence. But if something I don't directly observe myself contradicts my own observations, then I will always put my own observations ahead of those of someone else.
I think this is actually a really dangerous meme, because it can be used to rationalize almost any belief. For example, a religious person could easily say, "Some claims are just too extraordinary! There's just no extraordinary evidence that could convince me that God doesn't exist; it just isn't going to happen." Many of the other ideas in the Sequences are powerful tools because they can't be applied to fake explanations. So while this post might be useful to people with a fair amount of rationality training, I think it's far too risky to give to people without any prerequisites.
Is there some level of evidence more convincing than a study published in a reputable, peer-reviewed journal with lots of replication?
I would say so. For instance, the claim that letting go of my pen in mid-air will cause it to fall onto the floor and not the ceiling. The above is backed up by so much evidence, and in so many different ways*, that even if a replicated study were to contradict it, I would continue to believe it (it's one of the few claims that really does deserve a subjective probability in the region of 99.9%).
I don't know about Yudkowsky, but the threshold for "what would be enough evidence to convince me of the existence of ESP" lies somewhere between those two.
I initially took 'some claims are just too extraordinary' to be talking about the first level, while 'how to convince me that 2+2=3' talks about the second level, but maybe I'm being too charitable.
For evidence: * thousands of trials from my own experience, millions from other people, universal agreement from everyone I know or have heard of, a coherent scientific theory explaining how and why it happens, etc.
(it's [that letting go of a pen will cause it to fall to the floor and not the ceiling] one of the few claims that really does deserve a subjective probability in the region of 99.9%).
Only three nines? This seems very underconfident. I would assign it more like twelve-nines confidence.
I was debating between that and 99.95% or even 99.99%, but 12 9s! Do you understand what that means! Do you really think that you could make 1 trillion similar claims and only be wrong once? One trillion possible worlds, and it only goes up in one of them?
To put it differently, suppose Warren Buffett comes up to me and suggests a game. I drop the pen; if it falls, I give him ten cents, otherwise he gives me everything he owns (about $50 billion). By your estimate, he's ripping me off to the tune of about 5 cents, but I think I would accept the bet, which suggests my estimate is not as high.
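(For reference, a quick worked check of that 5-cent figure, using the stakes above and the twelve-nines prior, and assuming utility linear in dollars: the expected value of accepting is

\[
-\$0.10 \times (1 - 10^{-12}) + \$5\times 10^{10} \times 10^{-12} \approx -\$0.10 + \$0.05 = -\$0.05,
\]

i.e. a loss of about five cents.)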
Y'see, this sort of thing is why I distrust the betting model for extreme probabilities like this.
I mean, I would take that bet as well, but I'm pretty sure that my estimate under normal circumstances is closer to jimrandomh's... it's just that the offer itself is so outrageous that the circumstances stop being normal. I think my reasoning would be more along the lines of "what the hell, I'm willing to throw away ten cents just to get to tell this story," or possibly even "I no longer have any idea what's going on, but losing ten cents is no big deal and I'm curious to see what happens next."
Put another way: I very much doubt that my willingness to take that bet is the result of an EV calculation using my real prior probability of a randomly selected pen falling to the ground; therefore, I am disinclined to treat that willingness as significant evidence of that prior.
Would you take the offer if it was only $10? I wouldn't (I get to tell the story either way), which suggests my decision really does depend upon my prior.
Another interfering factor is the fact that my utility function is nowhere near linear in dollars, especially for large amounts. I'd take a certainty of $1 million over a 5% chance of $1 billion, which suggests that $50 billion is worth a lot less than a trillion times as much as 5 cents to me, so my prior must be some way below 99.9999999999%.
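(A sketch of how a concave utility function produces that preference; the logarithmic form and the $10,000 baseline wealth are illustrative assumptions, not anything from the comment above. With u(w) = ln w and w_0 = $10^4:

\[
u(w_0 + 10^6) \approx 13.8, \qquad 0.95\,u(w_0) + 0.05\,u(w_0 + 10^9) \approx 0.95 \times 9.2 + 0.05 \times 20.7 \approx 9.8,
\]

so a log-utility agent takes the certain million, and likewise values a $50 billion gain at far less than 10^12 times ten cents.)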
Bear in mind overconfidence bias; it's very easy to get trigger-happy with 9s, and to forget that even 99.9% is very impressive in a world with as many unknowns and interfering factors as ours.
If it were Warren Buffett? Probably. "Warren Buffett offered me $50 billion if my pen falls to the ceiling" is a much cooler story than "Warren Buffett offered me $10 if my pen falls to the ceiling," but the latter is still easily worth ten cents.
OTOH, "Some guy on the Internet offered me $10 if my pen falls to the ceiling" is not so cool a story. I probably would turn that down.
Agreed that utility is radically nonlinear in dollars.
Agreed that it's easy to get overconfident with 9s. It's also easy to anchor on the integers.
All that said: roughly how many objects have you seen or otherwise had compelling experience of having been dropped in your lifetime, would you estimate? How many of those have fallen to the ground, and how many have floated to the ceiling? What happens to those numbers if we eliminate ones more theoretically likely to float than pens (e.g., helium balloons)?
At a guess, I'd say I've personally seen on the order of five thousand non-helium-balloon-like objects dropped in my life, and they've all fallen down. So just starting from there, I'd estimate a > 99.9998% chance that the next one will, too.
If I start additionally factoring in the additional evidentiary value of theoretical considerations, and the absence of other people's reports of such objects floating, and of stories people tell involving objects falling to the ground... man, the evidence starts to add up.
(We've established what I am, now we're haggling over the price...)
All that said: roughly how many objects have you seen or otherwise had compelling experience of having been dropped in your lifetime, would you estimate? How many of those have fallen to the ground, and how many have floated to the ceiling? What happens to those numbers if we eliminate ones more theoretically likely to float than pens (e.g., helium balloons)?
At a guess, I'd say I've personally seen on the order of five thousand non-helium-balloon-like objects dropped in my life, and they've all fallen down. So just starting from there, I'd estimate a > 99.9998% chance that the next one will, too.
This is a good point; 99.9% probably is too low, although I'd be a little worried to go as high as 99.9998%. I'm not sure how you got that figure anyway; if you've seen five thousand, that means Laplace's law suggests about 99.98%, doesn't it? I've probably seen a similar number (maybe a bit less), and I can remember one that failed to fall downward (although it was in very high winds, so perhaps I should have been able to predict it).
The other evidence probably doesn't count for a Bayes factor of as much as 100:1, since by that point a good part of the remaining probability mass is concentrated in hypotheses like "pens always fall downwards except under this one specific rare circumstance" and "I am completely insane".
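(For reference, the Laplace's-law figure mentioned above: after n successes in n trials, the rule of succession gives

\[
P(\text{success on trial } n+1) = \frac{n+1}{n+2} = \frac{5001}{5002} \approx 0.9998
\]

for n = 5000, i.e. about 99.98%.)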
I distrust the betting model for extreme probabilities like this.
Can any of us really comprehend that much money? Frankly, I bet even Warren Buffett would have trouble; he has said that after he gives away 99% of his wealth, he and his family will have all the money they will ever need. Does 100 times as much money as you'll ever need really feel different from 10x or 1000x?
The betting heuristic is useful because our intuitive sense for money has a much larger dynamic range than our intuitive sense of probability - it's easier to conceptualize "a subway ticket against a nice house" than "million-to-one odds", but "0.05 subway tickets against 25,000 nice houses" doesn't really fit the way we think.
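(The arithmetic behind those figures, assuming roughly a $2 subway ticket and a $2 million house; neither price is given above:

\[
\frac{\$0.10}{\$2} = 0.05 \text{ tickets}, \qquad \frac{\$5\times 10^{10}}{\$2\times 10^{6}} = 25{,}000 \text{ houses}.
\]
)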
Do you really think that you could make 1 trillion similar claims and only be wrong once?
No, I would run out of statements I was that confident in long before I reached a trillion.
To put it differently, suppose Warren Buffett comes up to me and suggests a game. I drop the pen; if it falls, I give him ten cents, otherwise he gives me everything he owns (about $50 billion). By your estimate, he's ripping me off to the tune of about 5 cents, but I think I would accept the bet, which suggests my estimate is not as high.
There are two problems with converting probability estimates like that into bets. First, there is more than a 1/10^12 chance of cheating in that game, by putting a strong magnet in the ceiling for example. That issue does not apply in non-game contexts. And second, utility is not linear in money over that interval; Warren Buffett would value a ten cent gain less than 1/10^12 as much as avoiding a $5×10^10 loss.
No, I would run out of statements I was that confident in long before I reached a trillion.
Nitpicking.
First, there is more than a 1/10^12 chance of cheating in that game, by putting a strong magnet in the ceiling for example.
You know that you're not cheating, and it doesn't seem likely that Buffett would cheat when doing so would make him less likely to win. Of course, maybe there's a 10^-10 chance that Buffett would go insane and cheat anyway, but can we just assume a least convenient possible world where we ignore those interfering issues?
Or come up with your own hypothetical if you don't like mine (you could use Omega instead of Buffett to eliminate cheating).
And second, utility is not linear in money over that interval; Warren Buffett would value a ten cent gain less than 1/10^12 as much as avoiding a $5×10^10 loss.
I don't care what Buffett values; the important thing is what I value, and I think I actually value avoiding a ten cent loss a lot more than 10^-12 as much as achieving a $50 billion gain.
No, I would run out of statements I was that confident in long before I reached a trillion.
Nitpicking.
Not at all. The feeling that it is impossible to make a trillion statements like that and never be wrong partly stems from our inability to conceive of a trillion distinct statements each supported by as much evidence as the validity of gravitational laws. (The second reason for the intuition is imagining making a mistake through being tired of making one prediction after another.) There are certainly far fewer than a trillion independent statements of comparable trustworthiness that people can utter, which makes the general calibration approach to defining subjective probability hard to use here.
You know that you're not cheating, and it doesn't seem likely that Buffett would cheat when doing so would make him less likely to win.
Someone else may put a magnet in the pen; this is the sort of concern you cannot rule out in real life. Or perhaps Buffett is fed up with being rich and wants to award his possessions to some random person and do it in an unusual way. But I find it most probable that your willingness to accept this very bet is caused by the same bias which makes people buy lottery tickets; except here you are able to rationalise the probabilities afterwards to justify your decision.
As for your estimate for the pen falling down instead of up (no cheating assumed) being 99.99%: seriously? Does it mean that you expect the pen to fall up once in every ten thousand trials? If 99.99% is your upper bound for probability in general (this may be a reasonable interpretation because the pen example is certainly one of the most certain predictions one can state), do you play any lottery where the jackpot is worth more than 10,000 times the ticket?
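(A sketch of the implied lottery argument: if 10^-4 is the smallest probability you ever assign, you must give at least that much credence to winning, so for a ticket of price t and a jackpot J the bet looks favourable whenever

\[
10^{-4}\, J - t > 0, \quad \text{i.e.} \quad J > 10^{4}\, t.
\]

Hence the question: such lotteries exist, yet presumably you don't play them.)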
Someone else may put a magnet in the pen; this is the sort of concern you cannot rule out in real life. Or perhaps Buffett is fed up with being rich and wants to award his possessions to some random person and do it in an unusual way.
As I said, least convenient possible world.
As for your estimate for the pen falling down instead of up (no cheating assumed) being 99.99%: seriously?
Firstly, as of my recent discussion with TOD, my actual estimate is now more like 99.999% than 99.99%, but still way below 99.9999999999%. I do not assume no cheating in this estimate; I merely assume the amount of cheating you'd expect when idly tossing a pen in my room, rather than the amount I'd expect when playing billion-dollar games with Warren Buffett.
Does it mean that you expect the pen to fall up once in every ten thousand trials?
No. Most of that probability is concentrated in hypotheses where the pen never falls up. This is the difference between a Bayesian probability and a frequency.
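(A sketch of the distinction being drawn, with made-up numbers: suppose the live hypotheses are H1 = "pens always fall down" and H2 = "my model is badly wrong (magnets, insanity, ...)", under which a pen might go up. Then

\[
P(\text{up}) = P(H_1)\cdot 0 + P(H_2)\cdot P(\text{up} \mid H_2) \approx 10^{-5} \times 0.1 = 10^{-6},
\]

even though under the overwhelmingly likely hypothesis the long-run frequency of pens falling up is exactly zero.)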
I will point out, though, that I have probably dropped fewer than 10000 pens in my life, and one of them did go sideways (as best I could tell, there was no statistically significant vertical component to its movement), though I suppose I should have predicted that given the weather conditions at the time.
If 99.99% is your upper bound for probability in general (this may be a reasonable interpretation because the pen example is certainly one of the most certain predictions one can state), do you play any lottery where the jackpot is worth more than 10,000 times the ticket?
Actually, there is one kind of claim I will happily assign probabilities like 99.9999999999% and higher to: the negation of a very specific claim (equivalent to the union of a huge number of other claims), for example:
"There is not currently a gang of 17643529 elephants, each painted with the national flag of Moldova, all riding the same giant unicycle around a 57km wide crater on the innermost planet of the 59th closest star to earth in the Andromeda galaxy".
Happy to go well above 99.9999999999% on that one, as there are easily 10^12 mutually exclusive alternatives, all at least as plausible (in addition, by allowing claims like this, but slightly more plausible, I bet I could find a trillion claims each about as plausible as the pen).
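(One rough way to count, with illustrative ranges: varying just the flag, the elephant count, the crater width, and the star index independently gives

\[
\underbrace{200}_{\text{flags}} \times \underbrace{10^{8}}_{\text{elephant counts}} \times \underbrace{10^{3}}_{\text{crater widths}} \times \underbrace{10^{3}}_{\text{star indices}} = 2\times 10^{16}
\]

mutually exclusive variants, comfortably more than 10^12.)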
What is impressive about the pen, as well as about most scientific theories, is that it is a single, very specific, hypothesis which earned its high probability through accuracy rather than bulk.
I think you're definitely right about that; that would also explain why this post isn't in the Sequences.
I wonder what the base rate for insanity/full-blown visual hallucinations/etc. is?
And come to think of it, it's not necessarily what you perceive, but what you emit from other people's perspective, so you'd have to add up the base rates of things like Anton's syndrome as well.
A peer-reviewed, journal-published, replicated report is worth far more than what you see with your own eyes.
Including my viewing of the report itself? That would be silly. Later Eliezer says that the fact that it is a good idea to trust in science is "pragmatically true," but it is probably better to say it's a good rule of thumb. I agree with the spirit of the post, but it goes so far into the hyperbolic that it undermines some other aspects of rationality:
What about the claim that 2 + 2 = 5?
Science cannot prove a nonsensical claim. If I believe I read that as a proved result in an authoritative scientific paper, the probability that there was a miscommunication somewhere eclipses everything else. What about the claim "A and not-A"?
Today's post, Some Claims Are Just Too Extraordinary was originally published on 20 January 2007. A summary (taken from the LW wiki):

Publications in peer-reviewed scientific journals are more worthy of trust than what you detect with your own ears and eyes.
Discuss the post here (rather than in the comments to the original post).
This post is part of the Rerunning the Sequences series, where we'll be going through Eliezer Yudkowsky's old posts in order so that people who are interested can (re-)read and discuss them. The previous post was A Fable of Science and Politics, and you can use the sequence_reruns tag or rss feed to follow the rest of the series.
Sequence reruns are a community-driven effort. You can participate by re-reading the sequence post, discussing it here, posting the next day's sequence reruns post, or summarizing forthcoming articles on the wiki. Go here for more details, or to have meta discussions about the Rerunning the Sequences series.