...Charles Honorton and his colleagues drew together all the forced-choice experimental precognition experiments reported in English between 1935 and 1987, publishing their findings in the December 1989 Journal of Parapsychology. The combined results were impressive: 309 studies contributed to by 62 senior authors and their associates, nearly two million individual trials made by more than 30,000 subjects. (In a properly conservative culling, all the experimental work of both Rhine's chosen but subsequently disgraced successor, Walter J. Levy, and S.G. Soal, once a famous specialist in time-displacement psi tests, was excluded; both were known to have cheated in at least some experiments.) Overall, the cumulation is highly significant - 30 percent of studies provided by 40 investigators were independently significant at the 5 percent level. Yet this was not due to a suspicious handful of successful researchers: 23 of the 62 (37 percent) found overall significant scoring.
By the same token, admittedly, this means 63 percent failed to show significant psi. But [...] [i]f one hundred studies are done, averaging as many as thirty-eight correct calls instead of the twenty-five due to chance, then, surprisingly, we should only expect to find among that one hundred "about 33 [statistically] significant studies ... and a 30% chance that there would be 30 or fewer!" Here's why: The scattergun variance that arises simply from chance would mask most of the extra correct calls. This fact would remain in force even if the responders were picking up their extra hits through hidden radio receivers rather than psi! It's just what happens with the statistics of phenomena that have low power. [...]
Well, could this 37 percent success rate be due to the "file drawer"? Hardly. Honorton's estimate required forty-six unreported chance-level experiments for each of those in the meta-study, including those that themselves gave no significant support for the paranormal hypothesis. It seems highly unlikely that such a trove of dull experiments exists [...] Nor were the results due to an excessive contribution from a few specialist parapsychologists doing so many precognition studies that their non-scoring rivals were swamped. Strikingly, if all the investigators "contributing more than three studies are eliminated, leaving 33 investigators, the combined z [number of standard deviations found] is still 6.00" - with an associated probability of chance coincidence of somewhat more than one in a billion.
The individual effect sizes were all over the place, so Honorton and his coauthor, Diane C. Ferrari, unceremoniously threw out all the studies with unusually large deviations from the mean. [...] "Outcomes remain highly significant. Twenty-five percent of the studies (62/248) show overall significant hitting at the 5% level." Maybe the quality of studies explains the persistence of apparent anomalies? [...] if anything, the significance of the results climbed as quality improved. [...] What's more, the "effect size" had persisted over more than fifty years. This measure compensates for the different sample sizes in various studies: technically, it divides the z score by the square root of the number of trials in each study.
-- Damien Broderick, Outside the Gates of Science
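The low-power argument in the passage above can be checked numerically. The sketch below is only an illustration under stated assumptions: a one-tailed exact binomial test at the 5 percent level, a chance hit rate of 25 percent, and a true hit rate of 38 percent. The number of trials per study is not given in the excerpt, so `n` is a hypothetical parameter, and all function names are made up for this example.

```python
from math import comb


def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p); returns 0.0 when k > n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_hits(n, p_chance=0.25, alpha=0.05):
    """Smallest hit count k such that P(X >= k | chance) <= alpha."""
    for k in range(n + 2):  # k = n + 1 always qualifies (empty tail)
        if binom_tail(n, p_chance, k) <= alpha:
            return k


def power(n, p_true=0.38, p_chance=0.25, alpha=0.05):
    """Chance that one study of n trials reaches significance (assumed rates)."""
    return binom_tail(n, p_true, critical_hits(n, p_chance, alpha))


def prob_at_most(n_studies, per_study_power, k):
    """P(at most k of n_studies come out significant)."""
    return 1.0 - binom_tail(n_studies, per_study_power, k + 1)


if __name__ == "__main__":
    # Hypothetical per-study trial counts, chosen only for illustration.
    for n in (10, 20, 30, 50, 100, 200):
        print(n, round(power(n), 3))
    # If each of 100 studies had a 33% chance of reaching significance:
    print(round(prob_at_most(100, 0.33, 30), 3))
```

With a per-study power of 0.33, the last line comes out close to 0.3, which matches the quoted "30% chance that there would be 30 or fewer". The loop shows how strongly the per-study power depends on the assumed trial count.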
Honorton's estimate required forty-six unreported chance-level experiments for each of those in the meta-study, including those that themselves gave no significant support for the paranormal hypothesis.
Note that this is a bogus calculation: it says that if there was no publication bias, so that unpublished studies were just as likely to show positive results as published ones, then adding the stated number of chance studies would "dilute" the results below a threshold significance level. But of course the whole point of publication bias is th...
Parapsychologists are constantly protesting that they are playing by all the standard scientific rules, and yet their results are being ignored - that they are unfairly being held to higher standards than everyone else. I'm willing to believe that. It just means that the standard statistical methods of science are so weak and flawed as to permit a field of study to sustain itself in the complete absence of any subject matter.
-- Eliezer Yudkowsky, Frequentist Statistics are Frequently Subjective
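To make the file-drawer dispute in the two quotations above concrete, here is a minimal sketch. It assumes the estimate is of the Rosenthal "fail-safe N" type, which the excerpts do not confirm: count how many unreported studies averaging z = 0 would be needed to pull a Stouffer combined z below the one-tailed 5 percent threshold. The simulation then builds an entirely null field with selective publication. The 20 percent publication rate for non-significant studies, the study count, and all names are hypothetical.

```python
import random
from math import sqrt

Z_ALPHA = 1.645  # one-tailed 5% threshold


def stouffer_z(z_scores):
    """Combined z for k independent studies: sum(z) / sqrt(k)."""
    return sum(z_scores) / sqrt(len(z_scores))


def fail_safe_n(z_scores, z_alpha=Z_ALPHA):
    """Number of extra z = 0 studies needed to bring the combined z down
    to z_alpha. Only meaningful when the combined z is positive."""
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_alpha**2 - k


def simulate_null_field(n_studies=10_000, publish_null_prob=0.2, seed=0):
    """Every study has a true effect of zero. Significant results are
    always published; the rest are published with publish_null_prob."""
    rng = random.Random(seed)
    published, drawer = [], 0
    for _ in range(n_studies):
        z = rng.gauss(0.0, 1.0)
        if z > Z_ALPHA or rng.random() < publish_null_prob:
            published.append(z)
        else:
            drawer += 1
    return published, drawer


if __name__ == "__main__":
    published, drawer = simulate_null_field()
    print("published studies:", len(published))
    print("combined z:", round(stouffer_z(published), 2))
    print("fail-safe N:", round(fail_safe_n(published)))
    print("actual file drawer:", drawer)
```

In this toy setup the published record looks highly significant, and the fail-safe N exceeds the number of studies actually sitting in the drawer, even though no effect exists. The reason is that the calculation assumes the unreported studies average z = 0, whereas selective publication leaves the drawer filled with below-average results.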
Imagine if, way back at the start of the scientific enterprise, someone had said, "What we really need is a control group for science - people who will behave exactly like scientists, doing experiments, publishing journals, and so on, but whose field of study is completely empty: one in which the null hypothesis is always true.
"That way, we'll be able to gauge the effect of publication bias, experimental error, misuse of statistics, data fraud, and so on, which will help us understand how serious such problems are in the real scientific literature."
Isn't that a great idea?
By a historical accident, we actually have exactly such a control group, namely parapsychologists: people who study extra-sensory perception, telepathy, precognition, and so on.
There's no particular reason to think parapsychologists are doing anything other than what scientists would do; their experiments are similar to those of scientists, they use statistics in similar ways, and there's no reason to think they falsify data any more than any other group. Yet despite the fact that their null hypotheses are always true, parapsychologists get positive results.
This is disturbing, and must lead us to wonder how many positive results in real science are actually wrong.
The point of all this is not to mock parapsychology for the sake of it, but rather to emphasise that parapsychology is useful as a control group for science. Scientists should aim to improve their procedures to the point where, if the control group used these same procedures, they would get an acceptably low level of positive results. That this is not yet the case indicates the need for more stringent scientific procedures.
Acknowledgements
The idea for this mini-essay and many of its actual points were suggested by (or stolen from) Eliezer Yudkowsky's Frequentist Statistics are Frequently Subjective, though the idea might have originated with Michael Vassar.
This was originally published at a different location on the web, but was moved here for bandwidth reasons at Eliezer's suggestion.
Comments / criticisms
A discussion on Hacker News contained one very astute criticism: that some things which may once have been considered part of parapsychology actually turned out to be real, though with perfectly sensible, physical causes. Still, I think this is unlikely for the more exotic subjects like telepathy, precognition, et cetera.