Parapsychologists are constantly protesting that they are playing by all the standard scientific rules, and yet their results are being ignored - that they are unfairly being held to higher standards than everyone else. I'm willing to believe that. It just means that the standard statistical methods of science are so weak and flawed as to permit a field of study to sustain itself in the complete absence of any subject matter.
— Eliezer Yudkowsky, Frequentist Statistics are Frequently Subjective
Imagine if, way back at the start of the scientific enterprise, someone had said, "What we really need is a control group for science - people who will behave exactly like scientists, doing experiments, publishing journals, and so on, but whose field of study is completely empty: one in which the null hypothesis is always true.
"That way, we'll be able to gauge the effect of publication bias, experimental error, misuse of statistics, data fraud, and so on, which will help us understand how serious such problems are in the real scientific literature."
Isn't that a great idea?
By an accident of history, we actually have exactly such a control group, namely parapsychologists: people who study extra-sensory perception, telepathy, precognition, and so on.
There's no particular reason to think parapsychologists behave differently from ordinary scientists: they run similar experiments, use statistics in similar ways, and are no more likely than any other group to falsify data. Yet although their null hypotheses are always true, parapsychologists get positive results.
This is disturbing, and must lead us to wonder how many positive results in real science are actually wrong.
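To see how easily this happens, here is a minimal sketch in Python (every parameter is invented for illustration): a field where the null hypothesis is true by construction, every experiment tests for psychic coin-prediction, and, thanks to publication bias, only results significant at p < 0.05 see print.

```python
import random

random.seed(0)

ALPHA_Z = 1.96        # two-sided z threshold corresponding to p < 0.05
N_EXPERIMENTS = 1000  # experiments run in the "control group" field
N_SUBJECTS = 100      # coin-guess trials per experiment

published = 0
for _ in range(N_EXPERIMENTS):
    # The null is true by construction: subjects guess at chance (p = 0.5).
    hits = sum(random.random() < 0.5 for _ in range(N_SUBJECTS))
    # Normal approximation to the binomial test.
    z = (hits - N_SUBJECTS * 0.5) / (N_SUBJECTS * 0.25) ** 0.5
    if abs(z) > ALPHA_Z:
        published += 1  # publication bias: only "positive" results appear

print(f"{published} of {N_EXPERIMENTS} experiments are publishable,")
print("and every single one is a false positive.")
```

Roughly 5% of the experiments clear the bar, and since no effect exists, the published record of this field consists entirely of false positives.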
The point of all this is not to mock parapsychology for the sake of it, but rather to emphasise that parapsychology is useful as a control group for science. Scientists should aim to improve their procedures to the point where, if the control group used these same procedures, they would get an acceptably low level of positive results. That this is not yet the case indicates the need for more stringent scientific procedures.
Acknowledgements
The idea for this mini-essay and many of its actual points were suggested by (or stolen from) Eliezer Yudkowsky's Frequentist Statistics are Frequently Subjective, though the idea might have originated with Michael Vassar.
This was originally published at a different location on the web, but was moved here for bandwidth reasons at Eliezer's suggestion.
Comments / criticisms
A discussion on Hacker News contained one very astute criticism: some things that may once have been considered part of parapsychology actually turned out to be real, though with perfectly sensible, physical causes. Still, I think such a rehabilitation is unlikely for the more exotic subjects like telepathy, precognition, et cetera.
The idea is good. I'm afraid that it may be interpreted as meaning that we need to raise our publication standard from 95% confidence to 98% confidence. I think scientists already have a dangerously strong bias toward rejecting anything that fails to meet a 95% confidence threshold. If someone has a good idea with sound theoretical reasoning behind it, and they run some experiments but don't hit 95%, the result is still worth considering.
There are also all sorts of collected data that are routinely thrown out for falling below 95% confidence, when they shouldn't be. People doing any sort of genomics work routinely fail to report gene associations at less than 95% confidence. The fact is that, when we're feeding millions of pieces of data into a computer program to compute reliability scores, ALL data should be saved and used. Most of the information scientists produce is in the large mass of low-confidence predictions. There is much more information in 100,000 50%-confidence predictions than in a dozen 95%-confidence predictions.
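One way to make that comparison concrete, as a sketch rather than anything from the comment above: assume the stated confidences are calibrated and that the prior probability of any given gene association being real is 1% (both the base rate and the choice of measure are my assumptions), and score each prediction's expected evidence as the KL divergence between the stated confidence and the base rate.

```python
from math import log2

def bits_of_evidence(confidence: float, base_rate: float) -> float:
    """Expected evidence (in bits) carried by one calibrated prediction:
    the KL divergence between the stated confidence and the base rate."""
    c, p = confidence, base_rate
    return c * log2(c / p) + (1 - c) * log2((1 - c) / (1 - p))

BASE_RATE = 0.01  # assumed prior that any given gene association is real

low = 100_000 * bits_of_evidence(0.50, BASE_RATE)
high = 12 * bits_of_evidence(0.95, BASE_RATE)

print(f"100,000 predictions at 50% confidence: ~{low:,.0f} bits")
print(f"     12 predictions at 95% confidence: ~{high:,.0f} bits")
```

On those assumptions, the long low-confidence list carries on the order of 230,000 bits of evidence against roughly 70 for the short high-confidence one.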
I agree that all data should be saved, and that there's much more information in 100,000 50%-confidence predictions than in a dozen 95%-confidence predictions. But ask a biologist which they'd prefer (ETA: I have actually done this, more or less) and they'll take the dozen 95%-confidence predictions, because they're just going to turn around and use bog-standard low-throughput experimental techniques to dig deeper. From the biologist's decision-theory perspective, false positives are a lot more costly than false negatives.
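That asymmetry can be put in the same sketchy terms. With invented follow-up costs and payoffs (none of these numbers come from the discussion above), the expected value of bench-validating a single candidate flips sign between the two lists:

```python
# Invented, illustrative numbers; units are researcher-weeks of value.
FOLLOW_UP_COST = 6   # bench work to test one candidate association
PAYOFF_IF_REAL = 10  # value of confirming a real association

def expected_value(confidence: float) -> float:
    # The follow-up cost is paid either way; only a real association pays off.
    return confidence * PAYOFF_IF_REAL - FOLLOW_UP_COST

print(expected_value(0.95))  # +3.5: each follow-up is worth doing
print(expected_value(0.50))  # -1.0: each follow-up is a losing bet
```

So even if the long low-confidence list holds more total information, each individual follow-up drawn from it loses expected value, which is exactly why the biologist takes the short list.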