Parapsychology: the control group for science

AllanCrossman

Parapsychologists are constantly protesting that they are playing by all the standard scientific rules, and yet their results are being ignored - that they are unfairly being held to higher standards than everyone else. I'm willing to believe that. It just means that the standard statistical methods of science are so weak and flawed as to permit a field of study to sustain itself in the complete absence of any subject matter.

— Eliezer Yudkowsky, Frequentist Statistics are Frequently Subjective

Imagine if, way back at the start of the scientific enterprise, someone had said, "What we really need is a control group for science - people who will behave exactly like scientists, doing experiments, publishing journals, and so on, but whose field of study is completely empty: one in which the null hypothesis is always true.

"That way, we'll be able to gauge the effect of publication bias, experimental error, misuse of statistics, data fraud, and so on, which will help us understand how serious such problems are in the real scientific literature."

Isn't that a great idea?

By an accident of historical chance, we actually have exactly such a control group, namely parapsychologists: people who study extra-sensory perception, telepathy, precognition, and so on.

There's no particular reason to think parapsychologists are doing anything other than what scientists would do; their experiments are similar to those of scientists, they use statistics in similar ways, and there's no reason to think they falsify data any more than any other group. Yet despite the fact that their null hypotheses are always true, parapsychologists get positive results.

This is disturbing, and must lead us to wonder how many positive results in real science are actually wrong.

The point of all this is not to mock parapsychology for the sake of it, but rather to emphasise that parapsychology is useful as a control group for science. Scientists should aim to improve their procedures to the point where, if the control group used these same procedures, they would get an acceptably low level of positive results. That this is not yet the case indicates the need for more stringent scientific procedures.
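The "control group" idea can be made concrete with a toy simulation. What follows is only a minimal sketch under stated assumptions, not a model of any real experiment: every study is a forced-choice guessing task with four options, each study has 100 trials, and significance is judged by a one-sided exact binomial test at the 5% level. The function names and all parameter values are hypothetical and chosen for illustration.

```python
import math
import random


def binom_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def run_null_study(n_trials, p_chance, rng):
    """One study in an empty field: each guess hits with chance probability only.

    Returns the one-sided p-value for 'more hits than chance'.
    """
    hits = sum(rng.random() < p_chance for _ in range(n_trials))
    return binom_tail(hits, n_trials, p_chance)


def simulate_field(n_studies=2000, n_trials=100, p_chance=0.25, alpha=0.05, seed=0):
    """Fraction of honest null studies that come out 'significant' at level alpha.

    All defaults are illustrative assumptions, not values from any real literature.
    """
    rng = random.Random(seed)
    p_values = [run_null_study(n_trials, p_chance, rng) for _ in range(n_studies)]
    return sum(p <= alpha for p in p_values) / n_studies


if __name__ == "__main__":
    rate = simulate_field()
    print(f"Fraction of null studies significant at the 5% level: {rate:.3f}")
```

Even with perfectly honest procedures, a small fraction of these empty studies clears the significance bar. Anything above that baseline in a real "control group" would have to come from the other factors listed earlier, such as publication bias, experimental error, or misuse of statistics.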

Acknowledgements

The idea for this mini-essay and many of its actual points were suggested by (or stolen from) Eliezer Yudkowsky's Frequentist Statistics are Frequently Subjective, though the idea might have originated with Michael Vassar.

This was originally published at a different location on the web, but was moved here for bandwidth reasons at Eliezer's suggestion.

Comments / criticisms

A discussion on Hacker News contained one very astute criticism: that some things which may once have been considered part of parapsychology actually turned out to be real, though with perfectly sensible, physical causes. Still, I think this is unlikely for the more exotic subjects like telepathy, precognition, et cetera.

...Charles Honorton and his colleagues drew together all the forced-choice experimental precognition experiments reported in English between 1935 and 1987, publishing their findings in the December 1989 Journal of Parapsychology. The combined results were impressive: 309 studies contributed to by 62 senior authors and their associates, nearly two million individual trials made by more than 30,000 subjects. (In a properly conservative culling, all the experimental work of both Rhine's chosen but subsequently disgraced successor, Walter J. Levy, and S.G. Soal, once a famous specialist in time-displacement psi tests, was excluded; both were known to have cheated in at least some experiments.) Overall, the cumulation is highly significant - 30 percent of studies provided by 40 investigators were independently significant at the 5 percent level. Yet this was not due to a suspicious handful of successful researchers: 23 of the 62 (37 percent) found overall significant scoring.

By the same token, admittedly, this means 63 percent failed to show significant psi. But [...] [i]f one hundred studies are done, averaging as many as thirty-eight correct calls instead of the twenty-five due to chance, then, surprisingly, we should only expect to find among that one hundred "about 33 [statistically] significant studies ... and a 30% chance that there would be 30 or fewer!" Here's why: The scattergun variance that arises simply from chance would mask most of the extra correct calls. This fact would remain in force even if the responders were picking up their extra hits through hidden radio receivers rather than psi! It's just what happens with the statistics of phenomena that have low power. [...]

Well, could this 37 percent success rate be due to the "file drawer"? Hardly. Honorton's estimate required forty-six unreported chance-level experiments for each of those in the meta-study, including those that themselves gave no significant support for the paranormal hypothesis. It seems highly unlikely that such a trove of dull experiments exists [...] Nor were the results due to an excessive contribution from a few specialist parapsychologists doing so many precognition studies that their non-scoring rivals were swamped. Strikingly, if all the investigators "contributing more than three studies are eliminated, leaving 33 investigators, the combined z [number of standard deviations found] is still 6.00" - with an associated probability of chance coincidence of somewhat more than one in a billion.

The individual effect sizes were all over the place, so Honorton and his coauthor, Diane C. Ferrari, unceremoniously threw out all the studies with unusually large deviations from the mean. [...] "Outcomes remain highly significant. Twenty-five percent of the studies (62/248) show overall significant hitting at the 5% level." Maybe the quality of studies explains the persistence of apparent anomalies? [...] if anything, the significance of the results climbed as quality improved. [...] What's more, the "effect size" had persisted over more than fifty years. This measure compensates for the different sample sizes in various studies: technically, it divides the z score by the square root of the number of trials in each study.

-- Damien Broderick, Outside the Gates of Science
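Two quantities in the quoted passage can be sketched in code: the effect size (z divided by the square root of the number of trials) and the low-power point (a real but modest effect still leaves most small studies non-significant). This is a rough illustration only. The quote does not say how many trials each study had, so the trial counts, the 25% chance rate, and the 38% true hit rate used in the usage lines are assumptions made for the example, and the function names are hypothetical.

```python
import math


def effect_size(z, n_trials):
    """Effect size as described in the quote: z score / sqrt(number of trials)."""
    return z / math.sqrt(n_trials)


def binom_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_hits(n, p_chance, alpha=0.05):
    """Smallest hit count whose one-sided p-value under chance is <= alpha."""
    for k in range(n + 2):
        if binom_tail(k, n, p_chance) <= alpha:
            return k


def power(n, p_chance, p_true, alpha=0.05):
    """Probability that a study with n trials reaches significance
    when the true hit rate is p_true rather than p_chance."""
    return binom_tail(critical_hits(n, p_chance, alpha), n, p_true)


if __name__ == "__main__":
    # Hypothetical trial counts; the source does not state them.
    for n in (20, 50, 200):
        print(n, round(power(n, p_chance=0.25, p_true=0.38), 3))
```

Multiplying `power(...)` by the number of studies gives the expected count of significant results, which is how a genuine effect and a majority of "failed" studies can coexist when each study is small.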

Honorton's estimate required forty-six unreported chance-level experiments for each of those in the meta-study, including those that themselves gave no significant support for the paranormal hypothesis.

Note that this is a bogus calculation: it says that if there were no publication bias, so that unpublished studies were just as likely to show positive results as published ones, then adding the stated number of chance studies would "dilute" the results below a threshold significance level. But of course the whole point of publication bias is th...

LauraABJ (16y)

"Honorton's estimate required forty-six unreported chance-level experiments for each of those in the meta-study, including those that themselves gave no significant support for the paranormal hypothesis." Why is this at all unlikely? This is a 52-year span of time, and who knows how many times each of these (only 62) 'scientists' ran the trials or tweaked the procedure before they decided they had a set of data worth submitting. Who knows how many people looked for these phenomena, didn't find them, and gave up without submission? Even without outright fraud (which I wouldn't doubt), people lie to themselves. I've worked with scientists who had evidence that their previously obtained results were bunk and submitted them anyway... 'maybe the retest was flawed...' The significant effect that was found may just be the threshold at which an investigator needs to see (or fake) results to submit a paper. There's the answer to the question Allan originally posed... Also, on another note, not all 'forced choice' tests are conducted in the same way. Some of them involve the person looking at the card being in the same room as the guesser, and well, it's not hard to imagine ways of getting a score above chance like that.

Jack (16y)

Would it really surprise anyone here if, say, 10 percent of parapsychologists are either rigging experiments, hiding negative results or falsifying data? 20%? Thirty-seven percent.
