CarlShulman comments on Parapsychology: the control group for science - Less Wrong

Post author: AllanCrossman, 05 December 2009 10:50PM

Comment author: Kaj_Sotala, 07 December 2009 09:51:56AM, 3 points

...Charles Honorton and his colleagues drew together all the forced-choice experimental precognition experiments reported in English between 1935 and 1987, publishing their findings in the December 1989 Journal of Parapsychology. The combined results were impressive: 309 studies contributed to by 62 senior authors and their associates, nearly two million individual trials made by more than 30,000 subjects. (In a properly conservative culling, all the experimental work of both Rhine's chosen but subsequently disgraced successor, Walter J. Levy, and S.G. Soal, once a famous specialist in time-displacement psi tests, was excluded; both were known to have cheated in at least some experiments.) Overall, the cumulation is highly significant - 30 percent of studies provided by 40 investigators were independently significant at the 5 percent level. Yet this was not due to a suspicious handful of successful researchers: 23 of the 62 (37 percent) found overall significant scoring.

By the same token, admittedly, this means 63 percent failed to show significant psi. But [...] [i]f one hundred studies are done, averaging as many as thirty-eight correct calls instead of the twenty-five due to chance, then, surprisingly, we should only expect to find among that one hundred "about 33 [statistically] significant studies ... and a 30% chance that there would be 30 or fewer!" Here's why: The scattergun variance that arises simply from chance would mask most of the extra correct calls. This fact would remain in force even if the responders were picking up their extra hits through hidden radio receivers rather than psi! It's just what happens with the statistics of phenomena that have low power. [...]

Well, could this 37 percent success rate be due to the "file drawer"? Hardly. Honorton's estimate required forty-six unreported chance-level experiments for each of those in the meta-study, including those that themselves gave no significant support for the paranormal hypothesis. It seems highly unlikely that such a trove of dull experiments exists [...] Nor were the results due to an excessive contribution from a few specialist parapsychologists doing so many precognition studies that their non-scoring rivals were swamped. Strikingly, if all the investigators "contributing more than three studies are eliminated, leaving 33 investigators, the combined z [number of standard deviations found] is still 6.00" - with an associated probability of chance coincidence of somewhat more than one in a billion.

The individual effect sizes were all over the place, so Honorton and his coauthor, Diane C. Ferrari, unceremoniously threw out all the studies with unusually large deviations from the mean. [...] "Outcomes remain highly significant. Twenty-five percent of the studies (62/248) show overall significant hitting at the 5% level." Maybe the quality of studies explains the persistence of apparent anomalies? [...] if anything, the significance of the results climbed as quality improved. [...] What's more, the "effect size" had persisted over more than fifty years. This measure compensates for the different sample sizes in various studies: technically, it divides the z score by the square root of the number of trials in each study.

-- Damien Broderick, Outside the Gates of Science
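
The power claim in the second quoted paragraph can be checked directly. Here is a minimal sketch, assuming 20 four-choice trials per study (a hypothetical figure; the quote only gives the 38-versus-25 hit rate per hundred calls):

    # Power of a single small forced-choice study, under assumed parameters:
    # 20 four-choice trials, chance hit rate 0.25, true hit rate 0.38.
    from scipy.stats import binom

    n_trials = 20     # assumed trials per study (the quote does not say)
    p_chance = 0.25   # four-choice guessing baseline
    p_true = 0.38     # 38 correct calls per hundred instead of 25

    # Smallest hit count significant at the one-sided 5% level under chance.
    k_crit = next(k for k in range(n_trials + 1)
                  if binom.sf(k - 1, n_trials, p_chance) <= 0.05)

    # Power: the chance that one such study reaches significance.
    power = binom.sf(k_crit - 1, n_trials, p_true)
    print(k_crit, round(power, 2))   # 9 hits needed; power is about 0.33

    # The count of significant studies among 100 is itself binomial, so a
    # shortfall is common; compare the quoted "30% chance" of 30 or fewer.
    print(round(binom.cdf(30, 100, power), 2))

Under these assumptions only about a third of the studies reach significance, even though every one of them is sampling a genuine 13-percentage-point effect, which is the quote's point about low power.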
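
The effect-size measure described in the last quoted paragraph, z divided by the square root of the number of trials, can be illustrated with made-up numbers:

    # Toy illustration of the quoted effect-size measure, z / sqrt(n), with
    # made-up numbers: 1,000 four-choice trials, 280 hits vs. 250 expected.
    from math import sqrt

    n, hits, p = 1000, 280, 0.25
    z = (hits - n * p) / sqrt(n * p * (1 - p))   # about 2.19 standard deviations
    effect_size = z / sqrt(n)                    # about 0.069
    print(round(z, 2), round(effect_size, 3))

Dividing by the square root of the trial count is what lets studies of very different sizes be compared on a single scale.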

Comment author: CarlShulman, 16 March 2012 01:24:02AM, 1 point

Honorton's estimate required forty-six unreported chance-level experiments for each of those in the meta-study, including those that themselves gave no significant support for the paranormal hypothesis.

Note that this is a bogus calculation: it says that if there were no publication bias, so that unpublished studies were just as likely to show positive results as published ones, then adding the stated number of chance-level studies would "dilute" the results below a threshold significance level. But of course the whole point of publication bias is that the file drawer is enriched with negative results; see Scargle's paper on the file-drawer problem. Given that bias, far fewer hidden studies are needed. Further, various positive biases will be concentrated in the published literature, e.g. people doing outright fraud will normally do it for an audience.
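
To make concrete what the criticized calculation does, here is a sketch of the Rosenthal-style fail-safe arithmetic under Stouffer's method, with the combined z back-solved from the quoted "forty-six per study" figure rather than taken from Honorton's paper (an illustrative assumption):

    # Rosenthal's fail-safe N: how many unpublished studies averaging z = 0
    # must be added before the combined (Stouffer) z drops below 1.645?
    from math import sqrt

    k = 309            # published studies in the meta-analysis
    Z_combined = 11.3  # illustrative; back-solved from "forty-six per study"

    sum_z = Z_combined * sqrt(k)
    # Null studies leave sum_z unchanged but grow the denominator:
    # Z_new = sum_z / sqrt(k + N). Solving Z_new = 1.645 for N gives:
    N_failsafe = (sum_z / 1.645) ** 2 - k
    print(round(N_failsafe / k))   # about 46 hidden studies per published one

    # Scargle's point: a real file drawer is selected to hold the negative
    # results, not a zero-mean sample. If hidden studies average z = -0.5
    # (an arbitrary illustrative value), far fewer of them are needed:
    z_hidden, N = -0.5, 0
    while (sum_z + N * z_hidden) / sqrt(k + N) > 1.645:
        N += 1
    print(N)   # a few hundred, versus roughly 14,000 under the z = 0 assumption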

The number of studies needed also collapses if various questionable research practices (optional stopping, post hoc reporting of subgroups as separate experiments, etc.) are used to concentrate 'hits' in the reported experiments while misses are relegated to a small file drawer.
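
A small simulation (with made-up parameters) shows how just one of these practices, optional stopping, manufactures significant studies out of pure chance performance:

    # Optional stopping under the null: subjects guess at chance (p = 0.25),
    # but the experimenter tests after every block of 20 trials and stops,
    # and reports, as soon as p < .05. All parameters are made up.
    import random
    from scipy.stats import binom

    def run_study(max_trials=400, block=20, p_chance=0.25):
        hits = trials = 0
        while trials < max_trials:
            hits += sum(random.random() < p_chance for _ in range(block))
            trials += block
            # One-sided binomial test for excess hits at this interim look.
            if binom.sf(hits - 1, trials, p_chance) < 0.05:
                return True   # stop early and declare a hit
        return False

    random.seed(0)
    studies = 10_000
    print(sum(run_study() for _ in range(studies)) / studies)
    # Prints a false positive rate well above the nominal 0.05.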

Parapsychologists counter that the few attempts to audit the file drawer (which would not catch everything) have not found large skew among the unpublished studies turned up, but these inflated "fail-safe" statistics are misleadingly large regardless.