Follow-up on ESP study: "We don't publish replications"

CarlShulman

Some of you may remember past Less Wrong discussion of the Daryl Bem study, which claimed to show precognition, and was published with much controversy in a top psychology journal, JPSP. The editors and reviewers explained their decision by saying that the paper was clearly written and used standard experimental and statistical methods, so that their disbelief in it (driven by physics, the failure to show psi in the past, etc.) was not appropriate grounds for rejection.

Because of all the attention received by the paper (unlike similar claims published in parapsychology journals) it elicited a fair amount of both critical review and attempted replication. Critics pointed out that the hypotheses were selected and switched around 'on the fly' during Bem's experiments, with the effect sizes declining with sample size (a strong signal of data mining). More importantly, Richard Wiseman established a registry for advance announcement of new Bem replication attempts.

A replication registry guards against publication bias, and at least 5 attempts were registered. As far as I can tell, at the time of this post the subsequent replications have, unsurprisingly, failed to replicate Bem's results.¹ However, JPSP and the other high-end psychology journals refused to publish the results, citing standing policies of not publishing straight replications.

From the journals' point of view, this (common) policy makes sense: bold new claims will tend to be cited more and raise journal status (which depends on citations per article), even though this means most of the 'discoveries' they publish will be false despite their p-values. However, this means that overall the journals are giving career incentives for scientists to massage and mine their data for bogus results, but not to challenge bogus results by others. Alas.
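To make the "false despite their p-values" point concrete, here is a rough back-of-the-envelope sketch. The numbers are purely illustrative assumptions, not estimates for psychology or for any journal: suppose 5% of the bold hypotheses being tested are actually true, studies have 50% power, and the significance threshold is 0.05. The function name and parameters are made up for this example.

```python
def false_discovery_share(prior_true, power, alpha):
    """Share of statistically significant results that are false positives.

    prior_true: assumed fraction of tested hypotheses that are really true
    power:      assumed chance that a true effect reaches significance
    alpha:      significance threshold (chance a null effect reaches significance)
    """
    true_positives = prior_true * power
    false_positives = (1.0 - prior_true) * alpha
    return false_positives / (true_positives + false_positives)


# Illustrative assumptions only: 5% true hypotheses, 50% power, alpha = 0.05.
share = false_discovery_share(prior_true=0.05, power=0.5, alpha=0.05)
print(f"{share:.0%} of significant results are false")  # prints "66% of ..."
```

Under those assumed inputs, about two thirds of the significant findings are false, and that is before any data mining or selective reporting, which would push the share higher. Different assumptions give different numbers; the sketch only shows the shape of the argument.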

¹A purported "successful replication" by a pro-psi researcher in Vienna turns out to be nothing of the kind. Rather, it is a study conducted in 2006 and retitled to take advantage of the attention on Bem's article, selectively pulled from the file drawer.

ETA: The Wikipedia article on Daryl Bem makes an unsourced claim that one of the registered studies has replicated Bem.

ETA2: Samuel Moulton, who formerly worked with Bem, mentions an unpublished (no further details) failed replication of Bem's results conducted before Bem submitted his article (the failed replication was not mentioned in the article).

ETA3: There is mention of a variety of attempted replications at this blog post, with 6 failed replications, and 1 successful replication from a pro-psi researcher (not available online). It is based on this ($) New Scientist article.

ETA4: This large study performs an almost straight replication of Bem (same methods, same statistical tests, etc.) and finds that the effect vanishes.

ETA5: Apparently, the mentioned replication was again submitted to the British Journal of Psychology:

When we submitted it to the British Journal of Psychology, it was finally sent for peer review. One referee was very positive about it but the second had reservations and the editor rejected the paper. We were pretty sure that the second referee was, in fact, none other than Daryl Bem himself, a suspicion that the good professor kindly confirmed for us. It struck us that he might possibly have a conflict of interest with respect to our submission. Furthermore, we did not agree with the criticisms and suggested that a third referee be brought in to adjudicate. The editor rejected our appeal.

To put the Bem study in perspective, keep in mind that a hundred years ago, psychology wasn't even trying to use statistical methods; look at how Freud's and Jung's ideas were viewed. Areas like sociology and psychology have, if anything, become more scientific over time. From that standpoint, a paper that uses statistics in a flawed fashion shows how much progress the soft sciences have made toward being real sciences: one now needs bad stats to get bad ideas through, rather than just anecdotal evidence.

That's not really true. Experimental, quantitative, and even fairly advanced statistical methods were definitely used in psychology a century ago. (As a notable milestone, Spearman's factor analysis that started the still ongoing controversy over the general factor of intelligence was published in 1904.) My impression is that ever since Wilhelm Wundt's pioneering experimental work that first separated psychology from philosophy in the late 19th century, psychology has been divided between quantitative work based on experiment and observation, which makes at least some pretense of real science, and quack soft stuff that's usually presented in a medical or ideological context (or some combination thereof). Major outbursts of the latter have happened fairly recently -- remember the awful "recovered memories" trend in the 1980s and 1990s (and somewhat even in the 2000s) and its consequences.

But more importantly, I'm not at all sure that the mathematization of soft fields has made them more scientific. One could argue that the contemporary standards for using statistics in soft fields only streamline the production of plausible-looking nonsense. Even worse, sometimes mathematization leads to pseudoscience that has no more connection to reality than mere verbal speculations and sophistries, but looks so impressive and learned that a common-sense criticism can be effectively met with scorn and stonewalling. As the clearest example, it appears evident that macroeconomics is almost complete quackery despite all the abstruse statistics and math used in it, and I see no evidence that the situation in other wannabe-exact soft fields is much better. Or to take another example, at one point I got intensely interested in IQ-related controversies and read a large amount of academic literature in the area -- eventually finding that the standards of statistics (and quantitative reasoning in general) on all sides in the controversy are just depressingly bad, often hiding awful lapses of reasoning that would be unimaginable in a real hard science behind a veneer of seeming rigor.

(And ultimately, I notice that your examples of recent discoveries are from biology, astronomy/physics, and math -- fields whose basic soundness has never been in doubt. But what non-trivial, correct, and useful insight has come from all these mathematized soft fields?)

This is a very good point. You make a compelling case that the use of careful statistics is not a recent trend in psychology. In that regard, my penultimate paragraph is clearly just deeply and irrecoverably wrong.

(And ultimately, I notice that your examples of recent discoveries are from biology, astronomy/physics, and math -- fields whose basic soundness has never been in doubt. But what non-trivial, correct, and useful insight has come from all these mathematized soft fields?)

Well, I was responding to Eliezer's claim about a general lack of a scien...
