Related to: Parapsychology: the control group for science, Dealing with the high quantity of scientific error in medicine

Some of you may remember past Less Wrong discussion of the Daryl Bem study, which claimed to show precognition and was published, with much controversy, in a top psychology journal, JPSP. The editors and reviewers explained their decision by saying that the paper was clearly written and used standard experimental and statistical methods, so that their disbelief in it (driven by physics, past failures to demonstrate psi, etc.) was not appropriate grounds for rejection.

Because of all the attention the paper received (unlike similar claims published in parapsychology journals), it elicited a fair amount of both critical review and attempted replication. Critics pointed out that the hypotheses were selected and switched around 'on the fly' during Bem's experiments, with the effect sizes declining as sample sizes grew (a strong signal of data mining). More importantly, Richard Wiseman established a registry for advance announcement of new Bem replication attempts.
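To illustrate the diagnostic critics have in mind, here is a minimal sketch (with invented per-experiment numbers, not Bem's actual data) of checking whether estimated effects shrink as samples grow:

```python
# A crude version of the 'effect size declines as sample size grows' check,
# one informal signal of data mining / selective reporting.
# The per-experiment numbers below are invented for illustration.
from statistics import correlation  # requires Python 3.10+

sample_sizes = [50, 100, 150, 200, 300]
effect_sizes = [0.25, 0.20, 0.12, 0.10, 0.05]

r = correlation(sample_sizes, effect_sizes)
print(f"correlation(n, effect size) = {r:.2f}")  # strongly negative here
```

A strongly negative correlation like this is only suggestive; small early studies with inflated effects can also reflect ordinary publication bias rather than deliberate mining.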

A replication registry guards against publication bias, and at least 5 attempts were registered. As far as I can tell, at the time of this post the subsequent replications have, unsurprisingly, failed to replicate Bem's results.[1] However, JPSP and the other high-end psychology journals refused to publish the results, citing standing policies of not publishing straight replications.

From the journals' point of view, this (common) policy makes sense: bold new claims tend to be cited more and raise journal status (which depends on citations per article), even though this means most of the 'discoveries' they publish will be false despite their p-values. However, it also means that the journals are giving scientists career incentives to massage and mine their data for bogus results, but not to challenge bogus results by others. Alas.
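To make "most will be false despite their p-values" concrete, here is a back-of-the-envelope sketch; the base rate of true hypotheses and the study power are my own illustrative assumptions, not figures from any journal:

```python
# Back-of-the-envelope sketch: why a p < 0.05 'discovery' can still usually
# be false. The base rate and power below are illustrative assumptions.

def positive_predictive_value(base_rate, power, alpha):
    """Probability that a statistically significant result reflects a real effect."""
    true_positives = base_rate * power
    false_positives = (1 - base_rate) * alpha
    return true_positives / (true_positives + false_positives)

# Suppose only 5% of the bold hypotheses tested are true, studies have 50%
# power, and anything crossing p < 0.05 gets written up and published.
ppv = positive_predictive_value(base_rate=0.05, power=0.5, alpha=0.05)
print(f"Chance a published 'discovery' is real: {ppv:.0%}")  # roughly 34%
```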

[1] A purported "successful replication" by a pro-psi researcher in Vienna turns out to be nothing of the kind. Rather, it is a study conducted in 2006 and retitled to take advantage of the attention on Bem's article, selectively pulled from the file drawer.

ETA: The Wikipedia article on Daryl Bem makes an unsourced claim that one of the registered studies has replicated Bem.

ETA2: Samuel Moulton, who formerly worked with Bem, mentions an unpublished (no further details) failed replication of Bem's results conducted before Bem submitted his article (the failed replication was not mentioned in the article).

ETA3: There is mention of a variety of attempted replications at this blog post, with 6 failed replications and 1 successful replication from a pro-psi researcher (not available online). It is based on this ($) New Scientist article.

ETA4: This large study performs an almost straight replication of Bem (same methods, same statistical tests, etc.) and finds the effect vanishes.

ETA5: Apparently, the failed replication mentioned above was submitted once more, this time to the British Journal of Psychology:

When we submitted it to the British Journal of Psychology, it was finally sent for peer review. One referee was very positive about it but the second had reservations and the editor rejected the paper. We were pretty sure that the second referee was, in fact, none other than Daryl Bem himself, a suspicion that the good professor kindly confirmed for us. It struck us that he might possibly have a conflict of interest with respect to our submission. Furthermore, we did not agree with the criticisms and suggested that a third referee be brought in to adjudicate. The editor rejected our appeal.

Comments

I'm at a loss for words at the inanity of this policy.

This policy doesn't exist only in psychology. Some journals in other fields have similar policies requiring that a submission include something more than a straight replication of the study in question, but my impression is that this is much more common in the less rigorous areas like psychology. Journals probably do this because they want to be seen as cutting edge, and they get less of that if they publish replication attempts. Given that, it makes some sense to reject both successful and unsuccessful replications, since publishing only unsuccessful ones would create a publication bias of its own. So they more or less successfully fob the whole thing off on other journals. (There's something like an n-player prisoner's dilemma here, with journals as the players deciding whether to accept replications at all.) So this is bad, but it is understandable once one remembers that journals are driven by selfish, status-driven humans, just like everything else in the world.

Yes, this is a standard incentives problem. But one to keep in mind when parsing the literature.

What rules of thumb do you use to 'keep this in mind'? I generally try to never put anything in my brain that just has one or two studies behind it. I've been thinking of that more as 'it's easy to make a mistake in a study' and 'maybe this author has some bias that I am unaware of', but perhaps this cuts in the opposite direction.

Actually, even with many studies and a meta-analysis, you can still get blindsided by publication bias. There are plenty of psi meta-analyses showing positive effects (with studies that were not pre-registered, and are probably very selected), and many more in medicine and elsewhere.

If it's something I'd trust an idiot to draw the right conclusion on from good data, I'll look for meta-analyses, p << 0.05, or do a quick and dirty meta-analysis myself if the number of studies is sufficiently small. If it's something I'm surprised has even been tested, I'll give one study more weight. If it's something I'd expect to be tested a lot, I'll give it less. If the data I'm looking for are orthogonal to the data they're being published for, they probably don't suffer from selection bias, so I'll take them at face value. If the study's result is 'convenient' in some way for the source that showed it to me, I'll be more skeptical of selection bias and misinterpretation.
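Something like the following minimal sketch is what I mean by a quick and dirty meta-analysis: a fixed-effect, inverse-variance average over a handful of studies (the effect estimates and standard errors here are made up):

```python
import math

# Minimal fixed-effect, inverse-variance meta-analysis sketch.
# The study effect estimates and standard errors below are made up.

def pool_fixed_effect(estimates, std_errors):
    """Return the inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

effects = [0.30, 0.10, -0.05]   # three hypothetical small studies
std_errs = [0.15, 0.12, 0.10]

pooled, se = pool_fixed_effect(effects, std_errs)
print(f"pooled effect = {pooled:.3f} (SE {se:.3f}), z = {pooled / se:.2f}")
```

This ignores heterogeneity between studies, which is the point: it's a sanity check on direction and rough magnitude, not a substitute for a real meta-analysis.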

If it's a topic where I see very easy to make methodological flaws or interpretation errors, then I'll try to actually dig in and look for them and see if there's a new obvious set of conclusions to draw.

Separately from determining how strong the evidence is, I'll 'put it in my brain' even if there's only a study or two when it's testing a hypothesis I already suspected was true, or when it makes too much sense in hindsight (i.e. high priors); otherwise I'll put it in my brain with a 'probably untrue but something to watch out for' tag.

How much money do you think it would take to give replications a journal with status on par with the new-studies-only ones?

Or alternately, how much advocacy of what sort? Is there someone in particular to convince?

It's not something you can simply buy with money. It's about getting scientists to cite papers in the replications journal.

What about influencing high-status actors (e.g. prominent universities)? I don't know what the main influence points are for an academic journal, and I don't know what things it's considered acceptable for a university to accept money for, but it seems common to endow a professorship or a (quasi-academic) program.

Probably this method would cost many millions of dollars, but it would be interesting to know the order of magnitude required.

It's clear that the incentives for journals are terrible. We should be looking to fix this. We seem to have a Goodhart's Law problem, where credibility is measured in citations, but refutations count in the wrong direction. Right now, there are a bunch of web sites that collect abstracts and metadata about citations, but none of them include commenting, voting, or any sort of explicit reputation system. As a result, discussions about papers end up on blogs like this one, where academics are unlikely to ever see them.

Suppose we make an abstracts-and-metadata archive, along the lines of CiteSeer, but with comments and voting. This would give credibility scores, similar to impact ratings, but also accounting for votes. The reputation system could be refined somewhat beyond that (track author credibility by field and use it to weight votes, collect metadata about what's a replication or refutation, etc.)
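Here is a toy sketch of the kind of scoring I have in mind; the particular weights, and the idea of counting failed replications negatively, are placeholders rather than a worked-out design:

```python
from dataclasses import dataclass, field

# Toy credibility score combining citations, failed replications, and votes
# weighted by voter credibility. All weights are placeholder assumptions.

@dataclass
class Vote:
    voter_credibility: float  # the voter's own score in the relevant field
    value: int                # +1 or -1

@dataclass
class Paper:
    title: str
    citations: int = 0
    failed_replications: int = 0
    votes: list = field(default_factory=list)

def credibility(paper, citation_w=1.0, vote_w=2.0, refutation_w=5.0):
    """Citations count up, failed replications count down, votes are credibility-weighted."""
    vote_score = sum(v.value * v.voter_credibility for v in paper.votes)
    return (citation_w * paper.citations
            + vote_w * vote_score
            - refutation_w * paper.failed_replications)

paper = Paper("Feeling the Future", citations=120, failed_replications=3,
              votes=[Vote(0.9, -1), Vote(0.4, +1), Vote(0.8, -1)])
print(credibility(paper))  # 120 - 2.6 - 15 = 102.4
```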

Academics know, or at least ought to know, that most new publications are either wrong or completely uninteresting. This is the logical side effect of the publish-or-perish policy in academia and the increase of PhD students worldwide. An estimated 1.346 million papers per year are published in journals alone[1]. If humanity produced interesting papers at that rate, scientific progress would go a lot quicker!

So if it's true that most publications are uninteresting, and if it's true that most academics have to publish at a high rate to protect their careers and send the right signals, then we don't want to punish and humiliate academics for publishing stupid ideas or badly executed experiments. And publishing a paper that demonstrates the other party did a terrible job does exactly that. The signal-to-noise ratio in academic journals wouldn't increase by much, but suddenly academics could reach their paper quota simply by picking other academics' ideas apart. You'd get an even more poisonous environment as a result!

In our current academic environment (or at least my part of it) most papers without a large number of citations are ignored. A paper without any citations is generally considered such a bad source that it's only one step up from Wikipedia. You can cite it, if you must, but you'd better not base your research on it. So in practice I don't think it's a big deal that mistakes aren't corrected and that academics typically aren't expected to publicly admit that they were wrong. It's just not necessary.

A paper without any citations is generally considered such a bad source that it's only one step up from Wikipedia. You can cite it, if you must, but you'd better not base your research on it. So in practice I don't think it's a big deal that mistakes aren't corrected and that academics typically aren't expected to publicly admit that they were wrong. It's just not necessary.

Suppose the paper supposedly proves something that lots of people wish was true. Surely it is likely to get an immense number of citations.

For example, the paper supposedly proves that America always had strict gun control, or that the world is doomed unless government transfers trillions of dollars from group A to group B by restricting the usage of evil substance X, where group A tends to have rather few academics, and group B tends to have rather a lot of academics.

So if it's true that most publications are uninteresting, and if it's true that most academics have to publish at a high rate to protect their careers and send the right signals, then we don't want to punish and humiliate academics for publishing stupid ideas or badly executed experiments. And publishing a paper that demonstrates the other party did a terrible job does exactly that. The signal-to-noise ratio in academic journals wouldn't increase by much, but suddenly academics could reach their paper quota simply by picking other academics' ideas apart.

Surely it's better to have academics picking apart crap than producing crap.

Not necessarily. Ignoring crap may be a better strategy than picking it apart.

Cooperation is also easier when different groups in the same research area don't try too hard to invalidate each other's claims. If the problem in question is interesting you're much better off writing your own paper on it with your own claims and results. You can dismiss the other paper with a single paragraph: "Contrary to the findings of I.C. Wiener in [2] we observe that..." and leave it at that.

The system is entirely broken but I don't see an easy way to make it better.

A paper without any citations is generally considered such a bad source that it's only one step up from Wikipedia. You can cite it, if you must, but you'd better not base your research on it.

If this were true how would anyone ever get the first citation?

(Incidentally in my own field, there are a lot of papers that don't get cited. It isn't because the papers are wrong (although some very small fraction of them have that problem) but that they just aren't interesting. But math is very different from most other fields.)

If this were true how would anyone ever get the first citation?

Some papers (those written by high status authors) are ones that everyone knows will get citations soon after they are published, and so they feel safe in citing them since others are soon to do so. Self-fulfilling prophecy.

If this were true how would anyone ever get the first citation?

Because the policy wasn't applied until after a cutoff date, the recursion bottoms out at an author from before the cutoff. Obviously. Edit: Non-obviously. Edit2: HOW AM I SUPPOSED TO END THIS COMMENT FOR YOU MEATBAGS NOT TO VOTE ME DOWN???

Edit2: HOW AM I SUPPOSED TO END THIS COMMENT FOR YOU MEATBAGS NOT TO VOTE ME DOWN???

I don't know, but I'm pretty sure that's not it.

I think your comment is getting voted down because it doesn't actually answer the issue in question. It does allow there to be a set of citable papers, but it doesn't deal with the actual question which is how any given paper would ever get its first citation.

Yes, it does, because paper B, from after the cutoff, cites a cite-less paper A, from before the cutoff. Then a paper C can cite B (or A), as B cites a previous paper, and A is from a time for which the standard today is not applied. (Perhaps I wasn't clear that the cutoff also applies to citable papers -- papers from before the cutoff don't themselves need citations in them to be citable.)

Edit: Also, papers from before the cutoff cited other prior papers.

It's not citing but being cited, I think. So if A and B are both before the cutoff, and A cites B, then C from after the cutoff can cite B (but not necessarily A).

Personally I thought it was a good comment even before the edit.

If this were true how would anyone ever get the first citation?

Zed didn't say you should never cite a previously uncited paper, only that you shouldn't invest time and effort into work that depends on the assumption that its conclusions are sound. There are many possible reasons why you might nevertheless want to cite it, and perhaps even give it some lip service.

Especially if it's your own.

Self-citations are usually counted separately (both for formal purposes and in informal assessments of this sort).

I'm confused. I parsed this as "papers which contain no citations are considered bad sources," but it seems that everyone else is parsing it as "papers which have not been cited are considered bad sources." Am I making a mistake here? The latter doesn't make much sense to me, but Zed hasn't stepped in to correct that interpretation.

Look at the context of the first two paragraphs and the comment that Zed was replying to. The discussion was about how many papers never get cited at all. In that context, he seems to be talking about people not citing papers unless they have already been cited.

It's not clear to me that he was talking about studies being ignored because they're not interesting enough to cite, rather than studies being ignored because they're not trustworthy enough to cite.

In any case, I think both are dubious safety mechanisms. John Ioannidis found that even the most commonly cited studies in medical research are highly likely to be false. If researchers are basing their trust in studies on the rate at which they're cited, they're likely to be subject to information cascades, double-counting the information that led other researchers to cite the same study.

In the case of only citing papers that contain numerous citations, this is helpful if the papers contain many redundant citations, demonstrating that the factual claims have been replicated, but if a paper relies on many uncertain findings, then its own uncertainty will be multiplied. The conclusion is at most as strong as its weakest link.
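To put rough (invented) numbers on the "uncertainty multiplies" point:

```python
# Illustrative arithmetic only: a conclusion that depends on five independent
# findings, each 80% likely to hold up, holds up with probability 0.8 ** 5.
p_each, n_links = 0.8, 5
print(f"P(all {n_links} links hold) = {p_each ** n_links:.2f}")  # about 0.33
```

That is weaker than any single link, which is why a paper stacking many uncertain citations can end up less trustworthy than its most dubious source.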

If this were true how would anyone ever get the first citation?

I think that you can cite it if you really think that it's good (and perhaps this is what Zed meant by "if you must"), but you'd better be citing something more widely accepted too. Then if lots of people think that it's really good, it will join the canon of widely cited papers.

Also, people outside of the publish-or-perish framework (respected elders, hobbyists outside of the research track, grad students in some fields) can get the ball rolling.

One idea to compensate for such effects: journals dedicated to negative results. Here is one I know of in the computer science field, with links to more.

I got an advertisement for the All Results Journals the other day.

The problem isn't so much getting published as getting published in a journal with a decent impact factor. A journal for negative results is likely to get few citations and therefore have a low impact factor.

A better way for JPSP to have handled Bem's paper would have been to invite comments from reputable scholars before publishing it, and then print the comments (and replies from Bem et al.) in the same issue. Other journals have followed this strategy with particularly controversial articles.

As it stands now, JPSP (the premier social psych journal) just looks ridiculous.

I believe they did publish a rebuttal in the same issue, but that didn't allow the time needed for replications.

As I cynically comment on the DNB ML: 'Summary: Bem proves ESP using standard psychology methods; prestigious journal refuses, in a particularly rude fashion, to publish both a failure and a successful replication. You get what you pay for; what are you paying for here?'

This is short and linky but I suspect it belongs in Main for increased audience. Upvoted, as it should be.

Thanks for the reminder to upvote; it didn't occur to me to do so, because this news (about refusal to publish replications) annoyed me, and upvoting is associated with positive affect. Oops!

A former professor and co-author of mine has a paper about publication bias in PLoS Medicine: http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.0050201

He has a number of suggestions for fixing things, but the main thrust is that in a digital world there is no longer any reason for journals to publish only papers that are "interesting" as well as methodologically decent; they have no excuse not to adopt a policy of publishing all papers that appear correct.