According to the New Scientist, Daryl Bem has a paper to appear in the Journal of Personality and Social Psychology which claims that participants in psychological experiments are able to predict the future. A preprint of this paper is available online. Here's a quote from the New Scientist article:

In one experiment, students were shown a list of words and then asked to recall words from it, after which they were told to type words that were randomly selected from the same list. Spookily, the students were better at recalling words that they would later type.

In another study, Bem adapted research on "priming" – the effect of a subliminally presented word on a person's response to an image. For instance, if someone is momentarily flashed the word "ugly", it will take them longer to decide that a picture of a kitten is pleasant than if "beautiful" had been flashed. Running the experiment back-to-front, Bem found that the priming effect seemed to work backwards in time as well as forwards.

Question: even assuming the methodology is sound, given experimenter bias, publication bias and your priors on the existence of psi, what sort of p-values would you need to see in that paper in order to believe with, say, 50% probability that the effect measured is real?

26 comments

"If you believe A => B, then you have to ask yourself: which do I believe more? A, or not B?"

-- Hal Daume III

I'm definitely going to use that in the future.

ata

To rip off Steven Kaas for a moment, I wonder if rather than saying "Study shows existence of psychic powers" it might be better to say "Nonexistence of psychic powers shows a study was wrong".

What sort of hypothetical evidence would convince you that psychic powers existed?

I'm not sure I can actually come up with any, because I know how frail human minds can be. My prior that I'm insane is higher than my prior that magic exists, and I can't think of any evidence for the second that isn't at least as strong evidence for the first.

(Here, I am assuming that by psychic powers we're talking about magic, rather than unarticulated intuition, which I believe can exist; I also expect I would adapt to being insane pretty quickly, and would react as if it were reality, but would expect it more likely that I'm gibbering in a mental institution than that I've been transported to Narnia.)

[anonymous]

Paper criticizing the statistical analysis here:

http://www.ruudwetzels.com/articles/Wagenmakersetal_subm.pdf

From the conclusion:

In eight out of nine studies, Bem reported evidence in favor of precognition. As we have argued above, this evidence may well be illusory; in several experiments it is evident that Bem’s Exploration Method should have resulted in a correction of the statistical results. Also, we have provided an alternative, Bayesian reanalysis of Bem’s experiments; this alternative analysis demonstrated that the statistical evidence was, if anything, slightly in favor of the null hypothesis. One can argue about the relative merits of classical t-tests versus Bayesian t-tests, but this is not our goal; instead, we want to point out that the two tests yield very different conclusions, something that casts doubt on the conclusiveness of the statistical findings.
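The divergence the authors describe, a classical t-test calling a result significant while a Bayesian analysis favours the null, is an instance of Lindley's paradox, and can be sketched numerically. The numbers below are hypothetical, not Bem's actual data, and the Bayes factor uses Wagenmakers' BIC approximation rather than the exact test from the paper:

```python
# Hypothetical illustration of how a classical test and a BIC-approximated
# Bayesian test can disagree on the same data (Lindley's paradox).
# Assumes a large-sample z approximation to the one-sample t-test.
import math

def two_sided_p(z):
    """Two-sided p-value for a z statistic (normal approximation)."""
    return math.erfc(abs(z) / math.sqrt(2))

def bf01_bic(z, n):
    """BIC-approximated Bayes factor in favour of the null:
    BF01 ~= sqrt(n) * exp(-z**2 / 2)."""
    return math.sqrt(n) * math.exp(-z * z / 2)

n, z = 100_000, 2.0                       # huge sample, "significant" z score
print(f"p = {two_sided_p(z):.3f}")        # ~0.046: significant at alpha = 0.05
print(f"BF01 = {bf01_bic(z, n):.1f}")     # ~43: the data favour the *null*
```

The same z = 2 that clears the 0.05 bar at huge n translates into strong evidence *for* the null under the Bayesian approximation, which is exactly the kind of disagreement the quoted conclusion points at.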

Having read about a third of Bem's paper, and about a third of the critical review mentioned here, I have to agree that the critics are right. This is an exploration study rather than a confirmation study, and as such would require a much higher standard of statistical significance before anyone at all skeptical would be forced to rethink their stance.

To answer the OP's question, I would want p<0.002 before I would say "It is probably either fraud or real ESP, rather than a statistical fluke."
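For a sense of the arithmetic behind picking a threshold like p < 0.002, here is a rough sketch using Bayes' rule together with the Sellke-Bayarri-Berger upper bound on the Bayes factor against the null, BF10 <= 1/(-e * p * ln p). The prior below is purely hypothetical:

```python
# Rough sketch: what posterior probability does a given p-value buy you,
# at best, under a skeptical prior? Uses the Sellke-Bayarri-Berger bound
# on the Bayes factor; the prior of 1e-4 for psi is a made-up illustration.
import math

def max_bayes_factor(p):
    """Upper bound on the Bayes factor against the null for p < 1/e."""
    assert p < 1 / math.e
    return 1.0 / (-math.e * p * math.log(p))

def posterior(prior, p):
    """Posterior P(effect real), using the *most favourable* Bayes factor."""
    odds = prior / (1 - prior) * max_bayes_factor(p)
    return odds / (1 + odds)

prior = 1e-4  # hypothetical prior that psi exists
for p in (0.05, 0.002, 1e-6):
    print(f"p = {p:g}: posterior <= {posterior(prior, p):.4f}")
```

Under this (assumed) prior, even p = 0.002 leaves the posterior well under 1%, while something like p = 1e-6 would be needed to cross 50%, which is one way of seeing why a single study's p-value carries so little weight here.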

To be fair, though, Bem was quite upfront about the exploratory nature of his methodology. His purpose, he claimed, was to invent experimental protocols that would be easy to carry out and easy to analyze. He is making the software that he used to control the experiment publicly available, and is apparently hoping that researchers in dozens of psych labs around the country will attempt to replicate his findings. If anyone takes him up on that, those studies will be confirmatory, not exploratory. And if enough of them can duplicate his results, even using Bem's statistical methods, then his results will be worth thinking about.

[anonymous]

I think that paper conclusively shows that Bem's methods are incorrect; even if it doesn't, it was a really interesting read.

Jack

I would need considerably more than one study. That said, I think it is really good news this is getting published in a real journal. Parapsychologists have been publishing interesting results for years at strong enough levels that the publication bias would have to be really high to explain it. On the recommendation of someone here I read Outside the Gates over the summer which makes a moderately convincing case something weird is going on. I don't assign nearly the same credence to the results that the author does but it did convince me that mainstream science should be looking at it. At worst this will help force psychology to confront publication bias and some of the statistical issues that plague the field generally. And at least we should see some non-parapsychologists attempting to replicate this.

I've been meaning to write a book review.

Part of the issue is that even when we do see events that look like psi, they never reach p-values that would be conclusive. If there is something like psi, it isn't that strong, so you need replication.

It would also be good to get the studies out of the hands of the New Age crazies and into the hands of some reductionists who could go to work theorizing. Though of course the most likely explanation remains publication bias/fraud/methodological issues.

I'll look over the study later tonight. Thanks for posting it.

Kevin

It seems like psi has been consistently understudied given the possibly profound consequences of understanding not-completely mundane psi. For one thing, I would expect it's a lot easier to build an FAI in a psi-enabled universe versus a no-psi universe.

For one thing, I would expect it's a lot easier to build an FAI in a psi-enabled universe versus a no-psi universe.

What's your line of thought?

what sort of p-values would you need to see in that paper in order to believe with, say, 50% probability that the effect measured is real?

P-values won't do it. Psi experiments consistently report impressive-sounding significance levels because the trials are so large. Mass replication is what will make me believe.

That critique doesn't really work for t-tests though does it? Sure, as n increases so does your chance that the finding is statistically significant, but it also reduces the chance of the data being a fluke. If you flip a fair coin a million times holding a banana in your left hand and it comes up heads 55% of the time... there's some explaining to do. Even if the explanation is that it wasn't a fair coin.

Failures to set up or follow proper experimental procedures (giving hints, not fully random presentation, etc) or otherwise introducing a slight biasing effect will show an effect which is puny. With low n, this won't be statistically significant, but with high n it will appear very statistically significant.
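The point above can be simulated directly. Here is a toy illustration (a biased coin standing in for a small procedural bias; psi experiments are not literal coin flips): a 1% bias is invisible at small n but yields an enormous z statistic at large n, even though the effect stays puny.

```python
# A tiny systematic bias (coin lands heads 51% of the time instead of 50%)
# is not statistically significant at small n, but becomes "highly
# significant" at large n -- while the observed effect size stays puny.
import math
import random

random.seed(0)  # reproducible illustration

def binomial_z(heads, n, p0=0.5):
    """z statistic for observed heads against a fair-coin null."""
    return (heads - n * p0) / math.sqrt(n * p0 * (1 - p0))

for n in (100, 10_000, 1_000_000):
    heads = sum(random.random() < 0.51 for _ in range(n))  # 1% bias
    z = binomial_z(heads, n)
    print(f"n={n:>9,}: observed rate {heads / n:.3f}, z = {z:.1f}")
```

At n = 1,000,000 the expected z is around 20 (astronomically significant) while the hit rate is still only about 51%, which is why a huge-n significant result is compatible with nothing more than a slight methodological bias.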

That's true; statistical significance by itself isn't very informative. My rule of thumb is to look at both the p-value and the effect size (Cohen's d).

One experiment can't convince me of psi; the chance that the experiment was done badly in some way puts an upper bound on how much evidence it can provide. Only multiple independent verifications could do it, and even then I'd still feel massively confused.

His name is Bem by the way, as opposed to Bern (although I've seen that typo in multiple places).

Jack

Great name for a parapsychologist.

This has been discussed a few places on LW already. See for example here.

For this particular case I would also point people to the discussion here: http://news.ycombinator.com/item?id=1878160

This comment on HN, if true, seems pretty damning (emphasis added):

I went to Cornell and I'm one of the many students that participated in this guy's experiments (although not this particular one with the erotic pictures. I got regular pictures.)

I can tell you that every semester that I was there he was running a version of the "Are you psychic?" experiment. I'm sure he's been doing it every semester for a very long time. Undoubtedly there have been loads of experiments where it didn't pan out. (If you're curious about my results, I got 54% and a cheerful grad student greeted me after the fact by saying "congrats! you're psychic!")

The fact is, if you run an experiment like this enough times you are going to get a significant result eventually. That's why you have alpha values. If it's at .05, that means that 5% of the time you're going to get a false positive. I think that's what this is
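The commenter's point about alpha levels can be simulated: run enough experiments in which there is, by construction, no effect at all, and roughly 5% of them will still come out "significant" at alpha = 0.05. The setup below is hypothetical (each "experiment" is 100 guesses with a true hit rate of exactly 50%):

```python
# Run many null experiments (no psi anywhere in the simulation) and count
# how many clear a two-sided 5% significance threshold by chance alone.
import math
import random

random.seed(1)  # reproducible illustration

ALPHA_Z = 1.96  # two-sided 5% threshold on the z statistic

def run_null_experiment(n_trials=100):
    """One 'experiment': n_trials guesses with true hit rate 0.5."""
    hits = sum(random.random() < 0.5 for _ in range(n_trials))
    z = (hits - n_trials / 2) / math.sqrt(n_trials / 4)
    return abs(z) >= ALPHA_Z  # True = false positive

experiments = 1000
false_positives = sum(run_null_experiment() for _ in range(experiments))
print(f"{false_positives} of {experiments} null experiments were 'significant'")
# Expect on the order of 50-60 of 1000, with no effect present at all.
```

If an experiment of this kind is rerun every semester for years, a few impressive-looking runs are essentially guaranteed, which is the file-drawer worry the comment raises.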

Oh, wow. I read the article, and the bit where he said "I waited for eight years so I'd have enough data to be sure it wasn't a fluke" sounded to me like it took him eight years to find a fluke big enough to fall within the publishable p-value range. If this comment is true, then he either doesn't understand statistics (bad) or is manipulating the statistics (very bad). One possibility is that he's doing this as a proof of concept that the p-value criterion is flawed: cognitive dissonance in academics trying to disbelieve an apparently sound study showing psychic phenomena would be a powerful force for change indeed.

what sort of p-values would you need to see in that paper in order to believe with, say, 50% probability that the effect measured is real?

A lot more than one study. But the other issue is, if the measured effect is real, the correct response is, "I see that I am confused," not, "They must have psychic powers!" Unexpected stuff happening in an experiment is a priori rather more probable than people having psychic powers. Moreover, you'd need very precise experiments to figure out exactly what's going on. This experiment may in fact show more about the invalidity of priming: if priming works in either direction, and people don't have psychic powers, then something other than the priming is probably responsible for their reactions.

It is a very, very common error in psychology to take limited experimental evidence with "statistical significance" and then infer extremely complicated and precise notions of human psychology from those, despite the fact that such precise explanations would require vastly more evidence. (A hypothetical example would be: Men respond to red letters at a faster rate than women, as compared with blue letters. Therefore, the color red must have had some specific importance in hunting during our evolutionary past. The latter observation is an arbitrarily privileged hypothesis with no real evidentiary support over thousands of other hypotheses. I've seen more absurd jumps, as well as people simply ignoring evidence that did not conform to their prior theory.)

Voted up because I agree that there's way too much inference from very limited psychological experiments.

I skimmed the paper's section on randomness and random number generators. I don't think they're making mistakes there.

More critique at the James Randi forums.