Paper criticizing the statistical analysis here:
http://www.ruudwetzels.com/articles/Wagenmakersetal_subm.pdf
From the conclusion:
In eight out of nine studies, Bem reported evidence in favor of precognition. As we have argued above, this evidence may well be illusory; in several experiments it is evident that Bem’s Exploration Method should have resulted in a correction of the statistical results. Also, we have provided an alternative, Bayesian reanalysis of Bem’s experiments; this alternative analysis demonstrated that the statistical evidence was, if anything, slightly in favor of the null hypothesis. One can argue about the relative merits of classical t-tests versus Bayesian t-tests, but this is not our goal; instead, we want to point out that the two tests yield very different conclusions, something that casts doubt on the conclusiveness of the statistical findings.
Having read about a third of Bem's paper, and about a third of the critical review mentioned here, I have to agree that the critics are right. This is an exploratory study rather than a confirmatory one, and as such it would need to meet a much higher standard of statistical significance before anyone at all skeptical would be forced to rethink their stance.
To answer the OP's question, I would want p<0.002 before I would say "It is probably either fraud or real ESP, rather than a statistical fluke."
To be fair, though, Bem was quite upfront about th...
According to the New Scientist, Daryl Bem has a paper to appear in the Journal of Personality and Social Psychology, which claims that participants in psychological experiments are able to predict the future. A preprint of this paper is available online. Here's a quote from the New Scientist article:
Question: even assuming the methodology is sound, given experimenter bias, publication bias and your priors on the existence of psi, what sort of p-values would you need to see in that paper in order to believe with, say, 50% probability that the effect measured is real?
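One way to make the question concrete is to work in odds: posterior odds equal prior odds times the Bayes factor (the evidence for a real effect relative to the null). The sketch below only does that arithmetic. The function names and the example priors are made up for illustration, not taken from the paper or the critique, and note that a p-value is not itself a Bayes factor; converting one to the other requires a model of the alternative hypothesis.

```python
def posterior_probability(prior: float, bayes_factor: float) -> float:
    """Posterior P(effect is real), given a prior probability and a
    Bayes factor for 'real effect' versus 'no effect'."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


def bayes_factor_needed(prior: float, target: float = 0.5) -> float:
    """Bayes factor required to move `prior` up to `target` posterior."""
    prior_odds = prior / (1.0 - prior)
    target_odds = target / (1.0 - target)
    return target_odds / prior_odds


if __name__ == "__main__":
    # Hypothetical priors on psi being real; pick your own.
    for prior in (1e-2, 1e-4, 1e-6):
        bf = bayes_factor_needed(prior)
        print(f"prior = {prior:g}: need a Bayes factor of about {bf:,.0f}")
```

Experimenter bias and publication bias would enter by shrinking the Bayes factor you are willing to assign to a reported result, so the evidence actually required is stronger than this calculation alone suggests.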