
benelliott comments on How Much Evidence Does It Take? - Less Wrong

34 Post author: Eliezer_Yudkowsky 24 September 2007 04:06AM


Comments (30)


Comment author: Anders_Sandberg 25 September 2007 10:23:39AM 2 points

Yes, publication bias matters. But it also applies to the p<0.001 experiment: if we have just a single publication, should we believe that the effect is true and only one group has done the experiment, or that the effect is false and publication bias has kept the negative results out of print? If we had a few experiments (even with different results), it would be easier to estimate this than with a single published experiment.

Comment author: benelliott 15 February 2011 01:59:39PM *  10 points

Let's do a check. Assume a worst-case scenario in which nobody publishes negative results at all.

To get three p < 0.05 studies if the hypothesis is false requires on average 60 experiments (each one comes out significant with probability 0.05, so three hits take 3/0.05 = 60 attempts). This is a lot, but it is within the realm of possibility if many people are interested in the issue, so there are still grounds for scepticism about this result.
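A quick check of the 60 figure, assuming every experiment on a false hypothesis is independent and crosses p < 0.05 with probability exactly 0.05: the number of attempts needed to collect three such flukes follows a negative binomial distribution whose mean is 3/0.05. The function names below are my own, purely for illustration; the sketch computes that mean and confirms it with a small simulation.

```python
import random


def expected_experiments(successes_needed: int, alpha: float) -> float:
    """Mean number of independent null experiments needed to collect
    `successes_needed` false positives, each occurring with probability
    `alpha` (mean of a negative binomial counting total trials)."""
    return successes_needed / alpha


def simulate(successes_needed: int, alpha: float, runs: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same average, under the same assumptions."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        hits = trials = 0
        # Keep running null experiments until enough of them come out "significant".
        while hits < successes_needed:
            trials += 1
            if rng.random() < alpha:
                hits += 1
        total += trials
    return total / runs


print(expected_experiments(3, 0.05))  # 60.0
print(simulate(3, 0.05))              # close to 60, up to sampling noise
```

Setting `successes_needed` to 1 and `alpha` to 0.001 gives the 1000 figure for the single-study case.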

To get one p < 0.001 study if the hypothesis is false requires on average 1/0.001 = 1000 experiments. This is pretty implausible, so I would be much happier to treat this result as an indisputable fact, even in a field with many vested interests (assuming everything else about the experiment is sound).
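For contrast, a sketch of the same arithmetic for the single-study case, under the same assumptions (independent experiments, a false hypothesis crossing p < 0.001 with probability 0.001). It also asks a hypothetical question the comment does not: if the field ran the same 60 experiments that three p < 0.05 flukes would need, how likely is even one spurious p < 0.001 result? Function names are illustrative only.

```python
def expected_experiments_for_one_hit(alpha: float) -> float:
    # Mean of a geometric distribution: average attempts until the first false positive.
    return 1 / alpha


def chance_of_any_false_positive(alpha: float, n_experiments: int) -> float:
    # Probability that at least one of n independent null experiments crosses the threshold.
    return 1 - (1 - alpha) ** n_experiments


print(expected_experiments_for_one_hit(0.001))            # 1000.0
print(round(chance_of_any_false_positive(0.001, 60), 3))  # 0.058
```

So the effort that makes three p < 0.05 flukes routine gives only about a 6% chance of a single p < 0.001 fluke.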

Comment author: wedrifid 15 February 2011 02:06:18PM 1 point

To get one p < 0.0001 study if the hypothesis is false requires on average 1000 experiments

One too many zeros in the p value there. The 1,000 figure matches p<0.001, which is also what Anders mentioned. (So your point is fine.)

Comment author: benelliott 15 February 2011 02:07:34PM 0 points


Comment author: Desrtopa 01 April 2012 03:48:03PM *  2 points

This assumes proper methodology and statistics, so that the p-value actually matches the probability of the result arising by chance. In practice, since even your best judgment of the methodology cannot make you certain that the experiment is sound, I would say that a p-value of 0.001 constitutes considerably less than 10 bits of evidence (log2(1000) is about 9.97), because the odds that something was wrong with the experiment are better than the odds that the results were coincidental. Multiple experiments with a lower cumulative p-value can still be stronger evidence if they all adjust for possible sources of error.
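To put rough numbers on this, here is a sketch of a deliberately simple, hypothetical model: with probability `flaw_prob` the experiment is broken in a way that always yields a significant result, and otherwise a true hypothesis gives significance with probability `power` (assumed 1, the most favourable case) and a false one with probability `alpha`. The evidence is taken as the log2 likelihood ratio of a significant result under the two hypotheses. The flaw rates are made-up illustrations, not estimates of any real field.

```python
import math


def bits_of_evidence(alpha: float, power: float = 1.0, flaw_prob: float = 0.0) -> float:
    """Log2 likelihood ratio carried by one significant result.

    Hypothetical model: with probability flaw_prob the experiment is broken and
    reports 'significant' whatever the truth; otherwise it reports significant
    with probability `power` if the hypothesis is true and `alpha` if false."""
    p_sig_if_true = flaw_prob + (1 - flaw_prob) * power
    p_sig_if_false = flaw_prob + (1 - flaw_prob) * alpha
    return math.log2(p_sig_if_true / p_sig_if_false)


for flaw in (0.0, 0.001, 0.01):
    print(flaw, round(bits_of_evidence(0.001, flaw_prob=flaw), 2))
```

Under these assumptions the p = 0.001 result is worth about 9.97 bits if the experiment is certainly sound, about 8.97 bits at a 1-in-1000 flaw rate, and about 6.5 bits at 1-in-100. Once `flaw_prob` exceeds `alpha`, the flaw rate rather than the p-value caps the evidence.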

Comment author: jkaufman 18 March 2017 02:38:50PM 0 points

If you don't have to publish negative results, running "1000 experiments" can just mean slicing the data until you find something. Someone with a large data set can do this 100% of the time.
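A sketch of why slicing works, assuming a data set of pure noise: the hypothetical helpers below split it into `n_slices` independent subgroups, run a simple two-sample z-test (known unit variance) on each, and record whether anything reaches p < 0.05. With k independent slices the chance of at least one hit is 1 - 0.95^k, which the simulation should roughly reproduce. Real slices of one data set usually overlap and are correlated, so this is an idealisation.

```python
import math
import random


def two_sided_p(z: float) -> float:
    # Two-sided p-value for a standard normal test statistic: P(|Z| > |z|).
    return math.erfc(abs(z) / math.sqrt(2))


def any_significant(n_slices: int, n_per_group: int, alpha: float, rng: random.Random) -> bool:
    """Test n_slices independent subgroups of pure noise; report whether any
    difference in means comes out 'significant'."""
    for _ in range(n_slices):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        # z-test for a difference in means, each group having known variance 1.
        z = (sum(a) / n_per_group - sum(b) / n_per_group) / math.sqrt(2 / n_per_group)
        if two_sided_p(z) < alpha:
            return True  # stop slicing as soon as something "publishable" appears
    return False


def hit_rate(n_slices: int, n_per_group: int = 30, alpha: float = 0.05,
             datasets: int = 500, seed: int = 1) -> float:
    # Fraction of noise-only data sets in which slicing turns up a "finding".
    rng = random.Random(seed)
    hits = sum(any_significant(n_slices, n_per_group, alpha, rng) for _ in range(datasets))
    return hits / datasets


for k in (1, 20, 100):
    print(k, hit_rate(k), round(1 - 0.95 ** k, 3))
```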

A replication is more informative, because it's not subject to nearly as much "find something new and publish it" bias.