Yes, publication bias matters. But it also applies to the p<0.001 experiment - if we have just a single publication, should we believe that the effect is true and just one group has done the experiment, or that the effect is false and publication bias has prevented the publication of the negative results? If we had a few experiments (even with different results) it would be easier to estimate this than in the one published experiment case.

Comment author:benelliott
15 February 2011 01:59:39PM
10 points

Let's do a check. Assume a worst-case scenario where nobody publishes negative results at all.

Getting three p < 0.05 studies when the hypothesis is false requires, on average, 60 experiments (3 / 0.05). This is a lot, but it is within the realms of possibility if the issue is one that many people are interested in, so there are still grounds for scepticism about this result.

Getting one p < 0.001 study when the hypothesis is false requires, on average, 1000 experiments (1 / 0.001). This is pretty implausible, so I would be much happier to treat this result as an indisputable fact, even in a field with many vested interests (assuming everything else about the experiment is sound).
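The two figures above follow from the mean of a negative binomial distribution: if each experiment on a false hypothesis independently comes out significant with probability alpha, the expected number of experiments needed to collect k significant ones is k / alpha. A minimal sketch, assuming independent experiments and a significance threshold that equals the true false-positive rate (the function name is illustrative, not from the thread):

```python
def expected_experiments(k_significant: int, alpha: float) -> float:
    """Expected number of independent experiments needed to observe
    k_significant false positives, each occurring with probability alpha.

    This is the mean of a negative binomial distribution: k / alpha.
    """
    if k_significant <= 0:
        raise ValueError("k_significant must be positive")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return k_significant / alpha


# The two scenarios discussed in the comment.
three_at_05 = expected_experiments(3, 0.05)    # three p < 0.05 studies
one_at_001 = expected_experiments(1, 0.001)    # one p < 0.001 study
print(f"three p<0.05 studies: about {three_at_05:.0f} experiments")
print(f"one p<0.001 study:    about {one_at_001:.0f} experiments")
```

These are averages only; the actual number of experiments needed in any particular case can be well above or below them.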

Comment author:Desrtopa
01 April 2012 03:48:03PM
2 points

This assumes proper methodology and statistics, so that the p-value actually matches the probability of the result arising by chance. In practice, since even your best judgment of the methodology cannot make you certain that the experiment is sound, I would say that a p-value of 0.001 constitutes considerably less than 10 bits of evidence, because the odds that something was wrong with the experiment are better than the odds that the results were coincidental. Multiple experiments with lower cumulative p-value can still be stronger evidence if they all make adjustments to account for possible sources of error.
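To make the "less than 10 bits" point concrete: a clean p < 0.001 result is worth at most log2(1 / 0.001), about 9.97 bits, and any chance that the experiment is flawed pulls the figure down. A rough sketch under simplifying assumptions that are mine rather than the commenter's: a true effect always produces a positive result, and a flawed experiment produces a positive result regardless of the truth. The `p_flawed` values are made up for illustration.

```python
import math


def bits_of_evidence(p_value: float, p_flawed: float = 0.0) -> float:
    """Bits of evidence from a positive result, as log2 of the likelihood ratio.

    Assumptions (illustrative only):
      - P(positive | hypothesis true) = 1
      - a flawed experiment always reports a positive result
      - a sound experiment on a false hypothesis is positive with prob p_value
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    if not 0.0 <= p_flawed <= 1.0:
        raise ValueError("p_flawed must be in [0, 1]")
    p_positive_if_false = p_flawed + (1.0 - p_flawed) * p_value
    return math.log2(1.0 / p_positive_if_false)


for p_flawed in (0.0, 0.001, 0.01, 0.05):
    bits = bits_of_evidence(0.001, p_flawed)
    print(f"P(flawed) = {p_flawed:<5}  ->  {bits:.2f} bits")
```

Under these assumptions, once the chance of a flaw exceeds the p-value itself, the flaw term dominates and the stated p-value contributes little.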

Comment author:jkaufman
18 March 2017 02:38:50PM
0 points

Running "1000 experiments", if you don't have to publish negative results, can mean just slicing the data until you find something. Someone with a large data set can do this 100% of the time.
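A quick way to see how slicing gets there: if each slice of the data is treated as an independent test of a false hypothesis, the chance that at least one slice clears the threshold is 1 - (1 - alpha)^n. This is only a sketch, and the independence assumption is mine; real slices of one data set overlap and are correlated, so the numbers are indicative rather than exact.

```python
def prob_at_least_one_hit(n_slices: int, alpha: float) -> float:
    """Probability that at least one of n_slices independent tests of a
    false hypothesis reaches significance level alpha purely by chance."""
    if n_slices < 0:
        raise ValueError("n_slices must be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return 1.0 - (1.0 - alpha) ** n_slices


for n in (100, 1000, 5000, 10000):
    p = prob_at_least_one_hit(n, 0.001)
    print(f"{n:>6} slices at p < 0.001: {p:.3f} chance of a spurious hit")
```

With enough ways to cut the data, the probability approaches 1, which is the sense in which a large data set lets someone "find something" essentially every time.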

A replication is more informative, because it's not subject to nearly as much "find something new and publish it" bias.

## Comments (30)

One too many zeros in the p value there. The 1,000 figure matches p<0.001, which is also what Anders mentioned. (So your point is fine.)

Thanks
