gwern comments on Open Thread, January 16-31, 2013 - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I don't think it works in the sense of refuting the earlier results by Ioannidis etc.
Remember that much of that previous work is based on looking at replication rates and at how effect estimates change as sample sizes increase - so it's actually empirical in a meaningful way.
This paper simply aggregates all p-values, takes them at face value, and tries to infer what the false positive rate 'should' be. It doesn't seem to account in any way for the many systematic errors, biases, or process problems involved, and it covers only false positives, not false negatives - so it ignores statistical power, which is a serious problem in psychology at least, although I think medical trials are better powered.
I'd take their estimate of a 17% false positive rate as a lower bound.
I also question some other aspects. For example, they dismiss the idea that the false positive rate is increasing because the trend only hits p=0.18 - but if you look at pg11, every journal shows a net increase in false positive rates from the beginning of their sample to the end, although there's enough variation that the beginning/end difference doesn't reach 0.05. So there is a clear trend here, and I have to wonder: if they looked at more than 5 journals over a decade, would the extra data make it hit significance? (A 0.5% increase each year is very troubling, since it implies very bad things for the long term.)
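To make the "more journals" point concrete, here is a minimal simulation sketch. The base rate (0.14), the 0.5%/year drift, and the noise level are my illustrative assumptions, not numbers taken from the paper; the point is only that the sampling error of a fitted trend slope shrinks as journals are added, so the same small yearly drift becomes much easier to distinguish from zero:

```python
import random
import statistics

def slope_spread(n_journals, n_years=10, n_sims=500, seed=0):
    """Pool yearly false-positive-rate estimates across journals,
    fit an OLS slope (rate vs. year), and return the spread of the
    estimated slope across simulations. All parameter values are
    illustrative assumptions, not figures from the paper."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_sims):
        xs, ys = [], []
        for _ in range(n_journals):
            for year in range(n_years):
                xs.append(year)
                # assumed: 14% base rate, +0.5%/year drift, noisy estimate
                ys.append(0.14 + 0.005 * year + rng.gauss(0, 0.03))
        mean_x = statistics.fmean(xs)
        mean_y = statistics.fmean(ys)
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slopes.append(sxy / sxx)  # OLS slope estimate for this simulation
    return statistics.stdev(slopes)

se5 = slope_spread(5)    # 5 journals, as in the paper
se20 = slope_spread(20)  # hypothetical larger sample
# With 4x the journals, the slope estimate's spread roughly halves,
# so the same 0.5%/year trend is far more likely to clear p < 0.05.
print(se5, se20)
```

Under these assumptions the spread shrinks roughly with the square root of the number of journals, which is why a trend that misses significance at 5 journals could plausibly clear it at 20.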
I liked their data collection strategy, though; scraping - not just for hackers!
Yep, I agree. This is definitely an (optimistic) lower limit. It's good that these studies are gaining attention, though systemic change would be needed to get us out of this.