gwern comments on Rationality Quotes May 2013 - Less Wrong

6 Post author: katydee 03 May 2013 08:02PM

Comment author: satt 08 May 2013 11:31:52PM 5 points

"If the effect is so small that a sample of several thousand is not sufficient to reliably observe it, then it doesn't even matter that it is positive."

I strongly disagree.

An old comment of mine gives us a counterexample. A couple of years ago, a meta-analysis of RCTs found that taking aspirin daily reduces the risk of dying from cancer by ~20% in middle-aged and older adults. This is very much a practically significant effect, and it's probably an underestimate for reasons I'll omit for brevity — look at the paper if you're curious.

If you do look at the paper, notice figure 1, which summarizes the results of the 8 individual RCTs the meta-analysis used. Even though all of the RCTs had sample sizes in the thousands, 7 of them failed to show a statistically significant effect, including the 4 largest (sample sizes 5139, 5085, 3711 & 3310). The effect is therefore "so small that a sample of several thousand is not sufficient to reliably observe it", but we would be absolutely wrong to infer that "it doesn't even matter that it is positive"!
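To see why trials with thousands of participants can still miss an effect like this, here is a rough power calculation. It is only a sketch under stated assumptions: the 3% baseline risk of cancer death is a hypothetical figure chosen for illustration (it is not taken from the paper), the arms are assumed equal in size, and the test is a plain two-sided z-test on proportions rather than whatever analysis the individual RCTs actually used.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Standard error of the difference in observed proportions.
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treated * (1 - p_treated) / n_per_arm)
    z_effect = abs(p_control - p_treated) / se
    # Chance that the test statistic lands beyond either critical value.
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)


# ASSUMPTION: hypothetical 3% risk of cancer death in the control arm.
BASELINE_RISK = 0.03
# A 20% relative reduction, matching the size of effect discussed above.
TREATED_RISK = BASELINE_RISK * (1 - 0.20)

# One trial of about 5000 people (2500 per arm) versus eight such trials pooled.
power_single = power_two_proportions(BASELINE_RISK, TREATED_RISK, 2500)
power_pooled = power_two_proportions(BASELINE_RISK, TREATED_RISK, 20000)

print(f"single trial, n ~ 5000: power = {power_single:.2f}")
print(f"eight trials pooled:    power = {power_pooled:.2f}")
```

Under these assumed numbers a single trial is more likely than not to come back non-significant even though the effect is real, while the pooled sample detects it most of the time. The exact figures depend heavily on the assumed baseline risk.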

The heuristic that a hard-to-detect effect is probably too small to care about is a fair rule of thumb, but it's only a heuristic. EHeller & Unnamed are quite right to point out that statistical significance and practical significance correlate only imperfectly.

Comment author: gwern 01 February 2014 12:45:04AM 2 points

tl;dr: NHST and Bayesian-style subjective probability do not mix easily.

Another example of this problem: http://slatestarcodex.com/2014/01/25/beware-mass-produced-medical-recommendations/

Does vitamin D reduce all-cause mortality in the elderly? The point-estimates from pretty much all of the various studies are around a 5% reduction in risk of dying for any reason - pretty nontrivial, one would say, no? Yet the results are almost all not 'statistically significant'! So do we follow Rolf and say 'fans of vitamin D ought to update on vitamin D not helping overall'... or do we, applying power considerations about the likelihood of making the hard cutoffs at p<0.05 given the small sample sizes & plausible effect sizes, note that the point-estimates are in favor of the hypothesis? (And how does this interact with two-sided tests - vitamin D could've increased mortality, after all. Positive point-estimates are consistent with vitamin D helping, and less consistent with no effect, and even less consistent with it harming; so why are we supposed to update in favor of no help or harm when we see a positive point-estimate?)
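The parenthetical point about point-estimates can be made concrete with a likelihood comparison. This is a minimal sketch with made-up numbers: the observed log relative risk and its standard error are hypothetical, not taken from any vitamin D study, and the sampling distribution of the estimate is assumed to be normal.

```python
from statistics import NormalDist

# ASSUMPTION: hypothetical study result, not real vitamin D data.
observed_log_rr = -0.05   # negative means fewer deaths in the treated group
standard_error = 0.04     # gives z = -1.25, so not 'statistically significant'


def likelihood(true_log_rr):
    """Density of the observed estimate if the true effect were true_log_rr."""
    return NormalDist(mu=true_log_rr, sigma=standard_error).pdf(observed_log_rr)


lik_helps = likelihood(-0.05)   # hypothesis: about 5% reduction in mortality
lik_null = likelihood(0.0)      # hypothesis: no effect
lik_harms = likelihood(+0.05)   # hypothesis: about 5% increase in mortality

print(f"helps vs. no effect: likelihood ratio = {lik_helps / lik_null:.1f}")
print(f"no effect vs. harms: likelihood ratio = {lik_null / lik_harms:.1f}")
```

Both ratios come out above 1: a non-significant result pointing in the beneficial direction still favors 'helps' over 'no effect', and 'no effect' over 'harms'. So a subjective-probability update on this one study should move toward benefit, not away from it.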

If we accept Rolf's argument, then we'd be in the odd position of, as we read through one non-statistically-significant study after another, decreasing the probability of 'non-zero reduction in mortality'... right up until we get the Autier or Cochrane data summarizing the exact same studies & plug it into a Bayesian meta-analysis like Salvatier did & abruptly flip to '92% chance of non-zero reduction in mortality'.
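The same thing can be shown study by study. The sketch below uses a fixed-effect, inverse-variance pooling with a flat prior on the effect. That is a simplification chosen for illustration and not necessarily the model Salvatier fit, and the five (estimate, standard error) pairs are hypothetical stand-ins rather than the Autier or Cochrane data.

```python
from math import sqrt
from statistics import NormalDist

# ASSUMPTION: hypothetical (log relative risk, standard error) pairs.
studies = [
    (-0.06, 0.05),
    (-0.03, 0.06),
    (-0.08, 0.07),
    (-0.04, 0.05),
    (-0.05, 0.06),
]


def posterior_prob_benefit(results):
    """P(true log RR < 0) after pooling results, assuming a flat prior."""
    weights = [1 / se ** 2 for _, se in results]
    pooled = sum(w * est for w, (est, _) in zip(weights, results)) / sum(weights)
    pooled_se = 1 / sqrt(sum(weights))
    # With a flat prior the posterior is Normal(pooled, pooled_se).
    return NormalDist(pooled, pooled_se).cdf(0.0)


# Is any single study significant at the two-sided 5% level?
individually_significant = [abs(est / se) > 1.96 for est, se in studies]

# Posterior probability of benefit after reading the first k studies.
running = [posterior_prob_benefit(studies[:k]) for k in range(1, len(studies) + 1)]

print("any study significant alone:", any(individually_significant))
for k, prob in enumerate(running, start=1):
    print(f"after {k} studies: P(reduction in mortality) = {prob:.3f}")
```

With these made-up inputs no study clears p<0.05 alone, yet the probability of a non-zero reduction rises with each study read instead of falling. There is no abrupt flip at the end because the update was going in the right direction all along.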