AnnaSalamon comments on Poll results: LW probably doesn't cause akrasia - Less Wrong

47 Post author: AnnaSalamon 16 November 2011 06:03PM


Comment author: AnnaSalamon 16 November 2011 09:21:45PM 2 points [-]

Also, I'm not sure what to make of it, but it looks like the Anne question is not significantly correlated with the exercise question.

Well, the dataset was small, and the correlations were all fairly weak (which is normal in social science; high-validity tests are generally made from multiple questions, and the correlation between individual questions on e.g. an IQ test is, I think, similarly weak, though I don't remember the exact figures). The Anne and exercise questions were weakly correlated in the dataset, but not at the significance level I was using as a cutoff: c=.2, p=.12 (i.e., there is a 12% chance of seeing a correlation that strong by chance).

It's possible that both of these correlates of caffeine were due to chance (p=.004 is significant in most contexts, but less so when one is comparing 25 questions against one another); but it's also possible that they are real, and I find I'm tempted to start using caffeine again. Anyone want to do some literature searches and find corroborating or anti-corroborating evidence for us?

Comment author: steven0461 16 November 2011 11:55:53PM *  3 points [-]

there is a 12% chance of seeing a correlation that strong by chance

No, a 12% chance of seeing a correlation at least as strong. Confusion about p-values is endemic! Please be super-careful explaining what they mean! (Specifically, in this case, you don't want people thinking something like P(result | effect is real) = 1 and P(result | effect is false) = .12; I think that would be overstating the evidence.)
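To make the "at least as strong" point concrete, here is a small simulation sketch (the sample size n=60 is an assumption for illustration; the post doesn't state the survey's n): under the null hypothesis of no correlation, the p-value is the fraction of random datasets whose correlation is at least as extreme as the one observed.

```python
# Hypothetical illustration (not the original dataset): estimate the tail
# probability P(|r| >= 0.2) under the null of zero correlation, at an
# assumed sample size of n = 60.
import random
import statistics

def pearson_r(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

random.seed(0)
n, trials = 60, 20000
hits = 0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [random.gauss(0, 1) for _ in range(n)]
    # Count draws whose correlation is AT LEAST as strong as the observed one.
    if abs(pearson_r(xs, ys)) >= 0.2:
        hits += 1

print(hits / trials)  # the simulated two-sided p-value for r = 0.2
```

With these assumed numbers the simulated tail probability comes out close to the quoted p=.12, but note the definition: it is the chance of a correlation *at least* that strong under the null, not the probability that the effect is unreal.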

p=.004 is significant in most contexts, but less so when one is comparing 25 questions against one another

The issue isn't how many questions one is comparing, but rather what is the prior for this specific correlation. I don't think the prior for caffeine correlating with productivity is that low, and the .004 probably translates to pretty strong evidence. Of course, separately from that, you have to worry whether the correlation represents coffee -> productivity causation.
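The prior-dependence of that ".004 probably translates to pretty strong evidence" claim can be sketched with a toy Bayesian update. The numbers below (the 20:1 likelihood ratio and both priors) are illustrative assumptions, not estimates from the thread:

```python
# Toy update: posterior odds = prior odds * likelihood ratio.
# The likelihood ratio P(result | effect real) / P(result | chance) = 20
# is an assumed, illustrative figure.

def posterior(prior, likelihood_ratio):
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Not-too-low prior (caffeine plausibly affecting productivity):
print(round(posterior(0.30, 20), 2))  # 0.9
# Very low prior (one of many mass-evaluated correlations):
print(round(posterior(0.01, 20), 2))  # 0.17
```

The same evidence moves a plausible hypothesis to near-certainty while leaving a long-shot hypothesis still unlikely, which is the sense in which the prior, not the raw comparison count, does the work.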

Comment author: AnnaSalamon 17 November 2011 01:12:42AM 4 points [-]

The issue isn't how many questions one is comparing, but rather what is the prior for this specific correlation.

Good clarifications; thanks. Also, I hadn't noticed that the p-value for the caffeine/exercise comparison was also small (p=.008); on reflection, I agree with you that one or both correlations are likely to be real. Maybe I really will start drinking coffee again. (Though note that caffeine did not correlate with self-reported procrastination levels, nor with income, nor with happiness, which is otherwise a strong anti-procrastination predictor.)

Huh; I think I was in fact making thinking errors here, even though I understood your points enough to have explained them many times to others. My thought had been that, while it would be better to directly estimate a prior if I could do so accurately, doing so would be hard for two reasons:

  1. Hindsight bias (plus the fact that I wrote no such priors down ahead of time);
  2. Lack of practice with questions generated in this manner. In daily life (or while playing calibration games with trivia cards), questions are selected so that a high frequency have both "yes" and "no" as answers. Given the prior one should have over a randomly selected such question, I am therefore usually hesitant to assign less than a 5% probability to anything that doesn't make me think "no way could that possibly happen", because, when I've done calibration practice, that's what "only a 5% chance" feels like from the inside. But when one's questions are generated by a process of automatic comparisons rather than by deliberate conjecture, the priors can be lower.

I was hoping, by attention to the number of questions involved, to get a feel for the latter effect. But on reflection it seems you're right and I would have done better, in practice, to have thought about the odds of the kinds of comparisons I was actually running being true; hindsight bias or not, caffeine helping with System II overrides is clearly in a different hypothesis category than e.g. the sorts of compound / health value correlations that are sometimes mass-evaluated in medical research.
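The multiple-comparisons arithmetic behind the earlier worry can be made explicit. Assuming all 25 questions were compared pairwise (a guess about the analysis, not stated in the post), the expected number of spurious hits at a given threshold is just the test count times the threshold:

```python
# Comparing 25 questions pairwise yields C(25, 2) = 300 tests, so even with
# no real effects one expects a few hits below p = .01 by chance alone.
# The pairwise-comparison setup is an assumption for illustration.
from math import comb

n_questions = 25
n_tests = comb(n_questions, 2)          # 300 pairwise comparisons
alpha = 0.01
expected_false_hits = n_tests * alpha   # expected chance hits at p < .01

print(n_tests, expected_false_hits)     # 300 3.0
```

On these assumptions, a p=.004 result is rarer than the typical chance hit in such a sweep, which is why the prior on the specific hypothesis, rather than the raw count alone, has to settle how seriously to take it.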