(An idea I had while responding to this quotes thread)
"Correlation does not imply causation" is bandied around inexpertly and inappropriately all over the internet. Lots of us hate this.
But get this: the phrase, and the most obvious follow-up phrases like "what does imply causation?" are not high-competition search terms. Up until about an hour ago, the domain name correlationdoesnotimplycausation.com was not taken. I have just bought it.
There is a correlation-does-not-imply-causation-shaped space on the internet, and it's ours for the taking. I would like to fill this space with a small collection of relevant educational resources explaining what is meant by the term, why it's important, why it's often used inappropriately, and the circumstances under which one may legitimately infer causation.
At the moment the Wikipedia page is trying to do this, but it's not really optimised for the task. It also doesn't carry the undercurrent of "no, seriously, lots of smart people get this wrong; let's make sure you're not one of them", and I think it should.
The purpose of this post is two-fold:
Firstly, it lets me say "hey dudes, I've just had this idea. Does anyone have any suggestions (pragmatic/technical, content-related, pointing out why it's a terrible idea, etc.), or alternatively, would anyone like to help?"
Secondly, it raises the question of what other corners of the internet are ripe for the planting of sanity-waterline-raising resources. Are there any other similar concepts that people commonly get wrong, but that lack a good explanatory presence on the web? Could we put together a simple web platform for carrying out this task in lots of different places? The LW readership seems ideally placed to collectively do this sort of work.
Awesome idea.
As far as I understand it, if variables A and B are correlated, then we can be pretty damn sure that either:
(Am I right about this or is this an oversimplification?)
A good way to grab attention might be to deny a commonly believed fact in a way that promises intelligent elaboration. So the website could start with a huge 'Correlation does not imply causation' banner and then go like 'well, actually, it kind of does'. And then explain how going from not knowing anything at all to knowing that one of three causal hypotheses is correct is pretty damn informative even if we don't immediately know which of the hypotheses is correct.
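To make the "third factor" hypothesis concrete, here is a minimal Python sketch. It assumes a hypothetical model (not from this thread) in which a hidden variable C drives both A and B, while A has no effect on B at all. The two variables still come out clearly correlated. For contrast, the same model is rerun with A drawn at random, as in an experiment, which cuts A's link to C and makes the correlation vanish. The function names `pearson` and `sample` are illustrative.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def sample(n, randomise_a, seed):
    """Draw n (A, B) pairs from a hypothetical model where C causes both A and B.

    A never influences B. If randomise_a is True, A is drawn at random
    (as in an experiment) instead of being driven by C.
    """
    rng = random.Random(seed)
    a_vals, b_vals = [], []
    for _ in range(n):
        c = rng.gauss(0, 1)                # hidden common cause
        if randomise_a:
            a = rng.gauss(0, 1)            # A assigned independently of C
        else:
            a = c + rng.gauss(0, 1)        # A driven by C plus noise
        b = c + rng.gauss(0, 1)            # B driven by C plus noise, never by A
        a_vals.append(a)
        b_vals.append(b)
    return a_vals, b_vals

observational = pearson(*sample(10_000, randomise_a=False, seed=0))
experimental = pearson(*sample(10_000, randomise_a=True, seed=0))
print(f"observed correlation:   {observational:.2f}")  # close to 0.5
print(f"randomised correlation: {experimental:.2f}")   # close to 0
```

In this toy model the observed correlation sits near 0.5 even though A does nothing to B, which is exactly the case the slogan warns about.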
Then it would probably be useful to go all Bayesian and talk about priors, Ockham's razor and how it's a rare situation where we cannot distinguish between hypotheses at all. A good example might be to tell the story of how R. A. Fisher used the 'correlation does not imply causation' platitude to shoot down research connecting smoking to lung cancer and explain that it should have been clear that the hypothesis 'smoking causes cancer' was much more reasonable at that time than the hypothesis 'there's a common factor causing both smoking and cancer'. (On the other hand, this could turn political. I don't know whether the smoking and lung cancer issue is still contested.)
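The Bayesian framing can be sketched in a few lines of Python. All priors and likelihoods below are hypothetical numbers chosen for illustration, not estimates for smoking and cancer or any real dataset. The sketch assumes a strong correlation is about equally likely under each causal story and very unlikely if A and B are unrelated. Observing the correlation then all but eliminates "unrelated", while the causal hypotheses keep their prior ratio, so background knowledge and Ockham's razor decide which one is favoured.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of mutually exclusive hypotheses.

    priors:      hypothesis -> P(hypothesis) before seeing the data
    likelihoods: hypothesis -> P(data | hypothesis)
    Returns hypothesis -> P(hypothesis | data).
    """
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalised.values())
    return {h: w / total for h, w in unnormalised.items()}

# Hypothetical priors, for illustration only.
priors = {
    "A causes B": 0.40,
    "B causes A": 0.05,
    "common cause C": 0.25,
    "unrelated": 0.30,
}
# Hypothetical likelihoods of seeing a strong correlation under each hypothesis.
likelihoods = {
    "A causes B": 0.5,
    "B causes A": 0.5,
    "common cause C": 0.5,
    "unrelated": 0.001,
}

for hypothesis, p in posterior(priors, likelihoods).items():
    print(f"{hypothesis:15s} {p:.3f}")
```

Because the likelihoods are equal across the three causal hypotheses, the correlation alone cannot reorder them; separating them further needs evidence that is more likely under one causal story than another.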
There's also e): A causes B within our sample, but A does not cause B generally, or in the sense that we care about.
For example, suppose a teacher gives out a gold star whenever a pupil does a good piece of work, and this causes the pupil to work harder. Suppose also that this effect is greatest on mediocre pupils and least on the best pupils - but the best pupils get most of the gold stars, naturally.
Now suppose an educational researcher observes the class, and notes the correlation between receiving a gold star, and increased effort. This is genuine caus...
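The comment above is cut off, so the following Python sketch only works through the arithmetic of the gold-star setup, not the commenter's conclusion. The group sizes, star rates and effect sizes are hypothetical. The sketch compares the average causal effect among the pupils who actually received stars (mostly the best pupils, on whom the effect is small) with the average effect if every pupil in the class got a star. The class and function names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class PupilGroup:
    name: str
    size: int          # number of pupils in the group
    star_rate: float   # fraction of the group's pupils who receive a gold star
    effect: float      # extra effort a star causes for a pupil in this group

def effect_among_recipients(groups):
    """Average causal effect over the pupils who actually got stars."""
    stars = [(g.size * g.star_rate, g.effect) for g in groups]
    total = sum(n for n, _ in stars)
    return sum(n * e for n, e in stars) / total

def effect_if_everyone_got_one(groups):
    """Average causal effect if every pupil in the class received a star."""
    total = sum(g.size for g in groups)
    return sum(g.size * g.effect for g in groups) / total

# Hypothetical class: the best pupils get most stars but respond least to them.
class_groups = [
    PupilGroup("best", size=10, star_rate=0.8, effect=1.0),
    PupilGroup("mediocre", size=10, star_rate=0.2, effect=5.0),
]

print(f"{effect_among_recipients(class_groups):.1f}")     # 1.8
print(f"{effect_if_everyone_got_one(class_groups):.1f}")  # 3.0
```

With these numbers the effect the researcher can see in this class (1.8) differs from the effect of a policy of giving every pupil a star (3.0), even though the causal link is real in both cases.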