(An idea I had while responding to this quotes thread)
"Correlation does not imply causation" is bandied around inexpertly and inappropriately all over the internet. Lots of us hate this.
But get this: the phrase, and the most obvious follow-up phrases like "what does imply causation?", are not high-competition search terms. Up until about an hour ago, the domain name correlationdoesnotimplycausation.com was not taken. I have just bought it.
There is a correlation-does-not-imply-causation shaped space on the internet, and it's ours for the taking. I would like to fill this space with a small collection of relevant educational resources explaining what is meant by the term, why it's important, why it's often used inappropriately, and the circumstances under which one may legitimately infer causation.
At the moment the Wikipedia page is trying to do this, but it's not really optimised for the task. It also doesn't carry the undercurrent of "no, seriously, lots of smart people get this wrong; let's make sure you're not one of them", and I think it should.
The purpose of this post is two-fold:
Firstly, it lets me say "hey dudes, I've just had this idea. Does anyone have any suggestions (pragmatic/technical, content-related, pointing out why it's a terrible idea, etc.), or alternatively, would anyone like to help?"
Secondly, it raises the question of what other corners of the internet are ripe for the planting of sanity waterline-raising resources. Are there any other similar concepts that people commonly get wrong, but that don't have much of a guiding explanatory web presence? Could we put together a simple web platform for carrying out this task in lots of different places? The LW readership seems ideally placed to collectively do this sort of work.
Pet peeve:
The saying should be: "statistical dependence does not imply causality." Correlation is a particular measure of a linear relationship. A lack of correlation can happily coexist with statistical dependence if variables are related in a complicated non-linear way. This "correlation" business needlessly emphasizes linear models (prevalent in statistics at the time of Pearson et al.). See also this: http://en.wikipedia.org/wiki/Correlation_and_dependence
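To make the "uncorrelated but dependent" point concrete, here is a minimal sketch. The toy data (x on a symmetric grid, y = x squared) and the helper name `pearson` are my own assumptions for illustration, not anything from the linked page. y is completely determined by x, yet the Pearson coefficient comes out at essentially zero.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical toy data: x on a grid symmetric around zero, y = x^2.
xs = [i / 50 for i in range(-50, 51)]
ys = [x * x for x in xs]

# Linear correlation between x and y: the two halves of the parabola cancel.
r_linear = pearson(xs, ys)

# The dependence is still there: |x| tracks y closely.
r_abs = pearson([abs(x) for x in xs], ys)

print(f"corr(x, y)   = {r_linear:.6f}")
print(f"corr(|x|, y) = {r_abs:.6f}")
```

The first number should be zero up to floating-point error, and the second should be close to one, even though both are computed from the same deterministic relationship.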
Also, this is true: "lack of statistical dependence does not imply lack of causality" (due to effect cancellation).
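A rough sketch of effect cancellation, under assumptions I am making up for illustration: X pushes Y up directly with strength +1, and also pushes Y down through a mediator M with strength -1. The variable names, coefficients, and noise levels are all hypothetical. The two paths cancel, so X and Y look unrelated in observational data, but clamping M by intervention exposes the direct effect.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

xs = [rng.gauss(0, 1) for _ in range(n)]

# Observational world: M = X + noise, Y = X - M + noise.
# The direct path (+1) and the path through M (-1) cancel exactly.
ms = [x + rng.gauss(0, 1) for x in xs]
ys_observed = [x - m + rng.gauss(0, 1) for x, m in zip(xs, ms)]

# Interventional world: M is clamped to 0 regardless of X,
# so only the direct path X -> Y remains.
ys_clamped = [x - 0.0 + rng.gauss(0, 1) for x in xs]

r_observed = pearson(xs, ys_observed)
r_clamped = pearson(xs, ys_clamped)

print(f"corr(X, Y), M left alone: {r_observed:.3f}")
print(f"corr(X, Y), M clamped:    {r_clamped:.3f}")
```

In this setup Y really is independent of X in the observational world (algebraically it reduces to the difference of the two noise terms), so the near-zero correlation is not just a failure of a linear measure.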
They can also be completely independent and still have causation between them, but that's not something that would happen by chance. The only case I know of where something like that happens is when the cause is designed to regulate whatever it's independent of. For example, the room temperature doesn't correlate with the power going through the heater or air conditioner, since the temperature is always constant, and it's constant precisely because the heater and air conditioner keep it that way.
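The thermostat case can be simulated in a few lines. This is only a sketch with invented numbers: an idealised controller that exactly offsets the outside temperature, a target of 20 degrees, and a bit of sensor noise. None of it comes from a real heating model.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

rng = random.Random(1)  # fixed seed so the run is repeatable
TARGET = 20.0           # hypothetical thermostat setting, in degrees
n = 5000

outside = [rng.uniform(-10.0, 15.0) for _ in range(n)]
noise = [rng.gauss(0.0, 0.2) for _ in range(n)]

# Idealised controller: heating (measured in degrees added) exactly
# makes up the gap between the target and the outside temperature.
heating = [TARGET - t for t in outside]

# Thermostat running: inside = outside + heating + sensor noise.
inside_on = [t + h + e for t, h, e in zip(outside, heating, noise)]

# Intervention: switch the heater off, so heating is forced to zero.
inside_off = [t + e for t, e in zip(outside, noise)]

r_heating_inside = pearson(heating, inside_on)
mean_on = sum(inside_on) / n
mean_off = sum(inside_off) / n

print(f"corr(heating, inside temp) = {r_heating_inside:.3f}")
print(f"mean inside temp, heater on:  {mean_on:.1f}")
print(f"mean inside temp, heater off: {mean_off:.1f}")
```

The heater power swings a lot while the inside temperature barely moves, so the two are uncorrelated, yet switching the heater off drops the inside temperature to roughly the outside average. The causal effect only shows up under intervention.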