
Manfred comments on Taking "correlation does not imply causation" back from the internet - Less Wrong

41 Post author: sixes_and_sevens 03 October 2012 12:18PM


Comments (70)


Comment author: Manfred 03 October 2012 03:20:56PM * 17 points

"if variables A and B are correlated, then we can be pretty damn sure that either: a) A causes B b) B causes A c) there's a third variable affecting both A and B."

There is in fact a fourth possibility, d): A and not-B can both cause some condition C that defines our sample.

Example: Sexy people are more likely to be hired as actors. Good actors are also more likely to be hired as actors. So if we look at "people who are actors," then we'll get people who are sexy but can't really act, people who are sexy and can act, and people who can act and aren't really sexy. If sexiness and acting ability are independent, these three groups will be about equally full.

Thus if we look at actors in general in our simple model, 2/3 of them will be sexy and 2/3 of them will be good actors. But of the ones who are sexy, only 1/2 will be good actors. So being sexy is correlated with being a bad actor! Not because sexiness rots your brain (a), or because acting well makes you ugly (b), and not because acting classes cause both good acting and ugliness, or diet pills cause both beauty and bad acting (c). Instead, it's just because how we picked actors made sexiness and acting ability "compete for the same niche."
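The arithmetic in this toy model can be checked by brute-force enumeration. The sketch below assumes, as the comment does implicitly, that sexiness and acting ability are independent binary traits, each present in half of all applicants, and that anyone with at least one of them gets hired. The function name `actor_model` and the default parameter values are illustrative and not part of the original comment.

```python
from fractions import Fraction
from itertools import product


def actor_model(p_sexy=Fraction(1, 2), p_good=Fraction(1, 2)):
    """Exact proportions among hired actors in the toy model (assumed parameters)."""
    # Weight of each (sexy, good) combination, assuming the traits are independent.
    weights = {
        (sexy, good): (p_sexy if sexy else 1 - p_sexy) * (p_good if good else 1 - p_good)
        for sexy, good in product((True, False), repeat=2)
    }
    # Assumed selection rule: hired if sexy OR a good actor.
    hired = {k: w for k, w in weights.items() if k[0] or k[1]}
    total = sum(hired.values())
    sexy_mass = sum(w for (sexy, _), w in hired.items() if sexy)
    good_mass = sum(w for (_, good), w in hired.items() if good)
    plain_mass = total - sexy_mass
    return {
        "sexy": sexy_mass / total,
        "good": good_mass / total,
        "good_given_sexy": hired[(True, True)] / sexy_mass,
        "good_given_not_sexy": hired[(False, True)] / plain_mass,
    }


result = actor_model()
print(result)
```

With the default values this gives 2/3 sexy, 2/3 good actors, and 1/2 good actors among the sexy ones, matching the figures above. Every non-sexy actor in the sample can act, which is where the negative correlation comes from.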

Similar examples would be sports and academics in college, different sorts of skills in people promoted in the workplace, UI design versus functionality in popular programs, and so on and so on.

Comment author: [deleted] 03 October 2012 05:47:41PM 3 points

That's known as Berkson's paradox.

Comment author: shokwave 03 October 2012 04:12:22PM 3 points

I feel like this example should go on the doesnotimply website.

Comment author: IlyaShpitser 03 October 2012 04:20:11PM * 9 points

If you are familiar with d-separation (http://en.wikipedia.org/wiki/D-separation#d-separation), we have:

if A is dependent on B, and there's some unobserved C involved, then:

(1) A <- C -> B, or

(2) A -> C -> B, or

(3) A <- C <- B

(this is Reichenbach's common cause principle: http://plato.stanford.edu/entries/physics-Rpcc/)

or

(4) A -> C <- B

if C or its effect attains a particular (not necessarily recorded) value. Statisticians know this as Berkson's bias, which is a form of selection bias. In AI, this is known as "explaining away." Manfred's excellent example falls into category (4), with C observed to equal "hired as actor."


Beware: d-separation applies both to causal graphical models and to Bayesian networks (which are statistical, not causal, models). The meaning of the arrows is different in these two kinds of models. This is actually a fairly subtle issue.

Comment author: shokwave 03 October 2012 07:58:55PM 0 points

Odd - I always felt like d-separation was the same thing on causal diagrams and on Bayes networks. Although, I also understood a Bayes network as being a model of the causal directions in a situation, so perhaps that's why.

Manfred's excellent example needs equally excellent counterparts for other possibilities.

Comment author: IlyaShpitser 03 October 2012 08:22:35PM * 2 points

Sorry for not being clear. The d-separation criterion is the same in both Bayesian networks and causal diagrams, but its meaning is not the same. This is because an arrow A -> B in a causal diagram means (loosely) that A is a direct cause of B at the level of granularity of the model, while an arrow A -> B in a Bayesian network has a meaning that is more complicated to explain, having to do with the Markov factorization and conditional independence. D-separation talks about arrows in both cases, but asserts different things due to a difference in the meaning of those arrows.

A Bayesian network model is just a statistical model (a set of joint distributions) associated with a directed acyclic graph. Specifically, it is the set of all distributions p(x1, ..., xk) that factorize as a product of terms of the form p(xi | parents(xi)). Nothing more, nothing less. Nothing about causality in that definition.
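Written out, with pa(x_i) as an assumed shorthand for parents(xi), the factorization and its instances for the four three-variable graphs listed earlier in this thread would be:

```latex
p(x_1, \dots, x_k) = \prod_{i=1}^{k} p\bigl(x_i \mid \mathrm{pa}(x_i)\bigr)

% (1) A <- C -> B
p(a, b, c) = p(c)\, p(a \mid c)\, p(b \mid c)
% (2) A -> C -> B
p(a, b, c) = p(a)\, p(c \mid a)\, p(b \mid c)
% (3) A <- C <- B
p(a, b, c) = p(b)\, p(c \mid b)\, p(a \mid c)
% (4) A -> C <- B
p(a, b, c) = p(a)\, p(b)\, p(c \mid a, b)
```

By Bayes' rule, (1), (2), and (3) describe the same set of distributions (those in which A and B are independent given C), even though the arrows point in different directions; only (4) is a different statistical model. That is one way to see that the arrows of a Bayesian network carry no causal content by themselves.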


I think examples for (1),(2),(3) are simpler than Manfred's Berkson's bias example.

(1) A <- C -> B

Most clearly non-causal associations go here: "shoe size correlates with IQ" and its kin.
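A small exact calculation can make structure (1) concrete. The sketch below assumes a hypothetical common cause C (age group, child versus adult) that drives both A (large shoes) and B (a high raw test score); the variable names and all the probabilities are invented for illustration. It shows A and B associated overall but independent once C is fixed.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical numbers: C is age group, A is "large shoes", B is "high raw test score".
P_C = {"child": F(1, 2), "adult": F(1, 2)}
P_A1_GIVEN_C = {"child": F(1, 10), "adult": F(9, 10)}
P_B1_GIVEN_C = {"child": F(1, 5), "adult": F(4, 5)}


def fork_joint():
    """Joint P(c, a, b) for the fork A <- C -> B: A and B each depend only on C."""
    table = {}
    for c, a, b in product(P_C, (0, 1), (0, 1)):
        p_a = P_A1_GIVEN_C[c] if a else 1 - P_A1_GIVEN_C[c]
        p_b = P_B1_GIVEN_C[c] if b else 1 - P_B1_GIVEN_C[c]
        table[(c, a, b)] = P_C[c] * p_a * p_b
    return table


def prob(table, c=None, a=None, b=None):
    """Marginal probability of the specified values; None means 'sum over'."""
    return sum(
        w for (tc, ta, tb), w in table.items()
        if (c is None or tc == c) and (a is None or ta == a) and (b is None or tb == b)
    )


table = fork_joint()
# P(A=1, B=1) - P(A=1) P(B=1): nonzero means A and B are associated overall.
marginal_gap = prob(table, a=1, b=1) - prob(table, a=1) * prob(table, b=1)
# The same gap computed within each age group: zero means independence given C.
conditional_gaps = {
    c: prob(table, c=c, a=1, b=1) / prob(table, c=c)
    - (prob(table, c=c, a=1) / prob(table, c=c)) * (prob(table, c=c, b=1) / prob(table, c=c))
    for c in P_C
}
print(marginal_gap, conditional_gaps)
```

The overall gap is positive while the gap within each age group is exactly zero, which is the signature of a common cause rather than a causal link between A and B.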

(2) A -> C -> B, and (3) A <- C <- B

Classic scientific triumphs go here: "smoking causes cancer." Of note here is that if we can find an observable unconfounded C that intercepts all/most of the causal pathway, this is extremely valuable for estimating effects. If you can design an experiment with such a C, you don't even have to randomize A.
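The last remark appears to describe what Pearl calls the front-door adjustment; the comment does not name it, so that identification is an assumption here. The sketch below builds a hypothetical model with a hidden confounder U of A and B, where A affects B only through C, and checks that the front-door formula recovers the effect of A on B from the observed distribution of (A, C, B) alone. All probabilities and function names are invented for illustration.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical parameters: U is a hidden confounder of A and B; A affects B only via C.
P_U1 = F(1, 2)
P_A1_GIVEN_U = {0: F(1, 5), 1: F(4, 5)}
P_C1_GIVEN_A = {0: F(1, 10), 1: F(9, 10)}
P_B1_GIVEN_CU = {(0, 0): F(1, 10), (0, 1): F(3, 10), (1, 0): F(1, 2), (1, 1): F(7, 10)}


def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1 - p_one


def observed_joint():
    """P(a, c, b) with the hidden U summed out: all that an analyst gets to see."""
    return {
        (a, c, b): sum(
            bern(P_U1, u) * bern(P_A1_GIVEN_U[u], a)
            * bern(P_C1_GIVEN_A[a], c) * bern(P_B1_GIVEN_CU[(c, u)], b)
            for u in (0, 1)
        )
        for a, c, b in product((0, 1), repeat=3)
    }


def prob(joint, a=None, c=None, b=None):
    """Marginal probability of the specified values; None means 'sum over'."""
    return sum(
        w for (ja, jc, jb), w in joint.items()
        if (a is None or ja == a) and (c is None or jc == c) and (b is None or jb == b)
    )


def true_effect(a):
    """P(B=1 | do(A=a)), computed from the full model including the hidden U."""
    return sum(
        bern(P_U1, u) * bern(P_C1_GIVEN_A[a], c) * P_B1_GIVEN_CU[(c, u)]
        for u in (0, 1) for c in (0, 1)
    )


def front_door(joint, a):
    """P(B=1 | do(A=a)) from observed data: sum_c P(c|a) sum_a' P(B=1|a',c) P(a')."""
    total = F(0)
    for c in (0, 1):
        p_c_given_a = prob(joint, a=a, c=c) / prob(joint, a=a)
        inner = sum(
            prob(joint, a=a2, c=c, b=1) / prob(joint, a=a2, c=c) * prob(joint, a=a2)
            for a2 in (0, 1)
        )
        total += p_c_given_a * inner
    return total


joint = observed_joint()
naive = prob(joint, a=1, b=1) / prob(joint, a=1)  # plain conditional P(B=1 | A=1)
print(true_effect(1), front_door(joint, 1), naive)
```

In this toy model the front-door estimate equals the true interventional probability exactly, while the plain conditional P(B=1 | A=1) is biased by the hidden confounder, even though A was never randomized.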

Comment author: Antisuji 03 October 2012 06:43:07PM 2 points

I first heard of this idea a few months ago in a blog post at The Atlantic.

Comment author: Manfred 03 October 2012 08:52:07PM * 1 point

Aha, yes - which I think I in turn was linked to by Ben Goldacre. But the reason I was quickly able to enumerate this as a separate kind of correlation is that the causal graph is different, and that way of thinking comes from Judea Pearl.

Comment author: Antisuji 04 October 2012 12:23:19AM 0 points

Yup. I'm reading the link from this post and just got to the discussion of Berkson's paradox, which seems to be the same effect.

Comment author: prase 03 October 2012 08:42:08PM 0 points

"If sexiness and acting ability are independent, these three groups will be about equally full."

What do you mean by "equally full"?

Comment author: Manfred 03 October 2012 08:45:19PM * 1 point

I mean "I'm about to pretend that 'sexy' and 'good actor' are binary variables centered to make the math super easy." If you would like less pretending, read the Atlantic article linked by a thoughtful replier, since the author draws a nice graph illustrating the general case.

Comment author: prase 03 October 2012 09:13:02PM 0 points

I wouldn't like less pretending, and 'sexy'/'good actor' being binary variables is fine with me (and I understand your comment overall), but I still don't know what it means for the groups to be equally full. (Equal size? That doesn't follow from independence.)

Comment author: Manfred 03 October 2012 09:29:25PM 1 point

Right, so I make the math-light but false assumption that casting directors will take above-average applicants, and also that you aren't more likely to eventually become an actor if you're sexy and can act well.

Comment author: prase 03 October 2012 10:15:33PM -2 points

"above-average"

If you mean "above median", I see.