Understanding Simpson's Paradox

Vaniver


18th Sep 2013


An article by Judea Pearl, available here. It's quick at 8 pages, and worth reading if you enjoy statistics (though I think people who are already familiar with the math of causality¹ will get more out of it than others²). I'll talk here about the part that I think is generally interesting:

Any claim to a resolution of a paradox, especially one that has resisted a century of attempted resolution must meet certain criteria. First and foremost, the solution must explain why people consider the phenomenon surprising or unbelievable. Second, the solution must identify the class of scenarios in which the paradox may surface, and distinguish it from scenarios where it will surely not surface. Finally, in those scenarios where the paradox leads to indecision, we must identify the correct answer, explain the features of the scenario that lead to that choice, and prove mathematically that the answer chosen is indeed correct. The next three subsections will describe how these three requirements are met in the case of Simpson's paradox and, naturally, will proceed to convince readers that the paradox deserves the title "resolved."

I've never really liked the name "paradox," because what it seems to mean is "unintuitive phenomenon." (Wikipedia puts it as "something which seems false and yet might be true.") The trouble is that "unintuitive" is a two-place word, and it makes sense to think like reality, so that true things seem true to you, instead of still seeming false. (For example, when I first learned about Zeno's Paradox, I already knew calculus, and so Zeno's position was the one that seemed confusing and false.)

What I like most about Pearl's article is that it explicitly recognizes the importance of fully dissolving the paradox,³ and seems to do so. Simpson's Paradox isn't an unsolvable problem in statistics; it's a straightforward reversal effect, but only if you use the language of causality.

1. My review of Causality gives a taste of what it would look like to be familiar with the math, but you'd need to actually read the book to pick it up. The Highly Advanced Epistemology 101 for Beginners sequence is relevant, and contains Eliezer's attempt to explain the basics of causality in Causal Diagrams and Causal Models.

2. Pearl discusses how you would go about using simulations to show that the do-calculus gives you the right result, but leaves it as an exercise for the reader.

3. How An Algorithm Feels From Inside is probably a better place to start than Dissolving the Question, and I can't help but echo a question from it: "So what kind of mind design corresponds to [Simpson's Paradox]?"

See also: bentarm's explanation of Simpson's Paradox.


20 Comments

Vaniver (13y)

Since Pearl asserts the first two, do I have to get rid of the idea that more qualifications lead to more pay? I can't see any other way out of the bind.

I believe so, with the caveat that this could be a reversal effect. That is, qualifications and pay may be positively correlated for the whole group because men have more of both than women, while for each subgroup the correlations are negative.

Consider the following situation:

Men have 60 points to spend at character creation. Each point can either be used on a year of schooling, or a dollar of salary, with a minimum of 10 in each.

Women have 30 points to spend at character creation. Each point can either be used on a year of schooling, or a dollar of salary, with a minimum of 10 in each.

Now Bob says, "Look! If we look at groups determined by salary, each man is more qualified than women in his cohort, by thirty years of schooling." Barbara says, "Look! If we look at groups determined by schooling, each woman earns less than men in her cohort, by thirty dollars."

If most people choose to spend their points equally, then the population will be dominated by the points (15,15) and (30,30), and so the Association of Higher Education will say "Look! Schooling and salary are positively correlated."

The causal diagram in this situation is clear, though: being male leads to more points, while the direct effect of schooling on salary is negative, because both are paid for from the same pool of points.
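To make the reversal concrete, here is a minimal sketch of the character-creation model in Python. The budgets and the minimum of 10 come from the setup above; the small spread of deviations from an equal split (`OFFSETS`) is my own assumption standing in for "most people choose to spend their points equally," and the helper names are made up for illustration.

```python
from math import sqrt

# Budgets and minimum are taken from the toy model above.
BUDGET = {"men": 60, "women": 30}
MINIMUM = 10
# Assumption: deviations from an equal split, with most people at or near zero.
OFFSETS = [-2, -1, 0, 0, 0, 1, 2]


def population(group):
    """Return (schooling, salary) pairs for one group."""
    half = BUDGET[group] // 2
    people = []
    for d in OFFSETS:
        # Every point moved into schooling is a point taken out of salary.
        schooling, salary = half + d, half - d
        assert schooling >= MINIMUM and salary >= MINIMUM
        people.append((schooling, salary))
    return people


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


men = population("men")
women = population("women")
r_men = pearson(men)
r_women = pearson(women)
r_pooled = pearson(men + women)

print(f"men: {r_men:+.2f}  women: {r_women:+.2f}  pooled: {r_pooled:+.2f}")
```

Within each group the correlation is exactly -1 by construction, since schooling and salary trade off one for one. The pooled correlation comes out strongly positive, because the gap between the two clusters swamps the small spread inside each one. Widening `OFFSETS` enough would eventually flip the pooled sign too, so the positive result depends on the assumed concentration around the equal split.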

cousin_it (13y)

That's a great explanation, thanks for writing it! From now on, I will use your explanation instead of mine.

tgb (13y)

Thanks for the insightful comment. I hadn't considered that particular application of Simpson's paradox. But really, I don't think this is that likely, is it? I mean, you're letting me keep one statement I like, "qualifications correlate with earnings in general," but making me give up two statements that I find likely: "qualifications correlate with earnings for males (resp. females)." This paper looks like it says that qualifications are correlated with earnings for each subgroup. See the tables on pages 21 and 22. I say "looks like" since I haven't actually read it and just skipped to the tables. I hope to get a chance to look at it more in depth soon.
