Understanding Simpson's Paradox

Vaniver

An article by Judea Pearl, available here. It's quick at 8 pages, and worth reading if you enjoy statistics (though I think people who are already familiar with the math of causality¹ will get more out of it than others²). I'll talk here about the part that I think is generally interesting:

Any claim to a resolution of a paradox, especially one that has resisted a century of attempted resolution must meet certain criteria. First and foremost, the solution must explain why people consider the phenomenon surprising or unbelievable. Second, the solution must identify the class of scenarios in which the paradox may surface, and distinguish it from scenarios where it will surely not surface. Finally, in those scenarios where the paradox leads to indecision, we must identify the correct answer, explain the features of the scenario that lead to that choice, and prove mathematically that the answer chosen is indeed correct. The next three subsections will describe how these three requirements are met in the case of Simpson's paradox and, naturally, will proceed to convince readers that the paradox deserves the title "resolved."

I've never really liked the name "paradox," because what it seems to mean is "unintuitive phenomenon." (Wikipedia puts it as "something which seems false and yet might be true.") The trouble is that "unintuitive" is a two-place word, and it makes sense to think like reality, so that true things seem true to you, instead of still seeming false. (For example, when I first learned about Zeno's Paradox, I already knew calculus, and so Zeno's position was the one that seemed confusing and false.)

What I like most about Pearl's article is that it explicitly recognizes the importance of fully dissolving the paradox,³ and seems to do so. Simpson's Paradox isn't an unsolvable problem in statistics; it's a straightforward reversal effect, but only if you use the language of causality.
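
To make "reversal effect" concrete, here is a minimal sketch with made-up counts. None of these numbers come from Pearl's article; the subgroup labels, the counts, and the helper names (`pooled_rate`, `subgroup_rate`, `adjusted_rate`) are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical (recovered, total) counts, invented only to exhibit a reversal.
COUNTS = {
    "mild":   {"treated": (18, 20), "untreated": (64, 80)},
    "severe": {"treated": (40, 80), "untreated": (8, 20)},
}

def subgroup_rate(group, arm):
    """Recovery rate for one arm inside one subgroup."""
    recovered, total = COUNTS[group][arm]
    return Fraction(recovered, total)

def pooled_rate(arm):
    """Recovery rate for one arm with the subgroups lumped together."""
    recovered = sum(g[arm][0] for g in COUNTS.values())
    total = sum(g[arm][1] for g in COUNTS.values())
    return Fraction(recovered, total)

def adjusted_rate(arm):
    """Subgroup rates weighted by each subgroup's share of the whole population.

    This weighting is only the right causal answer under the ASSUMPTION that
    severity is a pre-treatment confounder; the counts alone cannot tell you that.
    """
    population = sum(n for g in COUNTS.values() for (_, n) in g.values())
    result = Fraction(0)
    for name, g in COUNTS.items():
        size = sum(n for (_, n) in g.values())
        result += Fraction(size, population) * subgroup_rate(name, arm)
    return result

if __name__ == "__main__":
    for group in COUNTS:
        print(group, subgroup_rate(group, "treated"), subgroup_rate(group, "untreated"))
    print("pooled", pooled_rate("treated"), pooled_rate("untreated"))
    print("adjusted", adjusted_rate("treated"), adjusted_rate("untreated"))
```

With these counts the treated arm does better inside each subgroup and worse once the subgroups are pooled. Which comparison answers the question "should I treat?" depends on the causal story behind the subgroups, which is why the data alone doesn't settle it.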

1. My review of Causality gives a taste of what it would look like to be familiar with the math, but you'd need to actually read the book to pick it up. The Highly Advanced Epistemology 101 for Beginners sequence is relevant, and contains Eliezer's attempt to explain the basics of causality in Causal Diagrams and Causal Models.

2. Pearl discusses how you would go about using simulations to show that do calculus gives you the right result, but leaves it as an exercise for the reader.

3. How An Algorithm Feels From Inside is probably a better place to start than Dissolving the Question, and I can't help but echo a question from it: "So what kind of math design corresponds to [Simpson's Paradox]?"
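
One way to attempt the exercise from footnote 2 is sketched below. This is not Pearl's procedure, just a toy simulation under assumptions chosen for illustration: a binary confounder `z`, a treatment `x` that is more likely when `z` is 1, and an outcome whose probability is 0.8 + 0.1·x − 0.4·z. The function name `simulate` and every probability in it are hypothetical.

```python
import random

def simulate(n=200_000, seed=0):
    """Compare a naive contrast, a z-adjusted contrast, and a simulated do(x)."""
    rng = random.Random(seed)

    def outcome(x, z):
        # Assumed structural equation: treatment helps by 0.1, z hurts by 0.4.
        return rng.random() < 0.8 + 0.1 * x - 0.4 * z

    # counts[(z, x)] = [units, recoveries] in the observational data
    counts = {(z, x): [0, 0] for z in (0, 1) for x in (0, 1)}
    do_recovered = {0: 0, 1: 0}
    for _ in range(n):
        z = int(rng.random() < 0.5)
        # Observational regime: treatment choice depends on the confounder.
        x = int(rng.random() < (0.8 if z else 0.2))
        cell = counts[(z, x)]
        cell[0] += 1
        cell[1] += outcome(x, z)
        # Interventional regime: force x on the same unit, ignoring its usual cause.
        for forced in (0, 1):
            do_recovered[forced] += outcome(forced, z)

    def rate(cells):
        return sum(c[1] for c in cells) / sum(c[0] for c in cells)

    naive = (rate([counts[(0, 1)], counts[(1, 1)]])
             - rate([counts[(0, 0)], counts[(1, 0)]]))
    adjusted = 0.0
    for z in (0, 1):
        weight = (counts[(z, 0)][0] + counts[(z, 1)][0]) / n
        adjusted += weight * (rate([counts[(z, 1)]]) - rate([counts[(z, 0)]]))
    do_effect = (do_recovered[1] - do_recovered[0]) / n
    return naive, adjusted, do_effect

if __name__ == "__main__":
    print(simulate())
```

Under these assumed probabilities the naive contrast is negative in expectation (about −0.14), while the adjusted contrast and the simulated intervention both sit near the built-in effect of +0.1. That agreement only holds because `z` really is the sole confounder in this toy model.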

See also: bentarm's explanation of Simpson's Paradox.

Why don't you take his model, as described, and see if it really produces this effect?

The first disagreement is about his commentary on his second figure. I don't have the data to actually calculate the correlation there, but eyeballing it, the groups look like they don't have a positive relationship between education and income anywhere near that of the larger group.

The second disagreement is on interpretation. If you add noise in both dimensions to a multivariate Gaussian model with mean differences between groups, then that impacts any slice of the model (modified by the angle between the mean difference vector and the slice vector). If one subgroup is above and to the right of the other subgroup, that means it's above for every vertical slice and to the right for every horizontal slice. (On northwest-southeast slices, there's no mean difference between the distributions, just population size differences, and the mean difference is maximized on the northeast-southwest slice.)

The particular slicing used in this effect (looking at each vertical slice individually, and each horizontal slice individually) seems reasonable, except that in the presence of mean differences it behaves as a filter that preserves the NW-SE noise!

The grandparent was wrong before I edited it, where I speculated that the noise had to be negatively correlated. That's the claim that the major axis of the covariance ellipse has to be oriented in a particular direction, but that was an overreach, as you see the reverse regression effect if there is any noise along the NW-SE axis. Take a look at Yan's first figure: it has noise in both blues and greens, but it's one-dimensional noise going NE-SW, and so we don't see reverse regression.

My original thought (when I thought you might need the major axis to be NW-SE, rather than just the NW-SE axis to be nonzero) was that this was just a reversal effect, with the noise providing the reversing factor. That's still true but I'm surprised at how benign the restrictions on the noise are.

That is, I disagree with Yan that this has a different origin than Simpson's Paradox, but I agree with Yan that this is an important example of how pernicious reversal effects are, and that noise generates them by default, in some sense. I would demonstrate it with a multivariate Gaussian where the blue mean is [6 6], the green mean is [4 4], and the covariance matrix is [1 .5; .5 1]. That makes it obvious that the dominant relationship for each group is a positive relationship between education and income, but the NW-SE relationship exists and these slices make it visible.
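
A rough sketch of that demonstration, using only the standard library. The means and covariance are the ones given above; treating the first coordinate as education and the second as income, the slice location (5), the slice half-width (0.25), the sample size, and the helper names `sample_group` and `slice_gap` are all assumptions made for illustration.

```python
import math
import random
from statistics import fmean

def sample_group(mean, n, rng, rho=0.5):
    """Draw n (education, income) pairs with unit variances and correlation rho."""
    points = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        education = mean[0] + z1
        # Mixing z1 and z2 this way gives covariance [1 rho; rho 1].
        income = mean[1] + rho * z1 + math.sqrt(1 - rho ** 2) * z2
        points.append((education, income))
    return points

def slice_gap(blue, green, fixed, center=5.0, half_width=0.25):
    """Blue-minus-green mean of the other coordinate within a thin slice.

    fixed=0 holds education near `center` (a vertical slice) and compares income;
    fixed=1 holds income near `center` (a horizontal slice) and compares education.
    """
    other = 1 - fixed

    def slice_mean(points):
        return fmean(p[other] for p in points
                     if abs(p[fixed] - center) <= half_width)

    return slice_mean(blue) - slice_mean(green)

def demo(rho=0.5, n=50_000, seed=0):
    rng = random.Random(seed)
    blue = sample_group((6, 6), n, rng, rho)
    green = sample_group((4, 4), n, rng, rho)
    income_gap = slice_gap(blue, green, fixed=0)     # same education, compare income
    education_gap = slice_gap(blue, green, fixed=1)  # same income, compare education
    return income_gap, education_gap

if __name__ == "__main__":
    print("rho=0.5:", demo(0.5))
    print("rho=1.0:", demo(1.0))
```

With the covariance above, both gaps should come out positive and close to 1: at equal education blue has more income, and at equal income blue has more education. Setting `rho=1.0` collapses the noise onto the NE-SW line, and both gaps shrink to roughly zero, which matches the point about Yan's first figure.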

Hi Vaniver! =D

On the commentary: your eyeballing seems good, but I don't think I ever said anything about relative comparisons between correlation coefficients (only that the overall correlation is positive). As you observed, I could easily make all 3 correlations (blue-only, green-only, or blue+green) positive. I don't have anything interesting to say about their relative degrees.

I don't quite see the difference in interpretation from this writing. I agree with basically all the stuff you've written? The fact that the slicing "behaves as a filter"...
