I have a question that I can't work out. From Pearl's Causality book (the 2000 version with the excellent commentary in the back), I read on page 356:
Another example involves a controversy called "reverse regression," which occupied the social science literature in the 1970s. Should we, in salary discrimination cases, compare salaries of equally qualified men and women or instead compare qualifications of equally paid men and women?
Remarkably, the two choices led to opposite conclusions. It turned out that men earned a higher salary than equally qualified women and, simultaneously, men were more qualified than equally paid women. The moral is that all conclusions are extremely sensitive to which variables we choose to hold constant when we are comparing, and that is why the adjustment problem is so critical in the analysis of observational studies.
My problem is that I cannot imagine a world in which men earn more than equally qualified women, men are more qualified than equally paid women, and more qualified men (respectively, women) are paid more than less qualified men (respectively, women). There does not appear to be a set of points in the space (Wages) x (Qualifications) x (Genders) where all of these conditions hold true. Since Pearl asserts the first two, do I have to get rid of the idea that more qualifications lead to more pay? I can't see any other way out of the bind.
(My reasoning for why this appears to be impossible: start by assuming the first two conditions, i.e. Pearl's assertions. Consider a man with some qualifications and pay. A woman A who is as qualified as him earns less. A woman B who earns as much as him is less qualified. But the slope of the qualifications-wages line between woman A and woman B goes the wrong way for qualifications to be positively correlated with wages: the less qualified woman earns more! So if this is possible, there's something quite unintuitive going on with the distributions.)
Let's take a world with 10 people and 4 jobs:
If you control for education:
... and if you control for pay:
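As a separate check, here is a minimal sketch in Python of a different, purely hypothetical 12-person world. It is not the 10-person world above: the counts are invented for illustration, and the names `people`, `FIELD`, and `group_means` are my own. It compares group averages, which is what "controlling for" a variable does, instead of comparing individual points.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical world, invented for illustration: (gender, qualification, pay),
# with qualification and pay each scored 1 (low) or 2 (high).
people = (
    [("M", 1, 1)] * 1 + [("M", 1, 2)] * 1 + [("M", 2, 1)] * 1 + [("M", 2, 2)] * 3
    + [("F", 1, 1)] * 3 + [("F", 1, 2)] * 1 + [("F", 2, 1)] * 1 + [("F", 2, 2)] * 1
)

# Position of each numeric field inside a (gender, qualification, pay) tuple.
FIELD = {"qualification": 1, "pay": 2}

def group_means(rows, hold, measure):
    """Average `measure` within each (level of `hold`, gender) cell."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row[FIELD[hold]], row[0])].append(row[FIELD[measure]])
    return {cell: mean(values) for cell, values in cells.items()}

# Hold qualification fixed and compare pay; then hold pay fixed and compare qualification.
pay_given_qual = group_means(people, hold="qualification", measure="pay")
qual_given_pay = group_means(people, hold="pay", measure="qualification")

for level in (1, 2):
    print("qualification", level, "mean pay M/F:",
          pay_given_qual[(level, "M")], pay_given_qual[(level, "F")])
    print("pay", level, "mean qualification M/F:",
          qual_given_pay[(level, "M")], qual_given_pay[(level, "F")])
```

In this toy world, men average 1.5 and 1.75 in pay at qualification levels 1 and 2, against 1.25 and 1.5 for women. Men also average 1.5 and 1.75 in qualification at pay levels 1 and 2, against 1.25 and 1.5 for women. Average pay still rises with qualification within each gender. All three conditions hold at once because they are statements about averages over scattered points, not about any single man compared with a single woman A and a single woman B.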
An article by Judea Pearl, available here. It's quick at 8 pages, and worth reading if you enjoy statistics (though I think people who already are familiar with the math of causality[1] will get more out of it than others[2]). I'll talk here about the part that I think is generally interesting:
I've never really liked the name "paradox," because what it seems to mean is "unintuitive phenomenon." (Wikipedia puts it as "something which seems false and yet might be true.") The trouble is that "unintuitive" is a two-place word, and it makes sense to think like reality, so that true things seem true to you, instead of still seeming false. (For example, when I first learned about Zeno's Paradox, I already knew calculus, and so Zeno's position was the one that seemed confusing and false.)
What I like most about Pearl's article is that it explicitly recognizes the importance of fully dissolving the paradox,[3] and seems to do so. Simpson's Paradox isn't an unsolvable problem in statistics; it's a straightforward reversal effect, but only if you use the language of causality.
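To make the "reversal effect" concrete, here is a minimal sketch in Python using hypothetical recovery counts. The numbers, the `counts` table, and the helper names are all invented for illustration and are not taken from Pearl's article. It only shows the arithmetic of the reversal. Deciding whether the stratified or the pooled comparison is the right one to act on is the causal question, and the arithmetic alone does not settle it.

```python
# Hypothetical counts, invented for illustration:
# (recovered, total) for each arm within each severity stratum.
counts = {
    "treated": {"mild": (9, 10), "severe": (30, 100)},
    "control": {"mild": (80, 100), "severe": (2, 10)},
}

def stratum_rate(arm, stratum):
    """Recovery rate for one arm within one severity stratum."""
    recovered, total = counts[arm][stratum]
    return recovered / total

def pooled_rate(arm):
    """Recovery rate for one arm with the strata lumped together."""
    recovered = sum(r for r, _ in counts[arm].values())
    total = sum(t for _, t in counts[arm].values())
    return recovered / total

for stratum in ("mild", "severe"):
    print(stratum, "treated:", stratum_rate("treated", stratum),
          "control:", stratum_rate("control", stratum))
print("pooled treated:", round(pooled_rate("treated"), 3),
      "control:", round(pooled_rate("control"), 3))
```

With these made-up counts the treated group does better within each stratum (0.9 vs 0.8 for mild cases, 0.3 vs 0.2 for severe cases) but worse when pooled (39/110 vs 82/110), because most treated patients are severe cases and most control patients are mild ones.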
1. My review of Causality gives a taste of what it would look like to be familiar with the math, but you'd need to actually read the book to pick it up. The Highly Advanced Epistemology 101 for Beginners sequence is relevant, and contains Eliezer's attempt to explain the basics of causality in Causal Diagrams and Causal Models.
2. Pearl discusses how you would go about using simulations to show that do calculus gives you the right result, but leaves it as an exercise for the reader.
3. How An Algorithm Feels From Inside is probably a better place to start than Dissolving the Question, and I can't help but echo a question from it: "So what kind of math design corresponds to [Simpson's Paradox]?"
See also: bentarm's explanation of Simpson's Paradox.