Comment author:gwern
20 February 2013 09:24:34PM
4 points

> The tl;dr version is that the effect is going to be small unless you have a very inaccurate test, and it's suspicious to focus on a small effect when there's probably other, larger effects we could be looking at.

Yes, the effect is small in absolute magnitude: if you look at the example SAT shrinking that Vaniver and I were working out, the difference between the male and female shrunk scores is about 5 points (probably an underestimate, since it ignores the difference in variance and looks only at means). But those 5 points could make a big difference depending on how the score is used or what other differences you look at.

For example, not shrinking could lead to a number of girls getting into Harvard who otherwise would not have: Harvard has so many applicants, all with very high SAT scores, that there could well be a noticeable effect on the margin. When you're looking at something like 30 applications for each seat, 10 SAT points could be the difference between success and failure for a few applicants.

One could probably estimate how many by looking for logistic regressions of 'SAT score vs admission chance', seeing how much 10 points is worth, and multiplying against the number of applicants. 35k applicants in 2011 for 2.16k spots. One logistic regression has a 'model 7' taking into account many factors where going from 1300 to 1600 goes from an odds ratio of 1.907 to 10.381; so if I'm interpreting this right, an extra 10pts on your total SAT is worth an odds ratio of `((10.381 - 1.907) / (1600-1300)) * 10 + 1 = 1.282`. So the members of a group given a 10pt gain are each 1.28x more likely to be admitted than they were before; before, they had a `2.16/35 = 6.17%` chance, and now they have a `(1.28 * 2.16) / 35 = 2.76 / 35 = 7.89%` chance. To finish the analysis: if 17.5k boys apply and 17.5k girls apply and 6.17% of the boys are admitted while 7.89% of the girls are admitted, then there will be an extra `(17500 * 0.0789) - (17500 * 0.0617) = 301` girls.

(A boost of more than 1% leading to 301 additional girls on the margin sounds too high to me. Probably I did something wrong in manipulating the odds ratios.)
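One candidate for what went wrong, as a rough sketch rather than a full diagnosis: an odds ratio scales odds, `p / (1 - p)`, not the probability itself, so multiplying 6.17% directly by 1.28 overstates the gain somewhat. The snippet below takes the 1.282 figure and the 2.16k/35k base rate above as given (it does not check whether 1.282 is itself the right odds ratio), and the function name `apply_odds_ratio` is just an illustrative choice.

```python
def apply_odds_ratio(p: float, odds_ratio: float) -> float:
    """Return the probability obtained by scaling the odds of p by odds_ratio."""
    odds = p / (1.0 - p)             # probability -> odds
    new_odds = odds * odds_ratio     # odds ratios multiply odds
    return new_odds / (1.0 + new_odds)  # odds -> probability


# Figures taken from the comment above; 1.282 is assumed, not verified.
base_rate = 2160 / 35000
odds_ratio = 1.282
applicants_per_group = 17500

boosted = apply_odds_ratio(base_rate, odds_ratio)
naive = base_rate * odds_ratio  # treats the odds ratio as a probability multiplier

extra_via_odds = applicants_per_group * (boosted - base_rate)
extra_naive = applicants_per_group * (naive - base_rate)

print(f"boosted admission chance: {boosted:.4f} (naive: {naive:.4f})")
print(f"extra admits via odds: {extra_via_odds:.0f} (naive: {extra_naive:.0f})")
```

At a base rate this low, odds and probabilities are close, so this correction alone only trims the estimate; it does not by itself make the number small.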

One could make the same point about means of bell curves differing a little bit: it may lead to next to no real difference towards the middle, but out on the tails it can lead to absurd differentials. I think I once calculated that a difference of one standard deviation in IQ between groups A and B means that at the usual cutoff for 'genius' (3 deviations out for A, which is 4 deviations out for B) the two groups differ by ~50x. One sd is a lot and certainly not comparable to 10 points on the SAT, but you see what I mean.
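The tail figure is easy to check, at least under the simplifying assumption that both groups are exactly normal with the same standard deviation and means one sd apart. A minimal sketch using only the standard library (`normal_tail` and `tail_ratio` are hypothetical helper names):

```python
import math


def normal_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tail_ratio(cutoff_sd: float, mean_gap_sd: float) -> float:
    """Ratio of the fractions of two groups above a shared cutoff.

    cutoff_sd is the cutoff in sds above the higher-mean group's mean;
    the other group's mean sits mean_gap_sd lower, so the same cutoff is
    cutoff_sd + mean_gap_sd sds out for that group.
    """
    return normal_tail(cutoff_sd) / normal_tail(cutoff_sd + mean_gap_sd)


ratio = tail_ratio(3.0, 1.0)
print(f"fraction above 3 sd vs above 4 sd: {ratio:.1f}x")
```

Under those assumptions the ratio comes out in the low 40s, the same ballpark as the ~50x remembered above; unequal variances would change it.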

> But if my first draw is a red marble

How do you know your first draw is a red marble?

> BUT, you'd still be a fool to exclude ALL red candidates on that basis, since you also know that you should legitimately have red candidates in your pool, and by accepting red candidates you increase the overall number of programmers you have access to.

Depends on what you're going to do with them, I suppose... If you can only hire 1 weasel, you'll be better off going with one of the blue weasels, no? While if you're just giving probabilities (I'm straining to think of how to continue the analogy: maybe the weasels are floating Hanson-style student loans on prediction markets and you want to see how to buy or sell their interest rates), sure, you just mark down your estimated probability by 1% or whatever.

> If we assume two populations, red-weasel-haters and rationalists, we could even run Bayes' Theorem and conclude that anyone who goes around feeling the need to point out that 1% difference is SIGNIFICANTLY more likely to be a red-weasel-hater, not a rationalist.

Alas! When red-weasel-hating is supported by statistics, only people interested in statistics will be hating on red-weasels. :)

Comment author:VincentYu
21 February 2013 05:49:46AM
0 points

> One logistic regression has a 'model 7' taking into account many factors where going from 1300 to 1600 goes from an odds ratio of 1.907 to 10.381; so if I'm interpreting this right, an extra 10pts on your total SAT is worth an odds ratio of ((10.381 - 1.907) / (1600-1300)) * 10 + 1 = 1.282.

Aren't odds ratios multiplicative? It also seems to me that we should take the center of the SAT score bins to avoid an off-by-one bin width bias, so (10.381 / 1.907) ^ (10 / (1550 - 1350)) = 1.088. (Or compute additively with log-odds.)

As Vaniver mentioned, this estimate varies across the SAT score bins. If we look only at the top two SAT bins in Model 7: (10.381 / 4.062) ^ (10 / (1550 - 1450)) = 1.098.
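The two figures above can be reproduced with a short sketch. It assumes, as the calculation above does, that log-odds change linearly in SAT score between bin centers; the function names are made up for illustration, and the odds ratios are the Model 7 values quoted in this thread.

```python
import math


def odds_ratio_per_step(or_low, or_high, score_low, score_high, step=10.0):
    """Geometric interpolation: the odds ratio for `step` points between two bin centers."""
    steps = (score_high - score_low) / step
    return (or_high / or_low) ** (1.0 / steps)


def odds_ratio_per_step_logodds(or_low, or_high, score_low, score_high, step=10.0):
    """Same quantity, computed additively on the log-odds scale."""
    slope = (math.log(or_high) - math.log(or_low)) / (score_high - score_low)
    return math.exp(slope * step)


# Bin centers 1350 and 1550 (bottom and top bins), then 1450 and 1550 (top two bins).
full_range = odds_ratio_per_step(1.907, 10.381, 1350, 1550)
top_two = odds_ratio_per_step(4.062, 10.381, 1450, 1550)

print(f"per 10 points, full range: {full_range:.3f}")
print(f"per 10 points, top two bins: {top_two:.3f}")
```

The multiplicative and log-odds versions are the same computation written two ways, which is why they agree.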

Note that within the logistic model, they binned their SAT score data and regressed on them as dichotomous indicator variables, instead of using the raw scores and doing polynomial/nonparametric regression (I presume they did this to simplify their work because all other predictor variables are dichotomous).

Comment author:gwern
21 February 2013 04:13:32PM
0 points

> Aren't odds ratios multiplicative? It also seems to me that we should take the center of the SAT score bins to avoid an off-by-one bin width bias, so (10.381 / 1.907) ^ (10 / (1550 - 1350)) = 1.088. (Or compute additively with log-odds.)

Yeah; Vaniver already did it via log odds.

> If we look only at the top two SAT bins in Model 7: (10.381 / 4.062) ^ (10 / (1550 - 1450)) = 1.098.

Which is higher than the 1.088 from the full 1350-1550 range, so I guess that makes the full-range figure an underestimate (fine by me).

> Note that within the logistic model, they binned their SAT score data and regressed on them as dichotomous indicator variables, instead of using the raw scores and doing polynomial/nonparametric regression

Alas! I just went with the first paper on Harvard I found in Google which did a logistic regression involving SAT scores (well, second: the first one confounded scores with being legacies and minorities and so wasn't useful). There may be a more useful paper out there.
