Comment author:gwern
20 February 2013 09:24:34PM
4 points

> The tl;dr version is that the effect is going to be small unless you have a very inaccurate test, and it's suspicious to focus on a small effect when there's probably other, larger effects we could be looking at.

Yes, the effect is small in absolute magnitude: if you look at the example SAT shrinking that Vaniver and I were working out, the difference between the male and female shrunk scores is about 5 points (probably an underestimate, since it ignores the difference in variance and looks only at means). But those 5 points could make a big difference depending on how the score is used or what other differences you look at.

For example, not shrinking could lead to a number of girls getting into Harvard who otherwise would not have, since Harvard has so many applicants and they all have very high SAT scores; there could well be a noticeable effect on the margin. When you're looking at like 30 applications for each seat, 10 SAT points could be the difference between success and failure for a few applicants.

One could probably estimate how many by looking for logistic regressions of 'SAT score vs admission chance', seeing how much 10 points is worth, and multiplying against the number of applicants. 35k applicants in 2011 for 2.16k spots. One logistic regression has a 'model 7' taking into account many factors where going from 1300 to 1600 goes from an odds ratio of 1.907 to 10.381; so if I'm interpreting this right, an extra 10pts on your total SAT is worth an odds ratio of `((10.381 - 1.907) / (1600-1300)) * 10 + 1 = 1.282`. So the members of a group given a 10pt gain are each 1.28x more likely to be admitted than they were before; before, they had a `2.16/35 = 6.17%` chance, and now they have a `(1.28 * 2.16) / 35 = 2.76 / 35 = 7.89%` chance. To finish the analysis: if 17.5k boys apply and 17.5k girls apply and 6.17% of the boys are admitted while 7.89% of the girls are admitted, then there will be an extra `(17500 * 0.0789) - (17500 * 0.0617) = 301` girls.

(A boost of more than 1% leading to 301 additional girls on the margin sounds too high to me. Probably I did something wrong in manipulating the odds ratios.)
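For anyone who wants to replay the arithmetic, here is a minimal Python sketch of the steps as stated. The variable names are illustrative and not from any source. Note that it multiplies a probability directly by an odds ratio, which is the step most likely to be at fault, since an odds ratio scales odds rather than probabilities.

```python
# Figures quoted in the comment above.
applicants_k = 35.0            # thousands of applicants (2011)
seats_k = 2.16                 # thousands of spots
or_1300, or_1600 = 1.907, 10.381

# Linear interpolation of the odds ratio per 10 SAT points (the doubtful step).
or_per_10 = (or_1600 - or_1300) / (1600 - 1300) * 10 + 1

base_rate = seats_k / applicants_k
# Treats the odds ratio as a direct multiplier on the admission probability.
boosted_rate = or_per_10 * base_rate

girls = 17_500
extra_girls = girls * (boosted_rate - base_rate)

print(f"odds ratio per 10 pts: {or_per_10:.3f}")
print(f"base rate: {base_rate:.2%}, boosted rate: {boosted_rate:.2%}")
print(f"extra admits: {extra_girls:.0f}")
```

Because the sketch does not round intermediate values, its last figure differs by a few admits from the hand calculation.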

One could make the same point about means of bell curves differing a little bit: it may lead to next to no real difference towards the middle, but out on the tails it can lead to absurd differentials. I think I once calculated that a difference of one standard deviation in IQ between groups A and B leads to a difference out at 3 deviations for A vs 4 deviations for B, what is usually the cutoff for 'genius', of ~50x. One sd is a lot and certainly not comparable to 10 points on the SAT, but you see what I mean.
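The tail ratio is easy to recompute. A minimal sketch, assuming both groups are normally distributed with equal standard deviations and that the cutoff sits 3 sd above group A's mean and 4 sd above group B's (the helper name `upper_tail` is made up for illustration):

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Cutoff is 3 sd above group A's mean and 4 sd above group B's mean.
p_a = upper_tail(3.0)
p_b = upper_tail(4.0)
tail_ratio = p_a / p_b

print(f"P(A above cutoff) = {p_a:.6f}")
print(f"P(B above cutoff) = {p_b:.8f}")
print(f"ratio A:B above cutoff = {tail_ratio:.1f}")
```

Under those assumptions the ratio comes out somewhat below 50, the same order of magnitude as the remembered figure.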

> But if my first draw is a red marble

How do you know your first draw is a red marble?

> BUT, you'd still be a fool to exclude ALL red candidates on that basis, since you also know that you should legitimately have red candidates in your pool, and by accepting red candidates you increase the overall number of programmers you have access to.

Depends on what you're going to do with them, I suppose... If you can only hire 1 weasel, you'll be better off going with one of the blue weasels, no? While if you're just giving probabilities (I'm straining to think of how to continue the analogy: maybe the weasels are floating Hanson-style student loans on prediction markets and you want to see how to buy or sell their interest rates), sure, you just mark down your estimated probability by 1% or whatever.

> If we assume two populations, red-weasel-haters and rationalists, we could even run Bayes' Theorem and conclude that anyone who goes around feeling the need to point out that 1% difference is SIGNIFICANTLY more likely to be a red-weasel-hater, not a rationalist.

Alas! When red-weasel-hating is supported by statistics, only people interested in statistics will be hating on red-weasels. :)

Comment author:Vaniver
20 February 2013 11:13:46PM
3 points

> an extra 10pts on your total SAT is worth an odds ratio of 1.282

We can check this interpretation by taking it to the 30th power, and seeing if we recover something sensible; unfortunately, that gives us an odds ratio of over 1700! If we had their beta coefficients, we could see how much 10 points corresponds to, but it doesn't look like they report it.

Logistic regression is a technique that compresses the real line down to the range between 0 and 1; you can think of that model as the schools giving everyone a score, admitting people above a threshold with probability approximately 1, admitting people below a threshold with probability approximately 0, and then admitting people in between with a probability that increases based on their score (with a score of '0' corresponding to a 50% chance of getting in).
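As a small illustration of that squashing, here is a sketch of the standard logistic function in Python. The function name `admit_probability` and the sample scores are purely illustrative and are not taken from the regression being discussed.

```python
import math

def admit_probability(score: float) -> float:
    """Logistic function: maps any real-valued score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))

# Very negative scores map near 0, very positive scores near 1,
# and a score of 0 maps to exactly one half.
for score in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"score {score:+.1f} -> p = {admit_probability(score):.3f}")
```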

We might be able to recover their beta by taking the log of the odds they report (see here). This gives us a reasonable but not too pretty result, with an estimate that 100 points of SAT is worth a score adjustment of .8. (The actual amount varies for each SAT band, which makes sense if their score for each student nonlinearly weights SAT scores. The jump from the 1400s to the 1500s is slightly bigger than the jump from the 1300s to the 1400s, suggesting that at the upper bands differences in SAT scores might matter more.)

A score increase of .08 cashes out as an odds ratio of 1.083; taking that to the 30th power gives 11.023, which is pretty close to what we'd expect.
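The check can be sketched in a few lines of Python. The last step, converting the odds ratio back into an admission probability, is an extension not worked out in the thread. It assumes every applicant starts from the same 2.16/35 base rate and ignores the fixed number of seats, so treat the resulting head count as a rough illustration only. All names are made up for the example.

```python
import math

beta_per_100 = 0.8                        # score adjustment per 100 SAT points (estimate above)
or_per_10 = math.exp(beta_per_100 / 10)   # odds ratio for a 10-point gain
or_per_300 = or_per_10 ** 30              # compounded over 1300 -> 1600

def apply_odds_ratio(p: float, odds_ratio: float) -> float:
    """Convert p to odds, scale by the odds ratio, convert back to a probability."""
    odds = p / (1.0 - p) * odds_ratio
    return odds / (1.0 + odds)

base_rate = 2.16 / 35                     # assumed uniform base admission rate
boosted_rate = apply_odds_ratio(base_rate, or_per_10)
extra_admits = 17_500 * (boosted_rate - base_rate)

print(f"odds ratio per 10 pts: {or_per_10:.3f}")
print(f"odds ratio over 300 pts: {or_per_300:.3f}")
print(f"base {base_rate:.2%} -> boosted {boosted_rate:.2%}")
print(f"extra admits per 17,500 applicants: {extra_admits:.0f}")
```

Working in odds rather than multiplying the probability directly is what keeps the per-10-point figure consistent when compounded over the full 300-point range.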

> I think I once calculated that a difference of one standard deviation in IQ between groups A and B leads to a difference out at 3 deviations for A vs 4 deviations for B, what is usually the cutoff for 'genius', of ~50x.

Two standard deviations is generally enough to get you into 'gifted and talented' programs, as they call them these days. Four standard deviations gets you to finishing in the top 200 of the Putnam competition, according to Griffe's calculations, which are also great at illustrating male/female ratios at various levels given Project Talent data on math ability.

I'll also note again that the SAT is probably not the best test to use for this; it gives a male/female math ability variance ratio estimate of 1.1, whereas Project Talent estimated it as 1.2. Which estimate you choose makes a big difference in your estimation of the strength of this effect. (Note that, typically, more females than males take the SAT: the cutoff for interest in the SAT is below the population mean, where greater male variability hurts, and other factors contribute as well. This systematic bias in subject selection will show up in the results.)
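To see how much the choice of variance ratio matters in the tails, here is a rough Python sketch. It makes simplifying assumptions that are not in the thread: both distributions are normal, the means are equal, and the cutoff is measured in female standard deviations. The function names are illustrative.

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def male_female_ratio(cutoff_sd: float, variance_ratio: float) -> float:
    """Males per female above a cutoff, ASSUMING equal means and normality.

    cutoff_sd is in female standard deviations above the shared mean;
    variance_ratio is male variance divided by female variance.
    """
    male_sd = math.sqrt(variance_ratio)
    return upper_tail(cutoff_sd / male_sd) / upper_tail(cutoff_sd)

for variance_ratio in (1.1, 1.2):
    for cutoff in (2.0, 3.0, 4.0):
        ratio = male_female_ratio(cutoff, variance_ratio)
        print(f"variance ratio {variance_ratio}, cutoff {cutoff} sd: {ratio:.2f} males per female")
```

Under these assumptions, moving from 1.1 to 1.2 roughly doubles the male/female ratio at a 4 sd cutoff, while changing it much less at 2 sd.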

Comment author:gwern
21 February 2013 04:11:04PM
2 points

Thanks for the odds corrections. I knew I got something wrong...

> Two standard deviations is generally enough to get you into 'gifted and talented' programs, as they call them these days.

G&T stuff, yeah, but in the materials I've read 2sd is not enough to move you from 'bright' or 'gifted and talented' to 'genius' categories, which seem to usually be defined as >2.5-3sd, and using 3sd made the calculation easier.

Comment author:Vaniver
21 February 2013 04:52:30PM
0 points

Eh. MENSA requires upper 2% (which is ~2 standard deviations). Whether you label that 'genius' or 'bright' or something else doesn't seem terribly important. 3.5 standard deviations is the 2.3 out of 10,000 level, which is about a hundred times more restrictive.
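These figures can be checked against a standard normal distribution using Python's standard-library `statistics.NormalDist`. This is a sketch that assumes normality holds out in the tail; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

mensa_fraction = 0.02
# z-score above which the top 2% of a normal distribution lies
mensa_cutoff_sd = std_normal.inv_cdf(1 - mensa_fraction)

# fraction of the population more than 3.5 sd above the mean
tail_3_5 = 1 - std_normal.cdf(3.5)
per_10k = tail_3_5 * 10_000
restrictiveness = mensa_fraction / tail_3_5

print(f"top 2% cutoff: {mensa_cutoff_sd:.2f} sd")
print(f"above 3.5 sd: {per_10k:.1f} per 10,000")
print(f"3.5 sd is about {restrictiveness:.0f}x more restrictive than top 2%")
```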

Comment author:gwern
21 February 2013 04:59:04PM
2 points

I'd call MENSA merely bright... You need something in between 'normal' and 'genius' and bright seems fine. Genius carries all the wrong connotations for something as common as MENSA-level; 2.3 out of 10k seems more reasonable.
