magfrump comments on 2012 Survey Results - Less Wrong

80 Post author: Yvain 07 December 2012 09:04PM

Comment author: gwern 30 November 2012 02:13:41AM *  9 points

> This still suffers from selection bias - I'd imagine that people with lower IQ are more likely to leave the field blank than people with higher IQ.

I think this is only true if we're going to also assume that the selection bias is operating on ACT and SAT scores. But we know they correlate with IQ, and quite a few respondents included ACT/SAT1600/SAT2400 data while they didn't include the IQ; so all we have to do is take for each standardized test the subset of people with IQ scores and people without, and see if the latter have lower scores indicating lower IQs. The results seem to indicate that while there may be a small difference in means between the groups on the 3 scores, it's neither of large effect size nor statistical significance.

ACT:

R> lwa <- subset(lw, !is.na(as.integer(ACTscoreoutof36)))
R> lwiq <- subset(lwa, !is.na(as.integer(IQ)))
R> lwiqnot <- subset(lwa, is.na(as.integer(IQ)))
R> t.test(lwiq$ACTscoreoutof36, lwiqnot$ACTscoreoutof36, alternative="less")
Welch Two Sample t-test

data:  lwiq$ACTscoreoutof36 and lwiqnot$ACTscoreoutof36
t = 0.5088, df = 141.9, p-value = 0.6942
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
   -Inf 0.7507
sample estimates:
mean of x mean of y
    32.68     32.50

Original SAT:

R> lwa <- subset(lw, !is.na(as.integer(SATscoresoutof1600)))
R> lwiq <- subset(lwa, !is.na(as.integer(IQ)))
R> lwiqnot <- subset(lwa, is.na(as.integer(IQ)))
R> t.test(lwiq$SATscoresoutof1600, lwiqnot$SATscoresoutof1600, alternative="less")
Welch Two Sample t-test

data:  lwiq$SATscoresoutof1600 and lwiqnot$SATscoresoutof1600
t = -1.137, df = 237.4, p-value = 0.1284
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
  -Inf 6.607
sample estimates:
mean of x mean of y
     1476      1490

New SAT:

R> lwa <- subset(lw, !is.na(as.integer(SATscoresoutof2400)))
R> lwiq <- subset(lwa, !is.na(as.integer(IQ)))
R> lwiqnot <- subset(lwa, is.na(as.integer(IQ)))
R> t.test(lwiq$SATscoresoutof2400, lwiqnot$SATscoresoutof2400, alternative="less")
Welch Two Sample t-test

data:  lwiq$SATscoresoutof2400 and lwiqnot$SATscoresoutof2400
t = -0.9645, df = 129.9, p-value = 0.1683
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
  -Inf 109.3
sample estimates:
mean of x mean of y
     2221      2374
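
For anyone who wants to rerun this, here is a rough sketch (not part of the session above) that wraps the same comparison in one helper and also computes a standardized effect size, since the transcripts only show means and p-values. The helper name `compare_by_iq`, the choice of Cohen's d with a pooled standard deviation, and the synthetic `toy` data frame are all assumptions made for illustration; only the column names and the `t.test(..., alternative="less")` call are taken from the session.

```r
# Hypothetical helper (not from the original analysis): compare one test-score
# column between respondents who reported an IQ and those who left it blank.
compare_by_iq <- function(df, score_col, iq_col = "IQ") {
  # Coerce defensively; blank or non-numeric entries become NA.
  score <- suppressWarnings(as.numeric(as.character(df[[score_col]])))
  iq    <- suppressWarnings(as.numeric(as.character(df[[iq_col]])))

  with_iq    <- score[!is.na(score) & !is.na(iq)]
  without_iq <- score[!is.na(score) &  is.na(iq)]
  n1 <- length(with_iq)
  n2 <- length(without_iq)

  # Cohen's d with a pooled SD (an assumed choice of effect-size measure).
  pooled_sd <- sqrt(((n1 - 1) * var(with_iq) + (n2 - 1) * var(without_iq)) /
                    (n1 + n2 - 2))
  d <- (mean(with_iq) - mean(without_iq)) / pooled_sd

  # Same test as in the session: Welch t-test, one-sided "less".
  tt <- t.test(with_iq, without_iq, alternative = "less")

  list(n_with = n1, n_without = n2,
       mean_with = mean(with_iq), mean_without = mean(without_iq),
       cohens_d = d, p_value = tt$p.value)
}

# Synthetic stand-in for the survey data, NOT real responses:
# 60 rows with an IQ filled in, 40 rows with it left blank.
set.seed(1)
toy <- data.frame(
  IQ = c(rnorm(60, mean = 138, sd = 10), rep(NA, 40)),
  ACTscoreoutof36 = pmin(36, round(rnorm(100, mean = 32, sd = 2)))
)

res <- compare_by_iq(toy, "ACTscoreoutof36")
print(res)
```

On the real data this would presumably be called as `compare_by_iq(lw, "SATscoresoutof1600")`, assuming the columns parse as numbers the same way they do in the session.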

The lack of variation is unsurprising since the (original) SAT and ACT are correlated, after all:

R> lwa <- subset(lw, !is.na(as.integer(ACTscoreoutof36)))
R> lwsat <- subset(lwa, !is.na(as.integer(SATscoresoutof1600)))
R> cor.test(lwsat$SATscoresoutof1600, lwsat$ACTscoreoutof36)
Pearson's product-moment correlation

data:  lwsat$SATscoresoutof1600 and lwsat$ACTscoreoutof36
t = 8.839, df = 66, p-value = 8.415e-13
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6038 0.8291
sample estimates:
   cor
0.7362

Comment author: magfrump 01 December 2012 04:15:47AM 2 points

I'm interested in this analysis but I don't think the results are presented nicely, and I am not THAT interested. If someone else wants to summarize the parent I promise to upvote you.

Comment author: gwern 01 December 2012 04:42:33AM 5 points

I... thought I did summarize it nicely:

> But we know they correlate with IQ, and quite a few respondents included ACT/SAT1600/SAT2400 data while they didn't include the IQ; so all we have to do is take for each standardized test the subset of people with IQ scores and people without, and see if the latter have lower scores indicating lower IQs. The results seem to indicate that while there may be a small difference in means between the groups on the 3 scores, it's neither of large effect size nor statistical significance.

Comment author: magfrump 01 December 2012 05:03:46AM 4 points

That is actually better than I remembered immediately after reading it; with the data coming after the discussion my brain pattern-completed to expect a conclusion after the data. Also the paragraph is a little bit dense; a paragraph break before the last sentence might make it a little more readable in my mind.

I had already upvoted your post, regardless :)