army1987 comments on 2012 Survey Results - Less Wrong

80 Post author: Yvain 07 December 2012 09:04PM

Comments (640)

Comment author: [deleted] 01 December 2012 10:16:37PM 8 points

Error bars, please!

Comment author: Kindly 01 December 2012 10:53:53PM * 1 point

I was lazy and ignored all non-numerical IQ comments, so I got slightly different numbers. But my 95% confidence intervals are:

  • 145.18±3.27 in 2009
  • 140.12±1.41 in 2011
  • 138.42±1.33 in 2012

Comment author: gwern 01 December 2012 11:24:31PM * 4 points

The summary data, as mean (standard deviation):

  1. 2009: n=67; 145.88 (14.02)
  2. 2011: n=331; 140.10 (13.07)
  3. 2012: n=346; 138.30 (12.58)

Graphed:

<code>boxplot(lwi2009, lwi2011, lwi2012)</code>

The basic formula for a confidence interval for a population mean is: mean ± (z-score of confidence × (standard deviation / √n)). So with the z-score for 95% confidence, 1.96:

  1. = the range 142.5-149.2
  2. = the range 138.7-141.5
  3. = the range 137-139.6
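
For anyone who wants to redo that arithmetic, here is a minimal sketch that applies the formula above to the three summary rows. It uses only the reported n, mean, and standard deviation; the helper name `ci_from_summary` and the column names are made up for illustration.

```r
# Summary statistics as reported above: n, mean, standard deviation per survey.
surveys <- data.frame(
  year    = c(2009, 2011, 2012),
  n       = c(67, 331, 346),
  mean_iq = c(145.88, 140.10, 138.30),
  sd_iq   = c(14.02, 13.07, 12.58)
)

# Normal-approximation interval: mean ± z × (sd / √n).
# ci_from_summary is a hypothetical helper, not something from the thread.
ci_from_summary <- function(m, s, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
  half_width <- z * s / sqrt(n)
  data.frame(lower = m - half_width, upper = m + half_width)
}

ci <- cbind(surveys["year"],
            ci_from_summary(surveys$mean_iq, surveys$sd_iq, surveys$n))
print(round(ci, 1))
```

Because this uses a z-score rather than a t quantile, the 2009 interval (n=67) is very slightly narrower than a t-based interval would be; at n above 300 the difference is negligible.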

Or we can run the usual t-tests and look at the confidence interval they calculate for the difference; for 2009 & 2012, the 95% CI for the difference in mean IQ is 3.563-10.578:

R> lw2009 <- read.csv("lw-2009.csv")
R> lw2011 <- read.csv("lw-2011.csv")
R> lw2012 <- read.csv("lw-2012.csv")
R> # lwi2009 <- lw2009$IQ[!is.na(lw2009$IQ)]
R> # hand-cleaned:
R> lwi2009 <- c(120,125,128,129,130,130,130,130,130,130,130,130,130,131,132,132,133,134,136,138,138,139,139,140,
140,140,140,140,140,140,140,140,140,141,142,144,145,145,145,148,148,150,150,150,150,152,154,154,
155,155,155,155,156,158,158,160,160,160,160,162,163,164,165,166,170,171,173,180)
R> lwi2011 <- lw2011$IQ[!is.na(lw2011$IQ)]
R> lwi2012 <- lw2012$IQ[!is.na(lw2012$IQ)]
R>
R> t.test(lwi2009, lwi2012)
Welch Two Sample t-test
data: lwi2009 and lwi2012
t = 4.004, df = 91.49, p-value = 0.0001264
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
3.563 10.578
sample estimates:
mean of x mean of y
145.4 138.3
R> t.test(lwi2009, lwi2011)
Welch Two Sample t-test
data: lwi2009 and lwi2011
t = 2.968, df = 94.8, p-value = 0.003791
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
1.752 8.830
sample estimates:
mean of x mean of y
145.4 140.1
R> t.test(lwi2011, lwi2012)
Welch Two Sample t-test
data: lwi2011 and lwi2012
t = 1.804, df = 670.4, p-value = 0.07174
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.1578 3.7174
sample estimates:
mean of x mean of y
140.1 138.3
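
What `t.test` is doing there can be sketched by hand. The following is an illustration of the Welch interval for a difference in means, computed from summary statistics only; `welch_ci` is a hypothetical helper, and the check at the end runs on made-up normal data, not the survey responses.

```r
# Welch two-sample CI for a difference in means, from summary statistics.
# welch_ci is a hypothetical helper, not part of base R or the thread.
welch_ci <- function(m1, s1, n1, m2, s2, n2, level = 0.95) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  se <- sqrt(v1 + v2)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  tcrit <- qt(1 - (1 - level) / 2, df)
  d <- m1 - m2
  list(t = d / se, df = df, ci = c(d - tcrit * se, d + tcrit * se))
}

# Sanity check against t.test() on simulated data (NOT the survey data).
set.seed(1)
x <- rnorm(67, mean = 145, sd = 14)
y <- rnorm(346, mean = 138, sd = 12.5)
ours <- welch_ci(mean(x), sd(x), length(x), mean(y), sd(y), length(y))
ref  <- t.test(x, y)   # Welch is the default (var.equal = FALSE)
print(all.equal(ours$ci, as.numeric(ref$conf.int)))
```

Plugging in the rounded summary figures quoted earlier will not necessarily reproduce the transcript to the last digit, since the transcript works from the raw vectors and reports a 2009 mean of 145.4.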

Comment author: gwern 02 December 2012 12:56:26AM * 2 points

To add a linear model (for those unfamiliar, see my HPMoR examples) which will really just recapitulate the simple averages calculation:

R> lw2009 <- read.csv("lw-2009.csv")
R> lw2011 <- read.csv("lw-2011.csv")
R> lw2012 <- read.csv("lw-2012.csv")
R>
R> # lwi2009 <- lw2009$IQ[!is.na(lw2009$IQ)]
R> # hand-cleaned:
R> lwi2009 <- c(120,125,128,129,130,130,130,130,130,130,130,130,130,131,132,132,133,134,136,138,138,139,139,140,
140,140,140,140,140,140,140,140,141,142,144,145,145,145,148,148,150,150,150,150,152,154,154,
155,155,155,156,158,158,160,160,160,160,162,163,164,165,166,170,171,173,180)
R> lwi2011 <- lw2011$IQ[!is.na(lw2011$IQ)]
R> lwi2012 <- lw2012$IQ[!is.na(lw2012$IQ)]
R>
R> xs <- c(rep(as.Date("2009-03-01"), length(lwi2009)), rep(as.Date("2011-11-01"), length(lwi2011)), rep(as.Date("2012-11-01"), length(lwi2012)))
R> ys <- c(lwi2009, lwi2011, lwi2012)
R> model <- lm(ys ~ xs)
R> summary(model)
Call:
lm(formula = ys ~ xs)
Residuals:
   Min     1Q Median     3Q    Max
-38.29  -8.29  -0.29   6.73  63.81
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 219.49064   19.42751   11.30  < 2e-16
xs           -0.00519    0.00126   -4.11  4.5e-05
Residual standard error: 12.9 on 741 degrees of freedom
Multiple R-squared: 0.0222, Adjusted R-squared: 0.0209
F-statistic: 16.9 on 1 and 741 DF, p-value: 4.48e-05

<code>plot(xs,ys); abline(model)</code>
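
In the spirit of the original request for error bars, here is a rough sketch of a 95% interval for that slope, built only from the estimate, standard error, and residual degrees of freedom printed in the summary. It assumes the `Date` predictor enters the model as a number of days (which is how R stores dates), so the slope is in IQ points per day.

```r
# Values copied from the summary(model) output above.
slope_per_day <- -0.00519
se_per_day    <- 0.00126
resid_df      <- 741

# 95% t-interval for the slope: estimate ± t_crit × standard error.
tcrit <- qt(0.975, resid_df)
ci_per_day <- slope_per_day + c(-1, 1) * tcrit * se_per_day

# Date predictors are numeric days, so scale by 365.25 for a per-year figure.
ci_per_year <- ci_per_day * 365.25
print(round(ci_per_year, 2))
```

That works out to roughly -2.8 to -1.0 IQ points per year. With the fitted object in hand, `confint(model)` should give the same per-day interval without retyping the numbers.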

Comment author: satt 02 December 2012 10:21:25AM 4 points

Note that Epiphany dates the 2009 survey to around March, while the other two surveys happened around November, so inputting the survey dates just as years lowballs the time gap between the first & second surveys. Your linear trend'll be a bit exaggerated.

Comment author: gwern 02 December 2012 06:55:26PM 4 points

I've fixed it as appropriate.

"Your linear trend'll be a bit exaggerated."

Before, the slope was -2.24 (roughly minus 2.25 IQ points a year). Now the slope spits out as -0.00519, but if I'm understanding my changes right, the unit has switched from per year to per day, and 365.25 × -0.00519 IQ points per day is -1.896 per year.

2.25 vs 1.9 is fairly different.
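
The per-day to per-year reading can be checked with a small sketch: fit the same model once on days and once on years, and the year slope should be exactly 365.25 times the day slope. The data here are simulated around the three survey dates and are not the survey responses; the variable names are made up for illustration.

```r
# Made-up data on the same three survey dates; NOT the survey responses.
set.seed(42)
dates <- rep(as.Date(c("2009-03-01", "2011-11-01", "2012-11-01")),
             times = c(67, 331, 346))
days  <- as.numeric(dates)   # R stores Dates as days since 1970-01-01
iq    <- 145 - 0.005 * (days - min(days)) + rnorm(length(days), sd = 13)

fit_days  <- lm(iq ~ days)              # slope in points per day
fit_years <- lm(iq ~ I(days / 365.25))  # slope in points per year

slope_day  <- unname(coef(fit_days)[2])
slope_year <- unname(coef(fit_years)[2])
print(all.equal(slope_day * 365.25, slope_year))

# The same conversion applied to the slope reported in the thread:
print(-0.00519 * 365.25)   # about -1.896
```

Rescaling the predictor only changes the units of the slope; the fit, residuals, and p-value stay the same. It also explains the odd-looking intercept of 219.49: it is the model's extrapolated value at day zero, 1 January 1970.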