Manfred comments on 2012 Survey Results - Less Wrong

80 Post author: Yvain 07 December 2012 09:04PM


Comments (640)


Comment author: Manfred 30 November 2012 12:18:49AM *  3 points [-]

Women were on average newer to the community - 21 months vs. 39 for men - but to my surprise a t-test was unable to declare this significant. Maybe I'm doing it wrong?

Well, possibly. The t-distribution is used for "estimating the mean of a normally distributed population," (yay wikipedia) and you're trying to estimate the mean of a slanted-uniformly-distributed-with-a-spike-at-the-beginning population.
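One way to sidestep the normality worry is to compare the t-test with tests that do not lean on it. The following is only a minimal sketch on synthetic, skewed data (exponential draws with means of 39 and 21 months; the group sizes 200 and 30 are made up and this is not the survey data). It runs a label-shuffling permutation test on the difference in means alongside the Welch t-test and the rank-based Wilcoxon test.

```r
# Synthetic, skewed "months in community" data. These are assumptions for
# illustration only, NOT the survey responses.
set.seed(1)
men   <- rexp(200, rate = 1 / 39)  # hypothetical group size 200
women <- rexp(30,  rate = 1 / 21)  # hypothetical group size 30

# Observed difference in group means.
obs <- mean(men) - mean(women)

# Permutation test: shuffle the group labels and recompute the difference.
pooled <- c(men, women)
n_men  <- length(men)
perm_diffs <- replicate(5000, {
  idx <- sample(length(pooled), n_men)
  mean(pooled[idx]) - mean(pooled[-idx])
})

# Two-sided p-value: share of shuffles at least as extreme as the observed one.
p_perm <- mean(abs(perm_diffs) >= abs(obs))

# Welch t-test and Wilcoxon rank-sum test for comparison.
p_t <- t.test(men, women)$p.value
p_w <- wilcox.test(men, women)$p.value

print(c(permutation = p_perm, welch_t = p_t, wilcoxon = p_w))
```

If the three p-values roughly agree, the non-normal shape is probably not what drives the result; if they diverge, the t-test's assumptions deserve a closer look.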

But there is another important consideration, which is that applying more scrutiny to unexpected results gives you systematic error (confirmation bias), and that's bad. To avoid this big problem, any increase in test quality should probably be part of a wholesale reanalysis, i.e. prolly not gonna happen. But there is another route, which is just accepting that your results are imperfect and widening your mental error bars. After all, where does this systematic error come from when you re-analyze unexpected results? It comes from you making mistakes on other things too, but not re-analyzing them! So once you know about the systematic error, you also know about all these other mistakes you have on average made :P

Comment author: gwern 30 November 2012 01:30:52AM *  3 points [-]

Well, possibly. The t-distribution is used for "estimating the mean of a normally distributed population," (yay wikipedia) and you're trying to estimate the mean of a slanted-uniformly-distributed-with-a-spike-at-the-beginning population.

Yeah, it'd have to be some combination of a uniform Poisson (since we don't seem to be growing a lot, per Yvain) and an exponential distribution (constant mortality of users). If we graph histograms, either blunt or fine-grained, it looks like that, but also with weird huge spikes besides the original OB->LW spike:

R> hist(as.numeric(as.character(lw$TimeinCommunity)))

R> hist(as.numeric(as.character(lw$TimeinCommunity)), breaks=50)

But on the plus side, if we look at the genders as a box plot, we discover why the mean is lower for women but there's no significance:

R> lwm <- subset(lw, as.character(Gender)=="M (cisgender)")
R> lwf <- subset(lw, as.character(Gender)=="F (cisgender)")
R> boxplot(as.numeric(as.character(lwm$TimeinCommunity)), as.numeric(as.character(lwf$TimeinCommunity)))

There are, after all, many fewer women.
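To see why the smaller group matters so much, here is a rough sketch of the standard error of a difference in means. The common standard deviation of 30 months and the group sizes are hypothetical numbers chosen for illustration, not values from the survey.

```r
# Standard error of a difference in two group means, assuming both groups
# share the same standard deviation s (a simplifying assumption).
se_diff <- function(n_men, n_women, s = 30) {
  sqrt(s^2 / n_men + s^2 / n_women)
}

# Balanced groups versus one group ten times smaller (hypothetical sizes).
se_balanced   <- se_diff(1000, 1000)
se_unbalanced <- se_diff(1000, 100)

print(c(balanced = se_balanced, unbalanced = se_unbalanced))
```

The `s^2 / n_women` term dominates when `n_women` is small, so the same gap in means ends up with a much wider error bar.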

Comment author: VincentYu 02 December 2012 11:17:10AM 2 points [-]

but also with weird huge spikes besides the original OB->LW spike

The spikes are just due to people estimating in half-years: 12, 18, 24, 30, 36.
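A quick way to check for this kind of rounding is to look at how many answers are exact multiples of 6 months. The sketch below uses synthetic answers in which about half of the respondents round to the nearest half-year; both the distribution and the 50% rounding rate are assumptions, not survey facts.

```r
# Synthetic answers: true tenure in months, with roughly half of respondents
# rounding to the nearest half-year (assumed rate, for illustration only).
set.seed(2)
true_months <- round(rexp(500, rate = 1 / 30))
rounds      <- runif(500) < 0.5
reported    <- ifelse(rounds, 6 * round(true_months / 6), true_months)

# Share of answers that are multiples of 6. Without rounding, only about
# one answer in six would be expected to land on such a value.
share_mult6 <- mean(reported %% 6 == 0)
print(share_mult6)

# The most common reported values; heaping shows up as multiples of 6.
print(head(sort(table(reported), decreasing = TRUE)))
```

On the real data, the same check would be `mean(as.numeric(as.character(lw$TimeinCommunity)) %% 6 == 0, na.rm = TRUE)`, using the `lw` data frame from the commands earlier in the thread.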