2016 LessWrong Diaspora Survey Analysis: Part One (Meta and Demographics)


2016 LessWrong Diaspora Survey Analysis

Overview

Survey Meta

Introduction

Hello everybody, this is part one in a series of posts analyzing the 2016 LessWrong Diaspora Survey. The survey ran from March 24th to May 1st and had 3083 respondents.

Almost two thousand eight hundred and fifty hours were spent taking the survey this year, and you've all waited nearly two months from the first survey response to the results writeup. While the results have been available for over a week, they haven't been widely disseminated, in large part because they lacked a succinct summary of their contents.

When we started the survey in March, I posted this graph showing the drop-off in question responses over time:

So it seems only reasonable to post the same graph with this year's survey data:

(I should note that this analysis counts certain things as questions that the other chart does not, so it shows many more questions than the previous survey, when in reality there are about as many as last time.)

2016 Diaspora Survey Stats

Survey hours spent in total: 2849.82

Average number of minutes spent on survey: 102.14

Median number of minutes spent on survey: 39.78

Mode minutes spent on survey: 20.27

The takeaway here seems to be that a few people take a very long time with the survey, pulling the average up, while half of respondents finished in under forty minutes. LessWrong does a very long survey, and I wanted to make sure that investment was rewarded with a deep, detailed analysis. At over four thousand lines of Python code, the analysis I've put together will, I hope, be worth the wait.
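Here is a minimal sketch of how figures like the ones above could be computed. It assumes each response carries a start timestamp and a submit timestamp; that field layout is hypothetical, not the survey's actual export format. Durations are wall-clock time between starting and submitting. They are summed for the total and summarized with Python's `statistics` module. The toy data shows how one slow respondent pulls the mean well above the median.

```python
from datetime import datetime, timedelta
from statistics import mean, median, mode

def duration_stats(responses):
    """Summarize survey durations.

    `responses` is an iterable of (start, submit) datetime pairs; this layout
    is an assumption for illustration, not the real export format.
    """
    # Wall-clock minutes between starting and submitting each response.
    minutes = [(submit - start).total_seconds() / 60 for start, submit in responses]
    return {
        "total_hours": sum(minutes) / 60,
        "mean_minutes": mean(minutes),
        "median_minutes": median(minutes),
        "mode_minutes": mode(minutes),  # most common exact duration
    }

# Toy data: three quick responses and one left open for two hours.
start = datetime(2016, 3, 24, 12, 0)
toy = [(start, start + timedelta(minutes=m)) for m in (20, 20, 40, 120)]
stats = duration_stats(toy)
print(stats)
```

On this toy data the mean (50 minutes) sits well above the median (30 minutes). The real results show the same pattern.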

Credits

I'd like to thank people who contributed to the analysis effort:

Bartosz Wroblewski

Kuudes on #lesswrong

Obormot on #lesswrong

Two anonymous contributors

And anybody else who I may have forgotten. Thanks again to Scott Alexander, who wrote the majority of the survey and ran it in 2014, and who has also been generous enough to license his part of the survey under a Creative Commons license along with mine.

Demographics

Age

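The 2014 survey gave these numbers for age (mean ± standard deviation, then the 25th, 50th and 75th percentiles, then the number of responses):

Age: 27.67 ± 8.679 (22, 26, 31) [1490]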

In 2016 the numbers were:

Mean: 28.11
Median: 26
Mode: 23

Most LWers are in their early to mid twenties, with some older LWers pulling the average up. The average is close enough to the 2014 figure that, as a general rule, we can say LWers are in their twenties or thirties.
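To compare 2016 with 2014 like for like, the ages could be summarized in the same notation as the 2014 line. This minimal sketch reads that notation as mean, sample standard deviation, quartiles, and sample size. That reading is an interpretation, not something the survey code confirms. The quartile method, `statistics.quantiles(..., method="inclusive")`, is also an assumption, so results on real data may differ slightly from how the 2014 figures were produced.

```python
from statistics import mean, quantiles, stdev

def summarize_2014_style(values):
    """Format values as 'mean ± sd (q1, median, q3) [n]'.

    Assumption: sample standard deviation and inclusive quartiles; the
    method behind the original 2014 numbers is not known here.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return f"{mean(values):.2f} ± {stdev(values):.2f} ({q1:g}, {q2:g}, {q3:g}) [{len(values)}]"

# Toy ages, not survey data.
print(summarize_2014_style([20, 22, 26, 31, 40]))  # 27.80 ± 8.01 (22, 26, 31) [5]
```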

Sex and Gender

In 2014 our gender ratio looked like this:

Female: 179, 11.9%
Male: 1311, 87.2%

In 2016 the proportion of women in the community went up by over four percentage points:

Male: 2021, 83.5%
Female: 393, 16.2%

One hypothesis on why this happened is that the 2016 survey focused on the diaspora rather than just LW. Diaspora communities plausibly have marginally higher rates of female membership. If I had more time I would write an analysis investigating the demographics of each diaspora community, but to answer this particular question I think a couple of SQL queries are illustrative:

(Note: ActiveMemberships one and two are 'LessWrong' and 'LessWrong Meetups' respectively.)
sqlite> select count(birthsex) from data where (ActiveMemberships_1 = 'Yes' OR ActiveMemberships_2 = 'Yes') AND birthsex = 'Male';
425
sqlite> select count(birthsex) from data where (ActiveMemberships_1 = 'Yes' OR ActiveMemberships_2 = 'Yes') AND birthsex = 'Female';
66
>>> 66 / (425 + 66)
0.13441955193482688
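The two counts and the division can also be done in one pass. This sketch uses Python's built-in `sqlite3` module and groups by birth sex instead of running one query per value. It divides women by men plus women, which is the denominator the footnotes say the first draft got wrong. The table and column names follow the queries above. The in-memory rows are toy data shaped to match the 425 and 66 counts, not the real responses.

```python
import sqlite3

def female_share_active(conn):
    """Share of women among respondents active on LessWrong or at LW meetups."""
    rows = conn.execute(
        """
        SELECT birthsex, COUNT(*) FROM data
        WHERE (ActiveMemberships_1 = 'Yes' OR ActiveMemberships_2 = 'Yes')
          AND birthsex IN ('Male', 'Female')
        GROUP BY birthsex
        """
    ).fetchall()
    counts = dict(rows)  # for the toy data below: {'Female': 66, 'Male': 425}
    women, men = counts.get("Female", 0), counts.get("Male", 0)
    # Divide by men AND women, not by men alone.
    return women / (women + men)

# Toy table shaped to reproduce the counts from the queries above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (birthsex TEXT, ActiveMemberships_1 TEXT, ActiveMemberships_2 TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("Male", "Yes", "No")] * 425
    + [("Female", "No", "Yes")] * 66
    + [("Female", "No", "No")] * 10,  # inactive rows are excluded by the WHERE
)
print(round(female_share_active(conn), 4))  # 0.1344
```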

Well, maybe: 13.4% is higher than the 11.9% from 2014 but below the 16.2% for the 2016 survey as a whole. Of course, before we wring our hands too much over this question, it pays to remember that assigned sex at birth isn't the whole story. The gender question in 2014 had these results:

F (cisgender): 150, 10.0%
F (transgender MtF): 24, 1.6%
M (cisgender): 1245, 82.8%
M (transgender FtM): 5, 0.3%
Other: 64, 4.3%

In 2016:

F (cisgender): 321 13.3%
F (transgender MtF): 65 2.7%
M (cisgender): 1829 76%
M (transgender FtM): 23 1%
Other: 156 6.48%

Some things to note here. 16.2% of respondents were assigned female at birth but only 13.3% still identify as women. 1% are transmen, but where did the other 1.9% go? Presumably into the 'Other' field. Let's find out.

sqlite> select count(birthsex) from data where birthsex = 'Female' AND gender = 'Other';
57
sqlite> select count(*) from data;
3083
>>> 57 / 3083
0.018488485241647746

Seems to be the case. In general, the proportion of men (cis and trans together) is down 6.1 percentage points from 2014. We also gained 1.1 points of transwomen and 0.7 points of transmen in 2016. Moving away from binary genders, this survey's nonbinary ('Other') share grew by nearly 2.2 points. This means that over one in twenty LWers identified as a nonbinary gender, making it a larger demographic than binary transgender LWers! As exciting as that may sound to some ears, the numbers tell one story and the write-ins tell quite another.

It pays to keep in mind that nonbinary genders are a common troll option for people who want to write in criticism of the question. A quick look at the write-ins accompanying the 'Other' option indicates that this is what many people used it for, but by no means all. With only 156 responses, a quick manual tally is feasible.

<table border="0"> <caption>"Other" Genders, Sample Size: 156 </caption> <thead> <tr> <th>Classification</th><th>Count</th> </tr> </thead> <tbody> <tr> <td>Agender</td> <td>35</td> </tr> <tr> <td>Esoteric</td> <td>6</td> </tr> <tr> <td>Female</td> <td>6</td> </tr> <tr> <td>Male</td> <td>21</td> </tr> <tr> <td>Male-To-Female</td> <td>1</td> </tr> <tr> <td>Nonbinary</td> <td>55</td> </tr> <tr> <td>Objection on Basis Gender Doesn't Exist</td> <td>6</td> </tr> <tr> <td>Objection on Basis Gender Is Binary</td> <td>7</td> </tr> <tr> <td>In Process of Transitioning</td> <td>2</td> </tr> <tr> <td>Refusal</td> <td>7</td> </tr> <tr> <td>Undecided</td> <td>10</td> </tr> </tbody> </table>

So, depending on your comfort zone as to what constitutes a countable gender, there are 90 to 96 valid 'other' answers in the survey dataset. (Labeled dataset)

>>> 90 / 3083
0.029192345118391177

With some cleanup, the number trails the binary transgender share by the greater part of a percentage point, but not by much. I bet that if you went through and did the same sort of tally on the 2014 survey results, you'd find that the proportion of valid nonbinary write-ins has gone up since then.
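The tally above is small enough to redo by hand, but a `Counter` makes the arithmetic explicit. Which categories count as valid is a judgment call. The split below uses Agender and Nonbinary for the strict count and adds Esoteric for the loose one. It is one reading that reproduces the 90 and 96 figures, not necessarily the exact rule used for this post.

```python
from collections import Counter

# Manual tally of the "Other" gender write-ins, as listed in the table above.
other_tally = Counter({
    "Agender": 35, "Esoteric": 6, "Female": 6, "Male": 21,
    "Male-To-Female": 1, "Nonbinary": 55,
    "Objection on Basis Gender Doesn't Exist": 6,
    "Objection on Basis Gender Is Binary": 7,
    "In Process of Transitioning": 2, "Refusal": 7, "Undecided": 10,
})
RESPONDENTS = 3083

assert sum(other_tally.values()) == 156  # matches the table's sample size

strict = other_tally["Agender"] + other_tally["Nonbinary"]  # 90
loose = strict + other_tally["Esoteric"]                     # 96
print(f"strict: {strict} ({strict / RESPONDENTS:.2%})")  # strict: 90 (2.92%)
print(f"loose:  {loose} ({loose / RESPONDENTS:.2%})")    # loose:  96 (3.11%)
```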

Some interesting 'esoteric' answers: Attack Helicopter, Blackstar, Elizer, spiderman, Agenderfluid

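For the rest of this section I'm going to focus on differences between the 2016 and 2014 surveys. In the lists below, lines with three figures give the change since 2014 in percentage points, then the 2016 count, then the 2016 share.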

2014 Demographics Versus 2016 Demographics

Country

United States: -1.000% 1298 53.700%
United Kingdom: -0.100% 183 7.600%
Canada: +0.100% 144 6.000%
Australia: +0.300% 141 5.800%
Germany: -0.600% 85 3.500%
Russia: +0.700% 57 2.400%
Finland: -0.300% 25 1.000%
New Zealand: -0.200% 26 1.100%
India: -0.100% 24 1.000%
Brazil: -0.300% 16 0.700%
France: +0.400% 34 1.400%
Israel: +0.200% 29 1.200%
Other: 354 14.646%

[The changes listed above sum to -0.9 percentage points rather than zero, so nearly one point of change is unaccounted for. My hypothesis is that it went into the other countries not on the list. This can't easily be confirmed, because the 2014 analysis does not list a percentage for other countries.]
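The bookkeeping in the note above is easy to check. This sketch sums the listed changes. It assumes the full set of categories covered 100% of respondents in both years. Under that assumption, the changes across all categories must net to zero, so the listed countries' net change implies an equal and opposite change in 'Other'.

```python
# Change since 2014, in percentage points, for each listed country.
changes = {
    "United States": -1.0, "United Kingdom": -0.1, "Canada": +0.1,
    "Australia": +0.3, "Germany": -0.6, "Russia": +0.7, "Finland": -0.3,
    "New Zealand": -0.2, "India": -0.1, "Brazil": -0.3, "France": +0.4,
    "Israel": +0.2,
}

listed_net = sum(changes.values())
# Assumption: all categories sum to 100% in both years, so every change nets
# to zero overall and "Other" must have moved by the opposite amount.
implied_other = -listed_net
print(f"listed countries: {listed_net:+.1f} points")     # listed countries: -0.9 points
print(f"implied 'Other':  {implied_other:+.1f} points")  # implied 'Other':  +0.9 points
```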

Race

Asian (East Asian): -0.600% 80 3.300%
Asian (Indian subcontinent): +0.300% 60 2.500%
Middle Eastern: 0.000% 14 0.600%
Black: -0.300% 12 0.500%
White (non-Hispanic): -0.300% 2059 85.800%
Hispanic: +0.300% 57 2.400%
Other: +1.200% 108 4.500%

Sexual Orientation

Heterosexual: -5.000% 1640 70.400%
Homosexual: +1.300% 103 4.400%
Bisexual: +4.000% 428 18.400%
Other: +3.880% 144 6.180%

[LessWrong got 5.3 percentage points more gay (homosexual plus bisexual), or about 9.2 points if you're looser with the definition and include 'Other'. Before we start any wild speculation: the 2014 question included asexuality as an option, and it got 3.9% of the responses. We spun this off into a separate question on the 2016 survey, which should explain a significant portion of the change.]

Are you asexual?

Yes: 171 7.4%
No: 2129 92.6%

[Scott said in 2014 that he'd probably 'vastly undercounted' our asexual readers. Going from 3.9% to 7.4% is a near doubling, which would seem to support this.]

Relationship Style

Prefer monogamous: -0.900% 1190 50.900%
Prefer polyamorous: +3.100% 426 18.200%
Uncertain/no preference: -2.100% 673 28.800%
Other: +0.426% 45 1.926%

[Polyamorous gained three points; presumably the drop in uncertain people went into that bin.]

Number of Partners

0: -2.300% 1094 46.800%
1: -0.400% 1039 44.400%
2: +1.200% 107 4.600%
3: +0.900% 46 2.000%
4: +0.100% 15 0.600%
5: +0.200% 8 0.300%
Lots and lots: +1.000% 29 1.200%

Relationship Goals

...and seeking more relationship partners: +0.200% 577 24.800%
...and possibly open to more relationship partners: -0.300% 716 30.800%
...and currently not looking for more relationship partners: +1.300% 1034 44.400%

Are you married?

Yes: 443 19.0%
No: 1885 81.0%

[This question appeared in a different form on the previous survey. Marriage went up by 0.8 percentage points since 2014.]

Who do you currently live with most of the time?

Alone: -2.200% 487 20.800%
With parents and/or guardians: +0.100% 476 20.300%
With partner and/or children: +2.100% 687 29.400%
With roommates: -2.000% 619 26.500%

[This would seem to line up with the 2.3-point drop in LWers with no partners.]

How many children do you have?

Sum: 598 or greater
0: +5.400% 2042 87.000%
1: +0.500% 115 4.900%
2: +0.100% 124 5.300%
3: +0.900% 48 2.000%
4: -0.100% 7 0.300%
5: +0.100% 6 0.300%
6: 0.000% 2 0.100%
Lots and lots: 0.000% 3 0.100%

[Interestingly enough, childless LWers went up by 5.4%. This would seem incongruous with the previous results. Not sure how to investigate though.]
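The 'Sum: 598 or greater' line above can be reproduced by multiplying each answer by its count. 'Lots and lots' has no exact value. Treating it as at least seven, the next value above the 6 bin, is an assumption, but it is the one that yields exactly 598. That is why the total is only a lower bound.

```python
# Respondents per answer to "How many children do you have?"
children = {0: 2042, 1: 115, 2: 124, 3: 48, 4: 7, 5: 6, 6: 2}
lots_and_lots = 3

# Assumption: "Lots and lots" means at least 7, since 6 has its own bin.
LOTS_MINIMUM = 7
lower_bound = sum(k * n for k, n in children.items()) + LOTS_MINIMUM * lots_and_lots
print(lower_bound)  # 598
```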

Are you planning on having more children?

Yes: -5.400% 720 30.700%
Uncertain: +3.900% 755 32.200%
No: +2.800% 869 37.100%

[This is an interesting result: either nearly 4% of LWers are suddenly less enthusiastic about having kids, or new entrants to the survey are less likely to want them and less sure whether they do. Possibly both.]

Work Status

Student: -5.402% 968 31.398%
Academics: +0.949% 205 6.649%
Self-employed: +4.223% 309 10.023%
Independently wealthy: +0.762% 42 1.362%
Non-profit work: +1.030% 152 4.930%
For-profit work: -1.756% 954 30.944%
Government work: +0.479% 135 4.379%
Homemaker: +1.024% 47 1.524%
Unemployed: +0.495% 228 7.395%

[The most interesting result here is the 5.4-point drop in students: either LWers who were students in 2014 no longer are, or new survey entrants are less likely to be students.]

Profession

Art: +0.800% 51 2.300%
Biology: +0.300% 49 2.200%
Business: -0.800% 72 3.200%
Computers (AI): +0.700% 79 3.500%
Computers (other academic, computer science): -0.100% 156 7.000%
Computers (practical): -1.200% 681 30.500%
Engineering: +0.600% 150 6.700%
Finance / Economics: +0.500% 116 5.200%
Law: -0.300% 50 2.200%
Mathematics: -1.500% 147 6.600%
Medicine: +0.100% 49 2.200%
Neuroscience: +0.100% 28 1.300%
Philosophy: 0.000% 54 2.400%
Physics: -0.200% 91 4.100%
Psychology: 0.000% 48 2.100%
Other: +2.199% 277 12.399%
Other "hard science": -0.500% 26 1.200%
Other "social science": -0.200% 48 2.100%

[Among the named professions, art grew the most in 2016 (+0.8 points), though this may also be a consequence of new survey entrants.]

What is your highest education credential earned?

None: -0.700% 96 4.200%
High School: +3.600% 617 26.700%
2 year degree: +0.200% 105 4.500%
Bachelor's: -1.600% 815 35.300%
Master's: -0.500% 415 18.000%
JD/MD/other professional degree: 0.000% 66 2.900%
PhD: -0.700% 145 6.300%
Other: +0.288% 39 1.688%

[Hm, the academic credentials of LWers seem to have gone down some since the last survey. As usual, this may also be the result of new survey entrants.]

Footnotes

The 2850-hour estimate of total survey time is very naive. It measures the time between starting and submitting the survey, and a person didn't necessarily sit there for all of that time. For example, it could easily include people who spent multiple days doing other things before finally finishing their survey.
The Apache helicopter image is licensed under the Open Government Licence, which requires attribution. That particular edit was done by Wubbles on the LW Slack.
The first published draft of this post made a basic stats error calculating the proportion of women in active memberships one and two, dividing the number of women by the number of men rather than the number of women by the number of men and women.

ArgleBlargle

Thanks for doing this.

Adrià Garriga-alonso

Gratitude thread.

What a load of work, Ingres. Thank you for doing this.

DanArmak

Thank you, Ingres.

Morgrim

When I was doing the survey I found the 'Highest Education Credential Earned' question difficult because the credentials listed don't match those in my home country, Australia. For example, we have a system of "technical certificates" that fall in between High School and Bachelor's degrees. (I think I chose '2 year degree' as the closest approximation, even though mine only took 1 year to complete.) And I know that doing a Bachelor's in some areas is the functional equivalent of doing a Master's in others.

Would a question asking for how many years of post-schooling study one has completed be more or less useful? The wording could be tricky, since then there is ambiguity about whether to list time spent if one is part way through a qualification. If the majority of respondents are from places that match the listed options then mucking about with the question may not be of much value either.

taryneast

I have exactly the same problem because I did an honours year... which is halfway between a Bachelor's and a Master's.

PipFoweraker

I ran into this issue as well, being relatively well credentialed professionally and through the TAFE / AQF framework. It's hard to know where to put the scale, so I normally do an equivalence of hours-studied-full-time-loading in my head and use that.

ShardPhoenix

Asian (East Asian): -0.600% 80 3.300%
Asian (Indian subcontinent): +0.300% 60 2.500%

Something I've been curious about for a while is the low proportion of Asian and Indian people in the LWsphere compared to STEM communities in general. Any ideas?

username2

There are very few East Asians in Europe, compared to North America. About 30 percent of LWers are from Europe.

knb

I'm not sure that "STEM communities" is a valid reference group for LW.

FourFire

Attack Helicopter is probably a reference to this.

Houshalter

What's more surprising is that a lot of people put "male" in the "gender:other" write in. Often with a message protesting that they don't believe in other genders. Very strange, and possibly messes up the data a bit.

Fluttershy

Hat tip goes to an anonymous friend of mine who had been playing around with the survey data, and noticed that all MtF and FtM trans survey respondents reported being bisexual.

username2

I'm sorry, that can't be right. In the most recent public data, there are 11 homosexual MtFs, 3 homosexual FtMs, and 1 heterosexual FtM, not to mention the respondents who chose "Other". In Python:

  >>> import pandas as pd
  >>> survey = pd.read_csv('2016_lw_survey_public_release_3.csv')
  >>> print(survey.groupby('Gender')['SexualOrientation'].value_counts())

After dropping the rows where the IQ was lower than 80 or higher than 190, age lower than 14 or higher than 60, and income higher than 300,000, and dropping the rows where the IQ, age or income were N/A, there still remain 5 homosexual MtFs and a couple pansexuals. Perhaps your anonymous friend was somewhat aggressive in pruning the data?
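For anyone who wants to reproduce that check, here is a minimal pandas sketch of the pruning described above. The column names `IQ`, `Age` and `Income` are placeholders; check the public release's header for the real names. Bounds are treated as inclusive. Non-numeric entries are coerced to NaN and dropped along with the N/A rows.

```python
import pandas as pd

def prune(survey, iq="IQ", age="Age", income="Income"):
    """Drop rows outside the bounds described above, plus rows with N/A values.

    Column names are placeholders; pass the real ones from the CSV header.
    """
    # Coerce to numbers so free-text answers become NaN instead of raising.
    num = survey[[iq, age, income]].apply(pd.to_numeric, errors="coerce")
    keep = (
        num.notna().all(axis=1)
        & num[iq].between(80, 190)   # drop IQ < 80 or > 190
        & num[age].between(14, 60)   # drop age < 14 or > 60
        & (num[income] <= 300_000)   # drop income > 300,000
    )
    return survey[keep]

# Usage against the public release (file name from the comment above):
# survey = pd.read_csv('2016_lw_survey_public_release_3.csv')
# pruned = prune(survey)
# print(pruned.groupby('Gender')['SexualOrientation'].value_counts())
```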

Good job on catching that, and thank you for mentioning it :)