2016 LessWrong Diaspora Survey Analysis: Part One (Meta and Demographics)

19 ingres 14 May 2016 06:09AM

2016 LessWrong Diaspora Survey Analysis

Overview

  • Results and Dataset
  • Meta
  • Demographics (You are here)
  • LessWrong Usage and Experience
  • LessWrong Criticism and Successorship
  • Diaspora Community Analysis
  • What it all means for LW 2.0
  • Mental Health Section
  • Basilisk Section/Analysis
  • Blogs and Media analysis
  • Politics
  • Calibration Question And Probability Question Analysis
  • Charity And Effective Altruism Analysis

Survey Meta

Introduction

Hello everybody, this is part one in a series of posts analyzing the 2016 LessWrong Diaspora Survey. The survey ran from March 24th to May 1st and had 3083 respondents.

Almost two thousand eight hundred and fifty hours were spent surveying this year and you've all waited nearly two months from the first survey response to the results writeup. While the results have been available for over a week, they haven't seen widespread dissemination in large part because they lacked a succinct summary of their contents.

When we started the survey in March I posted this graph showing the dropoff in question responses over time:

So it seems only reasonable to post the same graph with this year's survey data:

(I should note that this analysis counts certain things as questions that the other chart does not, so it says there are many more questions than the previous survey when in reality there are about as many as last year.)

2016 Diaspora Survey Stats

Survey hours spent in total: 2849.818888888889

Average number of minutes spent on survey: 102.14404619673437

Median number of minutes spent on survey: 39.775

Mode minutes spent on survey: 20.266666666666666

The takeaway here seems to be that some people take a long time with the survey, raising the average. However, most people's survey time is somewhere below the forty-five-minute mark. LessWrong does a very long survey, and I wanted to make sure that investment was rewarded with a deep, detailed analysis. Weighing in at over four thousand lines of Python code, I hope the analysis I've put together is worth the wait.
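
To illustrate why the mean sits so far above the median and mode, here is a minimal sketch using made-up completion times. The `minutes` list is purely hypothetical and is not drawn from the survey dataset; it only shows how a single long-running session pulls the average up.

```python
import statistics

# Hypothetical completion times in minutes (NOT the survey data): most
# respondents finish in under an hour, one leaves the survey open for two days.
minutes = [20, 20, 25, 30, 35, 40, 45, 50, 60, 2880]

mean = statistics.mean(minutes)      # sensitive to the one huge value
median = statistics.median(minutes)  # middle of the sorted list, robust to it
mode = statistics.mode(minutes)      # most common value

print(f"mean={mean:.1f} median={median} mode={mode}")
```

The median and mode barely notice the outlier, which is why they are the more useful summary of a typical respondent's time.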

Credits

I'd like to thank people who contributed to the analysis effort:

Bartosz Wroblewski

Kuudes on #lesswrong

Obormot on #lesswrong

Two anonymous contributors

And anybody else who I may have forgotten. Thanks again to Scott Alexander, who wrote the majority of the survey and ran it in 2014, and who has also been generous enough to license his part of the survey under a Creative Commons license along with mine.


Demographics

Age

The 2014 survey gave these numbers for age:

Age: 27.67 + 8.679 (22, 26, 31) [1490]

In 2016 the numbers were:

Mean: 28.108772669759592
Median: 26.0
Mode: 23.0

Most LWers are in their early to mid twenties, with some older LWers bringing up the average. The average is close enough to the former figure that we can probably say the LW demographic is in their 20s or 30s as a general rule.

Sex and Gender

In 2014 our gender ratio looked like this:

Female: 179, 11.9%
Male: 1311, 87.2%

In 2016 the proportion of women in the community went up by over four percentage points:

Male: 2021 83.5%
Female: 393 16.2%

One hypothesis on why this happened is that the 2016 survey focused on the diaspora rather than just LW. Diaspora communities plausibly have marginally higher rates of female membership. If I had more time I would write an analysis investigating the demographics of each diaspora community, but to answer this particular question I think a couple of SQL queries are illustrative:

(Note: ActiveMemberships one and two are 'LessWrong' and 'LessWrong Meetups' respectively.)
sqlite> select count(birthsex) from data where (ActiveMemberships_1 = "Yes" OR ActiveMemberships_2 = "Yes") AND birthsex="Male";
425
sqlite> select count(birthsex) from data where (ActiveMemberships_1 = "Yes" OR ActiveMemberships_2 = "Yes") AND birthsex="Female";
66
>>> 66 / (425 + 66)
0.13441955193482688

Well, maybe. Of course, before we wring our hands too much on this question it pays to remember that assigned sex at birth isn't the whole story. The gender question in 2014 had these results:

F (cisgender): 150, 10.0%
F (transgender MtF): 24, 1.6%
M (cisgender): 1245, 82.8%
M (transgender FtM): 5, 0.3%
Other: 64, 4.3%

In 2016:

F (cisgender): 321 13.3%
F (transgender MtF): 65 2.7%
M (cisgender): 1829 76%
M (transgender FtM): 23 1%
Other: 156 6.48%

Some things to note here. 16.2% of respondents were assigned female at birth but only 13.3% still identify as women. 1% are transmen, but where did the other 1.9% go? Presumably into the 'Other' field. Let's find out.

sqlite> select count(birthsex) from data where birthsex = "Female" AND gender = "Other";
57
sqlite> select count(*) from data;
3083
>>> 57 / 3083
0.018488485241647746

Seems to be the case. In general the proportion of men is down 6.1 percentage points from 2014. We also gained 1.1 points of transwomen and 0.7 points of transmen in 2016. Moving away from binary genders, this survey's nonbinary gender count gained in proportion by nearly 2.2 points. This means that over one in twenty LWers identified as a nonbinary gender, making it a larger demographic than binary transgender LWers! As exciting as that may sound to some ears, the numbers tell one story and the write-ins tell quite another.
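
The changes quoted in this paragraph can be rechecked from the two gender tables. This is a small sketch, not the code used for the analysis; the dictionary names and shortened labels are mine, and the shares are copied from the 2014 and 2016 tables.

```python
# Shares in percent, copied from the 2014 and 2016 gender tables.
gender_2014 = {"F (cis)": 10.0, "F (trans)": 1.6, "M (cis)": 82.8,
               "M (trans)": 0.3, "Other": 4.3}
gender_2016 = {"F (cis)": 13.3, "F (trans)": 2.7, "M (cis)": 76.0,
               "M (trans)": 1.0, "Other": 6.48}

# Change in percentage points for each category (2016 minus 2014).
change = {k: round(gender_2016[k] - gender_2014[k], 2) for k in gender_2014}

# Assumption: "men" here means cisgender and transgender men combined.
men_change = round(change["M (cis)"] + change["M (trans)"], 2)

for label, delta in change.items():
    print(f"{label}: {delta:+.2f} points")
print(f"Men (cis + trans): {men_change:+.2f} points")
```

The combined figure for men matches the drop quoted in the text only when cisgender and transgender men are added together, which seems to be how that number was derived.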

It pays to keep in mind that nonbinary genders are a common troll option for people who want to write in criticism of the question. A quick look at the write-ins accompanying the other option indicates that this is what many people used it for, but by no means all. At 156 responses, that's small enough to be worth doing a quick manual tally.

"Other" Genders, Sample Size: 156
Classification Count
Agender 35
Esoteric 6
Female 6
Male 21
Male-To-Female 1
Nonbinary 55
Objection on Basis Gender Doesn't Exist 6
Objection on Basis Gender Is Binary 7
In Process of Transitioning 2
Refusal 7
Undecided 10

So depending on your comfort zone as to what constitutes a countable gender, there are 90 to 96 valid 'other' answers in the survey dataset. (Labeled dataset)

>>> 90 / 3083
0.029192345118391177

With some cleanup the number trails behind the binary transgender one by the greater part of a percentage point, but only by that much. I bet that if you went through and did the same sort of tally on the 2014 survey results you'd find that the proportion of valid nonbinary gender write-ins has gone up between then and now.
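
A minimal sketch of the tally arithmetic follows. Which categories count as "valid" is an assumption on my part (Agender plus Nonbinary for the lower bound, adding Esoteric for the upper bound), chosen because it reproduces the 90 to 96 range. The sketch also prints each share under two denominators, because the percentages in the gender tables appear to be taken over people who answered the gender question (the five listed 2016 counts sum to 2394), while the division above uses all 3083 respondents.

```python
# Manual tally of the 156 "Other" write-ins, copied from the table above.
write_ins = {
    "Agender": 35, "Esoteric": 6, "Female": 6, "Male": 21,
    "Male-To-Female": 1, "Nonbinary": 55,
    "Objection on Basis Gender Doesn't Exist": 6,
    "Objection on Basis Gender Is Binary": 7,
    "In Process of Transitioning": 2, "Refusal": 7, "Undecided": 10,
}

# Assumed definition of a "valid" nonbinary answer (see lead-in).
valid_low = write_ins["Agender"] + write_ins["Nonbinary"]
valid_high = valid_low + write_ins["Esoteric"]

binary_trans = 65 + 23                        # MtF + FtM counts for 2016
all_respondents = 3083
answered_gender = 321 + 65 + 1829 + 23 + 156  # sum of the listed 2016 counts

for label, denom in (("all respondents", all_respondents),
                     ("answered gender question", answered_gender)):
    print(f"{label}: nonbinary {valid_low / denom:.2%} to "
          f"{valid_high / denom:.2%}, binary trans {binary_trans / denom:.2%}")
```

With a shared denominator the comparison comes down to 90 (or 96) write-ins against 88 binary transgender responses, so the size of the gap depends on which denominator is used.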

Some interesting 'esoteric' answers: Attack Helocopter, Blackstar, Elizer, spiderman, Agenderfluid

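For the rest of this section I'm going to just focus on differences between the 2016 and 2014 surveys. Each line below gives the change in share since 2014 (where a 2014 figure is available), followed by the 2016 count and the 2016 share.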

2014 Demographics Versus 2016 Demographics

Country

United States: -1.000% 1298 53.700%
United Kingdom: -0.100% 183 7.600%
Canada: +0.100% 144 6.000%
Australia: +0.300% 141 5.800%
Germany: -0.600% 85 3.500%
Russia: +0.700% 57 2.400%
Finland: -0.300% 25 1.000%
New Zealand: -0.200% 26 1.100%
India: -0.100% 24 1.000%
Brazil: -0.300% 16 0.700%
France: +0.400% 34 1.400%
Israel: +0.200% 29 1.200%
Other: 354 14.646%

[Summing the changes above shows that nearly 1% of change is unaccounted for. My hypothesis is that this 1% went into the other countries not in the list; this can't be easily confirmed because the 2014 analysis does not list the other-country percentage.]
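
The unaccounted-for change can be checked by adding up the per-country deltas. This is a quick sketch with the values copied from the list above; the variable names are mine.

```python
# Change in share since 2014, in percentage points, for each listed country.
country_change = {
    "United States": -1.0, "United Kingdom": -0.1, "Canada": 0.1,
    "Australia": 0.3, "Germany": -0.6, "Russia": 0.7, "Finland": -0.3,
    "New Zealand": -0.2, "India": -0.1, "Brazil": -0.3, "France": 0.4,
    "Israel": 0.2,
}

# Shares must sum to 100% in both years, so whatever the listed countries
# lost in total has to show up as a gain among the unlisted ones.
net_listed = round(sum(country_change.values()), 3)
implied_other_change = -net_listed

print(f"Net change across listed countries: {net_listed:+.1f} points")
print(f"Implied change for unlisted countries: {implied_other_change:+.1f} points")
```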

Race

Asian (East Asian): -0.600% 80 3.300%
Asian (Indian subcontinent): +0.300% 60 2.500%
Middle Eastern: 0.000% 14 0.600%
Black: -0.300% 12 0.500%
White (non-Hispanic): -0.300% 2059 85.800%
Hispanic: +0.300% 57 2.400%
Other: +1.200% 108 4.500%

Sexual Orientation

Heterosexual: -5.000% 1640 70.400%
Homosexual: +1.300% 103 4.400%
Bisexual: +4.000% 428 18.400%
Other: +3.880% 144 6.180%

[LessWrong got 5.3% more gay, 9.1% if you're looser with the definition. Before we start any wild speculation: the 2014 question included asexuality as an option and it got 3.9% of the responses. We spun this off into a separate question on the 2016 survey, which should explain a significant portion of the change.]

Are you asexual?

Yes: 171 0.074
No: 2129 0.926

[Scott said in 2014 that he'd probably 'vastly undercounted' our asexual readers; a near doubling in our count would seem to support this.]

Relationship Style

Prefer monogamous: -0.900% 1190 50.900%
Prefer polyamorous: +3.100% 426 18.200%
Uncertain/no preference: -2.100% 673 28.800%
Other: +0.426% 45 1.926%

[Polyamorous gained three points, presumably the drop in uncertain people went into that bin.]

Number of Partners

0: -2.300% 1094 46.800%
1: -0.400% 1039 44.400%
2: +1.200% 107 4.600%
3: +0.900% 46 2.000%
4: +0.100% 15 0.600%
5: +0.200% 8 0.300%
Lots and lots: +1.000% 29 1.200%

Relationship Goals

...and seeking more relationship partners: +0.200% 577 24.800%
...and possibly open to more relationship partners: -0.300% 716 30.800%
...and currently not looking for more relationship partners: +1.300% 1034 44.400%

Are you married?

Yes: 443 0.19
No: 1885 0.81

[This question appeared in a different form on the previous survey. Marriage went up by .8% from last year.]

Who do you currently live with most of the time?

Alone: -2.200% 487 20.800%
With parents and/or guardians: +0.100% 476 20.300%
With partner and/or children: +2.100% 687 29.400%
With roommates: -2.000% 619 26.500%

[This would seem to line up with the result that single LWers went down by 2.3%]

How many children do you have?

Sum: 598 or greater
0: +5.400% 2042 87.000%
1: +0.500% 115 4.900%
2: +0.100% 124 5.300%
3: +0.900% 48 2.000%
4: -0.100% 7 0.300%
5: +0.100% 6 0.300%
6: 0.000% 2 0.100%
Lots and lots: 0.000% 3 0.100%

[Interestingly enough, childless LWers went up by 5.4%. This would seem incongruous with the previous results. Not sure how to investigate though.]
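
The "598 or greater" figure can be reproduced as a lower bound on the total number of children. This is a sketch under one assumption: each "Lots and lots" answer is counted as seven children, the smallest number above the explicit options. That assumption happens to give 598, but the original calculation may have differed.

```python
# Number of respondents reporting each explicit number of children.
children_counts = {0: 2042, 1: 115, 2: 124, 3: 48, 4: 7, 5: 6, 6: 2}
lots_and_lots = 3

# Assumption: "Lots and lots" means at least 7, one more than the top option.
LOTS_MINIMUM = 7

lower_bound = (sum(n * count for n, count in children_counts.items())
               + LOTS_MINIMUM * lots_and_lots)

total_answers = sum(children_counts.values()) + lots_and_lots
childless_share = children_counts[0] / total_answers

print(f"At least {lower_bound} children across {total_answers} answers")
print(f"Childless share: {childless_share:.1%}")
```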

Are you planning on having more children?

Yes: -5.400% 720 30.700%
Uncertain: +3.900% 755 32.200%
No: +2.800% 869 37.100%

[This is an interesting result: either nearly 4% of LWers are suddenly less enthusiastic about having kids, or new entrants to the survey are less likely to want them and less sure whether they do. Possibly both.]

Work Status

Student: -5.402% 968 31.398%
Academics: +0.949% 205 6.649%
Self-employed: +4.223% 309 10.023%
Independently wealthy: +0.762% 42 1.362%
Non-profit work: +1.030% 152 4.930%
For-profit work: -1.756% 954 30.944%
Government work: +0.479% 135 4.379%
Homemaker: +1.024% 47 1.524%
Unemployed: +0.495% 228 7.395%

[The most interesting result here is that the student share dropped by 5.4%: either some LWers are no longer students, or new survey entrants are less likely to be.]

Profession

Art: +0.800% 51 2.300%
Biology: +0.300% 49 2.200%
Business: -0.800% 72 3.200%
Computers (AI): +0.700% 79 3.500%
Computers (other academic, computer science): -0.100% 156 7.000%
Computers (practical): -1.200% 681 30.500%
Engineering: +0.600% 150 6.700%
Finance / Economics: +0.500% 116 5.200%
Law: -0.300% 50 2.200%
Mathematics: -1.500% 147 6.600%
Medicine: +0.100% 49 2.200%
Neuroscience: +0.100% 28 1.300%
Philosophy: 0.000% 54 2.400%
Physics: -0.200% 91 4.100%
Psychology: 0.000% 48 2.100%
Other: +2.199% 277 12.399%
Other "hard science": -0.500% 26 1.200%
Other "social science": -0.200% 48 2.100%

[Setting aside the catch-all 'Other' category, the largest profession growth for LWers in 2016 was art; that, or this is a consequence of new survey entrants.]

What is your highest education credential earned?

None: -0.700% 96 4.200%
High School: +3.600% 617 26.700%
2 year degree: +0.200% 105 4.500%
Bachelor's: -1.600% 815 35.300%
Master's: -0.500% 415 18.000%
JD/MD/other professional degree: 0.000% 66 2.900%
PhD: -0.700% 145 6.300%
Other: +0.288% 39 1.688%

[Hm, the academic credentials of LWers seem to have gone down some since the last survey. As usual this may also be the result of new survey entrants.]


Footnotes

  1. The 2850 hour estimate of survey hours is very naive. It measures the time between starting and turning in the survey, and a person didn't necessarily sit there during all that time. For example, this could easily include people who spent multiple days doing other things before finally finishing their survey.

  2. The apache helicopter image is licensed under the Open Government License, which requires attribution. That particular edit was done by Wubbles on the LW Slack.

  3. The first published draft of this post made a basic stats error calculating the proportion of women in active memberships one and two, dividing the number of women by the number of men rather than the number of women by the number of men and women.
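
The error described in footnote 3 is easy to see with the counts from the two SQL queries. A short sketch (variable names are mine):

```python
# Counts returned by the two SQL queries on active LessWrong memberships.
women, men = 66, 425

# The mistaken calculation: women per man, a ratio rather than a proportion.
women_per_man = women / men

# The corrected calculation: women as a share of all counted members.
share_women = women / (women + men)

print(f"women / men           = {women_per_man:.4f}")
print(f"women / (men + women) = {share_women:.4f}")
```

The ratio always overstates the share, since its denominator leaves out the women themselves.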

Comment author: ingres 02 May 2016 07:18:29PM * 8 points

Update on where I'm at:

Right this minute I'm writing a tool that imports the survey structure into a Python data structure to improve the analysis. This might take a bit, but once it's done it should make developing a generic basic analysis to replace the current one much easier. It'll also let me fix issues like the answers being in a weird order; with this I'll be able to order them by the order they appeared on the survey. To clarify what I said earlier, I think I can get out a fixed basic analysis today. A formal writeup will probably take longer.

Sub-update (Mon May 2 17:36:47 PDT 2016):

Wrote the tool, now writing the analysis with it.

Sub-update (Mon May 2 22:58:34 PDT 2016):

I have a mostly-working prototype of the analysis, finishing it up now.

Sub-update (Mon May 2 23:26:28 PDT 2016):

I've reached the point where I'm too tired to do any more today, but what I've done so far seems to be enough to patch up the holes in the report system. I'll finish it tomorrow but in the meantime:

Basic Analysis With Null Entries Included

Basic Analysis With Null Entries Excluded

Comment author: Yvain 01 May 2016 05:11:57PM 19 points

Nice work.

If possible, please do a formal writeup like this: http://lesswrong.com/lw/lhg/2014_survey_results/

If possible, please change the data on your PDF file to include an option to have it without nonresponders. For example, right now sex is 66% male, 12% female, unknown 22%, which makes it hard to intuitively tell what the actual sex ratio is. If you remove the unknowns you see that the knowns are 85% male 15% female, which is a much more useful result. This is especially true since up to 50% of people are unknowns on some questions.

If possible, please include averages for numerical questions. For example, there's no data about age on the PDF file because it just says everybody was a "responder" but doesn't list numbers.

Comment author: ingres 01 May 2016 05:21:45PM * 2 points

On all of these. I'm a bit busy today though so expect them much later today or tomorrow.

Update (Tue May 3 21:34:49 PDT 2016):

Points two and three have been fixed, formal write up to follow.

Comment author: efenj 04 April 2016 06:18:25PM * 1 point

Is there an easy way of printing one's replies (or saving them permanently for offline use), other than either:

  1. Printing out each separate page;
  2. Waiting for all the answers to be published and extracting one's own row (though that's suboptimal since the questions will presumably be absent and also, one has to wait)?

In the old survey/census I could print (to pdf) the entire form in one go.

Thanks for organising the survey!

Comment author: ingres 04 April 2016 09:19:00PM * 2 points

Oh I'm sorry about that. It's actually an option in the software but I didn't turn it on because I couldn't imagine anybody would use it. ^^;

Fixing now.

EDIT: Should be an option now when you complete the survey, thanks!

Comment author: moridinamael 28 March 2016 01:49:30PM 4 points

I'm a little unclear on how to proceed. I didn't establish a "save", so I can't really resume the survey. Does that mean I should start a new survey and pick up where I left off, or ... ?

Comment author: ingres 29 March 2016 05:34:13PM * 2 points

If you'd be willing to go through the trouble of doing it, yes that's exactly what you should do. I didn't think of that, thanks.

Though from a data-consistency perspective people doing this would skew our response rate higher than it really is, I'd rather have the question data than an accurate response rate though so. shrug

On the session timeout front, we're trying something out to make the sessions longer, which should cut down on that particular problem significantly.

Comment author: moridinamael 27 March 2016 06:43:48PM 26 points

Was taking it, and it crashed with a "This webpage is not available" error.

Comment author: ingres 27 March 2016 09:28:42PM 4 points

We had some power outage related downtime for three hours or so, should be back up now.

Comment author: ingres 27 March 2016 01:32:13AM 17 points

I'd like to make a miniature announcement so there isn't any confusion:

Most of the time when somebody writes in a suggestion for improving the questions I don't reply to it, I just silently upvote the post and write down the question in a list of things to do for the next survey. But I am reading them, and I plan to go through and read them again before I wrap up the final survey analysis.

Comment author: benwr 26 March 2016 08:33:27AM 7 points

Great survey!

However, when you save your progress and are asked to save a password, there's no indication that it will be sent to you in an email or saved at all in recoverable form. I used my least-secure password generation algorithm anyway, but: Do you think you could add a note to the effect that users should not use passwords that they use elsewhere?

Comment author: ingres 26 March 2016 10:16:36PM * 5 points

Looking into it now.

EDIT: Added this warning to the save form:

"We store the password and send it to you by email, so please do not use a 'trusted' password for this that you use for anything important." (Not our design decision by the way.)

Comment author: Viliam 26 March 2016 09:38:18PM * 5 points

Error
We are sorry but your session has expired.
Either you have been inactive for too long, you have cookies disabled for your browser, or there were problems with your connection.
Please contact namespace ( root@localhost ) for further assistance.

If you have to leave the computer in the middle of the survey, the software will punish you by throwing away your already completed answers. Really sucks after having completed about 100 of them. :(

What the hell was the purpose of checking whether someone was "inactive for too long"? So what, they were inactive, now they are active again, what's the big deal? Sometimes real life intervenes.

(Problems with connections happen too; I have a crappy wi-fi connection that I often have to restart several times a day. But that wasn't the case now. Also, why can't the software deal with disabled cookies? Calling root@localhost and waiting for an explanation...)

EDIT: If you happen to find yourself in a similar situation, use the e-mail mentioned in the article. As long as you remember enough data to uniquely identify your half-written response, the situation can be fixed.

Comment author: ingres 26 March 2016 10:01:59PM * 5 points

Hi.

"What the hell was the purpose of checking whether someone was "inactive for too long"? So what, they were inactive, now they are active again, what's the big deal? Sometimes real life intervenes."

I have no idea why that happened and I'm really sorry. It's definitely not supposed to. root@localhost isn't a real email address; it's just there to stymie bogus system 'error' messages we were receiving.

The real mailing address you want is jd@fortforecast.com. We'd love to talk to you.

Comment author: Morgrim 26 March 2016 05:11:49AM 5 points

The questions on donating to charity only relate to donating money to charity. Some people who have sufficient free time but little disposable income donate time to charities instead. I have seen reports that donating time over money is more common amongst students and people of low income, who seem to be a smaller proportion of the LW diaspora, but it may be interesting to compare donated time vs money on future surveys.

In my experience donating one's time is also seen as being extra keen on that cause, presumably because it requires more effort, and there are certain causes that consider time more valuable than funds (eg local environmental causes, where hiring sufficient people to remove invasive weeds from a local swamp is more expensive than holding a big weeding exercise on a Saturday afternoon).

Comment author: ingres 26 March 2016 05:26:11AM * 3 points

This is a really good point. It'd make an especially interesting question set because it would give us some idea of how seriously LWers take the comparative advantage idea when it comes to charity, as measured by their actions.
