2016 LessWrong Diaspora Survey Results


Foreword:

As we wrap up the 2016 survey, I'd like to start by thanking everybody who took
the time to fill it out. This year we had 3083 respondents, more than twice the
number we had last year. (Source: http://lesswrong.com/lw/lhg/2014_survey_results/)
This seems consistent with the hypothesis that the LW community hasn't declined
in population so much as migrated into different communities. Being the *diaspora*
survey I had expectations for more responses than usual, but twice as many was
far beyond them.

Before we move on to the survey results, I feel obligated to put a few affairs
in order with regard to what should be done next time. The copyright situation
for the survey was ambiguous this year, and to prevent that from happening again
I'm pleased to announce that this year's survey questions will be released jointly
by me and Scott Alexander as Creative Commons licensed content. We haven't
finalized the details of this yet, so expect it sometime this month.

I would also be remiss not to mention the large amount of feedback we received
on the survey, some of which led to actionable recommendations that I'm going to
preserve here for whoever does it next:

- Put free response form at the very end to suggest improvements/complain.

- Fix metaethics question in general, lots of options people felt were missing.

- Clean up definitions of political affiliations in the short politics section.
In particular, 'Communist' has an overly aggressive/negative definition.

- Possibly completely overhaul short politics section.

- Everywhere that a non-answer is taken as an answer should be changed so that
a non-answer means what it ought to: no answer or opinion. "Absence of a signal
should never be used as a signal." - Julian Bigelow, 1947

- Give a definition for the singularity on the question asking when you think it
will occur.

- Ask if people are *currently* suffering from depression. Possibly add more
probing questions on depression in general since the rates are so extraordinarily
high.

- Include a link to what cisgender means on the gender question.

- Specify if the income question is before or after taxes.

- Add charity questions about time donated.

- Add "ineligible to vote" option to the voting question.

- Adding some way for those who are pregnant to indicate it on the number of
children question would be nice. It might be onerous however so don't feel
obligated. (Remember that it's more important to have a smooth survey than it
is to catch every edge case.)

And read this thread: http://lesswrong.com/lw/nfk/lesswrong_2016_survey/,
it's full of suggestions, corrections and criticism.

Without further ado,

Basic Results:

2016 LessWrong Diaspora Survey Questions (PDF Format)

2016 LessWrong Diaspora Survey Results (PDF Format, Missing 23 Responses)

2016 LessWrong Diaspora Survey Results Complete (Text Format, Null Entries Included)

2016 LessWrong Diaspora Survey Results Complete (Text Format, Null Entries Excluded)

2016 LessWrong Diaspora Survey Results Complete (Text Format, Null Entries Included, 13 Responses Filtered, Percentages)

2016 LessWrong Diaspora Survey Results Complete (Text Format, Null Entries Excluded, 13 Responses Filtered, Percentages)

2016 LessWrong Diaspora Survey Results Complete (HTML Format, Null Entries Excluded)

Our report system is currently on the fritz and isn't calculating numeric questions. If I'd known this earlier I'd have prepared the results for said questions ahead of time. Instead they'll be coming out later today or tomorrow. (EDIT: These results are now in the text format survey results.)

Philosophy and Community Issues At LessWrong's Peak (Write Ins)

Peak Philosophy Issues Write Ins (Part One)

Peak Philosophy Issues Write Ins (Part Two)

Peak Community Issues Write Ins (Part One)

Peak Community Issues Write Ins (Part Two)

Philosophy and Community Issues Now (Write Ins)

Philosophy Issues Now Write Ins (Part One)

Philosophy Issues Now Write Ins (Part Two)

Community Issues Now Write Ins (Part One)

Community Issues Now Write Ins (Part Two)

Rejoin Conditions

Rejoin Condition Write Ins (Part One)

Rejoin Condition Write Ins (Part Two)

Rejoin Condition Write Ins (Part Three)

Rejoin Condition Write Ins (Part Four)

Rejoin Condition Write Ins (Part Five)

CC-Licensed Machine Readable Survey and Public Data

2016 LessWrong Diaspora Survey Structure (License)

2016 LessWrong Diaspora Survey Public Dataset

(Note for people looking to work with the dataset: My survey analysis code repository includes a sqlite converter, examples, and more coming soon. It's a great way to get up and running with the dataset really quickly.)

In depth analysis:

Analysis Posts

Part One: Meta and Demographics

Part Two: LessWrong Use, Successorship, Diaspora

Part Three: Mental Health, Basilisk, Blogs and Media

Part Four: Politics, Calibration & Probability, Futurology, Charity & Effective Altruism

Aggregated Data

Effective Altruism and Charitable Giving Analysis

Mental Health Stats By Diaspora Community (Including self dxers)

How Diaspora Communities Compare On Mental Health Stats (I suspect these charts are subtly broken somehow, will investigate later)

Improved Mental Health Charts By Obormot (Using public survey data)

Improved Mental Health Charts By Anonymous (Using full survey data)

Political Opinions By Political Affiliation

Political Opinions By Political Affiliation Charts (By anonymous)

Blogs And Media Demographic Clusters

Blogs And Media Demographic Clusters (HTML Format, Impossible Answers Excluded)

Calibration Question And Brier Score Analysis

More coming soon!

Survey Analysis Code

Some notes:

1. FortForecast on the communities section, Bayesed And Confused on the blogs section, and Synthesis on the stories section were all 'troll' answers designed to catch people who just put down everything. Somebody noted that the three 'fortforecast' users had the entire DSM split up between them; that's why.

2. Lots of people asked me for a list of all those cool blogs and stories and communities on the survey. They're included in the survey questions PDF above.

Public TODO:

1. Add more in-depth analysis, and fix the analyses that suddenly broke at the last minute or that I suspect were always broken.

2. Add a compatibility mode so that the current question codes are converted to the older ones for third-party analyses that rely on them.

If anybody would like to help with these, write to jd@fortforecast.com

Nice work.

If possible, please do a formal writeup like this: http://lesswrong.com/lw/lhg/2014_survey_results/

If possible, please change the data on your PDF file to include an option to have it without nonresponders. For example, right now sex is 66% male, 12% female, unknown 22%, which makes it hard to intuitively tell what the actual sex ratio is. If you remove the unknowns you see that the knowns are 85% male 15% female, which is a much more useful result. This is especially true since up to 50% of people are unknowns on some questions.
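The rescaling described above is just arithmetic. A minimal sketch in Python, using the rounded percentages quoted in this comment (the function name and the "unknown" key are made up for illustration, not taken from the survey code):

```python
def exclude_nonresponders(shares, unknown_key="unknown"):
    """Rescale percentage shares so that the known answers sum to 100."""
    known = {k: v for k, v in shares.items() if k != unknown_key}
    total = sum(known.values())
    return {k: 100.0 * v / total for k, v in known.items()}


# Rounded figures quoted in the comment above.
sex = {"male": 66.0, "female": 12.0, "unknown": 22.0}
rescaled = exclude_nonresponders(sex)

for answer, pct in rescaled.items():
    print(f"{answer}: {pct:.1f}%")
```

With these inputs the known answers come out at roughly 85% male and 15% female, matching the figures above.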

If possible, please include averages for numerical questions. For example, there's no data about age on the PDF file because it just says everybody was a "responder" but doesn't list numbers.

On all of these. I'm a bit busy today though so expect them much later today or tomorrow.

Update (Tue May 3 21:34:49 PDT 2016):

Points two and three have been fixed, formal write up to follow.

Update on where I'm at:

Right this minute I'm writing a tool that imports the survey structure into a Python data structure to improve the analysis. This might take a bit, but once it's done it should make developing a generic basic analysis to replace the current one much easier. It'll also let me fix issues like the answers being in a weird order: with this I'll be able to order them by the order they appeared on the survey. To clarify what I said earlier, I think I can get out a fixed basic analysis today. A formal writeup will probably take longer.

Sub-update (Mon May 2 17:36:47 PDT 2016):

Wrote the tool, now writing the analysis with it.

Sub-update (Mon May 2 22:58:34 PDT 2016):

I have a mostly-working prototype of the analysis, finishing it up now.

Sub-update (Mon May 2 23:26:28 PDT 2016):

I've reached the point where I'm too tired to do any more today, but what I've done so far seems to be enough to patch up the holes in the report system. I'll finish it tomorrow but in the meantime:

Basic Analysis With Null Entries Included

Basic Analysis With Null Entries Excluded

Thanks for your work, ingres! I want to point out two possible errors in the data analysis output files:

In Blogs And Media Demographic Clusters, average age is 2.4
In the Superbabies section of Calibration Question And Brier Score Analysis, no people said they would have their child genetically modified for improvement purposes (e.g. to heighten their intelligence or reduce their risk of schizophrenia). This 0 for "yes" is for all Brier score groups, and it can't be right because:

1) on the next question, a lot of people answered they would have their child modified to change their eye color (but not to reduce risk of schizophrenia? doesn't make sense);

2) on a question further down, a lot have stated they have a positive opinion of such modifications.

Also, I remember answering "yes" there, so unless my survey answers got thrown out somehow, it can't be zero.

I'm somehow reminded of the scene in Logicomix where Russell figures that Gödel should be given some kind of award for actually reading his Principia, though he's not sure what.

You went through the analysis files, closely enough to spot errors, and then actually went and reported them? Props.

Thanks. I'll treat myself to a cookie. There's more:

In Calibration Question And Brier Score Analysis:

All questions that ask for an answer "on a scale from one to five" show two 5.0 answers, with different percentages. I guess one of the fives should be None. That's 3 questions for each Brier score group.

In Blogs And Media Demographic Clusters:

If None means "no answer", then I think it is incorrectly given as 0.0 for some questions. For example, let's sum up the percentages for ReligiousViews for Cluster 0, where 0.0 is for None: 0.027 + 0.055 + 0.112 + 0.414 + 0.0 + 0.019 + 0.061 = 0.688. The percentages do not sum to 1.
After the questions about children, there are 9 items labelled WorkStatus_1..9. I think these should be Student/Full Employment/Self-employed/Unemployed, etc.
In EducationCredentials, these are listed as two separate items: Bachelor's and Bachelors
Not an error, but maybe sort the answers to the number-of-children and number-of-partners questions? Looks a bit weird when they are scrambled and go like 0, 4, 5, 1, Lots and lots, 2, 3.
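The ReligiousViews sum above can be checked mechanically. A small Python sketch, using only the seven shares quoted in this comment (the helper name is hypothetical, and reading the gap as the None share is this comment's guess, not something the output files confirm):

```python
import math

# Cluster 0 ReligiousViews shares as quoted above; 0.0 is the value listed for None.
religious_views_cluster0 = [0.027, 0.055, 0.112, 0.414, 0.0, 0.019, 0.061]


def missing_share(shares):
    """Return how much of the whole (1.0) the listed shares fail to account for."""
    return 1.0 - math.fsum(shares)


listed = math.fsum(religious_views_cluster0)
gap = missing_share(religious_views_cluster0)
print(f"listed shares sum to {listed:.3f}; unaccounted for: {gap:.3f}")
```

Running the same check over every question in every cluster would flag any other place where the listed shares fall short of 1.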

"Amount of EA money sent to top four GiveWell charities" might be low because GiveWell itself is not included in that list. (I ended up putting my donation to GiveWell under "other", which while technically accurate, wasn't ideal.) In addition to GiveWell specifically, it would have been worth having an option for Effective Altruism's sort of giving (charities directed at obvious, cost-effective ways of saving the lives of / improving the quality of life for the world's poorest), but not to organizations specifically recommended by GiveWell.

2.12% MtF, 0.75% FtM, 5.42% other of which roughly (just eyeballing, not counting) 33% seem trans. That'd be ~4.66% trans. Thoughts?
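Spelling out the arithmetic behind that estimate, as a quick Python sketch (the 33% is the eyeballed fraction from above, not a counted figure):

```python
mtf = 2.12    # percent of respondents
ftm = 0.75    # percent of respondents
other = 5.42  # percent of respondents

# Eyeballed, not counted: share of "other" write-ins that seem trans.
trans_share_of_other = 0.33

estimated_trans = mtf + ftm + trans_share_of_other * other
print(f"estimated trans share: {estimated_trans:.2f}%")
```

The estimate moves by about 0.05 percentage points for every percentage point of error in the eyeballed fraction.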

Last year had just below 2% transgender and 4% other. Either a slight increase in the number of trans people or a slightly higher percentage of trans people in the diaspora community than the LW core could explain it.

It's difficult to conclude what that implies. One can tell a story about how high openness means people are more likely to identify as trans, or one can tell a story about how rationality helps one realize those sorts of things, or one can tell a story about how more young people are identifying as some sort of queer and the rationality community skews young.

I mean, you could run correlations with Openness to experience or with age, right? I guess there's probably too small of a sample size to do a lot of interesting analysis with it, but I'm sure one could do some.

There's a first order effect--trans people having higher openness--and a second order effect--high openness communities having more trans people--and I'm more confident of the second than I am of the first.

When it comes to age, a quick calculation of average by group (throwing out three age outliers):

F_______26.8
MtF_____23.5
M_______28.7
FtM_____23.1
Other___26.1
Overall_28.1

Thanks for doing this!

The text file results look a little weird because the ordering of the options is inconsistent. If you look at, say, the Anarchist views on the Great Stagnation, you see that it goes:

Believe, No strong opinion, Strongly doubt, Strongly believe, Doubt

When a more natural ordering would be something like:

Strongly doubt, Doubt, No strong opinion, Believe, Strongly believe

(If you're scanning a particular question with ctrl-f, it's useful to be able to always see things in the same order.)
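One way to get a stable ordering is to sort each question's options against a fixed list. A minimal Python sketch, assuming the answers arrive as a mapping from option text to count (the function name and the zero counts are placeholders, not survey data or the actual analysis code):

```python
# The natural ordering suggested above.
LIKERT_ORDER = [
    "Strongly doubt",
    "Doubt",
    "No strong opinion",
    "Believe",
    "Strongly believe",
]


def in_fixed_order(counts, order=LIKERT_ORDER):
    """Return (option, count) pairs in the given order.

    Options missing from `order` are placed last, alphabetically.
    """
    rank = {option: i for i, option in enumerate(order)}
    return sorted(counts.items(), key=lambda kv: (rank.get(kv[0], len(order)), kv[0]))


# Keys in the scrambled order seen in the text file; counts are placeholders.
scrambled = {
    "Believe": 0,
    "No strong opinion": 0,
    "Strongly doubt": 0,
    "Strongly believe": 0,
    "Doubt": 0,
}

for option, count in in_fixed_order(scrambled):
    print(option, count)
```

If the order the options appeared in on the survey is available, that list can be passed as `order` instead of a hand-written one.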

All the links to the data appear to have gone down.

Yes, I'm working on that. Hold tight.

When it comes to issues like the Brier score I think it would be nice to have an expanded data set that includes them as additional rows.
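For anyone who wants to compute these themselves in the meantime, here is a rough Python sketch of the standard Brier score for yes/no outcomes: the mean squared difference between the stated probability and what actually happened. It is not the survey's own analysis code, the exact scoring used for the published analysis isn't described here, and the example forecasts are invented:

```python
def brier_score(forecasts):
    """Mean squared error between stated probabilities and outcomes.

    `forecasts` is a list of (probability, correct) pairs, where
    `probability` is in [0, 1] and `correct` is True or False.
    Lower is better; always answering 0.5 scores 0.25.
    """
    if not forecasts:
        raise ValueError("need at least one forecast")
    return sum((p - float(correct)) ** 2 for p, correct in forecasts) / len(forecasts)


# Invented example: three calibration answers from one hypothetical respondent.
example = [(0.9, True), (0.6, False), (0.5, True)]
score = brier_score(example)
print(f"Brier score: {score:.3f}")
```

Survey probabilities given on a 0-100 scale would need dividing by 100 first.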

Thanks a lot for doing this; I love seeing who makes up the community.

Question: Where in the data do we find the various write-in answers? Like for the blogs and such.

I'm going to add these as html/text files to the "Basic Results" section. Thanks for reminding me. In the mean time they're available in the public data release.

All the columns that end with [other] or contain the text "WriteIn" are free response forms. Specifically these columns:

[1] "BirthSex.other."             "Gender.other."               "Country.other."              "Race.other."
[5] "SexualOrientation.other."    "RelationshipStyle.other."    "LivingWith.other."           "Profession.other."
[9] "EducationCredentials.other." "ReligionType.other."         "FamilyReligion.other."       "ReligiousBackground.other."
[13] "ReferredBy.other."           "RokoKnowledgeSource.other."  "PeakPhilWriteInOne"          "PeakPhilWriteInTwo"
[17] "PeakCommWriteInOne"          "PeakCommWriteInTwo"          "NowPhilWriteInOne"           "NowPhilWriteInTwo"
[21] "NowCIssueWriteInOne"         "NowCIssueWriteInTwo"         "PhilosophyWriteIn1"          "PhilosophyWriteIn2"
[25] "PhilosophyWriteIn3"          "CommunityWriteIn1"           "CommunityWriteIn2"           "CommunityWriteIn3"
[29] "ActiveMemberships.other."    "BlogsReadWriteIn.SQ001."     "BlogsReadWriteIn.SQ2."       "BlogsReadWriteIn.SQ3."
[33] "BlogsReadWriteIn.SQ4."       "ComplexAffiliation.other."   "EndOfWorkConcerns.other."    "XRiskType.other."

Just replace the periods with brackets.
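If you'd rather do the renaming in code, here is a small Python sketch. It assumes every dotted name in the list above has the shape `Question.sub.` and maps to `Question[sub]`; the function names are made up for illustration and are not part of the survey analysis code:

```python
import re

# Matches names like "BirthSex.other." or "BlogsReadWriteIn.SQ001."
_DOTTED = re.compile(r"(?P<question>[^.]+)\.(?P<sub>[^.]+)\.")


def to_bracket_name(column):
    """Turn 'Question.sub.' into 'Question[sub]'; leave other names untouched."""
    match = _DOTTED.fullmatch(column)
    if match is None:
        return column
    return f"{match['question']}[{match['sub']}]"


def is_free_response(column):
    """Apply the rule above: ends with [other] or contains 'WriteIn'."""
    return column.endswith("[other]") or "WriteIn" in column


listed = ["BirthSex.other.", "BlogsReadWriteIn.SQ001.", "PeakPhilWriteInOne"]
for name in listed:
    converted = to_bracket_name(name)
    print(name, "->", converted, is_free_response(converted))
```

Names without periods, such as PeakPhilWriteInOne, pass through unchanged.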

EDIT: Now they are available in the above post.