
2016 LessWrong Diaspora Survey Analysis: Part Four (Politics, Calibration & Probability, Futurology, Charity & Effective Altruism)

10 ingres 10 September 2016 03:51AM

Politics

The LessWrong survey has a very involved section dedicated to politics. In previous analysis the benefits of this weren't fully realized. In the 2016 analysis we can look at not just the political affiliation of a respondent, but what beliefs are associated with a certain affiliation. The charts below summarize most of the results.

Political Opinions By Political Affiliation

Miscellaneous Politics

There were also some other questions in this section which aren't covered by the above charts.

PoliticalInterest

On a scale from 1 (not interested at all) to 5 (extremely interested), how would you describe your level of interest in politics?

1: 67 (2.182%)

2: 257 (8.371%)

3: 461 (15.016%)

4: 595 (19.381%)

5: 312 (10.163%)

Voting

Did you vote in your country's last major national election? (LW Turnout Versus General Election Turnout By Country)
Group Turnout
LessWrong 68.9%
Australia 91%
Brazil 78.90%
Britain 66.4%
Canada 68.3%
Finland 70.1%
France 79.48%
Germany 71.5%
India 66.3%
Israel 72%
New Zealand 77.90%
Russia 65.25%
United States 54.9%
Numbers taken from Wikipedia, accurate as of the last general election in each country listed at time of writing.

AmericanParties

If you are an American, what party are you registered with?

Democratic Party: 358 (24.5%)

Republican Party: 72 (4.9%)

Libertarian Party: 26 (1.8%)

Other third party: 16 (1.1%)

Not registered for a party: 451 (30.8%)

(option for non-Americans who want an option): 541 (37.0%)

Calibration And Probability Questions

Calibration Questions

I just couldn't analyze these, sorry guys. I put many hours into trying to get them into a decent format I could even read and that sucked up an incredible amount of time. It's why this part of the survey took so long to get out. Thankfully another LessWrong user, Houshalter, has kindly done their own analysis.

All my calibration questions were meant to satisfy a few essential properties:

  1. They should be 'self-contained', i.e., something you can reasonably answer, or at least try to answer, with a 5th grade science education and normal life experience.
  2. They should, at least to a certain extent, be Fermi Estimable.
  3. They should progressively scale in difficulty so you can see whether somebody understands basic probability or not. (eg. In an 'or' question, do they put a probability of less than 50% on being right?)
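The third property is the core of calibration analysis: bucket answers by stated confidence and check whether accuracy within each bucket matches. A minimal sketch of that check (the data format here is hypothetical, not the survey's actual format):

```python
from collections import defaultdict

def calibration_table(responses):
    """Group (stated_probability, was_correct) pairs into confidence
    buckets and compare stated confidence with actual accuracy."""
    buckets = defaultdict(list)
    for prob, correct in responses:
        buckets[round(prob, 1)].append(correct)  # bucket to nearest 10%
    return {p: sum(hits) / len(hits) for p, hits in sorted(buckets.items())}

# Toy data: a well-calibrated respondent is right about as often
# as their stated probability suggests.
sample = [(0.5, True), (0.5, False), (0.9, True), (0.9, True),
          (0.9, True), (0.9, True), (0.9, False)]
print(calibration_table(sample))  # {0.5: 0.5, 0.9: 0.8}
```

A respondent whose 90%-confidence bucket comes out near 0.9 accuracy is well calibrated; one whose bucket comes out near 0.5 is overconfident.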

At least one person requested a workbook, so I might write more in the future. I'll obviously write more for the survey.

Probability Questions

Question Mean Median Mode Stdev
Please give the obvious answer to this question, so I can automatically throw away all surveys that don't follow the rules: What is the probability of a fair coin coming up heads? 49.821 50.0 50.0 3.033
What is the probability that the Many Worlds interpretation of quantum mechanics is more or less correct? 44.599 50.0 50.0 29.193
What is the probability that non-human, non-Earthly intelligent life exists in the observable universe? 75.727 90.0 99.0 31.893
...in the Milky Way galaxy? 45.966 50.0 10.0 38.395
What is the probability that supernatural events (including God, ghosts, magic, etc) have occurred since the beginning of the universe? 13.575 1.0 1.0 27.576
What is the probability that there is a god, defined as a supernatural intelligent entity who created the universe? 15.474 1.0 1.0 27.891
What is the probability that any of humankind's revealed religions is more or less correct? 10.624 0.5 1.0 26.257
What is the probability that an average person cryonically frozen today will be successfully restored to life at some future time, conditional on no global catastrophe destroying civilization before then? 21.225 10.0 5.0 26.782
What is the probability that at least one person living at this moment will reach an age of one thousand years, conditional on no global catastrophe destroying civilization in that time? 25.263 10.0 1.0 30.510
What is the probability that our universe is a simulation? 25.256 10.0 50.0 28.404
What is the probability that significant global warming is occurring or will soon occur, and is primarily caused by human actions? 83.307 90.0 90.0 23.167
What is the probability that the human race will make it to 2100 without any catastrophe that wipes out more than 90% of humanity? 76.310 80.0 80.0 22.933
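Summary statistics like the ones in this table can be reproduced from the raw response columns with the standard library alone; a minimal sketch (the response list is made up for illustration):

```python
import statistics

def summarize(values):
    """Mean, median, mode, and sample standard deviation for one
    probability question's responses (in percent)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),
    }

# Hypothetical responses to the fair-coin question, in percent:
print(summarize([50.0, 50.0, 49.0, 51.0, 50.0]))
```

Note that `statistics.stdev` is the sample standard deviation; whether the table above used sample or population standard deviation isn't stated, so treat that choice as an assumption.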

 

The probability questions are probably the area of the survey I put the least effort into. My plan for next year is to overhaul these sections entirely and try including some Tetlock-esque forecasting questions, a link to some advice on how to make good predictions, etc.

Futurology

This section got a bit of a facelift this year, with new questions on cryonics, genetic engineering, and technological unemployment in addition to the previous years' topics.

Cryonics

Cryonics

Are you signed up for cryonics?

Yes - signed up or just finishing up paperwork: 48 (2.9%)

No - would like to sign up but unavailable in my area: 104 (6.3%)

No - would like to sign up but haven't gotten around to it: 180 (10.9%)

No - would like to sign up but can't afford it: 229 (13.8%)

No - still considering it: 557 (33.7%)

No - and do not want to sign up for cryonics: 468 (28.3%)

Never thought about it / don't understand: 68 (4.1%)

CryonicsNow

Do you think cryonics, as currently practiced by Alcor/Cryonics Institute will work?

Yes: 106 (6.6%)

Maybe: 1041 (64.4%)

No: 470 (29.1%)

Interestingly enough, of those who think it will work with enough confidence to say 'yes', only 14 are actually signed up for cryonics.

sqlite> select count(*) from data where CryonicsNow="Yes" and Cryonics="Yes - signed up or just finishing up paperwork";

14

sqlite> select count(*) from data where CryonicsNow="Yes" and (Cryonics="Yes - signed up or just finishing up paperwork" OR Cryonics="No - would like to sign up but unavailable in my area" OR Cryonics="No - would like to sign up but haven't gotten around to it" OR Cryonics="No - would like to sign up but can't afford it");

34

CryonicsPossibility

Do you think cryonics works in principle?

Yes: 802 (49.3%)

Maybe: 701 (43.1%)

No: 125 (7.7%)

LessWrongers seem to be very bullish on the underlying physics of cryonics even if they're not as enthusiastic about current methods in use.

The Brain Preservation Foundation also did an analysis of cryonics responses to the LessWrong Survey.

Singularity

SingularityYear

By what year do you think the Singularity will occur? Answer such that you think, conditional on the Singularity occurring, there is an even chance of the Singularity falling before or after this year. If you think a singularity is so unlikely you don't even want to condition on it, leave this question blank.

Mean: 8.110300081581755e+16

Median: 2080.0

Mode: 2100.0

Stdev: 2.847858859055733e+18

I didn't bother to filter out the silly answers for this.

Obviously it's a bit hard to see without filtering out the uber-large answers, but the median doesn't seem to have changed much from the 2014 survey.

Genetic Engineering

ModifyOffspring

Would you ever consider having your child genetically modified for any reason?

Yes: 1552 (95.921%)

No: 66 (4.079%)

Well that's fairly overwhelming.

GeneticTreament

Would you be willing to have your child genetically modified to prevent them from getting an inheritable disease?

Yes: 1387 (85.5%)

Depends on the disease: 207 (12.8%)

No: 28 (1.7%)

I find it amusing how the strict "No" group shrinks considerably after this question.

GeneticImprovement

Would you be willing to have your child genetically modified for improvement purposes? (eg. To heighten their intelligence or reduce their risk of schizophrenia.)

Yes : 0 (0.0%)

Maybe a little: 176 (10.9%)

Depends on the strength of the improvements: 262 (16.2%)

No: 84 (5.2%)

Yes, I know the 'yes' option is bugged. I don't know what causes this and despite my best efforts I couldn't track it down. There is also an issue here where 'reduce their risk of schizophrenia' is offered as an example of improvement, which might confuse people, but the actual science of things cuts closer to that than it does to a clean separation between disease risk and 'improvement'.

 

This question is too important to just not have an answer to, so I'll do it manually. Unfortunately I can't easily remove the 'excluded' entries so that we're dealing with the exact same distribution, but only 13 or so responses are filtered out anyway.

sqlite> select count(*) from data where GeneticImprovement="Yes";

1100

>>> 1100 + 176 + 262 + 84
1622
>>> 1100 / 1622
0.6781750924784217

67.8% are willing to genetically engineer their children for improvements.
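The manual tally above can be done in a single grouped query; a sketch using an in-memory database seeded with the counts reported above (the real survey schema may differ):

```python
import sqlite3

# Recreate the GeneticImprovement tally: count each response option
# and convert to percentages. The table and column names are stand-ins
# for the real survey database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (GeneticImprovement TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?)",
    [("Yes",)] * 1100 + [("Maybe a little",)] * 176
    + [("Depends on the strength of the improvements",)] * 262
    + [("No",)] * 84,
)
total = conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]
for option, count in conn.execute(
    "SELECT GeneticImprovement, COUNT(*) FROM data "
    "GROUP BY GeneticImprovement ORDER BY COUNT(*) DESC"
):
    print(f"{option}: {count} ({100 * count / total:.1f}%)")
```

With these counts the query reproduces the 67.8% "Yes" figure directly.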

GeneticCosmetic

Would you be willing to have your child genetically modified for cosmetic reasons? (eg. To make them taller or have a certain eye color.)

Yes: 500 (31.0%)

Maybe a little: 381 (23.6%)

Depends on the strength of the improvements: 277 (17.2%)

No: 455 (28.2%)

These numbers go about how you would expect, with people being progressively less interested the more 'shallow' a genetic change is seen as.


GeneticOpinionD

What's your overall opinion of other people genetically modifying their children for disease prevention purposes?

Positive: 1177 (71.7%)

Mostly Positive: 311 (19.0%)

No strong opinion: 112 (6.8%)

Mostly Negative: 29 (1.8%)

Negative: 12 (0.7%)

GeneticOpinionI

What's your overall opinion of other people genetically modifying their children for improvement purposes?

Positive: 737 (44.9%)

Mostly Positive: 482 (29.4%)

No strong opinion: 273 (16.6%)

Mostly Negative: 111 (6.8%)

Negative: 38 (2.3%)

GeneticOpinionC

What's your overall opinion of other people genetically modifying their children for cosmetic reasons?

Positive: 291 (17.7%)

Mostly Positive: 290 (17.7%)

No strong opinion: 576 (35.1%)

Mostly Negative: 328 (20.0%)

Negative: 157 (9.6%)

All three of these seem largely consistent with people's personal preferences about modification. Were I inclined I could do a deeper analysis that actually takes survey respondents row by row and looks at the correlation between preference for one's own children and preference for others'.

Technological Unemployment

LudditeFallacy

Do you think the Luddite's Fallacy is an actual fallacy?

Yes: 443 (30.936%)

No: 989 (69.064%)

We can use this as an overall measure of worry about technological unemployment, which would seem to be high among the LW demographic.

UnemploymentYear

By what year do you think the majority of people in your country will have trouble finding employment for automation related reasons? If you think this is something that will never happen leave this question blank.

Mean: 2102.9713740458014

Median: 2050.0

Mode: 2050.0

Stdev: 1180.2342850727339

The question is flawed because answers of "never happen" can't be distinguished from people who just didn't see the question.

This is an interesting question that would be fun to compare with the estimates for the Singularity.

EndOfWork

Do you think the "end of work" would be a good thing?

Yes: 1238 (81.287%)

No: 285 (18.713%)

Fairly overwhelming consensus, but with a significant minority of people who have a dissenting opinion.

EndOfWorkConcerns

If machines end all or almost all employment, what are your biggest worries? Pick two.

Question Count Percent
People will just idle about in destructive ways 513 16.71%
People need work to be fulfilled and if we eliminate work we'll all feel deep existential angst 543 17.687%
The rich are going to take all the resources for themselves and leave the rest of us to starve or live in poverty 1066 34.723%
The machines won't need us, and we'll starve to death or be otherwise liquidated 416 13.55%
The question is flawed because it demanded respondents 'pick two' instead of 'up to two'.

The plurality of worries are about elites who refuse to share their wealth.

Existential Risk

XRiskType

Which disaster do you think is most likely to wipe out greater than 90% of humanity before the year 2100?

Nuclear war: +4.800% 326 (20.6%)

Asteroid strike: -0.200% 64 (4.1%)

Unfriendly AI: +1.000% 271 (17.2%)

Nanotech / grey goo: -2.000% 18 (1.1%)

Pandemic (natural): +0.100% 120 (7.6%)

Pandemic (bioengineered): +1.900% 355 (22.5%)

Environmental collapse (including global warming): +1.500% 252 (16.0%)

Economic / political collapse: -1.400% 136 (8.6%)

Other: 35 (2.217%)

Significantly more people worried about Nuclear War than last year. Effect of new respondents, or geopolitical situation? Who knows.

Charity And Effective Altruism

Charitable Giving

Income

What is your approximate annual income in US dollars (non-Americans: convert at www.xe.com)? Obviously you don't need to answer this question if you don't want to. Please don't include commas or dollar signs.

Sum: 66054140.47384

Mean: 64569.052271593355

Median: 40000.0

Mode: 30000.0

Stdev: 107297.53606321265

IncomeCharityPortion

How much money, in number of dollars, have you donated to charity over the past year? (non-Americans: convert to dollars at http://www.xe.com/ ). Please don't include commas or dollar signs in your answer. For example, 4000

Sum: 2389900.6530000004

Mean: 2914.5129914634144

Median: 353.0

Mode: 100.0

Stdev: 9471.962766896671

XriskCharity

How much money have you donated to charities aiming to reduce existential risk (other than MIRI/CFAR) in the past year?

Sum: 169300.89

Mean: 1991.7751764705883

Median: 200.0

Mode: 100.0

Stdev: 9219.941506342007

CharityDonations

How much have you donated in US dollars to the following charities in the past year? (Non-americans: convert to dollars at http://www.xe.com/) Please don't include commas or dollar signs in your answer. Options starting with "any" aren't the name of a charity but a category of charity.

Question Sum Mean Median Mode Stdev
Against Malaria Foundation 483935.027 1905.256 300.0 None 7216.020
Schistosomiasis Control Initiative 47908.0 840.491 200.0 1000.0 1618.785
Deworm the World Initiative 28820.0 565.098 150.0 500.0 1432.712
GiveDirectly 154410.177 1429.723 450.0 50.0 3472.082
Any kind of animal rights charity 83130.47 1093.821 154.235 500.0 2313.493
Any kind of bug rights charity 1083.0 270.75 157.5 None 353.396
Machine Intelligence Research Institute 141792.5 1417.925 100.0 100.0 5370.485
Any charity combating nuclear existential risk 491.0 81.833 75.0 100.0 68.060
Any charity combating global warming 13012.0 245.509 100.0 10.0 365.542
Center For Applied Rationality 127101.0 3177.525 150.0 100.0 12969.096
Strategies for Engineered Negligible Senescence Research Foundation 9429.0 554.647 100.0 20.0 1156.431
Wikipedia 12765.5 53.189 20.0 10.0 126.444
Internet Archive 2975.04 80.406 30.0 50.0 173.791
Any campaign for political office 38443.99 366.133 50.0 50.0 1374.305
Other 564890.46 1661.442 200.0 100.0 4670.805
"Bug Rights" charity was supposed to be a troll fakeout but apparently...

This table is interesting given the recent debates about how much money certain causes are 'taking up' in Effective Altruism.

Effective Altruism

Vegetarian

Do you follow any dietary restrictions related to animal products?

Yes, I am vegan: 54 (3.4%)

Yes, I am vegetarian: 158 (10.0%)

Yes, I restrict meat some other way (pescetarian, flexitarian, try to only eat ethically sourced meat): 375 (23.7%)

No: 996 (62.9%)

EAKnowledge

Do you know what Effective Altruism is?

Yes: 1562 (89.3%)

No but I've heard of it: 114 (6.5%)

No: 74 (4.2%)

EAIdentity

Do you self-identify as an Effective Altruist?

Yes: 665 (39.233%)

No: 1030 (60.767%)

The distribution given by the 2014 survey results does not sum to one, so it's difficult to determine whether Effective Altruism's membership actually went up, but if we take the numbers at face value it experienced an 11.13% increase in membership.

EACommunity

Do you participate in the Effective Altruism community?

Yes: 314 (18.427%)

No: 1390 (81.573%)

Same issue as above; taking the numbers at face value, community participation went up by 5.727%.

EADonations

Has Effective Altruism caused you to make donations you otherwise wouldn't?

Yes: 666 (39.269%)

No: 1030 (60.731%)

Wowza!

Effective Altruist Anxiety

EAAnxiety

Have you ever had any kind of moral anxiety over Effective Altruism?

Yes: 501 (29.6%)

Yes but only because I worry about everything: 184 (10.9%)

No: 1008 (59.5%)


There's an ongoing debate in Effective Altruism about what kind of rhetorical strategy is best for getting people on board and whether Effective Altruism is causing people significant moral anxiety.

It certainly appears to be. But is moral anxiety effective? Let's look:

Sample Size: 244
Average amount of money donated by people anxious about EA who aren't EAs: 257.5409836065574

Sample Size: 679
Average amount of money donated by people who aren't anxious about EA who aren't EAs: 479.7501384388807

Sample Size: 249
Average amount of money donated by EAs anxious about EA: 1841.5292369477913

Sample Size: 314
Average amount of money donated by EAs not anxious about EA: 1837.8248407643312

It seems fairly conclusive that anxiety is not a good way to get people to donate more than they already are, but is it a good way to get people to become Effective Altruists?

Sample Size: 1685
P(Effective Altruist): 0.3940652818991098
P(EA Anxiety): 0.29554896142433235
P(Effective Altruist | EA Anxiety): 0.5
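The conditional probability above is just the joint count divided by the anxiety count, P(A|B) = P(A and B) / P(B). A sketch, with counts reconstructed from the reported probabilities (so treat them as approximate):

```python
# Back out P(Effective Altruist | EA Anxiety) from raw counts.
n = 1685             # total sample
ea = 664             # ~ P(Effective Altruist) * n
anxious = 498        # ~ P(EA Anxiety) * n
ea_and_anxious = 249 # EAs who report anxiety about EA

p_ea_given_anxiety = ea_and_anxious / anxious
print(round(p_ea_given_anxiety, 3))  # 0.5
```

Compare 0.5 against the base rate P(Effective Altruist) of about 0.394: anxious respondents are somewhat more likely to identify as EAs, which is what the "Maybe" below refers to.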

Maybe. There is of course an argument to be made that sufficient good done by causing people anxiety outweighs feeding into people's scrupulosity, but it can be discussed after I get through explaining it on the phone to wealthy PR-conscious donors and telling the local all-kill shelter where I want my shipment of dead kittens.

EAOpinion

What's your overall opinion of Effective Altruism?

Positive: 809 (47.6%)

Mostly Positive: 535 (31.5%)

No strong opinion: 258 (15.2%)

Mostly Negative: 75 (4.4%)

Negative: 24 (1.4%)

EA appears to be doing a pretty good job of getting people to like them.

Interesting Tables

Charity Donations By Political Affiliation
Affiliation Income Charity Contributions % Income Donated To Charity Total Survey Charity % Sample Size
Anarchist 1677900.0 72386.0 4.314% 3.004% 50
Communist 298700.0 19190.0 6.425% 0.796% 13
Conservative 1963000.04 62945.04 3.207% 2.612% 38
Futarchist 1497494.1099999999 166254.0 11.102% 6.899% 31
Left-Libertarian 9681635.613839999 416084.0 4.298% 17.266% 245
Libertarian 11698523.0 214101.0 1.83% 8.885% 190
Moderate 3225475.0 90518.0 2.806% 3.756% 67
Neoreactionary 1383976.0 30890.0 2.232% 1.282% 28
Objectivist 399000.0 1310.0 0.328% 0.054% 10
Other 3150618.0 85272.0 2.707% 3.539% 132
Pragmatist 5087007.609999999 266836.0 5.245% 11.073% 131
Progressive 8455500.440000001 368742.78 4.361% 15.302% 217
Social Democrat 8000266.54 218052.5 2.726% 9.049% 237
Socialist 2621693.66 78484.0 2.994% 3.257% 126


Number Of Effective Altruists In The Diaspora Communities
Community Count % In Community Sample Size
LessWrong 136 38.418% 354
LessWrong Meetups 109 50.463% 216
LessWrong Facebook Group 83 48.256% 172
LessWrong Slack 22 39.286% 56
SlateStarCodex 343 40.98% 837
Rationalist Tumblr 175 49.716% 352
Rationalist Facebook 89 58.94% 151
Rationalist Twitter 24 40.0% 60
Effective Altruism Hub 86 86.869% 99
Good Judgement(TM) Open 23 74.194% 31
PredictionBook 31 51.667% 60
Hacker News 91 35.968% 253
#lesswrong on freenode 19 24.675% 77
#slatestarcodex on freenode 9 24.324% 37
#chapelperilous on freenode 2 18.182% 11
/r/rational 117 42.545% 275
/r/HPMOR 110 47.414% 232
/r/SlateStarCodex 93 37.959% 245
One or more private 'rationalist' groups 91 47.15% 193


Effective Altruist Donations By Political Affiliation
Affiliation EA Income EA Charity Sample Size
Anarchist 761000.0 57500.0 18
Futarchist 559850.0 114830.0 15
Left-Libertarian 5332856.0 361975.0 112
Libertarian 2725390.0 114732.0 53
Moderate 583247.0 56495.0 22
Other 1428978.0 69950.0 49
Pragmatist 1442211.0 43780.0 43
Progressive 4004097.0 304337.78 107
Social Democrat 3423487.45 149199.0 93
Socialist 678360.0 34751.0 41

Recent updates to gwern.net (2015-2016)

28 gwern 26 August 2016 07:22PM

Previously: 2011; 2012-2013; 2013-2014; 2014-2015

"When I was one-and-twenty / I heard a wise man say, / 'Give crowns and pounds and guineas / But not your heart away; / Give pearls away and rubies / But keep your fancy free.' / But I was one-and-twenty, / No use to talk to me."

My past year of completed writings, sorted by topic:

Genetics:

  • Embryo selection for intelligence cost-benefit analysis
    • meta-analysis of intelligence GCTAs, limits set by measurement error, current polygenic scores, possible gains with current IVF procedures, the benefits of selection on multiple complex traits, the possible annual value in the USA of selection & value of larger GWASes, societal consequences of various embryo selection scenarios, embryo count versus polygenic scores as limiting factors, comparison with iterated embryo selection, limits to total gains from iterated embryo selection etc.
  • Wikipedia article on Genome-wide complex trait analysis (GCTA)

AI:

Biology:

Statistics:

Cryptography:

Misc:

gwern.net itself has remained largely stable (some CSS fixes and image size changes); I continue to use Patreon and send out my newsletters.

2016 LessWrong Diaspora Survey Analysis: Part Three (Mental Health, Basilisk, Blogs and Media)

15 ingres 25 June 2016 03:40AM

2016 LessWrong Diaspora Survey Analysis

Overview


Mental Health

We decided to move the Mental Health section up closer in the survey this year so that the data could inform accessibility decisions.

LessWrong Mental Health As Compared To Base Rates In The General Population
Condition Base Rate LessWrong Rate LessWrong Self dx Rate Combined LW Rate Base/LW Rate Spread Relative Risk
Depression 17% 25.37% 27.04% 52.41% +8.37 1.492
Obsessive Compulsive Disorder 2.3% 2.7% 5.6% 8.3% +0.4 1.173
Autism Spectrum Disorder 1.47% 8.2% 12.9% 21.1% +6.73 5.578
Attention Deficit Disorder 5% 13.6% 10.4% 24% +8.6 2.719
Bipolar Disorder 3% 2.2% 2.8% 5% -0.8 0.733
Anxiety Disorder(s) 29% 13.7% 17.4% 31.1% -15.3 0.472
Borderline Personality Disorder 5.9% 0.6% 1.2% 1.8% -5.3 0.101
Schizophrenia 1.1% 0.8% 0.4% 1.2% -0.3 0.727
Substance Use Disorder 10.6% 1.3% 3.6% 4.9% -9.3 0.122

Base rates are taken from Wikipedia, US rates were favored over global rates where immediately available.
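The last two columns of the table are derived from the first two; a minimal sketch of that derivation (rates in percent, rounding chosen to match the table):

```python
def risk_stats(base_rate, lw_rate):
    """Spread (in percentage points) and relative risk of the clinically
    diagnosed LessWrong rate versus the general-population base rate."""
    spread = round(lw_rate - base_rate, 2)
    relative_risk = round(lw_rate / base_rate, 3)
    return spread, relative_risk

# Depression row from the table: 17% base rate, 25.37% LW rate.
print(risk_stats(17.0, 25.37))  # (8.37, 1.492)
```

The same function reproduces the other rows, e.g. Borderline Personality Disorder's relative risk of about 0.101 from 0.6 / 5.9.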

Accessibility Suggestions

So of the conditions we asked about, LessWrongers are at significant extra risk for three of them: Autism, ADHD, Depression.

LessWrong probably doesn't need to concern itself with being more accessible to those with autism as it likely already is. Depression is a complicated disorder with no clear interventions that can be easily implemented as site or community policy. It might be helpful to encourage looking more at positive trends in addition to negative ones, but the community already seems to do a fairly good job of this. (We could definitely use some more of it though.)

Attention Deficit Disorder - Public Service Announcement

That leaves ADHD, which we might be able to do something about, starting with this:

A lot of LessWrong stuff ends up falling into the same genre as productivity advice or 'self help'. If you have trouble with getting yourself to work, find yourself reading these things and completely unable to implement them, it's entirely possible that you have a mental health condition which impacts your executive function.

The best overview I've been able to find on ADD is this talk from Russell Barkley.

30 Essential Ideas For Parents

Ironically enough, this is a long talk, over four hours in total. Barkley is an entertaining speaker and the talk is absolutely fascinating. If you're even mildly interested in the subject I wholeheartedly recommend it. Many people who have ADHD just assume that they're lazy, or not trying hard enough, or just haven't found the 'magic bullet' yet. It never even occurs to them that they might have it because they assume that adult ADHD looks like childhood ADHD, or that ADHD is a thing that psychiatrists made up so they can give children powerful stimulants.

ADD is real; if you're in the demographic that takes this survey, there's a decent chance you have it.

Attention Deficit Disorder - Accessibility

So with that in mind, is there anything else we can do?

Yes, write better.

Scott Alexander has written a blog post with writing advice for non-fiction, and the interesting thing about it is just how much of the advice is what I would tell you to do if your audience has ADD.

  • Reward the reader quickly and often. If your prose isn't rewarding to read it won't be read.

  • Make sure the overall article has good sectioning and indexing, people might be only looking for a particular thing and they won't want to wade through everything else to get it. Sectioning also gives the impression of progress and reduces eye strain.

  • Use good data visualization to compress information, take away mental effort where possible. Take for example the condition table above. It saves space and provides additional context. Instead of a long vertical wall of text with sections for each condition, it removes:

    • The extraneous information of how many people said they did not have a condition.

    • The space that would be used by creating a section for each condition. In fact the specific improvement of the table is that it takes extra advantage of space in the horizontal plane as well as the vertical plane.

    And instead of just presenting the raw data, it also adds:

    • The normal rate of incidence for each condition, so that the reader understands the extent to which rates are abnormal or unexpected.

    • Easy comparison between the clinically diagnosed, self diagnosed, and combined rates of the condition in the LW demographic. This preserves the value of the original raw data presentation while also easing the mental arithmetic of how many people claim to have a condition.

    • Percentage spread between the clinically diagnosed and the base rate, which saves the effort of figuring out the difference between the two values.

    • Relative risk between the clinically diagnosed and the base rate, which saves the effort of figuring out how much more or less likely a LessWronger is to have a given condition.

    Add all that together and you've created a compelling presentation that significantly improves on the 'naive' raw data presentation.

  • Use visuals in general, they help draw and maintain interest.

None of these are solely for the benefit of people with ADD. ADD is an exaggerated profile of normal human behavior. Following this kind of advice makes your article more accessible to everybody, which should be more than enough incentive if you intend to have an audience.1

Roko's Basilisk

This year we finally added a Basilisk question! In fact, it kind of turned into a whole Basilisk section. A fairly common question about this year's survey is why the Basilisk section is so large. The basic reason is that asking only one or two questions about it would leave the results open to rampant speculation in one direction or another. By making the section comprehensive and covering every base, we've gotten about as complete a picture as we'd want of the Basilisk phenomenon.

Basilisk Knowledge
Do you know what Roko's Basilisk thought experiment is?

Yes: 1521 73.2%
No but I've heard of it: 158 7.6%
No: 398 19.2%

Basilisk Etiology
Where did you read Roko's argument for the Basilisk?

Roko's post on LessWrong: 323 20.2%
Reddit: 171 10.7%
XKCD: 61 3.8%
LessWrong Wiki: 234 14.6%
A news article: 71 4.4%
Word of mouth: 222 13.9%
RationalWiki: 314 19.6%
Other: 194 12.1%

Basilisk Correctness
Do you think Roko's argument for the Basilisk is correct?

Yes: 75 5.1%
Yes but I don't think its logical conclusions apply for other reasons: 339 23.1%
No: 1055 71.8%

Basilisks And Lizardmen

One of the biggest mistakes I made with this year's survey was not including "Do you believe Barack Obama is a hippopotamus?" as a control question in this section.2 Five percent is just outside of the infamous lizardman constant. This was the biggest survey surprise for me. I thought there was no way that 'yes' could go above a couple of percentage points. As far as I can tell this result is not caused by brigading, but I've by no means investigated the matter so thoroughly that I would rule it out.

Higher?

Of course, we also shouldn't forget to investigate the hypothesis that the number might be higher than 5%. After all, somebody who thinks the Basilisk is correct could skip the questions entirely so they don't face potential stigma. So how many people skipped the questions but filled out the rest of the survey?

Eight people refused to answer whether they'd heard of Roko's Basilisk but went on to answer the depression question immediately after the Basilisk section. This gives us a decent proxy for how many people skipped the section and took the rest of the survey. So if we're pessimistic the number is a little higher, but it pays to keep in mind that there are other reasons to want to skip this section. (It is also possible that people took the survey up until they got to the Basilisk section and then quit so they didn't have to answer it, but this seems unlikely.)

Of course this assumes people are being strictly truthful with their survey answers. It's also plausible that people who think the Basilisk is correct said they'd never heard of it and then went on with the rest of the survey. So the number could in theory be quite large. My hunch is that it's not. I personally know quite a few LessWrongers and I'm fairly sure none of them would tell me that the Basilisk is 'correct'. (In fact I'm fairly sure they'd all be offended at me even asking the question.) Since 5% is one in twenty I'd think I'd know at least one or two people who thought the Basilisk was correct by now.

Lower?

One partial explanation for the surprisingly high rate here is that ten percent of the people who said yes by their own admission didn't know what they were saying yes to. Eight people said they've heard of the Basilisk but don't know what it is, and that it's correct. The lizardman constant also plausibly explains a significant portion of the yes responses, but that explanation relies on you already having a prior belief that the rate should be low.


Basilisk-Like Danger
Do you think Basilisk-like thought experiments are dangerous?

Yes, I think they're dangerous for decision theory reasons: 63 4.2%
Yes I think they're dangerous for social reasons (eg. A cult might use them): 194 12.8%
Yes I think they're dangerous for decision theory and social reasons: 136 9%
Yes I think they're socially dangerous because they make everybody involved look foolish: 253 16.7%
Yes I think they're dangerous for other reasons: 54 3.6%
No: 809 53.4%

Most people don't think Basilisk-Like thought experiments are dangerous at all. Of those that think they are, most of them think they're socially dangerous as opposed to a raw decision theory threat. The 4.2% number for pure decision theory threat is interesting because it lines up with the 5% number in the previous question for Basilisk Correctness.

P(Decision Theory Danger | Basilisk Belief) = 26.6%
P(Decision Theory And Social Danger | Basilisk Belief) = 21.3%

So of the people who say the Basilisk is correct, only half of them believe it is a decision theory based danger at all. (In theory this could be because they believe the Basilisk is a good thing and therefore not dangerous, but I refuse to lose that much faith in humanity.3)

Basilisk Anxiety
Have you ever felt any sort of anxiety about the Basilisk?

Yes: 142 8.8%
Yes but only because I worry about everything: 189 11.8%
No: 1275 79.4%

20.6% of respondents have felt some kind of Basilisk Anxiety. It should be noted that the exact wording of the question permits any anxiety, even for a second. And as we'll see in the next question that nuance is very important.

Degree Of Basilisk Worry
What is the longest span of time you've spent worrying about the Basilisk?

I haven't: 714 47%
A few seconds: 237 15.6%
A minute: 298 19.6%
An hour: 176 11.6%
A day: 40 2.6%
Two days: 16 1.05%
Three days: 12 0.79%
A week: 12 0.79%
A month: 5 0.32%
One to three months: 2 0.13%
Three to six months: 0 0.0%
Six to nine months: 0 0.0%
Nine months to a year: 1 0.06%
Over a year: 1 0.06%
Years: 4 0.26%

These numbers provide some pretty sobering context for the previous ones. Of all the people who worried about the Basilisk, 93.8% didn't worry about it for more than an hour. The next 3.65% didn't worry about it for more than a day or two. The next 1.9% didn't worry about it for more than a month and the last .7% or so have worried about it for longer.

Current Basilisk Worry
Are you currently worrying about the Basilisk?

Yes: 29 1.8%
Yes but only because I worry about everything: 60 3.7%
No: 1522 94.5%

Also encouraging. We should expect a small number of people to be worried at this question just because the section is basically the words "Basilisk" and "worry" repeated over and over, so it's probably a bit scary to some people. But these numbers are much lower than the "Have you ever worried" ones and back up the previous inference that Basilisk anxiety is mostly a transitory phenomenon.

One article on the Basilisk asked the question of whether or not it was just a "referendum on autism". It's a good question and now I have an answer for you, as per the table below:

Mental Health Conditions Versus Basilisk Worry
Condition | Worried | Worried But They Worry About Everything | Combined Worry
Baseline (in the respondent population) | 8.8% | 11.8% | 20.6%
ASD | 7.3% | 17.3% | 24.7%
OCD | 10.0% | 32.5% | 42.5%
Anxiety Disorder | 6.9% | 20.3% | 27.3%
Schizophrenia | 0.0% | 16.7% | 16.7%


The short answer: Autism raises your chances of Basilisk anxiety, but anxiety disorders and OCD especially raise them much more. Interestingly enough, schizophrenia seems to bring the chances down. This might just be an effect of small sample size, but my expectation was the opposite. (People who are really obsessed with Roko's Basilisk seem to present with schizophrenic symptoms at any rate.)
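A small script makes the relative risks implicit in the table explicit (rates transcribed from the table above; the ratios are combined worry relative to the respondent baseline):

```python
# Combined-worry rates from the table above, as fractions
baseline = 0.206
combined = {
    "ASD": 0.247,
    "OCD": 0.425,
    "Anxiety Disorder": 0.273,
    "Schizophrenia": 0.167,
}
# Relative risk of any Basilisk worry versus the respondent baseline
ratios = {name: round(rate / baseline, 2) for name, rate in combined.items()}
```

OCD roughly doubles the baseline rate, while schizophrenia comes in below it.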

Before we move on, there's one last elephant in the room to contend with. The philosophical theory underlying the Basilisk is the CEV conception of friendly AI primarily espoused by Eliezer Yudkowsky, which has led many critics to speculate on all kinds of relationships between Eliezer Yudkowsky and the Basilisk. That speculation naturally extends to Yudkowsky's Machine Intelligence Research Institute (MIRI), a project to develop 'Friendly Artificial Intelligence' that does not implement a naive goal function which eats everything else humans actually care about once it's given sufficient optimization power.

The general thrust of these accusations is that MIRI, intentionally or not, profits from belief in the Basilisk. I think MIRI gets picked on enough, so I'm not thrilled about adding another log to the hefty pile of criticism they deal with. However this is a serious accusation which is plausible enough to be in the public interest for me to look at.


Percentage Of People Who Donate To MIRI Versus Basilisk Belief
Belief | Percentage
Believe It's Incorrect | 5.2%
Believe It's Structurally Correct | 5.6%
Believe It's Correct | 12.0%

Basilisk belief does appear to make you about twice as likely to donate to MIRI. It's important to note, in light of the earlier investigation, that thinking it is "structurally correct" leaves you about as likely to donate as not thinking it's correct, implying that both of these options mean about the same thing.


Sum Money Donated To MIRI Versus Basilisk Belief
Belief | Mean | Median | Mode | Stdev | Total Donated
Believe It's Incorrect | 1365.590 | 100.0 | 100.0 | 4825.293 | 75107.5
Believe It's Structurally Correct | 2644.736 | 110.0 | 20.0 | 9147.299 | 50250.0
Believe It's Correct | 740.555 | 300.0 | 300.0 | 1152.541 | 6665.0

Take these numbers with a grain of salt; it only takes one troll lying about their donations to ruin it for everybody else.

Interestingly enough, if you sum all three Total Donated figures and take five percent of the result, you get about what was donated by the Basilisk group. ($6601 to be exact) So even though the modal and median donations of Basilisk believers are higher, they donate about as much as would be naively expected by assuming donations among groups are equal.4
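The arithmetic can be checked directly from the Total Donated column (values transcribed from the table above):

```python
# Total Donated, by Basilisk-correctness answer (from the table above)
totals = {
    "incorrect": 75107.5,
    "structurally_correct": 50250.0,
    "correct": 6665.0,
}
# Five percent of the whole pot: the naive expectation for the ~5%
# of respondents who answered that the Basilisk is correct
naive_share = 0.05 * sum(totals.values())   # 6601.125
```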


Percentage Of People Who Donate To MIRI Versus Basilisk Worry
Anxiety | Percentage
Never Worried | 4.3%
Worried But They Worry About Everything | 11.1%
Worried | 11.3%

In contrast to the correctness question, merely having worried about the Basilisk at any point in time doubles your chances of donating to MIRI. My suspicion is that these people are not, as a general rule, donating because of the Basilisk per se. If you're the sort of person who is even capable of worrying about the Basilisk in principle, you're probably the kind of person who is likely to worry about AI risk in general and donate to MIRI on that basis. This hypothesis is probably unfalsifiable with the survey information I have, because Basilisk-risk is a subset of AI risk. This means that anytime somebody indicates on the survey that they're worried about AI risk this could be because they're worried about the Basilisk or because they're worried about more general AI risk.


Sum Money Donated To MIRI Versus Basilisk Worry
Anxiety | Mean | Median | Mode | Stdev | Total Donated
Never Worried | 1033.936 | 100.0 | 100.0 | 3493.373 | 56866.5
Worried But They Worry About Everything | 227.047 | 75.0 | 300.0 | 438.861 | 4768.0
Worried | 4539.25 | 90.0 | 10.0 | 11442.675 | 72628.0
Combined Worry | | | | | 77396.0

Take these numbers with a grain of salt; it only takes one troll lying about their donations to ruin it for everybody else.

This particular analysis is probably the strongest evidence in the set for the hypothesis that MIRI profits (though not necessarily through any involvement on their part) from the Basilisk. People who worried from an unendorsed perspective donate less on average than everybody else. The modal donation among people who've worried about the Basilisk is ten dollars, which seems like a surefire way to get yourself tortured if we go with the hypothesis that these are people who believe the Basilisk is a real thing and are concerned about it. So this implies that they don't, which supports my earlier hypothesis that people who are capable of feeling anxiety about the Basilisk are the core demographic that donates to MIRI anyway.

Of course, donors don't need to believe in the Basilisk for MIRI to profit from it. If exposing people to the concept of the Basilisk makes them twice as likely to donate but they don't end up actually believing the argument that would arguably be the ideal outcome for MIRI from an Evil Plot perspective. (Since after all, pursuing a strategy which involves Basilisk belief would actually incentivize torture from the perspective of the acausal game theories MIRI bases its FAI on, which would be bad.)

But frankly this is veering into very speculative territory. I don't think there's an evil plot, nor am I convinced that MIRI is profiting from Basilisk belief in a way that outweighs the resulting lost donations and damage to their cause.5 If anybody would like to assert otherwise I invite them to 'put up or shut up' with hard evidence. The world has enough criticism based on idle speculation and you're peeing in the pool.

Blogs and Media

Since this was the LessWrong diaspora survey, I felt it would be in order to reach out a bit and ask not just where the community is, but what it's reading. I went around to various people I knew and asked them about blogs for this section. However, the picks were largely based on my mental 'map' of the blogs that are commonly read/linked in the community, with a handful of suggestions thrown in. The same method was used for stories.

Blogs Read

LessWrong
Regular Reader: 239 13.4%
Sometimes: 642 36.1%
Rarely: 537 30.2%
Almost Never: 272 15.3%
Never: 70 3.9%
Never Heard Of It: 14 0.7%

SlateStarCodex (Scott Alexander)
Regular Reader: 1137 63.7%
Sometimes: 264 14.7%
Rarely: 90 5%
Almost Never: 61 3.4%
Never: 51 2.8%
Never Heard Of It: 181 10.1%

[These two results together pretty much confirm the results I talked about in part two of the survey analysis. A supermajority of respondents are 'regular readers' of SlateStarCodex. By contrast, LessWrong itself doesn't even have a quarter of SlateStarCodex's readership.]

Overcoming Bias (Robin Hanson)
Regular Reader: 206 11.751%
Sometimes: 365 20.821%
Rarely: 391 22.305%
Almost Never: 385 21.962%
Never: 239 13.634%
Never Heard Of It: 167 9.527%

Minding Our Way (Nate Soares)
Regular Reader: 151 8.718%
Sometimes: 134 7.737%
Rarely: 139 8.025%
Almost Never: 175 10.104%
Never: 214 12.356%
Never Heard Of It: 919 53.06%

Agenty Duck (Brienne Yudkowsky)
Regular Reader: 55 3.181%
Sometimes: 132 7.634%
Rarely: 144 8.329%
Almost Never: 213 12.319%
Never: 254 14.691%
Never Heard Of It: 931 53.846%

Eliezer Yudkowsky's Facebook Page
Regular Reader: 325 18.561%
Sometimes: 316 18.047%
Rarely: 231 13.192%
Almost Never: 267 15.248%
Never: 361 20.617%
Never Heard Of It: 251 14.335%

Luke Muehlhauser (Eponymous)
Regular Reader: 59 3.426%
Sometimes: 106 6.156%
Rarely: 179 10.395%
Almost Never: 231 13.415%
Never: 312 18.118%
Never Heard Of It: 835 48.49%

Gwern.net (Gwern Branwen)
Regular Reader: 118 6.782%
Sometimes: 281 16.149%
Rarely: 292 16.782%
Almost Never: 224 12.874%
Never: 230 13.218%
Never Heard Of It: 595 34.195%

Siderea (Sibylla Bostoniensis)
Regular Reader: 29 1.682%
Sometimes: 49 2.842%
Rarely: 59 3.422%
Almost Never: 104 6.032%
Never: 183 10.615%
Never Heard Of It: 1300 75.406%

Ribbon Farm (Venkatesh Rao)
Regular Reader: 64 3.734%
Sometimes: 123 7.176%
Rarely: 111 6.476%
Almost Never: 150 8.751%
Never: 150 8.751%
Never Heard Of It: 1116 65.111%

Bayesed And Confused (Michael Rupert)
Regular Reader: 2 0.117%
Sometimes: 10 0.587%
Rarely: 24 1.408%
Almost Never: 68 3.988%
Never: 167 9.795%
Never Heard Of It: 1434 84.106%

[This was the 'troll' answer to catch out people who claim to read everything.]

The Unit Of Caring (Anonymous)
Regular Reader: 281 16.452%
Sometimes: 132 7.728%
Rarely: 126 7.377%
Almost Never: 178 10.422%
Never: 216 12.646%
Never Heard Of It: 775 45.375%

GiveWell Blog (Multiple Authors)
Regular Reader: 75 4.438%
Sometimes: 197 11.657%
Rarely: 243 14.379%
Almost Never: 280 16.568%
Never: 412 24.379%
Never Heard Of It: 482 28.521%

Thing Of Things (Ozy Frantz)
Regular Reader: 363 21.166%
Sometimes: 201 11.72%
Rarely: 143 8.338%
Almost Never: 171 9.971%
Never: 176 10.262%
Never Heard Of It: 661 38.542%

The Last Psychiatrist (Anonymous)
Regular Reader: 103 6.023%
Sometimes: 94 5.497%
Rarely: 164 9.591%
Almost Never: 221 12.924%
Never: 302 17.661%
Never Heard Of It: 826 48.304%

Hotel Concierge (Anonymous)
Regular Reader: 29 1.711%
Sometimes: 35 2.065%
Rarely: 49 2.891%
Almost Never: 88 5.192%
Never: 179 10.56%
Never Heard Of It: 1315 77.581%

The View From Hell (Sister Y)
Regular Reader: 34 1.998%
Sometimes: 39 2.291%
Rarely: 75 4.407%
Almost Never: 137 8.049%
Never: 250 14.689%
Never Heard Of It: 1167 68.566%

Xenosystems (Nick Land)
Regular Reader: 51 3.012%
Sometimes: 32 1.89%
Rarely: 64 3.78%
Almost Never: 175 10.337%
Never: 364 21.5%
Never Heard Of It: 1007 59.48%

I tried my best to have representation from multiple sections of the diaspora; if you look at the different blogs, you can probably guess which blogs represent which section.

Stories Read

Harry Potter And The Methods Of Rationality (Eliezer Yudkowsky)
Whole Thing: 1103 61.931%
Partially And Intend To Finish: 145 8.141%
Partially And Abandoned: 231 12.97%
Never: 221 12.409%
Never Heard Of It: 81 4.548%

Significant Digits (Alexander D)
Whole Thing: 123 7.114%
Partially And Intend To Finish: 105 6.073%
Partially And Abandoned: 91 5.263%
Never: 333 19.26%
Never Heard Of It: 1077 62.29%

Three Worlds Collide (Eliezer Yudkowsky)
Whole Thing: 889 51.239%
Partially And Intend To Finish: 35 2.017%
Partially And Abandoned: 36 2.075%
Never: 286 16.484%
Never Heard Of It: 489 28.184%

The Fable of the Dragon-Tyrant (Nick Bostrom)
Whole Thing: 728 41.935%
Partially And Intend To Finish: 31 1.786%
Partially And Abandoned: 15 0.864%
Never: 205 11.809%
Never Heard Of It: 757 43.606%

The World of Null-A (A. E. van Vogt)
Whole Thing: 92 5.34%
Partially And Intend To Finish: 18 1.045%
Partially And Abandoned: 25 1.451%
Never: 429 24.898%
Never Heard Of It: 1159 67.266%

[Wow, I never would have expected this many people to have read this. I mostly included it on a lark because of its historical significance.]

Synthesis (Sharon Mitchell)
Whole Thing: 6 0.353%
Partially And Intend To Finish: 2 0.118%
Partially And Abandoned: 8 0.47%
Never: 217 12.75%
Never Heard Of It: 1469 86.31%

[This was the 'troll' option to catch people who just say they've read everything.]

Worm (Wildbow)
Whole Thing: 501 28.843%
Partially And Intend To Finish: 168 9.672%
Partially And Abandoned: 184 10.593%
Never: 430 24.755%
Never Heard Of It: 454 26.137%

Pact (Wildbow)
Whole Thing: 138 7.991%
Partially And Intend To Finish: 59 3.416%
Partially And Abandoned: 148 8.57%
Never: 501 29.01%
Never Heard Of It: 881 51.013%

Twig (Wildbow)
Whole Thing: 55 3.192%
Partially And Intend To Finish: 132 7.661%
Partially And Abandoned: 65 3.772%
Never: 560 32.501%
Never Heard Of It: 911 52.873%

Ra (Sam Hughes)
Whole Thing: 269 15.558%
Partially And Intend To Finish: 80 4.627%
Partially And Abandoned: 95 5.495%
Never: 314 18.161%
Never Heard Of It: 971 56.16%

My Little Pony: Friendship Is Optimal (Iceman)
Whole Thing: 424 24.495%
Partially And Intend To Finish: 16 0.924%
Partially And Abandoned: 65 3.755%
Never: 559 32.293%
Never Heard Of It: 667 38.533%

Friendship Is Optimal: Caelum Est Conterrens (Chatoyance)
Whole Thing: 217 12.705%
Partially And Intend To Finish: 16 0.937%
Partially And Abandoned: 24 1.405%
Never: 411 24.063%
Never Heard Of It: 1040 60.89%

Ender's Game (Orson Scott Card)
Whole Thing: 1177 67.219%
Partially And Intend To Finish: 22 1.256%
Partially And Abandoned: 43 2.456%
Never: 395 22.559%
Never Heard Of It: 114 6.511%

[This is the most read story according to survey respondents, beating HPMOR by 5%.]

The Diamond Age (Neal Stephenson)
Whole Thing: 440 25.346%
Partially And Intend To Finish: 37 2.131%
Partially And Abandoned: 55 3.168%
Never: 577 33.237%
Never Heard Of It: 627 36.118%

Consider Phlebas (Iain Banks)
Whole Thing: 302 17.507%
Partially And Intend To Finish: 52 3.014%
Partially And Abandoned: 47 2.725%
Never: 439 25.449%
Never Heard Of It: 885 51.304%

The Metamorphosis Of Prime Intellect (Roger Williams)
Whole Thing: 226 13.232%
Partially And Intend To Finish: 10 0.585%
Partially And Abandoned: 24 1.405%
Never: 322 18.852%
Never Heard Of It: 1126 65.925%

Accelerando (Charles Stross)
Whole Thing: 293 17.045%
Partially And Intend To Finish: 46 2.676%
Partially And Abandoned: 66 3.839%
Never: 425 24.724%
Never Heard Of It: 889 51.716%

A Fire Upon The Deep (Vernor Vinge)
Whole Thing: 343 19.769%
Partially And Intend To Finish: 31 1.787%
Partially And Abandoned: 41 2.363%
Never: 508 29.28%
Never Heard Of It: 812 46.801%

I also did a k-means cluster analysis of the data to try to determine demographics, and the ultimate conclusion I drew from it is that I need to do more analysis. Which I would do, except that the initial analysis was a whole bunch of work, and jumping further down the rabbit hole in the hopes of reaching an oasis probably isn't in the best interests of myself or my readers.
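For the curious, the k-means procedure mentioned here looks roughly like the following minimal sketch. The data below is a hypothetical stand-in, since the actual survey matrix isn't reproduced in this post:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for numerically coded survey responses:
# rows are respondents, columns are answers. Not the real survey data.
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(5, 1, (50, 4))])

def kmeans(X, k, iters=100):
    """Plain k-means: alternate nearest-center assignment and centroid update."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center by squared distance
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

labels, centers = kmeans(X, k=2)
```

On real survey data the hard part is not the algorithm but coding mixed categorical/numeric answers into a sensible feature space, which is presumably where the "more analysis" lives.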

Footnotes


  1. This is a general trend I notice with accessibility. Not always, but very often measures taken to help a specific group end up having positive effects for others as well. Many of the accessibility suggestions of the W3C are things you wish every website did.

  2. I hadn't read this particular SSC post at the time I compiled the survey, but I was already familiar with the concept of a lizardman constant and should have accounted for it.

  3. I've been informed by a member of the freenode #lesswrong IRC channel that this is in fact Roko's opinion, because you can 'timelessly trade with the future superintelligence for rewards, not just punishment' according to a conversation they had with him last summer. Remember kids: Don't do drugs, including Max Tegmark.

  4. You might think that this conflicts with the hypothesis that the true rate of Basilisk belief is lower than 5%. It does a bit, but you also need to remember that these people are in the LessWrong demographic, which means regardless of what the Basilisk belief question means we should naively expect them to donate five percent of the MIRI donation pot.

  5. That is to say, it does seem plausible that MIRI 'profits' from Basilisk belief based on this data, but I'm fairly sure any profit is outweighed by the significant opportunity cost associated with it. I should also take this moment to remind the reader that the original Basilisk argument was supposed to prove that CEV is a flawed concept from the perspective of not having deleterious outcomes for people, so MIRI using it as a way to justify donating to them would be weird.

If there was one element of statistical literacy that you could magically implant in every head, what would it be?

3 enfascination 22 February 2016 07:53PM

Alternatively, what single concept from statistics would most improve people's interpretations of popular news and daily life events?

The trouble with Bayes (draft)

10 snarles 19 October 2015 08:50PM

Prerequisites

This post requires some knowledge of Bayesian and Frequentist statistics, as well as probability. It is intended to explain one of the more advanced concepts in statistical theory--Bayesian non-consistency--to non-statisticians, and although the level required is much less than would be required to read some of the original papers on the topic[1], some considerable background is still required.

The Bayesian dream

Bayesian methods are enjoying a well-deserved growth of popularity in the sciences. However, most practitioners of Bayesian inference, including most statisticians, see it as a practical tool. Bayesian inference has many desirable properties for a data analysis procedure: it allows for intuitive treatment of complex statistical models, which include models with non-iid data, random effects, high-dimensional regularization, covariance estimation, outliers, and missing data. Problems which have been the subject of Ph. D. theses and entire careers in the Frequentist school, such as mixture models and the many-armed bandit problem, can be satisfactorily handled by introductory-level Bayesian statistics.

A more extreme point of view, the flavor of subjective Bayes best exemplified by Jaynes' famous book [2], and also by a sizable contingent of philosophers of science, elevates Bayesian reasoning to the methodology for probabilistic reasoning, in every domain, for every problem. One merely needs to encode one's beliefs as a prior distribution, and Bayesian inference will yield the optimal decision or inference.

To a philosophical Bayesian, the epistemological grounding of most statistics (including "pragmatic Bayes") is abysmal. The practice of data analysis is either dictated by arbitrary tradition and protocol on the one hand, or consists of users creatively employing a diverse "toolbox" of methods justified by a diverse mixture of incompatible theoretical principles like the minimax principle, invariance, asymptotics, maximum likelihood or *gasp* "Bayesian optimality." The result: a million possible methods exist for any given problem, and a million interpretations exist for any data set, all depending on how one frames the problem. Given one million different interpretations for the data, which one should *you* believe?

Why the ambiguity? Take the textbook problem of determining whether a coin is fair or weighted, based on the data obtained from, say, flipping it 10 times. Keep in mind, a principled approach to statistics decides the rule for decision-making before you see the data. So, what rule would you use for your decision? One rule is, "declare it's weighted, if either 10/10 flips are heads or 0/10 flips are heads." Another rule is, "always declare it to be weighted." Or, "always declare it to be fair." All in all, there are 11 possible outcomes (supposing we only care about the total number of heads, which runs from 0 to 10) and therefore there are 2^11 possible decision rules. We can probably rule out most of them as nonsensical, like "declare it to be weighted if 5/10 are heads, and fair otherwise", since 5/10 seems like the fairest outcome possible. But among the remaining possibilities, there is no obvious way to choose the "best" rule. After all, the performance of the rule, defined as the probability you will make the correct conclusion from the data, depends on the unknown state of the world, i.e. the true probability of flipping heads for that particular coin.

The Bayesian approach "cuts" the Gordian knot of choosing the best rule by assuming a prior distribution over the unknown state of the world. Under this prior distribution, one can compute the average performance of any decision rule, and choose the best one. For example, suppose your prior is that with probability 99.9999%, the coin is fair. Then the best decision rule would be to "always declare it to be fair!"
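The coin example can be worked end to end. Below is a minimal sketch that scores every decision rule by its prior-averaged accuracy and picks the best one; the 0.8 bias assumed for a weighted coin is my own illustrative choice, not something fixed by the problem:

```python
from itertools import product
from math import comb

n = 10  # number of flips

def binom_pmf(k, p):
    """P(k heads in n flips | heads-probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Prior: 99.9999% the coin is fair; otherwise weighted with p = 0.8 (assumed).
prior_fair = 0.999999
p_weighted = 0.8

def expected_accuracy(rule):
    """Prior-averaged probability of a correct call.
    rule[k] is True if we declare 'weighted' upon seeing k heads."""
    acc = 0.0
    for k in range(n + 1):
        acc += prior_fair * binom_pmf(k, 0.5) * (not rule[k])         # correct "fair"
        acc += (1 - prior_fair) * binom_pmf(k, p_weighted) * rule[k]  # correct "weighted"
    return acc

# Exhaustive search over all 2^11 rules (one bit per outcome k = 0..10)
best = max(product([False, True], repeat=n + 1), key=expected_accuracy)
# With a prior this skewed, the best rule declares "fair" for every outcome.
```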

The Bayesian approach gives you the optimal decision rule for the problem, as soon as you come up with a model for the data and a prior for your model. But when you are looking at data analysis problems in the real world (as opposed to a probability textbook), the choice of model is rarely unambiguous. Hence, for me, the standard Bayesian approach does not go far enough--if there are a million models you could choose from, you still get a million different conclusions as a Bayesian.

Hence, one could argue that a "pragmatic" Bayesian who thinks up a new model for every problem is just as epistemologically suspect as any Frequentist. Only in the strongest form of subjective Bayesianism can one escape this ambiguity. The subjective Bayesian dream is to start out in life with a single model. A single prior. For the entire world. This "world prior" would contain the entirety of one's own life experience, and the grand total of human knowledge. Surely, writing out this prior is impossible. But the point is that a true Bayesian must behave (at least approximately) as if they were driven by such a universal prior. In principle, having such a universal prior (at least conceptually) solves the problem of choosing models and priors for problems: the priors and models you choose for particular problems are determined by the posterior of your universal prior. For example, why did you decide on a linear model for your economics data? It's because according to your universal posterior, your particular economic data is well-described by such a model with high probability.

The main practical consequence of the universal prior is that your inferences in one problem should be consistent with your inferences in another, related problem. Even if the subjective Bayesian never writes out a "grand model", their integrated approach to data analysis for related problems still distinguishes their approach from the piecemeal approach of frequentists, who tend to treat each data analysis problem as if it occurs in an isolated universe. (So I claim, though I cannot point to any real example of such a subjective Bayesian.)

Yet, even if the subjective Bayesian ideal could be realized, many philosophers of science (e.g. Deborah Mayo) would consider it just as ambiguous as non-Bayesian approaches, since even if you have an unambiguous procedure for forming personal priors, your priors are still going to differ from mine. I don't consider this a defect, since my worldview necessarily does differ from yours. My ultimate goal is to make the best decision for myself. That said, such egocentrism, even if rationally motivated, may indeed be poorly suited for a collaborative enterprise like science.

For me, a far more troublesome objection to the "Bayesian dream" is the question, "How would you actually go about constructing this prior that represents all of your beliefs?" Looking in the Bayesian literature, one does not find any convincing examples of any user of Bayesian inference managing to actually encode all (or even a tiny portion) of their beliefs in the form of the prior--in fact, for the most part, we see alarmingly little thought or justification being put into the construction of the priors.

Nevertheless, I myself remained one of these "hardcore Bayesians", at least from a philosophical point of view, from the time I started learning about statistics. My faith in the "Bayesian dream" persisted even after spending three years in the Ph.D. program at Stanford (a department with a heavy bias towards Frequentism), and even after I personally started doing research in frequentist methods. (I see frequentist inference as a poor man's approximation to the ideal Bayesian inference.) Though I was aware of the Bayesian non-consistency results, I largely dismissed them as mathematical pathologies. And while we were still a long way from achieving universal inference, I held the optimistic view that improved technology and theory might one day finally make the "Bayesian dream" achievable. However, I could not find a way to ignore one particular example on Wasserman's blog[3], due to its relevance to very practical problems in causal inference. Eventually I thought of an even simpler counterexample, which devastated my faith in the possibility of constructing a universal prior. Perhaps a fellow Bayesian can find a solution to this quagmire, but I am not holding my breath.

The root of the problem is the extreme degree of ignorance we have about our world, the degree of surprisingness of many true scientific discoveries, and the relative ease with which we accept these surprises. If we consider this behavior rational (which I do), then the subjective Bayesian is obligated to construct a prior which captures this behavior. Yet, the diversity of possible surprises the model must be able to accommodate makes it practically impossible (if not mathematically impossible) to construct such a prior. The alternative is to reject all possibility of surprise, and refuse to update any faster than a universal prior would (extremely slowly), which strikes me as a rather poor epistemological policy.

In the rest of the post, I'll motivate my example, sketch out a few mathematical details (explaining them as best I can to a general audience), then discuss the implications.

Introduction: Cancer classification

Biology and medicine are currently adapting to the wealth of information we can obtain by using high-throughput assays: technologies which can rapidly read the DNA of an individual, measure the concentration of messenger RNA, metabolites, and proteins. In the early days of this "large-scale" approach to biology which began with the Human Genome Project, some optimists had hoped that such an unprecedented torrent of raw data would allow scientists to quickly "crack the genetic code." By now, any such optimism has been washed away by the overwhelming complexity and uncertainty of human biology--a complexity which has been made clearer than ever by the flood of data--and replaced with a sober appreciation that in the new "big data" paradigm, making a discovery becomes a much easier task than understanding any of those discoveries.

Enter the application of machine learning to this large-scale biological data. Scientists take these massive datasets containing patient outcomes, demographic characteristics, and high-dimensional genetic, neurological, and metabolic data, and analyze them using algorithms like support vector machines, logistic regression and decision trees to learn predictive models to relate key biological variables, "biomarkers", to outcomes of interest.

To give a specific example, take a look at this abstract from the Shipp et al. paper on detecting survival rates for cancer patients [4]:

Diffuse large B-cell lymphoma (DLBCL), the most common lymphoid malignancy in adults, is curable in less than 50% of patients. Prognostic models based on pre-treatment characteristics, such as the International Prognostic Index (IPI), are currently used to predict outcome in DLBCL. However, clinical outcome models identify neither the molecular basis of clinical heterogeneity, nor specific therapeutic targets. We analyzed the expression of 6,817 genes in diagnostic tumor specimens from DLBCL patients who received cyclophosphamide, adriamycin, vincristine and prednisone (CHOP)-based chemotherapy, and applied a supervised learning prediction method to identify cured versus fatal or refractory disease. The algorithm classified two categories of patients with very different five-year overall survival rates (70% versus 12%). The model also effectively delineated patients within specific IPI risk categories who were likely to be cured or to die of their disease. Genes implicated in DLBCL outcome included some that regulate responses to B-cell−receptor signaling, critical serine/threonine phosphorylation pathways and apoptosis. Our data indicate that supervised learning classification techniques can predict outcome in DLBCL and identify rational targets for intervention.

The term "supervised learning" refers to any algorithm for learning a predictive model that predicts some outcome Y (which could be either categorical or numeric) from covariates or features X. In this particular paper, the authors used a relatively simple linear model (which they called "weighted voting") for prediction.

A linear model is fairly easy to interpret: it produces a single "score variable" via a weighted average of a number of predictor variables. Then it predicts the outcome (say "survival" or "no survival") based on a rule like, "Predict survival if the score is larger than 0." Yet, far more advanced machine learning models have been developed, including "deep neural networks" which are winning all of the image recognition and machine translation competitions at the moment. These "deep neural networks" are especially notorious for being difficult to interpret. Along with similarly complicated models, neural networks are often called "black box models": although you can get miraculously accurate answers out of the "box", peering inside won't give you much of a clue as to how it actually works.

Now it is time for the first thought experiment. Suppose a follow-up paper to the Shipp paper reports dramatically improved prediction for survival outcomes of lymphoma patients. The authors of this follow-up paper trained their model on a "training sample" of 500 patients, then used it to predict the five-year outcome of chemotherapy patients, on a "test sample" of 1000 patients. It correctly predicts the outcome ("survival" vs "no survival") on 990 of the 1000 patients.

Question 1: what is your opinion on the predictive accuracy of this model on the population of chemotherapy patients? Suppose that publication bias is not an issue (the authors of this paper designed the study in advance and committed to publishing) and suppose that the test sample of 1000 patients is "representative" of the entire population of chemotherapy patients.

Question 2: does your judgment depend on the complexity of the model they used? What if the authors used an extremely complex and counterintuitive model, and cannot even offer any justification or explanation for why it works? (Nevertheless, their peers have independently confirmed the predictive accuracy of the model.)

A Frequentist approach

The Frequentist answer to the thought experiment is as follows. The accuracy of the model is a probability p which we wish to estimate. The number of successes on the 1000 test patients is Binomial(p, 1000). Based on the data, one can construct a confidence interval: say, we are 99% confident that the accuracy is above 83%. What does 99% confident mean? I won't try to explain, but will simply say that in this particular situation, "I'm pretty sure" that the accuracy of the model is above 83%.
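One standard way to get such a bound (an assumption on my part; the paragraph above does not say which interval the Frequentist uses) is the exact Clopper-Pearson lower confidence limit, which for 990 successes out of 1000 comes out comfortably above 83%:

```python
from scipy import stats

successes, trials, alpha = 990, 1000, 0.01

# Exact (Clopper-Pearson) one-sided 99% lower confidence limit for the
# accuracy p, via the standard beta-quantile formulation.
lower = stats.beta.ppf(alpha, successes, trials - successes + 1)

# The bound is far above 0.83, consistent with being "99% confident
# that the accuracy is above 83%".
print(lower)
```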

A Bayesian approach

The Bayesian interjects, "Hah! You can't explain what your confidence interval actually means!" He puts a uniform prior on the probability p. The posterior distribution of p, conditional on the data, is Beta(991, 11). This gives a 99% credible interval that p is in [0.978, 0.995]. You can actually interpret the interval in probabilistic terms, and it gives a much tighter interval as well. Seems like a Bayesian victory...?
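The Beta(991, 11) posterior and its 99% credible interval are easy to check directly (a sketch using scipy; the equal-tailed interval should come out close to the [0.978, 0.995] quoted above):

```python
from scipy import stats

# Uniform prior Beta(1, 1) plus 990 successes and 10 failures out of 1000
# gives the Beta(991, 11) posterior for the accuracy p.
posterior = stats.beta(991, 11)

# Central 99% credible interval: the 0.5% and 99.5% posterior quantiles.
lo, hi = posterior.ppf([0.005, 0.995])
```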

A subjective Bayesian approach

As I have argued before, a Bayesian approach which comes up with a model after hearing about the problem is bound to suffer from the same inconsistency and arbitrariness as any non-Bayesian approach. You might assume a uniform distribution for p in this problem... but what if yet another paper comes along with a similar prediction model? You would need a joint distribution for the current model and the new model. What if a theory comes along that could help explain the success of the current method? The parameter p might take on a new meaning in this context.

So as a subjective Bayesian, I argue that slapping a uniform prior on the accuracy is the wrong approach. But I'll stop short of actually constructing a Bayesian model of the entire world: let's say we want to restrict our attention to this particular issue of cancer prediction. We want to model the dynamics behind cancer and cancer treatment in humans. Needless to say, the model is still ridiculously complicated. However, I don't think it's out of reach for a well-funded, large collaborative effort of scientists.

Roughly speaking, the model can be divided into a distribution over theories of human biology and, conditional on the theory of biology, a coarse-grained model of an individual patient. The model would not include every cell, every molecule, etc., but it would contain many latent variables in addition to the variables measured in any particular cancer study. Let's call the variables actually measured in the study X, and the survival outcome Y.

Now here is the epistemologically correct way to answer the thought experiment. Take a look at the X's and Y's of the patients in the training and test set. Update your probabilistic model of human biology based on the data. Then take a look at the actual form of the classifier: it's a function f() mapping X's to Y's. The accuracy of the classifier is no longer a parameter: it's a quantity Pr[f(X) = Y] which has a distribution under your posterior. That is, for any given "theory of human biology", Pr[f(X) = Y] has a fixed value; over the distribution of possible theories of human biology (based on the data of the current study as well as all previous studies and your own beliefs), Pr[f(X) = Y] therefore has a distribution, and hence an average. But what will this posterior give you? Will you get something similar to the interval [0.978, 0.995] you got from the "pragmatic Bayes" approach?

Who knows? But I would guess in all likelihood not. My guess is that you would get a very different interval from [0.978, 0.995], because in this complex model there is no direct link between the empirical success rate of prediction and the quantity Pr[f(X) = Y]. But my intuition for this fact comes from the following simpler framework.

A non-parametric Bayesian approach

Instead of reasoning about a grand Bayesian model of biology, I now take a middle ground and suggest that, while we don't need to capture the entire latent dynamics of cancer, we should at the very least try to include the X's and the Y's in the model, instead of merely abstracting the whole experiment as a Binomial trial (as did the frequentist and the pragmatic Bayesian). Hence we need a prior over joint distributions of (X, Y). And yes, I do mean a prior distribution over probability distributions: we are saying that (X, Y) has some unknown joint distribution, which we treat as being drawn at random from a large collection of distributions. This is therefore a non-parametric Bayes approach: the term non-parametric means that the number of parameters in the model is not finite.

Since in this case Y is a binary outcome, a joint distribution can be decomposed into a marginal distribution over X and a function g(x) giving the conditional probability that Y=1 given X=x. The marginal distribution is not so interesting or important for us, since it simply reflects the composition of the population of patients. For the purpose of this example, let us say that the marginal is known (e.g., a finite distribution over the population of US cancer patients). What we want to know is the probability of patient survival, and this is given by the function g(X) for the particular patient's X. Hence, we will mainly deal with constructing a prior over g(X).

To construct a prior, we need to think of intuitive properties of the survival probability function g(x). If x is similar to x', then we expect the survival probabilities to be similar. Hence the prior on g(x) should be over random, smooth functions. But we need to choose the smoothness so that the prior does not consist of almost-constant functions. Suppose for now that we choose a particular class of smooth functions (e.g. functions with a certain Lipschitz norm) and choose our prior to be uniform over functions of that smoothness. We could go further and put a prior on the smoothness hyperparameter, but for now we won't.
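The "random smooth functions" idea can be made concrete. Sampling uniformly from a Lipschitz ball is awkward, so here I substitute a Gaussian process with an RBF kernel (my stand-in, not the author's construction), squashed so every draw is a valid probability function; the kernel's length-scale plays the role of the smoothness knob:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)

def draw_g(length_scale, n_draws=3):
    """Draw random smooth survival-probability functions g: [0,1] -> (0,1).

    A Gaussian process with an RBF kernel, pushed through a logistic
    squashing so every draw is a valid probability function.
    """
    K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / length_scale ** 2)
    z = rng.multivariate_normal(np.zeros(len(x)), K + 1e-8 * np.eye(len(x)),
                                size=n_draws)
    return 1.0 / (1.0 + np.exp(-z))

wiggly = draw_g(length_scale=0.02)  # rough prior: g varies wildly over [0,1]
flat   = draw_g(length_scale=5.0)   # very smooth prior: near-constant draws
```

The two extremes illustrate the trade-off in the paragraph above: too little smoothness gives wild functions, too much gives draws that are essentially constants.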

Now, although I assert my faithfulness to the Bayesian ideal, I still want to think about how whatever prior we choose would allow us to answer some simple thought experiments. Why is that? I hold that ideal Bayesian inference should capture and refine what I take to be "rational behavior." Hence, if a prior produces irrational outcomes, I reject that prior as not reflecting my beliefs.

Take the following thought experiment: we simply want to estimate the expected value of Y, E[Y]. Hence, we draw 100 patients independently with replacement from the population and record their outcomes: suppose the sum is 80 out of 100. The Frequentist (and pragmatic Bayesian) would end up concluding that with high probability/confidence/whatever, the expected value of Y is around 0.8, and I would hold that an ideal rationalist would come up with a similar belief. But what would our non-parametric model say? We would draw a random function g(x) conditional on our particular observations; we get a quantity E[g(X)] for each instantiation of g(x), and the distribution of E[g(X)]'s over the posterior allows us to make credible intervals for E[Y].

But what do we end up getting? One of two things happens. If you choose too little smoothness, E[g(X)] ends up concentrating at around 0.5, no matter what data you put into the model. This is the phenomenon of Bayesian non-consistency, and a detailed explanation can be found in several of the listed references; but to put it briefly, sampling at a few isolated points gives you too little information about the rest of the function. This example is not as pathological as the ones used in the literature: if you sample infinitely many points, you will eventually get the posterior to concentrate around the true value of E[Y], but all the same, the convergence is ridiculously slow.

Alternatively, use a super-high smoothness, and the posterior of E[g(X)] has a nice interval around the sample value, just like in the Binomial example. But now if you look at your posterior draws of g(x), you'll notice the functions are basically constants. Putting a prior on smoothness doesn't change things: the posterior on smoothness doesn't move, since you don't actually have enough data to determine the smoothness of the function. The posterior average of E[g(X)] is no longer always 0.5: it gets a little bit affected by the data, since within the 10% mass of the posterior corresponding to the smooth prior, the average of E[g(X)] is responding to the data. But you are still almost as slow as before in converging to the truth.

At the time that I started thinking about the above "uniform sampling" example, I was still convinced of a Bayesian resolution. Obviously, using a uniform prior over smooth functions is too naive: you can tell by seeing that the prior distribution of E[g(X)] is already highly concentrated around 0.5. How about a hierarchical model, where first we draw a parameter p from the uniform distribution, and then draw g(x) from the uniform distribution over smooth functions with mean value equal to p? This gets you non-constant g(x) in the posterior, while your posteriors of E[g(X)] converge to the truth as quickly as in the Binomial example. Arguing backwards, I would say that such a prior comes closer to capturing my beliefs.

But then I thought: what about more complicated problems than computing E[Y]? What if you have to compute the expectation of Y conditional on some complicated function of X taking on a certain value, i.e. E[Y | f(X) = 1]? In the frequentist world, you can easily estimate E[Y | f(X) = 1] by rejection sampling: get a sample of individuals and average the Y's of the individuals whose X's satisfy f(X) = 1. But how could you formulate a prior that has the same property? For a finite collection of functions, {f1, ..., f100} say, you might be able to construct a prior for g(x) so that the posterior for E[g(X) | fi(X) = 1] converges to the truth for every i in {1, ..., 100}. I don't know how to do so, but perhaps you know. But the frequentist intervals work for every function f! Can you construct a prior which can do the same?
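The frequentist rejection-sampling recipe above is easy to sketch. Everything here is a toy of my own invention (the population, the true g, and the condition f), purely to show that the same code works for an arbitrarily complicated f:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy population: X is a 2-dimensional feature vector; the true (unknown
# to the analyst) survival probability is g(x) = sigmoid(x1 - x2).
def true_g(X):
    return 1.0 / (1.0 + np.exp(-(X[:, 0] - X[:, 1])))

def conditional_mean(f, n=200_000):
    """Estimate E[Y | f(X) = 1] by rejection sampling: draw a large
    sample, keep only the individuals satisfying f, average their Y's."""
    X = rng.normal(size=(n, 2))
    Y = rng.random(n) < true_g(X)   # draw survival outcomes
    keep = f(X)                     # boolean mask: individuals with f(X) = 1
    return Y[keep].mean()

# Works for essentially any condition f, however complicated:
weird_f = lambda X: np.sin(3.0 * X[:, 0]) * X[:, 1] > 0.0
estimate = conditional_mean(weird_f)
```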

I am happy to argue that a true Bayesian would not need consistency for every possible f in the mathematical universe. It is cool that frequentist inference works for such a general collection, but it may well be unnecessary for the world we live in. In other words, there may be functions f which are so ridiculous that even if you showed me that empirically E[Y|f(X)=1] = 0.9, based on data from 1 million patients, I would not believe that E[Y|f(X)=1] was close to 0.9. It is a counterintuitive conclusion, but one that I am prepared to accept.

Yet the set of f's which are not so ridiculous, which in fact I might accept as reasonable based on conventional science, may be so large as to render impossible the construction of a prior which could accommodate them all. But the Bayesian dream makes the far stronger demand that our prior capture not just our current understanding of science but also match the flexibility of rational thought. I hold that, given the appropriate evidence, rationalists can be persuaded to accept truths which they could not even imagine beforehand. Thinking about how we could possibly construct a prior to mimic this behavior, the Bayesian dream seems distant indeed.

Discussion

To be updated later... perhaps responding to some of your comments!

 

[1] Diaconis and Freedman, "On the Consistency of Bayes Estimates"

[2] E. T. Jaynes, Probability Theory: The Logic of Science

[3] https://normaldeviate.wordpress.com/2012/08/28/robins-and-wasserman-respond-to-a-nobel-prize-winner/

[4] Shipp et al., "Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning," Nature Medicine

Seeking geeks interested in bioinformatics

17 bokov 22 June 2015 01:44PM

I work on a small but feisty research team whose focus is biomedical informatics, i.e. mining biomedical data, especially anonymized hospital records pooled over multiple healthcare networks. My personal interest is ultimately life-extension, and my colleagues are warming up to the idea as well. But the short-term goal that will be useful in many different research areas is building infrastructure to massively accelerate hypothesis testing on, and modelling of, retrospective human data.

 

We have a job posting here (permanent, non-faculty, full-time, benefits):

https://www.uthscsajobs.com/postings/3113

 

If you can program, want to work in an academic research setting, and can relocate to San Antonio, TX, I invite you to apply. Thanks.

Note: The first step of the recruitment process will be a coding challenge, which will include an arithmetical or string-manipulation problem to solve in real-time using a language and developer tools of your choice.

edit: If you tried applying and were unable to access the posting, it's because the link has changed, our HR has an automated process that periodically expires the links for some reason. I have now updated the job post link.

No peace in our time?

9 Stuart_Armstrong 26 May 2015 02:41PM

There's a new paper arguing, contra Pinker, that the world is not getting more peaceful:

On the tail risk of violent conflict and its underestimation

Pasquale Cirillo and Nassim Nicholas Taleb

Abstract—We examine all possible statistical pictures of violent conflicts over common era history with a focus on dealing with incompleteness and unreliability of data. We apply methods from extreme value theory on log-transformed data to remove compact support, then, owing to the boundedness of maximum casualties, retransform the data and derive expected means. We find the estimated mean likely to be at least three times larger than the sample mean, meaning severe underestimation of the severity of conflicts from naive observation. We check for robustness by sampling between high and low estimates and jackknifing the data. We study inter-arrival times between tail events and find (first-order) memorylessness of events. The statistical pictures obtained are at variance with the claims about "long peace".

Every claim in the abstract is supported by the data - with the exception of the last claim. Which is the important one, as it's the only one really contradicting the "long peace" thesis.

Most of the paper is an analysis of trends in peace and war establishing that what we see throughout the history of conflict is consistent with a memoryless power-law process whose mean we underestimate from the sample. That is useful and interesting.

However, the paper does not compare the hypothesis that the world is getting peaceful with the alternative hypothesis that it's business as usual. Note that it's not cherry-picking to suggest that the world might be getting more peaceful since 1945 (or 1953). We've had the development of nuclear weapons, the creation of the UN, and the complete end of direct great power wars (a rather unprecedented development). It would be good to test this hypothesis; unfortunately this paper, while informative, does not do so.

The only part of the analysis that could be applied here is the claim that:

For events with more than 10 million victims, if we refer to actual estimates, the average time delay is 101.58 years, with a mean absolute deviation of 144.47 years

This could mean that the peace since the second world war is not unusual, but could be quite typical. But this ignores the "per capita" aspect of violence: the more people, the more deadly events we expect at the same per capita level of violence. Since the current population is so much larger than it has ever been, the average time delay is certainly lower than 101.58 years. They do have a per capita average time delay - table III. Though this seems to predict events with 10 million casualties (per 7.2 billion people) every 37 years or so. That's 3.3 million casualties just after WW2, rising to 10 million today. This has never happened so far (unless one accepts the highest death toll estimate of the Korean war; as usual, it is unclear whether 1945 or 1953 was the real transition).

This does not prove that the "long peace" is right, but at least shows the paper has failed to prove it wrong.

Some secondary statistics from the results of LW Survey

8 Nanashi 12 February 2015 04:46PM

 

Global LW (N=643) vs USA LW (N=403) vs. Average US Household (Comparable Income)
| Income Bracket | LW Mean Contributions | USA LW Mean Contributions | US Mean Contributions** [1] | LW Mean Income | USA LW Mean Income | US Mean*** Income [1] | LW Contributions/Income | USA LW Contributions/Income | US Contributions/Income [1] |
|---|---|---|---|---|---|---|---|---|---|
| $0 - $25000 (41% of LW) | $1,395.11 | $935.47 | $1,177.52 | $11,241.14 | $11,326.18 | $15,109.85 | 12.41% | 8.26% | 7.79% |
| $25000-$50000 (17% of LW) | $438.25 | $571.00 | $1,748.08 | $34,147.14 | $32,758.06 | $38,203.79 | 1.28% | 1.74% | 4.58% |
| $50000-$75000 (12% of LW) | $1,757.77 | $1,638.59 | $2,191.58 | $60,387.69 | $61,489.30 | $62,342.05 | 2.91% | 2.66% | 3.52% |
| $75000-$100000 (9% of LW) | $1,883.36 | $2,211.81 | $2,624.81 | $84,204.09 | $83,049.54 | $87,182.68 | 2.24% | 2.66% | 3.01% |
| $100000-$200000 (16% of LW) | $3,645.73 | $3,372.84 | $3,555.02 | $123,581.28 | $124,577.88 | $137,397.03 | 2.95% | 2.71% | 2.59% |
| >$200000 (5% of LW) | $14,162.35 | $15,970.67 | $15,843.97 | $296,884.63 | $299,444.44 | $569,447.35 | 4.77% | 5.33% | 2.78% |
| Total | $2,265.56 | $2,669.85 | $3,949.26 | $62,285.72 | $75,130.37 | $133,734.60 | 3.64% | 3.55% | 2.95% |
| All < $200000 | $1,689.36 | $1,649.32 | $2,515.29 | $51,254.43 | $58,306.81 | $81,207.03 | 3.30% | 2.83% | 3.10% |

 

Global LW (N=643) vs USA LW (N=403) vs. Average US Citizen (Comparable Age)
| Age Bracket* | LW Median | US LW Median | US Median*** [2] |
|---|---|---|---|
| 15-24 | $17,000.00 | $20,000.00 | $26,999.13 |
| 25-34 | $50,000.00 | $60,504.00 | $45,328.70 |
| All <35 | $40,000.00 | $58,000.00 | $40,889.57 |

 

 

 

Global LW (N=407) vs USA LW (N=243) vs. Average US Citizen (Comparable IQ)
| | Average LW** | US LW | US Between 125-155 IQ [3] |
|---|---|---|---|
| Median Income | $40,000.00 | $58,000.00 | $60,528.70 |
| Mean Contributions | $2,265.56 | $2,669.85 | $2,016.00 |

 

Note: Three data points were removed from the sample due to my subjective opinion that they were fake. Any self-reported IQs of 0 were removed. Any self-reported income of 0 was removed. 

*89% of the LW population is between the age of 15 and 34.

**88% of the LW population has an IQ between 125 and 155, with an average IQ of 138. 

***Median numbers were adjusted down by a factor of 1.15 to account for the fact that the source data was calculating household median income rather than individual median income.

[1] Internal Revenue Service, Charitable Giving by Households that Itemize Deductions (AGI and Itemized Contributions Summary by Zip, 2012), The Urban Institute, National Center for Charitable Statistics 

[2] U.S. Census Bureau, Current Population Survey, 2013 and 2014 Annual Social and Economic Supplements.

[3] Jay L. Zagorsky, "Do you have to be smart to be rich? The impact of IQ on wealth, income and financial distress," Intelligence, Vol. 35, No. 5 (September 2007)

 

Update 1: Updated chart 1&2 to account for the fact that the source data was calculating household median income rather than individual income.

Update 2: Reverted Chart 1 back to original because I realized that the purpose was to compare LWers to those in similar income brackets. So in that situation, whether it's a household or an individual is not as relevant. It does penalize households to an extent because they have less money available to donate to charity because they're splitting their money three ways. 

Update 3: Updated all charts to include data that is filtered for US only.

A hypothetical question for investors

3 bokov 03 December 2014 04:39PM

Let's suppose you start with $1000 to invest, and the only thing you can invest it in is stock ABC. You are only permitted to occupy two states:

* All assets in cash

* All assets in stock ABC

You incur a $2 transaction fee every time you buy or sell.

Kind of annoying limitations to operate under. But you have a powerful advantage as well. You have a perfect crystal ball that each day gives you the probability density function (http://en.wikipedia.org/wiki/Probability_density_function) of ABC's closing price for the following day (but no further ahead in time).

What would be an optimal decision rule for when to buy and sell?

 

Request for suggestions: ageing and data-mining

14 bokov 24 November 2014 11:38PM

Imagine you had the following at your disposal:

  • A Ph.D. in a biological science, with a fair amount of reading and wet-lab work under your belt on the topic of aging and longevity (but in hindsight, nothing that turned out to leverage any real mechanistic insights into aging).
  • An M.S. in statistics. Sadly, the non-Bayesian kind for the most part, but along the way you acquired the meta-skills necessary to read and understand most quantitative papers with life-science applications.
  • Love of programming and data, the ability to learn most new computer languages in a couple of weeks, and at least 8 years spent hacking R code.
  • Research access to large amounts of anonymized patient data.
  • Optimistically, two decades remaining in which to make it all count.

Imagine that your goal were to slow or prevent biological aging...

  1. What would be the specific questions you would try to tackle first?
  2. What additional skills would you add to your toolkit?
  3. How would you allocate your limited time between the research questions in #1 and the acquisition of new skills in #2?

Thanks for your input.


Update

I thank everyone for their input and apologize for how long it has taken me to post an update.

I met with Aubrey de Grey and he recommended using the anonymized patient data to look for novel uses for already-prescribed drugs. He also suggested I do a comparison of existing longitudinal studies (e.g. Framingham) and the equivalent data elements from our data warehouse. I asked him to send my way any researchers he runs into who have promising theories or methods but lack a massive human dataset to test them on.

My original question was a bit too broad in retrospect: I should have focused more on how best to leverage the capabilities my project already has in place, rather than making a more general "what should I do with myself" kind of appeal. On the other hand, at the time I might have been less confident about the project's success than I am now. Though the conversation immediately went off into prospective experiments rather than analyzing existing data, there were some great ideas there that may yet become practical to implement.

At any rate, a lot of this has been overcome by events. In the last six months I realized that before we even get to the bifurcation point between longevity and other research areas, there are a crapload of technical, logistical, and organizational problems to solve. I no longer have any doubt that these real problems are worth solving, my team is well positioned to solve many of them, and the solutions will significantly accelerate research in many areas including longevity. We have institutional support, we have a credible revenue stream, and no shortage of promising directions to pursue. The limiting factor now is people-hours. So, we are recruiting.

Thanks again to everyone for their feedback.

 

The Universal Medical Journal Article Error

6 PhilGoetz 29 April 2014 05:57PM

(Oops. I forgot this was moved to Discussion.)

TL;DR:  When people read a journal article that concludes, "We have proved that it is not the case that for every X, P(X)", they generally credit the article with having provided at least weak evidence in favor of the proposition ∀x !P(x).  This is not necessarily so.

 

Authors using statistical tests are making precise claims, which must be quantified correctly.  Pretending that all quantifiers are universal because we are speaking English is one error.  It is not, as many commenters are claiming, a small error.  ∀x !P(x) is very different from !∀x P(x).
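The gap between ∀x !P(x) and !∀x P(x) is easy to see on a toy finite domain (a hedged illustration; the domain and predicate here are invented):

```python
# Toy domain and predicate to contrast the two quantified claims.
domain = [1, 2, 3]
P = lambda x: x < 3          # P holds for 1 and 2, fails for 3

forall_not = all(not P(x) for x in domain)   # ∀x !P(x): "P fails everywhere"
not_forall = not all(P(x) for x in domain)   # !∀x P(x): "P fails somewhere"

# Here !∀x P(x) is true while ∀x !P(x) is false: refuting "for every x, P(x)"
# is far weaker than establishing "for every x, not P(x)".
```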

 

A more-subtle problem is that when an article uses an F-test on a hypothesis, it is possible (and common) to fail the F-test for P(x) with data that supports the hypothesis P(x).  The 95% confidence level was chosen for the F-test in order to count false positives as much more expensive than false negatives.  Applying it therefore removes us from the world of Bayesian logic.  You cannot interpret the failure of an F-test for P(x) as being even weak evidence for not P(x).

continue reading »

Learn (and Maybe Get a Credential in) Data Science

10 Jayson_Virissimo 01 February 2014 06:39PM

Coursera is now offering a sequence of online courses on data science. They include:

1. The Data Scientist's Toolbox

Upon completion of this course you will be able to identify and classify data science problems. You will also have created your Github account, created your first repository, and pushed your first markdown file to your account.


2. R Programming

In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure software necessary for a statistical programming environment, and you will discuss generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing, including programming in R, reading data into R, accessing R packages, writing R functions, debugging, and organizing and commenting R code. Topics in statistical data analysis and optimization will provide working examples.


3. Getting and Cleaning Data

Upon completion of this course you will be able to obtain data from a variety of sources. You will know the principles of tidy data and data sharing. Finally, you will understand and be able to apply the basic tools for data cleaning and manipulation.


4. Exploratory Data Analysis

After successfully completing this course you will be able to make visual representations of data using the base, lattice, and ggplot2 plotting systems in R; apply basic principles of data graphics to create rich analytic graphics from different types of datasets; construct exploratory summaries of data in support of a specific question; and create visualizations of multidimensional data using exploratory multivariate statistical techniques.


5. Reproducible Research

In this course you will learn to write a document using R markdown, integrate live R code into a literate statistical program, compile R markdown documents using knitr and related tools, and organize a data analysis so that it is reproducible and accessible to others.


6. Statistical Inference

In this class students will learn the fundamentals of statistical inference. Students will receive a broad overview of the goals, assumptions and modes of performing statistical inference. Students will be able to perform inferential tasks in highly targeted settings and will be able to use the skills developed as a roadmap for more complex inferential challenges.


7. Regression Models

In this course students will learn how to fit regression models, how to interpret coefficients, and how to investigate residuals and variability. Students will further learn special cases of regression models, including the use of dummy variables and multivariable adjustment. Extensions to generalized linear models, especially Poisson and logistic regression, will be reviewed.


8. Practical Machine Learning

Upon completion of this course you will understand the components of a machine learning algorithm. You will know how to apply multiple basic machine learning tools, and you will learn to apply these tools to build and evaluate predictors on real data.


9. Developing Data Products

Students will learn how to communicate using statistics and statistical products. Emphasis will be placed on communicating uncertainty in statistical results. Students will learn how to create simple Shiny web applications and R packages for their data products.

You can take the entire sequence for free or pay $49 for each course in order to (upon completion) receive a Specialization Certificate from Johns Hopkins University.

The very popular blog Simply Statistics discusses the program here.

Understanding Simpson's Paradox

11 Vaniver 18 September 2013 07:07PM

An article by Judea Pearl, available here. It's quick at 8 pages, and worth reading if you enjoy statistics (though I think people who are already familiar with the math of causality will get more out of it than others). I'll talk here about the part that I think is generally interesting:

continue reading »

Applied Bayes' Theorem: Calculating the probability that she's over me. Could somebody check my work? TT_TT

-9 abcd_z 19 July 2013 02:59AM

Solve for:  The probability that she's over me given that she didn't answer my call.

Estimated probabilities:
The probability that she'd miss my call given that she was over me: 90%
The probability that she's over me: 30%
The probability that she'd miss my call given that she was not over me: 10%

(P(over me|missed call)= P(missed call|over me)*P(over me))  /  (P(missed call|over me)*P(over me)+P(Missed call|Not over me)*P(Not over me))

P(O|M)=(P(M|O)*P(O))/(P(M|O)*P(O)+P(M|N)*P(N))

P(O|M)=(.9*.3)/((.9*.3)+(.1*.7))

P(O|M)=(.27)/((.27)+(.07)) = .27/.34 ≈ .794

Probability that she's over me given that she didn't answer the phone: 79.4%
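The arithmetic above is easy to double-check (a minimal sketch of Bayes' theorem using the post's estimated probabilities):

```python
def p_over_given_missed(p_over, p_miss_given_over, p_miss_given_not):
    """P(over me | missed call) by Bayes' theorem."""
    joint_over = p_miss_given_over * p_over          # P(M|O) * P(O)
    joint_not = p_miss_given_not * (1.0 - p_over)    # P(M|N) * P(N)
    return joint_over / (joint_over + joint_not)

p = p_over_given_missed(p_over=0.30, p_miss_given_over=0.90,
                        p_miss_given_not=0.10)
# p = 0.27 / 0.34 ≈ 0.794
```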

TT_TT

 

EDIT: Something I've noticed here is that people are pointing to the chosen priors and saying that they seem unrealistic.

In our 3-year relationship, she almost never missed my calls, or if she did, she would contact me back as soon as she realized she had missed my call. In the current situation, she did no such thing.

EDIT2: Yes, we broke up.  Sorry I didn't make that clear.

[LINK] If correlation doesn’t imply causation, then what does?

4 Strilanc 12 July 2013 05:39AM

A post about how, for some causal models, causal relationships can be inferred without doing experiments that control one of the random variables.

If correlation doesn’t imply causation, then what does?

To help address problems like the two example problems just discussed, Pearl introduced a causal calculus. In the remainder of this post, I will explain the rules of the causal calculus, and use them to analyse the smoking-cancer connection. We’ll see that even without doing a randomized controlled experiment it’s possible (with the aid of some reasonable assumptions) to infer what the outcome of a randomized controlled experiment would have been, using only relatively easily accessible experimental data, data that doesn’t require experimental intervention to force people to smoke or not, but which can be obtained from purely observational studies.

The Logic of the Hypothesis Test: A Steel Man

5 Matt_Simpson 21 February 2013 06:19AM

Related to: Beyond Bayesians and Frequentists

Update: This comment by Cyan clearly explains the mistake I made - I forgot that the ordering of the hypothesis space is necessary for hypothesis testing to work. I'm not entirely convinced that NHST can't be recast in some "thin" theory of induction that may well change the details of the actual test, but I have no idea how to formalize this notion of a "thin" theory, and most of the commenters either 1) misunderstood my aim (my fault, not theirs) or 2) don't think it can be formalized.

I'm teaching an econometrics course this semester, and one of the things I'm trying to do is make sure that my students actually understand the logic of the hypothesis test. You can motivate it in terms of controlling false positives, but that sort of interpretation doesn't seem to be generally applicable. Another motivation is a simple deductive syllogism with a small but very important inductive component. I'm borrowing the idea from something we discussed in a course I took with Mark Kaiser - he called it the "nested syllogism of experimentation." I think it applies equally well to most or even all hypothesis tests. It goes something like this:

1. Either the null hypothesis or the alternative hypothesis is true.

2. If the null hypothesis is true, then the data has a certain probability distribution.

3. Under this distribution, our sample is extremely unlikely.

4. Therefore under the null hypothesis, our sample is extremely unlikely.

5. Therefore the null hypothesis is false.

6. Therefore the alternative hypothesis is true.

An example looks like this:

Suppose we have a random sample x₁, x₂, …, xₙ from a population with a normal distribution that has an unknown mean μ and unknown variance σ². Then:

1. Either μ = μ₀ or μ ≠ μ₀, where μ₀ is some constant.

2. Construct the test statistic t = (x̄ − μ₀)/(s/√n), where n is the sample size, x̄ is the sample mean, and s is the sample standard deviation.

3. Under the null hypothesis, t has a t distribution with n − 1 degrees of freedom.

4. The p-value P(|T| ≥ |t|) is really small under the null hypothesis (e.g. less than 0.05).

5. Therefore the null hypothesis is false.

6. Therefore the alternative hypothesis is true.
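The numbered steps above can be spot-checked in a few lines of Python. This is only an illustrative sketch: the sample data and the null mean are made up, and the hard-coded 2.262 is the standard two-sided 5% critical value for a t distribution with 9 degrees of freedom.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """One-sample t statistic: (sample mean - mu0) / (s / sqrt(n))."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n-1 denominator)
    return (xbar - mu0) / (s / math.sqrt(n))

# Hypothetical sample of n = 10 measurements; test H0: mu = 5.0.
sample = [5.8, 6.1, 5.5, 6.3, 5.9, 6.0, 5.7, 6.2, 5.6, 6.4]
t = t_statistic(sample, mu0=5.0)

# For df = 9, the two-sided 5% critical value is about 2.262,
# so |t| > 2.262 corresponds to step 4 (p-value below 0.05).
reject_null = abs(t) > 2.262
```

With this (fake) data the statistic is large, so steps 5 and 6 of the syllogism fire and the null is rejected.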

What's interesting to me about this process is that it almost tries to avoid induction altogether. Only the move from step 4 to 5 seems anything like an inductive argument. The rest is purely deductive - though admittedly it takes a couple premises in order to quantify just how likely our sample was and that surely has something to do with induction. But it's still a bit like solving the problem of induction by sweeping it under the rug then putting a big heavy deduction table on top so no one notices the lumps underneath. 

This sounds like it's a criticism, but actually I think it might be a virtue to minimize the amount of induction in your argument. Suppose you're really uncertain about how to handle induction. Maybe you see a lot of plausible sounding approaches, but you can poke holes in all of them. So instead of trying to actually solve the problem of induction, you set out to come up with a process which is robust to alternative views of induction. Ideally, if one or another theory of induction turns out to be correct, you'd like it to do the least damage possible to any specific inductive inferences you've made. One way to do this is to avoid induction as much as possible so that you prevent "inductive contamination" spreading to everything you believe. 

That's exactly what hypothesis testing seems to do. You start with a set of premises and keep deriving logical conclusions from them until you're forced to say "this seems really unlikely if a certain hypothesis is true, so we'll assume that the hypothesis is false" in order to get any further. Then you just keep on deriving logical conclusions with your new premise. Bayesians start yelling about the base rate fallacy in the inductive step, but they're presupposing their own theory of induction. If you're trying to be robust to inductive theories, why should you listen to a Bayesian instead of anyone else?

Now does hypothesis testing actually accomplish induction that is robust to philosophical views of induction? Well, I don't know - I'm really just spitballing here. But it does seem to be a useful steel man.

 

On the Importance of Systematic Biases in Science

26 gwern 20 January 2013 09:39PM

From pg812-1020 of Chapter 8 “Sufficiency, Ancillarity, And All That” of Probability Theory: The Logic of Science by E.T. Jaynes:

The classical example showing the error of this kind of reasoning is the fable about the height of the Emperor of China. Supposing that each person in China surely knows the height of the Emperor to an accuracy of at least ±1 meter, if there are N=1,000,000,000 inhabitants, then it seems that we could determine his height to an accuracy at least as good as

±1 m/√N ≈ ±3 × 10⁻⁵ m = ±0.03 mm (8-49)

merely by asking each person’s opinion and averaging the results.

The absurdity of the conclusion tells us rather forcefully that the rule is not always valid, even when the separate data values are causally independent; it requires them to be logically independent. In this case, we know that the vast majority of the inhabitants of China have never seen the Emperor; yet they have been discussing the Emperor among themselves and some kind of mental image of him has evolved as folklore. Then knowledge of the answer given by one does tell us something about the answer likely to be given by another, so they are not logically independent. Indeed, folklore has almost surely generated a systematic error, which survives the averaging; thus the above estimate would tell us something about the folklore, but almost nothing about the Emperor.

We could put it roughly as follows:

error in estimate = S ± R/√N (8-50)

where S is the common systematic error in each datum, R is the RMS ‘random’ error in the individual data values. Uninformed opinions, even though they may agree well among themselves, are nearly worthless as evidence. Therefore sound scientific inference demands that, when this is a possibility, we use a form of probability theory (i.e. a probabilistic model) which is sophisticated enough to detect this situation and make allowances for it.

As a start on this, equation (8-50) gives us a crude but useful rule of thumb; it shows that, unless we know that the systematic error is less than about one-third of the random error, we cannot be sure that the average of a million data values is any more accurate or reliable than the average of ten.¹ As Henri Poincaré put it: “The physicist is persuaded that one good measurement is worth many bad ones.” This has been well recognized by experimental physicists for generations; but warnings about it are conspicuously missing in the “soft” sciences whose practitioners are educated from those textbooks.

Or pg1019-1020 Chapter 10 “Physics of ‘Random Experiments’”:

…Nevertheless, the existence of such a strong connection is clearly only an ideal limiting case unlikely to be realized in any real application. For this reason, the law of large numbers and limit theorems of probability theory can be grossly misleading to a scientist or engineer who naively supposes them to be experimental facts, and tries to interpret them literally in his problems. Here are two simple examples:

  1. Suppose there is some random experiment in which you assign a probability p for some particular outcome A. It is important to estimate accurately the fraction f of times A will be true in the next million trials. If you try to use the laws of large numbers, they will tell you various things about f; for example, that it is quite likely to differ from p by less than a tenth of one percent, and enormously unlikely to differ from p by more than one percent. But now, imagine that in the first hundred trials, the observed frequency of A turned out to be entirely different from p. Would this lead you to suspect that something was wrong, and revise your probability assignment for the 101st trial? If it would, then your state of knowledge is different from that required for the validity of the law of large numbers. You are not sure of the independence of different trials, and/or you are not sure of the correctness of the numerical value of p. Your prediction of f for a million trials is probably no more reliable than for a hundred.
  2. The common sense of a good experimental scientist tells him the same thing without any probability theory. Suppose someone is measuring the velocity of light. After making allowances for the known systematic errors, he could calculate a probability distribution for the various other errors, based on the noise level in his electronics, vibration amplitudes, etc. At this point, a naive application of the law of large numbers might lead him to think that he can add three significant figures to his measurement merely by repeating it a million times and averaging the results. But, of course, what he would actually do is to repeat some unknown systematic error a million times. It is idle to repeat a physical measurement an enormous number of times in the hope that “good statistics” will average out your errors, because we cannot know the full systematic error. This is the old “Emperor of China” fallacy…

Indeed, unless we know that all sources of systematic error - recognized or unrecognized - contribute less than about one-third the total error, we cannot be sure that the average of a million measurements is any more reliable than the average of ten. Our time is much better spent in designing a new experiment which will give a lower probable error per trial. As Poincaré put it, “The physicist is persuaded that one good measurement is worth many bad ones.”² In other words, the common sense of a scientist tells him that the probabilities he assigns to various errors do not have a strong connection with frequencies, and that methods of inference which presuppose such a connection could be disastrously misleading in his problems.
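The arithmetic behind this rule of thumb is easy to reproduce. A minimal sketch, using the illustrative values S = 1/3 and R = 1 (the same ones worked through in the footnote): the random term shrinks with N, but the total error never drops below the systematic floor S.

```python
import math

def total_error(S, R, N):
    """Worst-case total error when a common systematic error S is added to
    an RMS random error R averaged over N measurements: S + R/sqrt(N)."""
    return S + R / math.sqrt(N)

# With S = 1/3 and R = 1, a million measurements barely beat ten:
e10 = total_error(1/3, 1.0, 10)      # ~0.65
e1m = total_error(1/3, 1.0, 10**6)   # ~0.334
# Both estimates are dominated by the systematic floor S = 1/3.
```

A hundred-thousand-fold increase in data only moves the error from roughly 0.65 to roughly 0.334, which is the point of the "average of a million is no better than the average of ten" warning.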

I excerpted & typed up these quotes for use in my DNB FAQ appendix on systematic problems; the applicability of Jaynes’s observations to things like publication bias is obvious. See also http://lesswrong.com/lw/g13/against_nhst/


  1. If I am understanding this right, Jaynes’s point here is that the random error shrinks towards zero as N increases, but this error is added onto the “common systematic error” S, so the total error approaches S no matter how many observations you make, and the random term can force the total error up as well as down (variability, in this case, actually being helpful for once). So for example, take S = 1/3 and R = 1: S + R/√N at N=10 is 0.64956; with N=100, it’s 0.43; with N=1,000,000 it’s 0.334; and with N=1,000,000,000 it equals 0.333365, etc., never going below the original systematic error of 1/3. This leads to the unfortunate consequence that the likely error for N=10 is 0.017<x<0.64956 while for N=1,000,000 it is the similar range 0.017<x<0.33433 - so it is possible that the estimate could be exactly as good (or bad) for the tiny sample as compared with the enormous sample, since neither can do better than 0.017!

  2. Possibly this is what Lord Rutherford meant when he said, “If your experiment needs statistics you ought to have done a better experiment”.

Against NHST

57 gwern 21 December 2012 04:45AM

A summary of standard non-Bayesian criticisms of common frequentist statistical practices, with pointers into the academic literature.

continue reading »

Statistical checks on some social science

17 NancyLebovitz 17 December 2012 05:23PM

Simonsohn, a social scientist, investigates bad use of statistics in his field.

A few good quotes:

The three social psychologists set up a test experiment, then played by current academic methodologies and widely permissible statistical rules. By going on what amounted to a fishing expedition (that is, by recording many, many variables but reporting only the results that came out to their liking); by failing to establish in advance the number of human subjects in an experiment; and by analyzing the data as they went, so they could end the experiment when the results suited them, they produced a howler of a result, a truly absurd finding. They then ran a series of computer simulations using other experimental data to show that these methods could increase the odds of a false-positive result—a statistical fluke, basically—to nearly two-thirds.
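The "analyzing the data as they went" part of the quoted fishing expedition is easy to demonstrate: simulate a world with no real effect, test after each batch of data, and stop as soon as p < 0.05. The batch size, sample cap, and simulation count below are arbitrary choices for illustration; the test is a simple z-test with known unit variance.

```python
import math
import random

def p_value(xs):
    # Two-sided z-test for H0: mean = 0, assuming known sigma = 1
    z = (sum(xs) / len(xs)) * math.sqrt(len(xs))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def peeking_trial(rng, batch=10, max_n=100):
    # Collect data in batches under a true null effect, testing after each
    # batch, and declare "significance" at the first p < 0.05.
    xs = []
    while len(xs) < max_n:
        xs.extend(rng.gauss(0, 1) for _ in range(batch))
        if p_value(xs) < 0.05:
            return True   # a false positive: the true effect is zero
    return False

rng = random.Random(0)
trials = 2000
rate = sum(peeking_trial(rng) for _ in range(trials)) / trials
# The nominal rate is 5%, but optional stopping inflates it severalfold.
```

Running this, the false-positive rate lands far above the nominal 5%, which is the "howler of a result" mechanism the quote describes, before even adding the multiple-variable fishing.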

Laugh or cry? "He prefers psychology’s close-up focus on the quirks of actual human minds to the sweeping theory and deduction involved in economics."

Last summer, not long after Sanna and Smeesters left their respective universities, Simonsohn laid out his approach to fraud-busting in an online article called “Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone”. Afterward, his inbox was flooded with tips from strangers. People wanted him to investigate election results, drug trials, the work of colleagues they’d long doubted. He has not replied to these messages. Making a couple of busts is one thing. Assuming the mantle of the social sciences’ full-time Grand Inquisitor would be quite another.

This looks like a clue that there's work available for anyone who knows statistics. Eventually, there will be an additional line of work for how to tell whether a forensic statistician is competent.

 

Analyzing FF.net reviews of 'Harry Potter and the Methods of Rationality'

25 gwern 03 November 2012 11:47PM

The unprecedented gap in Methods of Rationality updates prompts musing about whether readership is increasing enough & what statistics one would use; I write code to download FF.net reviews, clean it, parse it, load into R, summarize the data & depict it graphically, run linear regression on a subset & all reviews, note the poor fit, develop a quadratic fit instead, and use it to predict future review quantities.

Then, I run a similar analysis on a competing fanfiction to find out when they will have equal total review-counts. A try at logarithmic fits fails; fitting a linear model to the previous 100 days of _MoR_ and the competitor works much better, and they predict a convergence in <5 years.

Master version: http://www.gwern.net/hpmor#analysis

[Link] Think Bayes: Bayesian Statistics Made Simple

9 Jayson_Virissimo 10 October 2012 08:11AM

Allen B. Downey has made a rough draft version of his new introduction to Bayesian statistics textbook available for free online. You can download the pdf version here or read the HTML here.

Think Bayes is an introduction to Bayesian statistics using computational methods. This version of the book is a rough draft. I am making this draft available for comments, but it comes with the warning that it is probably full of errors.

The Fallacy of Large Numbers

20 dspeyer 12 August 2012 06:39PM

I've been seeing this a lot lately, and I don't think it's been written about here before.

Let's start with a motivating example.  Suppose you have a fleet of 100 cars (or horses, or people, or whatever).  For any given car, on any given day, there's a 3% chance that it'll be out for repairs (or sick, or attending grandmothers' funerals, or whatever).  For simplicity's sake, assume all failures are uncorrelated.  How many cars can you afford to offer to customers each day?  Take a moment to think of a number.

Well, 3% failure means 97% success.  So we expect 97 to be available and can afford to offer 97.  Does that sound good?  Take a moment to answer.

Well, maybe not so good.  Sometimes we'll get unlucky.  And not being able to deliver on a contract is painful.  Maybe we should reserve 4 and only offer 96.  Or maybe we'll play it very safe and reserve twice the needed number.  6 in reserve, 94 for customers.  But is that overkill?  Take note of what you're thinking now.

The likelihood of having more than 4 unavailable is 18%.  The likelihood of having more than 6 unavailable is 3.1%.  About once a month.  Even reserving 8, requiring 9 failures to get you in trouble, gets you in trouble 0.3% of the time.  More than once a year.  Reserving 9 -- three times the expected -- gets the risk down to 0.087% or a little less than every three years.  A number we can finally feel safe with.
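The spot-checks above are easy to reproduce exactly with the binomial distribution; Python's `math.comb` makes brute force trivial (as the post suggests further down):

```python
import math

def tail_prob(n, p, k):
    """P(X > k) for X ~ Binomial(n, p), summed directly."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(k + 1, n + 1)
    )

# 100 cars, 3% daily failure rate, failures independent:
print(tail_prob(100, 0.03, 4))  # more than 4 out: ~0.18, i.e. ~18% of days
print(tail_prob(100, 0.03, 6))  # more than 6 out: ~0.031, about once a month
print(tail_prob(100, 0.03, 9))  # more than 9 out: ~0.00086, < every three years
```

These match the percentages quoted in the paragraph above to rounding.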

So much for expected values.  What happened to the Law of Large Numbers?  Short answer: 100 isn't large.

The Law of Large Numbers states that for sufficiently large samples, the results look like the expected value (for any reasonable definition of like).

The Fallacy of Large Numbers states that your numbers are sufficiently large.

This doesn't just apply to expected values.  It also applies to looking at a noisy signal and handwaving that the noise will average away with repeated measurements.  Before you can say something like that, you need to look at how many measurements, and how much noise, and crank out a lot of calculations.  This variant is particularly tricky because you often don't have numbers on how much noise there is, making it hard to do the calculation.  When the calculation is hard, the handwave is more tempting.  That doesn't make it more accurate.

I don't know of any general tools for saying when statistical approximations become safe.  The best thing I know is to spot-check like I did above.  Brute-forcing combinatorics sounds scary, but Wolfram Alpha can be your friend (as above).  So can python, which has native bignum support.  Python has a reputation as being slow for number crunching, but with n<1000 and a modern cpu it usually doesn't matter.

One warning sign is if your tools were developed in a very different context than where you're using them.  Some approximations were invented for dealing with radioactive decay, where n resembles Avogadro's Number.  Applying these tools to the American population is risky.  Some were developed for the American population.  Applying them to students in your classroom is risky.

Another danger is that your dataset can shrink.  If you've validated your tools for your entire dataset, and then thrown out some datapoints and divided the rest along several axes, don't be surprised if some of your data subsets are now too small for your tools.

This fallacy is related to "assuming events are uncorrelated" and "assuming distributions are normal".  It's a special case of "choosing statistical tools based on how easy they are to use whether they're applicable to your use-case or not".

[Retracted] Simpson's paradox strikes again: there is no great stagnation?

30 CarlShulman 30 July 2012 05:55PM

ETA: The table linked by Landsburg has been called into serious question by Evan Soltas [H.T. CronoDAS]. I edited the post to leave only the table to provide context for the comment discussion of its status.

Economist Steve Landsburg has a post [H.T. David Henderson] about the supposed stagnation of median wages in the United States in recent decades. In the linked table median wages have risen for: 

Hope Function

22 gwern 01 July 2012 03:40PM

Yesterday I finished transcribing "The Ups and Downs of the Hope Function In a Fruitless Search". This is a statistics & psychology paper describing a simple probabilistic search problem and the sheer difficulty subjects have in producing the correct Bayesian answer. Besides providing a great but simple illustration of the mind projection fallacy in action, the simple search problem maps onto a number of forecasting problems: the problem may be looking in a desk for a letter that may not be there, but we could also look at a problem in which we check every year for the creation of AI and ask how our beliefs change over time - which turns out to defuse a common scoffing criticism of past technological forecasting. (This last problem was why I went back and used it, after I first read of it.)

The math is all simple - arithmetic and one application of Bayes's law - so I think all LWers can enjoy it, and it has amusing examples to analyze. I have also taken the trouble to annotate it with Wikipedia links, relevant materials, and many PDF links (some jailbroken just for this transcript). I hope everyone finds it as interesting as I did.
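The desk-drawer calculation at the heart of the paper is short enough to sketch: if your prior probability that the letter is in the desk at all is p, and the desk has n equally likely drawers, then after k fruitless drawer-openings the posterior is p(n − k)/n divided by 1 − pk/n. The specific numbers below are illustrative, not from the paper.

```python
def hope(prior, n_drawers, k_searched):
    """Posterior probability the letter is in the desk, given k empty drawers.
    One application of Bayes's law: P(in desk | k misses)."""
    p_in_and_unsearched = prior * (n_drawers - k_searched) / n_drawers
    p_no_find_so_far = 1 - prior * k_searched / n_drawers
    return p_in_and_unsearched / p_no_find_so_far

# Prior 0.8 that the letter is somewhere in an 8-drawer desk:
print(hope(0.8, 8, 0))  # 0.8   - before searching
print(hope(0.8, 8, 4))  # ~0.667 - hope sags as drawers come up empty
print(hope(0.8, 8, 8))  # 0.0   - all drawers empty: it isn't there
```

The same update applied to "check every year for the creation of AI" is what defuses the scoffing criticism of past forecasts: each fruitless year lowers the probability only gradually, not catastrophically.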

I thank John Salvatier for doing the ILL request which got me a scan of this book chapter.

Let's all learn stats!

13 mstevens 12 June 2012 03:31PM

I want to become Stronger!

Udacity are running an Introduction to Statistics course starting on the 25th June 2012.

Many of us could stand to learn some more stats, I certainly could. This seems like a great opportunity!

It is mandatory for all LWers to enroll in this course.

Update: the last line was a joke. Obviously people are not finding it funny. Sorry.

Covariance in your sample vs covariance in the general population

27 RomeoStevens 16 May 2012 12:17AM

A popular-media take on a subtle problem in sampling.  I found the graph quite illustrative.

http://www.theatlantic.com/business/archive/2012/05/when-correlation-is-not-causation-but-something-much-more-screwy/256918/

"The Journal of Real Effects"

13 CarlShulman 05 March 2012 03:07AM

Luke's recent post mentioned that The Lancet has a policy encouraging the advance registration of clinical trials, while mine examined an apparent case study of data-peeking and on-the-fly transformation of studies. But how much variation is there across journals on such dimensions? Are there journals that buck the standards of their fields (demanding registration, p=0.01 rather than p=0.05 where the latter is typical in the field, advance specification of statistical analyses and subject numbers, etc)? What are some of the standouts? Are there fields without any such?

I wonder if there is a niche for a new open-access journal, along the lines of PLoS, with standards strict enough to reliably exclude false-positives. Some possible titles:

 

  • The Journal of Real Effects
  • (Settled) Science
  • Probably True
  • Journal of Non-Null Results, Really
  • Too Good to Be False
  • _________________?

 

Michael Nielsen explains Judea Pearl's causality

18 gwern 24 January 2012 07:35PM

Michael Nielsen has posted a long essay explaining his understanding of the Pearlean causal DAG model. I don't understand more than half, but that's much more than I got out of a few other papers. Strongly recommended for anyone interested in the topic.

Online Course in Evidence-Based Medicine

5 Normal_Anomaly 02 December 2011 12:22AM

The Foundation for Blood Research has created an online course in Evidence-Based Medicine, aimed at "advanced high school science students, college students, nursing students, and 1st or 2nd year medical students." It focuses on evaluating research papers and applying statistics to medical diagnosis. I have taken this course, and it was useful practice in Bayesian reasoning.

The course involves working through a couple case studies of ER patients. Students will observe the patient, review research on relevant diagnostic tests, and calculate posterior probabilities given the available information. For instance: one case involves a woman who may have bacterial meningitis, but her spinal fluid test results are mixed. Students then read parts of a paper describing the success of different components of the spinal fluid test as predictors of meningitis.

The course is self-paced and highly modular, alternating between videos, multiple choice or calculation questions, and short written submissions. There is no in-course interaction between students taking the same course, but it is divided into "class sections" for the convenience of teachers who want to observe their students. It works well with Firefox and Safari, and slightly less well (but still easily usable) with Internet Explorer.

Anyone who is interested or wants more information, look at their website or ask me in the comments. Once a decent number of people have shown some interest, I will contact one of the site administrators and he'll set up an official class section for us.

EDIT: I have contacted the site administrator, we should have a class section available soon. Section name and info on how to log in will be posted shortly.

EDIT2: The course section is up: go to http://evidenceworksremote.com/courses/ and find the Less Wrong Community course. When you click on the course listing you will be asked to register. Once you receive the acknowledging email, return to the course and enter the "enrollment" key: LW101 . I will be able to see your responses to the questions and possibly able to provide feedback. Once you have completed the course, Dr. Allan, who is one of the developers, would appreciate feedback by email.

Help with a (potentially Bayesian) statistics / set theory problem?

2 joshkaufman 10 November 2011 10:30PM

Update: as it turns out, this is a voting system problem, which is a difficult but well-studied topic. Potential solutions include Ranked Pairs (complicated) and BestThing (simpler). Thanks to everyone for helping me think this through out loud, and for reminding me to kill flies with flyswatters instead of bazookas.


I'm working on a problem that I believe involves Bayes, I'm new to Bayes and a bit rusty on statistics, and I'm having a hard time figuring out where to start. (EDIT: it looks like set theory may also be involved.) Your help would be greatly appreciated.

Here's the problem: assume a set of 7 different objects. Two of these objects are presented at random to a participant, who selects whichever one of the two objects they prefer. (There is no "indifferent" option.) The order of these combinations is not important, and repeated combinations are not allowed.

Basic combination theory says there are 21 different possible combinations: (7!) / ( (2!) * (7-2)! ) = 21.

Now, assume the researcher wants to know which single option has the highest probability of being the "most preferred" to a new participant based on the responses of all previous participants. To complicate matters, each participant can leave at any time, without completing the entire set of 21 responses. Their responses should still factor into the final result, even if they only respond to a single combination.

At the beginning of the study, there are no priors. (CORRECTION via dlthomas: "There are necessarily priors... we start with no information about rankings... and so assume a 1:1 chance of either object being preferred.") If a participant selects B from {A,B}, the probability of B being the "most preferred" object should go up, and A should go down, if I'm understanding correctly.

NOTE: Direct ranking of objects 1-7 (instead of pairwise comparison) isn't ideal because it takes longer, which may encourage the participant to rationalize. The "pick-one-of-two" approach is designed to be fast, which is better for gut reactions when comparing simple objects like words, photos, etc.

The ideal output looks like this: "Based on ___ total responses, participants prefer Object A. Object A is preferred __% more than Object B (the second most preferred), and ___% more than Object C (the third most preferred)."
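The update at the top points to Ranked Pairs and BestThing; a related standard tool for aggregating exactly this kind of incomplete pairwise data is the Bradley–Terry model, which assigns each object a strength and says A beats B with probability strength(A)/(strength(A)+strength(B)). A minimal sketch of one common fitting method (a minorization–maximization loop), with made-up comparison data; real use would add pseudo-counts so never-winning objects don't collapse to zero:

```python
from collections import defaultdict

def bradley_terry(comparisons, n_iter=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs via MM updates."""
    items = set()
    wins = defaultdict(float)          # total wins per item
    pair_counts = defaultdict(float)   # comparisons per unordered pair
    for w, l in comparisons:
        items.update((w, l))
        wins[w] += 1
        pair_counts[frozenset((w, l))] += 1
    strength = {i: 1.0 for i in items}
    for _ in range(n_iter):
        new = {}
        for i in items:
            denom = sum(
                pair_counts[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items if j != i
            )
            new[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(new.values())
        # Normalize; floor at a tiny epsilon to avoid division by zero
        strength = {i: max(v / total, 1e-12) for i, v in new.items()}
    return strength

# Hypothetical responses: A beat B twice, B beat C twice, A beat C twice.
s = bradley_terry([("A", "B"), ("A", "B"), ("B", "C"),
                   ("B", "C"), ("A", "C"), ("A", "C")])
```

Participants who leave early are handled for free, since the fit only uses whichever pairs were actually answered.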

Questions:

1. Is Bayes actually the most straightforward way of calculating the "most preferred"? (If not, what is? I don't want to be Maslow's "man with a hammer" here.)

2. If so, can you please walk me through the beginning of how this calculation is done, assuming 10 participants?

Thanks in advance!

Thinking Statistically [ebook]

6 Dreaded_Anomaly 01 November 2011 03:36AM

Uri Bram, a recent Princeton graduate, has just published an ebook called Thinking Statistically. The book is aimed at conveying a few important statistical concepts (selection bias, endogeneity and correlation vs. causation, Bayes theorem and base rate neglect) to a general audience. The official product description:

This book will show you how to think like a statistician, without worrying about formal statistical techniques. Along the way we'll see why supposed Casanovas might actually be examples of the Base Rate Fallacy; how to use Bayes' Theorem to assess whether your partner is cheating on you; and why you should never use Mark Zuckerberg as an example for anything. See the world in a whole new light, and make better decisions and judgements without ever going near a t-test. Think. Think Statistically.

Less Wrong members will be familiar with these topics, but we should keep this book in mind as a convenient method of getting friends, relatives, acquaintances, and others interested in understanding rationality.

Eliezer's An Intuitive Explanation of Bayes' Theorem gets a shout-out in the Recommended Reading at the end.

Why epidemiology will not correct itself

38 gwern 11 August 2011 12:54AM

We're generally familiar here with the appalling state of medical and dietary research, where most correlations turn out to be bogus. (And if we're not, I have collected a number of links on the topic in my DNB FAQ that one can read, see http://www.gwern.net/DNB%20FAQ#flaws-in-mainstream-science-and-psychology - probably the best first link to read would be Ioannidis's “Why Most Published Research Findings Are False”.)

I recently found a talk arguing that this problem was worse than one might assume, with false positives in the >80% range, and more interestingly, why the rate is so high and will remain high for the foreseeable future. Young asserts, pointing to papers and textbooks by epidemiologists, that they are perfectly aware of what the Bonferroni correction does (and why one would use it) and that they choose to not use it because they do not want to risk any false negatives. (Young also conducts some surveys showing less interest in public sharing of data and other good things like that, but that seems to me to be much less important than the statistical tradeoffs.)
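For readers unfamiliar with the tradeoff Young describes, the arithmetic is simple: running m independent tests at level α inflates the familywise false-positive rate to 1 − (1 − α)^m, and the Bonferroni correction tests each hypothesis at α/m instead. A quick sketch (m = 20 is an arbitrary illustration):

```python
def familywise_rate(alpha, m):
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

m = 20
print(familywise_rate(0.05, m))       # ~0.64: a false positive is more likely than not
print(familywise_rate(0.05 / m, m))   # ~0.049: Bonferroni restores roughly 5%
```

The cost, of course, is exactly the one the epidemiologists cite: the stricter per-test threshold raises the false-negative rate, which is the tradeoff they resolve in the opposite direction.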

There are three papers online that seem representative:

  1. Rothman (1990)
  2. Perneger (1998)
  3. Vandenbroucke, PLoS Med (2008)

Reading them is a little horrifying when one considers the costs of the false positives, all the people trying to stay healthy by following what is only random noise, and the general (and justified!) contempt for science by those aware of the false positive rate. (I enlarge on this vein of thought on Reddit. The recent kerfuffle about whether salt really is bad for you - medical advice that has stressed millions and will cost more millions due to New York City's war on salt - is a reminder of what is at stake.)

The take-away, I think, is to resolutely ignore anything to do with diet & exercise that is not a randomized trial. Correlations may be worth paying attention to in other areas but not in health.

Individual Deniability, Statistical Honesty

43 Alicorn 09 August 2011 04:17AM

If you have a lot of people to question about something, and they have a motivation to lie, consider this clever use of a six-sided die.

If the farmer tossed the die and got a one, they had to respond "yes" to the surveyor's question. If they got a six, they had to say "no." The rest of the time, they were asked to answer honestly. The die was hidden from the person who was conducting the survey, so they never knew what number the farmer was responding to.

Suddenly, the number of "yes" responses to the leopard question started coming up by more than just one-sixth.
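Recovering the true rate from such die-randomized answers is one line of algebra: P(yes) = 1/6 + (4/6)·p, so p = (6·P(yes) − 1)/4. A sketch (the six-sided-die design is from the quote; the example survey figure is made up):

```python
def true_rate(observed_yes_fraction):
    """Invert the forced-response design: 1/6 forced 'yes', 1/6 forced 'no',
    and 4/6 honest answers with probability p of 'yes'."""
    return (6 * observed_yes_fraction - 1) / 4

# If 35% of farmers answered "yes", the estimated honest rate is:
print(true_rate(0.35))   # 0.275
print(true_rate(1 / 6))  # 0.0 - only the forced "yes" answers showed up
```

Any individual "yes" stays deniable (it might have been a forced roll), yet the aggregate rate is recoverable, which is the whole trick.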

Free Stats Textbook: Principles of Uncertainty

20 badger 24 May 2011 07:45PM

Joseph Kadane, emeritus at Carnegie Mellon, released his new statistics textbook Principles of Uncertainty as a free pdf. The book is written from a Bayesian perspective, covering basic probability, decision theory, conjugate distribution analysis, hierarchical modeling, MCMC simulation, and game theory. The focus is mathematical, but computation with R is touched on. A solid understanding of calculus seems sufficient to use the book. Curiously, the author devotes a fair number of pages to developing the McShane integral, which is equivalent to Lebesgue integration on the real line. There are lots of other unusual topics you don't normally see in an intermediate statistics textbook.

Having come across this today, I can't say whether it is actually very good or not, but the range of topics seems perfectly suited to Less Wrong readers.

The Joys of Conjugate Priors

41 TCB 21 May 2011 02:41AM

(Warning: this post is a bit technical.)

Suppose you are a Bayesian reasoning agent.  While going about your daily activities, you observe an event of type x.  Because you're a good Bayesian, you have some internal parameter θ which represents your belief that x will occur.

Now, you're familiar with the Ways of Bayes, and therefore you know that your beliefs must be updated with every new datapoint you perceive.  Your observation of x is a datapoint, and thus you'll want to modify θ.  But how much should this datapoint influence θ?  Well, that will depend on how sure you are of θ in the first place.  If you calculated θ based on a careful experiment involving hundreds of thousands of observations, then you're probably pretty confident in its value, and this single observation of x shouldn't have much impact.  But if your estimate of θ is just a wild guess based on something your unreliable friend told you, then this datapoint is important and should be weighted much more heavily in your reestimation of θ.

Of course, when you reestimate θ, you'll also have to reestimate how confident you are in its value.  Or, to put it a different way, you'll want to compute a new probability distribution over possible values of θ.  This new distribution will be p(θ|x), and it can be computed using Bayes' rule:

p(θ|x) = p(x|θ) p(θ) / ∫ p(x|θ′) p(θ′) dθ′

Here, since θ is a parameter used to specify the distribution from which x is drawn, it can be assumed that computing p(x|θ) is straightforward.  p(θ) is your old distribution over θ, which you already have; it says how accurate you think different settings of the parameters are, and allows you to compute your confidence in any given value of θ.  So the numerator should be straightforward to compute; it's the denominator which might give you trouble, since for an arbitrary distribution, computing the integral is likely to be intractable.

But you're probably not really looking for a distribution over different parameter settings; you're looking for a single best setting of the parameters that you can use for making predictions.  If this is your goal, then once you've computed the distribution P(θ|x), you can pick the value of θ that maximizes it.  This will be your new parameter, and because you have the formula for P(θ|x), you'll know exactly how confident you are in this parameter.

In practice, picking the value of θ which maximizes P(θ|x) is usually pretty difficult, thanks to the presence of local optima, as well as the general difficulty of optimization problems.  For simple enough distributions, you can use the EM algorithm, which is guaranteed to converge to a local optimum.  But for more complicated distributions, even this method is intractable, and approximate algorithms must be used.  Because of this concern, it's important to keep the distributions P(x|θ) and P(θ) simple.  Choosing the distribution P(x|θ) is a matter of model selection; more complicated models can capture deeper patterns in data, but will take more time and space to compute with.

It is assumed that the type of model is chosen before deciding on the form of the distribution P(θ).  So how do you choose a good distribution for P(θ)?  Notice that every time you see a new datapoint, you'll have to do the computation in the equation above.  Thus, in the course of observing data, you'll be multiplying lots of different probability distributions together.  If these distributions are chosen poorly, P(θ|x) could get quite messy very quickly.

If you're a smart Bayesian agent, then, you'll pick P(θ) to be a conjugate prior to the distribution P(x|θ).  The distribution P(θ) is conjugate to P(x|θ) if multiplying these two distributions together and normalizing results in another distribution of the same form as P(θ).

Let's consider a concrete example: flipping a biased coin.  Suppose you use the Bernoulli distribution to model your coin.  Then it has a parameter θ which represents the probability of getting heads.  Assume that the value 1 corresponds to heads, and the value 0 corresponds to tails.  Then the distribution of the outcome x of the coin flip looks like this:

P(x|θ) = θ^x (1−θ)^(1−x)

It turns out that the conjugate prior for the Bernoulli distribution is something called the beta distribution.  It has two parameters, α and β, which we call hyperparameters because they are parameters for a distribution over our parameters.  (Eek!)

The beta distribution looks like this:

P(θ; α, β) = θ^(α−1) (1−θ)^(β−1) / ∫₀¹ u^(α−1) (1−u)^(β−1) du

Since θ represents the probability of getting heads, it can take on any value between 0 and 1, and thus this function is normalized properly.

Suppose you observe a single coin flip x and want to update your beliefs regarding θ.  Since the denominator of the beta function in the equation above is just a normalizing constant, you can ignore it for the moment while computing P(θ|x), as long as you promise to normalize after completing the computation:

P(θ|x) ∝ P(x|θ) P(θ; α, β) ∝ θ^x (1−θ)^(1−x) · θ^(α−1) (1−θ)^(β−1) = θ^(x+α−1) (1−θ)^(1−x+β−1)

Normalizing this equation will, of course, give another beta distribution, confirming that this is indeed a conjugate prior for the Bernoulli distribution.  Super cool, right?
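
As a sketch of how little bookkeeping this update requires, here is the Beta-Bernoulli update in Python; the Beta(2, 2) prior and the flip sequence are made up for illustration:

```python
# Beta-Bernoulli conjugate update: observing a flip just increments a pseudo-count.
def update(alpha, beta, flip):
    """Return the posterior Beta hyperparameters after one Bernoulli observation.

    flip is 1 for heads, 0 for tails.
    """
    return alpha + flip, beta + (1 - flip)

# Start with an illustrative Beta(2, 2) prior and observe heads, tails, heads.
alpha, beta = 2, 2
for flip in [1, 0, 1]:
    alpha, beta = update(alpha, beta, flip)

print(alpha, beta)             # Beta(4, 3)
print(alpha / (alpha + beta))  # posterior mean of theta
```

Note that no integral ever needs to be computed during the update itself; that is the whole computational payoff of conjugacy.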

If you are familiar with the binomial distribution, you should see that the numerator of the beta distribution in the equation for P(θ|x) looks remarkably similar to the non-factorial part of the binomial distribution.  This suggests a form for the normalization constant:

P(θ; α, β) = [Γ(α+β) / (Γ(α) Γ(β))] θ^(α−1) (1−θ)^(β−1)

The beta and binomial distributions are almost identical.  The biggest difference between them is that the beta distribution is a function of θ, with α and β as prespecified parameters, while the binomial distribution is a function of the number of heads, with the number of flips and θ as prespecified parameters.  It should be clear that the beta distribution is also conjugate to the binomial distribution, making it just that much awesomer.

Another difference between the two distributions is that the beta distribution uses gammas where the binomial distribution uses factorials.  Recall that the gamma function is just a generalization of the factorial to the reals; thus, the beta distribution allows α and β to be any positive real number, while the binomial distribution is only defined for integers.  As a final note on the beta distribution, the −1 in the exponents is not philosophically significant; I think it is mostly there so that the gamma functions will not contain +1s.  For more information about the mathematics behind the gamma function and the beta distribution, I recommend checking out this pdf: http://www.mhtl.uwaterloo.ca/courses/me755/web_chap1.pdf.  It gives an actual derivation which shows that the first equation for P(θ; α, β) is equivalent to the second equation, which is nice if you don't find the argument by analogy to the binomial distribution convincing.
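
A quick numerical check of that equivalence, using nothing beyond the Python standard library: the gamma-function form of the normalizer should match a direct numerical integration of the beta numerator (the values of α and β below are arbitrary):

```python
import math

# The normalizer of the beta distribution two ways:
# closed form B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b),
# versus a brute-force integral of theta^(a-1) * (1-theta)^(b-1) over [0, 1].
def beta_norm_gamma(a, b):
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_norm_numeric(a, b, steps=100000):
    # Midpoint-rule integration; midpoints avoid the endpoints 0 and 1.
    h = 1.0 / steps
    return sum(((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
               for i in range(steps)) * h

a, b = 2.5, 4.0
print(beta_norm_gamma(a, b))    # closed form
print(beta_norm_numeric(a, b))  # numerical integral; should agree closely
```

Because math.gamma accepts real arguments, the same check works for non-integer α and β, which is exactly the generalization the paragraph above describes.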

So, what is the philosophical significance of the conjugate prior?  Is it just a pretty piece of mathematics that makes the computation work out the way we'd like it to?  No; there is deep philosophical significance to the form of the beta distribution.

Recall the intuition from above: if you've seen a lot of data already, then one more datapoint shouldn't change your understanding of the world too drastically.  If, on the other hand, you've seen relatively little data, then a single datapoint could influence your beliefs significantly.  This intuition is captured by the form of the conjugate prior.  α and β can be viewed as keeping track of how many heads and tails you've seen, respectively.  So if you've already done some experiments with this coin, you can store that data in a beta distribution and use that as your conjugate prior.  The beta distribution captures the difference between claiming that the coin has a 30% chance of coming up heads after seeing 3 heads and 7 tails, and claiming that the coin has a 30% chance of coming up heads after seeing 3000 heads and 7000 tails.

Suppose you haven't observed any coin flips yet, but you have some intuition about what the distribution should be.  Then you can choose values for α and β that represent your prior understanding of the coin.  Higher values of α and β indicate more confidence in your intuition; thus, choosing the appropriate hyperparameters is a method of quantifying your prior understanding so that it can be used in computation.  α and β will act like "imaginary data"; when you update your distribution over θ after observing a coin flip x, it will be like you already saw α heads and β tails before that coin flip.
 
If you want to express that you have no prior knowledge about the system, you can do so by setting α and β to 1.  This will turn the beta distribution into a uniform distribution.  You can also use the beta distribution to do add-N smoothing, by setting α and β to both be N+1.  Setting the hyperparameters to a value lower than 1 causes them to act like "negative data", which helps avoid overfitting θ to noise in the actual data.
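
A small sketch of these choices in Python, reading the add-N claim through the posterior mode (MAP estimate); the counts and the smoothing strength N below are illustrative:

```python
# MAP estimate of theta under a Beta(alpha, beta) prior after h heads, t tails:
# the mode of Beta(alpha + h, beta + t) is
# (alpha + h - 1) / (alpha + beta + h + t - 2).
def map_estimate(h, t, alpha, beta):
    return (alpha + h - 1) / (alpha + beta + h + t - 2)

h, t, N = 3, 7, 1  # illustrative counts; N is the smoothing strength

# With alpha = beta = N + 1, the MAP estimate is exactly add-N smoothing:
print(map_estimate(h, t, N + 1, N + 1))  # (h + N) / (h + t + 2N)
print((h + N) / (h + t + 2 * N))         # same value

# With a uniform Beta(1, 1) prior, the MAP estimate is the raw frequency:
print(map_estimate(h, t, 1, 1))          # h / (h + t)
```

The "imaginary data" reading is visible in the formula itself: α − 1 extra heads and β − 1 extra tails are simply added to the observed counts.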

In conclusion, the beta distribution, which is a conjugate prior to the Bernoulli and binomial distributions, is super awesome.  It makes it possible to do Bayesian reasoning in a computationally efficient manner, and it has the philosophically satisfying interpretation of representing real or imaginary prior data.  Other conjugate priors, such as the Dirichlet prior for the multinomial distribution, are similarly cool.

Probability updating question - 99.9999% chance of tails, heads on first flip

2 nuckingfutz 16 May 2011 12:58AM

This isn't intended as a full discussion, I'm just a little fuzzy on how a Bayesian update or any other kind of probability update would work in this situation.

You have a coin with a 99.9999% chance of coming up tails, and a 100% chance of coming up either tails or heads.

You've deduced these odds by studying the weight of the coin. You are 99% confident of your results. You have not yet flipped it.

You have no other information before flipping the coin.

You flip the coin once. It comes up heads.

How would you update your probability estimates?

 

(This isn't a homework assignment; rather, I was discussing with someone how strong the anthropic principle is. Unfortunately my mathematical abilities can't quite work out how to assemble this into any form I can work with.)
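
One hedged way to set the update up, sketched in Python. The key modeling choice (not given in the question) is what "my weighing analysis was wrong" means; here it is modeled, arbitrarily, as the coin being fair:

```python
# A sketch of one possible Bayesian update for this puzzle. The alternative
# hypothesis is an assumption: "the analysis is wrong" is taken to mean the
# coin is fair, which the original question does not specify.
p_correct = 0.99            # prior: the 99.9999%-tails analysis is right
p_wrong = 0.01              # prior: the analysis is wrong (assumed fair coin)
p_heads_if_correct = 1e-6   # P(heads | analysis right)
p_heads_if_wrong = 0.5      # P(heads | analysis wrong) -- an assumption

evidence = p_correct * p_heads_if_correct + p_wrong * p_heads_if_wrong
posterior_correct = p_correct * p_heads_if_correct / evidence
print(posterior_correct)  # a single heads nearly demolishes the 99% confidence
```

The qualitative lesson survives any reasonable choice of alternative hypothesis: a hypothesis that assigned the observation a probability of one in a million loses almost all of its posterior mass to any rival that predicted the observation even moderately well.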

 

A Problem with Human Intuition about Conventional Statistics:

-1 Kai-o-logos 20 April 2011 11:41PM

 

As an aspiring scientist, I hold the Truth above all. As Hodgell once said, "That which can be destroyed by the truth should be." But what if the thing that is holding our pursuit of the Truth back is our own system? I will share an example of an argument I overheard between a theist and an atheist once - showing an instance where human intuition might fail us.

*General Transcript*

Atheist: Prove to me that God exists!

Theist: He obviously exists – can’t you see that plants growing, humans thinking, [insert laundry list here], is all His work?

Atheist: Those can easily be explained by evolutionary mechanisms!

Theist: Well prove to me that God doesn’t exist!

Atheist: I don’t have to! There may be an invisible pink unicorn baby flying around my head; there probably is not. I can’t prove that there is no unicorn, but that doesn’t mean it exists!

Theist: That’s just complete reductio ad ridiculo, you could do infrared, polaroid, uv, vacuum scans, and if nothing appears it is statistically unlikely that the unicorn exists! But God is something metaphysical, you can’t do that with Him!

Atheist: Well Nietzsche killed metaphysics when he killed God. God is dead!

Theist: That is just words without argument. Can you actually…..

As one can see, the biggest problem is determining burden of proof.  Statistically speaking, this is much like the problem of defining the null hypothesis.

A theist would define: H0: God exists. Ha: God does not exist.

An atheist would define: H0: God does not exist. Ha: God does exist.

Both conclude that there is no significant evidence favoring Ha over H0. Furthermore, and this is key, they both accept the null hypothesis. The correct statistical term for the proper conclusion, when the evidence is insufficient to accept the alternate hypothesis, is that one fails to reject the null hypothesis. However, human intuition fails to grasp this concept; we think in black and white, and instead tend to accept the null hypothesis outright.

This is not so much a problem with statistics as it is with human intuition. Statistics usually take this form because simultaneous 100+ hypothesis considerations are taxing on the human brain. Therefore, we think of hypotheses to be defended or attacked, but not considered neutrally.

Consider a Bayesian outlook on this problem.

There are two possible outcomes: At least one deity exists (D). No deities exist (N).

Let us consider the natural evidence (Let’s call this E) before us.

P(D ∨ N) = 1.  P[(D ∨ N)|E] = 1.  P(D|E) + P(N|E) = 1.  P(D|E) = 1 − P(N|E).

Although calculating the prior probability that a god exists is rather strange, and seems to reek of bias, I still argue that this is a better system of analysis than just the classical H0 and Ha, because it compares the probabilities of D and N directly, without the bias inherent in the brain's perception of the null-hypothesis framing.

Examples such as these, I believe, show the flaws that result from faulty interpretations of the classical system. If instead we introduced a Bayesian perspective, the faulty interpretation would vanish.

This is a case for the expanded introduction of Bayesian probability theory. Even if it cannot be applied correctly to every problem, and even if it is apparently more complicated than the standard method taught in statistics class (I disagree here), it teaches people to analyze situations from a more objective perspective.

And if we can avoid Truth-seekers going awry due to simple biases such as those mentioned above, won’t we be that much closer to finding Truth?

 

Visualizing Bayesian Inference [link]

11 Dreaded_Anomaly 14 March 2011 08:10PM

Galton Visualizing Bayesian Inference (article @ CHANCE)

Excerpt:

What does Bayes Theorem look like? I do not mean what does the formula—

P(A|B) = P(B|A) P(A) / P(B)

—look like; these days, every statistician knows that. I mean, how can we visualize the cognitive content of the theorem? What picture can we appeal to with the hope that any person curious about the theorem may look at it, and, after a bit of study say, “Why, that is clear—I can indeed see what is happening!”

Francis Galton could produce just such a picture; in fact, he built and operated a machine in 1877 that performs that calculation. But, despite having published the picture in Nature and the Proceedings of the Royal Institution of Great Britain, he never referred to it again—and no reader seems to have appreciated what it could accomplish until recently.

Schematics for the machine and its algorithm can be found at the link. This is a really cool design, and maybe it can aid Eliezer's and others' efforts to help people understand Bayes' Theorem.

Bayesianism in the face of unknowns

1 rstarkov 12 March 2011 08:54PM

Suppose I tell you I have an infinite supply of unfair coins. I pick one randomly and flip it, recording the result. I've done this a total of 100 times and they all came out heads. I will pay you $1000 if the next throw is heads, and $10 if it's tails. Each unfair coin is an ordinary coin whose flips come up heads independently with some unknown probability p. This is all you know. How much would you pay to enter this game?

I suppose another way to phrase this question is "what is your best estimate of your expected winnings?", or, more generally, "how do you choose the maximum price you'll pay to play this game?"

Observe that the only fact you know about the distribution from which I'm drawing my coins is those 100 outcomes. Importantly, you don't know the distribution of each coin's p in my supply of unfair coins. Can you reasonably assume a specific distribution to make your calculation, and claim that it results in a better best estimate than any other distribution?

Most importantly, can one actually produce a "theoretically sound" expectation here? I.e. one that is calibrated so that if you pay your expected winnings every time and we perform this experiment lots of times then your average winnings will be zero - assuming I'm using the same source of unfair coins each time.

I suspect that the best one can do here is produce a range of values with confidence intervals. So you're 80% confident that the price you should pay to break even in the repeated game is between A80 and B80, 95% confident it's between A95 and B95, etc.

If this is really the best obtainable result, then what is a bayesianist to do with such a result to make their decision? Do you pick a price randomly from a specially crafted distribution, which is 95% likely to produce a value between A95..B95, etc? Or is there a more "bayesian" way?
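
For comparison, one simplifying assumption does make the game tractable: treat the whole drawing-and-flipping process as having a single long-run heads rate, with a uniform prior over that rate. Laplace's rule of succession then gives a point estimate; this is a sketch under that assumption, not the confidence-interval analysis asked about:

```python
# Laplace's rule of succession: with a uniform Beta(1, 1) prior on the
# process's long-run heads rate, after h heads in n flips the predictive
# probability of heads on the next flip is (h + 1) / (n + 2).
h, n = 100, 100                      # 100 heads out of 100 flips
p_heads = (h + 1) / (n + 2)          # 101/102
expected_winnings = 1000 * p_heads + 10 * (1 - p_heads)
print(p_heads)            # ~0.9902
print(expected_winnings)  # ~990.29: a break-even entry price under this prior
```

The choice of prior over the coin source is doing real work here, which is exactly the post's point: a different prior over the coins' p distribution would yield a different break-even price.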

The Decline Effect and the Scientific Method [link]

12 Dreaded_Anomaly 31 December 2010 01:23AM

The Decline Effect and the Scientific Method (article @ the New Yorker)

First, as a physicist, I do have to point out that this article concerns mainly softer sciences, e.g. psychology, medicine, etc.

A summary of explanations for this effect:

  • "The most likely explanation for the decline is an obvious one: regression to the mean. As the experiment is repeated, that is, an early statistical fluke gets cancelled out."
  • "Jennions, similarly, argues that the decline effect is largely a product of publication bias, or the tendency of scientists and scientific journals to prefer positive data over null results, which is what happens when no effect is found."
  • "Richard Palmer... suspects that an equally significant issue is the selective reporting of results—the data that scientists choose to document in the first place. ... Palmer emphasizes that selective reporting is not the same as scientific fraud. Rather, the problem seems to be one of subtle omissions and unconscious misperceptions, as researchers struggle to make sense of their results."
  • "According to Ioannidis, the main problem is that too many researchers engage in what he calls “significance chasing,” or finding ways to interpret the data so that it passes the statistical test of significance—the ninety-five-per-cent boundary invented by Ronald Fisher. ... The current “obsession” with replicability distracts from the real problem, which is faulty design."

These problems are with the proper usage of the scientific method, not the principle of the method itself. Certainly, it's important to address them. I think the reason they appear so often in the softer sciences is that biological entities are enormously complex, and so higher-level ideas that make large generalizations are more susceptible to random error and statistical anomalies, as well as personal bias, conscious and unconscious.
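
The first two explanations, regression to the mean plus publication bias, can be illustrated with a small Monte Carlo sketch: simulate noisy studies of a small true effect, "publish" only the initial results that cross a threshold, and compare them with unfiltered replications (all effect sizes and thresholds here are made up for illustration):

```python
import random

random.seed(0)
true_effect = 0.1   # small real effect, in arbitrary units
noise = 1.0         # per-study measurement noise
threshold = 1.0     # only initial studies measuring at least this get published

published_initial, replications = [], []
for _ in range(100000):
    initial = random.gauss(true_effect, noise)
    if initial > threshold:                             # publication filter
        replication = random.gauss(true_effect, noise)  # replication unfiltered
        published_initial.append(initial)
        replications.append(replication)

avg = lambda xs: sum(xs) / len(xs)
print(avg(published_initial))  # inflated, well above the true effect
print(avg(replications))       # near the true 0.1: the apparent "decline"
```

No effect actually declined in this simulation; the drop is produced entirely by conditioning publication on an initial statistical fluke.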

For those who haven't read it, take a look at Richard Feynman on cargo cult science if you want a good lecture on experimental design.
