
2016 LessWrong Diaspora Survey Analysis: Part Four (Politics, Calibration & Probability, Futurology, Charity & Effective Altruism)

10 ingres 10 September 2016 03:51AM

Politics

The LessWrong survey has a very involved section dedicated to politics. In previous analyses the benefits of this weren't fully realized. In the 2016 analysis we can look at not just the political affiliation of a respondent, but also which beliefs are associated with each affiliation. The charts below summarize most of the results.

Political Opinions By Political Affiliation

Miscellaneous Politics

There were also some other questions in this section which aren't covered by the above charts.

PoliticalInterest

On a scale from 1 (not interested at all) to 5 (extremely interested), how would you describe your level of interest in politics?

1: 67 (2.182%)

2: 257 (8.371%)

3: 461 (15.016%)

4: 595 (19.381%)

5: 312 (10.163%)

Voting

Did you vote in your country's last major national election?

LW Turnout Versus General Election Turnout By Country

Group | Turnout
LessWrong | 68.9%
Australia | 91%
Brazil | 78.90%
Britain | 66.4%
Canada | 68.3%
Finland | 70.1%
France | 79.48%
Germany | 71.5%
India | 66.3%
Israel | 72%
New Zealand | 77.90%
Russia | 65.25%
United States | 54.9%

Numbers taken from Wikipedia, accurate as of the last general election in each country listed at time of writing.

AmericanParties

If you are an American, what party are you registered with?

Democratic Party: 358 (24.5%)

Republican Party: 72 (4.9%)

Libertarian Party: 26 (1.8%)

Other third party: 16 (1.1%)

Not registered for a party: 451 (30.8%)

(option for non-Americans who want an option): 541 (37.0%)

Calibration And Probability Questions

Calibration Questions

I just couldn't analyze these, sorry guys. I put many hours into trying to get them into a decent format I could even read and that sucked up an incredible amount of time. It's why this part of the survey took so long to get out. Thankfully another LessWrong user, Houshalter, has kindly done their own analysis.

All my calibration questions were meant to satisfy a few essential properties:

  1. They should be 'self contained', i.e., something you can reasonably answer, or at least try to answer, with a 5th grade science education and normal life experience.
  2. They should, at least to a certain extent, be Fermi estimable.
  3. They should progressively scale in difficulty so you can see whether somebody understands basic probability or not (e.g., in an 'or' question, do they put a probability of less than 50% on being right?).

At least one person requested a workbook, so I might write more in the future. I'll obviously write more for the survey.

Probability Questions

Question | Mean | Median | Mode | Stdev
Please give the obvious answer to this question, so I can automatically throw away all surveys that don't follow the rules: What is the probability of a fair coin coming up heads? | 49.821 | 50.0 | 50.0 | 3.033
What is the probability that the Many Worlds interpretation of quantum mechanics is more or less correct? | 44.599 | 50.0 | 50.0 | 29.193
What is the probability that non-human, non-Earthly intelligent life exists in the observable universe? | 75.727 | 90.0 | 99.0 | 31.893
...in the Milky Way galaxy? | 45.966 | 50.0 | 10.0 | 38.395
What is the probability that supernatural events (including God, ghosts, magic, etc) have occurred since the beginning of the universe? | 13.575 | 1.0 | 1.0 | 27.576
What is the probability that there is a god, defined as a supernatural intelligent entity who created the universe? | 15.474 | 1.0 | 1.0 | 27.891
What is the probability that any of humankind's revealed religions is more or less correct? | 10.624 | 0.5 | 1.0 | 26.257
What is the probability that an average person cryonically frozen today will be successfully restored to life at some future time, conditional on no global catastrophe destroying civilization before then? | 21.225 | 10.0 | 5.0 | 26.782
What is the probability that at least one person living at this moment will reach an age of one thousand years, conditional on no global catastrophe destroying civilization in that time? | 25.263 | 10.0 | 1.0 | 30.510
What is the probability that our universe is a simulation? | 25.256 | 10.0 | 50.0 | 28.404
What is the probability that significant global warming is occurring or will soon occur, and is primarily caused by human actions? | 83.307 | 90.0 | 90.0 | 23.167
What is the probability that the human race will make it to 2100 without any catastrophe that wipes out more than 90% of humanity? | 76.310 | 80.0 | 80.0 | 22.933

The probability questions are probably the area of the survey I put the least effort into. My plan for next year is to overhaul these sections entirely and try including some Tetlock-esque forecasting questions, a link to some advice on how to make good predictions, etc.

Futurology

This section got a bit of a facelift this year, with new questions on cryonics, genetic engineering, and technological unemployment in addition to those from previous years.

Cryonics

Cryonics

Are you signed up for cryonics?

Yes - signed up or just finishing up paperwork: 48 (2.9%)

No - would like to sign up but unavailable in my area: 104 (6.3%)

No - would like to sign up but haven't gotten around to it: 180 (10.9%)

No - would like to sign up but can't afford it: 229 (13.8%)

No - still considering it: 557 (33.7%)

No - and do not want to sign up for cryonics: 468 (28.3%)

Never thought about it / don't understand: 68 (4.1%)

CryonicsNow

Do you think cryonics, as currently practiced by Alcor/Cryonics Institute will work?

Yes: 106 (6.6%)

Maybe: 1041 (64.4%)

No: 470 (29.1%)

Interestingly enough, of those who think it will work with enough confidence to say 'yes', only 14 are actually signed up for cryonics.

sqlite> select count(*) from data where CryonicsNow="Yes" and Cryonics="Yes - signed up or just finishing up paperwork";

14

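Widening that to everyone who answered 'yes' and is either signed up or says they would like to be:

sqlite> select count(*) from data where CryonicsNow="Yes" and Cryonics in ("Yes - signed up or just finishing up paperwork", "No - would like to sign up but unavailable in my area", "No - would like to sign up but haven't gotten around to it", "No - would like to sign up but can't afford it");

34

(The 34 comes from a version of this query in which the last two strings followed a bare OR instead of being compared against Cryonics. SQLite treats a bare non-numeric string as false, so that count covers only the first two categories and is a lower bound for the query shown.)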

CryonicsPossibility

Do you think cryonics works in principle?

Yes: 802 (49.3%)

Maybe: 701 (43.1%)

No: 125 (7.7%)

LessWrongers seem to be very bullish on the underlying physics of cryonics even if they're not as enthusiastic about current methods in use.

The Brain Preservation Foundation also did an analysis of cryonics responses to the LessWrong Survey.

Singularity

SingularityYear

By what year do you think the Singularity will occur? Answer such that you think, conditional on the Singularity occurring, there is an even chance of the Singularity falling before or after this year. If you think a singularity is so unlikely you don't even want to condition on it, leave this question blank.

Mean: 8.110300081581755e+16

Median: 2080.0

Mode: 2100.0

Stdev: 2.847858859055733e+18

I didn't bother to filter out the silly answers for this.

Obviously it's a bit hard to see without filtering out the uber-large answers, but the median doesn't seem to have changed much from the 2014 survey.
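To see why the mean and standard deviation above are unusable while the median is still informative, here is a minimal sketch using Python's statistics module. The answers below are made up for illustration (they are not survey rows), and the 1900 to 3000 plausibility window is an assumption of this sketch, not a filter that was applied to the data.

```python
import statistics

# Hypothetical answers: a handful of sane years plus one "silly" response.
answers = [2045, 2060, 2080, 2100, 2100, 2150, 10**18]

print(statistics.mean(answers))    # dominated by the single huge answer
print(statistics.median(answers))  # unaffected by how large the outlier is

# Assumed plausibility window; any such cutoff is a judgment call.
plausible = [y for y in answers if 1900 <= y <= 3000]
filtered_mean = statistics.mean(plausible)
filtered_median = statistics.median(plausible)
print(filtered_mean, filtered_median)
```

The median barely moves when the outlier is dropped, which is why it can still be compared across survey years even without cleaning.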

Genetic Engineering

ModifyOffspring

Would you ever consider having your child genetically modified for any reason?

Yes: 1552 (95.921%)

No: 66 (4.079%)

Well that's fairly overwhelming.

GeneticTreament

Would you be willing to have your child genetically modified to prevent them from getting an inheritable disease?

Yes: 1387 (85.5%)

Depends on the disease: 207 (12.8%)

No: 28 (1.7%)

I find it amusing how the strict "No" group shrinks considerably after this question.

GeneticImprovement

Would you be willing to have your child genetically modified for improvement purposes? (eg. To heighten their intelligence or reduce their risk of schizophrenia.)

Yes : 0 (0.0%)

Maybe a little: 176 (10.9%)

Depends on the strength of the improvements: 262 (16.2%)

No: 84 (5.2%)

Yes, I know 'Yes' is bugged. I don't know what causes this bug, and despite my best efforts I couldn't track it down. There is also an issue here where 'reduce their risk of schizophrenia' is offered as an example, which might confuse people, but the actual science cuts closer to that than it does to a clean separation between disease risk and 'improvement'.

This question is too important to just not have an answer to, so I'll do it manually. Unfortunately I can't easily remove the 'excluded' entries so that we're dealing with the exact same distribution, but only 13 or so responses are filtered out anyway.

sqlite> select count(*) from data where GeneticImprovement="Yes";

1100

>>> 1100 + 176 + 262 + 84
1622
>>> 1100 / 1622
0.6781750924784217

67.8% are willing to genetically engineer their children for improvements.

GeneticCosmetic

Would you be willing to have your child genetically modified for cosmetic reasons? (eg. To make them taller or have a certain eye color.)

Yes: 500 (31.0%)

Maybe a little: 381 (23.6%)

Depends on the strength of the improvements: 277 (17.2%)

No: 455 (28.2%)

These numbers go about how you would expect, with people being progressively less interested the more 'shallow' a genetic change is seen as.


GeneticOpinionD

What's your overall opinion of other people genetically modifying their children for disease prevention purposes?

Positive: 1177 (71.7%)

Mostly Positive: 311 (19.0%)

No strong opinion: 112 (6.8%)

Mostly Negative: 29 (1.8%)

Negative: 12 (0.7%)

GeneticOpinionI

What's your overall opinion of other people genetically modifying their children for improvement purposes?

Positive: 737 (44.9%)

Mostly Positive: 482 (29.4%)

No strong opinion: 273 (16.6%)

Mostly Negative: 111 (6.8%)

Negative: 38 (2.3%)

GeneticOpinionC

What's your overall opinion of other people genetically modifying their children for cosmetic reasons?

Positive: 291 (17.7%)

Mostly Positive: 290 (17.7%)

No strong opinion: 576 (35.1%)

Mostly Negative: 328 (20.0%)

Negative: 157 (9.6%)

All three of these seem largely consistent with people's personal preferences about modification. Were I so inclined, I could do a deeper analysis that takes survey respondents row by row and looks at the correlation between preferences for one's own children and preferences for others'.
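A row-by-row comparison of this kind could start from a simple cross-tabulation. Below is a minimal sketch using Python's built-in sqlite3 module. The table name data and the column names GeneticImprovement and GeneticOpinionI are taken from the queries and headings in this post, but the rows are invented for illustration and the real survey export may be laid out differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (GeneticImprovement TEXT, GeneticOpinionI TEXT)")

# Invented rows: (preference for own child, opinion of others doing it).
rows = [
    ("Yes", "Positive"),
    ("Yes", "Positive"),
    ("Yes", "Mostly Positive"),
    ("Maybe a little", "Mostly Positive"),
    ("No", "Mostly Negative"),
    ("No", "No strong opinion"),
]
conn.executemany("INSERT INTO data VALUES (?, ?)", rows)

# Count respondents for each combination of the two answers.
crosstab = conn.execute(
    """
    SELECT GeneticImprovement, GeneticOpinionI, COUNT(*)
    FROM data
    WHERE GeneticImprovement IS NOT NULL AND GeneticOpinionI IS NOT NULL
    GROUP BY GeneticImprovement, GeneticOpinionI
    ORDER BY GeneticImprovement, GeneticOpinionI
    """
).fetchall()

for own, others, count in crosstab:
    print(f"{own:15} {others:18} {count}")
```

From a table like this one could go on to a rank correlation by coding both answer scales as ordered integers, though how to order answers such as "Depends on the strength of the improvements" is itself a judgment call.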

Technological Unemployment

LudditeFallacy

Do you think the Luddite's Fallacy is an actual fallacy?

Yes: 443 (30.936%)

No: 989 (69.064%)

We can use this as an overall measure of worry about technological unemployment, which would seem to be high among the LW demographic.

UnemploymentYear

By what year do you think the majority of people in your country will have trouble finding employment for automation related reasons? If you think this is something that will never happen leave this question blank.

Mean: 2102.9713740458014

Median: 2050.0

Mode: 2050.0

Stdev: 1180.2342850727339

The question is flawed because a blank left to mean "never happen" can't be distinguished from a respondent who simply skipped or didn't see the question.

Interesting question that would be fun to take a look at in comparison to the estimates for the singularity.

EndOfWork

Do you think the "end of work" would be a good thing?

Yes: 1238 (81.287%)

No: 285 (18.713%)

Fairly overwhelming consensus, but with a significant minority of people who have a dissenting opinion.

EndOfWorkConcerns

If machines end all or almost all employment, what are your biggest worries? Pick two.

Question | Count | Percent
People will just idle about in destructive ways | 513 | 16.71%
People need work to be fulfilled and if we eliminate work we'll all feel deep existential angst | 543 | 17.687%
The rich are going to take all the resources for themselves and leave the rest of us to starve or live in poverty | 1066 | 34.723%
The machines won't need us, and we'll starve to death or be otherwise liquidated | 416 | 13.55%

Question is flawed because it demanded the user 'pick two' instead of up to two.

The plurality of worries are about elites who refuse to share their wealth.

Existential Risk

XRiskType

Which disaster do you think is most likely to wipe out greater than 90% of humanity before the year 2100?

Nuclear war: +4.800% 326 (20.6%)

Asteroid strike: -0.200% 64 (4.1%)

Unfriendly AI: +1.000% 271 (17.2%)

Nanotech / grey goo: -2.000% 18 (1.1%)

Pandemic (natural): +0.100% 120 (7.6%)

Pandemic (bioengineered): +1.900% 355 (22.5%)

Environmental collapse (including global warming): +1.500% 252 (16.0%)

Economic / political collapse: -1.400% 136 (8.6%)

Other: 35 (2.217%)

Going by the change figures shown beside each option, significantly more people are worried about nuclear war than last year. Is that an effect of new respondents, or of the geopolitical situation? Who knows.

Charity And Effective Altruism

Charitable Giving

Income

What is your approximate annual income in US dollars (non-Americans: convert at www.xe.com)? Obviously you don't need to answer this question if you don't want to. Please don't include commas or dollar signs.

Sum: 66054140.47384

Mean: 64569.052271593355

Median: 40000.0

Mode: 30000.0

Stdev: 107297.53606321265

IncomeCharityPortion

How much money, in number of dollars, have you donated to charity over the past year? (non-Americans: convert to dollars at http://www.xe.com/ ). Please don't include commas or dollar signs in your answer. For example, 4000

Sum: 2389900.6530000004

Mean: 2914.5129914634144

Median: 353.0

Mode: 100.0

Stdev: 9471.962766896671

XriskCharity

How much money have you donated to charities aiming to reduce existential risk (other than MIRI/CFAR) in the past year?

Sum: 169300.89

Mean: 1991.7751764705883

Median: 200.0

Mode: 100.0

Stdev: 9219.941506342007

CharityDonations

How much have you donated in US dollars to the following charities in the past year? (Non-americans: convert to dollars at http://www.xe.com/) Please don't include commas or dollar signs in your answer. Options starting with "any" aren't the name of a charity but a category of charity.

Charity | Sum | Mean | Median | Mode | Stdev
Against Malaria Foundation | 483935.027 | 1905.256 | 300.0 | None | 7216.020
Schistosomiasis Control Initiative | 47908.0 | 840.491 | 200.0 | 1000.0 | 1618.785
Deworm the World Initiative | 28820.0 | 565.098 | 150.0 | 500.0 | 1432.712
GiveDirectly | 154410.177 | 1429.723 | 450.0 | 50.0 | 3472.082
Any kind of animal rights charity | 83130.47 | 1093.821 | 154.235 | 500.0 | 2313.493
Any kind of bug rights charity | 1083.0 | 270.75 | 157.5 | None | 353.396
Machine Intelligence Research Institute | 141792.5 | 1417.925 | 100.0 | 100.0 | 5370.485
Any charity combating nuclear existential risk | 491.0 | 81.833 | 75.0 | 100.0 | 68.060
Any charity combating global warming | 13012.0 | 245.509 | 100.0 | 10.0 | 365.542
Center For Applied Rationality | 127101.0 | 3177.525 | 150.0 | 100.0 | 12969.096
Strategies for Engineered Negligible Senescence Research Foundation | 9429.0 | 554.647 | 100.0 | 20.0 | 1156.431
Wikipedia | 12765.5 | 53.189 | 20.0 | 10.0 | 126.444
Internet Archive | 2975.04 | 80.406 | 30.0 | 50.0 | 173.791
Any campaign for political office | 38443.99 | 366.133 | 50.0 | 50.0 | 1374.305
Other | 564890.46 | 1661.442 | 200.0 | 100.0 | 4670.805

"Bug Rights" charity was supposed to be a troll fakeout but apparently...

This table is interesting given the recent debates about how much money certain causes are 'taking up' in Effective Altruism.

Effective Altruism

Vegetarian

Do you follow any dietary restrictions related to animal products?

Yes, I am vegan: 54 (3.4%)

Yes, I am vegetarian: 158 (10.0%)

Yes, I restrict meat some other way (pescetarian, flexitarian, try to only eat ethically sourced meat): 375 (23.7%)

No: 996 (62.9%)

EAKnowledge

Do you know what Effective Altruism is?

Yes: 1562 (89.3%)

No but I've heard of it: 114 (6.5%)

No: 74 (4.2%)

EAIdentity

Do you self-identify as an Effective Altruist?

Yes: 665 (39.233%)

No: 1030 (60.767%)

The distribution given by the 2014 survey results does not sum to one, so it's difficult to determine whether Effective Altruism's membership actually went up. If we take the numbers at face value, it experienced an 11.13% increase in membership.

EACommunity

Do you participate in the Effective Altruism community?

Yes: 314 (18.427%)

No: 1390 (81.573%)

Same issue as the last question: taking the numbers at face value, community participation went up by 5.727%.

EADonations

Has Effective Altruism caused you to make donations you otherwise wouldn't?

Yes: 666 (39.269%)

No: 1030 (60.731%)

Wowza!

Effective Altruist Anxiety

EAAnxiety

Have you ever had any kind of moral anxiety over Effective Altruism?

Yes: 501 (29.6%)

Yes but only because I worry about everything: 184 (10.9%)

No: 1008 (59.5%)


There's an ongoing debate in Effective Altruism about what kind of rhetorical strategy is best for getting people on board and whether Effective Altruism is causing people significant moral anxiety.

It certainly appears to be. But is moral anxiety effective? Let's look:

Sample Size: 244
Average amount of money donated by people anxious about EA who aren't EAs: 257.5409836065574

Sample Size: 679
Average amount of money donated by people who aren't anxious about EA who aren't EAs: 479.7501384388807

Sample Size: 249
Average amount of money donated by EAs anxious about EA: 1841.5292369477913

Sample Size: 314
Average amount of money donated by EAs not anxious about EA: 1837.8248407643312

It seems fairly conclusive that anxiety is not a good way to get people to donate more than they already are, but is it a good way to get people to become Effective Altruists?

Sample Size: 1685
P(Effective Altruist): 0.3940652818991098
P(EA Anxiety): 0.29554896142433235
P(Effective Altruist | EA Anxiety): 0.5
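The three figures above are enough to back out the comparison that matters here, namely how common EA identification is among respondents who did not report anxiety. This is a sketch under the assumption that all three probabilities were computed over the same 1685 respondents; the variable names are illustrative.

```python
n = 1685
p_ea = 0.3940652818991098
p_anxiety = 0.29554896142433235
p_ea_given_anxiety = 0.5

# Recover the underlying head counts from the reported proportions.
n_ea = round(n * p_ea)
n_anxious = round(n * p_anxiety)
n_both = round(n_anxious * p_ea_given_anxiety)

# EA rate among respondents who did not report anxiety.
p_ea_given_no_anxiety = (n_ea - n_both) / (n - n_anxious)

# How much more likely an anxious respondent is to be an EA than a
# non-anxious one.
ratio = p_ea_given_anxiety / p_ea_given_no_anxiety

print(n_ea, n_anxious, n_both)
print(round(p_ea_given_no_anxiety, 3), round(ratio, 2))
```

Under that assumption roughly 35% of non-anxious respondents identify as EAs, versus 50% of anxious ones. The association says nothing about direction: anxiety might draw people in, or being an EA might produce the anxiety.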

Maybe. There is of course an argument to be made that sufficient good done by causing people anxiety outweighs feeding into people's scrupulosity, but it can be discussed after I get through explaining it on the phone to wealthy PR-conscious donors and telling the local all-kill shelter where I want my shipment of dead kittens.

EAOpinion

What's your overall opinion of Effective Altruism?

Positive: 809 (47.6%)

Mostly Positive: 535 (31.5%)

No strong opinion: 258 (15.2%)

Mostly Negative: 75 (4.4%)

Negative: 24 (1.4%)

EA appears to be doing a pretty good job of getting people to like them.

Interesting Tables

Charity Donations By Political Affiliation
Affiliation | Income | Charity Contributions | % Income Donated To Charity | Total Survey Charity % | Sample Size
Anarchist | 1677900.0 | 72386.0 | 4.314% | 3.004% | 50
Communist | 298700.0 | 19190.0 | 6.425% | 0.796% | 13
Conservative | 1963000.04 | 62945.04 | 3.207% | 2.612% | 38
Futarchist | 1497494.11 | 166254.0 | 11.102% | 6.899% | 31
Left-Libertarian | 9681635.61384 | 416084.0 | 4.298% | 17.266% | 245
Libertarian | 11698523.0 | 214101.0 | 1.83% | 8.885% | 190
Moderate | 3225475.0 | 90518.0 | 2.806% | 3.756% | 67
Neoreactionary | 1383976.0 | 30890.0 | 2.232% | 1.282% | 28
Objectivist | 399000.0 | 1310.0 | 0.328% | 0.054% | 10
Other | 3150618.0 | 85272.0 | 2.707% | 3.539% | 132
Pragmatist | 5087007.61 | 266836.0 | 5.245% | 11.073% | 131
Progressive | 8455500.44 | 368742.78 | 4.361% | 15.302% | 217
Social Democrat | 8000266.54 | 218052.5 | 2.726% | 9.049% | 237
Socialist | 2621693.66 | 78484.0 | 2.994% | 3.257% | 126


Number Of Effective Altruists In The Diaspora Communities
Community | Count | % In Community | Sample Size
LessWrong | 136 | 38.418% | 354
LessWrong Meetups | 109 | 50.463% | 216
LessWrong Facebook Group | 83 | 48.256% | 172
LessWrong Slack | 22 | 39.286% | 56
SlateStarCodex | 343 | 40.98% | 837
Rationalist Tumblr | 175 | 49.716% | 352
Rationalist Facebook | 89 | 58.94% | 151
Rationalist Twitter | 24 | 40.0% | 60
Effective Altruism Hub | 86 | 86.869% | 99
Good Judgement(TM) Open | 23 | 74.194% | 31
PredictionBook | 31 | 51.667% | 60
Hacker News | 91 | 35.968% | 253
#lesswrong on freenode | 19 | 24.675% | 77
#slatestarcodex on freenode | 9 | 24.324% | 37
#chapelperilous on freenode | 2 | 18.182% | 11
/r/rational | 117 | 42.545% | 275
/r/HPMOR | 110 | 47.414% | 232
/r/SlateStarCodex | 93 | 37.959% | 245
One or more private 'rationalist' groups | 91 | 47.15% | 193


Effective Altruist Donations By Political Affiliation
Affiliation | EA Income | EA Charity | Sample Size
Anarchist | 761000.0 | 57500.0 | 18
Futarchist | 559850.0 | 114830.0 | 15
Left-Libertarian | 5332856.0 | 361975.0 | 112
Libertarian | 2725390.0 | 114732.0 | 53
Moderate | 583247.0 | 56495.0 | 22
Other | 1428978.0 | 69950.0 | 49
Pragmatist | 1442211.0 | 43780.0 | 43
Progressive | 4004097.0 | 304337.78 | 107
Social Democrat | 3423487.45 | 149199.0 | 93
Socialist | 678360.0 | 34751.0 | 41

A note about calibration of confidence

12 jbay 04 January 2016 06:57AM

Background

In a recent Slate Star Codex Post (http://slatestarcodex.com/2016/01/02/2015-predictions-calibration-results/), Scott Alexander made a number of predictions and presented associated confidence levels, and then at the end of the year, scored his predictions in order to determine how well-calibrated he is. In the comments, however, there arose a controversy over how to deal with 50% confidence predictions. As an example, Scott has these predictions at 50% confidence, among his others:

Label | Proposition | Scott's Prior | Result
A | Jeb Bush will be the top-polling Republican candidate | P(A) = 50% | A is False
B | Oil will end the year greater than $60 a barrel | P(B) = 50% | B is False
C | Scott will not get any new girlfriends | P(C) = 50% | C is False
D | At least one SSC post in the second half of 2015 will get > 100,000 hits | P(D) = 70% | D is False
E | Ebola will kill fewer people in the second half of 2015 than in the first half | P(E) = 95% | E is True

Scott goes on to score himself as having made 0/3 correct predictions at the 50% confidence level, which looks like significant overconfidence. He addresses this by noting that 3 data points are not much to go by, and that the result could easily have looked fine if any of those outcomes had turned out differently. His resulting calibration curve is this:

Scott Alexander's 2015 calibration curve

 

However, the commenters had other objections about the anomaly at 50%. After all, P(A) = 50% implies P(~A) = 50%, so the choice of “I will not get any new girlfriends: 50% confidence” is logically equivalent to “I will get at least 1 new girlfriend: 50% confidence”, except that one resolves as true and the other as false. Therefore, the score at 50% seems sensitive only to the particular phrasing chosen, independent of the outcome.

One commenter suggests that close to perfect calibration at 50% confidence can be achieved by choosing whether to represent propositions as positive or negative statements by flipping a fair coin. Another suggests replacing 50% confidence with 50.1% or some other number arbitrarily close to 50%, but not equal to it. Others suggest getting rid of the 50% confidence bin altogether.

Scott recognizes that predicting A and predicting ~A are logically equivalent, and choosing to use one or the other is arbitrary. But by choosing to only include A in his data set rather than ~A, he creates a problem that occurs when P(A) = 50%, where the arbitrary choice of making a prediction phrased as ~A would have changed the calibration results despite being the same prediction.

Symmetry

This conundrum illustrates an important point about these calibration exercises. By convention, Scott phrases all of his propositions as statements to which he assigns a probability greater than or equal to 50%, recognizing that he doesn’t also need to calibrate probabilities below 50%, since the upper half of the calibration curve captures all the relevant information about his calibration.

This is because the calibration curve is symmetric about the 50% mark, as implied by the relation P(X) = 1 - P(~X) and, of course, P(~X) = 1 - P(X).

We can enforce that symmetry by recognizing that when we make the claim that proposition X has probability P(X), we are also simultaneously making the claim that proposition ~X has probability 1-P(X). So we add those to the list of predictions and do the bookkeeping on them too. Since we are making both claims, why not be clear about it in our bookkeeping?

When we do this, we get the full calibration curve, and the confusion about what to do about 50% probability disappears. Scott’s list of predictions looks like this:

Label | Proposition | Scott's Prior | Result
A | Jeb Bush will be the top-polling Republican candidate | P(A) = 50% | A is False
~A | Jeb Bush will not be the top-polling Republican candidate | P(~A) = 50% | ~A is True
B | Oil will end the year greater than $60 a barrel | P(B) = 50% | B is False
~B | Oil will not end the year greater than $60 a barrel | P(~B) = 50% | ~B is True
C | Scott will not get any new girlfriends | P(C) = 50% | C is False
~C | Scott will get new girlfriend(s) | P(~C) = 50% | ~C is True
D | At least one SSC post in the second half of 2015 will get > 100,000 hits | P(D) = 70% | D is False
~D | No SSC post in the second half of 2015 will get > 100,000 hits | P(~D) = 30% | ~D is True
E | Ebola will kill fewer people in the second half of 2015 than in the first half | P(E) = 95% | E is True
~E | Ebola will kill at least as many people in the second half of 2015 as in the first half | P(~E) = 5% | ~E is False

You will by now have noticed that there will always be an even number of predictions, and that half of the predictions always are true and half are always false. In most cases, like with E and ~E, that means you get a 95% likely prediction that is true and a 5%-likely prediction that is false, which is what you would expect. However, with 50%-likely predictions, they are always accompanied by another 50% prediction, one of which is true and one of which is false. As a result, it is actually not possible to make a binary prediction at 50% confidence that is out of calibration.
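The bookkeeping described above can be sketched in a few lines of Python. This uses only the five predictions listed in the table (Scott made others that are not reproduced here), and the function names are illustrative rather than anything from the original analysis.

```python
from collections import defaultdict

# (stated probability, outcome) for propositions A through E in the table.
predictions = [(0.50, False), (0.50, False), (0.50, False),
               (0.70, False), (0.95, True)]

def with_complements(preds):
    """Return each prediction together with its implied complement."""
    out = []
    for p, outcome in preds:
        out.append((p, outcome))
        out.append((1 - p, not outcome))
    return out

def calibration_curve(preds):
    """Map each stated probability to the fraction of propositions that came true."""
    buckets = defaultdict(list)
    for p, outcome in preds:
        # Round so that 1 - 0.7 lands in the 0.3 bucket despite float error.
        buckets[round(p, 2)].append(outcome)
    return {p: sum(o) / len(o) for p, o in sorted(buckets.items())}

curve = calibration_curve(with_complements(predictions))
for p, observed in curve.items():
    print(f"stated {p:.2f} -> observed {observed:.2f}")
```

Whatever outcomes are fed in, the 0.5 bucket always receives one true and one false proposition per prediction, so its observed frequency is pinned at 0.5.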

The resulting calibration curve, applied to Scott’s predictions, looks like this:

no error bars


Sensitivity

By the way, this graph doesn’t tell the whole calibration story; as Scott noted it’s still sensitive to how many predictions were made in each bucket. We can add “error bars” that show what would have resulted if Scott had made one more prediction in each bucket, and whether the result of that prediction had been true or false. The result is the following graph:

with error bars

Note that the error bars are zero about the point of 0.5. That’s because even if one additional prediction had been added to that bucket, it would have had no effect. That point is fixed by the inherent symmetry.
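One way to compute error bars of this kind, assuming they are built exactly as described (one extra prediction per bucket, resolved either way), is sketched below. The function name and the handling of the 50% bucket are a reading of the text, not code from the original analysis.

```python
def one_more_prediction_range(hits, total, is_half_bucket=False):
    """Range of observed frequencies if one more prediction joined the bucket.

    hits: propositions in the bucket that came true
    total: propositions in the bucket
    """
    if is_half_bucket:
        # A new 50% prediction brings its 50% complement along, so the
        # bucket gains one true and one false proposition whatever happens.
        same = (hits + 1) / (total + 2)
        return (same, same)
    low = hits / (total + 1)         # the extra prediction turns out false
    high = (hits + 1) / (total + 1)  # the extra prediction turns out true
    return (low, high)

# 50% bucket from the symmetric table: 3 true out of 6.
print(one_more_prediction_range(3, 6, is_half_bucket=True))
# A bucket holding a single false proposition, such as D at 70%.
print(one_more_prediction_range(0, 1))
```

Sparse buckets get wide bars, which is the visual cue that a point far from the diagonal may not mean much.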

I believe that this kind of graph does a better job of showing someone’s true calibration. But it's not the whole story.

Ramifications for scoring calibration (updated)

Clearly, it is not possible to make a binary prediction with 50% confidence that is poorly calibrated. This shouldn’t come as a surprise; a prediction at 50% between two choices represents the correct prior for the case where you have no information that discriminates between X and ~X. However, that doesn’t mean that you can improve your ability to make correct predictions just by giving them all 50% confidence and claiming impeccable calibration! An easy way to "cheat" your way into apparently good calibration is to take a large number of predictions that you are highly (>99%) confident about, negate a fraction of them, and falsely record a lower confidence for those. If we're going to measure calibration, we need a scoring method that will encourage people to write down the true probabilities they believe, rather than faking low confidence and ignoring their data. We want people to only claim 50% confidence when they genuinely have 50% confidence, and we need to make sure our scoring method encourages that.

 

A first guess would be to look at that graph and do the classic assessment of fit: sum of squared errors. We can sum the squared error of our predictions against the ideal linear calibration curve. If we did this, we would want to make sure we summed all the individual predictions, rather than the averages of the bins, so that the binning process itself doesn’t bias our score.

If we do this, then our overall prediction score can be summarized by one number:

S = \frac{1}{N}\left(\sum_{i=1}^{N}(P(X_i)-X_i)^2 \right )

Here P(Xi) is the assigned confidence of the truth of Xi, and Xi is the ith proposition and has a value of 1 if it is True and 0 if it is False. S is the prediction score, and lower is better. Note that because these are binary predictions, the sum of squared errors gives an optimal score if you assign the probabilities you actually believe (ie, there is no way to "cheat" your way to a better score by giving false confidence).
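The claim that squared error rewards honest reporting can be checked with a short calculation. Here q is your true belief that a proposition is true and p is the confidence you report; both symbols are introduced only for this sketch.

```latex
\mathbb{E}[\text{error}] = q\,(p-1)^2 + (1-q)\,p^2

\frac{d}{dp}\,\mathbb{E}[\text{error}] = 2q(p-1) + 2(1-q)p = 2(p-q)

\frac{d^2}{dp^2}\,\mathbb{E}[\text{error}] = 2 > 0
```

The derivative vanishes only at p = q and the second derivative is positive, so the expected penalty is minimized by reporting exactly what you believe.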

In this case, Scott's score is S=0.139; much of this comes from the 0.4/0.6 bracket. The worst score possible would be S=1, and the best score possible is S=0. Attempting to fake perfect calibration by claiming 50% confidence for every prediction, regardless of the information you actually have available, yields S=0.25 and therefore isn't a particularly good strategy (at least, it won't make you look better-calibrated than Scott).
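As a concrete check of the reference points quoted above, here is a minimal sketch of the mean squared error score. Scott's full prediction list is not reproduced in this post, so the example only verifies the best, worst, and all-50% cases; the function name is illustrative.

```python
def squared_error_score(preds):
    """Mean squared error between stated probabilities and outcomes (lower is better).

    preds: iterable of (probability, outcome) with outcome True or False.
    """
    preds = list(preds)
    return sum((p - float(outcome)) ** 2 for p, outcome in preds) / len(preds)

always_half = [(0.5, True), (0.5, False), (0.5, False), (0.5, True)]
certain_and_right = [(1.0, True), (0.0, False)]
certain_and_wrong = [(1.0, False), (0.0, True)]

print(squared_error_score(always_half))        # 0.25
print(squared_error_score(certain_and_right))  # 0.0
print(squared_error_score(certain_and_wrong))  # 1.0
```

Including each prediction's complement in the list does not change this score, since (P(X) - X)^2 and (P(~X) - ~X)^2 are equal.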

Several of the commenters pointed out that log scoring is another scoring rule that works better in the general case. Before posting this I ran the calculus to confirm that the least-squares error did encourage an optimal strategy of honest reporting of confidence, but I did have a feeling that it was an ad-hoc scoring rule and that there must be better ones out there.

The logarithmic scoring rule looks like this:

S = \frac{1}{N}\sum_{i=1}^{N}X_i\ln(P(X_i))

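Here again Xi is the ith proposition and has a value of 1 if it is True and 0 if it is False, so only the propositions that turned out true contribute. The sum runs over the full symmetric list, in which exactly one proposition of each X/~X pair is true, and N is the number of such pairs; this is the normalization under which claiming 50% everywhere scores ln(0.5), about -0.69. Equivalently, each prediction contributes ln(P(Xi)) if Xi is true and ln(1 - P(Xi)) if it is false. The base of the logarithm is arbitrary so I've chosen base "e" as it makes it easier to take derivatives. This scoring method gives a negative number and the closer to zero the better. The log scoring rule has the same honesty-encouraging properties as the sum-of-squared-errors, plus the additional nice property that it penalizes wrong predictions of 100% or 0% confidence with an appropriate score of minus-infinity. When you claim 100% confidence and are wrong, you are infinitely wrong. Don't claim 100% confidence!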

In this case, Scott's score is calculated to be S=-0.42. For reference, the worst possible score would be minus-infinity, and claiming nothing but 50% confidence for every prediction results in a score of S=-0.69. This just goes to show that you can't win by cheating.
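A sketch of the logarithmic rule follows, written per prediction so that the log of the probability assigned to whatever actually happened is averaged over the number of predictions. That normalization is an assumption of this sketch; it is the one that reproduces the -0.69 figure quoted above for the all-50% strategy.

```python
import math

def log_score(preds):
    """Average log probability assigned to the realized outcomes (closer to 0 is better)."""
    total = 0.0
    count = 0
    for p, outcome in preds:
        assigned = p if outcome else 1 - p
        if assigned == 0:
            return -math.inf  # claimed certainty and was wrong
        total += math.log(assigned)
        count += 1
    return total / count

always_half = [(0.5, True), (0.5, False), (0.5, False)]
print(round(log_score(always_half), 2))        # -0.69
print(log_score([(1.0, False), (0.9, True)]))  # -inf
```

The early return for a zero probability is there because math.log(0) raises an error in Python instead of returning minus infinity.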

Example: Pretend underconfidence to fake good calibration

In an attempt to appear like I have better calibration than Scott Alexander, I am going to make the following predictions. For clarity I have included the inverse propositions in the list (as those are also predictions that I am making), but at the end of the list so you can see the point I am getting at a bit better.

Label | Proposition | Quoted Prior | Result
A | I will not win the lottery on Monday | P(A) = 50% | A is True
B | I will not win the lottery on Tuesday | P(B) = 66% | B is True
C | I will not win the lottery on Wednesday | P(C) = 66% | C is True
D | I will win the lottery on Thursday | P(D) = 66% | D is False
E | I will not win the lottery on Friday | P(E) = 75% | E is True
F | I will not win the lottery on Saturday | P(F) = 75% | F is True
G | I will not win the lottery on Sunday | P(G) = 75% | G is True
H | I will win the lottery next Monday | P(H) = 75% | H is False

~A | I will win the lottery on Monday | P(~A) = 50% | ~A is False
~B | I will win the lottery on Tuesday | P(~B) = 34% | ~B is False
~C | I will win the lottery on Wednesday | P(~C) = 34% | ~C is False

Look carefully at this table. I've thrown in a particular mix of predictions that I will or will not win the lottery on certain days, in order to use my extreme certainty about the result to generate a particular mix of correct and incorrect predictions.

To make things even easier for me, I’m not even planning to buy any lottery tickets. Knowing this, an honest estimate of my odds of winning the lottery is astronomically small. The odds of winning with a ticket are about 1 in 14 million (for the Canadian 6/49 lottery), and without one I’d have to win by accident (one of my relatives buying me a lottery ticket?). Not only that, but since the lottery is only held on Wednesday and Saturday, most of these scenarios are even more implausible, since the lottery corporation would have to hold the draw by mistake.

I am confident I could make at least 1 billion similar statements of this exact nature and get them all right, so my true confidence must be upwards of (100% - 0.0000001%).

If I assemble 50 of these types of strategically-underconfident predictions (and their 50 opposites) and plot them on a graph, here’s what I get:

 Looks like good calibration...? Not so fast.

You can see that the problem with cheating doesn’t occur only at 50%. It can occur anywhere!

But here’s the trick: The log scoring algorithm rates me -0.37. If I had made the same 100 predictions all at my true confidence (99.9999999%), then my score would have been -0.000000001. A much better score! My attempt to cheat in order to make a pretty graph has only sabotaged my score.

By the way, what if I had gotten one of those wrong, and actually won the lottery one of those times without even buying a ticket? In that case my score is -0.41 (the wrong prediction had a probability of 1 in 10^9 which is about 1 in e^21, so it’s worth -21 points, but then that averages down to -0.41 due to the 49 correct predictions that are collectively worth a negligible fraction of a point).* Not terrible! The log scoring rule is pretty gentle about being very badly wrong sometimes, just as long as you aren’t infinitely wrong. However, if I had been a little less confident and said the chance of winning each time was only 1 in a million, rather than 1 in a billion, my score would have improved to -0.28, and if I had expressed only 98% confidence I would have scored -0.098, the best possible score for someone who is wrong one in every fifty times.
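The arithmetic in the paragraph above can be checked with a few lines of Python. This is a sketch that assumes the score is the mean natural log of the probability assigned to whatever actually happened, which is consistent with the -0.69 quoted for all-50% predictions. The function names are hypothetical.

```python
import math

def mean_log_score(probs_given_to_outcome):
    """Average natural log of the probability assigned to what actually happened."""
    return sum(math.log(p) for p in probs_given_to_outcome) / len(probs_given_to_outcome)

def one_miss_in_fifty(p_wrong):
    """49 predictions that came true at confidence (1 - p_wrong), plus one that failed."""
    return mean_log_score([1 - p_wrong] * 49 + [p_wrong])

print(round(one_miss_in_fifty(1e-9), 2))      # 1-in-a-billion chance given to the miss
print(round(one_miss_in_fifty(1e-6), 2))      # 1-in-a-million chance given to the miss
print(round(one_miss_in_fifty(0.02), 3))      # 98% confidence on every prediction
print(round(mean_log_score([0.5] * 50), 2))   # 50% on everything
```

Under that assumption the four lines print -0.41, -0.28, -0.098 and -0.69, matching the figures in the text.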

This has another important ramification: If you're going to honestly test your calibration, you shouldn't pick the predictions you'll make. It is easy to improve your score by throwing in a couple predictions that you are very certain about, like that you won't win the lottery, and by making few predictions that you are genuinely uncertain about. It is fairer to use a list of propositions that is generated by somebody else, and then pick your probabilities. Scott demonstrates his honesty by making public predictions about a mix of things he was genuinely uncertain about, but if he wanted to cook his way to a better score in the future, he would avoid making any predictions at the 50% category that he wasn't forced to.

 

Input and comments are welcome! Let me know what you think!

* This result surprises me enough that I would appreciate it if someone in the comments could double-check it on their own. What is the proper score for being right 49 times with (1 - 1 in a billion) certainty, but wrong once?

The mystery of Brahms

5 PhilGoetz 21 October 2015 05:12AM

I'm interested in how people form valuations of the opinions of others. One domain to study is art. We have a long historic record of how the elite arbiters of taste have decided what artists and what artworks were great.

This is more relevant to 21st century American thought than many of you probably think. The defaults we assume, the stories that are told on television and in our movies, the things taught in our colleges, were partly determined by assertions made by continental philosophers and psychologists of the 18th through 20th centuries, most of which they just made up. [1]

The process by which philosophers eventually get their views accepted into the Western canon looks the same to me as the process by which musicians or painters are accepted into or cast out of the Western canon. Neither has much to do with the quality of the product.

continue reading »

Vegetarianism Ideological Turing Test Results

21 Raelifin 14 October 2015 12:34AM

Back in August I ran a Caplan Test (or more commonly an "Ideological Turing Test") both on Less Wrong and in my local rationality meetup. The topic was diet, specifically: Vegetarian or Omnivore?

If you're not familiar with Caplan Tests, I suggest reading Palladias' post on the subject or reading Wikipedia. The test I ran was pretty standard; thirteen blurbs were presented to the judges, selected by the toss of a coin to either be from a vegetarian or from an omnivore, and also randomly selected to be genuine or an impostor trying to pass themselves off as the alternative. My main contribution, which I haven't seen in previous tests, was using credence/probability instead of a simple "I think they're X".

I originally chose vegetarianism because I felt like it's an issue which splits our community (and particularly my local community) pretty well. A third of test participants were vegetarians, and according to the 2014 census, only 56% of LWers identify as omnivores.

Before you see the results of the test, please take a moment to say aloud how well you think you can do at predicting whether someone participating in the test was genuine or a fake.

.

.

.

.

.

.

.

.

.

.

.

.

.

If you think you can do better than chance you're probably fooling yourself. If you think you can do significantly better than chance you're almost certainly wrong. Here are some statistics to back that claim up.

I got 53 people to judge the test. 43 were from LessWrong, and 10 were from my local group. Averaging across the entire group, 51.1% of judgments were correct. If my Chi^2 math is correct, the p-value for the null hypothesis is 57% on this data. (Note that this includes people who judged an entry as 50%. If we don't include those folks the success rate drops to 49.4%.)
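As a rough check on the chi-square figure, here is a sketch that tests the overall success rate against chance. It assumes each of the 53 judges rated all 13 blurbs (689 judgments) and that 51.1% of those rounds to 352 correct. Neither count is stated in the post, so treat both as assumptions. The function name is hypothetical.

```python
import math

def chi2_pvalue_vs_chance(correct, total):
    """Pearson chi-square test (1 degree of freedom) against a 50/50 null."""
    expected = total / 2
    wrong = total - correct
    chi2 = (correct - expected) ** 2 / expected + (wrong - expected) ** 2 / expected
    # With one degree of freedom, the chi-square tail probability is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

total = 53 * 13                   # assumed: every judge rated every blurb
correct = round(0.511 * total)    # assumed: 51.1% of those judgments
chi2, p = chi2_pvalue_vs_chance(correct, total)
print(f"chi2 = {chi2:.3f}, p = {p:.2f}")
```

Under these assumptions the p-value comes out at about 0.57, in line with the figure quoted above.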

In retrospect, this seemed rather obvious to me. Vegetarians aren't significantly different from omnivores. Unlike a religion or a political party there aren't many cultural centerpieces to diet. Vegetarian judges did no better than omnivore judges, even when judging vegetarian entries. In other words, in this instance the minority doesn't possess any special powers for detecting other members of the in-group. This test shows null results; the thing that distinguishes vegetarians from omnivores is not familiarity with the other side's arguments or culture, at least not to the degree that we can distinguish at a glance.

More interesting, in my opinion, than the null results were the results I got on the calibration of the judges. Back when I asked you to say aloud how good you'd be, what did you say? Did the last three paragraphs seem obvious? Would it surprise you to learn that not a single one of the 53 judges held their guesses to a confidence band of 40%-60%? In other words, every single judge thought themselves decently able to discern genuine writing from fakery. The numbers suggest that every single judge was wrong.

(The flip-side to this is, of course, that every entrant to the test won! Congratulations rationalists: signs point to you being able to pass as vegetarians/omnivores when you try, even if you're not in that category. The average credibility of an impostor entry was 59%, while the average credibility of a genuine response was 55%. No impostors got an average credibility below 49%.)

Using the logarithmic scoring rule for the calibration game we can measure the error of the community. The average judge got a score of -543. For comparison, a judge that answered 50% ("I don't know") to all questions would've gotten a score of 0. Only eight judges got a positive score, and only one had a score higher than 100 (consistent with random chance). This is actually one area where Less Wrong should feel good. We're not at all calibrated... but for this test at least, the judges from the website were much better calibrated than my local community (who mostly just lurk). If we separate the two groups we see that the average score for my community was -949, while LW had an average of -448. Given that I restricted the choices to multiples of 10, a random selection of credences gives an average score of -921.

In short, the LW community didn't prove to be any better at discerning fact from fiction, but it was significantly less overconfident. More de-biasing needs to be done, however! The next time you think of a probability to reflect your credence, ask yourself "Is this the sort of thing that anyone would know? Is this the sort of thing I would know?" That answer will probably be "no" a lot more than it feels like from the inside.

Full data (minus contact info) can be found here.

Those of you who submitted a piece of writing that I used, or who judged the test and left their contact information: I will be sending out personal scores very soon (probably by this weekend). Deep apologies regarding the delay on this post. I had a vacation in late August and it threw off my attention to this project.

EDIT: Here's a histogram of the identification accuracy. 

Histogram

 

EDIT 2: For reference, here are the entries that were judged.

Predict - "Log your predictions" app

13 Gust 17 August 2015 04:20PM

As an exercise in Android programming, I've made an app to log predictions you make and keep score of your results. It's like PredictionBook, but with more of a personal daily exercise feel, in line with this post.

The "statistics" right now are only a score I copied from the old Credence calibration game, and a calibration bar chart.

I'm hoping for feature suggestions and criticism of the app design.

Here's the link for the apk (v0.4), and here's the source code repository. You can download it at Google Play Store.

Pending/Possible/Requested Features:

  • Set check-in dates for predictions
  • Tags (and stats by tag)
  • Stats by timeframe
  • Beeminder integration
  • Trivia questions you can answer if you don't have any personal prediction to make
  • Ring pie chart to choose probability

Edit:

2015-08-26 - Fixed bug that broke on Android 5.0.2 (thanks Bobertron)

2015-08-28 - Change layout for landscape mode, and add a better icon

2015-08-31 -

  • Daily notifications
  • Buttons at the expanded-item-layout (ht dutchie)
  • Show points won/lost in the snackbar when a prediction is answered
  • Translation to Portuguese

 

Mental Calibration for Bayesian Updates?

1 [deleted] 13 August 2015 04:46AM

Hey all,

After reading "How to Measure Anything" I've experimented a bit with calibration training using the author's calibration tools. Having been convinced by his data on the usefulness of calibration for real-world forecasting, I have seen a big improvement in my own calibration.

I'm wondering if anybody knows of similar tools and studies on calibration of Bayesian updating. Broadly, I imagine it would look like:

1. Using the tools and calibration methods I already use to figure out how the feeling of "correctness" of my prior correlates to a numerical value.

2. Using similar (but probably not identical) tools to figure out how the feeling of how "convincing" the new data is correlates to specific numbers.

3. Calibrating these two numbers to Bayes' theorem, such that I know approximately how much to update the original feeling to reflect the new information.

4. Using mnemonic or visualization techniques to pair the new feeling with the belief, so that the next time I remember the belief, I feel the slightly different calibration.
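Step 3 above amounts to applying Bayes' theorem once the two feelings have been turned into numbers. A minimal sketch, assuming the "convincingness" of the data is expressed as a likelihood ratio (an assumption of mine, and `bayes_update` is a hypothetical name):

```python
def bayes_update(prior, likelihood_ratio):
    """Posterior P(H | E) from prior P(H) and LR = P(E | H) / P(E | not H)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# A belief held at 20%, then evidence four times likelier if the belief is true:
print(bayes_update(0.20, 4.0))
```

In this example the belief moves from 20% to about 50%. Comparing numbers like that against how much the feeling actually shifted would be one way to see over- or under-updating.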

Anyway, I'm curious whether anyone has experimented with these processes, whether there's any research on them, or whether this has been tried before on LessWrong. I'd definitely like to lock down a similar procedure for myself.

I should note that many times I already do this naturally... but my guess is that I systematically over- and under-update the feeling based on confirmation bias. I'd like to recalibrate my recalibration :).

A question and a tail

2 Romashka 06 July 2015 09:34AM

This is a rambling post, and I will appreciate your criticism to help dry it or delete it altogether.

It seems that however small a question I research by reviewing [botanical] literature, there is always a much more complex question, rather difficult to put rigorously, that I have to ask for the first one to be meaningful. The second answer (or tier of answers) doesn't add much to the information I will build upon, but it might - just might! - add uncertainty to the result or allow predictions in advance. How do we use it in advance? We don't apply formal reasoning, usually, and yet somehow we use it!

1.

Consider: a certain invasive plant has a host of adaptations beneficial to its success. (They probably wouldn't be sufficient if there were some actual effort to manage manmade ecosystems, but duh.) A trait many IP share is the ability to increase their ploidy - from 2 to 3, 4, 6, 8 or even 10 sets of homologous chromosomes, etc. (Polyploidization sometimes happens even in single cells in somatic (= non-reproductive) tissues, so it's really a heavily used shortcut.)

Now, suppose I want to see how a different specific property of the species behaves abroad. I will have to check the ploidy level, of course! Quick, what does the literature say, how many chromosomes can it have?

...but wait. Make no mistake, I do have to count them; but what if there is a continent-wide study showing that it generally has 4n in Eastern Europe?.. That would allow me to at least expect 4n, or whatever amount they found, and see if there is any research specifically dealing with this situation within its native range.

...but wait. Of course, those findings will be useful in discussion if I find 4n, but if I don't, they will be just a point in the overall space of possibilities. Still relevant, but not worth putting much explanatory weight on.

Something in my brain evaluated the usefulness of a piece of data other people have found, which I myself have yet to look up, of whose exact composition I have no idea - perhaps there are simply no other reports! - and placed it in context of what I really expect to do.


2.

Okay, if I can think so about other people's writings without even reading them, then maybe I can compile a dummy set of data I expect right now and compare them to those I will find in the literature. And later, to actual data. Here's a simplified problem that doesn't approach labwork on any scale (I don't want to add too many qualifiers).

Let us 'measure' 8 parameters, and check if there have been studies that have found correlations between at least some of them (and maybe with some other ones), and then try to see if our expectations based on knowledge of study area and casual surveys fit our expectations based on published research in any specific way. We are not ready to put forth any causal structure - no real data yet - though we strongly suspect (80%) that all the parameters are in some way linked to each other.

The following table is rough and repetitive, but I think it is useful as an illustration of how things brew in the head of a not-very-clever student [my own]. The numbers are 'dimensionless', distributions are normal, the total number of studies measuring each parameter is 7 or fewer, and all correlations are no less than 0.8.

 

Parameter | Total range | Our expected data ±SE | Reported data range* | Our imaginary correlations | Reported correlations
A | 1-12 | 8±1 | 4-10 | A&F, A&H | A&D, A&F, A doesn't correlate with anything if nothing else correlates with anything
B | 1-5 | 2±1 | 1-4 | B&C, B&E, B&G, B&H | B doesn't correlate with E if F&H
C | 1-100 | 35±20 | 80±7 (only one other study) | C&B, C&F, C&H | Unknown
D | 1-28 | 6±2 | 2-18 | D&F | D&G (and then E&F)
E | 1-500 | 200±46 | 150-480 | E&B, E&G | E&F if D&G
F | 1-50 | 47±8 | 8-45 | F&A, F&C, F&D, F&H if A&H | F&A, F&H (and then B doesn't correlate with E)
G | 1-25 | 18±2 | 11-20 | G&B, G&E | G&D (and then E&F)
H | 1-40 | 23±10 | 1-40 | H&A, H&B, H&C, H&F (and then H&A) | H&F (and then B doesn't correlate with E)

*as in, 'for this species, out of 1-12 that are altogether possible, only 4-10 have been observed so far. It might mean that 4-10 is the actual range, but the prior for that is about 60% due to differences in the methodologies used by various researchers and to the fact that only a part of the species's habitats has been studied' etc.


Now I understand that this is hardly the most profitable presentation method, and statistics has advanced much since Pearson and everything. It is just that I find it difficult to compare graphs with diagrams with clouds along axes as they are published in different papers. I only want to guesstimate if my data fit a pattern, to discuss them qualitatively. To stratify the parameters in such a way that I will place explanatory weight on some of them, and report the others to give a full picture. I have to do this explicitly, because I know I am doing this implicitly – it's a feeling I get, of brain working and deciding and not showing me what it has.

I cannot speak about A, only that maybe A, H and F do have something in common – perhaps I haven't measured it. B looks rather suspicious; I will need to reread that other report. C is intriguing, but ultimately belongs to the 'lower value stratum', and maybe those correlations I found are spurious; if only there was a way to reduce the variability... but it won't be cost-efficient. E, F, D and G also might be worth discussing together. F by itself doesn't seem very meaningful, unless there is a causal connection to the others; too bad one can imagine many plausible explanations for that. I will probably start discussion with H, since it probably has been studied for other plants and at least something has already been proposed.

Now when I have my own data I will see where they deviate from my expectations, and that will be some knowledge I can put into words, and I will hopefully start calibrating myself on these matters. And on matters of Discussion structuring:)

PredictIt, a prediction market out of New Zealand, now in beta.

15 Jayson_Virissimo 16 March 2015 02:02AM

From their website:

PredictIt is an exciting new, real money site that tests your knowledge of political and financial events by letting you make and trade predictions on the future.

Taking part in PredictIt is simple and easy. Pick an event you know something about and see what other traders believe is the likelihood it will happen. Do you think they have it right? Or do you think you have the knowledge to beat the wisdom of the crowd?

The key to success at PredictIt is timing. Make your predictions when most people disagree with you and the price is low. When it turns out that your view may be right, the value of your predictions will rise. You’ll need to choose the best time to sell!

Keep in mind that, although the stakes are limited, PredictIt involves real money so the consequences of being wrong can be painful. Of course, winning can also be extra sweet.

For detailed instructions on participating in PredictIt, see How It Works.

PredictIt is an educational purpose project of Victoria University, Wellington of New Zealand, a not-for-profit university, with support provided by Aristotle International, Inc., a U.S. provider of processing and verification services. Prediction markets, like this one, are attracting a lot of academic and practical interest (see our Research section). So, you get to challenge yourself and also help the experts better understand the wisdom of the crowd.

How to calibrate your political beliefs

2 Macaulay 12 May 2013 08:09PM

So you're playing the credence game, and you’re getting a pretty good sense of which level of confidence to assign to your beliefs. Later, when you’re discussing politics, you wonder how you can calibrate your political beliefs as well (beliefs of the form "policy X will result in outcome Y"). Here there's no easy way to assess whether a belief is true or false, in contrast to the trivia questions in the credence game. Moreover, it’s very easy to become mindkilled by politics. What do you do?

In the credence game, you get direct feedback that allows you to learn about your internal proxies for credence, i.e., emotional and heuristic cues about how much to trust yourself. With political beliefs, however, there is no such feedback. One workaround would be to assign high confidence only to beliefs for which you have read n academic papers on the subject. For example, only assign 90% confidence if you've read ten academic papers.

To account for mindkilling, use a second criterion: assign high confidence only to beliefs for which you are ideologically Turing-capable (i.e., able to pass an ideological Turing test). As a proxy for an actual ideological Turing test, you should be able to accurately restate your opponent’s position, or be able to state the strongest counterargument to your position.

In sum, to calibrate your political beliefs, only assign high confidence to beliefs which satisfy extremely demanding epistemic standards.

What information has surprised you most recently?

11 FiftyTwo 09 December 2012 04:43AM

Information that surprises you is interesting as it exposes where you have been miscalibrated, and allows you to correct for that. 

I suspect the users of LessWrong have fairly similar beliefs, so information that has surprised you would probably surprise others here too, and it would be useful for them if you shared it. 

Example: In a discussion with a friend recently I realised I had been massively miscalibrated about the percentage of the UK population who shared my beliefs on certain subjects; in general the population was far more conservative than I had expected.

In retrospect I was assuming my own personal experience was more representative than it was, even when attempting to correct for that. 

Credence calibration game FAQ

13 Academian 26 November 2012 12:52AM

Hey rationality friends, I just made this FAQ for the credence calibration game.  So if you have people you'd like to introduce to it --- for example, to get them used to thinking of belief strengths as probabilities --- now is a good time :)

Also, shameless promotion: please tweet/g+/like it; I want the world to be thinking in probabilities ASAP!

*Also*, please email me (critch@math.berkeley.edu) if you're good at making apps quickly and are interested in improving the game or making a variant of it; I'm swamped in job applications right now, but could easily have a Skype or phone conversation about our cache of ideas for improvements / variations (e.g. collecting user data on a server, more question types, a variant awarding gambles rather than deterministic scores, a variant with clickable emotion buttons for the user...).

Cheers!

Needed: A large database of statements for true/false exercises

3 Academian 13 April 2012 02:26AM

Does anybody know where to find a large database of statements that are roughly 50% likely to be true or false?  These would be used for confidence calibration / Bayesian updating exercises for CMR/HRP.

One way to make such a database would be to buy a bunch of trivia games with True/False questions, and type each statement and its negation into a computer.  A problem with this might be that trivia questions are selected to have surprising/counterintuitive truth values; I'm not sure if that's true.  I'd be happy to acquire an already-made database of this form, but ideally I'd like statements that are "more neutral" in terms of how counterintuitive they are.

Any thoughts on where we might find a database like this to use/buy?

Thanks for any help!

Revision: We actually want a database of two-choice answer questions. This way, the player won't get trained on a base rate of 50% of statements in the world being true... they'll just get trained that when there are two possible answers, one is always true.  In the end, the database should look something like this (warning: I made up the "correct" answers):

Question: "Which is diagnosed more often in America (2011)?"; 
Answers: (a) "the cold", (b) "allergies"; 
Correct Answer: (a); 
Tags: {medical}

Question: "Which city has a higher average altitude?"; 
Answers: (a) "Chicago", (b) "Las Vegas"; 
Correct Answer: (a)
Tags: {geography}

Question: "Who sold more albums while living?"; 
Answers: (a) "Michael Jackson", (b) "Elvis Presley"; 
Correct Answer: (b)
Tags: {history, pop-culture, music}

Question: "Was the price of IBM stock higher or lower at the start of the month after the Berlin wall fell, compared with the start of the previous month?"; 
Answers: (a) "higher", (b) "lower"; 
Correct Answer: (a)
Tags: {history, finance}
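One way such records might be held in code is sketched below. The class and field names are hypothetical, and the "correct" answers are carried over from the made-up examples above, so they should not be read as facts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TwoChoiceQuestion:
    question: str
    option_a: str
    option_b: str
    correct: str                     # "a" or "b"
    tags: frozenset = frozenset()

    def is_correct(self, choice: str) -> bool:
        return choice == self.correct

# Two of the example records from the post (answers are placeholders, not verified).
QUESTIONS = [
    TwoChoiceQuestion("Which city has a higher average altitude?",
                      "Chicago", "Las Vegas", "a", frozenset({"geography"})),
    TwoChoiceQuestion("Who sold more albums while living?",
                      "Michael Jackson", "Elvis Presley", "b",
                      frozenset({"history", "pop-culture", "music"})),
]

def with_tag(questions, tag):
    """Select the questions carrying a given tag, e.g. for themed exercises."""
    return [q for q in questions if tag in q.tags]

print([q.question for q in with_tag(QUESTIONS, "history")])
```

Storing both options in every record keeps the two-choice structure explicit, so a player is never shown a bare statement with an implied 50% base rate.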

 

 

Harry Potter and the Methods of Rationality predictions

6 gwern 09 April 2012 09:49PM

The recent spate of updates has reminded me that while each chapter is enjoyable, the approaching end of MoR, as awesome as it no doubt will be, also means the end of our ability to learn from predicting the truth of the MoR-verse and its future.

With that in mind, I have compiled a page of predictions on sundry topics, much like my other page on predictions for Neon Genesis Evangelion; I encourage people to suggest plausible predictions that I've omitted, register their probabilities on PredictionBook.com, and come up with their own predictions. Then we can all look back when MoR finishes and reflect on what we (or Eliezer) did poorly or well.  

The page is currently up to >182 predictions.

What does your accuracy tell you about your confidence interval?

5 HonoreDB 02 November 2011 07:21PM

Yvain's 2011 Less Wrong Census/Survey is still ongoing throughout November, 2011.  If you haven't taken it, please do before reading on, or at least write down your answers to the calibration questions so they won't get skewed by the following discussion.

continue reading »

Naming the Highest Virtue of Epistemic Rationality

-3 potato 24 October 2011 11:00PM

Edit: Looking back at this a few years later. It is pretty embarrassing, but I'm going to leave it up. 

Why don't we start treating the log_2 of the probability you assign to the great conjunction, conditional on every available piece of information, as the best measure of your epistemic success? Let's call log_2(P(the great conjunction|your available information)) your "Bayesian competence". It is a deductive fact that no other proper scoring rule could possibly give: Score(P(A|B)) + Score(P(B)) = Score(P(A&B)), and obviously, you should get the same score for assigning P(A|B) to A, after observing B, and assigning P(B) to B a priori, as you would get for assigning P(A&B) to A&B a priori. The great conjunction is the conjunction of all true statements expressible in your idiolect. Your available information may be treated as the ordered set of your retained stimuli.

If this doesn't make sense, or you aren't familiar with these ideas, check out Technical Explanation after checking out Intuitive Explanation.
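The additivity property claimed for the logarithmic score can be checked numerically. The sketch below uses arbitrary example probabilities, and the quadratic rule is included only as a contrast of my own choosing. It illustrates the property for one pair of numbers and is not a proof of uniqueness.

```python
import math

def log_score(p):
    """Logarithmic score: log base 2 of the probability assigned to a true statement."""
    return math.log2(p)

def quadratic_score(p):
    """A Brier-style score for a true statement, used here only for contrast."""
    return -(1 - p) ** 2

p_b = 0.4                      # P(B), arbitrary example value
p_a_given_b = 0.25             # P(A|B), arbitrary example value
p_a_and_b = p_a_given_b * p_b  # P(A&B) by the product rule

# Scoring A|B and B separately matches scoring A&B directly under the log rule...
print(math.isclose(log_score(p_a_given_b) + log_score(p_b), log_score(p_a_and_b)))
# ...but not under the quadratic rule.
print(math.isclose(quadratic_score(p_a_given_b) + quadratic_score(p_b),
                   quadratic_score(p_a_and_b)))
```

The first check holds because the logarithm turns the product P(A|B)·P(B) into a sum. The second fails for these values.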

It is standard LW doctrine that we should not name the highest value of rationality, and it is often defended quite brilliantly:

You may try to name the highest principle with names such as “the map that reflects the territory” or “experience of success and failure” or “Bayesian decision theory”. But perhaps you describe incorrectly the nameless virtue. How will you discover your mistake? Not by comparing your description to itself, but by comparing it to that which you did not name.

and of course also:

How can you improve your conception of rationality? Not by saying to yourself, “It is my duty to be rational.” By this you only enshrine your mistaken conception. Perhaps your conception of rationality is that it is rational to believe the words of the Great Teacher, and the Great Teacher says, “The sky is green,” and you look up at the sky and see blue. If you think: “It may look like the sky is blue, but rationality is to believe the words of the Great Teacher,” you lose a chance to discover your mistake. 

These quotes are from the end of Twelve Virtues

Should we really be wondering if there's a virtue higher than Bayesian competence? Is there really a probability worth worrying about that the description of Bayesian competence above is misunderstood? Is the description not simple enough to be mathematical? What mistake might I discover in my understanding of Bayesian competence by comparing it to that which I did not name, after I've already given a proof that Bayesian competence is proper, and that the two restrictions, score(P(B)*P(A|B)) = score(P(B)) + score(P(A|B)) and "must be a proper scoring rule", uniquely specify log_b?

I really want answers to these questions. I am still undecided about them; and change my mind about them far too often.

Of course, your bayesian competence is ridiculously difficult to compute. But I am not proposing the measure for practical reasons. I am proposing the measure to demonstrate that degree of rationality is an objective quantity that you could compute given the source code to the universe, even though there are likely no variables in the source that ever take on this value. This may be of little to no value to the most obsessively pragmatic practitioners of rationality. But it would be a very interesting result to philosophers of science and rationality.

 

 


 

Updated to better express the author's view and to take feedback into account. Apologies to any commenter whose comment may have been nullified.

The comment below:

The general reason Eliezer advocates not naming the highest virtue (as I understand it) is that there may be some type of problem for which bayesian updating (and the scoring rule referred to) yields the wrong answer. This idea sounds rather improbable to me, but there is a non-negligible probability that bayes will yield a wrong answer on some question. Not naming the virtue is supposed to be a reminder that if bayes ever gives the wrong answer, we go with the right answer, not bayes.

has changed my mind about the openness of the questions I asked.

Link: Compare your moral values to the general population

9 lunchbox 28 November 2010 03:21AM

Jonathan Haidt, a professor at UVA, runs an online lab with quizzes that will compare your moral values to the rest of the population. I have found the test results useful for avoiding the typical mind fallacy. When someone disagrees with me on a belief/opinion I feel certain about, it's often difficult to tease apart how much of this disagreement stems from them not "getting it", and how much stems from them having a different fundamental value system. One of the tests alerted me that I am an outlier in certain aspects of how I judge morality (green = me; blue = liberals; red = conservatives):

Another benefit of these quizzes is that they can point out potential blind spots. For example, one quiz asks for opinions about punishment for crimes. If I discover I'm an outlier w.r.t. the population, I should reconsider whether my opinions are based on solid evidence (or did I see one study that found tit-for-tat punishment effective in a certain context, and take that as gospel?).

Extra reading: Haidt wrote a WSJ article last month that applied the learnings of these moral quizzes to better understanding the Tea Party.