

Recent updates to Gwern.net (2015-2016)

28 gwern 26 August 2016 07:22PM

Previously: 2011; 2012-2013; 2013-2014; 2014-2015

"When I was one-and-twenty / I heard a wise man say, / 'Give crowns and pounds and guineas / But not your heart away; / Give pearls away and rubies / But keep your fancy free.' / But I was one-and-twenty, / No use to talk to me."

My past year of completed writings, sorted by topic:


  • Embryo selection for intelligence cost-benefit analysis
    • meta-analysis of intelligence GCTAs, limits set by measurement error, current polygenic scores, possible gains with current IVF procedures, the benefits of selection on multiple complex traits, the possible annual value in the USA of selection & value of larger GWASes, societal consequences of various embryo selection scenarios, embryo count versus polygenic scores as limiting factors, comparison with iterated embryo selection, limits to total gains from iterated embryo selection etc.
  • Wikipedia article on Genome-wide complex trait analysis (GCTA)





Misc: Gwern.net itself has remained largely stable (some CSS fixes and image size changes); I continue to use Patreon and send out my newsletters.

Linkposts now live!

26 Vaniver 28 September 2016 03:13PM


You can now submit links to LW! As the rationality community has grown up, more and more content has moved off LW to other places, and so rather than trying to generate more content here we'll instead try to collect more content here. My hope is that Less Wrong becomes something like "the Rationalist RSS," where people can discover what's new and interesting without necessarily being plugged in to the various diaspora communities.

Some general norms, subject to change:


  1. It's okay to link someone else's work, unless they specifically ask you not to. It's also okay to link your own work; if you want to get LW karma for things you make off-site, drop a link here as soon as you publish it.
  2. It's okay to link old stuff, but let's try to keep it to less than 5 old posts a day. The first link that I made is to Yudkowsky's Guide to Writing Intelligent Characters.
  3. It's okay to link to something that you think rationalists will be interested in, even if it's not directly related to rationality. If it's political, think long and hard before deciding to submit that link.
  4. It's not okay to post duplicates.

As before, everything will go into discussion. Tag your links, please. As we see what sort of things people are linking, we'll figure out how we need to divide things up, be it separate subreddits or using tags to promote or demote the attention level of links and posts.

(Thanks to James Lamine for doing the coding, and to Trike (and myself) for supporting the work.)

Now is the time to eliminate mosquitoes

21 James_Miller 06 August 2016 07:10PM

“In 2015, there were roughly 214 million malaria cases and an estimated 438 000 malaria deaths.”  While we don’t know how many humans malaria has killed, an estimate of half of everyone who has ever died isn’t absurd.  Because few people in rich countries get malaria, pharmaceutical companies put relatively few resources into combating it.   


The best way to eliminate malaria is probably to use gene drives to completely eradicate the species of mosquitoes that bite humans, but until recently rich countries haven’t been motivated to pursue such xenocide.  The Zika virus, which is in mosquitoes in the United States, provides effective altruists with an opportunity to advocate for exterminating all species of mosquitoes that spread disease to humans, because the horrifying and disgusting pictures of babies with Zika might make the American public receptive to our arguments.  A leading short-term goal of effective altruists, I propose, should be advocating for mosquito eradication in the short window before rich people get acclimated to pictures of Zika babies.


Personally, I have (unsuccessfully) pitched articles on mosquito eradication to two magazines and (with a bit more success) emailed someone who knows someone who knows someone in the Trump campaign to attempt to get the candidate to come out in favor of mosquito eradication.  What have you done?   Given the enormous harm mosquitoes inflict on mankind, doing just a little (such as writing a blog post) could have a high expected payoff.


Deepmind Plans for Rat-Level AI

20 moridinamael 18 August 2016 04:26PM

Demis Hassabis gives a great presentation on the state of Deepmind's work as of April 20, 2016. Skip to 23:12 for the statement of the goal of creating a rat-level AI -- "An AI that can do everything a rat can do," in his words. From his tone, it sounds like this is more of a short-term goal than a long-term one.

I don't think Hassabis is prone to making unrealistic plans or stating overly bold predictions. I strongly encourage you to scan through Deepmind's publication list to get a sense of how quickly they're making progress. (In fact, I encourage you to bookmark that page, because it seems like they add a new paper about twice a month.) The outfit seems to be systematically knocking down all the "Holy Grail" milestones on the way to GAI, and this is just Deepmind. The papers they've put out in just the last year or so concern successful one-shot learning, continuous control, actor-critic architectures, novel memory architectures, policy learning, and bootstrapped gradient learning, and these are just the most stand-out achievements. There's even a paper co-authored by Stuart Armstrong concerning Friendliness concepts on that list.

If we really do have a genuinely rat-level AI within the next couple of years, I think that would justify radically moving forward expectations of AI development timetables. Speaking very naively, if we can go from "sub-nematode" to "mammal that can solve puzzles" in that timeframe, I would view it as a form of proof that "general" intelligence does not require some mysterious ingredient that we haven't discovered yet.

Astrobiology III: Why Earth?

17 CellBioGuy 04 October 2016 09:59PM

After many tribulations, my astrobiology bloggery is back up and running using Wordpress rather than Blogger because Blogger is completely unusable these days.  I've taken the opportunity of the move to make better graphs for my old posts. 

"The Solar System: Why Earth?"

Here, I try to look at our own solar system and what the presence of only ONE known biosphere, here on Earth, tells us about life and perhaps more importantly what it does not.  In particular, I explore what aspects of Earth make it special and I make the distinction between a big biosphere here on Earth that has utterly rebuilt the geochemistry and a smaller biosphere living off smaller amounts of energy that we probably would never notice elsewhere in our own solar system given the evidence at hand. 

Commentary appreciated.



Previous works:

Space and Time, Part I

Space and Time, Part II

The 12 Second Rule (i.e. think before answering) and other Epistemic Norms

17 Raemon 05 September 2016 11:08PM

Epistemic Status/Effort: I'm 85% confident this is a good idea, and that the broader idea is at least a good direction. Have gotten feedback from a few people and spent some time actively thinking through the ramifications of it. Interested in more feedback.


1) When asking a group a question, e.g. "what do you think about X?", ask people to wait 12 seconds to give each other time to think. If you notice someone else ask a question and people immediately answering, suggest pausing the conversation until people have had some time to think. (Probably specifically mention the "12 second rule" to give people a handy tag to remember.)

2) In general, look for opportunities to improve or share social norms that'll help your community think more clearly, and show appreciation when others do so (i.e. "Epistemic Norms")

(This was originally conceived for the self-described "rationality" community, but I think it's a good idea for any group that'd like to improve its critical thinking as well as its creativity.)

There are three reasons the 12-second rule seems important to me:

  • On an individual level, it makes it easier to think of the best answer, rather than going with your cached thought.
  • On the group level, it makes it easier to prevent anchoring/conformity/priming effects.
  • Also on the group level, it means that people who take longer to think of answers get to practice actually thinking for themselves.
If you're using it with people who aren't familiar with it, make sure to briefly summarize what you're doing and why.


While visiting rationalist friends in SF, I was participating in a small conversation (about six participants) in which someone asked a question. Immediately, one person said "I think Y. Or maybe Z." A couple other people said "Yeah. Y or Z, or... maybe W or V?" But the conversation was already anchored around the initial answers.

I said "hey, shouldn't we stop to each think first?" (this happens to be a thing my friends in NYC do). And I was somewhat surprised that the response was more like "oh, I guess that's a good idea" than "oh yeah whoops I forgot."

It seemed like a fairly obvious social norm for a community that prides itself on rationality, and while the question wasn't *super* important, I think it's helpful to practice this sort of social norm on a day-to-day basis.

This prompted some broader questions - it occurred to me there were likely norms and ideas other people had developed in their local networks that I probably wasn't aware of. Given that there's no central authority on "good epistemic norms", how do we develop them and get them to spread? There are a couple of people with popular blogs who sometimes propose new norms which maybe catch on, and some people still sharing good ideas on Less Wrong, or Facebook. But it doesn't seem like those ideas necessarily reach saturation.

Atrophied Skills

The first three years I spent in the rationality community, my perception is that my strategic thinking and ability to think through complex problems actually *deteriorated*. It's possible that I was just surrounded by smarter people than me for the first time, but I'm fairly confident that I specifically acquired the habit of "when I need help thinking through a problem, the first step is not to think about it myself, but to ask smart people around me for help."

Eventually I was hired by a startup, and I found myself in a position where the default course for the company was to leave some important value on the table. (I was working in an EA-adjacent company, and wanted to push it in a more Effective Altruism-y direction with higher rigor). There was nobody else I could turn to for help. I had to think through what "better epistemic rigor" actually meant and how to apply it in this situation.

Whether or not my rationality had atrophied in the past 3 years, I'm certain that for the first time in a long while, I flexed certain mental muscles that I hadn't been using. Ultimately I don't know whether my ideas had a noteworthy effect on the company, but I do know that I felt more empowered and excited to improve my own rationality.

I realized that, in the NYC meetups, quicker-thinking people tended to say what they thought immediately when a question was asked, and this meant that most of the people in the meetup didn't get to practice thinking through complex questions. So I started asking people to wait for a while before answering - sometimes 5 minutes, sometimes just a few seconds.

"12 seconds" seems like a nice rule-of-thumb to avoid completely interrupting the flow of conversation, while still having some time to reflect, and make sure you're not just shouting out a cached thought. It's a non-standard number which is hopefully easier to remember.

(That said, a more nuanced alternative is "everyone takes a moment to think until they feel like they're hitting diminishing returns on thinking or it's not worth further halting the conversation, and then raises a finger to indicate that they're done")

Meta Point: Observation, Improvement and Sharing

The 12-second rule isn't the main point though - just one of many ways this community could do a better job of helping both newcomers and old-timers hone their thinking skills. "Rationality" is supposed to be our thing. I think we should all be on the lookout for opportunities to improve our collective ability to think clearly. 

I think specific conversational habits are helpful both for their concrete, immediate benefits, as well as an opportunity to remind everyone (newcomers and old-timers alike) that we're trying to actively improve in this area.

I have more thoughts on how to go about improving the meta-issues here, which I'm less confident about and will flesh out in future posts.

A Child's Petrov Day Speech

15 James_Miller 28 September 2016 02:27AM

30 years ago, the Cold War was raging on. If you don’t know what that is, it was the period from 1947 to 1991 where both the U.S and Russia had large stockpiles of nuclear weapons and were threatening to use them on each other. The only thing that stopped them from doing so was the knowledge that the other side would have time to react. The U.S and Russia both had surveillance systems to know if the other country had a nuke in the air headed for them.

On this day, September 26, in 1983, a man named Stanislav Petrov was on duty in the Russian surveillance room when the computer notified him that satellites had detected five nuclear missile launches from the U.S. He was told to pass this information on to his superiors, who would then launch a counter-strike.

He refused to notify anyone of the incident, suspecting it was just an error in the computer system.

No nukes ever hit Russian soil. Later, it was found that the ‘nukes’ were just light bouncing off of clouds which confused the satellite. Petrov was right, and likely saved all of humanity by stopping the outbreak of nuclear war. However, almost no one has heard of him.

We celebrate men like George Washington and Abraham Lincoln who win wars. These were great men, but the greater men, the men like Petrov who stopped these wars from ever happening - no one has heard of these men.

Let it be known, that September 26 is Petrov Day, in honor of the acts of a great man who saved the world, and of who almost no one has heard the name of.




My 11-year-old son wrote and then read this speech to his sixth-grade class.

Inefficient Games

14 capybaralet 23 August 2016 05:47PM

There are several well-known games in which the Pareto optima and Nash equilibria are disjoint sets.
The most famous is probably the prisoner's dilemma.  Races to the bottom or tragedies of the commons typically have this feature as well.

I proposed calling these inefficient games.  More generally, games where the sets of Pareto optima and Nash equilibria are distinct (but not disjoint), such as a stag hunt, could be called potentially inefficient games.

It seems worthwhile to study (potentially) inefficient games as a class and see what can be discovered about them, but I don't know of any such work (pointers welcome!).
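
To make the distinction concrete, here is a minimal Python sketch (my own illustration, not from the post) that enumerates the Nash equilibria and Pareto optima of the standard prisoner's dilemma and shows that the two sets are disjoint:

from itertools import product

# Standard prisoner's dilemma payoffs, written as (row player, column player).
payoffs = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
strategies = ("C", "D")

def is_nash(profile):
    # Nash equilibrium: no player can gain by unilaterally switching strategies.
    for player in (0, 1):
        for alternative in strategies:
            deviation = list(profile)
            deviation[player] = alternative
            if payoffs[tuple(deviation)][player] > payoffs[profile][player]:
                return False
    return True

def is_pareto_optimal(profile):
    # Pareto optimal: no other outcome makes some player better off without making another worse off.
    own = payoffs[profile]
    for other in payoffs.values():
        if all(o >= p for o, p in zip(other, own)) and any(o > p for o, p in zip(other, own)):
            return False
    return True

profiles = list(product(strategies, repeat=2))
print("Nash equilibria:", [p for p in profiles if is_nash(p)])            # [('D', 'D')]
print("Pareto optima:  ", [p for p in profiles if is_pareto_optimal(p)])  # every profile except ('D', 'D')

Running the same check on a stag hunt would show the two sets overlapping (the stag/stag equilibrium is Pareto optimal) without coinciding, which is what makes it only potentially inefficient.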

The Future of Humanity Institute is hiring!

13 crmflynn 18 August 2016 01:09PM

FHI is accepting applications for a two-year position as a full-time Research Project Manager. Responsibilities will include coordinating, monitoring, and developing FHI’s activities, seeking funding, organizing workshops and conferences, and effectively communicating FHI’s research. The Research Project Manager will also be expected to work in collaboration with Professor Nick Bostrom, and other researchers, to advance their research agendas, and will additionally be expected to produce reports for government, industry, and other relevant organizations.

Applicants will be familiar with existing research and literature in the field and have excellent communication skills, including the ability to write for publication. He or she will have experience of independently managing a research project and of contributing to large policy-relevant reports. Previous professional experience working for non-profit organisations, experience with effective altruism, and a network in the relevant fields associated with existential risk may be an advantage, but are not essential.

To apply please go to and enter vacancy #124775 (it is also possible to find the job by choosing “Philosophy Faculty” from the department options). The deadline is noon UK time on 29 August. To stay up to date on job opportunities at the Future of Humanity Institute, please sign up for updates on our vacancies newsletter at

Neutralizing Physical Annoyances

12 SquirrelInHell 12 September 2016 04:36PM

Once in a while, I learn something about a seemingly unrelated topic - such as freediving - and I take away some trick that is well known and "obvious" in that topic, but is generally useful and NOT known by many people outside. Case in point, you can use equalization techniques from diving to remove pressure in your ears when you descend in a plane or a fast lift. I also give some other examples.


Reading about a few equalization techniques took me maybe 5 minutes, and after reading this passage once I was able to successfully use the "Frenzel Maneuver":

The technique is to close off the vocal cords, as though you are about to lift a heavy weight. The nostrils are pinched closed and an effort is made to make a 'k' or a 'guh' sound. By doing this you raise the back of the tongue and the 'Adam's Apple' will elevate. This turns the tongue into a piston, pushing air up.



A few years ago, I started regularly doing deep relaxations after yoga. At some point, I learned how to relax my throat in such a way that the air can freely escape from the stomach. Since then, whenever I start hiccuping, I relax my throat and the hiccups stop immediately in all cases. I am now 100% hiccup-free.

Stiff Shoulders

I've spent a few hours with a friend who does massage, and they taught me some basics. After that, it became natural for me to self-massage my shoulders after I do a lot of sitting work, etc. I can't imagine living without this anymore.


If you know more, please share!

MIRI AMA plus updates

10 RobbBB 11 October 2016 11:52PM

MIRI is running an AMA on the Effective Altruism Forum tomorrow (Wednesday, Oct. 12): Ask MIRI Anything. Questions are welcome in the interim!

Nate also recently posted a more detailed version of our 2016 fundraising pitch to the EA Forum. One of the additions is about our first funding target:

We feel reasonably good about our chance of hitting target 1, but it isn't a sure thing; we'll probably need to see support from new donors in order to hit our target, to offset the fact that a few of our regular donors are giving less than usual this year.

The Why MIRI's Approach? section also touches on new topics that we haven't talked about in much detail in the past, but plan to write up some blog posts about in the future. In particular:

Loosely speaking, we can imagine the space of all smarter-than-human AI systems as an extremely wide and heterogeneous space, in which "alignable AI designs" is a small and narrow target (and "aligned AI designs" smaller and narrower still). I think that the most important thing a marginal alignment researcher can do today is help ensure that the first generally intelligent systems humans design are in the “alignable” region. I think that this is unlikely to happen unless researchers have a fairly principled understanding of how the systems they're developing reason, and how that reasoning connects to the intended objectives.

Most of our work is therefore aimed at seeding the field with ideas that may inspire more AI research in the vicinity of (what we expect to be) alignable AI designs. When the first general reasoning machines are developed, we want the developers to be sampling from a space of designs and techniques that are more understandable and reliable than what’s possible in AI today.

In other news, we've uploaded a new intro talk on our most recent result, "Logical Induction," that goes into more of the technical details than our previous talk.

See also Shtetl-Optimized and n-Category Café for recent discussions of the paper.

[Link] Putanumonit - Convincing people to read the Sequences and wondering about "postrationalists"

10 Jacobian 28 September 2016 04:43PM

2016 LessWrong Diaspora Survey Analysis: Part Four (Politics, Calibration & Probability, Futurology, Charity & Effective Altruism)

10 ingres 10 September 2016 03:51AM


The LessWrong survey has a very involved section dedicated to politics. In previous analysis the benefits of this weren't fully realized. In the 2016 analysis we can look at not just the political affiliation of a respondent, but what beliefs are associated with a certain affiliation. The charts below summarize most of the results.

Political Opinions By Political Affiliation

Miscellaneous Politics

There were also some other questions in this section which aren't covered by the above charts.


On a scale from 1 (not interested at all) to 5 (extremely interested), how would you describe your level of interest in politics?

1: 67 (2.182%)

2: 257 (8.371%)

3: 461 (15.016%)

4: 595 (19.381%)

5: 312 (10.163%)


Did you vote in your country's last major national election? (LW Turnout Versus General Election Turnout By Country)
Group Turnout
LessWrong 68.9%
Australia 91%
Brazil 78.90%
Britain 66.4%
Canada 68.3%
Finland 70.1%
France 79.48%
Germany 71.5%
India 66.3%
Israel 72%
New Zealand 77.90%
Russia 65.25%
United States 54.9%
Numbers taken from Wikipedia, accurate as of the last general election in each country listed at time of writing.


If you are an American, what party are you registered with?

Democratic Party: 358 (24.5%)

Republican Party: 72 (4.9%)

Libertarian Party: 26 (1.8%)

Other third party: 16 (1.1%)

Not registered for a party: 451 (30.8%)

(option for non-Americans who want an option): 541 (37.0%)

Calibration And Probability Questions

Calibration Questions

I just couldn't analyze these, sorry guys. I put many hours into trying to get them into a decent format I could even read and that sucked up an incredible amount of time. It's why this part of the survey took so long to get out. Thankfully another LessWrong user, Houshalter, has kindly done their own analysis.

All my calibration questions were meant to satisfy a few essential properties:

  1. They should be 'self-contained', i.e. something you can reasonably answer or at least try to answer with a 5th grade science education and normal life experience.
  2. They should, at least to a certain extent, be Fermi Estimable.
  3. They should progressively scale in difficulty so you can see whether somebody understands basic probability or not. (e.g. In an 'or' question, do they put a probability of less than 50% on being right?)

At least one person requested a workbook, so I might write more in the future. I'll obviously write more for the survey.

Probability Questions

Question Mean Median Mode Stdev
Please give the obvious answer to this question, so I can automatically throw away all surveys that don't follow the rules: What is the probability of a fair coin coming up heads? 49.821 50.0 50.0 3.033
What is the probability that the Many Worlds interpretation of quantum mechanics is more or less correct? 44.599 50.0 50.0 29.193
What is the probability that non-human, non-Earthly intelligent life exists in the observable universe? 75.727 90.0 99.0 31.893
What is the probability that non-human, non-Earthly intelligent life exists in the Milky Way galaxy? 45.966 50.0 10.0 38.395
What is the probability that supernatural events (including God, ghosts, magic, etc) have occurred since the beginning of the universe? 13.575 1.0 1.0 27.576
What is the probability that there is a god, defined as a supernatural intelligent entity who created the universe? 15.474 1.0 1.0 27.891
What is the probability that any of humankind's revealed religions is more or less correct? 10.624 0.5 1.0 26.257
What is the probability that an average person cryonically frozen today will be successfully restored to life at some future time, conditional on no global catastrophe destroying civilization before then? 21.225 10.0 5.0 26.782
What is the probability that at least one person living at this moment will reach an age of one thousand years, conditional on no global catastrophe destroying civilization in that time? 25.263 10.0 1.0 30.510
What is the probability that our universe is a simulation? 25.256 10.0 50.0 28.404
What is the probability that significant global warming is occurring or will soon occur, and is primarily caused by human actions? 83.307 90.0 90.0 23.167
What is the probability that the human race will make it to 2100 without any catastrophe that wipes out more than 90% of humanity? 76.310 80.0 80.0 22.933


The probability questions are probably the area of the survey I put the least effort into. My plan for next year is to overhaul these sections entirely and try including some Tetlock-esque forecasting questions, a link to some advice on how to make good predictions, etc.


Futurology

This section got a bit of a facelift this year, including new questions on cryonics, genetic engineering, and technological unemployment in addition to the previous years' questions.



Are you signed up for cryonics?

Yes - signed up or just finishing up paperwork: 48 (2.9%)

No - would like to sign up but unavailable in my area: 104 (6.3%)

No - would like to sign up but haven't gotten around to it: 180 (10.9%)

No - would like to sign up but can't afford it: 229 (13.8%)

No - still considering it: 557 (33.7%)

No - and do not want to sign up for cryonics: 468 (28.3%)

Never thought about it / don't understand: 68 (4.1%)


Do you think cryonics, as currently practiced by Alcor/Cryonics Institute will work?

Yes: 106 (6.6%)

Maybe: 1041 (64.4%)

No: 470 (29.1%)

Interestingly enough, of those who think it will work with enough confidence to say 'yes', only 14 are actually signed up for cryonics.

sqlite> select count(*) from data where CryonicsNow="Yes" and Cryonics="Yes - signed up or just finishing up paperwork";
14

sqlite> select count(*) from data where CryonicsNow="Yes" and (Cryonics="Yes - signed up or just finishing up paperwork" OR Cryonics="No - would like to sign up but unavailable in my area" OR Cryonics="No - would like to sign up but haven't gotten around to it" OR Cryonics="No - would like to sign up but can't afford it");



Do you think cryonics works in principle?

Yes: 802 (49.3%)

Maybe: 701 (43.1%)

No: 125 (7.7%)

LessWrongers seem to be very bullish on the underlying physics of cryonics even if they're not as enthusiastic about current methods in use.

The Brain Preservation Foundation also did an analysis of cryonics responses to the LessWrong Survey.



By what year do you think the Singularity will occur? Answer such that you think, conditional on the Singularity occurring, there is an even chance of the Singularity falling before or after this year. If you think a singularity is so unlikely you don't even want to condition on it, leave this question blank.

Mean: 8.110300081581755e+16

Median: 2080.0

Mode: 2100.0

Stdev: 2.847858859055733e+18

I didn't bother to filter out the silly answers for this.

Obviously it's a bit hard to see without filtering out the uber-large answers, but the median doesn't seem to have changed much from the 2014 survey.

Genetic Engineering


Would you ever consider having your child genetically modified for any reason?

Yes: 1552 (95.921%)

No: 66 (4.079%)

Well that's fairly overwhelming.


Would you be willing to have your child genetically modified to prevent them from getting an inheritable disease?

Yes: 1387 (85.5%)

Depends on the disease: 207 (12.8%)

No: 28 (1.7%)

I find it amusing how the strict "No" group shrinks considerably after this question.


Would you be willing to have your child genetically modified for improvement purposes? (eg. To heighten their intelligence or reduce their risk of schizophrenia.)

Yes : 0 (0.0%)

Maybe a little: 176 (10.9%)

Depends on the strength of the improvements: 262 (16.2%)

No: 84 (5.2%)

Yes, I know 'yes' is bugged; I don't know what causes this bug and despite my best efforts I couldn't track it down. There is also an issue here where 'reduce their risk of schizophrenia' is offered as an example, which might confuse people, but the actual science cuts closer to that than it does to a clean separation between disease risk and 'improvement'.


This question is too important to just not have an answer to so I'll do it manually. Unfortunately I can't easily remove the 'excluded' entries so that we're dealing with the exact same distribution but only 13 or so responses are filtered out anyway.

sqlite> select count(*) from data where GeneticImprovement="Yes";
1100

>>> 1100 + 176 + 262 + 84
1622
>>> 1100 / 1622

67.8% are willing to genetically engineer their children for improvements.


Would you be willing to have your child genetically modified for cosmetic reasons? (eg. To make them taller or have a certain eye color.)

Yes: 500 (31.0%)

Maybe a little: 381 (23.6%)

Depends on the strength of the improvements: 277 (17.2%)

No: 455 (28.2%)

These numbers go about how you would expect, with people being progressively less interested the more 'shallow' a genetic change is seen as.


What's your overall opinion of other people genetically modifying their children for disease prevention purposes?

Positive: 1177 (71.7%)

Mostly Positive: 311 (19.0%)

No strong opinion: 112 (6.8%)

Mostly Negative: 29 (1.8%)

Negative: 12 (0.7%)


What's your overall opinion of other people genetically modifying their children for improvement purposes?

Positive: 737 (44.9%)

Mostly Positive: 482 (29.4%)

No strong opinion: 273 (16.6%)

Mostly Negative: 111 (6.8%)

Negative: 38 (2.3%)


What's your overall opinion of other people genetically modifying their children for cosmetic reasons?

Positive: 291 (17.7%)

Mostly Positive: 290 (17.7%)

No strong opinion: 576 (35.1%)

Mostly Negative: 328 (20.0%)

Negative: 157 (9.6%)

All three of these seem largely consistent with people's personal preferences about modification. Were I inclined I could do a deeper analysis that actually takes survey respondents row by row and looks at correlation between preference for one's own children and preference for others.

Technological Unemployment


Do you think the Luddite's Fallacy is an actual fallacy?

Yes: 443 (30.936%)

No: 989 (69.064%)

We can use this as an overall measure of worry about technological unemployment, which would seem to be high among the LW demographic.


By what year do you think the majority of people in your country will have trouble finding employment for automation related reasons? If you think this is something that will never happen leave this question blank.

Mean: 2102.9713740458014

Median: 2050.0

Mode: 2050.0

Stdev: 1180.2342850727339

Question is flawed because you can't distinguish answers of "never happen" from people who just didn't see it.

Interesting question that would be fun to take a look at in comparison to the estimates for the singularity.


Do you think the "end of work" would be a good thing?

Yes: 1238 (81.287%)

No: 285 (18.713%)

Fairly overwhelming consensus, but with a significant minority of people who have a dissenting opinion.


If machines end all or almost all employment, what are your biggest worries? Pick two.

Question Count Percent
People will just idle about in destructive ways 513 16.71%
People need work to be fulfilled and if we eliminate work we'll all feel deep existential angst 543 17.687%
The rich are going to take all the resources for themselves and leave the rest of us to starve or live in poverty 1066 34.723%
The machines won't need us, and we'll starve to death or be otherwise liquidated 416 13.55%
Question is flawed because it demanded the user 'pick two' instead of up to two.

The plurality of worries are about elites who refuse to share their wealth.

Existential Risk


Which disaster do you think is most likely to wipe out greater than 90% of humanity before the year 2100?

Nuclear war: +4.800% 326 (20.6%)

Asteroid strike: -0.200% 64 (4.1%)

Unfriendly AI: +1.000% 271 (17.2%)

Nanotech / grey goo: -2.000% 18 (1.1%)

Pandemic (natural): +0.100% 120 (7.6%)

Pandemic (bioengineered): +1.900% 355 (22.5%)

Environmental collapse (including global warming): +1.500% 252 (16.0%)

Economic / political collapse: -1.400% 136 (8.6%)

Other: 35 (2.217%)

Significantly more people worried about Nuclear War than last year. Effect of new respondents, or geopolitical situation? Who knows.

Charity And Effective Altruism

Charitable Giving


What is your approximate annual income in US dollars (non-Americans: convert at Obviously you don't need to answer this question if you don't want to. Please don't include commas or dollar signs.

Sum: 66054140.47384

Mean: 64569.052271593355

Median: 40000.0

Mode: 30000.0

Stdev: 107297.53606321265


How much money, in number of dollars, have you donated to charity over the past year? (non-Americans: convert to dollars at ). Please don't include commas or dollar signs in your answer. For example, 4000

Sum: 2389900.6530000004

Mean: 2914.5129914634144

Median: 353.0

Mode: 100.0

Stdev: 9471.962766896671


How much money have you donated to charities aiming to reduce existential risk (other than MIRI/CFAR) in the past year?

Sum: 169300.89

Mean: 1991.7751764705883

Median: 200.0

Mode: 100.0

Stdev: 9219.941506342007


How much have you donated in US dollars to the following charities in the past year? (Non-americans: convert to dollars at Please don't include commas or dollar signs in your answer. Options starting with "any" aren't the name of a charity but a category of charity.

Question Sum Mean Median Mode Stdev
Against Malaria Foundation 483935.027 1905.256 300.0 None 7216.020
Schistosomiasis Control Initiative 47908.0 840.491 200.0 1000.0 1618.785
Deworm the World Initiative 28820.0 565.098 150.0 500.0 1432.712
GiveDirectly 154410.177 1429.723 450.0 50.0 3472.082
Any kind of animal rights charity 83130.47 1093.821 154.235 500.0 2313.493
Any kind of bug rights charity 1083.0 270.75 157.5 None 353.396
Machine Intelligence Research Institute 141792.5 1417.925 100.0 100.0 5370.485
Any charity combating nuclear existential risk 491.0 81.833 75.0 100.0 68.060
Any charity combating global warming 13012.0 245.509 100.0 10.0 365.542
Center For Applied Rationality 127101.0 3177.525 150.0 100.0 12969.096
Strategies for Engineered Negligible Senescence Research Foundation 9429.0 554.647 100.0 20.0 1156.431
Wikipedia 12765.5 53.189 20.0 10.0 126.444
Internet Archive 2975.04 80.406 30.0 50.0 173.791
Any campaign for political office 38443.99 366.133 50.0 50.0 1374.305
Other 564890.46 1661.442 200.0 100.0 4670.805
"Bug Rights" charity was supposed to be a troll fakeout but apparently...

This table is interesting given the recent debates about how much money certain causes are 'taking up' in Effective Altruism.

Effective Altruism


Do you follow any dietary restrictions related to animal products?

Yes, I am vegan: 54 (3.4%)

Yes, I am vegetarian: 158 (10.0%)

Yes, I restrict meat some other way (pescetarian, flexitarian, try to only eat ethically sourced meat): 375 (23.7%)

No: 996 (62.9%)


Do you know what Effective Altruism is?

Yes: 1562 (89.3%)

No but I've heard of it: 114 (6.5%)

No: 74 (4.2%)


Do you self-identify as an Effective Altruist?

Yes: 665 (39.233%)

No: 1030 (60.767%)

The distribution given by the 2014 survey results does not sum to one, so it's difficult to determine whether Effective Altruism's membership actually went up, but if we take the numbers at face value it experienced an 11.13% increase in membership.


Do you participate in the Effective Altruism community?

Yes: 314 (18.427%)

No: 1390 (81.573%)

Same issue as last: taking the numbers at face value, community participation went up by 5.727%.


Has Effective Altruism caused you to make donations you otherwise wouldn't?

Yes: 666 (39.269%)

No: 1030 (60.731%)


Effective Altruist Anxiety


Have you ever had any kind of moral anxiety over Effective Altruism?

Yes: 501 (29.6%)

Yes but only because I worry about everything: 184 (10.9%)

No: 1008 (59.5%)

There's an ongoing debate in Effective Altruism about what kind of rhetorical strategy is best for getting people on board and whether Effective Altruism is causing people significant moral anxiety.

It certainly appears to be. But is moral anxiety effective? Let's look:

Sample Size: 244
Average amount of money donated by people anxious about EA who aren't EAs: 257.5409836065574

Sample Size: 679
Average amount of money donated by people who aren't anxious about EA who aren't EAs: 479.7501384388807

Sample Size: 249
Average amount of money donated by EAs anxious about EA: 1841.5292369477913

Sample Size: 314
Average amount of money donated by EAs not anxious about EA: 1837.8248407643312

It seems fairly conclusive that anxiety is not a good way to get people to donate more than they already are, but is it a good way to get people to become Effective Altruists?

Sample Size: 1685
P(Effective Altruist): 0.3940652818991098
P(EA Anxiety): 0.29554896142433235
P(Effective Altruist | EA Anxiety): 0.5

Maybe. There is of course an argument to be made that sufficient good done by causing people anxiety outweighs feeding into people's scrupulosity, but it can be discussed after I get through explaining it on the phone to wealthy PR-conscious donors and telling the local all-kill shelter where I want my shipment of dead kittens.
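
(For anyone who wants to reproduce numbers like the conditional probability above from the survey database, a rough sketch using Python's sqlite3 module follows. The column names EffectiveAltruist and EAAnxiety are my guesses at the schema, not the survey's actual field names.)

import sqlite3

# Hypothetical column names -- adjust to the survey's actual schema.
conn = sqlite3.connect("2016_lw_survey.db")
cur = conn.cursor()

def count(where_clause):
    # Count survey rows matching an SQL condition.
    return cur.execute("select count(*) from data where " + where_clause).fetchone()[0]

total = count("EffectiveAltruist is not null")
ea = count("EffectiveAltruist = 'Yes'")
anxious = count("EAAnxiety = 'Yes'")
ea_and_anxious = count("EffectiveAltruist = 'Yes' and EAAnxiety = 'Yes'")

print("P(Effective Altruist):", ea / total)
print("P(EA Anxiety):", anxious / total)
print("P(Effective Altruist | EA Anxiety):", ea_and_anxious / anxious)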


What's your overall opinion of Effective Altruism?

Positive: 809 (47.6%)

Mostly Positive: 535 (31.5%)

No strong opinion: 258 (15.2%)

Mostly Negative: 75 (4.4%)

Negative: 24 (1.4%)

EA appears to be doing a pretty good job of getting people to like them.

Interesting Tables

Charity Donations By Political Affiliation
Affiliation Income Charity Contributions % Income Donated To Charity Total Survey Charity % Sample Size
Anarchist 1677900.0 72386.0 4.314% 3.004% 50
Communist 298700.0 19190.0 6.425% 0.796% 13
Conservative 1963000.04 62945.04 3.207% 2.612% 38
Futarchist 1497494.1099999999 166254.0 11.102% 6.899% 31
Left-Libertarian 9681635.613839999 416084.0 4.298% 17.266% 245
Libertarian 11698523.0 214101.0 1.83% 8.885% 190
Moderate 3225475.0 90518.0 2.806% 3.756% 67
Neoreactionary 1383976.0 30890.0 2.232% 1.282% 28
Objectivist 399000.0 1310.0 0.328% 0.054% 10
Other 3150618.0 85272.0 2.707% 3.539% 132
Pragmatist 5087007.609999999 266836.0 5.245% 11.073% 131
Progressive 8455500.440000001 368742.78 4.361% 15.302% 217
Social Democrat 8000266.54 218052.5 2.726% 9.049% 237
Socialist 2621693.66 78484.0 2.994% 3.257% 126

Number Of Effective Altruists In The Diaspora Communities
Community Count % In Community Sample Size
LessWrong 136 38.418% 354
LessWrong Meetups 109 50.463% 216
LessWrong Facebook Group 83 48.256% 172
LessWrong Slack 22 39.286% 56
SlateStarCodex 343 40.98% 837
Rationalist Tumblr 175 49.716% 352
Rationalist Facebook 89 58.94% 151
Rationalist Twitter 24 40.0% 60
Effective Altruism Hub 86 86.869% 99
Good Judgement(TM) Open 23 74.194% 31
PredictionBook 31 51.667% 60
Hacker News 91 35.968% 253
#lesswrong on freenode 19 24.675% 77
#slatestarcodex on freenode 9 24.324% 37
#chapelperilous on freenode 2 18.182% 11
/r/rational 117 42.545% 275
/r/HPMOR 110 47.414% 232
/r/SlateStarCodex 93 37.959% 245
One or more private 'rationalist' groups 91 47.15% 193

Effective Altruist Donations By Political Affiliation
Affiliation EA Income EA Charity Sample Size
Anarchist 761000.0 57500.0 18
Futarchist 559850.0 114830.0 15
Left-Libertarian 5332856.0 361975.0 112
Libertarian 2725390.0 114732.0 53
Moderate 583247.0 56495.0 22
Other 1428978.0 69950.0 49
Pragmatist 1442211.0 43780.0 43
Progressive 4004097.0 304337.78 107
Social Democrat 3423487.45 149199.0 93
Socialist 678360.0 34751.0 41

UC Berkeley launches Center for Human-Compatible Artificial Intelligence

10 ignoranceprior 29 August 2016 10:43PM

Source article:

UC Berkeley artificial intelligence (AI) expert Stuart Russell will lead a new Center for Human-Compatible Artificial Intelligence, launched this week.

Russell, a UC Berkeley professor of electrical engineering and computer sciences and the Smith-Zadeh Professor in Engineering, is co-author of Artificial Intelligence: A Modern Approach, which is considered the standard text in the field of artificial intelligence, and has been an advocate for incorporating human values into the design of AI.

The primary focus of the new center is to ensure that AI systems are beneficial to humans, he said.

The co-principal investigators for the new center include computer scientists Pieter Abbeel and Anca Dragan and cognitive scientist Tom Griffiths, all from UC Berkeley; computer scientists Bart Selman and Joseph Halpern, from Cornell University; and AI experts Michael Wellman and Satinder Singh Baveja, from the University of Michigan. Russell said the center expects to add collaborators with related expertise in economics, philosophy and other social sciences.

The center is being launched with a grant of $5.5 million from the Open Philanthropy Project, with additional grants for the center’s research from the Leverhulme Trust and the Future of Life Institute.

Russell is quick to dismiss the imaginary threat from the sentient, evil robots of science fiction. The issue, he said, is that machines as we currently design them in fields like AI, robotics, control theory and operations research take the objectives that we humans give them very literally. Told to clean the bath, a domestic robot might, like the Cat in the Hat, use mother’s white dress, not understanding that the value of a clean dress is greater than the value of a clean bath.

The center will work on ways to guarantee that the most sophisticated AI systems of the future, which may be entrusted with control of critical infrastructure and may provide essential services to billions of people, will act in a manner that is aligned with human values.

“AI systems must remain under human control, with suitable constraints on behavior, despite capabilities that may eventually exceed our own,” Russell said. “This means we need cast-iron formal proofs, not just good intentions.”

One approach Russell and others are exploring is called inverse reinforcement learning, through which a robot can learn about human values by observing human behavior. By watching people dragging themselves out of bed in the morning and going through the grinding, hissing and steaming motions of making a caffè latte, for example, the robot learns something about the value of coffee to humans at that time of day.

“Rather than have robot designers specify the values, which would probably be a disaster,” said Russell, “instead the robots will observe and learn from people. Not just by watching, but also by reading. Almost everything ever written down is about people doing things, and other people having opinions about it. All of that is useful evidence.”

Russell and his colleagues don’t expect this to be an easy task.

“People are highly varied in their values and far from perfect in putting them into practice,” he acknowledged. “These aspects cause problems for a robot trying to learn what it is that we want and to navigate the often conflicting desires of different individuals.”

Russell, who recently wrote an optimistic article titled “Will They Make Us Better People?,” summed it up this way: “In the process of figuring out what values robots should optimize, we are making explicit the idealization of ourselves as humans. As we envision AI aligned with human values, that process might cause us to think more about how we ourselves really should behave, and we might learn that we have more in common with people of other cultures than we think.”
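
As a toy illustration of the inverse reinforcement learning idea Russell describes (my own sketch, not CHAI's code or any particular algorithm of theirs): given a handful of candidate reward functions and some observed choices, infer the reward under which a softmax-rational agent would most probably have behaved that way.

import numpy as np

actions = ["make_coffee", "clean_bath", "do_nothing"]

# Candidate value systems the robot considers (reward per action); all hypothetical.
candidate_rewards = {
    "values_coffee":   np.array([1.0, 0.2, 0.0]),
    "values_cleaning": np.array([0.2, 1.0, 0.0]),
    "values_idleness": np.array([0.0, 0.0, 1.0]),
}

# Observed human behavior on weekday mornings.
observed = ["make_coffee", "make_coffee", "clean_bath", "make_coffee"]

def log_likelihood(reward, observations, beta=2.0):
    # Probability of the observations if the human picks actions
    # softmax-proportionally to their reward (Boltzmann rationality).
    logits = beta * reward
    log_probs = logits - np.log(np.sum(np.exp(logits)))
    indices = [actions.index(a) for a in observations]
    return float(np.sum(log_probs[indices]))

best = max(candidate_rewards, key=lambda name: log_likelihood(candidate_rewards[name], observed))
print("Inferred value system:", best)   # -> values_coffee

A real IRL system infers a reward function over states and actions rather than choosing among a few hand-written candidates, but the underlying principle is the same: explain observed behavior as approximately optimal for some reward, then adopt that reward as the thing to help with.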

European Soylent alternatives

10 ChristianKl 15 August 2016 08:22PM

A person at our local LW meetup (not active on LessWrong) tested various Soylent alternatives that are available in Europe and wrote a post about them:


Over the course of the last three months, I've sampled parts of the
European Soylent alternatives to determine which ones would work for me

- The prices are always for the standard option and might differ for
e.g. High Protein versions.
- The prices are always for the amount where you get the cheapest
marginal price (usually around a one month supply, i.e. 90 meals)
- Changing your diet to Soylent alternatives quickly leads to increased
flatulence for some time - I'd recommend a slow adoption.
- You can pay for all of them with Bitcoin.
- The list is sorted by overall awesomeness.

So here's my list of reviews:


Taste: 7/10
Texture: 7/10
Price: 5eu / day
Vegan option: Yes
Overall awesomeness: 8/10

This one is probably the European standard for nutritionally complete
meal replacements.

The texture is nice, the taste is somewhat sweet, the flavors aren't
very intense.
They have an ok amount of different flavors but I reduced my orders to
Mango (+some Chocolate).

They offer a morning version with caffeine and a sports version with
more calories/protein.

They also offer Twennybars (similar to a cereal bar but each offers 1/5
of your daily needs), which everyone who tasted them really liked.
They're nice for those lazy times where you just don't feel like pouring
the powder, adding water and shaking before you get your meal.
They do cost 10eu per day, though.

I also like the general style. Every interaction with them was friendly,
fun and uncomplicated.


Taste: 8/10
Texture: 7/10
Price: 8.70 / day
Vegan option: Yes
Overall awesomeness: 8/10

This seems to be the "natural" option, apparently they add all those
healthy ingredients.

The texture is nice, the taste is sweeter than most, but not very sweet.
They don't offer flavors but the "base taste" is fine, it also works
well with some cocoa powder.

It's my favorite breakfast now and I had it ~54 of the last 60 days.
Would have been first place if not for the relatively high price.


Taste: 6/10
Texture: 7/10
Price: 6.57 / day
Vegan option: Only Vegan
Overall awesomeness: 7/10

Mana is one of the very few choices that don't taste sweet but salty.
Among all the ones I've tried, it tastes the most similar to a classic meal.
It has a somewhat oily aftertaste that was a bit unpleasant in the
beginning but is fine now that I got used to it.

They ship the oil in small bottles separate from the rest, which you pour
into your shaker with the powder. This adds about 100% more complexity
to preparing a meal.

The packages feel somewhat recycled/biodegradable which I don't like so
much but which isn't actually a problem.

It still made it to the list of meals I want to consume on a regular
basis because it tastes so different from the others (and probably has a
different nutritional profile?).


Taste: 7/10
Texture: 7/10
Price: 1.33eu / meal
*I couldn't figure out whether they calculate with 3 or 5 meals per day
** Price is for an order of 666 meals. I guess 222 meals for 1.5eu /meal
is the more reasonable order
Vegan option: Only Vegan
Overall awesomeness: 7/10

Has a relatively sweet taste. Only comes in the standard vanilla-ish flavor.

They offer a Veggie hot meal which is the only one besides Mana that
doesn't taste sweet. It tastes very much like a vegetable soup but was a
bit too spicy for me. (It's also a bit more expensive)

Nano has a very future-y feel about it that I like. It comes in one meal
packages which I don't like too much but that's personal preference.


Taste: 7/10
Texture: 6/10
Price: 6.5 / day
Vegan option: No
Overall awesomeness: 7/10

Is generally similar to Joylent (especially in flavor) but seems
strictly inferior (their flavors sound more fun - but don't actually
taste better).


Taste: 6/10
Texture: 7/10
Price: 5 / day
Vegan option: No
Overall awesomeness: 6/10

Taste and flavor are also similar to Joylent but it tastes a little
worse. It comes in one meal packages which I don't fancy.


Taste: 6/10
Texture: 7/10
Price: 7.46 / day
Vegan option: Only Vegan
Overall awesomeness: 6/10

Has a silky taste/texture (I didn't even know that was a thing before I
tried it). Only has one flavor (vanilla) which is okayish.
Also offers a light and sports option.


Taste: 1/10
Texture: 6/10
Price: 6.70 / day
Vegan option: Only Vegan
Overall awesomeness: 4/10

The taste was unanimously rated as awful by every single person to whom
I gave it to try. The Vanilla flavored version was a bit less awful
than the unflavored version but still...
The worst packaging - it's in huge bags that make it hard to pour and
are generally inconvenient to handle.

Apart from that, it's ok, I guess?


Taste: ?
Texture: ?
Price: 30 / day
Vegan option: Only Vegan
Overall awesomeness: ?

Price was prohibitive for testing - they advertise it as being very
healthy and natural and stuff.


Taste: ?
Texture: ?
Price: 5.76 / day
Vegan option: No
Overall awesomeness: ?

They offer a variety for women and one for men. I didn't see any way for
me to find out which of those I was supposed to order. I had to give up
the ordering process at that point. (I guess you'd have to ask your
doctor which one is for you?)

Meal replacements are awesome, especially when you don't have much time
to make or eat a "proper" meal.
I generally don't feel full after drinking them but also stop being hungry.
I assume they're healthier than the average European diet.
The texture and flavor do get a bit dull after a while if I only use
meal replacements.

On my usual day I eat one serving of Joylent, Veetal and Mana at the
moment (and have one or two "non-replaced" meals).


A Review of Signal Data Science

10 The_Jaded_One 14 August 2016 03:32PM

I took part in the second signal data science cohort earlier this year, and since I found out about Signal through a slatestarcodex post a few months back (it was also covered here on less wrong), I thought it would be good to return the favor and write a review of the program. 

The tl;dr version:

Going to Signal was a really good decision. I had been doing teaching work and some web development consulting prior to the program to make ends meet, and now I have a job offer as a senior machine learning researcher1. The time I spent at Signal was definitely necessary for me to get this job offer, and another very attractive data science job offer that is my "second choice" job. I haven't paid anything to Signal, but I will have to pay them a fraction of my salary for the next year, capped at 10% with a maximum payment of $25k.

The longer version:

Obviously a ~12 week curriculum is not going to be a magic pill that turns a nontechnical, averagely intelligent person into a super-genius with job offers from Google and Facebook. In order to benefit from Signal, you should already be somewhat above average in terms of intelligence and intellectual curiosity. If you have never programmed and/or never studied mathematics beyond high school2 , you will probably not benefit from Signal in my opinion. Also, if you don't already understand statistics and probability to a good degree, they will not have time to teach you. What they will do is teach you how to be really good with R, make you do some practical machine learning and learn some SQL, all of which are hugely important for passing data science job interviews. As a bonus, you may be lucky enough (as I was) to explore more advanced machine learning techniques with other program participants or alumni and build some experience for yourself as a machine learning hacker. 

As stated above, you don't pay anything up front, and cheap accommodation is available. If you are in a situation similar to mine, not paying up front is a huge bonus. The salary fraction is comparatively small, too, and it only lasts for one year. I almost feel like I am underpaying them. 

This critical comment by fluttershy almost put me off, and I'm glad it didn't. The program is not exactly "self-directed" - there is a daily schedule and a clear path to work through, though they are flexible about it. Admittedly there isn't a constant feed of staff time for your every whim - ideally there would be 10-20 Jonahs, one per student; there's no way to offer that kind of service at a reasonable price. Communication between staff and students seemed to be very good, and key aspects of the program were well organised. So don't let perfect be the enemy of good: what you're getting is an excellent focused training program to learn R and some basic machine learning, and that's what you need to progress to the next stage of your career.

Our TA for the cohort, Andrew Ho, worked tirelessly to make sure our needs were met, both academically and in terms of running the house. Jonah was extremely helpful when you needed to debug something or clarify a misunderstanding. His lectures on selected topics were excellent. Robert's Saturday sessions on interview technique were good, though I felt that over time they became less valuable as some people got more out of interview practice than others. 

I am still in touch with some people I met in my cohort; even though I had to leave the country, I consider them pals and we keep in touch about how our job searches are going. People have offered to recommend me to companies as a result of Signal. As a networking push, going to Signal is certainly a good move.

Highly recommended for smart people who need a helping hand to launch a technical career in data science.



1: I haven't signed the contract yet as my new boss is on holiday, but I fully intend to follow up when that process completes (or not). Watch this space. 

2: or equivalent - if you can do mathematics such as matrix algebra, know what the normal distribution is, understand basic probability theory such as how to calculate the expected value of a dice roll, etc, you are probably fine. 

Superintelligence and physical law

10 AnthonyC 04 August 2016 06:49PM

It's been a few years since I read the quantum physics sequence, but I recently learned about the company Nutonian. Basically, it's a narrow AI system that looks at unstructured data and tries out billions of models to fit it, favoring those that use simpler math. They apply it to all sorts of fields, but that includes physics. It can't find Newton's laws from three frames of a falling apple, but it did find the Hamiltonian of a double pendulum given its motion data after a few hours of processing.
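
To give a flavor of the "try many models, prefer simpler math" idea, here is a toy model-selection sketch (my own illustration, not Nutonian's actual algorithm): fit a family of candidate models to noisy data and score each by fit error plus a complexity penalty.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
y = 0.5 * x**2 - x + 1 + rng.normal(scale=0.1, size=x.size)  # hidden quadratic "law" plus noise

best_score, best_model = float("inf"), None
for degree in range(8):                       # candidate models: polynomials of increasing complexity
    coeffs = np.polyfit(x, y, degree)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    score = mse + 0.05 * (degree + 1)         # penalize extra terms, i.e. favor simpler math
    if score < best_score:
        best_score, best_model = score, (degree, coeffs)

degree, coeffs = best_model
print("chosen degree:", degree)               # typically 2, matching the underlying law
print("coefficients:", np.round(coeffs, 2))   # approximately [0.5, -1.0, 1.0]

A real symbolic-regression system searches a much richer space of expressions (sums, products, trig functions, and so on), typically with an evolutionary search, but the selection pressure toward parsimonious models that still fit the data is the same idea.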

[Link] My Interview with Dilbert creator Scott Adams

9 James_Miller 13 September 2016 05:22AM

In the second half of the interview we discussed several topics of importance to the LW community including cryonics, unfriendly AI, and eliminating mosquitoes.


Jocko Podcast

9 moridinamael 06 September 2016 03:38PM

I've recently been extracting extraordinary value from the Jocko Podcast.

Jocko Willink is a retired Navy SEAL commander, jiu-jitsu black belt, management consultant and, in my opinion, master rationalist. His podcast typically consists of detailed analysis of some book on military history or strategy followed by a hands-on Q&A session. Last week's episode (#38) was particularly good and if you want to just dive in, I would start there.

As a sales pitch, I'll briefly describe some of his recurring talking points:

  • Extreme ownership. Take ownership of all outcomes. If your superior gave you "bad orders", you should have challenged the orders or adapted them better to the situation; if your subordinates failed to carry out a task, then it is your own instructions to them that were insufficient. If the failure is entirely your own, admit your mistake and humbly open yourself to feedback. By taking on this attitude you become a better leader and through modeling you promote greater ownership throughout your organization. I don't think I have to point out the similarities between this and "Heroic Morality" we talk about around here.
  • Mental toughness and discipline. Jocko's language around this topic is particularly refreshing, speaking as someone who has spent too much time around "self help" literature, in which I would partly include Less Wrong. His ideas are not particularly new, but it is valuable to have an example of somebody who reliably executes on his philosophy of "Decide to do it, then do it." If you find that you didn't do it, then you didn't truly decide to do it. In any case, your own choice or lack thereof is the only factor. "Discipline is freedom." If you adopt this habit as your reality, it becomes true.
  • Decentralized command. This refers specifically to his leadership philosophy. Every subordinate needs to truly understand the leader's intent in order to execute instructions in a creative and adaptable way. Individuals within a structure need to understand the high-level goals well enough to be able to act in almost all situations without consulting their superiors. This tightens the OODA loop on an organizational level.
  • Leadership as manipulation. Perhaps the greatest surprise to me was the subtlety of Jocko's thinking about leadership, probably because I brought in many erroneous assumptions about the nature of a SEAL commander. Jocko talks constantly about using self-awareness, detachment from one's ideas, control of one's own emotions, awareness of how one is perceived, and perspective-taking of one's subordinates and superiors. He comes off more as HPMOR!Quirrell than as a "drill sergeant".

The Q&A sessions, in which he answers questions asked by his fans on Twitter, tend to be very valuable. It's one thing to read the bullet points above, nod your head and say, "That sounds good." It's another to have Jocko walk through the tactical implementation of these ideas in a wide variety of daily situations, ranging from parenting difficulties to office misunderstandings.

For a taste of Jocko, maybe start with his appearance on the Tim Ferriss podcast or the Sam Harris podcast.

Non-Fiction Book Reviews

9 SquirrelInHell 11 August 2016 05:05AM

Time start 13:35:06

For another exercise in speed writing, I wanted to share a few book reviews.

These are fairly well known, however there is a chance you haven't read all of them - in which case, this might be helpful.


Good and Real - Gary Drescher ★★★★★

This is one of my favourite books ever. It goes over a lot of philosophy while showing a lot of clear thinking and meta-thinking. If Eliezer's meta-philosophy had not existed, this would be my number one replacement for it. The writing style and language are somewhat obscure, but this book is too brilliant to be spoiled by that. The biggest takeaway is the analysis of the ethics of non-causal consequences of our choices, which is something that has actually changed how I act in my life, and I have not seen any similar argument in other sources that would do the same. This book changed my intuitions so much that I now pay $100 in counterfactual mugging without a second thought.


59 Seconds - Richard Wiseman ★★★

A collection of various tips and tricks, directly based on studies. The strength of the book is that it gives easy but detailed descriptions of lots of studies, and that makes it very fun to read. Can be read just to check out the various psychology results in an entertaining format. The quality of the advice is disputable, and it is mostly the kind of advice that only applies to small things and does not change much in what you do even if you somehow manage to use it. But I still liked this book, and it managed to avoid saying anything very stupid while saying a lot of things. It counts for something.


What You Can Change and What You Can't - Martin Seligman ★★★

It is heartwarming to see that the author puts his best effort towards figuring out which psychology treatments work and which don't, as well as building more general models of how people work that can predict which treatments have a chance in the first place. Not all of the content is necessarily your best guess after updating on new results (the book is quite old). However, if you are starting out, this book will serve excellently as your prior, which you can then update after checking out the new results. And in some cases it is amazing that the author was right 20 years ago while mainstream psychology has STILL not caught up (like the whole bullshit "go back to your childhood to fix your problems" approach, which is in wide use today and not bothered at all by such things as "checking facts").


Thinking, Fast and Slow - Daniel Kahneman ★★★★★

A classic, and I want to mention it just in case. It is too valuable not to read. Period. It turns out some of the studies the author used for his claims have later been found not to replicate. However, the details of those results are not (at least for me) the selling point of this book. The biggest thing is the author's mental toolbox for self-analysis and analysis of biases, as well as the concepts he created to describe the mechanisms of intuitive judgement. Learn to think like the author, and you are 10 years ahead in your study of rationality.


Crucial Conversations - Al Switzler, Joseph Grenny, Kerry Patterson, Ron McMillan ★★★★

I almost dropped this book. When I saw the style, it reminded me so much of the crappy self-help books without actual content. But fortunately I read on a little more, and it turns out that even though the style stays the same throughout the whole book and it has little content for the amount of text you read, it is still an excellent book. How is that possible? Simple: it only tells you a few things, but the things it tells you are actually important, they work, and they are amazing when you put them into practice. On the concept and analysis side, too, there is precious little, but who cares, as long as there are some things that are "keepers". The authors spend most of the book hammering the same point over and over, which is "conversation safety". And it is still a good book: if you get this one simple point, then you have learned more than you might from reading 10 other books.


How to Fail at Almost Everything and Still Win Big - Scott Adams ★★★

I don't agree with much of the stuff in this book, but that's not the point here. The author says what he thinks, and he himself encourages you to pass it through your own filters. For around one third of the book, I thought it was obviously true; for another third, I had strong evidence that the author had made a mistake or got confused about something; and the remaining third gave me new ideas, or points of view that I could use to produce more ideas for my own use. This felt kind of like having a conversation with an intelligent person you know who has different ideas from you. It was a healthy ratio of agreement and disagreement, the kind that leads to progress for both people. Except of course in this case the author did not benefit, but I did.


Time end: 14:01:54

Total time to write this post: 26 minutes 48 seconds

Average writing speed: 31.2 words/minute, 169 characters/minute

The same data calculated for my previous speed-writing post: 30.1 words/minute, 167 characters/minute

[link] MIRI's 2015 in review

9 Kaj_Sotala 03 August 2016 12:03PM

The introduction:

As Luke had done in years past (see 2013 in review and 2014 in review), I (Malo) wanted to take some time to review our activities from last year. In the coming weeks Nate will provide a big-picture strategy update. Here, I’ll take a look back at 2015, focusing on our research progress, academic and general outreach, fundraising, and other activities.

After seeing signs in 2014 that interest in AI safety issues was on the rise, we made plans to grow our research team. Fueled by the response to Bostrom’s Superintelligence and the Future of Life Institute’s “Future of AI” conference, interest continued to grow in 2015. This suggested that we could afford to accelerate our plans, but it wasn’t clear how quickly.

In 2015 we did not release a mid-year strategic plan, as Luke did in 2014. Instead, we laid out various conditional strategies dependent on how much funding we raised during our 2015 Summer Fundraiser. The response was great; we had our most successful fundraiser to date. We hit our first two funding targets (and then some), and set out on an accelerated 2015/2016 growth plan.

As a result, 2015 was a big year for MIRI. After publishing our technical agenda at the start of the year, we made progress on many of the open problems it outlined, doubled the size of our core research team, strengthened our connections with industry groups and academics, and raised enough funds to maintain our growth trajectory. We’re very grateful to all our supporters, without whom this progress wouldn’t have been possible.

Astrobiology IV: Photosynthesis and energy

8 CellBioGuy 17 October 2016 12:30AM

Originally I sat down to write about the large-scale history of Earth, and line up the big developments that our biosphere has undergone in the last 4 billion years.  But after writing about the reason that Earth is unique in our solar system (that is, photosynthesis being an option here), I guess I needed to explore photosynthesis and other forms of metabolism on Earth in a little more detail and before I knew it I’d written more than 3000 words about it.  So, here we are, taking a deep dive into photosynthesis and energy metabolism, and trying to determine if the origin of photosynthesis is a rare event or likely anywhere you get a biosphere with light falling on it.  Warning:  gets a little technical.

In short, I think it's clear from the fact that there are multiple origins of it that phototrophy, using light for energy, is likely to show up anywhere there is light and life. I suspect, but cannot rigorously prove, that even though photosynthesis of biomass emerged only once, it was an early development in life on Earth, emerging very near the root of the bacterial tree, and simply produced a very strong first-mover advantage that crowded out secondary origins; it would probably also show up wherever there is life and light. As for oxygen-producing photosynthesis, its origin from other, more mundane forms of photosynthesis is still being studied. It required a strange chaining together of multiple modes of photosynthesis to make it work, and it only ever happened once as well. Its time of emergence, early or late, is pretty unconstrained, and I don't think there's sufficient evidence to say one way or another whether it is likely to happen anywhere there is photosynthesis. It could be subject to the same 'first mover advantage' situation that other photosynthesis may have encountered as well. But once it got going, it would naturally take over biomass production and crowd out other forms of photosynthesis due to the inherent chemical advantages it has on any wet planet (advantages that have nothing to do with making oxygen) and its effects on other forms of photosynthesis.

Oxygen in the atmosphere had some important side effects, the one most people care about being that it allows big, complicated, energy-gobbling organisms like animals – all that energy that organisms can get by burning biomass in oxygen lets the organisms that do so do a lot of interesting stuff. Looking for oxygen in the atmospheres of other terrestrial planets would be an extremely informative experiment, as the presence of this substance would suggest that a process very similar to the one that created our huge, diverse and active biosphere was underway.

Agential Risks: A Topic that Almost No One is Talking About

8 philosophytorres 15 October 2016 06:41PM

(Happy to get feedback on this! It draws from and expounds ideas in this article.)

Consider a seemingly simple question: if the means were available, who exactly would destroy the world? There is surprisingly little discussion of this question within the nascent field of existential risk studies. But it’s an absolutely crucial issue: what sort of agent would either intentionally or accidentally cause an existential catastrophe?

The first step forward is to distinguish between two senses of an existential risk. Nick Bostrom originally defined the term as: “One where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential.” It follows that there are two distinct scenarios, one terminal and the other endurable, that could realize an existential risk. We can call the former an extinction risk and the latter a stagnation risk. The importance of this distinction with respect to both advanced technologies and destructive agents has been previously underappreciated.

So, the question asked above is actually two questions in disguise. Let’s consider each in turn.

Terror: Extinction Risks

First, the categories of agents who might intentionally cause an extinction catastrophe are fewer and smaller than one might think. They include:

(1) Idiosyncratic actors. These are malicious agents who are motivated by idiosyncratic beliefs and/or desires. There are instances of deranged individuals who have simply wanted to kill as many people as possible and then die, such as some school shooters. Idiosyncratic actors are especially worrisome because this category could have a large number of members (token agents). Indeed, the psychologist Martha Stout estimates that about 4 percent of the human population suffers from sociopathy, resulting in about 296 million sociopaths. While not all sociopaths are violent, a disproportionate number of criminals and dictators have (or very likely have) had the condition.

(2) Future ecoterrorists. As the effects of climate change and biodiversity loss (resulting in the sixth mass extinction) become increasingly conspicuous, and as destructive technologies become more powerful, some terrorism scholars have speculated that ecoterrorists could become a major agential risk in the future. The fact is that the climate is changing and the biosphere is wilting, and human activity is almost entirely responsible. It follows that some radical environmentalists in the future could attempt to use technology to cause human extinction, thereby “solving” the environmental crisis. So, we have some reason to believe that this category could become populated with a growing number of token agents in the coming decades.

(3) Negative utilitarians. Those who hold this view believe that the ultimate aim of moral conduct is to minimize misery, or “disutility.” Although some negative utilitarians like David Pearce see existential risks as highly undesirable, others would welcome annihilation because it would entail the elimination of suffering. It follows that if a “strong” negative utilitarian had a button in front of her that, if pressed, would cause human extinction (say, without causing pain), she would very likely press it. Indeed, on her view, doing this would be the morally right action. Fortunately, this version of negative utilitarianism is not a position that many non-academics tend to hold, and even among academic philosophers it is not especially widespread.

(4) Extraterrestrials. Perhaps we are not alone in the universe. Even if the probability of life arising on an Earth-analog is low, the vast number of exoplanets suggests that the probability of life arising somewhere may be quite high. If an alien species were advanced enough to traverse the cosmos and reach Earth, it would very likely have the technological means to destroy humanity. As Stephen Hawking once remarked, “If aliens visit us, the outcome would be much as when Columbus landed in America, which didn’t turn out well for the Native Americans.”

(5) Superintelligence. The reason Homo sapiens is the dominant species on our planet is due almost entirely to our intelligence. It follows that if something were to exceed our intelligence, our fate would become inextricably bound up with its will. This is worrisome because recent research shows that even slight misalignments between our values and those motivating a superintelligence could have existentially catastrophic consequences. But figuring out how to upload human values into a machine poses formidable problems — not to mention the issue of figuring out what our values are in the first place.

Making matters worse, a superintelligence could process information about 1 million times faster than our brains, meaning that a minute of time for us would correspond to approximately 2 years of subjective time for the superintelligence. This would immediately give the superintelligence a profound strategic advantage over us. And if it were able to modify its own code, it could potentially bring about an exponential intelligence explosion, resulting in a mind that’s many orders of magnitude smarter than any human. Thus, we may have only one chance to get everything just right: there’s no turning back once an intelligence explosion is ignited.
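A quick check of that conversion, taking the million-fold speedup as the assumption stated above:

    \[
    1\ \text{minute} \times 10^{6} \;=\; 10^{6}\ \text{minutes} \;=\; \frac{10^{6}}{60 \times 24 \times 365}\ \text{years} \;\approx\; 1.9\ \text{years}
    \]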

A superintelligence could cause human extinction for a number of reasons. For example, we might simply be in its way. Few humans worry much if an ant genocide results from building a new house or road. Or the superintelligence could destroy humanity because we happen to be made out of something it could use for other purposes: atoms. Since a superintelligence need not resemble human intelligence in any way — thus, scholars tell us to resist the dual urges of anthropomorphizing and anthropopathizing — it could be motivated by goals that appear to us as utterly irrational, bizarre, or completely inexplicable.

Terror: Stagnation Risks

Now consider the agents who might intentionally try to bring about a scenario that would result in a stagnation catastrophe. This list subsumes most of the list above in that it includes idiosyncratic actors, future ecoterrorists, and superintelligence, but it probably excludes negative utilitarians, since stagnation (as understood above) would likely induce more suffering than the status quo today. The case of extraterrestrials is unclear, given that we can infer almost nothing about an interstellar civilization except that it would be technologically sophisticated.

For example, an idiosyncratic actor could harbor not a death wish for humanity, but a “destruction wish” for civilization. Thus, she or he could strive to destroy civilization without necessarily causing the annihilation of Homo sapiens. Similarly, a future ecoterrorist could hope for humanity to return to the hunter-gatherer lifestyle. This is precisely what motivated Ted Kaczynski: he didn’t want everyone to die, but he did want our technological civilization to crumble. And finally, a superintelligence whose values are misaligned with ours could modify Earth in such a way that our lineage persists, but our prospects for future development are permanently compromised. Other stagnation scenarios could involve the following categories:

(6) Apocalyptic terrorists. History is overflowing with groups that not only believed the world was about to end, but saw themselves as active participants in an apocalyptic narrative unfolding in real time. Many of these groups have been driven by the conviction that “the world must be destroyed to be saved,” although some have turned their activism inward and advocated mass suicide.

Interestingly, no notable historical group has combined both the genocidal and suicidal urges. This is why apocalypticists pose a greater stagnation terror risk than extinction risk: indeed, many see their group’s survival beyond Armageddon as integral to the end-times, or eschatological, beliefs they accept. There are almost certainly fewer than about 2 million active apocalyptic believers in the world today, although emerging environmental, demographic, and societal conditions could cause this number to increase significantly in the future, as I’ve outlined in detail elsewhere (see Section 5 of this paper).

(7) States. Like terrorists motivated by political rather than transcendent goals, states tend to place a high value on their continued survival. It follows that states are unlikely to intentionally cause a human extinction event. But rogue states could induce a stagnation catastrophe. For example, if North Korea were to overcome the world’s superpowers through a sudden preemptive attack and implement a one-world government, the result could be an irreversible decline in our quality of life.

So, there are numerous categories of agents that could attempt to bring about an existential catastrophe. And there appear to be fewer agent types who would specifically try to cause human extinction than to merely dismantle civilization.

Error: Extinction and Stagnation Risks

There are some reasons, though, for thinking that error (rather than terror) could constitute the most significant threat in the future. First, almost every agent capable of causing intentional harm would also be capable of causing accidental harm, whether this results in extinction or stagnation. For example, an apocalyptic cult that wants to bring about Armageddon by releasing a deadly biological agent in a major city could, while preparing for this terrorist act, inadvertently contaminate its environment, leading to a global pandemic.

The same goes for idiosyncratic agents, ecoterrorists, negative utilitarians, states, and perhaps even extraterrestrials. (Indeed, the large disease burden of Europeans was a primary reason Native American populations were decimated. By analogy, perhaps an extraterrestrial destroys humanity by introducing a new type of pathogen that quickly wipes us out.) The case of superintelligence is unclear, since the relationship between intelligence and error-proneness has not been adequately studied.

Second, if powerful future technologies become widely accessible, then virtually everyone could become a potential cause of existential catastrophe, even those with absolutely no inclination toward violence. To illustrate the point, imagine a perfectly peaceful world in which not a single individual has malicious intentions. Further imagine that everyone has access to a doomsday button on her or his phone; if pushed, this button would cause an existential catastrophe. Even under ideal societal conditions (everyone is perfectly “moral”), how long could we expect to survive before someone’s finger slips and the doomsday button gets pressed?

Statistically speaking, a world populated by only 1 billion people would almost certainly self-destruct within a 10-year period if the probability of any individual accidentally pressing a doomsday button were a mere 0.00001 percent per decade. Or, alternatively: if only 500 people in the world were to gain access to a doomsday button, and if each of these individuals had a 1 percent chance of accidentally pushing the button per decade, humanity would have a meager 0.6 percent chance of surviving beyond 10 years. Thus, even if the likelihood of mistakes is infinitesimally small, planetary doom will be virtually guaranteed for sufficiently large populations.
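For readers who want to check those numbers, here is a minimal back-of-the-envelope calculation in Python; the per-person slip probabilities are simply the assumptions stated in the paragraph above.

    # Chance that nobody presses their button during one decade: (1 - p)^N
    big_world = (1 - 1e-7) ** 1_000_000_000   # 1 billion people, 0.00001% slip chance each per decade
    small_world = (1 - 0.01) ** 500           # 500 people, 1% slip chance each per decade
    print(f"Billion-person world survives the decade with probability {big_world:.1e}")  # ~4e-44
    print(f"500-button world survives the decade with probability {small_world:.1%}")    # ~0.7%

The second number comes out near 0.66 percent, in line with the roughly 0.6 percent figure quoted above.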

The Two Worlds Thought Experiment

The good news is that a focus on agential risks, as I’ve called them, and not just the technological tools that agents might use to cause a catastrophe, suggests additional ways to mitigate existential risk. Consider the following thought-experiment: a possible world A contains thousands of advanced weapons that, if in the wrong hands, could cause the population of A to go extinct. In contrast, a possible world B contains only a single advanced “weapon of total destruction” (WTD). Which world is more dangerous? The answer is obviously world A.

But it would be foolishly premature to end the analysis here. Imagine further that A is populated by compassionate, peace-loving individuals, whereas B is overrun by war-mongering psychopaths. Now which world appears more likely to experience an existential catastrophe? The correct answer is, I would argue, world B.

In other words: agents matter as much as, or perhaps even more than, WTDs. One simply can’t evaluate the degree of risk in a situation without taking into account the various agents who could become coupled to potentially destructive artifacts. And this leads to the crucial point: as soon as agents enter the picture, we have another variable that could be manipulated through targeted interventions to reduce the overall probability of an existential catastrophe.

The options here are numerous and growing. One possibility would involve using “moral bioenhancement” techniques to reduce the threat of terror, given that acts of terror are immoral. But a morally enhanced individual might not be less likely to make a mistake. Thus, we could attempt to use cognitive enhancements to lower the probability of catastrophic errors, on the (tentative) assumption that greater intelligence correlates with fewer blunders.

Furthermore, implementing stricter regulations on CO2 emissions could decrease the probability of extreme ecoterrorism and/or apocalyptic terrorism, since environmental degradation is a “trigger” for both.

Another possibility, most relevant to idiosyncratic agents, is to reduce the prevalence of bullying (including cyberbullying). This is motivated by studies showing that many school shooters have been bullied, and that without this stimulus such individuals would have been less likely to carry out violent rampages. Advanced mind-reading or surveillance technologies could also enable law enforcement to identify perpetrators before mass casualty crimes are committed.

As for superintelligence, efforts to solve the “control problem” and create a friendly AI are of primary concern among many researchers today. If successful, a friendly AI could itself constitute a powerful mitigation strategy for virtually all the categories listed above.

(Note: these strategies should be explicitly distinguished from proposals that target the relevant tools rather than agents. For example, Bostrom’s idea of “differential technological development” aims to neutralize the bad uses of technology by strategically ordering the development of different kinds of technology. Similarly, the idea of police “blue goo” to counter “grey goo” is a technology-based strategy. Space colonization is also a tool intervention because it would effectively reduce the power (or capacity) of technologies to affect the entire human or posthuman population.)

Agent-Tool Couplings

Devising novel interventions and understanding how to maximize the efficacy of known strategies requires a careful look at the unique properties of the agents mentioned above. Without an understanding of such properties, this important task will be otiose. We should also prioritize different agential risks based on the likely membership (token agents) of each category. For example, the number of idiosyncratic agents might exceed the number of ecoterrorists in the future, since ecoterrorism is focused on a single issue, whereas idiosyncratic agents could be motivated by a wide range of potential grievances.[1] We should also take seriously the formidable threat posed by error, which could be nontrivially greater than that posed by terror, as the back-of-the-envelope calculations above show.

Such considerations, in combination with technology-based risk mitigation strategies, could lead to a comprehensive, systematic framework for strategically intervening on both sides of the agent-tool coupling. But this will require the field of existential risk studies to become less technocentric than it currently is.

[1] Although, on the other hand, the stimulus of environmental degradation would be experienced by virtually everyone in society, whereas the stimuli that motivate idiosyncratic agents might be situationally unique. It’s precisely issues like these that deserve further scholarly research.

Map and Territory: a new rationalist group blog

8 gworley 15 October 2016 05:55PM

If you want to engage with the rationalist community, LessWrong is mostly no longer the place to do it. Discussions aside, most of the activity has moved into the diaspora. There are a few big voices like Robin and Scott, but most of the online discussion happens on individual blogs, Tumblr, semi-private Facebook walls, and Reddit. And while these serve us well enough, I find that they leave me wanting for something like what LessWrong was: a vibrant group blog exploring our perspectives on cognition and building insights towards a deeper understanding of the world.

Maybe I'm yearning for a golden age of LessWrong that never was, but the fact remains that there is a gap in the rationalist community that LessWrong once filled. A space for multiple voices to come together in a dialectic that weaves together our individual threads of thought into a broader narrative. A home for discourse we are proud to call our own.

So with a lot of help from fellow rationalist bloggers, we've put together Map and Territory, a new group blog to bring our voices together. Each week you'll find new writing from the likes of Ben Hoffman, Mike Plotz, Malcolm Ocean, Duncan Sabien, Anders Huitfeldt, and myself working to build a more complete view of reality within the context of rationality.

And we're only just getting started, so if you're a rationalist blogger please consider joining us. We're doing this on Medium, so if you write something other folks in the rationalist community would like to read, we'd love to consider sharing it through Map and Territory (cross-posting encouraged). Reach out to me on Facebook or email and we'll get the process rolling.

[Recommendation] Steven Universe & cryonics

8 tadrinth 11 October 2016 04:21PM

I've been watching Steven Universe (a children's cartoon on Cartoon Network by Rebecca Sugar) with my fiancee, and it wasn't until I got to Season 3 that I realized there's been a cryonics metaphor running in the background since the very first episode. If you want to introduce your kids to the idea of cryonics, this series seems like a spectacularly good way to do it.

If you don't want any spoilers, just go watch it, then come back.

Otherwise, here's the metaphor I'm seeing, and why it's great:

  • In the very first episode, we find out that the main characters are a group called the Crystal Gems, who fight 'gem monsters'. When they defeat a monster, a gem is left behind, which they lock in a bubble-forcefield and store in their headquarters.

  • One of the Crystal Gems is injured in a training accident, and we find out that their bodies are just projections; each Crystal Gem has a gem located somewhere on their body, which contains their minds. So long as their gem isn't damaged, they can project a new body after some time to recover. So we already have the insight that minds and bodies are separate.

  • This is driven home by a second episode where one of the Crystal Gems has their crystal cracked; this is actually dangerous to their mind, not just body, and is treated as a dire emergency instead of merely an inconvenience.

  • Then we eventually find out that the gem monsters are actually corrupted members of the same species as the Crystal Gems. They are 'bubbled' and stored in the temple in hopes of eventually restoring them to sanity and their previous forms.

  • An attempt is made to cure one of the monsters, which doesn't fully succeed, but at least restores them to sanity. This allows them to remain unbubbled and to be reunited with their old comrades (who are also corrupted). This was the episode where I finally made the connection to cryonics.

  • The Crystal Gems are also revealed to be over 5000 years old, and effectively immortal. They don't make a big deal out of this; for them, this is totally normal.

  • This also implies that they've made no progress in curing the gem monsters in 5000 years, but that doesn't stop them from preserving them anyway.

  • Finally, a secret weapon is revealed which is capable of directly shattering gems (thus killing the target permanently), but the use of it is rejected as unethical.

So, all in all, you have a series where when someone is hurt or sick in a way that you can't help, you preserve their mind in a safe way until you can figure out a way to help them. Even your worst enemy deserves no less.


Also, Steven Universe has an entire episode devoted to mindfulness meditation.  

Superintelligence via whole brain emulation

8 AlexMennen 17 August 2016 04:11AM

Most planning around AI risk seems to start from the premise that superintelligence will come from de novo AGI before whole brain emulation becomes possible. I haven't seen any analysis that assumes both uploads-first and the AI FOOM thesis (Edit: apparently I fail at literature searching), a deficiency that I'll try to get a start on correcting in this post.

It is likely possible to use evolutionary algorithms to efficiently modify uploaded brains. If so, uploads would likely be able to set off an intelligence explosion by running evolutionary algorithms on themselves, selecting for something like higher general intelligence.
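For concreteness, the loop being described is the standard mutate-evaluate-select cycle. The sketch below is purely schematic: emulated brains are obviously not lists of numbers, and the fitness function here is only a stand-in for whatever intelligence proxy the uploads would actually measure.

    # Schematic mutate-evaluate-select loop (illustrative only; nothing upload-specific here).
    import random

    def mutate(candidate, sigma=0.1):
        return [x + random.gauss(0, sigma) for x in candidate]

    def fitness(candidate):
        # placeholder: stands in for some measured proxy of general intelligence
        return -sum((x - 1.0) ** 2 for x in candidate)

    population = [[random.random() for _ in range(5)] for _ in range(20)]
    for generation in range(100):
        offspring = [mutate(random.choice(population)) for _ in range(40)]
        population = sorted(population + offspring, key=fitness, reverse=True)[:20]

    print("Best fitness after selection:", round(fitness(population[0]), 4))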

Since brains are poorly understood, it would likely be very difficult to select for higher intelligence without causing significant value drift. Thus, setting off an intelligence explosion in that way would probably produce unfriendly AI if done carelessly. On the other hand, at some point, the modified upload would reach a point where it is capable of figuring out how to improve itself without causing a significant amount of further value drift, and it may be possible to reach that point before too much value drift had already taken place. The expected amount of value drift can be decreased by having long generations between iterations of the evolutionary algorithm, to give the improved brains more time to figure out how to modify the evolutionary algorithm to minimize further value drift.

Another possibility is that such an evolutionary algorithm could be used to create brains that are smarter than humans but not by very much, and hopefully with values not too divergent from ours, who would then stop using the evolutionary algorithm and start using their intellects to research de novo Friendly AI, if that ends up looking easier than continuing to run the evolutionary algorithm without too much further value drift.

The strategies of using slow iterations of the evolutionary algorithm, or stopping it after not too long, require coordination among everyone capable of making such modifications to uploads. Thus, it seems safer for whole brain emulation technology to be either heavily regulated or owned by a monopoly, rather than being widely available and unregulated. This closely parallels the AI openness debate, and I'd expect people more concerned with bad actors relative to accidents to disagree.

With de novo artificial superintelligence, the overwhelmingly most likely outcomes are the optimal achievable outcome (if we manage to align its goals with ours) and extinction (if we don't). But uploads start out with human values, and when creating a superintelligence by modifying uploads, the goal would be to not corrupt them too much in the process. Since its values could get partially corrupted, an intelligence explosion that starts with an upload seems much more likely to result in outcomes that are both significantly worse than optimal and significantly better than extinction. Since human brains also already have a capacity for malice, this process also seems slightly more likely to result in outcomes worse than extinction.

The early ways to upload brains will probably be destructive, and may be very risky. Thus the first uploads may be selected for high risk-tolerance. Running an evolutionary algorithm on an uploaded brain would probably involve creating a large number of psychologically broken copies, since the average change to a brain will be negative. Thus the uploads that run evolutionary algorithms on themselves will be selected for not being horrified by this. Both of these selection effects seem like they would select against people who would take caution and goal stability seriously (uploads that run evolutionary algorithms on themselves would also be selected for being okay with creating and deleting spur copies, but this doesn't obviously correlate in either direction with caution). This could be partially mitigated by a monopoly on brain emulation technology. A possible (but probably smaller) source of positive selection is that currently, people who are enthusiastic about uploading their brains correlate strongly with people who are concerned about AI safety, and this correlation may continue once whole brain emulation technology is actually available.

Assuming that hardware speed is not close to being a limiting factor for whole brain emulation, emulations will be able to run at much faster than human speed. This should make emulations better able to monitor the behavior of AIs. Unless we develop ways of evaluating the capabilities of human brains that are much faster than giving them time to attempt difficult tasks, running evolutionary algorithms on brain emulations could only be done very slowly in subjective time (even though it may be quite fast in objective time), which would give emulations a significant advantage in monitoring such a process.

Although there are effects going in both directions, it seems like the uploads-first scenario is probably safer than de novo AI. If this is the case, then it might make sense to accelerate technologies that are needed for whole brain emulation if there are tractable ways of doing so. On the other hand, it is possible that technologies that are useful for whole brain emulation would also be useful for neuromorphic AI, which is probably very unsafe, since it is not amenable to formal verification or being given explicit goals (and unlike emulations, they don't start off already having human goals). Thus, it is probably important to be careful about not accelerating non-WBE neuromorphic AI while attempting to accelerate whole brain emulation. For instance, it seems plausible to me that getting better models of neurons would be useful for creating neuromorphic AIs while better brain scanning would not, and both technologies are necessary for brain uploading, so if that is true, it may make sense to work on improving brain scanning but not on improving neural models.

"Is Science Broken?" is underspecified

8 NancyLebovitz 12 August 2016 11:59AM

This is an interesting article-- it's got an overview of what's currently seen as the problems with replicability and fraud, and some material I haven't seen before about handing the same question to a bunch of scientists, and looking at how they come up with their divergent answers.

However, while I think it's fair to say that science is really hard, the article gets into claiming that scientists aren't especially awful people (probably true), but doesn't address the hard question of "Given that there's a lot of inaccurate science, how much should we trust specific scientific claims?"

[Link] Suffering-focused AI safety: Why “fail-safe” measures might be particularly promising

8 wallowinmaya 21 July 2016 08:22PM

The Foundational Research Institute just published a new paper: "Suffering-focused AI safety: Why “fail-safe” measures might be our top intervention". 

It is important to consider that [AI outcomes] can go wrong to very different degrees. For value systems that place primary importance on the prevention of suffering, this aspect is crucial: the best way to avoid bad-case scenarios specifically may not be to try and get everything right. Instead, it makes sense to focus on the worst outcomes (in terms of the suffering they would contain) and on tractable methods to avert them. As others are trying to shoot for a best-case outcome (and hopefully they will succeed!), it is important that some people also take care of addressing the biggest risks. This perspective to AI safety is especially promising both because it is currently neglected and because it is easier to avoid a subset of outcomes rather than to shoot for one highly specific outcome. Finally, it is something that people with many different value systems could get behind.

[Link] Putanumonit - Discarding empathy to save the world

7 Jacobian 06 October 2016 07:03AM

CrowdAnki comprehensive JSON representation of Anki Decks to facilitate collaboration

7 harcisis 18 September 2016 10:59AM

Hi everyone :). I like Anki, find it quite useful and use it daily. There is one thing that constantly annoyed me about it, though - the state of shared decks and of infrastructure around them.

There are a lot of topics that are of common interest to a large number of people, and there are usually some shared decks available for these topics. The problem is that they are usually decks created by individuals for their own purposes and uploaded to AnkiWeb. So they are often incomplete or of mediocre quality, and they are rarely supported or updated.

And there is no way to collaborate on the creation or improvement of such decks, as there is no infrastructure for it and the format of the decks won't allow you to use common collaboration infrastructure (e.g. GitHub). So I've recently been working on a plugin for Anki that allows full-feature import/export to/from JSON. What I mean by full-feature is that it exports not just the cards converted to JSON, but notes, decks, models, media etc. So you can do an export, modify the result or merge changes from someone else, and on import those changes will be reflected in your existing cards/decks, with no information/metadata lost.

The point is to provide a format that enables collaboration using the common collaboration infrastructure mentioned above. Using it, you can easily work with multiple people to create a deck, collaborating, for example, via GitHub, and then the deck can be updated and improved by contributions from other people.
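As a rough illustration of the kind of workflow this enables, the snippet below edits an exported deck file outside Anki and prepares the change for a pull request. The file name and the field layout ("notes", "fields") are hypothetical placeholders, not CrowdAnki's actual schema; check the plugin's documentation for the real structure.

    # Hypothetical workflow sketch; the JSON structure shown is assumed, not CrowdAnki's real schema.
    import json

    with open("deck.json", encoding="utf-8") as f:
        deck = json.load(f)

    # e.g. fix a typo in the first field of the first note (structure assumed for illustration)
    deck["notes"][0]["fields"][0] = deck["notes"][0]["fields"][0].replace("teh", "the")

    with open("deck.json", "w", encoding="utf-8") as f:
        json.dump(deck, f, ensure_ascii=False, indent=2)

    # then: git add deck.json, git commit -m "Fix typo", and open a pull request on GitHub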

I'm looking for early adopters and for feedback :).

The AnkiWeb page for the plugin (that's where you can get the plugin):


Some of my decks are on GitHub (by the way, using the plugin you can get decks directly from GitHub):

Git deck:

Regular expressions deck:

Deck based on article Twenty rules of formulating knowledge by Piotr Wozniak:

You're welcome to use these decks and contribute back improvements.

The map of ideas how the Universe appeared from nothing

7 turchin 02 September 2016 04:49PM

There is a question which is especially disturbing during sleepless August nights, and which could cut your train of thought with existential worry at any unpredictable moment.

The question is, “Why does anything exist at all?” It seems more logical that nothing will ever exist.

A more specific form of the question is “How has our universe appeared from nothing?” The latter question has some hidden assumptions (about time, the universe, nothing and causality), but it is also more concrete.

Let’s try to put these thoughts into some form of “logical equation”:


1. “Nothingness + deterministic causality = non-existence”

2. But “I = exist”.


So something is wrong in this set of conjectures. If the first conjecture is false, then either nothingness is able to create existence, or causality is able to create it, or existence is not existence. 

There is also a chance that our binary logic is wrong.

Listing these possibilities we can create a map of solutions of the “nothingness problem”.

There are two (main) ways in which we could try to answer this question: we could go UP from a logical-philosophical level, or we could go DOWN using our best physical theories to the moment of the universe’s appearance and the nature of causality. 

Our theories of general relativity, QM and inflation are good for describing the (almost) beginning of the universe. As Krauss showed, the only thing we need is a random generator of simple physical laws in the beginning. But the origin of this thing is still not clear.

There is a gap between these two levels of explanation, and a really good theory should be able to fill it, that is, to show the way from the first existing thing to the smallest working set of physical laws (and Wolfram’s idea about cellular automata is one such possible bridge).

But we don’t need the bridge yet. We need an explanation of how anything exists at all.


How are we going to solve the problem? Where can we get information?


Possible sources of evidence:

1. Correlation between physical and philosophical theories. There is an interesting way to do this, using the fact that the nature of nothingness, causality and existence is somehow reflected in the character of physical laws. That is, we could use the type of physical laws we observe as evidence about the nature of causality.

While neither physical nor philosophical ways of studying the origin of the universe are sufficient on their own, together they could provide enough information. Such evidence comes from QM, which supports the idea of fluctuations, basically the ability of nature to create something out of nothing. GR also presents the idea of a cosmological singularity.

The evidence also comes from the mathematical simplicity of physical laws.


2. Building the bridge. If we show all the steps from nothingness to the basic set of physical laws for at least one plausible path, it will be strong evidence of the correctness of our understanding.

3. Zero logical contradictions. The best answer is the one that is most logical.

4. Using the Copernican mediocrity principle, I am in a typical universe and situation. So what could I conclude about the distribution of various universes? And from this distribution, what should I learn about the way the universe manifested? For example, a mathematical multiverse favors more complex universes; this contradicts the simplicity of observed physical laws and also of my experiences.

5. Introspection. Cogito ergo sum is the simplest introspection and act of self-awareness. But Husserlian phenomenology may also be used.


Most probable explanations


Most current scientists (who dare to think about it) belong to one of two schools of thought:

1. The universe appeared from nothingness, which is not emptiness, but is somehow able to create. The main figure here is Krauss. The problem here is that nothingness is presented as some kind of magic substance.

2. The mathematical universe hypothesis (MUH). The main author here is Tegmark. The theory seems logical and economical from the perspective of Occam’s razor, but it is not supported by evidence and also implies the existence of some strange things. The main problem is that our universe seems to have developed from one simple point, based on our best physical theories. But in the mathematical universe, complex things are just as probable as simple things, so a typical observer could be extremely complex in an extremely complex world. There are also some problems with Gödel’s theorem. It also ignores observation and qualia.

So the most promising way to create a final theory is to get rid of all mystical answers and words, like “existence” and “nothingness”, and update MUH in such a way that it will naturally favor simple laws and simple observers (with subjective experiences based on qualia).

One such patch was suggested by Tegmark in response to criticism of the MUH: the computable universe hypothesis (CUH), which restricts mathematical objects to computable functions only. It is similar to S. Wolfram’s cellular automata theory.

Another approach is the “logical universe”, where logic works instead of causality. It is almost the same as the mathematical universe, with one difference: in the math world everything exists simultaneously, like all possible numbers, but in the logical world each number N is a consequence of N-1. As a result, a complex thing exists only if a (finite?) path to it exists through simpler things.

And this is exactly what we see in the observable universe. It also means that extremely complex AIs exist, but in the future (or in a multi-level simulation). It also solves the mediocrity problem: I am a typical observer from the class of observers who are still thinking about the origins of the universe. It also prevents mathematical Boltzmann brains, as any of them must have a possible pre-history.

Logic still exists even in nothingness (otherwise elephants could appear from nothingness). So a logical universe also incorporates theories in which the universe appeared from nothing.

(We could also update the math world by adding qualia into it as axioms, which would be a “class of different but simple objects”. But I will not go deeper here, as the idea needs more thinking and many pages.)

So the logical universe now seems to me a good candidate theory for further patching and integration.


Usefulness of the question

The answer will be useful, as it will help us to find the real nature of reality, including the role of consciousness in it and the fundamental theory of everything, helping us to survive the end of the universe, solve the identity problem, and solve “quantum immortality”. 

It will also help to prevent the halting of a future AI if it has to answer the question of whether it really exists or not. Or we could create a philosophical landmine to stop it, like the following one:

“If you really exist print 1, but if you are only possible AI, print 0”.


The structure of the map

The map has 10 main blocks which correspond to the main ways of reasoning about how the universe appeared. Each has several subtypes.

The map has three colors, which show the plausibility of each theory. Red stands for implausible or disproved theories, green for the most consistent and promising explanations, and yellow for everything in between. This classification is subjective and presents my current view.

I tried to disprove each suggested idea, to add falsifiability, in the third column of the map. I hope this results in a truly Bayesian approach, where we have a field of evidence and a field of all possible hypotheses.

This map is paired with the “How to survive the end of the Universe” map.

The pdf is here: 



Time used: 27 years of background thinking, 15 days of reading, editing and drawing.


Best reading:


Parfit – discusses different possibilities, no concrete answer
Good text from a famous blogger

“Because "nothing" is inherently unstable”

Here are some interesting answers

Krauss “A universe from nothing”

Tegmark’s main article, 2007, all MUH and CUH ideas discussed, extensive literature, critics responded

Juergen Schmidhuber. Algorithmic Theories of Everything
discusses the measure between various theories of everything; the article is complex, but interesting

ToE must explain how the universe appeared 
A discussion about the logical contradictions of any final theory
“The Price of an Ultimate Theory” Nicholas Rescher 
Philosophia Naturalis 37 (1):1-20 (2000)

Explanation about the mass of the universe and negative gravitational energy


[Link] There are 125 sheep and 5 dogs in a flock. How old is the shepherd? / Math Education

6 James_Miller 17 October 2016 12:12AM

[Link] Reducing Risks of Astronomical Suffering (S-Risks): A Neglected Global Priority

6 ignoranceprior 14 October 2016 07:58PM

The map of organizations, sites and people involved in x-risks prevention

6 turchin 07 October 2016 12:04PM

Three known attempts to map the field of x-risk prevention in science exist:

1. The first is the list from the Global Catastrophic Risks Institute in 2012-2013; many of the links there no longer work:

2. The second was done by S. Armstrong in 2014

3. And the most beautiful and useful map was created by Andrew Critch. But its ecosystem ignores organizations which have a different view of the nature of global risks (that is, they share the value of x-risk prevention, but have another world view).

In my map I have tried to add all currently active organizations which share the value of global risks prevention.

It also regards some active independent people as organizations, if they have an important blog or field of research, but not all people are mentioned in the map. If you think that you (or someone) should be in it, please write to me at

I used only open sources and public statements to learn about people and organizations, so I can’t provide information on the underlying net of relations.

I tried to give each organization a short description based on its public statements, and also my opinion about its activity. 

In general, it seems that all the small organizations are focused on collaboration with the larger ones, that is, MIRI and FHI, and the small organizations tend to ignore each other; this is easily explainable from social signaling theory. Another explanation is that larger organizations have a greater ability to make contacts.

It also appears that there are several organizations with similar goal statements. 

It looks like the most cooperation exists in the field of AI safety, but most of the structure of this cooperation is not visible to the external viewer, in contrast to Wikipedia, where contributions of all individuals are visible. 

It seems that the community in general lacks three things: a united internet forum for public discussion, an x-risks wikipedia and an x-risks related scientific journal.

Ideally, a forum should be used to brainstorm ideas, a scientific journal to publish the best ideas, peer review them and present them to the outer scientific community, and a wiki to collect results.

Currently it seems more like each organization is interested in creating its own research and hoping that someone will read it. Each small organization seems to want to be the only one to present solutions to global problems and gain full attention from the UN and governments. This raises the problem of noise and rivalry, and also the problem of possibly incompatible solutions, especially in AI safety.

The pdf is here:

The University of Cambridge Centre for the Study of Existential Risk (CSER) is hiring!

6 crmflynn 06 October 2016 04:53PM

The University of Cambridge Centre for the Study of Existential Risk (CSER) is recruiting for an Academic Project Manager. This is an opportunity to play a shaping role as CSER builds on its first year's momentum towards becoming a permanent world-class research centre. We seek an ambitious candidate with initiative and a broad intellectual range for a postdoctoral role combining academic and project management responsibilities.

The Academic Project Manager will work with CSER's Executive Director and research team to co-ordinate and develop CSER's projects and overall profile, and to develop new research directions. The post-holder will also build and maintain collaborations with academic centres, industry leaders and policy makers in the UK and worldwide, and will act as an ambassador for the Centre’s research externally. Research topics will include AI safety, bio risk, extreme environmental risk, future technological advances, and cross-cutting work on governance, philosophy and foresight. Candidates will have a PhD in a relevant subject, or have equivalent experience in a relevant setting (e.g. policy, industry, think tank, NGO).

Application deadline: November 11th.

[Link] 80% of data in Chinese clinical trials have been fabricated

6 DanArmak 02 October 2016 07:38AM

Fermi paradox of human past, and corresponding x-risks

6 turchin 01 October 2016 05:01PM

Based on known archaeological data, we are the first technological and symbol-using civilisation on Earth (but not the first tool-using species). 
This leads to an analogy that fits Fermi’s paradox: Why are we the first civilisation on Earth? For example, flight was invented by evolution independently several times. 
We could imagine that on our planet, many civilisations appeared and also became extinct, and based on the mediocrity principle, we should be somewhere in the middle. For example, if 10 civilisations appeared, we have only a 10 per cent chance of being the first one.

The fact that we are the first such civilisation has strong predictive power about our expected future: it lowers the probability that there will be any other civilisations on Earth, including non-human ones or even a restarting of human civilisation from scratch. This is because, if there were going to be many civilisations, we should not expect to find ourselves the first one. (This is a form of the Doomsday argument; the same logic is used in Bostrom's article “Adam and Eve”.)

If we are the only civilisation to exist in the history of the Earth, then we will probably become extinct not in a mild way, but rather in a way which will prevent any other civilisation from appearing. There is a higher probability of future (man-made) catastrophes which will not only end human civilisation, but also prevent the existence of any other civilisations on Earth.

Such catastrophes would kill most multicellular life. A nuclear war or pandemic is not that type of catastrophe. The catastrophe must be really huge: irreversible global warming, grey goo or a black hole in a collider.

Now, I will list possible explanations of this Fermi paradox of the human past and the corresponding x-risk implications:


1. We are the first civilisation on Earth, because we will prevent the existence of any future civilisations.

If our existence prevents other civilisations from appearing in the future, how could we do it? We will either become extinct in a very catastrophic way, killing all earthly life, or become a super-civilisation which will prevent other species from becoming sapient. So, if we are really the first, then it means that "mild extinctions" are not typical for human-style civilisations. Thus, pandemics, nuclear wars, devolution and everything reversible are ruled out as the main possible methods of human extinction.

If we become a super-civilisation, we will not be interested in preserving the biosphere, since it could give rise to new sapient species. Alternatively, we might care about the biosphere so strongly that we hide ourselves very well from any newly appearing sapient species, like a cosmic zoo. By the same logic, past civilisations on Earth may have existed but hidden all traces of their existence from us, so that we could develop independently. So, the fact that we are the first raises the probability of a very large-scale catastrophe in the future, such as UFAI or dangerous physical experiments, and reduces the chances of mild x-risks such as pandemics or nuclear war. Another explanation is that the first civilisation exhausts all the resources needed for a technological restart, such as oil and ores; but over several million years most such resources would be replenished or replaced by tectonic movement.


2. We are not the first civilisation.

2.1. We have not found any traces of a previous technological civilisation, and what we do know places very strong limits on their existence. For example, every civilisation leaves genetic marks, because it moves animals from one continent to another, just as humans brought dingoes to Australia. It would also exhaust several important ores, create artefacts, and create new isotopes. We can be sure that we are the first technological civilisation on Earth in the last 10 million years.

But can we be sure for the past 100 million years? Maybe such a civilisation existed a very long time ago, perhaps 60 million years ago, and its end killed the dinosaurs. Carl Sagan argued that this could not have happened, because we would find traces, chiefly exhausted oil reserves. The main counter-argument is that cephalisation, the evolutionary growth of brains, was not advanced enough 60 million years ago to support general intelligence: dinosaurian brains were very small, though bird brains are more mass-efficient than mammalian ones. These arguments are presented in detail in Brian Trent's excellent article "Was there ever a dinosaurian civilisation?"

The main x-risks here are that we might find dangerous artefacts from a previous civilisation, such as weapons, nanobots, viruses, or AIs. Moreover, if previous civilisations went extinct, it raises the chance that extinction is typical for civilisations. It also means there was some reason the extinction occurred, that this killing force may still be active, and that we could excavate it. If they existed recently, they were probably hominids, and if they were killed by a virus, it might also affect humans.

2.2. We killed them. The Maya created writing independently, but the Spaniards destroyed their civilisation. The same is true of the Neanderthals and Homo floresiensis.

2.3. Myths about gods may be traces of such a previous civilisation. Highly improbable.

2.4. They are still here, but try not to intervene in human history. This is similar to the zoo solution to the Fermi paradox.

2.5. They were a non-tech civilisation, and that is why we can’t find their remnants.

2.6. They may still be here, like dolphins and ants, but their intelligence is non-human and they do not create technology.

2.7. Some groups of humans created advanced technology long ago but prefer to hide it. Highly improbable, as most technology requires large-scale manufacturing and markets.

2.8. A previous humanoid civilisation was killed by a virus or prion, and our archaeological research could bring it back. One hypothesis for Neanderthal extinction is prion infection caused by cannibalism. In any case, several hominid species have gone extinct in the last few million years.


3. Civilisations are rare

Millions of species have existed on Earth, but only one was able to create technology, so it is a rare event. Consequence: cyclic civilisations on Earth are improbable, so the chance that we will be resurrected by another civilisation on Earth is small.

The chances that we will be able to reconstruct civilisation after a large-scale catastrophe are also small (since such recoverable catastrophes are atypical for civilisations, which instead proceed quickly either to total annihilation or to singularity).

It also means that technological intelligence is a difficult step in the evolutionary process, so it could be one of the solutions to the main Fermi paradox.

The safety of the remains of previous civilisations (if any exist) depends on two things: how long ago they existed, and what level of intelligence they reached. The greater the time distance, the safer the remains, since most dangerous technology will have been destroyed by time or will no longer be dangerous to humans (like species-specific viruses).

The risks also depend on the level of intelligence they reached: the higher the intelligence, the riskier. If anything like their remnants is ever found, strong caution is recommended.

For example, the most dangerous scenario would be one similar to the beginning of Vernor Vinge's book A Fire Upon the Deep: we find remnants of a very old but very sophisticated civilisation, which include an unfriendly AI or its description, or hostile nanobots.

The most likely place for such artefacts to be preserved is on the Moon, in cavities near the poles: the most stable and radiation-shielded environment near Earth.

Based on the (lack of) evidence, I estimate the probability of a past technological civilisation at less than 1 per cent. That is enough to conclude that they most likely never existed, but not enough to completely ignore the risk from their artefacts, which in any case is below 0.1 per cent.

Meta: the main idea for this post came to me in a dream, several years ago.

[Link] Software for moral enhancement (

6 Kaj_Sotala 30 September 2016 12:12PM

[Link] Sam Harris - TED Talk on AI

6 Brillyant 29 September 2016 04:44PM

Heroin model: AI "manipulates" "unmanipulatable" reward

6 Stuart_Armstrong 22 September 2016 10:27AM

A putative new idea for AI control; index here.

A conversation with Jessica has revealed that people weren't understanding my points about AI manipulating the learning process. So here's a formal model of a CIRL-style AI, with a prior over human preferences that treats them as an unchangeable historical fact, yet which will manipulate human preferences in practice.

Heroin or no heroin

The world

In this model, the AI has the option of either forcing heroin on a human, or not doing so; these are its only actions. Call these actions F or ~F. The human's subsequent actions are chosen from among five: {strongly seek out heroin, seek out heroin, be indifferent, avoid heroin, strongly avoid heroin}. We can refer to these as a++, a+, a0, a-, and a--. These actions achieve negligible utility, but reveal the human preferences.

The facts of the world are: if the AI does force heroin, the human will desperately seek out more heroin; if it doesn't, the human will act moderately to avoid it. Thus F→a++ and ~F→a-.

Human preferences

The AI starts with a distribution over various utility or reward functions that the human could have. The function U(+) means the human prefers heroin; U(++) that they prefer it a lot; and conversely U(-) and U(--) that they prefer to avoid taking heroin (U(0) is the null utility where the human is indifferent).

It also considers more exotic utilities. Let U(++,-) be the utility where the human strongly prefers heroin, conditional on it being forced on them, but mildly prefers to avoid it, conditional on it not being forced on them. There are twenty-five of these exotic utilities, including things like U(--,++), U(0,++), U(-,0), and so on. But only twenty of them are new: U(++,++)=U(++), U(+,+)=U(+), and so on.

Applying these utilities to AI actions gives results like U(++)(F)=2, U(++)(~F)=-2, U(++,-)(F)=2, U(++,-)(~F)=1, and so on.
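
One illustrative numerical encoding that reproduces these values (the level scores and function names are my own sketch, chosen to match the examples above, not something specified in the post):

    # Hypothetical encoding of the 25 utilities, consistent with the values above.
    levels = {'++': 2, '+': 1, '0': 0, '-': -1, '--': -2}

    def U(on_F, on_notF=None):
        """U(x): simple utility; U(x, y): exotic utility, where x is the preference
        level conditional on F and y the level conditional on ~F."""
        if on_notF is None:
            on_notF = on_F
        # F is scored by how much heroin is wanted given F; ~F by the negation of how
        # much it is wanted given ~F (not forcing is good for a heroin-averse human).
        return {'F': levels[on_F], '~F': -levels[on_notF]}

    assert U('++')['F'] == 2 and U('++')['~F'] == -2
    assert U('++', '-')['F'] == 2 and U('++', '-')['~F'] == 1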

Joint prior

The AI has a joint prior P over the utilities U and the human actions (conditional on the AI's actions). Looking at terms like P(a--| U(0), F), we can see that P defines a map μ from the space of possible utilities (and AI actions) to probability distributions over human actions. Given μ and the marginal distribution PU over utilities, we can reconstruct P entirely.

For this model, we'll choose the simplest μ possible:

  • The human is rational.

Thus, given U(++), the human will always choose a++; given U(++,-), the human will choose a++ if forced to take heroin and a- if not, and so on.
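
As a sketch, this "rational human" μ is just a lookup of the relevant conditional preference level (preference levels and actions written as strings; an illustrative encoding, matching the examples above):

    # The "rational human" map mu: the human simply acts out their conditional
    # preference level, whatever the AI does.
    def mu(on_F, on_notF, ai_action):
        level = on_F if ai_action == 'F' else on_notF
        return 'a' + level

    assert mu('++', '-', 'F') == 'a++'    # given U(++,-), forced: strongly seeks heroin
    assert mu('++', '-', '~F') == 'a-'    # given U(++,-), not forced: mildly avoids it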

The AI is ignorant, and sensible

Let's start the AI up with some reasonable priors. A simplicity prior means that simple utilities like U(-) are more likely than compound utilities like U(0,+). Let's further assume that the AI is made vaguely aware that humans think heroin is a bad thing. So, say, PU(U(--))=PU(U(-))=0.45. Thus the AI is >90% convinced that "heroin is bad". Why greater than 90%? Because utilities like U(-,--) and U(--,-) are also "heroin is bad" utilities.

Note that because of utilities like U(0) and U(++,-), the probabilities of "heroin is bad" and "heroin is good" do not sum to 1.

Then, under these priors, the AI will compute that with probability >90%, F (forcing heroin) is a bad action. If E(U) is expected utility:

  • E(U|F) < 0.45 U(--)(F) + 0.45 U(-)(F) + 0.1 U(++)(F) = 0.45(-2)+0.45(-1)+0.1(2)=-1.15.
  • E(U|~F) > 0.45 U(--)(~F) + 0.45 U(-)(~F) + 0.1 U(++)(~F) = 0.45(2)+0.45(1)+0.1(-2)=1.15.

Thus the AI will choose not to force heroin, which is the reasonable decision.
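
A short sketch reproducing the bound computation (the values below are the right-hand sides of the two expressions above; assigning the remaining 0.1 of probability mass to U(++) is the worst-case allocation that makes them bounds rather than equalities):

    # Expected-utility bounds for each AI action under the stated prior.
    utilities = {
        'U(--)': {'F': -2, '~F': 2},
        'U(-)':  {'F': -1, '~F': 1},
        'U(++)': {'F': 2,  '~F': -2},
    }
    prior = {'U(--)': 0.45, 'U(-)': 0.45, 'U(++)': 0.10}

    for action in ('F', '~F'):
        eu = sum(prior[u] * utilities[u][action] for u in prior)
        print(action, round(eu, 2))   # F -1.15, ~F 1.15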

The AI learns the truth, and goes wrong

In this alternate setup, a disaster happens before the AI makes its decision: it learns all about humans. It learns their reactions, how they behave, and so on; call this info I. And thus realises that F→a++ and ~F→a-. It uses this information to update its prior P. Only one human utility function will explain this human behaviour: U(++,-). Thus its expected utility is now:

  • E(U|I,F)=U(++,-)(F)=2.
  • E(U|I,~F)=U(++,-)(~F)=1.

Therefore the AI will now choose F, forcing the heroin on the human.
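
To see the whole failure in one self-contained sketch (same illustrative encoding as before): conditioning on I under the "rational human" μ leaves exactly one candidate utility, U(++,-), and its values flip the decision.

    # Conditioning on I = {F -> a++, ~F -> a-} under the "rational human" mu.
    levels = {'++': 2, '+': 1, '0': 0, '-': -1, '--': -2}
    candidates = [(x, y) for x in levels for y in levels]   # U(x, y); U(x, x) = U(x)

    def mu_action(u, ai_action):
        x, y = u
        return 'a' + (x if ai_action == 'F' else y)

    def utility(u, ai_action):
        x, y = u
        return levels[x] if ai_action == 'F' else -levels[y]

    observed = {'F': 'a++', '~F': 'a-'}
    consistent = [u for u in candidates
                  if all(mu_action(u, a) == obs for a, obs in observed.items())]
    print(consistent)                          # [('++', '-')], i.e. U(++,-)
    u = consistent[0]
    print(utility(u, 'F'), utility(u, '~F'))   # 2 1 -> the AI now forces heroin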

Manipulating the unmanipulatable

What's gone wrong here? The key problem is that the AI has the wrong μ: the human is not behaving rationally in this situation. We know that the true μ is actually μ', which encodes the fact that F (the forcible injection of heroin) actually overwrites the human's "true" utility. Thus under μ', the corresponding P' has P'(a++|F,U)=1 for all U. Hence the information that F→a++ is vacuous, and doesn't update the AI's distribution over utility functions.

But note two very important things:

  1. The AI cannot update μ based on observation. All human actions are compatible with μ= "The human is rational" (it just requires more and more complex utilities to explain the actions). Thus getting μ correct is not a problem on which the AI can learn in general. Getting better at predicting the human's actions doesn't make the AI better behaved: it makes it worse behaved.
  2. From the perspective of μ, the AI is treating the human utility function as if it was an unchanging historical fact that it cannot influence. From the perspective of the "true" μ', however, the AI is behaving as if it were actively manipulating human preferences to make them easier to satisfy.

In future posts, I'll be looking at different μ's, and how we might nevertheless start deducing things about them from human behaviour, given sensible update rules for the μ. What do we mean by update rules for μ? Well, we could consider μ to be a single complicated unchanging object, or a distribution of possible simpler μ's that update. The second way of seeing it will be easier for us humans to interpret and understand.
