## Announcement: The Sequences eBook will be released in mid-March

47 03 March 2015 01:58AM

The Sequences are being released as an eBook, titled Rationality: From AI to Zombies, on March 12.

We went with the name "Rationality: From AI to Zombies" (based on shminux's suggestion) to make it clearer to people — who might otherwise be expecting a self-help book, or an academic text — that the style and contents of the Sequences are rather unusual. We want to filter for readers who have a wide-ranging interest in (/ tolerance for) weird intellectual topics. Alternative options tended to obscure what the book is about, or obscure its breadth / eclecticism.

#### The book's contents

Around 340 of Eliezer's essays from 2009 and earlier will be included, collected into twenty-six sections ("sequences"), compiled into six books:

1. Map and Territory: sequences on the Bayesian conceptions of rationality, belief, evidence, and explanation.
2. How to Actually Change Your Mind: sequences on confirmation bias and motivated reasoning.
3. The Machine in the Ghost: sequences on optimization processes, cognition, and concepts.
4. Mere Reality: sequences on science and the physical world.
5. Mere Goodness: sequences on human values.
6. Becoming Stronger: sequences on self-improvement and group rationality.

The six books will be released as a single sprawling eBook, making it easy to hop back and forth between different parts of the book. The whole book will be about 1,800 pages long. However, we'll also be releasing the same content as a series of six print books (and as six audio books) at a future date.

The Sequences have been tidied up in a number of small ways, but the content is mostly unchanged. The largest change is to how the content is organized. Some important Overcoming Bias and Less Wrong posts that were never officially sorted into sequences have now been added — 58 additions in all, forming four entirely new sequences (and also supplementing some existing sequences). Other posts have been removed — 105 in total. The following old sequences will be the most heavily affected:

• Map and Territory and Mysterious Answers to Mysterious Questions are being merged, expanded, and reassembled into a new set of introductory sequences, with more focus placed on cognitive biases. The name 'Map and Territory' will be re-applied to this entire collection of sequences, constituting the first book.
• Quantum Physics and Metaethics are being heavily reordered and heavily shortened.
• Most of Fun Theory and Ethical Injunctions is being left out. Taking their place will be two new sequences on ethics, plus the modified version of Metaethics.

I'll provide more details on these changes when the eBook is out.

Unlike the print and audio-book versions, the eBook version of Rationality: From AI to Zombies will be entirely free. If you want to purchase it on Kindle Store and download it directly to your Kindle, it will also be available on Amazon for \$4.99.

To make the content more accessible, the eBook will include introductions I've written up for this purpose. It will also include a LessWrongWiki link to a glossary, which I'll be recruiting LessWrongers to help populate with explanations of references and jargon from the Sequences.

I'll post an announcement to Main as soon as the eBook is available. See you then!

## Can we talk about mental illness?

39 08 March 2015 08:24AM

For a site extremely focused on fixing bad thinking patterns, I've noticed a bizarre lack of discussion of mental illness here. Considering the high correlation between intelligence and mental illness, you'd think it would be a bigger topic.

I personally suffer from Generalized Anxiety Disorder and a very tame panic disorder. Most of this is focused on financial and academic things, but I will also get panicky about social interaction, responsibilities, and things that happened in the past that seriously shouldn't bother me. I have an almost amusing response to anxiety that is basically my brain panicking and telling me to go hide under my desk.

I know lukeprog and Alicorn managed to fight off a good deal of their issues in this area and wrote up how, but I don't think enough has been done. They mostly dealt with depression. What about rational schizophrenics and phobics and bipolar people? It's difficult to find anxiety advice that goes beyond "do yoga while watching the sunrise!" Pop psych isn't very helpful. I think LessWrong could be. What's mental illness but a wrongness in the head?

Mental illness seems to be worse for intelligent people than your typical biases, honestly. Hiding under my desk is even less useful than, say, appealing to authority during an argument. At least the latter has the potential to be useful. I know it's limiting me, and starting cycles of avoidance, and so much more. And my mental illness isn't even that bad! Trying to be rational and successful when schizophrenic sounds like a Sisyphean nightmare.

I'm not fighting my difficulties nearly well enough to feel qualified to author my own posts. Hearing from people who are managing is more likely to help. If nothing else, maybe a Rational Support Group would be a lot of fun.

## HPMOR Q&A by Eliezer at Wrap Party in Berkeley [Transcription]

36 16 March 2015 08:54PM

Transcribed from maxikov's posted videos.

Verbal filler removed for clarity.

Audience Laughter denoted with [L], Applause with [A]

Eliezer: So, any questions? Do we have a microphone for the audience?

Guy Offscreen: We don't have a microphone for the audience, have we?

Some Other Guy: We have this furry thing, wait, no that's not hooked up. Never mind.

Eliezer: Alright, come on over to the microphone.

Guy with 'Berkeley Lab' shirt: So, this question is sort of on behalf of the HPMOR subreddit. You say you don't give red herrings, but like... He's making faces at me like... [L] You say you don't give red herrings, but while he's sitting in the Quidditch game thinking of who he can bring along, he stares at Cedric Diggory, and he's like, "He would be useful to have at my side!", and then he never shows up. Why was there not a Cedric Diggory?

Eliezer: The true Cedrics Diggory are inside all of our hearts. [L] And in the mirror. [L] And in Harry's glasses. [L] And, well, I mean the notion is, you're going to look at that and think, "Hey, he's going to bring along Cedric Diggory as a spare wand, and he's gonna die! Right?" And then, Lesath Lestrange shows up and it's supposed to be humorous, or something. I guess I can't do humor. [L]

Guy Dressed as a Witch: Does Quirrell's attitude towards reckless muggle scientists have anything to do with your attitude towards AI researchers that aren't you? [L]

Eliezer: That is unfair. There are at least a dozen safety conscious AI researchers on the face of the earth. [L] At least one of them is respected. [L] With that said, I mean if you have a version of Voldemort who is smart and seems to be going around killing muggleborns, and sort of pretty generally down on muggles... Like, why would anyone go around killing muggleborns? I mean, there's more than one rationalization you could apply to this situation, but the sort of obvious one is that you disapprove of their conduct with nuclear weapons. From Tom Riddle's perspective that is.

I do think I sort of try to never have leakage from that thing I spend all day talking about into a place it really didn't belong, and there's a saying that goes 'A fanatic is someone who cannot change his mind, and will not change the subject.' And I'm like ok, so if I'm not going to change my mind, I'll at least endeavor to be able to change the subject. [L] Like, towards the very end of the story we are getting into the realm where sort of the convergent attitude that any sort of carefully reasoning person will take towards global catastrophic risks, and the realization that you are in fact a complete crap rationalist, and you're going to have to start over and actually try this time. These things are sort of reflective of the story outside the story, but apart from 'there is only one king upon a chessboard', and 'I need to raise the level of my game or fail', and perhaps, one little thing that was said about the mirror of VEC, as some people called it.

Aside from those things I would say that I was treating it more as convergent evolution rather than any sort of attempted parable or Professor Quirrell speaking for me. He usually doesn't... [L] I wish more people would realize that... [L] I mean, you know the... How can I put this exactly. There are these people who are sort of to the right side of the political spectrum and occasionally they tell me that they wish I'd just let Professor Quirrell take over my brain and run my body. And they are literally Republicans for You Know Who. And there you have it basically. Next Question! ... No more questions, ok. [L] I see that no one has any questions left; Oh, there you are.

Fidgety Guy: One of the chapters you posted was the final exam chapter where you had everybody brainstorm solutions to the predicament that Harry was in. Did you have any favorite alternate solution besides the one that made it into the book?

Eliezer: So, not to give away the intended solution for anyone who hasn't reached that chapter yet, though really you're just going to have the living daylight spoiled out of you, there's no way to avoid that really. So, the most brilliant solution I had not thought of at all, was for Harry to precommit to transfigure something that would cause a large explosion visible from the Quidditch stands which had observed no such explosion, thereby unless help sent via Time-Turner showed up at that point, thereby ensuring that the simplest timeline was not the one where he never reached the Time-Turner. And assuring that some self-consistent set of events would occur which caused him not to carry through on his precommitment. I, you know, I suspect that I might have ruled that that wouldn't work because of the Unbreakable Vow preventing Harry from actually doing that because it might, in effect, count as trying to destroy that timeline, or filter it, and thereby have that count as trying to destroy the world, or just risk destroying it, or something along those lines, but it was brilliant! [L] I was staring at the computer screen going, "I can't believe how brilliant these people are!" "That's not something I usually hear you say," Brienne said. "I'm not usually watching hundreds of people's collective intelligence coming up with solutions way better than anything I thought of!" I replied to her.

And the sort of most fun lateral thinking solution was to call 'Up!' to, or pull Quirinus Quirrell's body over using transfigured carbon nanotubes and some padding, and call 'Up!' and ride away on his broomstick bones. [L] That is definitely going in 'Omake files #5: Collective Intelligence'! Next question!

Guy Wearing Black: So in the chapter with the mirror, there was a point at which Dumbledore had said something like, "I am on this side of the mirror and I always have been." That was never explained that I could tell. I'm wondering if you could clarify that.

Eliezer: It is a reference to the fanfic 'Seventh Horcrux' that *totally* ripped off HPMOR despite being written slightly earlier than it... [L] I was slapping my forehead pretty hard when that happened. Which contains the line "Perhaps Albus Dumbledore really was inside the mirror all along." Sort of arc words as it were. And I also figured that there was simply some bilocation effect using one of the advanced settings of the mirror that Dumbledore was using so that the trap would always be springable as opposed to him having to know at what time Tom Riddle would appear before the mirror and be trapped. Next!

Black Guy: So, how did Moody and the rest of them retrieve the items Dumbledore threw in the mirror of VEC?

Eliezer: Dumbledore threw them outside the mirror's range, thereby causing those not to be sealed in the corresponding real world when the duplicate mode of Dumbledore inside the mirror was sealed. So wherever Dumbledore was at the time, probably investigating Nicolas Flamel's house, he suddenly popped away and the line of Merlin Unbroken and the Elder Wand just fell to the floor from where he was.

Asian Guy: In the 'Something to Protect: Severus Snape', you wrote that he laughed. And I was really curious, what exactly does Severus Snape sound like when he laughs. [L]

Person in Audience: Perform for us!

Eliezer: He He He. [L]

Girl in Audience: Do it again now, everybody together!

Audience: He He He. [L]

Guy in Blue Shirt: So I was curious about the motivation between making Sirius re-evil again and having Peter be a good guy again, their relationship. What was the motivation?

Eliezer: In character or out of character?

Guy in Blue Shirt: Well, yes. [L]

Eliezer: All right, well, in character Peter can be pretty attractive when he wants to be, and Sirius was a teenager. Or, you were asking about the alignment shift part?

Guy in Blue Shirt: Yeah, the alignment and their relationship.

Eliezer: So, in the alignment, I'm just ruling it always was that way. The whole Sirius Black thing is a puzzle, is the way I'm looking at it. And the canon solution to that puzzle is perfectly fine for a children's book, which I say once again requires a higher level of skill than a grown-up book, but just did not make sense in context. So I was just looking at the puzzle and being like, ok, so what can be the actual solution to this puzzle? And also, a further important factor, this had to happen. There's a whole lot of fanfictions out there of Harry Potter. More than half a million, and that was years ago. And 'Methods of Rationality' is fundamentally set in the universe of Harry Potter fanfiction, more than canon. And in many many of these fanfictions someone goes back in time to redo the seven years, and they know that Scabbers is secretly Peter Pettigrew, and there's a scene where they stun Scabbers the rat and take him over to Dumbledore, and Head Auror, and the Minister of Magic and get them to check out this rat over here, and uncover Peter Pettigrew. And in all the times I had read that scene, at least a dozen times literally, it was never once played out the way it would in real life, where that is just a rat, and you're crazy. [L] And that was the sort of basic seed of, "Ok, we're going to play this straight, the sort of loonier conspiracies are false, but there is still a grain of conspiracy truth to it." And then I introduced the whole accounting of what happened with Sirius Black in the same chapter where Hermione just happens to mention that there's a Metamorphmagus in Hufflepuff, and exactly one person posted to the reviews in chapter 28, based on the clue that the Metamorphmagus had been mentioned in the same chapter, "Aha! I present you the tale of Peter Pettigrew, the unfortunate Metamorphmagus." [L] See! You could've solved it, you could've solved it, but you didn't! Someone solved it, you did not solve that. Next Question!

Guy in White: First, [pulls out wand] Avada Kedavra. How do you feel about your security? [L] Second, have you considered the next time you need a large group of very smart people to really work on a hard problem, presenting it to them in fiction?

Eliezer: So, of course I always keep my Patronus Charm going inside of me. [Aww/L] And if that fails, I do have my amulet that triggers my emergency kitten shield. [L] And indeed one of the higher, more attractive things I'm considering to potentially do for the next major project is 'Precisely Bound Djinn and their Behavior'. The theme of which is you have these people who can summon djinn, or command the djinn effect, and you can sort of negotiate with them in the language of djinn and they will always interpret your wish in the worst way possible, or you can give them mathematically precise orders, which they can apparently carry out using unlimited computing power, which obviously ends the world in fairly short order, causing our protagonist to be caught in a Groundhog Day loop as they try over and over again to both maybe arrange for conditions outside to be such that they can get some research done for longer than a few months before the world ends again, and also try to figure out what to tell their djinn. And, you know, I figure that if anyone can give me an unboundedly computable specification of a value aligned advanced agent, the story ends, the characters win, hopefully that person gets a large monetary prize if I can swing it, the world is safer, and I can go on to my next fiction writing project, which will be the one with the boundedly specified [L] value aligned advanced agents. [A]

Guy with Purple Tie: So, what is the source of magic?

Eliezer: Alright, so, there was a bit of literary miscommunication in HPMOR. I tried as hard as I could to signal that unraveling the true nature of magic and everything that inheres in it is actually this kind of large project that they were not going to complete during Harry's first year of Hogwarts. [L] You know, 35 years, even if someone is helping you is a reasonable amount of time for a project like that to take. And if it's something really difficult, like AIs, you might need more than two people even. [L] At least if you want the value aligned version. Anyway, where was I?

So the only way I think that fundamentally to come up with a non-nitwit explanation of magic, you need to get started from the non-nitwit explanation, and then generate the laws of magic, so that when you reveal the answer behind the mystery, everything actually fits with it. You may have noticed this kind of philosophy showing up elsewhere in the literary theory of HPMOR at various points where it turns out that things fit with things you have already seen. But with magic, ultimately the source material was not designed as a hard science fiction story. The magic that we start with as a phenomenon is not designed to be solvable, and what did happen was that the characters thought of experiments, and I in my role of the universe thought of the answer to it, and if they had ever reached the point where there was only one explanation left, then the magic would have had rules, and they would have been arrived at in a fairly organic way that I could have felt good about, not as a sudden, "Aha! I gotcha! I revealed this thing that you had no way of guessing."

Now I could speculate. And I even tried to write a little section where Harry runs into Dumbledore's writings that Dumbledore left behind, where Dumbledore writes some of his own speculation, but there was no good place to put that into the final chapter. But maybe I'll later be able... The final edits were kind of rushed honestly, sleep deprivation, 3am. But maybe in the second edit or something I'll be able to put that paragraph, that set of paragraphs in there. In Dumbledore's office, Dumbledore has speculated. He's mostly just taking the best of some of the other writers that he's read. That, look at the size of the universe, that seems to be mundane. Dumbledore was around during World War 2, he does know that muggles have telescopes. He has talked with muggle scientists a bit and those muggle scientists seem very confident that all the universe they can see looks like it's mundane. And Dumbledore wondered, why is there this sort of small magical section, and this much larger mundane section, or this much larger muggle section? And that seemed to Dumbledore to suggest that as a certain other magical philosopher had written, If you consider the question, what is the underlying nature of reality, is it that it was mundane to begin with, and then magic arises from mundanity, or is the universe magic to begin with, and then mundanity has been imposed above it? Now mundanity by itself will clearly never give rise to magic, yet magic permits mundanity to be imposed, and so, this other magical philosopher wrote, therefore he thinks that the universe is magical to begin with and the mundane sections are imposed above the magic. 
And Dumbledore himself had speculated, having been acquainted with the line of Merlin for much of his life, that just as the Interdict of Merlin was imposed to restrict the spread and the number of people who had sufficiently powerful magic, perhaps the mundane world itself is an attempt to bring order to something that was on the verge of falling apart in Atlantis, or in whatever came before Atlantis. Perhaps the thing that happened with the Interdict of Merlin has happened over and over again. People trying to impose law upon reality, and that law having flaws, and the flaws being more and more exploited until they reach a point of power that threatens to destroy the world, and the most adept wielders of that power try to once again impose mundanity.

And I will also observe, although Dumbledore had no way of figuring this out, and I think Harry might not have figured it out yet because he doesn't yet know about chromosomal crossover, that if there is no wizard gene, but rather a muggle gene, and the muggle gene sometimes gets hit by cosmic rays and ceases to function thereby producing a non-muggle allele, then some of the muggle vs. wizard alleles in the wizard population that got there from muggleborns will be repairable via chromosomal crossover, thus sometimes causing two wizards to give birth to a squib. Furthermore this will happen more frequently in wizards who have recent muggleborn ancestry. I wonder if Lucius told Draco that when Draco told him about Harry's theory of genetics. Anyway, this concludes my strictly personal speculations. It's not in the text, so it's not real unless it's in the text somewhere. 'Opinion of God', not 'Word of God'. But this concludes my personal speculations on the origin of magic, and the nature of the "wizard gene". [A]

## Political topics attract participants inclined to use the norms of mainstream political debate, risking a tipping point to lower quality discussion

34 26 March 2015 12:14AM

(I hope that is the least click-baity title ever.)

Political topics elicit lower quality participation, holding the set of participants fixed. This is the thesis of "politics is the mind-killer".

Here's a separate effect: Political topics attract mind-killed participants. This can happen even when the initial participants are not mind-killed by the topic.

Since outreach is important, this could be a good thing. Raise the sanity water line! But the sea of people eager to enter political discussions is vast, and the epistemic problems can run deep. Of course not everyone needs to come perfectly prealigned with community norms, but any community will be limited in how robustly it can handle an influx of participants expecting a different set of norms. If you look at other forums, it seems to take very little overt contemporary political discussion before the whole place is swamped, and politics becomes endemic. As appealing as "LW, but with slightly more contemporary politics" sounds, it's probably not even an option. You have "LW, with politics in every thread", and "LW, with as little politics as we can manage".

That said, most of the problems are avoided by just not saying anything that pattern-matches too easily to current political issues. From what I can tell, LW has always had tons of meta-political content, which doesn't seem to cause problems, as well as standard political points presented in unusual ways, and contrarian political opinions that are too marginal to raise concern. Frankly, if you have a "no politics" norm, people will still talk about politics, but to a limited degree. But if you don't even half-heartedly (or even hypocritically) discourage politics, then an open-entry site that accepts general topics will risk spiraling too far in a political direction.

As an aside, I'm not apolitical. Although some people advance a more sweeping dismissal of the importance or utility of political debate, this isn't required to justify restricting politics in certain contexts. The sort of argument I've sketched (I don't want LW to be swamped by the worse sorts of people who can be attracted to political debate) is enough. There's no hypocrisy in not wanting politics on LW, but accepting political talk (and the warts it entails) elsewhere. Off the top of my head, Yvain is one LW affiliate who now largely writes about more politically charged topics on their own blog (SlateStarCodex), and there are some other progressive blogs in that direction. There are libertarians and right-leaning (reactionary? NRx-lbgt?) connections. I would love a grand unification as much as anyone, (of course, provided we all realize that I've been right all along), but please let's not tell the generals to bring their armies here for the negotiations.

## A map of LWers - find members of the community living near you.

33 13 March 2015 05:58PM

There seems to be a lot of enthusiasm around LessWrong meetups, so I thought something like this might be interesting too. There is no need to register - just add your marker and keep an eye out for someone living near you.

I posted this on an Open Thread first. Below are some observations based on the previous discussion:

When creating a new marker you will be given a special URL you can use to edit it later. If you lose it, you can create a new one and ask me to delete the old marker. Try not to lose it though.

If someone you tried to contact is unreachable, notify me and I'll delete the marker in order to keep the map tidy. Also, try to keep your own marker updated.

It was suggested that it would be a good idea to circulate the map around survey time. I'll try to remind everyone to update their markers around that time. Any major changes (e.g. changing admin, switching services, remaking the map to eliminate dead markers) will also happen then.

The map data can be exported by anyone, so there's no need to start over if I disappear or whatever.

Edit: Please, you have to make it possible to contact you. If you choose to use a name that doesn't match your LW account, you have to add an email address or equivalent. If you don't do that, it is assumed that the name on the marker is your username here, but if it isn't you are essentially unreachable and will be removed.

## Rationality: From AI to Zombies online reading group

31 21 March 2015 09:54AM

On Wednesday, 15 April 2015, just under a month out from this posting, I will hold the first session of an online reading group for the ebook Rationality: From AI to Zombies, a compilation of the LessWrong sequences by our own Eliezer Yudkowsky. I would like to model this on the very successful Superintelligence reading group led by

join with others to ask questions, discuss ideas, and probe the arguments more deeply. It is intended to add to the experience of reading the sequences in their new format or for the first time. It is intended to supplement discussion that has already occurred on the original postings and the sequence reruns.

The reading group will 'meet' on a semi-monthly post on the LessWrong discussion forum. For each 'meeting' we will read one sequence from the Rationality book, which contains a total of 26 lettered sequences. A few of the sequences are unusually long, and these might be split into two sessions. If so, advance warning will be given.

In each posting I will briefly summarize the salient points of the essays comprising the sequence, link to the original articles and discussion when possible, attempt to find, link to, and quote one or more related materials or opposing viewpoints from outside the text, and present a half-dozen or so question prompts to get the conversation rolling. Discussion will take place in the comments. Others are encouraged to provide their own question prompts or unprompted commentary as well.

We welcome both newcomers and veterans on the topic. If you've never read the sequences, this is a great opportunity to do so. If you are an old timer from the Overcoming Bias days then this is a chance to share your wisdom and perhaps revisit the material with fresh eyes. All levels of time commitment are welcome.

If this sounds like something you want to participate in, then please grab a copy of the book and get started reading the preface, introduction, and the 10 essays / 42 pages which comprise Part A: Predictably Wrong. The first virtual meeting (forum post) covering this material will go live before 6pm Wednesday PDT (1am Thursday UTC), 15 April 2015. Successive meetings will start no later than 6pm PDT on the first and third Wednesdays of a month.

Following this schedule it is expected that it will take just over a year to complete the entire book. If you prefer flexibility, come by any time! And if you are coming upon this post from the future, please feel free to leave your opinions as well. The discussion period never closes.
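For anyone who wants to check the "just over a year" estimate, here is a rough sketch of the schedule arithmetic in Python. It assumes exactly one sequence per session (26 sessions, no long sequences split in two) and that every meeting lands on the first or third Wednesday of its month, starting 15 April 2015. The function names are made up for illustration.

```python
from datetime import date, timedelta

def first_and_third_wednesdays(year, month):
    """Return the first and third Wednesdays of the given month."""
    first_day = date(year, month, 1)
    # date.weekday() counts Monday as 0, so Wednesday is 2.
    offset = (2 - first_day.weekday()) % 7
    first_wed = first_day + timedelta(days=offset)
    return [first_wed, first_wed + timedelta(days=14)]

def session_dates(start, sessions):
    """List `sessions` meeting dates falling on or after `start`."""
    dates = []
    year, month = start.year, start.month
    while len(dates) < sessions:
        for d in first_and_third_wednesdays(year, month):
            if d >= start and len(dates) < sessions:
                dates.append(d)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return dates

# Assumption: 26 sessions, one per lettered sequence, none split.
schedule = session_dates(date(2015, 4, 15), 26)
print("first session:", schedule[0])
print("last session: ", schedule[-1])
print("span in days: ", (schedule[-1] - schedule[0]).days)
```

Under those assumptions the final session falls in early May 2016, a little under thirteen months after the first. Any sequences split across two sessions would push the end date later by two weeks or so each.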

Topic for the first week is the preface by Eliezer Yudkowsky, the introduction by Rob Bensinger, and Part A: Predictably Wrong, a sequence covering rationality, the search for truth, and a handful of biases.

## Slate Star Codex: alternative comment threads on LessWrong?

24 27 March 2015 09:05PM

Like many Less Wrong readers, I greatly enjoy Slate Star Codex; there's a large overlap in readership. However, the comments there are far worse, not worth reading for me. I think this is in part due to the lack of LW-style up and downvotes. Have there ever been discussion threads about SSC posts here on LW? What do people think of the idea of occasionally having them? Does Scott himself have any views on this, and would he be OK with it?

Update:

The latest from Scott:

I'm fine with anyone who wants reposting things for comments on LW, except for posts where I specifically say otherwise or tag them with "things i will regret writing"

In this thread some have also argued for not posting the most hot-button political writings.

Would anyone be up for doing this? Ataxerxes started with "Extremism in Thought Experiments is No Vice"

## Defeating the Villain

23 26 March 2015 09:43PM

We have a recurring theme in the greater Less Wrong community that life should be more like a high fantasy novel. Maybe that is to be expected when a quarter of the community came here via Harry Potter fanfiction, and we also have rationalist group houses named after fantasy locations, descriptions of community members in terms of character archetypes and PCs versus NPCs, semi-serious development of the new atheist gods, and feel free to contribute your favorites in the comments.

A failure mode common to high fantasy novels as well as politics is solving all our problems by defeating the villain. Actually, this is a common narrative structure for our entire storytelling species, and it works well as a narrative structure. The story needs conflict, so we pit a sympathetic protagonist against a compelling antagonist, and we reach a satisfying climax when the two come into direct conflict, good conquers evil, and we live happily ever after.

This isn't an article about whether your opponent really is a villain. Let's make the (large) assumption that you have legitimately identified a villain who is doing evil things. They certainly exist in the world. Defeating this villain is a legitimate goal.

And then what?

Defeating the villain is rarely enough. Building is harder than destroying, and it is very unlikely that something good will spontaneously fill the void when something evil is taken away. It is also insufficient to speak in vague generalities about the ideals to which the post-[whatever] society will adhere. How are you going to avoid the problems caused by whatever you are eliminating, and how are you going to successfully transition from evil to good?

In fantasy novels, this is rarely an issue. The story ends shortly after the climax, either with good ascending or time-skipping to a society made perfect off-camera. Sauron has been vanquished, the rightful king has been restored, cue epilogue(s). And then what? Has the Chosen One shown skill in diplomacy and economics, solving problems not involving swords? What was Aragorn's tax policy? Sauron managed to feed his armies from a wasteland; what kind of agricultural techniques do you have? And indeed, if the book/series needs a sequel, we find that a problem at least as bad as the original fills in the void.

Reality often follows that pattern. Marx explicitly had no plan for what happened after you smashed capitalism. Destroy the oppressors and then ... as it turns out, slightly different oppressors come in and generally kill a fair percentage of the population. It works in the other direction as well; the fall of Soviet communism led not to spontaneous capitalism but rather to kleptocracy and Vladimir Putin. For most of my lifetime, a major pillar of American foreign policy has seemed to be the overthrow of hostile dictators (end of plan). For example, Muammar Gaddafi was killed in 2011, and Libya has been in some state of unrest or civil war ever since. Maybe this is one where it would not be best to contribute our favorites in the comments.

This is not to say that you never get improvements that way. Aragorn can hardly be worse than Sauron. Regression to the mean perhaps suggests that you will get something less bad just by luck, as Putin seems clearly less bad than Stalin, although Stalin seems clearly worse than almost any other regime change in history. Some would say that causing civil wars in hostile countries is the goal rather than a failure of American foreign policy, which seems a darker sort of instrumental rationality.

Human flourishing is not the default state of affairs, temporarily suppressed by villainy. Spontaneous order is real, but it still needs institutions and social technology to support it.

Defeating the villain is a (possibly) necessary but (almost certainly) insufficient condition for bringing about good.

One thing I really like about this community is that projects tend to be conceived in the positive rather than the negative. Please keep developing your plans not only in terms of "this is a bad thing to be eliminated" but also "this is a better thing to be created" and "this is how I plan to get there."

## Future of Life Institute existential risk news site

21 19 March 2015 02:33PM

I'm excited to announce that the Future of Life Institute has just launched an existential risk news site!

The site will have regular articles on topics related to existential risk, written by journalists, and a community blog written by existential risk researchers from around the world as well as FLI volunteers. Enjoy!

17 27 February 2015 07:26PM

It has long been known that algorithms out-perform human experts on a range of topics (here's a LW post on this by lukeprog). Why, then, is it that people continue to mistrust algorithms, in spite of their superiority, and instead cling to human advice? A recent paper by Dietvorst, Simmons and Massey suggests it is due to a cognitive bias which they call algorithm aversion. We judge less-than-perfect algorithms more harshly than less-than-perfect humans. They argue that since this aversion leads to poorer decisions, it is very costly, and that we therefore must find ways of combating it.

Abstract:

Research shows that evidence-based algorithms more accurately predict the future than do human forecasters. Yet when forecasters are deciding whether to use a human forecaster or a statistical algorithm, they often choose the human forecaster. This phenomenon, which we call algorithm aversion, is costly, and it is important to understand its causes. We show that people are especially averse to algorithmic forecasters after seeing them perform, even when they see them outperform a human forecaster. This is because people more quickly lose confidence in algorithmic than human forecasters after seeing them make the same mistake. In 5 studies, participants either saw an algorithm make forecasts, a human make forecasts, both, or neither. They then decided whether to tie their incentives to the future predictions of the algorithm or the human. Participants who saw the algorithm perform were less confident in it, and less likely to choose it over an inferior human forecaster. This was true even among those who saw the algorithm outperform the human.

General discussion:

The results of five studies show that seeing algorithms err makes people less confident in them and less likely to choose them over an inferior human forecaster. This effect was evident in two distinct domains of judgment, including one in which the human forecasters produced nearly twice as much error as the algorithm. It arose regardless of whether the participant was choosing between the algorithm and her own forecasts or between the algorithm and the forecasts of a different participant. And it even arose among the (vast majority of) participants who saw the algorithm outperform the human forecaster.
The aversion to algorithms is costly, not only for the participants in our studies who lost money when they chose not to tie their bonuses to the algorithm, but for society at large. Many decisions require a forecast, and algorithms are almost always better forecasters than humans (Dawes, 1979; Grove et al., 2000; Meehl, 1954). The ubiquity of computers and the growth of the “Big Data” movement (Davenport & Harris, 2007) have encouraged the growth of algorithms but many remain resistant to using them. Our studies show that this resistance at least partially arises from greater intolerance for error from algorithms than from humans. People are more likely to abandon an algorithm than a human judge for making the same mistake. This is enormously problematic, as it is a barrier to adopting superior approaches to a wide range of important tasks. It means, for example, that people will more likely forgive an admissions committee than an admissions algorithm for making an error, even when, on average, the algorithm makes fewer such errors. In short, whenever prediction errors are likely—as they are in virtually all forecasting tasks—people will be biased against algorithms.
More optimistically, our findings do suggest that people will be much more willing to use algorithms when they do not see algorithms err, as will be the case when errors are unseen, the algorithm is unseen (as it often is for patients in doctors’ offices), or when predictions are nearly perfect. The 2012 U.S. presidential election season saw people embracing a perfectly performing algorithm. Nate Silver’s New York Times blog, Five Thirty Eight: Nate Silver’s Political Calculus, presented an algorithm for forecasting that election. Though the site had its critics before the votes were in— one Washington Post writer criticized Silver for “doing little more than weighting and aggregating state polls and combining them with various historical assumptions to project a future outcome with exaggerated, attention-grabbing exactitude” (Gerson, 2012, para. 2)—those critics were soon silenced: Silver’s model correctly predicted the presidential election results in all 50 states. Live on MSNBC, Rachel Maddow proclaimed, “You know who won the election tonight? Nate Silver,” (Noveck, 2012, para. 21), and headlines like “Nate Silver Gets a Big Boost From the Election” (Isidore, 2012) and “How Nate Silver Won the 2012 Presidential Election” (Clark, 2012) followed. Many journalists and popular bloggers declared Silver’s success a great boost for Big Data and statistical prediction (Honan, 2012; McDermott, 2012; Taylor, 2012; Tiku, 2012).
However, we worry that this is not such a generalizable victory. People may rally around an algorithm touted as perfect, but we doubt that this enthusiasm will generalize to algorithms that are shown to be less perfect, as they inevitably will be much of the time.

## Summary and Lessons from "On Combat"

16 22 March 2015 01:48AM

On Combat - The Psychology and Physiology of Deadly Conflict in War and in Peace by Lt. Col. Dave Grossman and Loren W. Christensen (third edition from 2007) is a well-written, evidence-based book about the reality of human behaviour in life-threatening situations. It is comprehensive (400 pages) and provides detailed descriptions, (some) statistics, first-person accounts, historical context and other relevant information. But my main focus in this post is on the advice it gives and what lessons the LessWrong community may take from it.

### TL;DR

In deadly force encounters you will experience and remember the most unusual physiological and psychological things. Inoculate yourself against extreme stress with repeated authentic training; play win-only paintball, train 911-dialing and -reporting. Train combat breathing. Talk to people after traumatic events.

## Best Explainers on Different Subjects

16 18 March 2015 08:32PM

There are many recommended reading threads on lesswrong. Some examples include: MathTextbooks and Rationality.

I am looking to compile another such thread, this time aimed at "exceptional explainers" and their works. For example, I find Richard Feynman's QED: The Strange Theory of Light and Matter to be one such book.

Please list out other authors and books which you think are wonderfully written, in such a way that maximizes communication and explanation to laypeople in the given field. For example:

Physics: Richard Feynman - QED: The Strange Theory of Light and Matter.

Thank you,

Jeremy

## Postdoctoral research positions at CSER (Cambridge, UK)

15 26 March 2015 05:59PM

[To be cross-posted at Effective Altruism Forum, FLI news page]

I'm delighted to announce that the Centre for the Study of Existential Risk has had considerable recent success in grantwriting and fundraising, among other activities (full update coming shortly). As a result, we are now in a position to advance to CSER's next stage of development: full research operations. Over the course of this year, we will be recruiting for a full team of postdoctoral researchers to work on a combination of general methodologies for extreme technological (and existential) risk analysis and mitigation, alongside specific technology/risk-specific projects.

Our first round of recruitment has just opened - we will be aiming to hire up to 4 postdoctoral researchers; details below. A second recruitment round will take place in the Autumn. We have a slightly unusual opportunity in that we get to cast our net reasonably wide. We have a number of planned research projects (listed below) that we hope to recruit for. However, we also have the flexibility to hire one or more postdoctoral researchers to work on additional projects relevant to CSER's aims. Information about CSER's aims and core research areas is available on our website. We request that as part of the application process potential postholders send us a research proposal of no more than 1500 words, explaining what your research skills could contribute to CSER. At this point in time, we are looking for people who will have obtained a doctorate in a relevant discipline by their start date.

We would also humbly ask that the LessWrong community aid us in spreading the word far and wide about these positions. There are many brilliant people working within the existential risk community. However, there are academic disciplines and communities that have had less exposure to existential risk as a research priority than others (due to founder effect and other factors), but where there may be people with very relevant skills and great insights. With new centres and new positions becoming available, we have a wonderful opportunity to grow the field, and to embed existential risk as a crucial consideration in all relevant fields and disciplines.

Thanks very much,

Seán Ó hÉigeartaigh (Executive Director, CSER)

"The Centre for the Study of Existential Risk (University of Cambridge, UK) is recruiting for up to four full-time postdoctoral research associates to work on the project Towards a Science of Extreme Technological Risk.

We are looking for outstanding and highly-committed researchers, interested in working as part of a growing research community, with research projects relevant to any aspect of the project. We invite applicants to explain their project to us, and to demonstrate their commitment to the study of extreme technological risks.

We have several shovel-ready projects for which we are looking for suitable postdoctoral researchers. These include:

• Ethics and evaluation of extreme technological risk (ETR) (with Sir Partha Dasgupta);
• Horizon-scanning and foresight for extreme technological risks (with Professor William Sutherland);
• Responsible innovation and extreme technological risk (with Dr Robert Doubleday and the Centre for Science and Policy).

However, recruitment will not necessarily be limited to these subprojects, and our main selection criterion is suitability of candidates and their proposed research projects to CSER’s broad aims.

Details are available here. Closing date: April 24th."

## PredictIt, a prediction market out of New Zealand, now in beta.

15 16 March 2015 02:02AM

From their website:

PredictIt is an exciting new, real money site that tests your knowledge of political and financial events by letting you make and trade predictions on the future.

Taking part in PredictIt is simple and easy. Pick an event you know something about and see what other traders believe is the likelihood it will happen. Do you think they have it right? Or do you think you have the knowledge to beat the wisdom of the crowd?

The key to success at PredictIt is timing. Make your predictions when most people disagree with you and the price is low. When it turns out that your view may be right, the value of your predictions will rise. You’ll need to choose the best time to sell!

Keep in mind that, although the stakes are limited, PredictIt involves real money so the consequences of being wrong can be painful. Of course, winning can also be extra sweet.

For detailed instructions on participating in PredictIt, see How It Works.

PredictIt is an educational purpose project of Victoria University, Wellington of New Zealand, a not-for-profit university, with support provided by Aristotle International, Inc., a U.S. provider of processing and verification services. Prediction markets, like this one, are attracting a lot of academic and practical interest (see our Research section). So, you get to challenge yourself and also help the experts better understand the wisdom of the crowd.
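The timing advice quoted above comes down to simple arithmetic. The sketch below is an illustration only, not PredictIt's actual rules: it assumes a share pays out 1.00 if the event happens and 0 otherwise, and it ignores any fees. The function names are hypothetical.

```python
def expected_profit_per_share(price, believed_probability, payout=1.0):
    """Expected profit from holding one share to resolution.

    Assumes (hypothetically) that a share pays `payout` if the event
    happens and nothing otherwise, with no fees.
    """
    return believed_probability * payout - price


def trading_profit(buy_price, sell_price, shares):
    """Profit from buying `shares` at one price and selling them later at another."""
    return (sell_price - buy_price) * shares


# The crowd prices the event at 0.30, but you believe it is 50% likely.
edge = expected_profit_per_share(price=0.30, believed_probability=0.50)

# Later the crowd moves toward your view and the price rises to 0.55.
gain = trading_profit(buy_price=0.30, sell_price=0.55, shares=10)

print(f"expected profit per share if held: {edge:.2f}")
print(f"profit from selling 10 shares early: {gain:.2f}")
```

Under these assumptions, buying is only attractive when your believed probability exceeds the current price, and selling early locks in the price movement without waiting for the event to resolve.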

## Compilation of currently existing project ideas to significantly impact the world

15 08 March 2015 04:59AM

One of the problems the LW, EA, CFAR, and X-risk community has faced, recently discussed on Slate Star Codex, is the absorption of people interested in researching, volunteering, helping, and participating in the community. The problem is worth subdividing into how to get new people into the social community, which is addressed at the link above, and a separate problem, absorbing their skills, ability, and willingness to volunteer, to which this post is dedicated:

What should specific person Smith do to help in the project of preventing X-risk, improving the world, saving lives? We assume here that Smith will not be a donor (in which case the response would be "donate"), joined the community not long ago, and has a skill set X.

Soon this problem will become worse due to an influx of more people brought in by the soon-to-be-published books by MacAskill, Yudkowsky and Singer. There will be more people wanting to do something, and able to do some sorts of projects, but who are not being allocated any specific project that matches their skill set and values.

Now is a good time to solve it. I was talking about this problem today with Stephen Frey and we considered it would be a good idea to have a list of specific technical or research projects that can be broken down into smaller chunks for people to pick up and do. A Getting Things Done list for researchers and technology designers. Preferably those would be tractable projects that can be done in fewer than three months. There are some lists of open problems in AI and Superintelligence control, but not for many X-risks or other problems that some of the community frequently considers important.

So I decided to make a compilation of the questions and problems we already have listed here, and then ask someone (Oliver Habryka or a volunteer in the comment section here) to transform the compiled set into a standardized format.

A tentative category list

Area: X-risk, AI, Anti-aging, Cryonics, Rationality, IA, Self-Improvement, Differential Technological Development, Strategy, Civilizational Inadequacy, etc... describes what you have to value/disvalue in order for this project to match your values.

Project: description of which actions need to be taken in 3 month period for this project to be considered complete.

Context: if part of a larger project, which is it, and how will it connect to other parts. Also justification for that project.

Notes: any relevant constraints that may play a role, time-sensitivity, costs, number of people, location, etc...

For example at Luke's list of Superintelligence research questions, the first one:

1. How strongly does IQ predict rationality, metacognition, and philosophical sophistication, especially in the far right tail of the IQ distribution? Relevant to the interaction of intelligence amplification and FAI chances. See the project guide here.

Would be rendered as

Area: FAI; Project: Read Rationality and the Reflective Mind, by Keith Stanovich, to become familiar with the model of algorithmic and reflective minds. For this project, investigating metacognition means investigating the reflective mind. Find ways to test Stanovich’s predictions and answer the questions in the previous section. Design the study to give participants tests which a high IQ should help with and tests which a high IQ should not help with. This step will involve searching through Rationality and the Reflective Mind, and then directly contacting Stanovich to ask which tests he has not yet conducted. Context: this is the first part of a two-part sub-study investigating IQ and metacognition, and needs to be followed by a new study investigating the correlation. These parts are complementary to the study of IQ and philosophical success, and are relevant to assessing the impact that intelligence augmentation will have on our likelihood of generating Friendly Artificial Intelligence. Notes: needs to be conducted by someone with a researcher affiliation and the capacity to conduct a study on human subjects later; six-month commitment; some science writing experience.

Edit: Here is a file where to start compiling projects - thanks Stephen!

This is the idea: to gather a comprehensive list of research or technical questions for the areas above, transform them into projects that can be more easily parsed and assigned than their currently scattered counterparts, and make the list available to those who want to work on them. This post is the first step in collection, so if there are lists anywhere of projects or research questions that may be relevant to any of the areas cited above, please post a link to them in the comments; special kudos if you post them in the format above. Also let me know if you would like to volunteer for this. If you remember any question or specific project but don't see it in any list or in the comments, post it. When we create a standardized list for people to look through, it will be separated by area, so people can view only the projects related to what they value.
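As a rough illustration of what the standardized format could look like in machine-readable form, here is a minimal sketch. The four fields come from the tentative category list above; the `Project` class, the `group_by_area` helper, and the second sample entry are hypothetical and only there to show the grouping by area.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Project:
    """One entry in the standardized list, using the fields proposed in the post."""
    area: str          # what you have to value for the project to match your values
    project: str       # actions needed within roughly three months
    context: str = ""  # larger project it belongs to, and its justification
    notes: str = ""    # constraints: time-sensitivity, costs, people, location


def group_by_area(projects: List[Project]) -> Dict[str, List[Project]]:
    """Separate the list by area so readers can browse only what they value."""
    grouped: Dict[str, List[Project]] = defaultdict(list)
    for entry in projects:
        grouped[entry.area].append(entry)
    return dict(grouped)


projects = [
    Project(
        area="FAI",
        project="Design a study of IQ and metacognition based on Stanovich's model.",
        context="First part of a two-part sub-study.",
        notes="Needs a researcher affiliation.",
    ),
    # Hypothetical placeholder entry, included only to show a second area.
    Project(area="Cryonics", project="Placeholder project description."),
]

by_area = group_by_area(projects)
for area, entries in sorted(by_area.items()):
    print(area, len(entries))
```

Keeping each entry as a flat record makes it easy to store the list in a spreadsheet or shared file and filter it by area later.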

### Compilation:

Lists of ideas and projects:

Superintelligence Strategic List - Muehlhauser

Mechanisms of Aging - Ben Best

Cryonics Strategy Space - Froolow

Ideas and projects:

Go to Mars - Musk

Make it easy for people within the community to move to US, UK.

Preserve Brains

Find moral enhancers that improve global cooperation as well as intra-group cooperation

Open Borders

...

...

14 12 March 2015 05:46PM

BBC article

I'm sure I'm not the only one who greatly admired him. The theme of his stories was progress; they were set in a fantasy world, it's true, but one that was frequently a direct analogy to our own past, and where the golden age was always right now. The recent books made this ever more obvious.

We have lost a great man today, but it's the way he died that makes me uncomfortable. Terry Pratchett had early-onset Alzheimer's, and while I doubt it would have mattered, he couldn't have chosen cryonics even if he wanted to. He campaigned for voluntary euthanasia in cases like his. I will refrain from speculating on whether his unexpected death was wholly natural; whether it was or wasn't, I can't see this having a better outcome. In short...

There is, for each of us, a one-ninth chance of developing Alzheimer's if we live long enough. Many of us may have relatives that are already showing signs, and in the current regime these relatives cannot be cryonically stored even if they wish to try; by the time they die, there will be little purpose in doing so. For cryonics to help for neurodegenerative disorders, it needs to be applied before they become fatal.

Is there anything we can do to change that? Are there countries in which that generalisation is false?

## Discussion of Slate Star Codex: "Extremism in Thought Experiments is No Vice"

13 28 March 2015 09:17AM

Phil Robertson is being criticized for a thought experiment in which an atheist’s family is raped and murdered. On a talk show, he accused atheists of believing that there was no such thing as objective right or wrong, then continued:

I’ll make a bet with you. Two guys break into an atheist’s home. He has a little atheist wife and two little atheist daughters. Two guys break into his home and tie him up in a chair and gag him.

Then they take his two daughters in front of him and rape both of them and then shoot them, and they take his wife and then decapitate her head off in front of him, and then they can look at him and say, ‘Isn’t it great that I don’t have to worry about being judged? Isn’t it great that there’s nothing wrong with this? There’s no right or wrong, now, is it dude?’

Then you take a sharp knife and take his manhood and hold it in front of him and say, ‘Wouldn’t it be something if [there] was something wrong with this? But you’re the one who says there is no God, there’s no right, there’s no wrong, so we’re just having fun. We’re sick in the head, have a nice day.’

The media has completely proportionally described this as Robertson “fantasizing about” raping atheists, and there are the usual calls for him to apologize/get fired/be beheaded.

So let me use whatever credibility I have as a guy with a philosophy degree to confirm that Phil Robertson is doing moral philosophy exactly right.

_____

This is a LW discussion post for Yvain's blog posts at Slate Star Codex, as per tog's suggestion:

Like many Less Wrong readers, I greatly enjoy Slate Star Codex; there's a large overlap in readership. However, the comments there are far worse, not worth reading for me. I think this is in part due to the lack of LW-style up and downvotes. Have there ever been discussion threads about SSC posts here on LW? What do people think of the idea of occasionally having them? Does Scott himself have any views on this, and would he be OK with it?

Scott/Yvain's permission to repost on LW was granted (from facebook):

I'm fine with anyone who wants reposting things for comments on LW, except for posts where I specifically say otherwise or tag them with "things i will regret writing"

## Negative visualization, radical acceptance and stoicism

13 27 March 2015 03:51AM

In anxious, frustrating or aversive situations, I find it helpful to visualize the worst case that I fear might happen, and try to accept it. I call this “radical acceptance”, since the imagined worst case is usually an unrealistic scenario that would be extremely unlikely to happen, e.g. “suppose I get absolutely nothing done in the next month”. This is essentially the negative visualization component of stoicism. There are many benefits to visualizing the worst case:

• Feeling better about the present situation by contrast.
• Turning attention to the good things that would still be in my life even if everything went wrong in one particular domain.
• Weakening anxiety using humor (by imagining an exaggerated “doomsday” scenario).
• Being more prepared for failure, and making contingency plans (pre-hindsight).
• Helping make more accurate predictions about the future by reducing the “X isn’t allowed to happen” effect (or, as Anna Salamon once put it, “putting X into the realm of the thinkable”).
• Reducing the effect of ugh fields / aversions, which thrive on the “X isn’t allowed to happen” flinch.
• Weakening unhelpful identities like “person who is always productive” or “person who doesn’t make stupid mistakes”.

Let’s say I have an aversion around meetings with my advisor, because I expect him to be disappointed with my research progress. When I notice myself worrying about the next meeting or finding excuses to postpone it so that I have more time to make progress, I can imagine the worst imaginable outcome a meeting with my advisor could have - perhaps he might yell at me or even decide to expel me from grad school (neither of these have actually happened so far). If the scenario is starting to sound silly, that’s a good sign. I can then imagine how this plays out in great detail, from the disappointed faces and words of the rest of the department to the official letter of dismissal in my hands, and consider what I might do in that case, like applying for industry jobs. While building up these layers of detail in my mind, I breathe deeply, which I associate with meditative acceptance of reality. (I use the word “acceptance” to mean “acknowledgement” rather than “resignation”.)

I am trying to use this technique more often, both in the regular and situational sense. A good default time is my daily meditation practice. I might also set up a trigger-action habit of the form “if I notice myself repeatedly worrying about something, visualize that thing (or an exaggerated version of it) happening, and try to accept it”. Some issues have more natural triggers than others - while worrying tends to call attention to itself, aversions often manifest as a quick flinch away from a thought, so it’s better to find a trigger among the actions that are often caused by an aversion, e.g. procrastination. A trigger for a potentially unhelpful identity could be a thought like “I’m not good at X, but I should be”. A particular issue can simultaneously have associated worries (e.g. “will I be productive enough?”), aversions (e.g. towards working on the project) and identities (“productive person”), so there is likely to be something there that makes a good trigger. Visualizing myself getting nothing done for a month can help with all of these to some degree.

System 1 is good at imagining scary things - why not use this as a tool?

Cross-posted

## Best of Rationality Quotes, 2014 Edition

13 27 February 2015 10:43PM

Here is the way-too-late 2014 edition of the Best of Rationality Quotes collection. (Here is last year's.) Thanks Huluk for nudging me to do it.

Best of Rationality Quotes 2014 (300kB page, 235 quotes)
and Best of Rationality Quotes 2009-2014 (1900kB page, 1770 quotes)

The page was built by a short script (source code here) from all the LW Rationality Quotes threads so far. (We had such a thread each month since April 2009.) The script collects all comments with karma score 10 or more, and sorts them by score. Replies are not collected, only top-level comments.
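The selection rule described above is easy to sketch. The following is not the actual script (its source is linked above), just a minimal illustration under assumed data structures: the `Comment` class and its `parent_id` field are hypothetical stand-ins for however the real script represents comments.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Comment:
    author: str
    text: str
    score: int
    parent_id: Optional[int] = None  # None marks a top-level comment (assumed convention)


def best_quotes(comments: List[Comment], min_score: int = 10) -> List[Comment]:
    """Keep top-level comments with karma >= min_score, highest score first."""
    selected = [
        c for c in comments
        if c.parent_id is None and c.score >= min_score
    ]
    return sorted(selected, key=lambda c: c.score, reverse=True)


sample = [
    Comment("a", "quote one", 12),
    Comment("b", "reply to quote one", 30, parent_id=1),  # reply: skipped
    Comment("c", "quote two", 9),                         # below threshold: skipped
    Comment("d", "quote three", 25),
]

selected = best_quotes(sample)
for c in selected:
    print(c.score, c.author, c.text)
```

Note that the reply is dropped even though it has the highest score, matching the rule that only top-level comments are collected.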

As is now usual, I provide various statistics and top-lists based on the data. (Source code for these is also at the above link, see the README.) I added these as comments to the post:

## Request for Steelman: Non-correspondence concepts of truth

12 24 March 2015 03:11AM

A couple of days ago, Buybuydandavis wrote the following on Less Wrong:

I'm increasingly of the opinion that truth as correspondence to reality is a minority orientation.

I've spent a lot of energy over the last couple of days trying to come to terms with the implications of this sentence.  While it certainly corresponds with my own observations about many people, the thought that most humans simply reject correspondence to reality as the criterion for truth seems almost too outrageous to take seriously.  If upon further reflection I end up truly believing this, it seems  that it would be impossible for me to have a discussion about the nature of reality with the great majority of the human race.  In other words, if I truly believed this, I would label most people as being too stupid to have a real discussion with.

However, this reaction seems like an instance of a failure mode described by Megan McArdle:

I’m always fascinated by the number of people who proudly build columns, tweets, blog posts or Facebook posts around the same core statement: “I don’t understand how anyone could (oppose legal abortion/support a carbon tax/sympathize with the Palestinians over the Israelis/want to privatize Social Security/insert your pet issue here)." It’s such an interesting statement, because it has three layers of meaning.

The first layer is the literal meaning of the words: I lack the knowledge and understanding to figure this out. But the second, intended meaning is the opposite: I am such a superior moral being that I cannot even imagine the cognitive errors or moral turpitude that could lead someone to such obviously wrong conclusions. And yet, the third, true meaning is actually more like the first: I lack the empathy, moral imagination or analytical skills to attempt even a basic understanding of the people who disagree with me.

In short, “I’m stupid.” Something that few people would ever post so starkly on their Facebook feeds.

With this background, it seems important to improve my model of people who reject correspondence as the criterion for truth. The obvious first place to look is in academic philosophy. The primary challenger to correspondence theory is called “coherence theory”. If I understand correctly, coherence theory says that a statement is true iff it is logically consistent with “some specified set of sentences”.

Coherence is obviously an important concept, which has valuable uses, for example in formal systems. It does not capture my idea of what the word “truth” means, but that is purely a semantic issue. I would be willing to cede the word “truth” to the coherence camp if we agreed on a separate word we could use to mean “correspondence to reality”. However, my intuition is that they wouldn't let us get away with this. I sense that there are people out there who genuinely object to the very idea of discussing whether a sentence corresponds to reality.

So it seems I have a couple of options:

1. I can look for empirical evidence that buybuydandavis is wrong, i.e. that most people accept correspondence to reality as the criterion for truth

2. I can try to convince people to use some other word for correspondence to reality, so they have the necessary semantic machinery to have a real discussion about what reality is like

3. I can accept that most people are unable to have a discussion about the nature of reality

4. I can attempt to steelman the position that truth is something other than correspondence

Option 1 appears unlikely to be true. Option 2 seems unlikely to work.  Option 3 seems very unattractive, because it would be very uncomfortable to have discussions that on the surface appear to be about the nature of reality, but which really are about something else, where the precise value of "something else" is unknown to me.

I would therefore be very interested in a steelman of non-correspondence concepts of truth. I think it would be important not only for me, but also for the rationalist community as a group, to get a more accurate model of how non-rationalists think about "truth".

## Just a casual question regarding MIRI

12 22 March 2015 08:16PM

Currently I am planning to start a mathematics degree when I enter university; however, my interest has shifted largely to computational neuroscience and related fields, so I'm now planning to switch to an AI degree when I go to study. Having said that, MIRI has always posed interesting problems to me, and I have entertained the thought of trying to do some work for MIRI before. And so my question boils down to this: Would there be any problem with taking the AI degree if I ever wanted to try my hand at doing some math for MIRI? Is a maths degree essential, or would an AI degree with a good grasp of the mathematics related to MIRI work just as well? Any thoughts or musings would be appreciated :)

## Experimental EA funding [crosspost]

12 15 March 2015 07:48PM

Over the course of 2015, we will be distributing \$10,000 to completed projects which we believe will have a significant long-term humanitarian impact.

These awards are being made in exchange for certificates of impact. Here's how it works: you tell us about something good you did. We offer you some money. Rather than considering a complicated counterfactual ("How well will this money be spent if I don't take it?"), we encourage you to accept our offer if and only if you would be willing to undo the humanitarian impact of your project in exchange for the money. For more details, see here.

I originally posted this at the EA forum, but it may also be of interest to people here. We are open to funding writing or research on many perennial LW topics (methodological issues, small experiments, lifehacks, useful futurism, etc.).

Why are we buying certificates instead of making grants? Just as market prices help coordinate and incentivize the efficient production of commercial products, they could also help coordinate and incentivize efficient altruism. We also think that paying for performance after the fact has a number of big advantages. Not convinced yet? See a more complete answer.

Applications will include an asking price, the minimum amount of money that would be enough to compensate you for undoing the humanitarian impact of the project. The actual awards will be determined by combining the asking prices with our impact assessments in a (truthful) auction. Instead of buying 100% of your project's impact, we'll buy some fraction less than 50% (at your discretion).

The awards will be made in ten \$1,000 rounds, spread over the course of the year. The deadline for the first round is March 25. We'll post the results of each round as they occur. New proposals can be made in between rounds. Once an application is submitted it will be considered in each round unless it is withdrawn.

If you are interested, submit an application here. The application process is designed to be as straightforward as possible. Learn about the kind of work we are most interested in, and see our other restrictions. If you have other questions or comments, contact us at contact@impactpurchase.org or discuss the project at the effective altruism forum.

impactpurchase.org contains other information about the project, and will describe awards as they are made.

"We" is currently Paul Christiano and Katja Grace. If you are interested in purchasing certificates of impact as part of this effort, we'd love to hear from you.

## [POLL] LessWrong group on YourMorals.org (2015)

12 03 March 2015 03:08AM

In 2011

The regular research has had interesting results like showing a distinct pattern of cognitive traits and values associated with libertarian politics, but there's no reason one can't use it for investigating LWers in more detail; for example, going through the results, "we can see that many of us consider purity/respect to be far less morally significant than most", and we collectively seem to have Conscientiousness issues. (I also drew on it recently for a gay marriage comment.) If there were more data, it might be interesting to look at the results and see where LWers diverge the most from libertarians (the mainstream group we seem most psychologically similar to), but unfortunately for a lot of the tests, there's too little to bother with (LW n<10). Maybe more people could take it.

(You can see some of my results at http://www.gwern.net/Links#profile )

## Human Capital Contracts

11 10 March 2015 01:21AM

Cross-posted on my blog here. Partially inspired by some slatestarcodex discussion here.

Summary: Human Capital Contracts would allow people to sell a certain % of their future income in return for upfront cash, as opposed to taking out a loan. This would be less risky for them, would give them valuable information about different college majors, and would help give people de facto ‘mentors’, among other advantages. Adverse selection could reduce the benefits, and reducing inter-state competition poses a major possible disadvantage. We also discuss two niche applications: parents and divorce.

Readers with an economics background might like to jump to the sections on 'Education' and 'Mentors - Incentive Alignment'.

### Debt vs equity financing

There are two methods of financing for companies: debt and equity.

Debt is fundamentally very simple. I give the company \$100 now; it promises to give me \$105 in a year’s time. They owe me a fixed amount in return. Hopefully in the meantime the company has invested that \$100 in a project or piece of equipment that produces more than \$105; if so they made a profit on the transaction as a whole. Here the risk is borne by the company; they have no choice but to pay me back, even if they didn’t make a profit this year. This form of financing is familiar to most people, as they personally use savings accounts, credit cards, mortgages, auto loans and so on.

Equity, unlike debt, does not represent a fixed level of obligation. Instead the company owes you a certain fraction of future profits. If you give a company \$100 in return for a 10% share, and they made a \$50 profit, your share of the profit is \$5. Hopefully they will make growing profits for many years, in which case your portion will grow to \$6, to \$7, and so on. Here the risk is borne by you; if they don’t make a profit, you get nothing. This form of financing is much less familiar to most people; about the only experience they are likely to have would be investing in the stock market, but that is now highly abstract so the underlying mechanics are obscured.
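To make the contrast concrete, here is a minimal Python sketch of the two payoff rules using the numbers from the paragraphs above (\$100 lent at 5%, or \$100 for a 10% share). The function names are my own, and treating a loss as a zero payout is an assumption about how the equity case behaves.

```python
def debt_payment(principal: float, interest_rate: float) -> float:
    """Fixed amount owed after one year, whatever the company earned."""
    return principal * (1 + interest_rate)


def equity_payment(profit: float, share: float) -> float:
    """Investor's slice of the year's profit; assumed to be zero in a loss year."""
    return max(profit, 0.0) * share


if __name__ == "__main__":
    # Debt pays the same in every scenario; equity tracks the profit.
    for profit in (0.0, 50.0, 60.0, 70.0):
        print(
            f"profit={profit:>5.0f}  "
            f"debt owes={debt_payment(100.0, 0.05):.2f}  "
            f"equity pays={equity_payment(profit, 0.10):.2f}"
        )
```

The debt column never moves, which is exactly why the risk sits with the borrower; the equity column moves one-for-one with profit, which is why the risk sits with the investor.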

### Equity: Less Risky than Debt

One of the biggest advantages of this system is that it moves the risk from the individual borrower to the investor. When you borrow money, you put yourself at substantial risk. What if you struggle to find a good job after college? You’re still obliged to make repayments, which could be very difficult if you have to accept a very low-paying job. Or if you borrow money after college, what if you lose your job? Or have a family emergency? Your circumstances have deteriorated, but you’re still obliged to make the same level of payments – meaning your post-debt income falls proportionally even further than your pre-debt income.

With equity, on the other hand, you don’t have the risk. If you don’t find a job after college, your income will be zero, so your repayments will be X% of 0 – namely 0. If you find a low-paying job, your repayments will be low. The investors will be made whole by the people who instead find high-paying jobs – who can also afford to repay more. So equity investments better match up your repayment obligations with your ability to repay.

Here is a chart showing the difference, in terms of the % of income you’d be spending on repayment, from an example worked out later in the article:

The risk is transferred to the investor, who now loses out if you don’t have much income. But they are in a much better position to deal with the risk – they can diversify, investing in many different people, and also in other asset classes. Some human capital contracts could be a good diversifying addition to a conventional portfolio of stocks and bonds.

### Education

Funding higher education is perhaps the best application for Human Capital Contracts.

Firstly, this is an extremely risky investment. There are countless stories of people who took out huge student loans to fund an arts degree and then had their lives dominated by the struggle to repay. Alternatively, if people could discharge education debts through bankruptcy, the risk to the lender would be too great, as the borrowers typically lack collateral, so loans would be available only at prohibitively high interest rates, if at all. Selling equity shares would avoid this problem; people who did badly after school would only have to repay a minimal amount, but lenders could afford to offer relatively generous terms because the average would be pulled up by the occasional very successful student.

The other appeal is the information such a market would provide students. It is fair to say that many students don’t really understand the long-term consequences of their choices. The information available on the future paths opened up by different majors is poor quality – at best, it tells you how well people who studied that major years ago have done, but the labor market has probably changed substantially over time. What students really want is forecasts of future returns to different colleges and majors, but this is very difficult! And many people are not even aware of the backwards-looking data. The situation isn’t improved by professors, who generally lack experience outside academia, and sometimes simply lie! I remember being told by a philosophy professor that philosophers were highly in demand due to the “transferable thinking skills” – despite the total lack of evidence for such an effect. Human Capital Contracts would largely solve this problem.

TIPS markets provide a forecast of future inflation. Population-linked bonds would provide similar forecasts of future population growth. Similarly, Human Capital Contracts could provide forecasts of the future returns to future degrees. Lenders would expect higher returns to some colleges and majors (Stanford Computer Science vs No-Name Communications Studies), and so would be willing to accept lower income shares for people who chose those majors. As such, being offered financing for a small percentage would indicate that the market expected this to be a profitable degree. Being offered financing only for a large percentage would be a sign that the degree would not be very profitable. Some people would still want to do it for love rather than money, but many would not – saving them from spending four years and a lot of money on a decision they’d subsequently regret.

What could make clearer the difference in expected outcomes than being offered the choice between Engineering for 1% or Fine Art for 3%?

Certainly I think I would have benefited from having this information available. Most people probably know that Computer Science pays better than English Literature, but that’s probably not a pair many people are choosing from. I was choosing between Physics, Math, Economics or History for my major. I knew that History would probably pay less, but didn’t have a strong view on the relative earnings of the others. I probably would have guessed that math beat physics, for example, but in retrospect I think physics probably actually beats math.

Astute readers might object here that I am conflating the benefits of the type of financing (debt vs equity) with the mechanism for pricing the financing (free market or price fixed). If there was a free market in debt financing, lenders could charge different interest rates, and these would provide information to the students. This is true, except that 1) the interest rate would only tell you about the risk you’d end up super-poor, rather than providing information about the full distribution of outcomes, and 2) as student loans cannot be discharged through bankruptcy, there’s not really much reason for lenders to differentiate between candidates. If student loans could be discharged through bankruptcy, the interest rates charged would be informative but also probably very high. Perhaps this would be a good thing!

#### Education Funding – some illustrative examples.

Because it can be hard to think about these things in the abstract, I’ve tried to produce some worked-out examples. Suppose someone borrows \$100,000, and then starts out earning \$50,000 when they graduate. Their income grows over time, as they gain experience (maturity) and the economy grows (NGDP/capita). If we assume a 6% return for investors and a 20 year duration, they would have to give up just over 5% of their income over this time period. The repayments would be much more manageable – in year one, it would represent just 5% of their income, as opposed to 17% if they used debt.
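For readers who want to play with the numbers, here is a rough Python sketch of how such an example can be set up. The debt side uses the standard level-payment annuity formula, which does give roughly 17% of a \$50,000 income in year one. The equity side solves for the income share whose present value equals the cash advanced. The income path here is a placeholder: the constant 5% growth rate is purely my assumption, since the exact maturity and NGDP/capita figures behind the charts are not given, so the share it prints should not be expected to match the figures quoted in the text.

```python
def level_debt_payment(principal, rate, years):
    """Annual payment on a fully amortizing loan (standard annuity formula)."""
    return principal * rate / (1 - (1 + rate) ** -years)


def income_path(starting_income, growth, years):
    """Income in each year, growing at a constant (hypothetical) rate."""
    return [starting_income * (1 + growth) ** t for t in range(years)]


def equity_share(principal, incomes, required_return):
    """Share of income whose discounted value equals the cash advanced."""
    pv_income = sum(
        income / (1 + required_return) ** (t + 1)
        for t, income in enumerate(incomes)
    )
    return principal / pv_income


if __name__ == "__main__":
    payment = level_debt_payment(100_000, 0.06, 20)
    print(f"debt payment: {payment:,.0f} per year, "
          f"{payment / 50_000:.1%} of year-one income")

    # ASSUMPTION: 5% annual income growth, chosen only for illustration.
    incomes = income_path(50_000, 0.05, 20)
    share = equity_share(100_000, incomes, 0.06)
    print(f"equity share under the assumed income path: {share:.2%}")
```

Changing `required_return` to 0.10 gives the flavour of the higher-return scenario discussed next; steeper income growth lowers the share investors need.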

Now, investors might demand a higher rate of return for equity investments, as they’re riskier than debt (but then again maybe not). Here are the same calculations, but assuming equity investors require a 10% return vs 6% on debt:

What happens if the student runs into financial difficulty later in life? Here’s what happens if their income falls by 50% and never really recovers:

With equity, the hit is affordable, but with debt they have to pay 20% of their income in debt repayments – perhaps at the same time as having medical problems.

And what about the information value? Well, the investors would be willing to offer them \$100,000 for just a 5.6% share, instead of 7.85%, if they took a major that would offer a \$70,000 starting salary instead.
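The 5.6% figure can be recovered with one line of arithmetic, under the assumption (mine, but it seems to be what the example implies) that the whole income path scales in proportion to the starting salary. In that case the required share scales inversely with it. A small Python sketch, with a hypothetical helper name:

```python
def rescaled_share(base_share, base_salary, new_salary):
    """Share needed to raise the same cash if every year's income
    scales by new_salary / base_salary (an illustrative assumption)."""
    return base_share * base_salary / new_salary


if __name__ == "__main__":
    share = rescaled_share(0.0785, 50_000, 70_000)
    print(f"{share:.2%}")  # about 5.6%
```

The same helper can be used to read the signal in reverse: a lower quoted share implies the market expects proportionally higher earnings from that major.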

### Mentors – Incentive Alignment

The modern world is very complicated, and we can’t expect people to understand all of it. Which is fine, except when it comes to understanding contracts, or credit cards, or multi-level-marketing schemes. At times the complexity of the modern world allows people to be taken advantage of, even in transactions which would be perfectly legitimate had the participants been better informed.

Equity investments have the potential to help a lot here. All of a sudden I have a third party who is genuinely concerned with maximizing my income. I could ask them for advice about looking for a job. Perhaps they could negotiate a raise for me. Indeed, they might even line up new jobs for me! Obviously their incentives are not totally aligned with mine. Except insomuch as happy workers are more productive, they might not put much weight on how pleasant the job is. But true incentive alignment is rare in general; even your parents’ or your spouse’s incentives aren’t perfectly aligned, and the government’s certainly aren’t. Even better, it’s very clear exactly how and to what degree my investor’s incentives are aligned with mine: I don’t need to try and work out their angle. I can trust them on monetary affairs, and ignore their advice (if they offered any) with regard to hobbies or friendships or whatever else.

Indeed, you could imagine schools that funded themselves entirely through equity investments in their students, and advertised this as a strength: their incentives are well aligned with their students. They would teach only the most useful skills, as efficiently as possible, and actively support your future career progression. This is basically the model App Academy uses:

App Academy is as low-risk as we can make it.

App Academy does not charge any tuition. Instead, you pay us a placement fee only if you find a job as a developer after the program. In that case, the fee is 18% of your first year salary, payable over the first 6 months after you start working.

source

Compare this to current universities, which actively push minority students out of STEM majors to maintain graduation rates.

### Progressive

A clear implication of equity financing is that people who go on to earn more for ex ante unpredictable reasons will pay more than those who are ex post unlucky. As such, this system is mildly redistributive in a manner many people find attractive – like a sort of idealized social insurance that Luck Egalitarians like talking about. The lucky rich pay more and the unlucky poor pay less. Even better, it manages to do so in a voluntary way.

### Taxation

The idea of human capital contracts may sound very strange. But we actually already have something similar in taxation. Governments invest in the education, health etc. of their citizens, and then levy taxes upon them. These taxes tend to be proportional to one’s ability to pay; they are some fraction of income, or expenditures (sales taxes). So Human Capital Contracts should feel familiar to socialists and the like.

There are of course a few differences between Human Capital Contracts and taxation. For example,

• Human Capital Contracts are optional, whereas taxation is mandatory.
• Human Capital Contracts give you more choice about what you spend the money on, whereas governments typically give you little choice.
• Finally, Human Capital Contracts are customizable; you could negotiate different terms with the lender (like the % share you’re selling, or the income level at which you start repaying, or the timing of repayments), whereas individuals rarely get much choice about the taxes they will be made to pay.

Indeed, the advantages of human capital contracts suggest a new way of doing taxation: the state could simply claim a certain % ownership of its citizens. Perhaps it might demand a higher % for those who use public education or public healthcare.

The idea of the state literally owning (a stake in) its citizens, without their consent, might sound evil. But this is basically what the government already does with taxation – it claims a certain fraction of your income, leaving you no recourse. Even renouncing your citizenship will not persuade the IRS to let its property go. Human Capital Contracts just make it more explicit that the governments of most countries effectively own somewhere between 30% and 60% of their populations. Worse, if they want to they can increase their ownership stake without the consent of those affected. Compared to this, it is hard to make voluntary Human Capital Contracts sound problematic.

However, this suggests a danger with equity investments in people. At the moment you can escape most governments by fleeing abroad. The couple of exceptions are largely viewed as immoral aberrations, not the rightful state of affairs. This exit-right provides a vital check on their power, and forces them to compete to some degree. Without it they can descend to the most abusive tyranny. If equity investments became widely recognized, however, governments might start to recognize each other’s ownership of their populations to a greater degree than now, which would make them harder to escape. Of course, virtually any innovation can be opposed by pointing out that it makes it easier for governments to oppress ‘their’ populace, from coinage to maps to cell phones. Perhaps a more powerful government would be a more benign one, as many different people have argued – though perhaps not.

### Mechanics

Operationally this would be slightly more complicated than taking out a standard loan, because the amount owed to the lender would be variable. As such, they need to verify my income so they can check I’m repaying the correct amount. There are many ways this could be done, but an obvious one would be through the tax system; I would submit to the lender a copy of my tax return to show my annual income. Perhaps this could be automated through TurboTax. An even easier option would be if the payments were deducted from my paychecks – this is how English student loans work.

### Possible Regulations

One option for regulating the system would be to impose a maximum amount of equity an individual could sell. This would prevent people from selling 100% of themselves, which might be a bad idea! Though for-profit investors would probably be uninterested in buying up to 100%, as the individual would lack any reason to actually work. Probably the only people interested in buying 100% ownership would be cults, communist co-ops and terrorist movements.

Another would be to regulate the contingencies that could be attached to such contracts.

A third would be to prohibit the investor from employing the investee, or vice versa.

One of the biggest impediments to such a system might be adverse selection. Students have ‘insider information’ about their future prospects – they know about their career plans. The less you expect to earn, the more attractive selling equity is over the fixed payments of debt. Conversely, the more you expect to earn, the less attractive equity is vs debt. As such, the students who opted for equity financing might be disproportionately the students with the lowest expected outcomes. This would increase the % investors would demand in return for funding, further deterring the higher-expectation students, until eventually only the very lowest-expectation students would remain in the pool.

We could imagine this being a big issue in some subjects, like physics, where there is a large variance in income for the different exit routes – grad school vs industry vs quant finance. For others it’s less of an issue; if you go to law school you’re probably aiming to become a lawyer, though even there you might choose between criminal or corporate law.

However, there are several factors which would militate against such an outcome. Firstly, the risk aversion we discussed earlier means students would probably be willing to pay a substantial amount to avoid the risk associated with debt. Adverse selection would mean it would be even more attractive to students pessimistic about their long-term earnings, but so long as it is attractive enough for the optimistic ones, it would still work.
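The unraveling story, and the way risk aversion can stop it, can be illustrated with a toy model in Python. Everything in it is a simplifying assumption of mine rather than part of the proposal: students know their lifetime income exactly, investors charge everyone the single share that breaks even on the current pool, and each student will pay up to a fixed `risk_premium` over the cost of debt to avoid its risk. Under break-even pricing a student's cost relative to debt is their income divided by the pool average, so a student stays only if their income is at most the pool average times one plus the premium.

```python
from statistics import mean


def surviving_pool(expected_incomes, risk_premium):
    """Repeat the opt-out round until nobody else wants to leave.

    expected_incomes: each student's privately known lifetime income.
    risk_premium: extra fraction a student will pay to avoid debt risk
                  (a hypothetical parameter for this sketch).
    """
    pool = sorted(expected_incomes)
    while pool:
        # Students above this cutoff find the pooled equity price too high.
        cutoff = mean(pool) * (1 + risk_premium)
        stayers = [income for income in pool if income <= cutoff]
        if stayers == pool:
            break  # stable: the price no longer drives anyone out
        pool = stayers
    return pool


if __name__ == "__main__":
    incomes = [1000, 1500, 2000, 2500, 3000]  # illustrative units
    print(surviving_pool(incomes, risk_premium=0.0))  # [1000]
    print(surviving_pool(incomes, risk_premium=0.5))  # all five remain
```

With no premium the pool collapses to the lowest earner, which is the pure adverse-selection outcome; with a large enough premium even the highest earner stays and the market holds together.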

Indeed, this is basically how it works for health insurance. In theory adverse selection is a problem for private health insurance, but in practice there is not much evidence that it is; healthy people still buy health insurance.

The effect would also be substantially reduced by students’ own lack of knowledge about their futures. Many students change their mind over the course of their studies about what they want to go on to do. So some low-expectation students might take out equity financing, thinking they were being cunning… and then change to a high paying career track! This seems to be the more common direction of travel in general; students go to college planning on becoming human rights lawyers, or engineers, or artists, but instead end up as corporate lawyers, investment bankers and advertisers.

So this is a problem I’d expect the bond/equity/insurance market to be more than capable of dealing with.

Here are some more ideas where equity investments in people could be useful. The idea could still be valuable even without these though; education is probably the best use-case.

#### Parents

Once upon a time the land was rich and fruitful, and the people were fecund with beautiful offspring.

… maybe that never happened, but fertility rates definitely have fallen over time, probably to our detriment. Future people matter, a lot! And even if they didn’t, we still need someone to fund social security.

One guess as to why fertility has dropped is that once upon a time your children could be relied upon to live near you, following your customs, and supporting you in your old age, though it’s unclear if this ever made strictly economic sense. Now, however, children feel much less moved by filial piety, and frequently move far away. As such, parents see much less value in having children, and only do so out of charity – raising a child takes a lot of effort, and the modern world is full of super-stimuli to distract you from productive procreation. Giving parents a small equity stake in their children would go some way towards recognizing the investment parents put into their children, and hopefully boost fertility rates.

It would also encourage parents to support their children and their careers; now the high-flying child is not merely a source of pride but also a source of retirement. A friend I discussed this with suggested that first-generation immigrants tended to give their children very practical advice about school, careers and relationships, whereas whites tend to be more wishy-washy; perhaps this would promote a return to reality-based parenting.

#### Divorce settlement

Another niche case where these could be useful would be divorce settlements. The classic feminist argument about divorce settlements was that the woman had invested in domestic and family labor, which was disrupted by the divorce, while the man had invested in his career, which he kept. Partly as a result of arguments like this, we now see divorce settlements where one party gets a claim to some of the resources of the other.

However, a fixed sum is not a very natural way of dealing with this. The woman, in entering marriage, assumed she would be benefiting from a certain share of the man’s output. If he were successful, this would be more; if he came upon poor fortune, this would be less, with no need for a costly and messy court case to adjust the payments. Human Capital Contracts would allow a divorce settlement to recognize this: in a divorce where the man was at fault, the woman might be granted a 1% equity share for each year of marriage.

Obviously if you thought permitting divorce was a mistake – “til death do us part” – then you’d have little interest in this application.

## Precisely Bound Demons and their Behavior

11 06 March 2015 08:01PM

EY posted this on reddit, I'd like to know what you'd do with it:

I can't promise this will turn into a sufficiently good environment for storytelling or that I'll write in it, but you never know unless you try, and worldbuilding can be fun regardless... One in X people (X ~ 10,000?) has the ability to summon demons, once per Y days, and bind them to arbitrary commands at will. Demons are malevolent and will interpret any instruction in such ways as to cause the most damage. Evil summoners can sometimes reach an accommodation of sorts by giving the demons orders which benefit themselves and hurt others more, in which case the demon will often go along with it, most of the time.

Most good people with the ability to summon demons were advised never to do so, unless it became necessary to defeat an evil demon-summoner creating horror on a mass scale.

This world's Industrial Revolution began when it was realized that mathematically precise and complete commands to demons apparently could not be misinterpreted. For example (this could perhaps be picked apart): A demon told to accelerate a vehicle along an exactly given vector for a specified time, applying the same added acceleration at any given time to all particles in the vehicle, and causing no other impact on the material universe, will do only that... if the language of the contract can be mathematically specified in an absolutely unambiguous way. (What exactly is the 'vehicle'? Maybe you'd better have the demon apply acceleration to a sphere to which the engine car is attached.)

Demon-summoners promptly began to use their powers in the most economically rewarding way, such as by summoning demons who would just accelerate particular train engine cars; and this occurred on a mass scale throughout society.

This is a point where I wouldn't mind help worldbuilding: given this basic setup, what industrially useful demonic bindings can be precisely specified? Suppose the world is such that electricity doesn't exist, but fire does, and steam. Demon summoners will end up being rare enough, whatever frequency is 'rare enough', that the society doesn't come apart as the result of whatever powers you invent.

Bindings can also tell demons to act based on the result of a calculation, if that calculation is precisely specified. There is no known limit on how much calculation can be done this way. If a demon is told to behave in a way that depends on a calculation that does not halt, it is the same as telling the demon "do what you want", which is a very bad thing to tell a demon (though for poorly understood reasons, demons' most malevolent free actions are not as destructive as the worst human commands). Summoners are well-advised to tell the demon to only compute something for a bounded number of steps, though no known limit exists on how high the bound can be.

From our perspective, they discovered that demons can act like unboundedly large and fast computers.

This kind of demonic calculation has been previously used to investigate interesting math questions and create demons that e.g. loft steerable airplanes. But as the calculations used in spiritual industry grow more complex, people have the bright idea that cognitive calculations can also be specified. They begin to publish specifications for simple cognitive constructs, like gradient-descent sigmoid neural networks. It would be useful (think those spiritualists) if demons could be told to recognize particular faces by recourse to a neural network, without giving any demon underspecified instructions about 'if you recognize person X' that would allow their malevolence room to act.

Shortly thereafter, the world ends.

Our N protagonists find themselves in a Groundhog Day Loop of period ???, trying to prevent the seemingly inevitable end of the world that occurs when some damned idiot summoner, somewhere, instructs a demon to act like the equivalent of AIXI-tl. For reasons that are unclear, even though 'natural' demons don't instantly destroy the world given an instruction like 'do what you want', the cognitively bound equivalent of AIXI-tl can construct self-replicating agentic goo in the environment in order to serve its purposes (in the case of AIXI-tl, maximizing a reward channel).

After some failures trying to prevent the end of the world the normal way, the thought has occurred to our protagonists that the only Power great enough to prevent the end of the world would be a demon bound to implement a 'nice' superpowerful cognitive binding, or at least a cognitive binding that carries out intuitively specified instructions well enough to shut down all attempts at summoning non-value-aligned cognitive demons.

But the mathematical technology that the Looped summoners presently have for specifying cognitive bindings is incredibly primitive - at the level of AIXI-tl. They can't even solve a problem like 'Specify an advanced agent that, otherwise given freedom to act on the material universe however it likes, just wants to flip a certain button and then shut itself off in an orderly and nondestructive fashion, without e.g. constructing any other agents to maximize the probability of it being absolutely shut off forever, etc.'

And doing research on this topic, at least openly, does tend to destroy the world before the non-Looped researchers can get properly started. If you say "Can we have a non-destructive version of AIXI-tl?" then somebody goes off and summons AIXI-tl.

The story opens well into the Loops, as the Loopers try to conquer the world and restrain all other summoners in order to create an environment where they can actually get some collaborative research done before the end of a Loop, and maybe live in a world for longer than ??? days for bloody once. They are, of course, regarded as supervillains by the general public. Being not a little crazy by this point, many of them are happy to play the part so far as that goes - wear black, live in a dark castle, accept the service of the sort of member of the appropriate sex who wants to swear themselves to a supervillain, etcetera.

Demons seem blind to the Loops, so some Loopers may also be using seemingly destructive ordinary demonic contracts to gain an advantage. Opinions differ among the Loopers as to what degree the Loops are real, other people in the Loops are worth optimizing for, etcetera. "If those other people are even real in the same way we are, they're all going to die anyway and go on dying until we end this somehow" is a common but not universal sentiment.

The questions I pose to you:

• What sort of industrially scaled, or personally awesome uses for a mathematically specified, precisely bound demon can you imagine? What was the prior world that existed before the Loops?
• What kind of advantage do our Loopers have from their preliminary research into cognitive demons?
• How are they trying to take over the world in the first written Loop?
• What sort of really awesome character would you like to see in this situation? Feel free to pick references from fiction, e.g. "BBC!Sherlock". My trying to write them played straight will just generate a new Yudkowskian character.

Among other things, the Groundhog Day format hopefully means that I can have characters freely do what a subreddit and/or high bidders suggest, within the limits of my own filtering for intelligent action; and when that all goes pear-shaped, it's back to the next reset.

If anyone can give an unboundedly-computable specification of either a nice Sovereign agent, or less improbably, a trainable good Genie, the characters Win. While I can't make promises in my own person at this point, if that started to be a reasonable prospect, I'd expect I could swing a million-dollar prize to be set up for that perhaps improbable case. It's not like there are better uses for money.

As is my usual practice, the world and characters would be open for anyone else to use and profit on.

ADDED 1: Demons have limits as to how much material force they can exert, within what range. You cannot summon a demon and tell it to hurl the moon into the sun. Pulling a train is about as much as they can do. AIXI-tl kills by creating self-replicating smart goo, not by instantly optimizing the whole universe from within its local radius. Demons cannot be used for long-range communication, except by making flashes of light that are seen elsewhere.

ADDED 2: Demons are cunning but can still often be outwitted by clever humans... unless you've given the demon precise instructions to act on the material world in a way that depends on a calculation, in which case that calculation can be arbitrarily powerful. You can't instruct a demon 'make nanotech' (not that this would ever be a good idea) because the demon isn't smart enough to figure that out on its own without a calculatory binding.

ADDED 3: Name not set in stone, better names welcome.

## False thermodynamic miracles

11 05 March 2015 05:04PM

A putative new idea for AI control; index here.

Ok, here is the problem:

• You have to create an AI that believes (or acts as if it believed) that event X is almost certain, while you believe that X is almost impossible. Furthermore, you have to be right. To make things more interesting, the AI is much smarter than you, knows everything that you do (and more), and has to react sensibly when event X doesn't happen.

Answers will be graded on mathematics, style, colours of ink, and compatibility with the laws of physics. Also, penmanship. How could you achieve this?

10 27 March 2015 02:02PM

To keep this post manageable in length, I have only included a small subset of the illustrative examples and discussion. I have published a longer version of this post, with more examples (but the same intro and concluding section), on my personal site.

Last year, during the months of June and July, as my work for MIRI was wrapping up and I hadn't started my full-time job, I worked on the Wikipedia Views website, aimed at easier tabulation of the pageviews for multiple Wikipedia pages over several months and years. It relies on a statistics tool called stats.grok.se, created by Domas Mituzas, and maintained by Henrik.

One of the interesting things I noted as I tabulated pageviews for many different pages was that the pageview counts for many already popular pages were in decline. Pages of various kinds peaked at different historical points. For instance, colors have been in decline since early 2013. The world's most populous countries have been in decline since as far back as 2010!

#### Defining the problem

The first thing to be clear about is what these pageviews count and what they don't. The pageview measures are taken from stats.grok.se, which in turn uses the pagecounts-raw dump provided hourly by the Wikimedia Foundation's Analytics team, which in turn is obtained by processing raw user activity logs. The pagecounts-raw measure is flawed in two ways:

• It only counts pageviews on the main Wikipedia website and not pageviews on the mobile Wikipedia website or through Wikipedia Zero (a pared down version of the mobile site that some carriers offer at zero bandwidth costs to their customers, particularly in developing countries). To remedy these problems, a new dump called pagecounts-all-sites was introduced in September 2014. We simply don't have data for views of mobile domains or of Wikipedia Zero at the level of individual pages for before then. Moreover, stats.grok.se still uses pagecounts-raw (this was pointed out to me in a mailing list message after I circulated an early version of the post).
• The pageview count includes views by bots. The official estimate is that about 15% of pageviews are due to bots. However, the percentage is likely higher for pages with fewer overall pageviews, because bots have a minimum crawling frequency. So every page might have at least 3 bot crawls a day, resulting in a minimum of about 90 bot pageviews a month even if there are only a handful of human pageviews.
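To make the bot floor concrete, here is a minimal sketch of how the bot share of a page's monthly views would vary with human traffic. It assumes the illustrative figure above of 3 bot crawls a day and a 30-day month; the function name and the sample human-view counts are hypothetical, not measured data.

```python
# Illustrative only: assumes a fixed floor of 3 bot crawls per day (the
# example figure from the text) and a 30-day month.
BOT_CRAWLS_PER_DAY = 3
DAYS_PER_MONTH = 30


def bot_share(human_views_per_month: int) -> float:
    """Fraction of a page's monthly pageviews that come from the bot floor."""
    bot_views = BOT_CRAWLS_PER_DAY * DAYS_PER_MONTH  # 90 views per month
    total_views = bot_views + human_views_per_month
    return bot_views / total_views


# Hypothetical human-view counts, chosen only to show the trend.
for human_views in (10, 100, 1_000, 10_000):
    share = bot_share(human_views)
    print(f"{human_views:>6} human views/month -> bot share {share:.1%}")
```

Under these assumptions the bot share shrinks quickly as human traffic grows, which is why a single site-wide bot percentage can understate the problem for rarely viewed pages.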

Therefore, the trends I discuss will refer to trends in total pageviews for the main Wikipedia website, including page requests by bots, but excluding visits to mobile domains. Note that visits from mobile devices to the main site will be included, but mobile devices are by default redirected to the mobile site.

#### How reliable are the metrics?

As noted above, the metrics are unreliable because of the bot problem and the issue of counting only non-mobile traffic. German Wikipedia user Atlasowa left a message on my talk page pointing me to an email thread suggesting that about 40% of pageviews may be bot-related, and discussing some interesting examples.

#### Relationship with the overall numbers

I'll show that for many pages of interest, the number of pageviews as measured above (non-mobile) has declined recently, with a clear decline from 2013 to 2014. What about the total?

We have overall numbers for non-mobile, mobile, and combined. The combined number has largely held steady, whereas the non-mobile number has declined and the mobile number has risen.

What we'll find is that the decline for most pages that have been around for a while is even sharper than the overall decline. One reason overall pageviews haven't declined so fast is the creation of new pages. To give an idea, non-mobile traffic dropped by about 1/3 from January 2013 to December 2014, but for many leading categories of pages, traffic dropped by somewhere between 1/2 and 2/3.

#### Why is this important? First reason: better context for understanding trends for individual pages

People's behavior on Wikipedia is a barometer of what they're interested in learning about. An analysis of trends in the views of pages can provide an important window into how people's curiosity, and the way they satisfy this curiosity, is evolving. To take an example, some people have proposed using Wikipedia pageview trends to predict flu outbreaks. I myself have tried to use relative Wikipedia pageview counts to gauge changing interests in many topics, ranging from visa categories to technology companies.

My initial interest in pageview numbers arose because I wanted to track my own influence as a Wikipedia content creator. In fact, that was my original motivation with creating Wikipedia Views. (You can see more information about my Wikipedia content contributions on my site page about Wikipedia).

Now, when doing this sort of analysis for individual pages, one needs to account for, and control for, overall trends in the views of Wikipedia pages that are occurring for reasons other than a change in people's intrinsic interest in the subject. Otherwise, we might falsely conclude from a pageview count decline that a topic is falling in popularity, whereas what's really happening is an overall decline in the use of (the non-mobile version of) Wikipedia to satisfy one's curiosity about the topic.
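One simple way to apply such a control is to compare a page's retention of traffic with the retention of overall non-mobile traffic over the same period. The sketch below is an assumption about how one might do this, not the method used by Wikipedia Views; the function name is hypothetical, and the sample figures are the color-page monthly totals and the roughly one-third overall drop quoted elsewhere in this post.

```python
def relative_retention(page_before: float, page_after: float,
                       overall_before: float, overall_after: float) -> float:
    """Page retention divided by overall retention.

    1.0 means the page moved exactly with overall traffic; a value below 1.0
    means the page lost more traffic than the site as a whole.
    """
    page_retention = page_after / page_before
    overall_retention = overall_after / overall_before
    return page_retention / overall_retention


# Color pages in total: 691K views (January 2013) -> 260K views (December 2014).
# Overall non-mobile traffic fell by about 1/3, written here as 3 -> 2 in
# arbitrary units.
adjusted = relative_retention(691_000, 260_000, 3, 2)
print(f"Color pages retained {adjusted:.2f} of what the overall trend predicts")
```

A value well below 1.0 suggests that the overall decline alone does not account for the drop in the pages being studied.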

#### Why is this important? Second reason: a better understanding of the overall size and growth of the Internet

Wikipedia has been relatively mature and has had the top spot as an information source for at least the last six years. Moreover, unlike almost all other top websites, Wikipedia doesn't try hard to market or optimize itself, so trends in it reflect a relatively untarnished view of how the Internet and the World Wide Web as a whole are growing, independent of deliberate efforts to manipulate and doctor metrics.

#### The case of colors

Let's look at Wikipedia pages on some of the most viewed colors (I've removed the 2015 and 2007 columns because we don't have the entirety of these years). Colors are interesting because the degree of human interest in colors in general, and in individual colors, is unlikely to change much in response to news or current events. So one would at least a priori expect colors to offer a perspective into Wikipedia trends with fewer external complicating factors. If we see a clear decline here, then that's strong evidence in favor of a genuine decline.

I've restricted attention to a small subset of the colors, that includes the most common ones but isn't comprehensive. But it should be enough to get a sense of the trends. And you can add in your own colors and check that the trends hold up.

| Page name | Pageviews in year 2014 | Pageviews in year 2013 | Pageviews in year 2012 | Pageviews in year 2011 | Pageviews in year 2010 | Pageviews in year 2009 | Pageviews in year 2008 | Total | Percentage | Tags |
|---|---|---|---|---|---|---|---|---|---|---|
| Black | 431K | 1.5M | 1.3M | 778K | 900K | 1M | 958K | 6.9M | 16.1 | Colors |
| Blue | 710K | 1.3M | 1M | 987K | 1.2M | 1.2M | 1.1M | 7.6M | 17.8 | Colors |
| Brown | 192K | 284K | 318K | 292K | 308K | 300K | 277K | 2M | 4.6 | Colors |
| Green | 422K | 844K | 779K | 707K | 882K | 885K | 733K | 5.3M | 12.3 | Colors |
| Orange | 133K | 181K | 251K | 259K | 275K | 313K | 318K | 1.7M | 4 | Colors |
| Purple | 524K | 906K | 847K | 895K | 865K | 841K | 592K | 5.5M | 12.8 | Colors |
| Red | 568K | 797K | 912K | 1M | 1.1M | 873K | 938K | 6.2M | 14.6 | Colors |
| Violet | 56K | 96K | 75K | 77K | 69K | 71K | 65K | 509K | 1.2 | Colors |
| White | 301K | 795K | 615K | 545K | 788K | 575K | 581K | 4.2M | 9.8 | Colors |
| Yellow | 304K | 424K | 453K | 433K | 452K | 427K | 398K | 2.9M | 6.8 | Colors |
| Total | 3.6M | 7.1M | 6.6M | 6M | 6.9M | 6.5M | 6M | 43M | 100 | -- |
| Percentage | 8.5 | 16.7 | 15.4 | 14 | 16 | 15.3 | 14 | 100 | -- | -- |

Since the decline appears to have happened between 2013 and 2014, let's examine the 24 months from January 2013 to December 2014:

| Month | Views of page Black | Views of page Blue | Views of page Brown | Views of page Green | Views of page Orange | Views of page Purple | Views of page Red | Views of page Violet | Views of page White | Views of page Yellow | Total | Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 201412 | 30K | 41K | 14K | 27K | 9.6K | 28K | 67K | 3.1K | 21K | 19K | 260K | 2.4 |
| 201411 | 36K | 46K | 15K | 31K | 10K | 35K | 50K | 3.7K | 23K | 22K | 273K | 2.5 |
| 201410 | 37K | 52K | 16K | 34K | 10K | 34K | 51K | 4.5K | 25K | 26K | 289K | 2.7 |
| 201409 | 37K | 57K | 16K | 35K | 9.9K | 37K | 45K | 4.8K | 27K | 29K | 298K | 2.8 |
| 201408 | 33K | 47K | 14K | 34K | 8.5K | 31K | 38K | 3.9K | 21K | 22K | 253K | 2.4 |
| 201407 | 33K | 47K | 14K | 30K | 9.3K | 31K | 37K | 4.2K | 22K | 22K | 250K | 2.3 |
| 201406 | 32K | 49K | 14K | 31K | 10K | 34K | 39K | 4.9K | 23K | 22K | 259K | 2.4 |
| 201405 | 44K | 55K | 17K | 37K | 10K | 51K | 42K | 5.2K | 26K | 26K | 314K | 2.9 |
| 201404 | 34K | 60K | 17K | 36K | 14K | 38K | 47K | 5.8K | 27K | 28K | 306K | 2.8 |
| 201403 | 37K | 136K | 19K | 51K | 14K | 123K | 52K | 5.5K | 30K | 31K | 497K | 4.6 |
| 201402 | 38K | 58K | 19K | 39K | 13K | 41K | 49K | 5.6K | 29K | 29K | 321K | 3 |
| 201401 | 40K | 60K | 19K | 36K | 14K | 40K | 50K | 4.4K | 27K | 28K | 319K | 3 |
| 201312 | 62K | 67K | 17K | 44K | 12K | 48K | 48K | 4.4K | 42K | 26K | 372K | 3.5 |
| 201311 | 141K | 96K | 20K | 65K | 11K | 68K | 55K | 5.3K | 71K | 34K | 566K | 5.3 |
| 201310 | 145K | 102K | 21K | 69K | 11K | 77K | 59K | 5.7K | 71K | 36K | 598K | 5.6 |
| 201309 | 98K | 80K | 17K | 60K | 11K | 53K | 51K | 4.9K | 45K | 30K | 450K | 4.2 |
| 201308 | 109K | 87K | 20K | 57K | 20K | 57K | 60K | 4.6K | 53K | 28K | 497K | 4.6 |
| 201307 | 107K | 92K | 21K | 61K | 11K | 66K | 65K | 4.6K | 61K | 30K | 520K | 4.8 |
| 201306 | 115K | 106K | 22K | 69K | 13K | 73K | 64K | 5.5K | 70K | 33K | 571K | 5.3 |
| 201305 | 158K | 122K | 24K | 79K | 14K | 83K | 69K | 11K | 77K | 39K | 677K | 6.3 |
| 201304 | 151K | 127K | 28K | 83K | 14K | 86K | 74K | 12K | 78K | 40K | 694K | 6.4 |
| 201303 | 155K | 135K | 31K | 92K | 15K | 99K | 84K | 12K | 80K | 43K | 746K | 6.9 |
| 201302 | 152K | 131K | 31K | 84K | 28K | 95K | 84K | 17K | 77K | 41K | 740K | 6.9 |
| 201301 | 129K | 126K | 32K | 81K | 19K | 99K | 84K | 9.6K | 70K | 42K | 691K | 6.4 |
| Total | 2M | 2M | 476K | 1.3M | 314K | 1.4M | 1.4M | 152K | 1.1M | 728K | 11M | 100 |
| Percentage | 18.1 | 18.4 | 4.4 | 11.8 | 2.9 | 13.3 | 12.7 | 1.4 | 10.2 | 6.8 | 100 | -- |
| Tags | Colors | Colors | Colors | Colors | Colors | Colors | Colors | Colors | Colors | Colors | -- | -- |

As we can see, the decline appears to have begun around March 2013 and then continued steadily until about June 2014, at which point the numbers stabilized at their lower levels.

A few sanity checks on these numbers:

• The trends appear to be similar for different colors, with the notable difference that the proportional drop was higher for the more viewed color pages. Thus, for instance, black and blue saw declines from 129K and 126K to 30K and 41K respectively (factors of four and three respectively) from January 2013 to December 2014. Orange and yellow, on the other hand, dropped by factors of close to two. The only color that didn't drop significantly was red (it dropped from 84K to 67K, as opposed to factors of two or more for other colors), but this seems to have been partly due to an unusually large amount of traffic in the end of 2014. The trend even for red seems to suggest a drop similar to that for orange.
• The overall proportion of views for different colors comports with our overall knowledge of people's color preferences: blue is overall a favorite color, and this is reflected in its getting the top spot with respect to pageviews.
• The pageview decline followed a relatively steady trend, with the exception of some unusual seasonal fluctuation (including an increase in October and November 2013).
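The drop factors in the first sanity check can be recomputed from the monthly table. The sketch below uses the January 2013 and December 2014 rows as published (in thousands, so the results inherit the table's rounding); the variable names are illustrative.

```python
# Monthly views in thousands, copied from the January 2013 and December 2014
# rows of the table above (rounded as published).
jan_2013 = {"Black": 129, "Blue": 126, "Brown": 32, "Green": 81, "Orange": 19,
            "Purple": 99, "Red": 84, "Violet": 9.6, "White": 70, "Yellow": 42}
dec_2014 = {"Black": 30, "Blue": 41, "Brown": 14, "Green": 27, "Orange": 9.6,
            "Purple": 28, "Red": 67, "Violet": 3.1, "White": 21, "Yellow": 19}

# Drop factor: how many times larger January 2013 traffic was than
# December 2014 traffic for the same page.
drop_factor = {color: jan_2013[color] / dec_2014[color] for color in jan_2013}

# List the pages from the steepest drop to the mildest.
for color, factor in sorted(drop_factor.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{color:<7} {factor:.1f}x")
```

On these figures Black shows the steepest drop and Red the mildest, consistent with the discussion above.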

One might imagine that this is due to people shifting attention from the English-language Wikipedia to other language Wikipedias, but most of the other major language Wikipedias saw a similar decline at a similar time. More details are in my longer version of this post on my personal site.

#### Geography: continents and subcontinents, countries, and cities

Here are the views of some of the world's most populated countries between 2008 and 2014, showing that the peak happened as far back as 2010:

| Page name | Pageviews in year 2014 | Pageviews in year 2013 | Pageviews in year 2012 | Pageviews in year 2011 | Pageviews in year 2010 | Pageviews in year 2009 | Pageviews in year 2008 | Total | Percentage | Tags |
|---|---|---|---|---|---|---|---|---|---|---|
| China | 5.7M | 6.8M | 7.8M | 6.1M | 6.9M | 5.7M | 6.1M | 45M | 9 | Countries |
| India | 8.8M | 12M | 12M | 11M | 14M | 8.8M | 7.6M | 73M | 14.5 | Countries |
| United States | 13M | 15M | 18M | 18M | 34M | 16M | 15M | 129M | 25.7 | Countries |
| Indonesia | 5.3M | 5.2M | 3.7M | 3.6M | 4.2M | 3.1M | 2.5M | 28M | 5.5 | Countries |
| Brazil | 4.8M | 4.9M | 5.3M | 5.5M | 7.5M | 4.9M | 4.3M | 37M | 7.4 | Countries |
| Pakistan | 2.9M | 4.5M | 4.4M | 4.3M | 5.2M | 4M | 3.2M | 28M | 5.7 | Countries |
| Bangladesh | 2.2M | 2.9M | 3M | 2.8M | 2.9M | 2.2M | 1.7M | 18M | 3.5 | Countries |
| Russia | 5.6M | 5.6M | 6.5M | 6.8M | 8.6M | 5.4M | 5.8M | 44M | 8.8 | Countries |
| Nigeria | 2.6M | 2.6M | 2.9M | 3M | 3.5M | 2.6M | 2M | 19M | 3.8 | Countries |
| Japan | 4.8M | 6.4M | 6.5M | 8.3M | 10M | 7.3M | 6.6M | 50M | 10 | Countries |
| Mexico | 3.1M | 3.9M | 4.3M | 4.3M | 5.9M | 4.7M | 4.5M | 31M | 6.1 | Countries |
| Total | 59M | 69M | 74M | 74M | 103M | 65M | 59M | 502M | 100 | -- |
| Percentage | 11.7 | 13.8 | 14.7 | 14.7 | 20.4 | 12.9 | 11.8 | 100 | -- | -- |

Of these countries, China, India and the United States are the most notable. China is the world's most populous country. India has the largest population with some minimal English knowledge and legally (largely) unfettered Internet access to Wikipedia, while the United States has the largest population with quality Internet connectivity and good English knowledge. Moreover, in China and India, Internet use and access have been growing considerably in the last few years, whereas they have been relatively stable in the United States.

It is interesting that the year with the maximum total pageview count was as far back as 2010. In fact, 2010 was so significantly better than the other years that the numbers beg for an explanation. I don't have one, but even excluding 2010, we see a declining trend: gradual growth from 2008 to 2011, and then a symmetrically gradual decline. Both the growth trend and the decline trend are quite similar across countries.

We see a similar trend for continents and subcontinents, with the peak occurring in 2010. In contrast, the smaller counterparts, such as cities, peaked in 2013, similarly to colors, and the drop, though somewhat less steep than with colors, has been quite significant. For instance, a list for Indian cities shows that the total pageviews for these Indian cities declined from about 20 million in 2013 (after steady growth in the preceding years) to about 13 million in 2014.

#### Some niche topics where pageviews haven't declined

So far, we've looked at topics where pageviews have been declining since at least 2013, and some that peaked as far back as 2010. There are, however, many relatively niche topics where the number of pageviews has stayed roughly constant. But this stability itself is a sign of decay, because other metrics suggest that the topics have experienced tremendous growth in interest. In fact, the stability is even less impressive when we notice that it's a result of a cancellation between slight declines in views of established pages in the genre, and traffic going to new pages.

For instance, consider some charity-related pages:

| Page name | Pageviews in year 2014 | Pageviews in year 2013 | Pageviews in year 2012 | Pageviews in year 2011 | Pageviews in year 2010 | Pageviews in year 2009 | Pageviews in year 2008 | Total | Percentage | Tags |
|---|---|---|---|---|---|---|---|---|---|---|
| Against Malaria Foundation | 5.9K | 6.3K | 4.3K | 1.4K | 2 | 0 | 0 | 18K | 15.6 | Charities |
| Development Media International | 757 | 0 | 0 | 0 | 0 | 0 | 0 | 757 | 0.7 | Pages created by Vipul Naik, Charities |
| Deworm the World Initiative | 2.3K | 277 | 0 | 0 | 0 | 0 | 0 | 2.6K | 2.3 | Charities, Pages created by Vipul Naik |
| GiveDirectly | 11K | 8.3K | 2.6K | 442 | 0 | 0 | 0 | 22K | 19.2 | Charities, Pages created by Vipul Naik |
| International Council for the Control of Iodine Deficiency Disorders | 1.2K | 1 | 2 | 2 | 0 | 1 | 2 | 1.2K | 1.1 | Charities, Pages created by Vipul Naik |
| Nothing But Nets | 5.9K | 6.6K | 6.6K | 5.1K | 4.4K | 4.7K | 6.1K | 39K | 34.2 | Charities |
| Nurse-Family Partnership | 2.9K | 2.8K | 909 | 30 | 8 | 72 | 63 | 6.8K | 5.9 | Pages created by Vipul Naik, Charities |
| Root Capital | 3K | 2.5K | 414 | 155 | 51 | 1.2K | 21 | 7.3K | 6.3 | Charities, Pages created by Vipul Naik |
| Schistosomiasis Control Initiative | 4K | 2.7K | 1.6K | 191 | 0 | 0 | 0 | 8.5K | 7.4 | Charities, Pages created by Vipul Naik |
| VillageReach | 1.7K | 1.9K | 2.2K | 2.6K | 97 | 3 | 15 | 8.4K | 7.3 | Charities, Pages created by Vipul Naik |
| Total | 38K | 31K | 19K | 9.9K | 4.6K | 5.9K | 6.2K | 115K | 100 | -- |
| Percentage | 33.4 | 27.3 | 16.3 | 8.6 | 4 | 5.1 | 5.4 | 100 | -- | -- |

For this particular cluster of pages, we see the totals growing robustly year-on-year. But a closer look shows that the growth isn't that impressive. Whereas earlier, views were doubling every year from 2010 to 2013 (this was the take-off period for GiveWell and effective altruism), the growth from 2013 to 2014 was relatively small. And about half the growth from 2013 to 2014 was powered by the creation of new pages (including some pages created after the beginning of 2013, so they had more months in a mature state in 2014 than in 2013), while the other half was powered by growth in traffic to existing pages.
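The slowdown can be read off the Total row of the charity table. This is a minimal sketch using the rounded totals as published (in thousands), so the ratios are approximate; the variable names are illustrative.

```python
# Total row of the charity table, in thousands of pageviews (rounded as published).
totals = {2010: 4.6, 2011: 9.9, 2012: 19, 2013: 31, 2014: 38}

# Year-on-year growth ratio: this year's total divided by the previous year's.
growth = {year: totals[year] / totals[year - 1]
          for year in sorted(totals) if year - 1 in totals}

for year, ratio in growth.items():
    print(f"{year - 1} -> {year}: x{ratio:.2f}")
```

On these rounded totals the ratio falls each year, from roughly 2.2 for 2010 to 2011 down to roughly 1.2 for 2013 to 2014.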

The data for philanthropic foundations demonstrates a fairly slow and steady growth (about 5% a year), partly due to the creation of new pages. This 5% hides a lot of variation between individual pages:

| Page name | Pageviews in year 2014 | Pageviews in year 2013 | Pageviews in year 2012 | Pageviews in year 2011 | Pageviews in year 2010 | Pageviews in year 2009 | Pageviews in year 2008 | Total | Percentage | Tags |
|---|---|---|---|---|---|---|---|---|---|---|
| Atlantic Philanthropies | 11K | 11K | 12K | 10K | 9.8K | 8K | 5.8K | 67K | 2.1 | Philanthropic foundations |
| Bill & Melinda Gates Foundation | 336K | 353K | 335K | 315K | 266K | 240K | 237K | 2.1M | 64.9 | Philanthropic foundations |
| Draper Richards Kaplan Foundation | 1.2K | 25 | 9 | 0 | 0 | 0 | 0 | 1.2K | 0 | Philanthropic foundations, Pages created by Vipul Naik |
| Ford Foundation | 110K | 91K | 100K | 90K | 100K | 73K | 61K | 625K | 19.5 | Philanthropic foundations |
| Good Ventures | 9.9K | 8.6K | 3K | 0 | 0 | 0 | 0 | 21K | 0.7 | Philanthropic foundations, Pages created by Vipul Naik |
| Jasmine Social Investments | 2.3K | 1.8K | 846 | 0 | 0 | 0 | 0 | 5K | 0.2 | Philanthropic foundations, Pages created by Vipul Naik |
| Laura and John Arnold Foundation | 3.7K | 13 | 0 | 1 | 0 | 0 | 0 | 3.7K | 0.1 | Philanthropic foundations, Pages created by Vipul Naik |
| Mulago Foundation | 2.4K | 2.3K | 921 | 0 | 1 | 1 | 10 | 5.6K | 0.2 | Philanthropic foundations, Pages created by Vipul Naik |
| Omidyar Network | 26K | 23K | 19K | 17K | 19K | 13K | 11K | 129K | 4 | Philanthropic foundations |
| Peery Foundation | 1.8K | 1.6K | 436 | 0 | 0 | 0 | 0 | 3.9K | 0.1 | Philanthropic foundations, Pages created by Vipul Naik |
| Robert Wood Johnson Foundation | 26K | 26K | 26K | 22K | 27K | 22K | 17K | 167K | 5.2 | Philanthropic foundations |
| Skoll Foundation | 13K | 11K | 9.2K | 7.8K | 9.6K | 5.8K | 4.3K | 60K | 1.9 | Philanthropic foundations |
| Smith Richardson Foundation | 8.7K | 3.5K | 3.8K | 3.6K | 3.7K | 3.5K | 2.9K | 30K | 0.9 | Philanthropic foundations |
| Thiel Foundation | 3.6K | 1.5K | 1.1K | 47 | 26 | 1 | 0 | 6.3K | 0.2 | Philanthropic foundations, Pages created by Vipul Naik |
| Total | 556K | 533K | 511K | 466K | 435K | 365K | 340K | 3.2M | 100 | -- |
| Percentage | 17.3 | 16.6 | 15.9 | 14.5 | 13.6 | 11.4 | 10.6 | 100 | -- | -- |

#### The dominant hypothesis: shift from non-mobile to mobile Wikipedia use

The dominant hypothesis is that pageviews have simply migrated from non-mobile to mobile. This is most closely borne out by the overall data: total pageviews have remained roughly constant, and the decline in total non-mobile pageviews has been roughly canceled by growth in mobile pageviews. However, the evidence for this substitution doesn't exist at the level of individual pages because we don't have pageview data for the mobile domain before September 2014, and much of the decline occurred between March 2013 and June 2014.

What would it mean if there were an approximate one-on-one substitution from non-mobile to mobile for the page types discussed above? For instance, non-mobile traffic to colors dropped to somewhere between 1/3 and 1/2 of their original traffic level between January 2013 and December 2014. This would mean that somewhere between 1/2 and 2/3 of the original non-mobile traffic to colors has shifted to mobile devices. This theory should be at least partly falsifiable: if the sum of traffic to non-mobile and mobile platforms today for colors is less than non-mobile-only traffic in January 2013, then clearly substitution is only part of the story.
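The falsifiability test described above can be written as a simple comparison. The sketch below is illustrative: the function names are hypothetical, the non-mobile figures are the color-page monthly totals from this post, and the mobile figure is a made-up placeholder because per-page mobile data is not given here.

```python
def mobile_views_needed(nonmobile_before: float, nonmobile_now: float) -> float:
    """Mobile views required today for one-for-one substitution to hold."""
    return nonmobile_before - nonmobile_now


def substitution_could_explain_all(nonmobile_before: float, nonmobile_now: float,
                                   mobile_now: float) -> bool:
    """False if combined traffic today is below the old non-mobile-only level,
    in which case substitution can only be part of the story."""
    return nonmobile_now + mobile_now >= nonmobile_before


# Non-mobile totals for the color pages: 691K (Jan 2013) and 260K (Dec 2014).
needed = mobile_views_needed(691_000, 260_000)
print(f"Mobile views needed in Dec 2014: {needed:,.0f}")

# HYPOTHETICAL mobile count, for illustration only; not real data.
hypothetical_mobile = 300_000
print(substitution_could_explain_all(691_000, 260_000, hypothetical_mobile))
```

With real mobile counts in place of the placeholder, a result of `False` would indicate that something other than a shift to mobile is also at work.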

Although the data is available, it's not currently in an easily computable form, and I don't currently have the time and energy to extract it. I'll update this once the data on all pageviews since September 2014 is available on stats.grok.se or a similar platform.

#### Other hypotheses

The following are some other hypotheses for the pageview decline:

1. Google's Knowledge Graph: This is the hypothesis raised in Wikipediocracy, the Daily Dot, and the Register. The Knowledge Graph was introduced in 2012. Through 2013, Google rolled out snippets (called Knowledge Cards and Knowledge Panels) based on the Knowledge Graph in its search results. So if, for instance, you only wanted the birth date and nationality of a musician, Googling would show you that information right in the search results and you wouldn't need to click through to the Wikipedia page. I suspect that the Knowledge Graph played some role in the decline for colors seen between March 2013 and June 2014. On the other hand, many of the pages that saw a decline don't have any search snippets based on the Knowledge Graph, and therefore the decline for those pages cannot be explained this way.
2. Other means of accessing Wikipedia's knowledge that don't involve viewing it directly: For instance, Apple's Siri tool uses data from Wikipedia, and people making queries to this tool may get information from Wikipedia without hitting the encyclopedia. The usage of such tools has increased greatly starting in late 2012. Siri itself was released with the third generation iPad in September 2012 and became part of the iPhone released the next month. Since then, it has shipped with all of Apple's mobile devices and tablets.
3. Substitution away from Wikipedia to other pages that are becoming more search-optimized and growing in number: For many topics, Wikipedia may have been clearly the best information source a few years back (as judged by Google), but the growth of niche information sources, as well as better search methods, has displaced it from its undisputed leadership position. I think there's a lot of truth to this, but it's hard to quantify.
4. Substitution away from coarser, broader pages to finer, narrower pages within Wikipedia: While this cannot directly explain an overall decline in pageviews, it can explain a decline in pageviews for particular kinds of pages. Indeed, I suspect that this is partly what's going on with the early decline of pageviews (e.g., the decline in pageviews of countries and continents starting around 2010, as people go directly to specialized articles related to the particular aspects of those countries or continents they are interested in).
5. Substitution to Internet use in other languages: This hypothesis doesn't seem to be borne out, given the simultaneous decline in pageviews for the English, French, and Spanish Wikipedias, as documented for the color pages.

#### It's still a mystery

I'd like to close by noting that the pageview decline is still very much a mystery as far as I am concerned. I hope I've convinced you that (a) the mystery is genuine, (b) it's important, and (c) although the shift to mobile is the most likely explanation, we don't yet have clear evidence. I'm interested in hearing whether people have alternative explanations, and/or whether they have more compelling arguments for some of the explanations proffered here.

## Is arrogance a symptom of bad intellectual hygiene?

10 21 March 2015 07:59PM

I have this belief that humility is a part of good critical thinking, and that egoism undermines it. I imagine arrogance as a kind of mind-death. But I have no evidence, and no good mechanism by which it might be true. In fact, I know the belief is suspect because I know that I want it to be true: I want to be able to assure myself that this or that intolerable academic will be magically punished with a decreased capacity to do good work. The truth could be the opposite: maybe hubris breeds confidence, and confidence breeds results? After all, some of the most important thinkers in history were insufferable.

Is any link, positive or negative, between arrogance and reasoning too tenuous to be worth entertaining? Is humility a pretty word or a valuable habit? I don't know what I think yet. Do you?

## Superintelligence 27: Pathways and enablers

10 17 March 2015 01:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far, see the announcement post. For the schedule of future topics, see MIRI's reading guide.

Welcome. This week we discuss the twenty-seventh section in the reading guide: Pathways and enablers.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Pathways and enablers” from Chapter 14

# Summary

1. Is hardware progress good?
    1. Hardware progress means machine intelligence will arrive sooner, which is probably bad.
    2. More hardware at a given point means less understanding is likely to be needed to build machine intelligence, and brute-force techniques are more likely to be used. These probably increase danger.
    3. More hardware progress suggests there will be more hardware overhang when machine intelligence is developed, and thus a faster intelligence explosion. This seems good inasmuch as it brings a higher chance of a singleton, but bad in other ways:
        1. Less opportunity to respond during the transition
        2. Less possibility of constraining how much hardware an AI can reach
        3. Flattens the playing field, allowing small projects a better chance. These are less likely to be safety-conscious.
    4. Hardware has other indirect effects, e.g. it allowed the internet, which contributes substantially to work like this. But perhaps we have enough hardware now for such things.
    5. On balance, more hardware seems bad, on the impersonal perspective.
2. Would brain emulation be a good thing to happen?
    1. Brain emulation is coupled with 'neuromorphic' AI: if we try to build the former, we may get the latter. This is probably bad.
    2. If we achieved brain emulations, would this be safer than AI? Three putative benefits:
        1. "The performance of brain emulations is better understood"
            1. However, we have less idea how modified emulations would behave
            2. Also, AI can be carefully designed to be understood
        2. "Emulations would inherit human values"
            1. This might require higher fidelity than making an economically functional agent
            2. Humans are not that nice, often. It's not clear that human nature is a desirable template.
        3. "Emulations might produce a slower take-off"
            1. It isn't clear why it would be slower. Perhaps emulations would be less efficient, and so there would be less hardware overhang. Or perhaps emulations would not be qualitatively much better than humans, just faster and more numerous
            2. A slower takeoff may lead to better control
            3. However, it also means more chance of a multipolar outcome, and that seems bad.
    3. If brain emulations are developed before AI, there may be a second transition to AI later.
        1. A second transition should be less explosive, because emulations are already many and fast relative to the new AI.
        2. The control problem is probably easier if the cognitive differences are smaller between the controlling entities and the AI.
        3. If emulations are smarter than humans, this would have some of the same benefits as cognitive enhancement, in the second transition.
        4. Emulations would extend the lead of the frontrunner in developing emulation technology, potentially allowing that group to develop AI with little disturbance from others.
        5. On balance, brain emulation probably reduces the risk from the first transition, but added to a second transition this is unclear.
    4. Promoting brain emulation is better if:
        1. You are pessimistic about human resolution of the control problem
        2. You are less concerned about neuromorphic AI, a second transition, and multipolar outcomes
        3. You expect the timing of brain emulations and AI development to be close
        4. You prefer superintelligence to arrive neither very early nor very late
3. The person-affecting perspective favors speed: present people are at risk of dying in the next century, and may be saved by advanced technology

# Another view

I talked to Kenzi Amodei about her thoughts on this section. Here is a summary of her disagreements:

Bostrom argues that we probably shouldn't celebrate advances in computer hardware. This seems probably right, but here are counter-considerations to a couple of his arguments.

The great filter

A big reason Bostrom finds fast hardware progress to be broadly undesirable is that he judges the state risks from sitting around in our pre-AI situation to be low, relative to the step risk from AI. But the so-called 'Great Filter' gives us reason to question this assessment.

The argument goes like this. Observe that there are a lot of stars (we can detect about 10^22 of them). Next, note that we have never seen any alien civilizations, or distant suggestions of them. There might be aliens out there somewhere, but they certainly haven't gone out and colonized the universe enough that we would notice them (see 'The Eerie Silence' for further discussion of how we might observe aliens).

This implies that somewhere on the path between a star existing, and it being home to a civilization that ventures out and colonizes much of space, there is a 'Great Filter': at least one step that is hard to get past. 1/10^22 hard to get past. We know of somewhat hard steps at the start: a star might not have planets, or the planets may not be suitable for life. We don't know how hard it is for life to start: this step could be most of the filter for all we know.
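The "1/10^22 hard" figure can be illustrated with a rough calculation. The sketch below is not from the book or the post: it assumes each star independently has the same small probability p of producing a visible colonizing civilization, and uses the standard Poisson approximation exp(-N*p) for the chance that none of N stars does. The function name and the sample values of p are hypothetical.

```python
import math


def prob_no_visible_civilization(num_stars: float, p_per_star: float) -> float:
    """Approximate chance that no star produces a visible civilization.

    Uses the Poisson approximation exp(-N * p) to (1 - p) ** N, assuming
    stars are independent and p is tiny.
    """
    expected_civilizations = num_stars * p_per_star
    return math.exp(-expected_civilizations)


NUM_STARS = 1e22  # figure quoted in the text

# Hypothetical per-star probabilities, chosen to bracket 1/10^22.
for p in (1e-20, 1e-22, 1e-24):
    chance = prob_no_visible_civilization(NUM_STARS, p)
    print(f"p = {p:.0e}: P(no visible civilization) = {chance:.3g}")
```

Under these assumptions an empty sky is unsurprising only when p is around 1/10^22 or smaller, which is the sense in which the filter must be that hard to pass.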

If the filter is a step we have passed, there is nothing to worry about. But if it is a step in our future, then probably we will fail at it, like everyone else. And things that stop us from visibly colonizing the stars may well be existential risks.

At least one way of understanding anthropic reasoning suggests the filter is much more likely to be at a step in our future. Put simply, one is much more likely to find oneself in our current situation if being killed off on the way here is unlikely.

So what could this filter be? One thing we know is that it probably isn't AI risk, at least of the powerful, tile-the-universe-with-optimal-computations, sort that Bostrom describes. A rogue singleton colonizing the universe would be just as visible as its alien forebears colonizing the universe. From the perspective of the Great Filter, either one would be a 'success'. But there are no successes that we can see.

What's more, if we expect to be fairly safe once we have a successful superintelligent singleton, then this points at risks arising before AI.

So overall this argument suggests that AI is less concerning than we think and that other risks (especially early ones) are more concerning than we think. It also suggests that AI is harder than we think.

Which means that if we buy this argument, we should put a lot more weight on the category of 'everything else', and especially the bits of it that come before AI. To the extent that known risks like biotechnology and ecological destruction don't seem plausible, we should more fear unknown unknowns that we aren't even preparing for.

How much progress is enough?

Bostrom points to positive changes hardware has made to society so far. For instance, hardware allowed personal computers, bringing the internet, and with it the accretion of an AI risk community, producing the ideas in Superintelligence. But then he says probably we have enough: "hardware is already good enough for a great many applications that could facilitate human communication and deliberation, and it is not clear that the pace of progress in these areas is strongly bottlenecked by the rate of hardware improvement."

This seems intuitively plausible. However one could probably have erroneously made such assessments in all kinds of progress, all over history. Accepting them all would lead to madness, and we have no obvious way of telling them apart.

In the 1800s it probably seemed like we had enough machines to be getting on with, perhaps too many. In the 1800s people probably felt overwhelmingly rich. In the sixties too, it probably seemed like we had plenty of computation, and that hardware wasn't a great bottleneck to social progress.

If a trend has brought progress so far, and the progress would have been hard to predict in advance, then it seems hard to conclude from one's present vantage point that progress is basically done.

# Notes

1. How is hardware progressing?

I've been looking into this lately, at AI Impacts. Here's a figure of MIPS/\$ growing, from Muehlhauser and Rieber.

(Note: I edited the vertical axis, to remove a typo)

2. Hardware-software indifference curves

It was brought up in this chapter that hardware and software can substitute for each other: if there is endless hardware, you can run worse algorithms, and vice versa. I find it useful to picture this as indifference curves, something like this:

(Image: Hypothetical curves of hardware-software combinations producing the same performance at Go (source).)

I wrote about predicting AI given this kind of model here.
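
To make the indifference-curve picture concrete, here is a minimal sketch. It assumes a purely hypothetical functional form, performance = hardware^a * software^b, which is not taken from the chapter or from the figure; the exponents and the function name are placeholders chosen only so that the curve has the usual convex shape.

```python
def software_needed(target_performance, hardware, a=0.5, b=0.5):
    """Software quality needed to reach target_performance with given hardware.

    Assumes the hypothetical model: performance = hardware**a * software**b.
    Points (hardware, software_needed(...)) for a fixed target all lie on one
    indifference curve.
    """
    return (target_performance / hardware ** a) ** (1.0 / b)


# Trace one hypothetical curve: less hardware requires better software.
for hardware in (25.0, 100.0, 400.0):
    print(hardware, software_needed(10.0, hardware))
```

With these placeholder exponents, quartering the hardware means the software has to be four times better to stay on the same curve.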

3. The potential for discontinuous AI progress

While we are on the topic of relevant stuff at AI Impacts, I've been investigating and quantifying the claim that AI might suddenly undergo huge amounts of abrupt progress (unlike brain emulations, according to Bostrom). As a step, we are finding other things that have undergone huge amounts of progress, such as nuclear weapons and high temperature superconductors:

(Figure originally from here)

4. The person-affecting perspective favors speed less as other prospects improve

I agree with Bostrom that the person-affecting perspective probably favors speeding many technologies, in the status quo. However I think it's worth noting that people with the person-affecting view should be scared of existential risk again as soon as society has achieved some modest chance of greatly extending life via specific technologies. So if you take the person-affecting view, and think there's a reasonable chance of very long life extension within the lifetimes of many existing humans, you should be careful about trading off speed and risk of catastrophe.

5. It seems unclear that an emulation transition would be slower than an AI transition.

One reason to expect an emulation transition to proceed faster is that there is an unusual reason to expect abrupt progress there.

6. Beware of brittle arguments

This chapter presented a large number of detailed lines of reasoning for evaluating hardware and brain emulations. This kind of concern might apply.

# In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

1. Investigate in more depth how hardware progress affects factors of interest
2. Assess in more depth the likely implications of whole brain emulation
3. Measure better the hardware and software progress that we see (e.g. some efforts at AI Impacts, MIRI, MIRI and MIRI)
4. Investigate the extent to which hardware and software can substitute (I describe more projects here)
5. Investigate the likely timing of whole brain emulation (the Whole Brain Emulation Roadmap is the main work on this)

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

# How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about how collaboration and competition affect the strategic picture. To prepare, read “Collaboration” from Chapter 14. The discussion will go live at 6pm Pacific time next Monday 23 March. Sign up to be notified here.

## Efficient Open Source

10 15 March 2015 09:24PM

I've heard a quote from Richard Hamming: "What are the important problems in your field? And why aren't you working on them?"  Humans are not immediately strategic, so when I first heard this quote, I was shocked and started wondering why I wasn't working on anything like that.  Then I wondered: Why wasn't EVERYONE working on something like that?

One of the things I like about efficient charity is that it solves the problem of "What charity should I spend money on?"  Check GiveWell and give money to the charity that saves the most lives.  So I've been thinking, why don't we have a solution like this for open source programming?

I want to try to create a site for tracking projects that need programming help. If everyone could run down the list of 'important projects' until they found something they thought they could do, I think that might lead to more important things getting done.  Who knows, if the idea works out we could expand it to efficient free time, for people who want something productive to do in their free time.

I'd also like to enlist lesswrong's help.  Is there already a site/service like this but no one uses it?  And is there anything else that the site would need that I haven't thought about?  An important Project Tracker would need:

• It should estimate how important the project is. Is it user voting?  Do we have some way to measure importance? It would seem pretty silly if X-risks got voted as 'lower importance' than games.
• It should estimate how many people are working on the project.  We want to bring attention to projects that need more people, not projects that are saturated with developers. Aim for high marginal utility.
• It should estimate how hard the project is.  If it's important and easy, we should probably do it first. Marginal utility again.
• Does the project require special skills?  I bet a lot more people could throw together a website than could meaningfully contribute to friendly AI, even if friendly AI is more important.
• Are there multiple groups pursuing this project? (Look at all those Linux distributions!) How well are the different groups doing?  Is it better to contribute to one of those groups or start a new one? Which groups are easier to work with, are closer to their goals, or have been abandoned?
• There should probably be a way to break projects apart and link them together. There are lots of different open problems in AI that could be solved separately, but they should also be linked together somehow.  And there should probably be some way to divide work at different skill levels so that people who aren't math geniuses can help with the 'grunt work coding' for things like friendly AI.
• How feasible is the project?  Solving P=NP would be really awesome, but nobody has any idea how they might solve that problem.
• It should be enjoyable enough to use that it attracts a community.  Stack Overflow turned answering programming questions into a game and it became quite successful.
• Are there rewards for working on this?  Would it make it easier to get a job in a particular field or industry?  Would it help with X-risks or otherwise make you famous?  I originally thought that it was so important to avoid tampering with the votes that it was not even worth mentioning.  But actually, some intentional tampering would be really useful.  If people could put bounties on important jobs, that would incentivize those jobs.  (There is an x-prize for space travel.  Why isn't there an x-prize for friendly AI?)
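
To make the "marginal utility" idea in the list above a bit more concrete, here is a minimal sketch of how such a tracker might rank projects. Everything in it is an assumption for illustration: the `Project` fields, the scoring formula, and the example entries are hypothetical, and a real site would need a better-justified measure.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    importance: float   # hypothetical 0..1 estimate, e.g. from voting
    contributors: int   # how many people already work on it
    difficulty: float   # hypothetical effort estimate, must be > 0
    feasibility: float  # hypothetical 0..1 chance the work pays off


def marginal_score(p: Project) -> float:
    """Crude proxy for the value of one more contributor (an assumed formula).

    Rewards importance and feasibility, penalizes difficulty and crowding.
    """
    return p.importance * p.feasibility / (p.difficulty * (1 + p.contributors))


def rank(projects):
    """Return projects sorted from highest to lowest marginal score."""
    return sorted(projects, key=marginal_score, reverse=True)


examples = [
    Project("friendly-ai-math", 0.9, 5, 10.0, 0.2),
    Project("charity-website", 0.5, 1, 1.0, 0.9),
    Project("crowded-game", 0.1, 200, 2.0, 0.9),
]
for p in rank(examples):
    print(p.name, round(marginal_score(p), 4))
```

Note that with this particular formula an easy, under-staffed project can outrank a more important but much harder one, which is the "important and easy first" behaviour described above; whether that is the right trade-off is exactly the kind of design question the site would have to settle.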

## Crude measures

9 27 March 2015 03:44PM

A putative new idea for AI control; index here.

People often come up with a single great idea for AI, like "complexity" or "respect", that will supposedly solve the whole control problem in one swoop. Once you've done it a few times, it's generally trivially easy to start taking these ideas apart (first step: find a bad situation with high complexity/respect and a good situation with lower complexity/respect, make the bad very bad, and challenge on that). The general responses to these kinds of idea are listed here.

However, it seems to me that rather than constructing counterexamples each time, we should have a general category and slot these ideas into it. And not only have a general category with "why this can't work" attached to it, but "these are methods that can make it work better". Seeing the things needed to make their idea better can make people understand the problems, where simple counter-arguments cannot. And, possibly, if we improve the methods, one of these simple ideas may end up being implementable.

## Crude measures

The category I'm proposing to define is that of "crude measures". Crude measures are methods that attempt to rely on non-fully-specified features of the world to ensure that an underdefined or underpowered solution does manage to solve the problem.

To illustrate, consider the problem of building an atomic bomb. The scientists that did it had a very detailed model of how nuclear physics worked, the properties of the various elements, and what would happen under certain circumstances. They ended up producing an atomic bomb.

The politicians who started the project knew none of that. They shovelled resources, money and administrators at scientists, and got the result they wanted - the Bomb - without ever understanding what really happened. Note that the politicians were successful, but it was a success that could only have been achieved at one particular point in history. Had they done exactly the same thing twenty years before, they would not have succeeded. Similarly, Nazi Germany tried a roughly similar approach to what the US did (on a smaller scale) and it went nowhere.

So I would define "shovel resources at atomic scientists to get a nuclear weapon" as a crude measure. It works, but it only works because there are other features of the environment that are making it work. In this case, the scientists themselves. However, certain social and human features about those scientists (which politicians are good at estimating) made it likely to work - or at least more likely to work than shovelling resources at peanut-farmers to build moon rockets.

In the case of AI, advocating for complexity is similarly a crude measure. If it works, it will work because of very contingent features about the environment, the AI design, the setup of the world etc..., not because "complexity" is intrinsically a solution to the FAI problem. And though we are confident that human politicians have some good enough idea about human motivations and culture that the Manhattan project had at least some chance of working... we don't have confidence that those suggesting crude measures for AI control have a good enough idea to make their idea work.

It should be evident that "crudeness" is on a sliding scale; I'd like to reserve the term for proposed solutions to the full FAI problem that do not in any way solve the deep questions about FAI.

## More or less crude

The next question is, if we have a crude measure, how can we judge its chance of success? Or, if we can't even do that, can we at least improve the chances of it working?

The main problem is, of course, that of optimising. Either optimising in the sense of maximising the measure (maximum complexity!) or of choosing the measure that is the most extreme fit to the definition (maximally narrow definition of complexity!). It seems we might be able to do something about this.

Let's start by having the AI sample a large class of utility functions. Require them to be around the same expected complexity as human values. Then we use our crude measure μ - for argument's sake, let's make it something like "approval by simulated (or hypothetical) humans, on a numerical scale". This is certainly a crude measure.

We can then rank all the utility functions u, using μ to measure the value of "create M(u), a u-maximising AI, with this utility function". Then, to avoid the problems with optimisation, we could select a certain threshold value and pick any u such that E(μ|M(u)) is just above the threshold.

How to pick this threshold? Well, we might have some principled arguments ("this is about as good a future as we'd expect, and this is about as good as we expect that these simulated humans would judge it, honestly, without being hacked").
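
As a toy sketch of the selection step: suppose we already had estimates of E(μ|M(u)) for each sampled u (obtaining them is the hard part, and is simply assumed here). Instead of taking the maximum, we pick at random among candidates that land just above the threshold. The `margin` parameter, the function name and the example scores are hypothetical; "just above" is not pinned down any further in this post.

```python
import random


def pick_just_above_threshold(scores, threshold, margin, rng=random):
    """Pick a candidate whose estimated E(mu | M(u)) is just above threshold.

    scores: dict mapping a candidate label to its estimated score.
    margin: assumed width of the "just above" band (hypothetical parameter).
    Returns None if no candidate falls in [threshold, threshold + margin].
    """
    eligible = sorted(
        u for u, s in scores.items() if threshold <= s <= threshold + margin
    )
    if not eligible:
        return None
    # Random choice rather than argmax: we deliberately avoid optimising mu.
    return rng.choice(eligible)


toy_scores = {"u1": 0.20, "u2": 0.62, "u3": 0.99, "u4": 0.65}
print(pick_just_above_threshold(toy_scores, threshold=0.6, margin=0.1))
```

Here the highest scorer, u3, is never selected: it is treated as suspiciously over-optimised rather than as the best option.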

One thing we might want to do is have multiple μ, and select things that score reasonably (but not excessively) on all of them. This is related to my idea that the best Turing test is one that the computer has not been trained or optimised on. Ideally, you'd want there to be some category of utilities "be genuinely friendly" that score higher than you'd expect on many diverse human-related μ (it may be better to randomly sample rather than fitting to precise criteria).
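
With several measures, the same idea becomes a band filter: keep only the candidates whose score is moderate on every μ. A minimal sketch, where the band limits `low` and `high`, the function name and the toy table are all hypothetical:

```python
def moderate_on_all(score_table, low, high):
    """Return candidates scoring within [low, high] on every crude measure.

    score_table: dict mapping a candidate label to a list of scores, one per
    measure mu. Scoring too high on any single mu disqualifies a candidate
    just as scoring too low does.
    """
    return [
        u for u, scores in score_table.items()
        if all(low <= s <= high for s in scores)
    ]


toy_table = {
    "a": [0.60, 0.70],  # reasonable on both measures
    "b": [0.99, 0.65],  # excessive on the first measure
    "c": [0.20, 0.70],  # too low on the first measure
}
print(moderate_on_all(toy_table, low=0.5, high=0.8))  # ['a']
```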

You could see this as saying that "programming an AI to preserve human happiness is insanely dangerous, but if you find an AI programmed to satisfice human preferences, and that other AI also happens to preserve human happiness (without knowing it would be tested on this preservation), then... it might be safer".

There are a few other thoughts we might have for trying to pick a safer u:

• Properties of utilities under trade (are human-friendly functions more or less likely to be tradable with each other and with other utilities)?
• If we change the definition of "human", this should have effects that seem reasonable for the change. Or some sort of "free will" approach: if we change human preferences, we want the outcome of u to change in ways comparable with that change.
• Maybe also check whether there is a wide enough variety of future outcomes, that don't depend on the AI's choices (but on human choices - ideas from "detecting agents" may be relevant here).
• Changing the observers from hypothetical to real (or making the creation of the AI contingent, or not, on the approval), should not change the expected outcome of u much.
• Making sure that the utility u can be used to successfully model humans (therefore properly reflects the information inside humans).
• Make sure that u is stable to general noise (hence not over-optimised). Stability can be measured as changes in E(μ|M(u)), E(u|M(u)), E(v|M(u)) for generic v, and other means.
• Make sure that u is unstable to "nasty" noise (eg reversing human pain and pleasure).
• All utilities in a certain class - the human-friendly class, hopefully - should score highly under each other (E(u|M(u)) not too far off from E(u|M(v))), while the over-optimised solutions - those scoring highly under some μ - must not score high under the class of human-friendly utilities.
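
The last property in the list can be written down as a simple check, assuming we somehow had a table of cross-expectations E(u|M(v)) for the candidates in a class. The table layout, the tolerance `tol` and the function name are hypothetical; this only illustrates the shape of the test, not how to obtain the numbers.

```python
def mutually_compatible(cross, tol):
    """Check that every utility scores nearly as well under the others' AIs.

    cross[u][v] is an assumed estimate of E(u | M(v)): how well utility u
    does in the world produced by a v-maximising AI. The class passes if
    E(u | M(u)) - E(u | M(v)) <= tol for every pair (u, v) in it.
    """
    names = list(cross)
    return all(
        cross[u][u] - cross[u][v] <= tol
        for u in names
        for v in names
    )


toy_cross = {
    "u": {"u": 1.00, "v": 0.90},
    "v": {"u": 0.85, "v": 1.00},
}
print(mutually_compatible(toy_cross, tol=0.2))   # True
print(mutually_compatible(toy_cross, tol=0.05))  # False
```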

This is just a first stab at it. It does seem to me that we should be able to abstractly characterise the properties we want from a friendly utility function, which, combined with crude measures, might actually allow us to select one without fully defining it. Any thoughts?

And with that, the various results of my AI retreat are available to all.

9 16 March 2015 07:16PM

"Gödel, Escher, Bach" by Douglas R. Hofstadter is the most awesome book that I have ever read. If there is one book that emphasizes the tragedy of Death, it is this book, because it's terrible that so many people have died without reading it."

- Eliezer Yudkowsky

Gödel, Escher, Bach: An Eternal Golden Braid is an excellent primer on subjects related to artificial intelligence, like self-reference, metamathematics, formal rules and cognitive science. Yudkowsky has said about the book that it's "the best and most beautiful book ever written by the human species." It was the book that Yudkowsky said launched his career in artificial intelligence and cognitive science.

That's why we decided to start a read-through of it in reddit.com/r/rational. If you have any interest in analyzing or discussing the book, join us!

## A counterfactual and hypothetical note on AI safety design

9 11 March 2015 04:20PM

A putative new idea for AI control; index here.

A lot of the new ideas I've been posting could be parodied as going something like this:

The AI A, which is utility indifferent to the existence of AI B, has utility u (later corriged to v', twice), and it will create a subagent C which believe via false thermodynamic miracles that D does not exist, while D' will hypothetically and counterfactually use two different definitions of counterfactual so that the information content of its own utility cannot be traded with a resource gathering agent E that doesn't exist (assumed separate from its unknown utility function)...

What is happening is that I'm attempting to define algorithms that accomplish a particular goal (such as obeying the spirit of a restriction, or creating a satisficer). Typically this algorithm has various underdefined components - such as inserting an intelligent agent at a particular point, controlling the motivation of an agent at a point, effectively defining a physical event, or having an agent believe (or act as if they believed) something that was incorrect.

The aim is to reduce the problem from stuff like "define human happiness" to stuff like "define counterfactuals" or "pinpoint an AI's motivation". These problems should be fundamentally easier - if not for general agents, then for some of the ones we can define ourselves (this may also allow us to prioritise research directions).

And I also have no doubt that once a design is available, it will be improved upon and transformed and made easier to implement and generalise. Therefore I'm currently more interested in objections of the form "it won't work" than "it can't be done".

## Why the culture of exercise/fitness is broken and how to fix it

9 10 March 2015 11:24AM

### Summary

The culture of exercise/fitness suffers from a motivation (and discipline) problem. Sports turn it around and make it actually fun, lessening the need for those.

#### The problem

Despite the media, blogs and forums talking an incredible amount about exercise/fitness, not everybody is motivated to do it. Of those who do, some hate it and will stop once the motivation runs out. Some try to go from motivation to discipline to fight it, but it is an uphill fight: why can't exercise/fitness be fun that you want to do, rather than a second grinding job you must do? The need for discipline or motivation suggests it is no fun. Some people try to turn boring exercise into a game with e.g. Fitocracy. However there is an old and time-tested way to do it: actual sports, not necessarily played competitively.

#### The approximate reason

Sports coaches have used strength and cardio training as a part of their toolbox for a long time. Exercise/fitness looks a lot like taking this aspect out of sports training and leaving out the rest. [1][2]

However by doing so we lose three important motivators:

1) Playing sports - not necessarily at a competitive level! - is fun (for some at least), and in itself works as cardio or HIIT training. Think of playing soccer, martial arts sparring or playing tennis: hard effort and rest intervals, quite HIIT-ish, and HIIT training seems to confer similar benefits to weight lifting - it is usually considered anaerobic.

2) When doing exercise/fitness our motivation is usually to be healthier, feel better, and be sexier. When doing strength or cardio as part of sports coaching / training, we have a fourth motivator: to be actually better at playing that sport. One big issue with exercise/fitness is that in a comfy urban life we do not actually use strength or endurance, and this is a huge demotivator (for me at least, I suspect others too): it feels a bit like growing a pretty but useless third leg. Why be stronger, when spending that time studying is more efficient for real world success? (The worst offender is the CrossFit movement: it makes you fit for all kinds of purposes, most of which you will never pursue!) Health, mood and sexiness are good reasons, but still doing a sport where we can use strength or endurance for some real goal makes a whole world of difference - for me at least. What do you think your "inner caveman" wants: to e.g. sprint because it makes him healthier, sexier and puts him in a better mood, or to actually catch that deer? Which means: to motivate your "inner caveman", you had better find yourself some actual deer to catch, some actual use-goal to pursue. This is why sports work: the goal is to win a friendly match or something similar. Remember: to be fit means to be fit for something, and you need a something.

3) Commands and camaraderie. When the coach yells "25 push-ups" and 20 people hit the ground and do them, it is a very different motivator than just bargaining with yourself to do it at home or to do a bit more before going home from the gym. Yes, you can hire personal trainers or find training buddies. But this is IMHO less ingrained (at least in Mitteleuropa) in the culture of fitness/exercise than in actual sports. Group coaching/training has a military feel to it, and militaries tend to be efficient at figuring these things out, since for them it is a life and death issue.

Let's stop and reflect a bit on the weirdness of it all... for example people are using apps like Fitocracy and HabitRPG to turn their boring exercises into an RPG videogame, to gain XP and level up... when in reality martial arts belt tests have always been precisely that. Less formally, but rankings exist in sports all the way, for example just training and for fun -> allowed to play training matches against the other half of our team -> allowed to play training matches with other teams -> allowed to play in friendly matches and so on, or in boxing sandbag -> mittens -> light sparring -> full force sparring -> training match -> match or for Alpine skiers being allowed to go to green/blue/red/black pistes.

Sports are already an RPG, so why did we have to take strength and cardio training out of sports and just do these exercises without doing the sport part, then find out it is boring and demotivating and turn it into an RPG again? Does this even make sense? And if not, why did it ever happen so?

#### Why it happened so 1.

For busy and highly motivated people, exercise/fitness works. If 2 hours of sports coaching means 0.5 hour strength, 0.5 hour cardio, 0.5 hour technique and 0.5 hour playing, they may as well take the first hour and leave the second, i.e. go to a gym, not coaching. It works if you are Elon Musk or anywhere close to making someone like him your goal model.

It obviously does not work well for people who are in this sense more typical, i.e. have more free time to kill (goals do not fill out their time) but less willpower/motivation/discipline. I think fitness/exercise culture was generated by those highly successful people. Esp. by fitness trainers who are almost fanatical about it in their own lives.

#### Why it happened so 2.

When I say "sports" a lot of people hear "competing". "Being good enough to compete", "investing enough time to compete". They simply don't find anywhere near them half-serious, kinda-recreational sports opportunities where they are still pushed fairly hard by the coach, but they are not expected to compete much and not expected to be good at it.

To put it differently, we should draw a clear line between competitive sports, where you are like an "employee" of the coach and if you are not good enough the coach simply does not want you on his team, and recreational sports, where you are a customer of the coach/trainer: you pay him to make you better at the sport than you previously were, no matter how bad that previous level was. I am talking about the second.

In Europe, anecdotally, without statistical evidence at hand, recreationally coached sports clubs (Vereine in German) seem to be on the decline, with body-building gyms attracting people away. Perhaps in America the whole culture is so competitive that they were never really a thing?

However AFAIK many recreationally coached sports are still available. We should make a full list in the comments but I can find two examples:

- Tennis and squash. However you will not be pushed much towards strength training. But still, you get to the point where you want to hit the ball harder and it motivates you for strength training.

- Martial arts. Choose any you like, because liking is more important for the beginning than benefits, and it is not a final choice: learning multiple ones and later mixing them is the idea behind MMA. If and only if you think the art you chose is too easy on the strength training side, not pushing you enough, look towards BJJ, boxing or MMA. They seem to be the "buffest" ones around, i.e. where coaches push you to strength training the most.

#### Why it happened so 3.

Back in the 1960s or so obesity was not really an issue (less than half, PDF) like it is today globally, or at least people were not very conscious about it. Some people did sports but otherwise exercise and fitness flew below the radar. It was roughly in the later 70s, early 80s when people started to pay attention to exercise and fitness. When Arnold's Pumping Iron popularized body building in 1977, it was not mainly about making obese men and women thinner but about making thin, scrawny men more muscular. The media bought into this new body image (fun homework idea: how Stallone went buffer during the Rocky series, reflecting the expectations of the media or the public, or how Spiderman comics changed 1960 to 1990). In 1982 Jane Fonda launched the aerobic movement, which was more directed at making overweight women thinner. These are roughly the recent origins of the fitness/exercise culture.

As of today, at least if fitness.reddit.com is a reliable predictor of opinions, "Arnold" won and "Jane" lost, i.e. today's idea of fitness is more based on weight lifting / body building, including for women, than on aerobics in the Jane Fonda sense of the term. There are good scientific reasons for it, in a nutshell: weight lifting's effect on metabolic rate, insulin sensitivity, leptin sensitivity, HGH and T, all having an effect on fat, and both building muscles and having them around having an effect on fat (source: book).

The issue is, people see sports like playing volleyball closer to "Jane" than "Arnold". With "Jane" losing to "Arnold", not only aerobic-cardio lost to weight training, but doing actual sports lost to going to gyms.

But let's not forget that the Arnoldian Way was originally about not being scrawny, not about not being fat, which brings us to:

#### Another weird factor

While today's main problem is obesity, popular, Arnoldian fitness culture is originally based on gaining muscle, not on losing fat. While gaining muscle is an excellent way to lose fat, probably the best way if you have the motivation/discipline for it (for reasons see the book referenced above), the simple truth is that if a fat guy or gal just plays basketball 3 x 1 hours a week, he or she will gain muscle simply by throwing his or her heavy body around: with high enough body weight much of "cardio" doubles as strength training. For the skeptical, do your cardio while a petite woman is sitting on top of you - that is what it is like for fat people.

Example: scrawny, ectomorph, hardgainer lifters often complain about calves being hard to grow. My cousin's "secret tip" for brutal calves is 1) be an obese 140 kg man 2) play soccer once a week.  (Don't  even think about suddenly starting playing soccer if you are that obese! Rather he went from 70kg to 140 during 10 years while playing soccer once a week and basically his joints gradually adapted to the weight!)

If a fat man or woman needs motivation or discipline to go to the gym to do boring gym stuff, but loves to play volleyball and has a chance to play 3 x 1 hours a week, he or she will put on muscle. After 3 months, being in a good, proud, and enthusiastic mood about being an active person now, he or she can easily add a push-up program (with handles, better for wrists, this but repeat every week 3-4 times over) on the rest of the days, which will result in respectable shoulders, pecs, arms. But the order of things is important here: first do the enjoyable activity, then leverage the good feels and pride into less enjoyable and more efficient ones!

Frankly most fitness trainers and bloggers don't really understand this. Typically they are ex-scrawny guys desperate to gain, who consider, in our example, volleyball just cardio (it is at 60-80 kg, but at 130 kg it gives you quite some leg muscles), and push-ups not very efficient (again at 60-80 kg not, at 130 kg yes; ask a petite woman to sit on your back and try it that way and suddenly you will respect it more!).

#### The expanse of the problem

How widespread is the problem I am describing? Well, even here on LW, an otherwise excellently written article ignores the motivation and fun angle and talks about starting with a body weight routine then graduating to weight lifting and cardio, ignoring that 1) for many people these are boring activities 2) without having a sport goal, we do not use strength or endurance in a comfy urban existence, and health, sexiness and good mood alone are not always strong motivators. I think the article can be "excused" (quotes because I am not actually making a judgement here, hence no excuse is needed: it is an excellent article coming from a fitness / exercise culture where apparently EVERYBODY has this blind spot, and I would never think about singling out the author and blaming him for it!). Anyway, this article can be explained by being written for typical "Bay Area Rationalists", wannabe Elon Musks who already have a lot of motivation and willpower and discipline but not enough free time, having a lot of goals. So they want to be time-efficient, not fun-efficient.

But yeah, for Average Guy or Gal where the opportunity cost is less time killed in front of the TV,  fun-efficient is more important than time-efficient. Making it twice as long but twice as fun makes them want to do it more. And hence it must be sports.

#### Solutions

If you are content with your exercise habits, you don't have a problem, or not one this article can solve.

If your problem is that you are thin, scrawny (NOT skinnyfat), ectomorph, hardgainer, I cannot help you much: you need weight lifting / body building / advanced gymnastics. The only possible way out from gym culture I can think of is rock or wall or boulder climbing or parkour: basically, figure out fun ways to efficiently use your low body weight as resistance. But other than that, no news here. Except one, but that would be better discussed in a separate article: why are many young men unhappy with being thinly athletic? Media image, surely, but I have a hunch it is not about looking better but about looking and feeling more respectable, and this may be a different kind of problem. I see 17-year-old guys desperate to put on muscle not just to look better but also in order to be treated not like a boy but like a respectable adult man. And I see 40-year-old guys who are more like: if I am not fat and have the cardio endurance to play tennis then I am OK, I am already a "someone", I have nothing to prove. Do you see anything like this? But I think this requires a separate article to discuss.

If you are fat (or skinnyfat), and struggle with the motivation / discipline to exercise, I can help you. Forget fitness and exercise and start a sport you like.

##### Algorithm for deciding what sport you like

1) Do you watch any sports on TV or play them in videogames? If yes, try that sport.

2) Do you watch action movies that involve one kind of fighting or another, or play suchlike in videogames? If yes, martial arts, for starters, an unarmed one, but let's not ignore the magnetic effect medieval longsword fighting tends to have on "geeks", quite possibly the only truly likable sport for RPG or fantasy or history fans. Watch this then Google "HEMA mycity" or even "Liechtenauer mycity" (He was the originator of that late Medieval tradition that produced the most often used longsword fencing books and longsword fighting clubs often mention the Liechtenauer tradition / school on their site. Other good choices: Fiore, Marozzo.)

3) Failing these, you may not want to do a sport as such, but maybe you still want to be with friends or coworkers who do it, and enjoy doing it with them, so ask around.

4) Whatever is close to your home or work.

5) If nothing helps and you are really clueless: martial arts. You are an animal. There is _some_ size of fight in every dog.

6) If you are disabled, your choices are limited, from none at all to wheelchair basketball.

##### The motivation problem solving itself

The important thing is that you don't need motivation to do the sport, you just need the motivation to drag your butt to the training session, basically to "show up". Then you can just surrender your will to the trainer. Be a remote controlled robot, a zombie without will, executing the will of the trainer, Kadavergehorsam. This will get you through the first, sucky part of the session, cardio and strength training. Then the technique training will be more interesting and you will be glad for the rest, and finally you get to play, spar, do the actual sport, enjoy the fun, and this sends you home with good feelings, eager for the next session. Most trainers I know in martial arts or soccer use this structure: warm up with cardio, do strength, do technique which doubles as a rest, and finally play and enjoy. This is because it makes sense for the body, but it is also psychologically motivating: go through the sucky parts and then do the rewarding parts. Actual sports trainers seem to care more about motivation and psychology than the blogs of fitness trainers...

Oh BTW I purposefully formulated the robot-zombie-Kadavergehorsam part so that a lot of people shudder reading it and feel bad about it: 21st century people tend to value their autonomy... but it is a thing people like me need to face, and better sooner than later. You need cardio and strength. There are some sports where you can just warm up and then play, but still you need to warm up, and for an untrained person that leads to some panting. So there are only two ways: you want to do cardio and strength, or you don't but you surrender your will to someone who wants you to do it. How else do you think your body will do it? It needs a control unit, and you have two choices: your will (motivation, discipline) or the will of someone else. If you hate yourself for your laziness, and I do, surrendering to someone you actually respect does not sound that bad once you get past the idea that it is, in this age of autonomy, "weird".

Another rule I want to recommend is one habit at one time. Exercise routines consisting of 4-5 different exercises are IMHO harder to stick to; it is easier to obsess about one thing. For this reason, once you have started your sport, for a few weeks or months, until you feel it is an organic feature of your life you would miss if you stopped, don't try other exercises! Keep yourself back, hold back on your newfound enthusiasm! It is similar to Pavel Tsatsouline saying (cannot find the source, sorry) to not train to failure, stop a training before you are exhausted, stop a training when there is still some hunger in you for more, so that you are motivated to do it the next time. The same way, if in the first weeks your enthusiasm makes you hunger for more, just stay hungry, do not satisfy it, let this motivation carry you until your sport is an ingrained habit.

##### The next step

Okay, so you are doing a sport 2-3 times a week for 2-3 months now, it is an ingrained habit now, you are getting in a better mood, being more proud of yourself, and you feel you can now do more. Look into what kind of muscles you can use for your sport, and what kind of muscles your sport does not develop enough! The answer is very often this: you jump around on your feet all the time, gravity is training your legs all right during your sport (again, I am talking about fat people, jumping around with 130 kg is different than with 65 kg), you are probably twisting and turning (training the abs), but your arms and upper body are not trained enough, and yet precisely this is what you need to hit a ball forcefully, throw a ball forcefully, or throw a punch. In other words, for many sports, it is time to do push-ups at home on the non-training days. This trains the right kind of muscles for this.

Two things to consider here. First, I am not talking about a bodyweight routine: only push-ups. One habit, one obsession at one time! Make the choice simple for your brain: either I am at rest, or down on the floor going up and down, no third choice! This is the secret of habit forming for me: don't fatigue your brain with having to choose to do 4-5 things, just 1 thing! Second, I am recommending to start the push-ups only months after you started a fun sport, not before. You will have way more motivation that way: now it is not just about health, sexiness and mood but about actually using that kind of strength for something, plus you are in a better mood and you have more self-respect, you are more in a can-do, want-to mood! As above: with handles, better for wrists, this but repeat every week 3-4 times over if you are fat.

##### Parting thoughts

From then on, you are on your own. Once you hit the 100 mark, and do your sport 3 times a week, and all this coupled with a good diet, you probably have everything you need psychologically to go on your own way.

And be aware that we knew all this in the 1960's or so. People - well, at least boys - were pushed to do sports. I hope we can bring it back, and for people who do not have unusual amounts of motivation or discipline, body-building or weight-lifting will not be seen as the alternative to being a couch potato, but rather both as separate, special sports for those who specifically like them, and for everybody else just a part of their sport training that aims, primarily, at being fit to play or spar or  occasionally compete in stuff that is fun.

##### For people with alcohol problems

You probably want to both stop drinking and start a sport or exercise routine. Being hung over is a huge exercise demotivator, and exercise makes it easier to deal with the depression / bad mood of cravings. Where to start? For me, stopping drinking then starting sports or exercising did not work, I could not deal with the bad mood. Joining a boxing gym while still drinking, dragging my hung-over ass to the session, using my own volition only to go there then handing it over to the trainer, cursing myself while the trainer made me sweat and burp it out, yet being in a better mood the day after and feeling more proud of myself and not like a worthless piece of feces, made me - fingers crossed - gather the strength to quit. It is only a few days ago but I mention it as a form of public commitment. So if stopping first then doing sports or exercising did not work, the other way around may still work for you, and remember, with a coached recreational sport, you don't need to have the willpower, motivation or discipline to do it! You just need enough motivation to show up, and then you simply surrender your will to the trainer; you need to make no more choices.

Adamzerner makes a good point about how skill differences cause problems in team sports, which suggests, to me, that it is better to start with an individual / pair sport practiced in group settings (martial arts, tennis or squash courses with other people, dancing and so on), at least until your motor skills, coordination, speed and cardio pick up.

* * *

Footnotes remarks:

[1] Not actually historically / chronologically so.
[2] Yes, weight lifting, body building, running are actual sports as well, but they are used by many, many more people than those who wish to compete in them either for sports training or just exercise/fitness.

## Impartial ethics and personal decisions

9 08 March 2015 12:14PM

Some moral questions I’ve seen discussed here:

• A trolley is about to run over five people, and the only way to prevent that is to push a fat bystander in front of the trolley to stop it. Should I?
• Is it better to allow 3^^^3 people to get a dust speck in their eye, or one man to be tortured for 50 years?
• Who should I save, if I have to pick between one very talented artist, and five random nobodies?
• Do I identify as a utilitarian? a consequentialist? a deontologist? a virtue ethicist?

Yet I spend time and money on my children and parents that might be “better” spent elsewhere under many moral systems. And if I cared only as much about my parents and children as I do about random strangers, many people would see me as somewhat of a monster.

In other words, “commonsense moral judgements” find it normal to care differently about different groups; in roughly decreasing order:

• immediate family
• friends, pets, distant family
• neighbors, acquaintances, coworkers
• fellow citizens
• foreigners
• sometimes, animals
• (possibly, plants...)
… and sometimes, we’re even perceived as having a *duty* to care more about one group than another (if someone saved three strangers instead of two of his children, how would he be seen?).

In consequentialist / utilitarian discussions, a regular discussion is “who counts as agents worthy of moral concern” (humans? sentient beings? intelligent beings? those who feel pain? how about unborn beings?), which covers the later part of the spectrum. However I have seen little discussion of the earlier part of the spectrum (friends and family vs. strangers), and it seems to be the one on which our intuitions agree the most reliably - which is why I think it deserves more of our attention (and having clear ideas about it might help about the rest).

Let’s consider two rough categories of decisions:

• impersonal decisions: what should government policy be? By what standard should we judge moral systems? On which cause is charity money best spent? Who should I hire?
• personal decisions: where should I go on holidays this summer? Should I lend money to an unreliable friend? Should I take a part-time job so I can take care of my children and/or parents better? How much of my money should I devote to charity? In which country should I live?

Impartial utilitarianism and consequentialism (like the questions at the head of this post) make sense for impersonal decisions (including when an individual is acting in a role that requires impartiality: a ruler, a hiring manager, a judge), but clash with our usual intuitions for personal decisions. Is this because under those moral systems we should apply the same impartial standards to our personal decisions, or because those systems are only meant for discussing impersonal decisions, and personal decisions require additional standards?

I don’t really know, and because of that, I don’t know whether or not I count as a consequentialist (not that I mind much apart from confusion during the yearly survey; not knowing my values would be a problem, but not knowing which label I should stick on them? eh, who cares).

I also have similar ambivalence about Effective Altruism:

• If it means that I should care as much about poor people in third world countries as I do about my family and friends, then it’s a bit hard to swallow.
• However, if it means that, assuming one is going to spend money to help people, one had better make sure that money helps them in the most effective way possible, then it is much easier to accept.

Scott’s “give ten percent” seems like a good compromise on the first point.

So what do you think? How does "caring for your friends and family" fit in a consequentialist/utilitarian framework?

Other places this has been discussed:

• This was a big debate in ancient China, between the Confucians who considered it normal to have “care with distinctions” (愛有差等), whereas Mozi preached “universal love” (兼愛) in opposition to that, claiming that care with distinctions was a source of conflict and injustice.
• “Impartiality” is a big debate in philosophy - the question of whether partiality is acceptable or even required.
• The philosophical debate between “egoism and altruism” seems like it should cover this, but it feels a bit like a false dichotomy to me (it’s not even clear whether “care only for one’s friends and family” counts as altruism or egoism)
• “Special obligations” (towards friends and family, those one made a promise to) is a common objection to impartial, impersonal moral theories
• The Ethics of Care seems to cover some of what I’m talking about.
• A middle part of the spectrum - fellow citizens versus foreigners - is discussed under Cosmopolitanism.
• Peter Singer’s “expanding circle of concern” presents moral progress as caring for a wider and wider group of people (counterpoint: Gwern's Narrowing Circle) (I haven't read it, so can't say much)

Other related points:

• The use of “care” here hides an important distinction between “how one feels” (My dog dying makes me feel worse than hearing about a schoolbus in China falling off a cliff) and “how one is motivated to act” (I would sacrifice my dog to save a schoolbus in China from falling off a cliff). Yet I think we have the gradations on both criteria.
• Hanson’s “far mode vs. near mode” seems pretty relevant here.

## In memory of Leonard Nimoy, most famous for playing the (straw) rationalist Spock, what are your top 3 ST:TOS episodes with him?

9 27 February 2015 08:57PM

Hopefully at least one or two would show a virtue of non-straw rationality.

Episode list

## Indifferent vs false-friendly AIs

8 24 March 2015 12:13PM

A putative new idea for AI control; index here.

For anyone but an extreme total utilitarian, there is a great difference between AIs that would eliminate everyone as a side effect of focusing on their own goals (indifferent AIs) and AIs that would effectively eliminate everyone through a bad instantiation of human-friendly values (false-friendly AIs). Examples of indifferent AIs are things like paperclip maximisers, examples of false-friendly AIs are "keep humans safe" AIs who entomb everyone in bunkers, lobotomised and on medical drips.

The difference is apparent when you consider multiple AIs and negotiations between them. Imagine you have a large class of AIs, and that they are all indifferent (IAIs), except for one (which you can't identify) which is friendly (FAI). And you now let them negotiate a compromise between themselves. Then, for many possible compromises, we will end up with most of the universe getting optimised for whatever goals the AIs set themselves, while a small portion (maybe just a single galaxy's resources) would get dedicated to making human lives incredibly happy and meaningful.

But if there is a false-friendly AI (FFAI) in the mix, things can go very wrong. That is because those happy and meaningful lives are a net negative to the FFAI. These humans are running dangers - possibly physical, possibly psychological - that lobotomisation and bunkers (or their digital equivalents) could protect against. Unlike the IAIs, which would only complain about the loss of resources to the FAI, the FFAI finds the FAI's actions positively harmful (and possibly vice versa), making compromises much harder to reach.

And the compromises reached might be bad ones. For instance, what if the FAI and FFAI agree on "half-lobotomised humans" or something like that? You might ask why the FAI would agree to that, but there's a great difference between an AI that would be friendly on its own, and one that would choose only friendly compromises with a powerful other AI with human-relevant preferences.

Some designs of FFAIs might not lead to these bad outcomes - just like IAIs, they might be content to rule over a galaxy of lobotomised humans, while the FAI has its own galaxy off on its own, where its humans take all these dangers. But generally, FFAIs would not come about by someone designing a FFAI, let alone someone designing a FFAI that can safely trade with a FAI. Instead, they would be designing a FAI, and failing. And the closer that design got to being FAI, the more dangerous the failure could potentially be.

So, when designing an FAI, make sure to get it right. And, though you absolutely positively need to get it absolutely right, make sure that if you do fail, the failure results in a FFAI that can safely be compromised with, if someone else gets out a true FAI in time.

## Identity and quining in UDT

8 17 March 2015 08:01PM

Outline: I describe a flaw in UDT that has to do with the way the agent defines itself (locates itself in the universe). This flaw manifests in failure to solve a certain class of decision problems. I suggest several related decision theories that solve the problem, some of which avoid quining thus being suitable for agents that cannot access their own source code.

EDIT: The decision problem I call here the "anti-Newcomb problem" already appeared here. Some previous solution proposals are here. A different but related problem appeared here.

Updateless decision theory, the way it is usually defined, postulates that the agent has to use quining in order to formalize its identity, i.e. determine which portions of the universe are considered to be affected by its decisions. This leaves the question of which decision theory should be used by agents that don't have access to their source code (as humans intuitively appear not to have). I am pretty sure this question has already been posed somewhere on LessWrong but I can't find the reference: help? It also turns out that there is a class of decision problems for which this formalization of identity fails to produce the winning answer.

When one is programming an AI, it doesn't seem optimal for the AI to locate itself in the universe based solely on its own source code. After all, you build the AI, you know where it is (e.g. running inside a robot), why should you allow the AI to consider itself to be something else, just because this something else happens to have the same source code (more realistically, happens to have a source code correlated in the sense of logical uncertainty)?

Consider the following decision problem, which I call the "UDT anti-Newcomb problem". Omega is putting money into boxes by the usual algorithm, with one exception. It isn't simulating the player at all. Instead, it simulates what a UDT agent would do in the player's place. Thus, a UDT agent would consider the problem to be identical to the usual Newcomb problem and one-box, receiving \$1,000,000. On the other hand, a CDT agent (say) would two-box and receive \$1,001,000 (!) Moreover, this problem reveals UDT is not reflectively consistent. A UDT agent facing this problem would choose to self-modify given the choice. This is not an argument in favor of CDT. But it is a sign something is wrong with UDT, the way it's usually done.

The essence of the problem is that a UDT agent is using too little information to define its identity: its source code. Instead, it should use information about its origin. Indeed, if the origin is an AI programmer or a version of the agent before the latest self-modification, it appears rational for the precursor agent to code the origin into the successor agent. In fact, if we consider the anti-Newcomb problem with Omega's simulation using the correct decision theory XDT (whatever it is), we expect an XDT agent to two-box and leave with \$1000. This might seem surprising, but consider the problem from the precursor's point of view. The precursor knows Omega is filling the boxes based on XDT, whatever the decision theory of the successor is going to be. If the precursor knows XDT two-boxes, there is no reason to construct a successor that one-boxes. So constructing an XDT successor might be perfectly rational! Moreover, a UDT agent playing the XDT anti-Newcomb problem will also two-box (correctly).
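The payoffs in the two anti-Newcomb variants above can be checked with a small sketch. This is only a toy model: the function and constant names are made up for illustration, and the choice of the simulated agent is simply hard-coded rather than derived from any decision theory.

```python
# Toy payoff model for the anti-Newcomb problems (all names are illustrative).
SMALL = 1_000        # transparent box, always filled
BIG = 1_000_000      # opaque box, filled only if Omega's simulation one-boxes


def payoff(player_choice: str, simulated_choice: str) -> int:
    """Money the player leaves with.

    Omega fills the opaque box according to the *simulated* agent's choice,
    which need not match what the actual player does.
    """
    opaque = BIG if simulated_choice == "one-box" else 0
    return opaque if player_choice == "one-box" else opaque + SMALL


# UDT anti-Newcomb: Omega simulates a UDT agent, assumed here to one-box.
udt_variant = {
    "UDT player": payoff("one-box", "one-box"),
    "CDT player": payoff("two-box", "one-box"),
}

# XDT anti-Newcomb: Omega simulates XDT, assumed here to two-box.
xdt_variant = {
    "one-boxing player": payoff("one-box", "two-box"),
    "two-boxing player": payoff("two-box", "two-box"),
}

print(udt_variant)
print(xdt_variant)
```

In the second variant the opaque box is empty whatever the player does, which is why two-boxing for \$1000 is the best available outcome there.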

To formalize the idea, consider a program $P$ called the precursor which outputs a new program $A$ called the successor. In addition, we have a program $U$ called the universe which outputs a number $U()$ called utility.

Usual UDT suggests for $A$ the following algorithm:

(1) $A(i):=(\underset{f:I \rightarrow O}{\arg\max} \: E[U()|\forall j \in I: A(j)=f(j)])(i)$

Here, $I$ is the input space, $O$ is the output space and the expectation value is over logical uncertainty. $A$ appears inside its own definition via quining.

The simplest way to tweak equation (1) in order to take the precursor into account is

(2) $A(i):=(\underset{f:I \rightarrow O}{\arg\max} \: E[U()|\forall j \in I: P()(j)=f(j)])(i)$

This seems nice since quining is avoided altogether. However, this is unsatisfactory. Consider the anti-Newcomb problem with Omega's simulation involving equation (2). Suppose the successor uses equation (2) as well. On the surface, if Omega's simulation doesn't involve $P$[1], the agent will two-box and get \$1000 as it should. However, the computing power allocated for evaluating the logical expectation value in (2) might be sufficient to suspect $P$'s output might be an agent reasoning based on (2). This creates a logical correlation between the successor's choice and the result of Omega's simulation. For certain choices of parameters, this logical correlation leads to one-boxing.

The simplest way to solve the problem is letting the successor imagine that $P$ produces a lookup table. Consider the following equation:

(3) $A(i):=(\underset{f:I \rightarrow O}{\arg\max} \: E[U()|P()=LUT(f)])(i)$

Here, $LUT(f)$ is a program which computes $f$ using a lookup table: all of the values are hardcoded.
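As a rough illustration of the maximization over lookup tables, here is a brute-force sketch for tiny input and output spaces. It assumes a hypothetical callable `expected_utility(table)` standing in for $E[U()|P()=LUT(f)]$; nothing here models logical uncertainty, and the toy utility is invented purely to exercise the code.

```python
from itertools import product
from typing import Callable, Dict, Hashable, Sequence

Table = Dict[Hashable, Hashable]


def lut_agent(
    i: Hashable,
    inputs: Sequence[Hashable],
    outputs: Sequence[Hashable],
    expected_utility: Callable[[Table], float],
) -> Hashable:
    """Pick the best lookup table f: I -> O, then return f(i).

    `expected_utility` is a stand-in (an assumption of this sketch) for the
    expectation of U() given that the precursor outputs LUT(f).
    """
    best_table: Table = {}
    best_value = float("-inf")
    # Enumerate every function from inputs to outputs as a hardcoded table.
    for values in product(outputs, repeat=len(inputs)):
        table = dict(zip(inputs, values))
        value = expected_utility(table)
        if value > best_value:
            best_table, best_value = table, value
    return best_table[i]


def toy_utility(table: Table) -> float:
    # Hypothetical universe: rewards answering the two inputs differently,
    # with a small bonus for answering 1 on input "x".
    return (3 if table["x"] != table["y"] else 1) + table["x"]


choice_x = lut_agent("x", ["x", "y"], [0, 1], toy_utility)
choice_y = lut_agent("y", ["x", "y"], [0, 1], toy_utility)
```

The loop visits $|O|^{|I|}$ tables, so this only runs for very small spaces.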

For large input spaces, lookup tables are of astronomical size and either maximizing over them or imagining them to run on the agent's hardware doesn't make sense. This is a problem with the original equation (1) as well. One way out is replacing the arbitrary functions $f: I \rightarrow O$ with programs computing such functions. Thus, (3) is replaced by

(4) $A(i):=(\underset{\pi}{\arg\max} \: E[U()|P()=\pi])(i)$

Where $\pi$ is understood to range over programs receiving input in $I$ and producing output in $O$. However, (4) looks like it can go into an infinite loop since what if the optimal $\pi$ is described by equation (4) itself? To avoid this, we can introduce an explicit time limit $T$ on the computation. The successor will then spend some portion $T_1$ of $T$ performing the following maximization:

(4') $A(i):=(\underset{\pi}{\arg\max} \: E[U()|P()=S_{T_1}(\pi)])(i)$

Here, $S_{T_1}(\pi)$ is a program that does nothing for time $T_1$ and runs $\pi$ for the remaining time $T_2=T-T_1$. Thus, the successor invests $T_1$ time in maximization and $T_2$ in evaluating the resulting policy $\pi$ on the input it received.

In practical terms, (4') seems inefficient since it completely ignores the actual input for a period $T_1$ of the computation. This problem exists in original UDT as well. A naive way to avoid it is giving up on optimizing the entire input-output mapping and focusing on the input which was actually received. This allows the following non-quining decision theory:

(5) $A(i):=\underset{o \in O}{\arg\max} \: E[U()|P() \in F_{i,o}]$

Here $F_{i,o}$ is the set of programs which begin with a conditional statement that produces output $o$ and terminates execution if the received input was $i$. Of course, ignoring counterfactual inputs means failing a large class of decision problems. A possible win-win solution is reintroducing quining[2]:

(6) $A(i):=\underset{o \in O}{\arg\max} \: E[U()|P()=\hat{F}_{i,o}(A)]$

Here, $\hat{F}_{i,o}$ is an operator which appends a conditional as above to the beginning of a program. Superficially, we still only consider a single input-output pair. However, instances of the successor receiving different inputs now take each other into account (as existing in "counterfactual" universes). It is often claimed that the use of logical uncertainty in UDT allows for agents in different universes to reach a Pareto optimal outcome using acausal trade. If this is the case, then agents which have the same utility function should cooperate acausally with ease. Of course, this argument should also make the use of full input-output mappings redundant in usual UDT.

In case the precursor is an actual AI programmer (rather than another AI), it is unrealistic for her to code a formal model of herself into the AI. In a followup post, I'm planning to explain how to do without it (namely, how to define a generic precursor using a combination of Solomonoff induction and a formal specification of the AI's hardware).

[1] If Omega's simulation involves $P$, this becomes the usual Newcomb problem and one-boxing is the correct strategy.

[2] Sorry, agents which can't access their own source code. You will have to make do with one of (3), (4') or (5).

8 11 March 2015 01:40PM

A putative new idea for AI control; index here.

Many of the ideas presented here require AIs to be antagonistic towards each other - or at least hypothetically antagonistic towards hypothetical other AIs. This can fail if the AIs engage in acausal trade, so it would be useful if we could prevent such things from happening.

Now, I have to admit I'm still quite confused by acausal trade, so I'll simplify it to something I understand much better, an anthropic decision problem.

## Staples and paperclips, cooperation and defection

Clippy has a utility function p, linear in paperclips, while Stapley has a utility function s, linear in staples (and both p and s are normalised to zero, with one additional item adding 1 utility). They are not causally connected, and each must choose "Cooperate" or "Defect". If they choose "Cooperate", they create 10 copies of the item they do not value (so Clippy creates 10 staples, Stapley creates 10 paperclips). If they choose "Defect", they create one copy of the item they value (so Clippy creates 1 paperclip, Stapley creates 1 staple).

Assume both agents know these facts, both agents use anthropic decision theories, and both agents are identical apart from their separate locations and distinct utility functions.

Then the outcome is easy: both agents will consider that "cooperate-cooperate" or "defect-defect" are the only two possible options, "cooperate-cooperate" gives them the best outcome, so they will both cooperate. It's a sweet story of cooperation and trust between lovers that never agree and never meet.
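The payoff structure behind this argument can be tabulated with a short sketch. The function name and the "C"/"D" encoding are illustrative choices, not part of the original setup.

```python
from typing import Tuple


def outcome(clippy_action: str, stapley_action: str) -> Tuple[int, int]:
    """Return (p, s): paperclips and staples created, given each action.

    "C" = Cooperate: make 10 of the item the agent does not value.
    "D" = Defect: make 1 of the item the agent values.
    """
    paperclips = staples = 0
    if clippy_action == "C":
        staples += 10
    else:
        paperclips += 1
    if stapley_action == "C":
        paperclips += 10
    else:
        staples += 1
    return paperclips, staples


# Identical agents are assumed to reach the same decision, so only the
# symmetric outcomes are treated as possible.
both_cooperate = outcome("C", "C")
both_defect = outcome("D", "D")
```

With only the symmetric outcomes on the table, each agent compares 10 units of its own utility against 1, which is why both cooperate.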

## Breaking cooperation

How can we demolish this lovely agreement? As I often do, I will assume that there is some event X that will turn Clippy on, with P(X) ≈ 1 (hence P(¬X) << 1). Similarly there is an event Y that turns Stapley on. Since X and Y are almost certain, they should not affect the results above. If the events don't happen, the AIs will never get turned on at all.

Now I am going to modify utility p, replacing it with

p' = p - E(p|¬X).

This is p with a single element subtracted off it: the expected value of p given that Clippy has not been turned on. This term feels like a constant, but isn't exactly, as we shall see. Do the same modification to utility s, using Y:

s' = s - E(s|¬Y).

Now contrast "cooperate-cooperate" and "defect-defect". If Clippy and Stapley are both cooperators, then p=s=10. However, if the (incredibly unlikely) ¬X were to happen, then Clippy would not exist, but Stapley would still cooperate (as Stapley has no way of knowing about Clippy's non-existence), and create ten paperclips. So E(p|¬X) = E(p|X) ≈ 10, and p' ≈ 0. Similarly s' ≈ 0.

If both agents are defectors, though, then p=s=1. Since each agent creates its own valuable object, E(p|¬X) = 0 (Clippy cannot create a paperclip if Clippy does not exist) and similarly E(s|¬Y)=0.

So p'=s'=1, and both agents will choose to defect.
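The same comparison can be written out as a small calculation. This sketch treats the approximations as exact (P(X) is taken to be 1 when evaluating p) and assumes, as in the text, that Stapley follows the same policy as Clippy and cannot tell whether Clippy exists; the function names are illustrative.

```python
def paperclips_made(clippy_exists: bool, policy: str) -> int:
    """Paperclips in the world when both agents follow `policy` ("C" or "D")."""
    total = 0
    if clippy_exists and policy == "D":
        total += 1    # Clippy makes one paperclip for itself
    if policy == "C":
        total += 10   # Stapley cooperates whether or not Clippy exists
    return total


def p_prime(policy: str) -> int:
    """Modified utility p' = p - E(p | not X) under the shared policy."""
    p = paperclips_made(True, policy)                # world where X happened
    p_if_never_on = paperclips_made(False, policy)   # E(p | not X)
    return p - p_if_never_on


best_policy = max(["C", "D"], key=p_prime)
```

Cooperation earns Clippy nothing under p' because Stapley's ten paperclips would have appeared anyway, while the single paperclip from defecting exists only if Clippy does.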

If this is a good analogue for acausal decision making, it seems we can break that, if needed.

## "Spiritual" techniques that actually work thread

8 11 March 2015 10:35AM