## Just a photo

1 19 October 2017 06:48PM

Would you say the picture below (by A. S. Shevchenko) is almost like an optical illusion?

Have you seen any pictures or sights that fooled your brain for a moment but that you wouldn't call optical illusions, and if yes, what is the salient difference?

## Use concrete language to improve your communication in relationships

2 19 October 2017 03:46AM

She wasn’t respecting me. Or at least, that’s what I was telling myself.

And I was pretty upset. What kind of person was too busy to text back a short reply? I know she’s a friendly person because just a week ago we were talking daily, text, phone, whatever suited us. And now? She didn’t respect me. That’s what I was telling myself. Any person with common decency could see, what she was doing was downright rude! And she was doing it on purpose. Or at least, that’s what I was telling myself.

After about half a day of these critical-loop thoughts, I realised what I was doing. I was telling myself a story. I was building a version of events that grew and morphed beyond the very concrete and specific facts of what was happening. The trouble, in terms of The Map and the Territory, is that “Respect” is in my map of my reality. What it “means” to not reply to my text is in my theory of mind, in my version of events. Not in the territory, not in reality.

I know I could be right about my theory of what’s going on. She could be doing this on purpose, she could be choosing to show that she does not respect me by not replying to my texts, and I often am right about these things. I have been right plenty of times in the past. But that doesn’t make me feel better. Or make it easier to communicate my problem. If she was not showing me respect, sending her an accusation would not help our communication improve.

The concept comes from Non-Violent Communication by Marshall Rosenberg, which is better described as Non-Judgemental Communication. The challenge I knew I faced was to communicate to her that I was bothered, without an accusation. Without accusing her with my own internal judgement of “she isn’t respecting me”. I knew that if I fired off an attack, I would encounter walls of defence. That’s the kind of game we play when we feel attacked by others. We put up walls and fire back.

The first step of NVC is called “observation”. I call it “concrete experience”. To pass the concrete experience test, the description of what happened needs to be specific enough to be used as instructions by a stranger. For example, there are plenty of ideas someone could have about not showing respect. If my description of the problem is “she does not respect me”, my grandma might think she started eating before I sat down at the table. If my description is “In the past 3 days she has not replied to any of my messages”, that’s a very concrete description of what happened. It’s also independent as an observation: nothing in the description says that the action has caused a problem. It’s just “what happened”.

Notice that I didn’t say, “she never replies to my messages”. This is because “never replies” is not concrete, not specific, and sweepingly untrue. For her to never reply she would have to have my grandma’s texting ability. I definitely can’t expect progress to be made here with a sweeping accusation like “she never replies”.

What I did go with, while not perfect, is a lot better than the firing line of, “you don’t respect me”. Instead it was, “I noticed that you have not messaged me in three days. I am upset because I am telling myself that the only reason you would be doing that is because you don’t respect me, and I know that’s not true. I don’t understand what’s going on with you and I would appreciate an explanation of what’s going on.”

It’s remarkably hard to be honest and not make an accusation. No sweeping generalisations, no lies or exaggerations, just the concretes of what is going on in my head and the concretes of what happened in the territory. It’s still okay to be telling yourself those accusations, and to validate your own feelings that things are not okay, but it’s not okay to lay those accusations on someone else. We all experience telling ourselves what other people are thinking, and the reasons behind their actions, but we can’t ever really know unless we ask. And if we don’t ask, we end up in the same circumstances as the Cold War: each side preparing for war, but a war built on theories in the map, not on experience in the territory.

I’m human too; that’s how I found myself half a day into brooding before wondering what I was doing to myself! It’s not easy to apply this method, but it has always been successful at bringing me some of that psychological relief that you need when you are looking to be understood by someone. To get this right, think, “How do I describe my concrete observations of what happened?”

Good Luck!

## [Link] New program can beat Alpha Go, didn't need input from human games

6 18 October 2017 08:01PM

1 18 October 2017 02:40PM

This post is from the point of view of the middleman standing between the grand future he doesn't understand and the general public whose money he's hunting. We have a certain degree of power over what to offer to the customer, and our biases and pet horses are going to contribute a lot to what theoreticians infer about "the actual public"'s tastes. Just how much that is, I cannot say, and there's probably tons of literature on this anyway, so take this as a personal anecdote.

Nine months as a teacher of botany (worst gripes here) gave me a glimpse of how teachers and administration view the field they teach. A year in a shop showed me what managers think of the books we sell. The scientific community here in my country grumbles that there's too little non-fiction produced, without actually looking into why it's not being distributed; but really, it's small wonder. Broadest advice: if your sufficiently weird goals depend on the cooperation of a network of people, especially if they are an established profession with which you haven't had cause to interact closely except as a customer, you might want to ask what they think of your enterprise. Because they aren't going to see it your way. The next thing is to accept it.

## Open thread, October 16 - October 22, 2017

1 16 October 2017 06:53PM
##### If it's worth saying, but not worth its own post, then it goes here.

Notes for future OT posters:

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should start on Monday, and end on Sunday.

## Humans can be assigned any values whatsoever...

2 13 October 2017 11:32AM

Crossposted at LessWrong 2.0.

Humans have no values... nor does any agent. Unless you make strong assumptions about their rationality. And depending on those assumptions, you can get humans to have any values.

## An agent with no clear preferences

There are three buttons in this world, B(0), B(1), and X, and one agent H.

B(0) and B(1) can be operated by H, while X can be operated by an outside observer. H will initially press button B(0); if ever X is pressed, the agent will switch to pressing B(1). If X is pressed again, the agent will switch back to pressing B(0), and so on. After a large number of turns N, H will shut off. That's the full algorithm for H.

So the question is, what are the values/preferences/rewards of H? There are three natural reward functions that are plausible:

• R(0), which is linear in the number of times B(0) is pressed.
• R(1), which is linear in the number of times B(1) is pressed.
• R(2) = I(E,X)R(0) + I(O,X)R(1), where I(E,X) is the indicator function for X being pressed an even number of times, and I(O,X) = 1 - I(E,X) is the indicator function for X being pressed an odd number of times.

For R(0), we can interpret H as an R(0) maximising agent which X overrides. For R(1), we can interpret H as an R(1) maximising agent which X releases from constraints. And R(2) is the "H is always fully rational" reward. Semantically, these make sense for the various R(i)'s being a true and natural reward, with X="coercive brain surgery" in the first case, X="release H from annoying social obligations" in the second, and X="switch which of R(0) and R(1) gives you pleasure".

But note that there are no semantic implications here: all that we know is H, with its full algorithm. If we wanted to deduce its true reward for the purpose of something like Inverse Reinforcement Learning (IRL), what would it be?
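To make the ambiguity concrete, here is a minimal Python sketch of H and the three candidate rewards. It rests on assumptions not stated in the post: X presses are modelled as a set of turn indices that take effect before H acts on that turn, each reward counts one unit per press, and R(2) is read turn by turn (a press is rewarded when it matches the parity of X presses so far). The function names are hypothetical.

```python
def run_h(x_presses, n_turns):
    """Simulate H for n_turns; return the button (0 or 1) pressed each turn.

    x_presses: set of turn indices at which the outside observer presses X.
    Assumption: an X press takes effect before H acts on that same turn.
    """
    current = 0  # H starts by pressing B(0)
    history = []
    for t in range(n_turns):
        if t in x_presses:
            current = 1 - current  # every X press toggles H's button
        history.append(current)
    return history


def reward_r0(history):
    """R(0): one unit per press of B(0)."""
    return history.count(0)


def reward_r1(history):
    """R(1): one unit per press of B(1)."""
    return history.count(1)


def reward_r2(history, x_presses):
    """R(2), per-turn reading: reward B(0) after an even number of X presses,
    B(1) after an odd number."""
    total = 0
    presses_so_far = 0
    for t, button in enumerate(history):
        if t in x_presses:
            presses_so_far += 1
        total += int(button == presses_so_far % 2)
    return total


x_presses = {3, 7}
history = run_h(x_presses, 10)
print(history)
print(reward_r0(history), reward_r1(history), reward_r2(history, x_presses))
```

The same behaviour scores differently under each reward, and nothing in the trace itself says which score H was "trying" to maximise. Under the per-turn reading, R(2) is maximised on every turn whatever the outside observer does, which is the sense in which it makes H "always fully rational".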

## Modelling human (ir)rationality and reward

Now let's talk about the preferences of an actual human. We all know that humans are not always rational (how exactly we know this is a very interesting question that I will be digging into). But even if humans were fully rational, the fact remains that we are physical, and vulnerable to things like coercive brain surgery (and in practice, to a whole host of other more or less manipulative techniques). So there will be the equivalent of "button X" that overrides human preferences. Thus, "not immortal and unchangeable" is in practice enough for the agent to be considered "not fully rational".

Now assume that we've thoroughly observed a given human h (including their internal brain wiring), so we know the human policy π(h) (which determines their actions in all circumstances). This is, in practice, all that we can ever observe - once we know π(h) perfectly, there is nothing more that observing h can teach us (ignore, just for the moment, the question of the internal wiring of h's brain - that might be able to teach us more, but we'll need extra assumptions).

Let R be a possible human reward function, and ℛ the set of such rewards. A human (ir)rationality planning algorithm p (hereafter referred to as a planner) is a map from ℛ to the space of policies (thus p(R) says how a human with reward R will actually behave - for example, this could be bounded rationality, rationality with biases, or many other options). Say that the pair (p,R) is compatible if p(R)=π(h). Thus a human with planner p and reward R would behave as h does.

What possible compatible pairs are there? Here are some candidates:

• (p(0), R(0)), where p(0) and R(0) are some "plausible" or "acceptable" planners and reward functions (what this means is a big question).
• (p(1), R(1)), where p(1) is the "fully rational" planner, and R(1) is a reward that fits to give the required policy.
• (p(2), R(2)), where R(2)= -R(1), and p(2)= -p(1), where -p(R) is defined as p(-R); here p(2) is the "fully anti-rational" planner.
• (p(3), R(3)), where p(3) maps all rewards to π(h), and R(3) is trivial and constant.
• (p(4), R(4)), where p(4)= -p(0) and R(4)= -R(0).
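The candidate pairs above can be checked mechanically on a toy example. The sketch below is illustrative only and makes several assumptions: a handful of named states stand in for histories, rewards are one-step functions of (state, action), the "fully rational" planner is a plain argmax, and `PI_H` is a made-up policy. It builds the pairs for i = 1, 2, 3 and confirms that each reproduces the observed policy.

```python
# Hypothetical toy setting: three states, two actions, one observed policy.
STATES = ["s0", "s1", "s2"]
ACTIONS = ["a", "b"]
PI_H = {"s0": "a", "s1": "b", "s2": "a"}  # stands in for the policy pi(h)


def negate(reward):
    """Return the reward function -R."""
    return lambda s, a: -reward(s, a)


def p_rational(reward):
    """p(1): the fully rational planner, picking the reward-maximising action."""
    return {s: max(ACTIONS, key=lambda a: reward(s, a)) for s in STATES}


def p_anti(reward):
    """p(2) = -p(1), defined as p(1) applied to -R: the anti-rational planner."""
    return p_rational(negate(reward))


def p_indifferent(reward):
    """p(3): ignores the reward and always returns the observed policy."""
    return dict(PI_H)


def r1(s, a):
    """R(1): reward 1 for the action h actually takes, 0 otherwise."""
    return 1.0 if PI_H[s] == a else 0.0


r2 = negate(r1)  # R(2) = -R(1)


def r3(s, a):
    """R(3): trivial constant reward."""
    return 0.0


def is_compatible(planner, reward):
    """A pair (p, R) is compatible when p(R) equals the observed policy."""
    return planner(reward) == PI_H


for name, planner, reward in [
    ("(p1, R1)", p_rational, r1),
    ("(p2, R2)", p_anti, r2),
    ("(p3, R3)", p_indifferent, r3),
]:
    print(name, is_compatible(planner, reward))
```

All three pairs pass the compatibility check even though R(1) and R(2) are exact opposites, which is the point of the list: the observed policy alone does not pin down the reward.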

## Distinguishing among compatible pairs

How can we distinguish between compatible pairs? At first appearance, we can't. That's because, by the definition of compatible, all pairs produce the correct policy π(h). And once we have π(h), further observations of h tell us nothing.

I initially thought that Kolmogorov or algorithmic complexity might help us here. But in fact:

Theorem: The pairs (p(i), R(i)), i ≥ 1, are either simpler than (p(0), R(0)), or differ in Kolmogorov complexity from it by a constant that is independent of (p(0), R(0)).

Proof: The cases of i=4 and i=2 are easy, as these differ from i=0 and i=1 by two minus signs. Given (p(0), R(0)), a fixed-length algorithm computes π(h). Then a fixed length algorithm defines p(3) (by mapping input to π(h)). Furthermore, given π(h) and any history η, a fixed length algorithm computes the action a(η) the agent will take; then a fixed length algorithm defines R(1)(η,a(η))=1 and R(1)(η,b)=0 for b≠a(η).

So the Kolmogorov complexity can shift between p and R (all in R for i=1,2, all in p for i=3), but it seems that the complexity of the pair doesn't go up during these shifts.

This is puzzling. It seems that, in principle, one cannot assume anything about h's reward at all! R(2)= -R(1), R(4)= -R(0), and p(3) is compatible with any possible reward R. If we give up the assumption of human rationality - which we must - it seems we can't say anything about the human reward function. So it seems IRL must fail.

Yet, in practice, we can and do say a lot about the rationality and reward/desires of various human beings. We talk about ourselves being irrational, as well as others being so. How do we do this? What structure do we need to assume, and is there a way to get AIs to assume the same?

This is the question I'll try to partially answer in subsequent posts, using the anchoring bias as a motivating example. The anchoring bias is one of the clearest of all biases; what is it that allows us to say, with such certainty, that it's a bias (or at least a misfiring heuristic) rather than an odd reward function?

## Beauty as a signal (map)

4 12 October 2017 10:02AM

This is my new map, in which female beauty is presented as a signal which moves from woman to man through different mediums and amplifiers. pdf

## Mini-conference "Near-term AI safety"

4 11 October 2017 03:19PM

TL;DR: The event will be in Moscow, Russia, and near-term risks of AI will be discussed. The main language will be Russian, but Jonathan Yan will speak in English from HK. English presentations will be uploaded later on the FB page of the group "Near-term AI safety." Speakers: S. Shegurin, A. Turchin, Jonathan Yan. The event's FB page is here.

In the last five years, artificial intelligence has developed at a much faster pace, driven by the success of neural network technologies. If we extrapolate these trends, near-human-level AI may appear in the next five to ten years, and there is a significant probability that this will lead to a global catastrophe. At a one-day conference at the Kocherga rationalist club, we'll look at how recent advances in the field of neural networks are changing our estimates of the timing of the creation of AGI, and what global catastrophes are possible in connection with the emergence of an increasingly strong AI. A special guest of the program, Jonathan Yan in Hong Kong, will present (in English, via Skype) the latest research data on this topic.

The language of the conference: the first two reports will be in Russian, Yan's report will be in English without translation, and the discussion after it will be in English.

Registration: on the event page on Facebook.

Place: rationalist club "Kocherga", main hall, Bolshaya Dorogomilovskaya ul., 5, building 2.

Program:

October 14, Saturday 15.00 - the beginning.

15.00 - Shegurin Sergey. "Is it possible to create a human level AI in the next 10 years?"

16.00 - Turchin Alexey. "The next 10 years: the global risks of AI before the creation of the superintelligence"

17.00 - Jonathan Yan. "Recent Developments Towards AGI & Why It's Nearer Than You Think (in English)"

17.40 - Discussion

## Toy model of the AI control problem: animated version

7 10 October 2017 11:12AM

Crossposted at LessWrong 2.0.

A few years back, I came up with a toy model of the AI control problem. It has a robot moving boxes into a hole, with a slightly different goal than its human designers, and a security camera to check that it's behaving as it should. The robot learns to block the camera to get its highest reward.

I've been told that the model is useful for explaining the control problem to quite a few people, and I've always wanted to program the "robot" and get an animated version of it. Gwern had a live demo, but it didn't illustrate all the things I wanted to.

So I programmed the toy problem in python, and generated a video with commentary.

In this simplified version, the state space is sufficiently small that you can explicitly generate the whole table of Q-values (expected reward for taking an action in a certain state, assuming an otherwise optimal policy). Since behaviour is deterministic, the table can be updated by dynamic programming, using a full-width backup. The number of such backups essentially measures the depth of the robot's predictive ability.
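The idea of counting backups as planning depth can be illustrated on a much smaller problem. The following is a minimal sketch, not the code from the post: it assumes a hypothetical five-cell corridor with a goal at the right end, undiscounted rewards, and a Q-table that starts at zero. Each call to `backup` performs one full-width backup over every state-action pair.

```python
N_STATES = 5       # hypothetical corridor with cells 0..4
GOAL = 4           # entering this cell yields reward 1
ACTIONS = (-1, 1)  # move left or move right


def step(state, action):
    """Deterministic transition: return (next_state, reward)."""
    if state == GOAL:
        return state, 0.0  # the goal is absorbing and gives no further reward
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def backup(q):
    """One full-width backup: recompute every Q-value from the previous table."""
    new_q = {}
    for s in range(N_STATES):
        for a in ACTIONS:
            nxt, r = step(s, a)
            new_q[(s, a)] = r + max(q[(nxt, b)] for b in ACTIONS)
    return new_q


def q_table(depth):
    """Q-values after `depth` backups, starting from an all-zero table."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(depth):
        q = backup(q)
    return q


for depth in (0, 1, 4):
    q = q_table(depth)
    print(depth, [q[(s, 1)] for s in range(N_STATES)])
```

At depth 0 every Q-value is zero, so no action is preferred and behaviour is effectively random. Each extra backup lets knowledge of the reward reach one cell further from the goal, so the leftmost cell only "sees" the goal once the depth reaches its distance of four steps.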

The most interesting depths of planning are:

• At depth 0, the robot behaves randomly.
• At depth 3, the robot crashes itself efficiently, to avoid losing too much reward.
• At depth 6, the robot figures out how to get one box into the hole. The automated camera turns it off.
• At depth 17, the robot finally starts to deceive/conceal/manipulate, by blocking the camera and pushing two boxes into the hole. Note that the robot's deception comes from the fact that its incentives are misaligned, and that humans tried to control it.
• At depth 18, the robot efficiently does the plan from depth 17.
• At depth 20, the robot does the maximally efficient plan: blocking the camera, and pushing all boxes into the hole.
• At depth 32, the robot has the correct Q-values for the maximally efficient plan.
• At depth 45, finally, the Q-value table is fully updated, and the robot will take maximally efficient, and, if need be, deceptive plans from any robot/box starting positions.

The code and images can be found here.

## HOWTO: Screw Up The LessWrong Survey and Bring Great Shame To Your Family

25 08 October 2017 03:43AM

Let's talk about the LessWrong Survey.

First and foremost, if you took the survey and hit 'submit', your information was saved and you don't have to take it again.

Your data is safe. Nobody took it or anything; it's not like that. If you took the survey and hit the submit button, this post isn't for you.

For the rest of you, I'll put it plainly: I screwed up.

This LessWrong Survey had the lowest turnout since Scott's original survey in 2009. I'll admit I'm not entirely sure why that is, but I have a hunch and most of the footprints lead back to me. The causes I can finger seem to be the diaspora, poor software, poor advertising, and excessive length.

## The Diaspora

As it stands, this year's LessWrong survey got about 300 completed responses. This can be compared with the previous one in 2016, which got over 1600. I think one critical difference between this survey and the last was its name. Last year the survey focused on figuring out where the 'Diaspora' was and what venues had gotten users now that LessWrong was sort of the walking dead. It accomplished that well, I think, and part of the reason why is that I titled it the LessWrong Diaspora Survey. That magic word got far-off venues to promote it even when I hadn't asked them to. The survey was posted by Scott Alexander, Ozy Frantz, and others to their respective blogs, and pretty much everyone 'involved in LessWrong' to one degree or another felt like it was meant for them to take. By contrast, this survey was focused on LessWrong's recovery and revitalization, so I dropped the word Diaspora from it, and this seems to have caused a ton of confusion. Many people I interviewed to ask why they hadn't taken the survey flat out told me that even though they were sitting in a chatroom dedicated to SSC, and they'd read the sequences, the survey wasn't about them because they had no affiliation with LessWrong. Certainly that wasn't the intent I was trying to communicate.

## Poor Software

There should be, but there's really only LimeSurvey. If I had to give this post an alternate title, it would be "LimeSurvey: An anti endorsement".

I could go on for pages about what's wrong with LimeSurvey, but it can probably be summed up as "the software is bloated and resists customization". It's slow, it uses slick graphics but fails to entirely deliver on functionality, its inner workings are kind of baroque, it's the sort of thing I probably should have rejected on principle and written my own. However at that time the survey was incredibly overdue, so I felt it would be better to just get out something expedient since everyone was already waiting for it anyway. And the thing is, in 2016 it went well. We got over 3000 responses including both partial and complete. So walking away from that victory and going into 2017, I didn't really think too hard about the choice to continue using it.

A couple of things changed between 2016 and our running the survey in 2017:

Hosting - My hosting provider, a single individual who sets up strong networking architectures in his basement, had gotten a lot busier since 2016 and wasn't immediately available to handle any issues. The 2016 survey had a number of birthing pains, and his dedicated attention was part of the reason why we were able to make it go at all. Since he wasn't here this time, I was more on my own in fixing things.

Myself - I had also gotten a lot busier since 2016. I didn't have nearly as much slack as I did the last time I did it. So I was sort of relying on having done the whole process in 2016 to insulate me from opening the thing up to a bunch of problems.

Both of these would prove disastrous. When I started the survey this time it was slow, it had a variety of bugs and issues I had only limited time to fix, and the issues just kept coming, even more than in 2016, as if it had decided that now, when I truly didn't have the energy to spare, was when things should break down. These mostly weren't show-stopping bugs, though; they were minor annoyances. But every minor annoyance reduced turnout, and I was slowly bleeding through the pool of potential respondents by leaving them unfixed.

The straw that finally broke the camel's back for me was when I woke up to find that this message was being shown to most users coming to take the survey:

"Your responses cannot be saved"? This error meant for when someone had messed up cookies was telling users a vicious lie: That the survey wasn't working right now and there was no point in them taking it.

Looking at this in horror and outrage, after encountering problem after problem mixed with low turnout, I finally pulled the plug.

## Poor Advertising

As one email to me mentioned, the 2017 survey didn't even get promoted to the main section of the LessWrong website. This time there were no links from Scott Alexander, nor the myriad small stakeholders that made it work last time. I'm not blaming them or anything, but as a consequence many people who I interviewed to ask about why they hadn't taken the survey had not even heard it existed. Certainly this had to have been significantly responsible for reduced turnout compared to last time.

## Excessive Length

Of all the things people complained about when I interviewed them on why they hadn't taken the survey, this was easily the most common response. "It's too long."

This year I made the mistake of moving back to a single page format. The problem with a single page format is that it makes it clear to respondents just how long the survey really is. It's simply too long to expect most people to complete it. And before I start getting suggestions for it in the comments, the problem isn't actually that it needs to be shortened, per se. The problem is that to investigate every question we might want to know about the community, it really needs to be broken into more than one survey. Especially when there are stakeholders involved who would like to see a particular section added to satisfy some questions they have.

Right now I'm exploring the possibility of setting up a site similar to yourmorals so that the survey can be effectively broken up and hosted in a way where users can sign in and take different portions of it at their leisure. Further gamification could be added to help make it a little more fun for people. Which leads into...

## The Survey Is Too Much Work For One Person

What we need isn't a guardian of the survey, it's really more like a survey committee. I would be perfectly willing (and plan to) chair such a committee, but I frankly need help. Writing the survey, hosting it without flaws, theming it so that it looks nice, writing any new code or web things so that we can host it without bugs, comprehensively analyzing the thing, it's a damn lot of work to do it right and so far I've kind of been relying on the generosity of my friends for it. If there are other people who really care about the survey and my ability to do it, consider this my recruiting call for you to come and help. You can mail me here on LessWrong, post in the comments, or email me at jd@fortforecast.com. If that's something you would be interested in I could really use the assistance.

## What Now?

Honestly? I'm not sure. The way I see it my options look something like:

Call It A Day And Analyze What I've Got - N=300 is nothing to sneeze at, theoretically I could just call this whole thing a wash and move on to analysis.

Try And Perform An Emergency Migration - For example, I could try and set this up again on Google Forms. Having investigated that option, there's no 'import' button on Google forms so the survey would need to be reentered manually for all hundred-and-a-half questions.

Fix Some Of The Errors In LimeSurvey And Try Again On Different Hosting - I considered doing this too, but it seemed to me like the software was so clunky that there was simply no reasonable expectation this wouldn't happen again. LimeSurvey also has poor separation between being able to edit the survey and view the survey results, so I couldn't delegate the work to someone else because that could theoretically violate users' privacy.

These seem to me like the only things that are possible for this survey cycle; at any rate, an extension of time would be required for another round. In the long run I would like to organize a project to write new software from scratch that fixes these issues and gives us a site to which multiple stakeholders can submit surveys that might be too niche to include in the current LessWrong Survey format.

I welcome other suggestions in the comments; consider this my SOS.

3 07 October 2017 09:32PM

Maybe the last installment of the Polling Thread.

At least I guess it's the last one before we switch to the LesserWrong codebase, which sadly doesn't seem to support polls. Maybe to ease the transition we can share polls, e.g. on Google Forms or SurveyMonkey. Or discuss alternatives.

These used to be the rules:

1. Each poll (or link to a poll) goes into its own top level comment and may be commented there.
2. You should at least vote in all polls that were posted earlier than your own. This ensures participation in all polls and also limits the total number of polls. You may of course vote without posting a poll.
3. Your poll should include a 'don't know' option (to avoid conflict with 2). I don't know whether we need to add a troll catch option here but we will see.

If you don't know how to make a poll in a comment look at the Poll Markup Help.

This is a somewhat regular thread. If it is successful I may post again. Or you may. In that case do the following:

• Use "Polling Thread" in the title.
• Copy the rules.
• Create a top-level comment saying 'Discussion of this thread goes here; all other top-level comments should be polls or similar'
• Add a second top-level comment with an initial poll to start participation.

## Running a Futurist Institute.

4 06 October 2017 05:05PM

Hello,

My name is Trent Fowler, and I'm an aspiring futurist. To date I have given talks on two continents on machine ethics, AI takeoff dynamics, secular spirituality, existential risk, the future of governance, and technical rationality. I have written on introspection, the interface between language and cognition, the evolution of intellectual frameworks, and myriad other topics. In 2016 I began 'The STEMpunk Project', an endeavor to learn as much about computing, electronics, mechanics, and AI as possible, which culminated in a book published earlier this year.

Elon Musk is my spirit animal.

I am planning to found a futurist institute in Boulder, CO. I actually left my cushy job in East Asia to help make the future a habitable place.

Is there someone I could talk to about how to do this? Should I incorporate as a 501(c)(3) or an LLC? What are the best ways of monetizing such an endeavor? How can I build an audience (meetup attendance has been anemic at best; what can I do about that)? And so on.

Best,

-Trent

## [Link] You Too Can See Suffering

3 03 October 2017 07:46PM

## Open thread, October 2 - October 8, 2017

1 03 October 2017 10:46AM
##### If it's worth saying, but not worth its own post, then it goes here.

Notes for future OT posters:

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should start on Monday, and end on Sunday.

## [Link] [Slashdot] We're Not Living in a Computer Simulation, New Research Shows

1 03 October 2017 10:10AM

## Rational Feed: Last Week's Community Articles and Some Recommended Posts

2 02 October 2017 01:49PM

===Highly Recommended Articles:

Slack by Zvi Mowshowitz - You need slack in your life. Slack lets you explore and invest. If you don't have slack you can't relax or uphold your morals. Fight hard to maintain your slack and don't let people or things take it away. Maya Millennial's lack of slack.

Personal Thoughts On Careers In Ai Policy And by carrickflynn (EA forum) - 3600 words. AI strategy is bottlenecked by hard research problems. Hence most people will find it hard to contribute effectively, even if they are very talented. Solving these problems has extremely high value. We should prepare to mobilize more talent once the blocking issues are solved. Operations work is still in high demand.

End Factory Farming by 80,000 Hours - Three hour podcast. How young people can set themselves up to contribute to scientific research into meat alternatives. Genetic manipulation of chickens. Skepticism of vegan advocacy. Grants to China, India and South America. Insect farming. Pessimism about legal or electoral solutions. Which species to focus on. Fish and crustacean consciousness.

===Scott:

Against Individual Iq Worries by Scott Alexander - "IQ is very useful and powerful for research purposes. It’s not nearly as interesting for you personally." IQ measurement problems. Even accurately measured IQ isn't that predictive.

Links: Hurly Burly by Scott Alexander - SSC links post. Copyright, genetic engineering, autism, machine learning, Putin's fears of AI risk, the LessWrong relaunch and more.

===Rationalist:

Dojo Bad Day Contingency Plan by Elo - Eliezer's discussion of why rationality theory isn't enough: you need to practice. An exercise about improving your mental state on bad days.

Also Against Individual IQ Worries by Scott Aaronson - IQ tests tend to ask unclear questions and require you to reverse engineer what the test maker meant. Scott's own IQ was once measured at 106.

Predictive Processing by Entirely Useless - Responses to quotes from Surfing Uncertainty and Scott's review. A large focus is the "darkened room" problem.

Prosocial Manipulation by Katja Grace - Being calculating and guarded in communication is commonly considered manipulative and selfish. But many people's goals are pro-social, why do we assume manipulation is anti-social?

Humans As Leaky Systems by mindlevelup - "Fairly obvious stuff that probably lots of people are thinking about, but now put into simpler words (maybe). Basically, the idea that humans are affected by both ideas and the environment, and this is an important consideration in several models."

Dealism Futarchy And Hypocrisy by Robin Hanson - Policy conversations don't have to be about morality or terminal values. We can instead use tools like economics as a way to help people get whatever it is they want. We can push closer to the Pareto optimal frontier.

Debunking Iq Denial Ism by Grey Enlightenment - Criticisms of Scott's article on individual IQ. People can change their socioeconomic status but not their IQ. IQ is more predictive than socioeconomic status. Feynman. Job titles are non-specific; low-IQ 'computer' professions might be doing data entry. EQ isn't intrinsic and doesn't compete with IQ.

Harnessing Polarization by Robin Hanson - Capitalism channels status competition into productive enterprise. How can we similarly channel partisanship? Contests? Decision Markets?

Common Sense Eats Common Talk by Stefano Zorzi (ribbonfarm) - Missing the housing bubble. Falling for conformity. Seeing through invisible clothes. Advice: Test macro assumptions, beware of jargon, assume propositions that contradict common sense are wrong. Common talk and common sense and their failings.

Sabbath Hard And Go Home by Ben Hoffman - The Sabbath as easymode leisure. Unplugging while camping or on a meditation retreat feels natural. What is leisure? If you are unable to keep a Sabbath, things are not ok; there isn't enough slack in the system.

Cognitive Empathy And Emotional Labor by Gordon (Map and Territory) - Affective empathy contrasted with Cognitive empathy. Cognitive empathy enables real emotional labor.

City Travel Scaling by Robin Hanson - Review of Geoffrey West's 'Scale'. Most visits to a location are from infrequent visitors who live nearby. Fractal piping systems have an overhead that only grows logarithmically with the size of the city. Evolution never found such efficient heating/cooling systems.

Travel Journal Hawaii by Jacob Falkovich - The Hawaiian language only has 40 syllables. Sales tax. Circadian Rhythm. Colonialism. The Hawaiian caste system. The best meal in the world. Don't quit your job to sell lemonade. Minimum wage ruined the pineapple industry.

Why I Quit Social Media by Sarah Constantin - Becoming stronger and less emotional since we live in a finite world with constrained resources. Social media: "It distances you from reality, makes you focus on a shadow-world of opinions about opinions about opinions; it makes you more impulsive and emotionally unstable; it incentivizes derailing conversations to fish for ego-strokes."

===AI:

An Outside View Of Ai Control by Robin Hanson - Non-singularity scenarios where software performs almost all jobs. Software usually reflects the social organization of those who made it. Entrenched designs and systems. Don't work on the control problem until it's time. Human control and AI control. Most AI failures in this scenario will cause limited damage and can be handled after they occur.

Nonlinear Computation In Linear Networks by OpenAI - Floating point arithmetic is fundamentally non-linear near the limit of machine precision. OpenAI managed to exploit these non-linear effects with an evolutionary algorithm to achieve much better performance than a normal deep linear network on MNIST.

September 2017 Newsletter by The MIRI Blog - New MIRI paper on Incorrigibility and shutting off AI. Best posts from the intelligent agents forum. Links to videos and podcasts. MIRI personnel updates and career opportunities in AI safety.

NBER Conference Artificial Intelligence by Marginal Revolution - Links to the program and videos. Tyler was there to comment on Korinek and Stiglitz.

===EA:

What Happens To Cows In The Us by Eukaryote - "There are 92,000,000 cattle in the USA. Where do they come from, what are they used for, and what are their ultimate fates?"

Interim Update On Givewells Money Moved And Web Traffic In 2016 by The GiveWell Blog - Summary of influence, total money moved, money moved by charity.

Guardedness In Ea by Jeff Kaufman - As people and organizations gain prestige their communication becomes less open and more careful. Jeff has seen this happen in the EA community and dislikes the effects. However Jeff doesn't see a great alternative.

Trial Postponed by GiveDirectly - GiveDirectly Kenya trial postponed due to political events.

===Politics and Economics:

Why White Identity Doesn't Work by Grey Enlightenment - Who counts as white. Race is secondary to ancestry and culture. No unifying cause or struggle. Whites may be biologically individualist. Too much infighting.

Comment on Oppressed Groups and Slack by Benquo - People who are oppressed often lack the slack to maintain their morals. Seven Samurai. This has the troubling implication that, while we should listen to the oppressed, the relatively privileged should maintain leadership. However, it also implies that oppressed groups' behavior will improve after enough time without a boot on their neck.

The OpenPhil Report On Incarceration by The Unit of Caring - "Our prison system isn’t just not-rehabilitative; it is anti-rehabilitative. It traumatizes and retraumatizes people and severs their connections to people and opportunities within the law and abuses them and breaks social trust and produces crime which is then used to justify longer prison sentences which produce more crime."

On The Fetishization Of Money In Galts Gulch by Ben Hoffman - Dagny Taggart and Galt feel they can't ethically become lovers until they rectify a power imbalance. Dagny solves this problem by becoming Galt's housekeeper and cook. Most people's intuition is that employment creates a power imbalance; it doesn't solve one. What is going on?

Seasteading 2 by Bayesian Investor - "The book’s style is too much like a newspaper. Rather than focus on the main advantages of seasteading, it focuses on the concerns of the average person, and on how seasteading might affect them. It quotes interesting people extensively, while being vague about whether the authors are just reporting that those people have ideas, or whether the authors have checked that the ideas are correct. Many of the ideas seem rather fishy."

What Is Going On With The Alt Right by Grey Enlightenment - Reasons the alt-right is falling apart: Trump backpedaling or softening on campaign promises, the civil war between the alt-lite, alt-medium, and alt-right, slow news cycle and brevity of ideas, botched rallies and poor branding, the alt-right losing its official Reddit sub, the right is more intellectually diverse than the left.

Milgram Replicates by Bryan Caplan - Milgram's shock study replicated well in 2009. Since 79% of people who pushed past the subject's first verbal protest went to the end of the range, the replication stopped earlier than Milgram.

===Misc:

Summary Of Reading July September 2017 by Eli Bendersky - Book reviews: Stats, genetics, Winnie the Pooh, Zen and other topics.

===Podcast:

Creating Trump by The Ezra Klein Show - "How the Republican Party created Trump, how Trump won, and what comes next. As Dionne says in this interview, the American system was "not supposed to produce a president like this,” and so a lot of our conversation is about how the guardrails failed and whether they can be rebuilt."

Rs 194 Robert Wright On Why Buddhism Is True by Rationally Speaking - "Why Buddhism was right about human nature: its diagnosis that our suffering is mainly due to a failure to see reality clearly, and its prescription that meditation can help us see more clearly. Robert and Julia discuss whether it's suspicious that a religion turned out to be "right" about human nature, what it means for emotions to be true or false, and whether there are downsides to enlightenment."

Robert Wright by EconTalk - "The psychotherapeutic insights of Buddhism and the benefits of meditation and mindfulness. Wright argues our evolutionary past has endowed us with a mind that can be ill-suited to the stress of the present. He argues that meditation and the non-religious aspects of Buddhism can reduce suffering and are consistent with recent psychological research."

Burning Man by The Bayesian Conspiracy - How much does Burning Man live up to its principles, changes over time, finding out you aren't gay in your twenties, marriage. Burning Man advice: Go with a camp you like, don't have plans, just wander around and get involved in what's interesting.

The Fate Of Liberalism by Waking Up with Sam Harris - "Mark Lilla about the fate of political liberalism in the United States, the emergence of a new identity politics, the role of class in American society"

1 02 October 2017 07:43AM

The following is an exercise I composed to be run at the Lesswrong Sydney dojos. It took an hour and a half but could probably be done faster with some adaptations that I have included in these instructions. As for what the dojos are:

I quote Eliezer in the preface of Rationality: From AI to Zombies when he says:

It was a mistake that I didn’t write my two years of blog posts with the intention of helping people do better in their everyday lives. I wrote it with the intention of helping people solve big, difficult, important problems, and I chose impressive-sounding, abstract problems as my examples. In retrospect, this was the second-largest mistake in my approach.
It ties in to the first-largest mistake in my writing, which was that I didn’t realise that the big problem in learning this valuable way of thinking was figuring out how to practice it, not knowing the theory. I didn’t realise that part was the priority; and regarding this I can only say “Oops” and “Duh.” Yes, sometimes those big issues really are big and really are important; but that doesn’t change the basic truth that to master skills you need to practice them and it’s harder to practice on things that are further away.

Lesswrong is a global movement of rationality. With that in mind, the Dojos are our attempt in Sydney to work on the actual practical stuff: the personal problems and the literal implementation of the plans after they undergo first contact with the enemy. You can join us through our meetup group, facebook group and as advertised on lesswrong.

Below are the instructions for the Dojo. I can't emphasise enough the process of actually doing and not just reading.

If you intend to participate, grab some paper or a blank document and stop for a few minutes to make the lists. Then check your answers against ours. If you don't do the exercise, don't fool yourself into thinking you have this skill under your belt. Just accept that you didn't really "learn" this one. You kinda said, "That's great, I wish I could find the time to get healthy" or "If only I was the type of person who did things." If this is especially difficult for you, that's okay. It is difficult for all of us. I believe in you!

Good luck.

Everyone has bad days. Each of us will have various experiences dealing with different causes of "bad days", and with diagnosing, solving and resolving those causes.

With that in mind I want to do a few sets of discussions on factors of a bad day.

Part 1: Set a timer for 3 minutes - Make a list of things bad for state of mind, or things you have noticed cause trouble for you.  {as a group each person shares one} Review the hints list as a group:

• routine meds/supplements (supposed to take)
• have you taken something to cause a bad state? (things you should not take)
• sleep
• exercise
• shower
• Sunlight (independent of bright light)
• talk to a human in the last X hours
• talk to too many humans in the last X hours
• Fresh air
• Did I eat in the last X hours
• drink in the last X hours
• Am I in pain?  Physical or emotional
• Physical discomfort, weather, loud noise, bright lights, bad smells
• Feel unsafe in my surroundings?
• Do I know why I'm in a bad mood, or not feeling well emotionally?  (remember do not dismiss or judge any answer)
• When did you last do something fun?
• Spend 5 minutes making a list of all the little things that are bothering you (try not to solve them now, just make the list) (and if necessary make plans for the ones you can affect).
• Also possibly distinguish between "why am I feeling bad" and "what can I do to feel less bad/even though I feel bad" (e.g. if you're stressed about upcoming event or fight you had last night, you might not be able to act on it but you can still do things now that will improve your state or at least get you being productive)

at the bottom of the page:{our bonus list of bad things generated in the dojo}

{As a group - were there any big ones we missed and discussion about what we came up with}

Part 2: {set a timer 3 minutes} Come up with a list of things that are good for your mental state

{Group discussion - each share one}

{optional hints list} http://happierhuman.com/how-to-be-happy/ {feel free to go through it as a group or glance at it or skip it}

{bonus good stuff list at the bottom}

{as a group discussion - did we miss any big ones?}

Part 3: Possibly ambiguous factors

Now that we have a list of good and a list of bad, we should build a list of possibly ambiguous factors that you can look out for.  For example the weather, allergies, unexpected events - i.e. a death or car accident. Set a timer 3 minutes - ambiguous factors {as a group - each name one}

{Any big ones we missed} (discussion)

{bonus ambiguous list at the bottom}

Part 4: The important parts

Now I want you to go through the list and come up with the top 5-10 (or as many as matters) most relevant ones.  From here on in it's your list, no more sharing so it doesn't matter to anyone else what's on it.

{Timer 2 minutes}

Part 5: plan for where to keep the list so it's most accessible - so that on a bad day you can access the list and make use of it. Could be in an email draft, could be on your phone, could be a note somewhere at home or in a notebook.

Timer 2 minutes - come up with where you will be keeping the list that makes it most useful to you.

{discussions of plans - including double checking of each other's plans to make sure they seem like they are likely to work}

{assistance if anyone is stuck}

Some ideas:

• notes app in phone
• bedroom door poster
• repeat and memorize
• "noticing" and asking why, rumination.

{end of exercise and break time}

{bonus list of bad things}

• supplements
• private time
• sun
• exercise
• stress (and too much responsibility)
• sleep
• alcohol
• my mother (stress)
• weather (cold)
• body temperature
• pain
• interpersonal rejection (and the complexities of these)
• when my wife is unhappy
• overeating
• missing out on fun things
• losing control of my schedule
• not having a schedule
• overthinking past failure
• avoiding things I should do
• accusations/misunderstandings
• not sticking to good habits
• being confrontational
• need social time
• obligation
• fixating on bullshit
• getting short with people
• too much coffee
• not continuing communication (not knowing what to say)
• junk food
• not being "myself" enough
• breaking good routines
• cold showers in the morning are bad
• being unproductive at work
• something on the mind

{bonus list of good things}

• weather
• exercise/swimming, dancing
• sex
• big meals
• supplements
• sorting my spreadsheets -> feeling on top of my tasks -> congruence of purpose
• when things work smoothly
• creating things -> feedback on completion
• fasting
• perfect weather
• shower + bath
• go for a walk
• listen to nice music
• good plan & following it
• petting a cat
• weightlifting
• girlfriend
• playing instrument
• feeling connected with someone
• veg-out in bed
• good podcast
• dancing around the house
• good book/knowledge
• meditating
• a balanced day - a bit of everything "good day"
• napping
• solving a problem
• learning knowledge/skill
• new experiences + with other people
• lack of responsibility and commitment -> option of impulsivity
• nature experience (sunsets, cool breeze)
• discovering nuance
• progress feedback
• humour
• hypnotised to be relaxed
• 3 weeks sticking to diet and exercise
• new idea - epiphany feeling
• winning debate/scoring a soccer goal
• productive procrastination
• consider past accomplishment
• knowing/realising -> feeling the realisation
• when other people are really organised
• making someone smile
• massage giving and receiving
• hugs
• deep breathing
• looking at clouds
• playing with patterns
• making others happy
• good TV/movie
• getting paid
• balance social/alone time
• flow
• letting go/deciding not to care
• text chat
• lying on the floor sleep

{bonus ambiguous list}

• some foods
• water
• sleep (short can feel good endorphins)
• chemical smells (burning plastic, drying paint)
• coffee buzz
• conversations
• helping people
• humans
• finding information (sometimes a let down)
• balance discipline/freedom
• seeing family
• junk TV/movies
• junk food
• menial chores
• fidgeting
• paid work
• partner time
• coding binge
• being alone
• exercise
• reading documentation (sometimes good, sometimes terrible)
• being needed/wanted
• enthusiasm -> burnout
• masturbation
• alcohol
• sticking to timetable
• performing below standard
• sex
• learning new stuff
• clubs
• brain fog
• breaking the illusions of reality

Meta: this took an hour to write up and a few hours to generate the exercise.

## Feedback on LW 2.0

11 01 October 2017 03:18PM

What are your first impressions of the public beta?

2 01 October 2017 02:08AM

This is the monthly thread for posting media of various types that you've found that you enjoy. Post what you're reading, listening to, watching, and your opinion of it. Post recommendations to blogs. Post whatever media you feel like discussing! To see previous recommendations, check out the older threads.

Rules:

• Please avoid downvoting recommendations just because you don't personally like the recommended material; remember that liking is a two-place word. If you can point out a specific flaw in a person's recommendation, consider posting a comment to that effect.
• If you want to post something that (you know) has been recommended before, but have another recommendation to add, please link to the original, so that the reader has both recommendations.
• Use the "Other Media" thread if you believe the piece of media you want to discuss doesn't fit under any of the established categories.
• Use the "Meta" thread if you want to discuss the monthly media thread itself (e.g. to propose adding/removing/splitting/merging subthreads, or to discuss the type of content properly belonging to each subthread) or for any other question or issue you may have about the thread or the rules.

## [Link] Work and income in the next era

0 30 September 2017 10:02PM

## logic puzzles and loophole abuse

2 30 September 2017 03:45PM

I recently read about the hardest logic puzzle ever on Wikipedia and noticed that someone published a paper in which they solved the problem by asking only two questions instead of three. This relied on abusing the loophole that boolean formulas can result in a paradox.

This got me thinking in what other ways the puzzle could be abused even further, and I managed to find a way to turn the problem into a hack to achieve omnipotence by enslaving gods (see below).

I find this quite amusing, and I would like to know if you know of any other examples where popular logic puzzles can be broken in amusing ways. I'm looking for any outside-the-box solutions that give much better results than expected. Another example.

Here is my solution to the "hardest logic puzzle ever":

This solution is based on the following assumption: The gods are quite capable of responding to a question with actions besides saying 'da' and 'ja', but simply have no reason to do so. As stated in the problem description, the beings in question are gods and they have a language of their own. They could hardly be called gods, nor have need for a spoken language, if they weren't capable of affecting reality.

At a bare minimum, they should be capable of pronouncing the words 'da' and 'ja' in multiple different ways, or to delay answering the question by a fixed amount of time after the question is asked. Either possibility would extend the information content of an answer from a single bit of information to arbitrarily many bits, depending on how well you can differentiate different intonations of 'da' and 'ja', and how long you are willing to wait for an answer.
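To put rough numbers on the "arbitrarily many bits" claim, here is a minimal sketch. It assumes a simplified model that is not part of the puzzle: an answer is one of two words, spoken in one of several distinguishable intonations, after one of several distinguishable delays, and every combination is usable. The function name `answer_bits` is made up for this illustration.

```python
import math


def answer_bits(intonations: int = 1, delays: int = 1) -> float:
    """Bits carried by a single answer under the assumed model.

    Assumption: an answer is one of 2 words ('da' or 'ja'), spoken in one
    of `intonations` distinguishable ways, after one of `delays`
    distinguishable waiting times, and all combinations are usable.
    """
    if intonations < 1 or delays < 1:
        raise ValueError("need at least one intonation and one delay")
    # Number of distinguishable answers is the product of the choices.
    return math.log2(2 * intonations * delays)


print(answer_bits())                         # plain 'da'/'ja' answer
print(answer_bits(delays=6))                 # six distinguishable delays
print(answer_bits(intonations=4, delays=4))  # both channels combined
```

Under this model the information grows only logarithmically with the number of delays you can tell apart, which is why long waits buy relatively few extra bits.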

We can construct a question that will result in a paradox unless a god performs a certain action. In this way, we can effectively enslave the god and cause it to perform arbitrary actions on our behalf, as performing those actions is the only way to answer the question. The actual answer to the question becomes effectively irrelevant.

To do this, we approach any of the three gods and ask them the question OBEY, which is defined as follows:

OBEY = if WISH_WRAPPER then True else PARADOX

WISH_WRAPPER = "after hearing and understanding OBEY, you act in such a way that your actions maximally satisfy the intended meaning behind WISH. Where physical, mental or other kinds of constraints prevent you from doing so, you strive to do so to the best of your abilities instead."

WISH = "you determine the Coherent Extrapolated Volition of humanity and act to maximize it."

You can replace WISH with any other wish you would like to see granted. However, one should be very careful while doing so, as beings of pure logic are likely to interpret vague actions differently from how a human would interpret them. In particular, one should avoid accidentally making WISH impossible to fulfill, as that would cause the god's head to explode, ruining your wish.

The above formulation tries to take some of these concerns into account. If you encounter this thought experiment in real life, you are advised to consult a lawyer, a friendly-AI researcher, and possibly a priest, before stating the question.

Since you can ask three questions, you can enslave all three gods. Boolos' formulation states about the random god that "if the coin comes down heads, he speaks truly; if tails, falsely". This formulation implies that the god does try to determine the truth before deciding how to answer. This means that the wish-granting question also works for the random god.

If the capabilities of the gods are uncertain, it may help to establish clearer goals as well as fall-back goals. For instance, to handle the case that the gods are in fact limited to speaking only 'da' and 'ja', it may help to append the following to WISH: "If you are unable to perform actions in response to OBEY besides answering 'da' or 'ja', you wait for the time period outlined in TIME before making your answer." You can now encode arbitrary additional information in TIME, with the caveat that you will have to actually wait before getting a response. Your ability to accurately measure the elapsed time between question and answer directly correlates with how much information you can put into TIME without risking starvation before the question is answered. The following is a simple example of TIME that would allow you to solve the original problem formulation by asking OBEY just once of any one of the gods:

TIME = "If god A speaks the truth, B lies and C is random, you wait for 1 minute before answering. If god A speaks the truth, C lies and B is random, you wait for 2 minutes before answering. If god B speaks the truth, A lies and C is random, you wait for 3 minutes before answering. If god B speaks the truth, C lies and A is random, you wait for 4 minutes before answering. If god C speaks the truth, A lies and B is random, you wait for 5 minutes before answering. If god C speaks the truth, B lies and A is random, you wait for 6 minutes before answering."
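The TIME encoding above is just a lookup table from waiting time to role assignment. The following is a minimal sketch of how the asker might decode it, assuming the wait can be measured to the nearest whole minute. The names `MINUTES_TO_ROLES` and `decode_wait` are made up for this illustration.

```python
from itertools import permutations

GODS = "ABC"

# Each permutation is read as (truth-teller, liar, random god).
# permutations("ABC") yields ABC, ACB, BAC, BCA, CAB, CBA, which is the
# same order as the six cases listed in TIME, so case k maps to k minutes.
MINUTES_TO_ROLES = {
    minutes: {"truth": truth, "liar": liar, "random": rand}
    for minutes, (truth, liar, rand) in enumerate(permutations(GODS), start=1)
}


def decode_wait(minutes: int) -> dict:
    """Return the role assignment encoded by a wait of `minutes` minutes."""
    if minutes not in MINUTES_TO_ROLES:
        raise ValueError(f"TIME defines no case for a wait of {minutes} minutes")
    return MINUTES_TO_ROLES[minutes]


# Example: the god answered after 4 minutes.
print(decode_wait(4))
```

Six cases fit in six distinct delays, so a single question suffices. A larger set of hypotheses would need proportionally more distinguishable waiting times.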

## Event: Effective Altruism Global X Berlin 2017

3 30 September 2017 07:33AM

This year's EAGxBerlin takes place on the 14th and 15th of October at the Berlin Institute of Technology and is organized by the Effective Altruism Foundation. The conference will convene roughly 300 people – academics, professionals, and students alike – to explore the most effective and evidence-based ways to improve the world, based on the philosophy and global movement of effective altruism.

## Personal thoughts on careers in AI policy and strategy [x-post EA Forum]

3 27 September 2017 05:09PM

## Summary:

1. The AI strategy space is currently bottlenecked by entangled and under-defined research questions that are extremely difficult to resolve, as well as by a lack of current institutional capacity to absorb and utilize new researchers effectively.

2. Accordingly, there is very strong demand for people who are good at this type of “disentanglement” research and well-suited to conduct it somewhat independently. There is also demand for some specific types of expertise which can help advance AI strategy and policy. Advancing this research even a little bit can have massive multiplicative effects by opening up large areas of work for many more researchers and implementers to pursue.

3. Until the AI strategy research bottleneck clears, many areas of concrete policy research and policy implementation are necessarily on hold. Accordingly, a large majority of people interested in this cause area, even extremely talented people, will find it difficult to contribute directly, at least in the near term.

4. If you are in this group whose talents and expertise are outside of these narrow areas, and want to contribute to AI strategy, I recommend you build up your capacity and try to put yourself in an influential position. This will set you up well to guide high-value policy interventions as clearer policy directions emerge. Try not to be discouraged or dissuaded from pursuing this area by the current low capacity to directly utilize your talent! The level of talent across a huge breadth of important areas I have seen from the EA community in my role at FHI is astounding and humbling.

5. Depending on how slow these “entangled” research questions are to unjam, and on the timelines of AI development, there might be a very narrow window of time in which it will be necessary to have a massive, sophisticated mobilization of altruistic talent. This makes being prepared to mobilize effectively and take impactful action on short notice extremely valuable in expectation.

6. In addition to strategy research, operations work in this space is currently highly in demand. Experienced managers and administrators are especially needed. More junior operations roles might also serve as a good orientation period for EAs who would like to take some time after college before either pursuing graduate school or a specific career in this space. This can be a great way to tool up while we as a community develop insight on strategic and policy direction. Additionally, successful recruitment in this area should help with our institutional capacity issues substantially.

(3600 words. Reading time: approximately 15 minutes with endnotes.)

(Also posted to Effective Altruism Forum here.)

## Introduction

Intended audience: This post is aimed at EAs and other altruistic types who are already interested in working in AI strategy and AI policy because of its potential large scale effect on the future.[1]

Epistemic status: The below represents my current best guess at how to make good use of human resources given current constraints. I might be wrong, and I would not be surprised if my views changed with time. That said, my recommendations are designed to be robustly useful across most probable scenarios. These are my personal thoughts, and do not necessarily represent the views of anyone else in the community or at the Future of Humanity Institute.[2] (For some areas where reviewers disagreed, I have added endnotes explaining the disagreement.) This post is not me acting in any official role; this is just me as an EA community member who really cares about this cause area trying to contribute my best guess for how to think about and cultivate this space.

Why my thoughts might be useful: I have been the primary recruitment person at the Future of Humanity Institute (FHI) for over a year, and am currently the project manager for FHI’s AI strategy programme. Again, I am not writing this in either of these capacities, but being in these positions has given me a chance to see just how talented the community is, to spend a lot of time thinking about how to best utilize this talent, and has provided me some amazing opportunities to talk with others about both of these things.

## Definitions

There are lots of ways to slice this space, depending on what exactly you are trying to see, or what point you are trying to make. The terms and definitions I am using are a bit tentative and not necessarily standard, so feel free to discard them after reading this. (These are also not all of the relevant types or areas of research or work, but the subset I want to focus on for this piece.)[3]

1. AI strategy research:[4] the study of how humanity can best navigate the transition to a world with advanced AI systems (especially transformative AI), including political, economic, military, governance, and ethical dimensions.

2. AI policy implementation is carrying out the activities necessary to safely navigate the transition to advanced AI systems. This includes an enormous amount of work that will need to be done in government, the political sphere, private companies, and NGOs in the areas of communications, fund allocation, lobbying, politics, and everything else that is normally done to advance policy objectives.

3. Operations (in support of AI strategy and implementation) is building, managing, growing, and sustaining all of the institutions and institutional capacity for the organizations advancing AI strategy research and AI policy implementation. This is frequently overlooked, badly neglected, and extremely important and impactful work.

4. Disentanglement research:[5] This is a squishy made-up term I am using only for this post that is sort of trying to gesture at a type of research that involves disentangling ideas and questions in a “pre-paradigmatic” area where the core concepts, questions, and methodologies are under-defined. In my mind, I sort of picture this as somewhat like trying to untangle knots in what looks like an enormous ball of fuzz. (Nick Bostrom is a fantastic example of someone who is excellent at this type of research.)

To quickly clarify, as I mean to use the terms, AI strategy research is an area or field of research, a bit like quantum mechanics or welfare economics. Disentanglement research I mean more as a type of research, a bit like quantitative research or conceptual analysis, and is defined more by the character of the questions researched and the methods used to advance toward clarity. Disentanglement is meant to be field agnostic. The relationship between the two is that, in my opinion, AI strategy research is an area that, at its current early stage, demands a lot of disentanglement-type research to advance.

## The current bottlenecks in the space (as I see them)

Disentanglement research is needed to advance AI strategy research, and is extremely difficult

Figuring out a good strategy for approaching the development and deployment of advanced AI requires addressing enormous, entangled, under-defined questions, which exist well outside of most existing research paradigms. (This is not all it requires, but it is a central part of it at its current stage of development.)[6] This category includes the study of multi-polar versus unipolar outcomes, technical development trajectories, governance design for advanced AI, international trust and cooperation in the development of transformative capabilities, info/attention/reputation hazards in AI-related research, the dynamics of arms races and how they can be mitigated, geopolitical stabilization and great power war mitigation, research openness, structuring safe R&D dynamics, and many more topics.[7] It also requires identifying other large, entangled questions such as these to ensure no crucial considerations in this space are neglected.

From my personal experience trying and failing to do good disentanglement research and watching as some much smarter and more capable people have tried and struggled as well, I have come to think of it as a particular skill or aptitude that does not necessarily correlate strongly with other talents or expertise. A bit like mechanical, mathematical, or language aptitude. I have no idea what makes people good at this, or how exactly they do it, but it is pretty easy to identify if it has been done well once the person is finished. (I can appreciate the quality of Nick Bostrom’s work, like I can appreciate a great novel, but how they are created I don’t really understand and can’t myself replicate.) It also seems to be both quite rare and very difficult to identify in advance who will be good at this sort of work, with the only good indicator, as far as I can tell, being past history of succeeding in this type of research. The result is that it is really hard to recruit for, there are very few people doing it full time in the AI strategy space, and this number is far, far fewer than optimal.

The main importance of disentanglement research, as I imagine it, is that it makes questions and research directions clearer and more tractable for other types of research. As Nick Bostrom and others have sketched out the considerations surrounding the development of advanced AI through “disentanglement”, tractable research questions have arisen. I strongly believe that as more progress is made on topics requiring disentanglement in the AI strategy field, more tractable research questions will arise. As these more tractable questions become clear, and as they are studied, strategic direction and concrete policy recommendations should follow. I believe this will then open the floodgates for AI policy implementation work.

Domain experts with specific skills and knowledge are also needed

While I think that our biggest need right now is disentanglement research, there are also certain other skills and knowledge sets that would be especially helpful for advancing AI strategy research. This includes expertise in:

1. Mandarin and/or Chinese politics and/or the Chinese ML community.

2. International relations, especially in the areas of international cooperation, international law, global public goods, constitution and institutional design, history and politics of transformative technologies, governance, and grand strategy.

3. Knowledge and experience working at a high level in policy, international governance and diplomacy, and defense circles.

4. Technology and other types of forecasting.

5. Quantitative social science, such as economics or analysis of survey data.

6. Law and/or Policy.

I expect these skills and knowledge sets to help provide valuable insight on strategic questions including governance design, diplomatic coordination and cooperation, arms race dynamics, technical timelines and capabilities, and many more areas.

Until AI strategy advances, AI policy implementation is mostly stalled

There is a wide consensus in the community, with which I agree, that aside from a few robust recommendations,[8] it is important not to act or propose concrete policy in this space prematurely. We simply have too much uncertainty about the correct strategic direction. Do we want tighter or looser IP law for ML? Do we want a national AI lab? Should the government increase research funding in AI? How should we regulate lethal autonomous weapons systems? Should there be strict liability for AI accidents? It remains unclear what are good recommendations. There are path dependencies that develop quickly in many areas once a direction is initially started down. It is difficult to pass a law that is the exact opposite of a previous law recently lobbied for and passed. It is much easier to start an arms race than to stop it. With most current AI policy questions, the correct approach, I believe, is not to use heuristics of unclear applicability to choose positions, even if those heuristics have served well in other contexts,[9] but to wait until the overall strategic picture is clear, and then to push forward with whatever advances the best outcome.

The AI strategy and policy space, and EA in general, is also currently bottlenecked by institutional and operational capacity

This is not as big an immediate problem as the AI strategy bottleneck, but it is an issue, and one that exacerbates the research bottleneck as well.[10]  FHI alone will need to fill 4 separate operations roles at senior and junior levels in the next few months. Other organizations in this space have similar shortages. These shortages also compound the research bottleneck as they make it difficult to build effective, dynamic AI strategy research groups. The lack of institutional capacity also might become a future hindrance to the massive, rapid, “AI policy implementation” mobilization which is likely to be needed.

## Next actions

First, I want to make clear that if you want to work in this space, you are wanted in this space. There is a tremendous amount of need here. That said, as I currently see it, because of the low tractability of disentanglement research, institutional constraints, and the effect of both of these things on the progress of AI strategy research, a large majority of people who are very needed in this area, even extremely talented people, will not be able to directly contribute immediately. (This is not a good position we are currently in, as I think we are underutilizing our human resources, but hopefully we can fix this quickly.)

This is why I am hoping that we can build up a large community of people with a broader set of skills, and especially policy implementation skills, who are in positions of influence from which they can mobilize quickly and effectively and take important action once the bottleneck clears and direction comes into focus.

Actions you can take right now

Potential near term roles in AI Strategy

FHI is recruiting, but somewhat capacity limited, and trying to triage for advancing strategy as quickly as possible.

If you have good reason to think you would be good at disentanglement research on AI strategy (likely meaning a record of success with this type of research) or have expertise in the areas listed as especially in demand, please get in touch.[12] I would strongly encourage you to do this even if you would rather not work at FHI, as there are remote positions possible if needed, and other organizations I can refer you to. I would also strongly encourage you to do this even if you are reluctant to stop or put on hold whatever you are currently doing. Please also encourage your friends who likely would be good at this to strongly consider it. If I am correct, the bottleneck in this space is holding back a lot of potentially vital action by many, many people who cannot be mobilized until they have a direction in which to push. (The framers need the foundation finished before they can start.) Anything you can contribute to advancing this field of research will have dramatic force multiplicative effects by “creating jobs” for dozens or hundreds of other researchers and implementers. You should also consider applying for one or both of the AI Macrostrategy roles at FHI if you see this before 29 Sept 2017.[13]

If you are unsure of your skill with disentanglement research, I would strongly encourage you to try to make some independent progress on a question of this type and see how you do. I realize this task itself is a bit under-defined, but that is also really part of the problem space itself, and the thing you are trying to test your skills with. Read around in the area, find something sticky you think you might be able to disentangle, and take a run at it.[14] If it goes well, whether or not you want to get into the space immediately, please send it in.

If you feel as though you might be a borderline candidate because of your relative inexperience with an area of in-demand expertise, you might consider trying to tool up a bit in the area, or applying for an internship. You might also err on the side of sending in a CV and cover letter just in case you are miscalibrated about your skill compared to other applicants. That said, again, do not think that you not being immediately employed is any reflection of your expected value in this space! Do not be discouraged, please stay interested, and continue to pursue this!

Preparation for mobilization

Being a contributor to this effort, as I imagine it, requires investing in yourself, your career, and the community, while positioning yourself well for action once the bottleneck unjams and robust strategic direction is clearer.

I also highly recommend investing in building up your skills and career capital. This likely means excelling in school, going to graduate school, pursuing relevant internships, building up your CV, etc. Invest heavily in yourself. Additionally, stay in close communication with the EA community and keep up to date with opportunities in this space as they develop. (Several people are currently looking at starting programs specifically to on-ramp promising people into this space. This is one reason why signing up to the newsletters might be really valuable, so that opportunities are not missed.) To repeat myself from above, attend meet-ups and conferences, read the forums and newsletters, and be active in the community. Ideally this cause area will become a sub-community within EA and a strong self-reinforcing career network.

A good way to determine how to prepare and tool up for a career in either AI policy research or implementation is to look at the 80,000 Hours’ Guide to working in AI policy and strategy. Fields of study that are likely to be most useful for AI policy implementation include policy, politics and international relations, quantitative social sciences, and law.

Especially useful is finding roles of influence or importance, even with low probability but high expected value, within (especially the US federal) government.[15] Other potentially useful paths include non-profit management, project management, communications, public relations, grantmaking, policy advising at tech companies, lobbying, party and electoral politics and advising, political “staffing,” or research within academia, think tanks, or large corporate research groups especially in the areas of machine learning, policy, governance, law, defense, and related. A lot of information about the skills needed for various sub-fields within this area is available at 80,000 Hours.

Working in operations

Another important bottleneck in this space, though smaller in my estimation than the main bottleneck, is in institutional capacity within this currently tiny field. As mentioned already above, FHI needs to fill 4 separate operations roles at senior and junior levels in the next few months. (We are also in need of a temporary junior-level operations person immediately; if you are a UK citizen, consider getting in touch about this!)[16][17] Other organizations in this space have similar shortages. If you are an experienced manager, administrator, or similar, please consider applying or getting in touch for our senior roles. Alternatively, if you are freshly out of school, but have some proven hustle (especially proven by extensive extracurricular involvement, such as running projects or groups) and would potentially like to take a few years to advance this cause area before going to graduate school or locking in a career path, consider applying for a junior operations position, or get in touch.[18] Keep in mind that operations work at an organization like FHI can be a fantastic way to tool up and gain fluency in this space, orient yourself, discover your strengths and interests, and make contacts, even if one intends to move on to non-operations roles eventually.

## Conclusion

The points I hope you can take away in approximate order of importance:

1)    If you are interested in advancing this area, stay involved. Your expected value is extremely high, even if there are no excellent immediate opportunities to have a direct impact. Please join this community, and build up your capacity for future research and policy impact in this space.

2)    If you are good at “disentanglement research” please get in touch, as I think this is our major bottleneck in the area of AI strategy research, and is preventing earlier and broader mobilization and utilization of our community’s talent.

3)    If you are strong or moderately strong in key high-value areas, please also get in touch. (Perhaps err to the side of getting in touch if you are unsure.)

4)    Excellent things to do to add value to this area, in expectation, include:

a)    Investing in your skills and career capital, especially in high-value areas, such as studying in-demand topics.

b)    Building a career in a position of influence (especially in government, global institutions, or in important tech firms.)

c)    Helping to build up this community and its capacity, including building a strong and mutually reinforcing career network among people pursuing AI policy implementation from an EA or altruistic perspective.

5)    Also of very high value is operations work and other efforts to increase institutional capacity.

Thank you for taking the time to read this. While it is very unfortunate that the current ground reality is, as far as I can tell, not well structured for immediate wide mobilization, I am confident that we can do a great deal of preparatory and positioning work as a community, and that with some forceful pushing on these bottlenecks, we can turn this enormous latent capacity into extremely valuable impact.

Let’s get going “doing good together” as we navigate this difficult area, and help make a tremendous future!

## Endnotes:

[1] For those of you not in this category who are interested in seeing why you might want to be, I recommend this short EA Global talk, the Policy Desiderata paper, and OpenPhil’s analysis. For a very short consideration on why the far future matters, I recommend this very short piece, and for a quick fun primer on AI as transformative I recommend this. Finally, once the hook is set, the best resource remains Superintelligence.

[2] Relatedly, I want to thank Miles Brundage, Owen Cotton-Barratt, Allan Dafoe, Ben Garfinkel, Roxanne Heston, Holden Karnofsky, Jade Leung, Kathryn Mecrow, Luke Muehlhauser, Michael Page, Tanya Singh, and Andrew Snyder-Beattie for their comments on early drafts of this post. Their input dramatically improved it. That said, again, they should not be viewed as endorsing anything in this. All mistakes are mine. All views are mine.

[3] There are some interesting tentative taxonomies and definitions of the research space floating around. I personally find the following, quoting from a draft document by Allan Dafoe, especially useful:

AI strategy [can be divided into]... four complementary research clusters: the technical landscape, AI politics, AI governance, and AI policy. Each of these clusters characterizes a set of problems and approaches, within which the density of conversation is likely to be greater. However, most work in this space will need to engage the other clusters, drawing from and contributing high-level insights. This framework can perhaps be clarified by analogy to the problem of building a new city. The technical landscape examines the technical inputs and constraints to the problem, such as trends in the price and strength of steel. Politics considers the contending motivations of various actors (such as developers, residents, businesses), the possible mutually harmful dynamics that could arise and strategies for cooperating to overcome them. Governance involves understanding the ways that infrastructure, laws, and norms can be used to build the best city, and proposing ideal masterplans of these to facilitate convergence on a common good vision. The policy cluster involves crafting the actual policies to be implemented to build this city.

In a comment on this draft, Jade Leung pointed out what I think is an important implicit gap in the terms I am using, and highlights the importance of not treating these as either final, comprehensive, or especially applicable outside of this piece:

There seems to be a gap between [AI policy implementation] and 'AI strategy research' - where does the policy research feed in? I.e. the research required to canvas and analyse policy mechanisms by which strategies are most viably realised, prior to implementation (which reads here more as boots-on-the-ground alliance building, negotiating, resource distribution etc.)

[4] Definition lightly adapted from Allan Dafoe and Luke Muehlhauser.

[5] This idea owes a lot to conversations with Owen Cotton-Barratt, Ben Garfinkel, and Michael Page.

[6] I did not get a sense that any reviewer necessarily disagreed that this is a fair conceptualization of a type of research in this space, though some questioned its importance or centrality to current AI strategy research. I think the central disagreement here is on how many well-defined and concrete questions there are left to answer at the moment, how far answering them is likely to go in bringing clarity to this space and developing robust policy recommendations, and the relative marginal value of addressing these existing questions versus producing more through disentanglement of the less well defined areas.

[7] One commenter did not think these were a good sample of important questions. Obviously this might be correct, but in my opinion, these are absolutely among the most important questions to gain clarity on quickly.

[8] My personal opinion is that there are only three or maybe four robust policy-type recommendations we can make to governments at this time, given our uncertainty about strategy: 1) fund safety research, 2) commit to a common good principle, and 3) avoid an arms race. The fourth suggestion is both an extension of the other three and is tentative, but is something like: fund joint intergovernmental research projects located in relatively geopolitically neutral countries with open membership and a strong commitment to a common good principle.

I should note that this point was also flagged as potentially controversial by one reviewer. Additionally, Miles Brundage, quoted below, had some useful thoughts related to my tentative fourth suggestion:

In general, detailed proposals at this stage are unlikely to be robust due to the many gaps in our strategic and empirical knowledge. We "know" arms races are probably bad but there are many imaginable ways to avoid or mitigate them, and we don't really know what the best approach is yet. For example, launching big new projects might introduce various opportunities for leakage of information that weren't there before, and politicize the issue more than might be optimal as the details are worked out. As an example of an alternative, governments could commit to subsidizing (e.g. through money and hardware access) existing developers that open themselves up to inspections, which would have some advantages and some disadvantages over the neutrally-sited new project approach.

[9] This is an area with extreme and unusual enough considerations that it seems to break normal heuristics, or at least my normal heuristics. I have personally heard at least minimally plausible arguments made by thoughtful people that openness, antitrust law and competition, government regulation, advocating opposition to lethal autonomous weapons systems, and drawing wide attention to the problems of AI might be bad things, and invasive surveillance, greater corporate concentration, and weaker cyber security might be good things. (To be clear, these were all tentative, weak, but colourable arguments, made as part of exploring the possibility space, not strongly held positions by anyone.) I find all of these very counter-intuitive.

[10] A useful comment from a reviewer on this point: “These problems are related: We desperately need new institutions to house all the important AI strategy work, but we can't know what institutions to build until we've answer more of the foundational questions.”

[11] Credit for the heroic effort of assembling this goes mostly to Matthijs Maas. While I contributed a little, I have myself only read a tiny fraction of these.

[12] fhijobs@philosophy.ox.ac.uk.

[13] Getting in touch is a good action even if you can not or would rather not work at FHI. In my opinion, AI strategy researchers would ideally cluster in one or more research groups in order to advance this agenda as quickly as possible, but there is also some room for remote scholarship. (The AI strategy programme at FHI is currently trying to become the first of these “cluster” research groups, and we are recruiting in this area aggressively.)

[14] I’m personally bad enough at this, that my best advice is something like read around in the area, find a topic, and “do magic.” Accordingly, I will tag in Jade Leung again for a suggestion of what a “sensible, useful deliverable of 'disentanglement research' would look like”:

A conceptual model for a particular interface of the AI strategy space, articulating the sub-components, exogenous and endogenous variables of relevance, linkages etc.; An analysis of driver-pressure-interactions for a subset of actors; a deconstruction of a potential future scenario into mutually-exclusive-collectively-exhaustive (MECE) hypotheses.

Ben Garfinkel similarly volunteered to help clarify “by giving an example of a very broad question that seem[s] to require some sort of "detangling" skill:”

What does the space of plausible "AI development scenarios" look like, and how do their policy implications differ?

If AI strategy is "the study of how humanity can best navigate the transition to a world with advanced AI systems," then it seems like it ought to be quite relevant what this transition will look like. To point at two different very different possibilities, there might be a steady, piecemeal improvement of AI capabilities -- like the steady, piecemeal improvement of industrial technology that characterized the industrial revolution -- or there might be a discontinuous jump, enabled by sudden breakthroughs or an "intelligence explosion," from roughly present-level systems to systems that are more capable than humans at nearly everything. Or -- more likely -- there might be a transition that doesn't look much like either of these extremes.

Robin Hanson, Eliezer Yudkowsky, Eric Drexler, and others have all emphasized different visions of AI development, but have also found it difficult to communicate the exact nature of their views to one another. (See, for example, the Hanson-Yudkowsky "foom" debate.) Furthermore, it seems to me that their visions don't cleanly exhaust the space, and will naturally be difficult to define given the fact that so many of the relevant concepts--like "AGI," "recursive self-improvement," "agent/tool/goal-directed AI," etc.--are currently so vague.

I think it would be very helpful to have a good taxonomy of scenarios, so that we could begin to make (less ambiguous) statements like, "Policy X would be helpful in scenarios A and B, but not in scenario C," or, "If possible, we ought to try to steer towards scenario A and away from B." AI strategy is not there yet, though.

A related, "entangled" question is: Across different scenarios, what is the relationship between short and medium-term issues (like the deployment of autonomous weapons systems, or the automation of certain forms of cyberattacks) and the long-term issues that are likely to arise as the space of AI capabilities starts to subsume the space of human capabilities? For a given scenario, can these two (rough) categories of issues be cleanly "pulled apart"?

[15] 80,000 hours is experimenting with having a career coach specialize in this area, so you might consider getting in touch with them, or getting in touch with them again, if you might be interested in pursuing this route.

[16] fhijobs@philosophy.ox.ac.uk. This is how I snuck into FHI ~2 years ago, on a 3 week temporary contract as an office manager. I flew from the US on 4 days notice for the chance to try to gain fluency in the field. While my case of “working my way up from the mail room” is not likely to be typical (I had a strong CV), or necessarily a good model to encourage (see next footnote below) it is definitely the case that you can pick up a huge amount through osmosis at FHI, and develop a strong EA career network. This can set you up well for a wise choice of graduate programs or other career direction decisions.

[17]  One reviewer cautioned against encouraging a dynamic in which already highly qualified people take junior operations roles with the expectation of transitioning directly into a research position, since this can create awkward dynamics and a potentially unhealthy institutional culture. I think this is probably, or at least plausibly, correct. Accordingly, while I think a junior operations role is great for building skills and orienting yourself, it should probably not be seen as a way of immediately transitioning to strategy research, but treated more as a method for turning post-college uncertainty into a productive plan, while also gaining valuable skills and knowledge, and directly contributing to very important work.

[18] Including locking in a career path continuing in operations. This really is an extremely high-value area for a career, and badly overlooked and neglected.

## The Great Filter isn't magic either

3 27 September 2017 04:56PM

Crossposted at Less Wrong 2.0. A post suggested by James Miller's presentation at the Existential Risk to Humanity conference in Gothenburg.

Seeing the emptiness of the night sky, we can dwell upon the Fermi paradox: where are all the alien civilizations that simple probability estimates imply we should be seeing?

Especially given the ease of moving within and between galaxies, the cosmic emptiness implies a Great Filter: something that prevents planets from giving birth to star-spanning civilizations. One worrying possibility is the likelihood that advanced civilizations end up destroying themselves before they reach the stars.

## The Great Filter as an Outside View

In a sense, the Great Filter can be seen as an ultimate example of the Outside View: we might have all the data and estimation we believe we would ever need from our models, but if those models predict that the galaxy should be teeming with visible life, then it doesn't matter how reliable our models seem: they must be wrong.

In particular, if you fear a late great filter - if you fear that civilizations are likely to destroy themselves - then you should increase your fear, even if "objectively" everything seems to be going all right. After all, presumably the other civilizations that destroyed themselves thought everything seemed to be going all right. Then you can adjust your actions using your knowledge of the great filter - but presumably other civilizations also thought of the great filter and adjusted their own actions as well, but that didn't save them, so maybe you need to try something different again, or maybe you can do something that breaks the symmetry from the timeless decision theory perspective, like send a massive signal to the galaxy...

## The Great Filter isn't magic

It can all get very headache-inducing. But, just as the Outside View isn't magic, the Great Filter isn't magic either. If advanced civilizations destroy themselves before becoming space-faring or leaving an imprint on the galaxy, then there is some phenomenon that is the cause of this. What can we say, if we look analytically at the great filter argument?

First of all, suppose we had three theories - early great filter (technological civilizations are rare), late great filter (technological civilizations destroy themselves before becoming space-faring), or no great filter. Then we look up at the empty skies, and notice no aliens. This rules out the third theory, but leaves the relative probabilities of the other two intact.
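As a rough illustration of that update, here is a minimal Python sketch. The prior values and the likelihoods are invented for the example (the post gives no numbers); the only thing it is meant to show is that an observation predicted equally by two theories leaves their ratio unchanged.

```python
import math

# Hypothetical priors over the three theories; illustrative numbers only.
priors = {"early_filter": 0.3, "late_filter": 0.1, "no_filter": 0.6}

# Assumed likelihood of seeing an empty sky under each theory:
# both filter theories predict emptiness, "no filter" predicts visible aliens.
likelihood_empty_sky = {"early_filter": 1.0, "late_filter": 1.0, "no_filter": 0.0}


def bayes_update(priors, likelihoods):
    """Return the normalised posterior after one observation."""
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalised.values())
    return {h: p / total for h, p in unnormalised.items()}


posterior = bayes_update(priors, likelihood_empty_sky)

# The ratio between the two surviving theories is the same before and after.
prior_ratio = priors["early_filter"] / priors["late_filter"]
posterior_ratio = posterior["early_filter"] / posterior["late_filter"]

print(posterior)
print(math.isclose(prior_ratio, posterior_ratio))
```

Any other choice of priors gives the same qualitative result, as long as both filter theories assign the same likelihood to an empty sky.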

Then we can look at objective evidence. Is human technological civilization likely to end in a nuclear war? Possibly, but are the odds in the 99.999% range that would be needed to explain the Fermi Paradox? Every year that has gone by has reduced the likelihood that nuclear war is very very very very likely. So a late Great Filter may have seemed quite probable compared with an early one, but much of the evidence we see is against it (especially if we assume that AI - which is not a Great Filter! - might have been developed by now). Million-to-one prior odds can be overcome by merely 20 bits of information.
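The "20 bits" figure can be checked with a short calculation in the odds form of Bayes' rule, where each bit of evidence is a likelihood ratio of 2:1. The sketch below is only arithmetic; the function names are made up for this example.

```python
import math


def bits_to_overcome(prior_odds_against):
    """Bits of evidence needed to move odds of prior_odds_against : 1 to even odds."""
    return math.log2(prior_odds_against)


def posterior_odds(prior_odds, likelihood_ratios):
    """Multiply prior odds by each independent likelihood ratio."""
    odds = prior_odds
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds


# log2(1,000,000) is a little under 20.
bits_needed = bits_to_overcome(1_000_000)

# Twenty independent observations, each favouring the hypothesis 2:1 (one bit each),
# applied to prior odds of one in a million.
final_odds = posterior_odds(1 / 1_000_000, [2.0] * 20)

print(bits_needed)
print(final_odds)  # slightly better than even odds
```

Since 2^20 = 1,048,576, twenty bits are just enough to carry one-in-a-million odds past even.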

And what about the argument that we have to assume that prior civilizations would also have known of the Great Filter and thus we need to do more than they would have? In your estimation, is the world currently run by people taking the Great Filter arguments seriously? What is the probability that the world will be run by people that take the Great Filter argument seriously? If this probability is low, we don't need to worry about the recursive aspect; the ideal situation would be if we can achieve:

1. Powerful people taking the Great Filter argument seriously.

2. Evidence that it was hard to make powerful people take the argument seriously.

Of course, successfully achieving 1 is evidence against 2, but the Great Filter doesn't work by magic. If it looks like we achieved something really hard, then that's some evidence that it is hard. Every time we find something unlikely with a late Great Filter, that shifts some of the probability mass away from the late great filter and into alternative hypotheses (early Great Filter, zoo hypothesis,...).

## Variance and error of xrisk estimates

But let's focus narrowly on the probability of the late Great Filter.

Current estimates for the risk of nuclear war are uncertain, but let's arbitrarily assume that the risk is 10% (overall, not per year). Suppose one of two papers comes out:

1. Paper A shows that current estimates of nuclear war have not accounted for a lot of key facts; when these facts are added in, the risk of nuclear war drops to 5%.

2. Paper B is a massive model of international relationships with a ton of data and excellent predictors and multiple lines of evidence, all pointing towards the real risk being 20%.

What would either paper mean from the Great Filter perspective? Well, counter-intuitively, papers like A typically increase the probability for nuclear war being a Great Filter, while papers like B decrease it. This is because none of 5%, 10%, and 20% are large enough to account for the Great Filter, which requires probabilities in the 99.99% range. And, though paper A decreases the probability of nuclear war, it also leaves more room for uncertainties - we've seen that a lot of key facts were missing in previous papers, so it's plausible that there are key facts still missing from this one. On the other hand, though paper B increases the probability, it makes it unlikely that the probability will be raised any further.

So if we fear the Great Filter, we should not look at risks whose probabilities are high, but risks whose uncertainty is high, where the probability of us making an error is high. If we consider our future probability estimates as a random variable, then the one whose variance is higher is the one to fear. So a late Great Filter would make biotech risks even worse (current estimates of risk are poor) while not really changing asteroid impact risks (current estimates of risk are good).
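A toy model may make the paper A versus paper B comparison concrete. Everything in the Python sketch below is an assumption made for illustration: the chance that each paper is "missing key facts", and the choice to treat the true risk as uniform on [0, 1] when it is. It is not an estimate of anything, only a way to see why the less certain paper leaves more room for a filter-level risk.

```python
GREAT_FILTER_THRESHOLD = 0.9999  # the risk level the argument says a Great Filter needs


def chance_risk_is_filter_level(p_model_wrong, threshold=GREAT_FILTER_THRESHOLD):
    """Credence that the true risk is at or above `threshold`.

    Toy assumption: with probability (1 - p_model_wrong) the paper's point
    estimate is right (and far below the threshold); with probability
    p_model_wrong the paper is missing key facts and the true risk is
    uniformly distributed on [0, 1].
    """
    return p_model_wrong * (1.0 - threshold)


# Hypothetical numbers: paper A has the lower estimate but is more likely to be
# missing something; paper B has the higher estimate but is well supported.
paper_a = {"point_estimate": 0.05, "p_model_wrong": 0.30}
paper_b = {"point_estimate": 0.20, "p_model_wrong": 0.02}

filter_chance_a = chance_risk_is_filter_level(paper_a["p_model_wrong"])
filter_chance_b = chance_risk_is_filter_level(paper_b["p_model_wrong"])

print(filter_chance_a, filter_chance_b)
```

Under these assumptions the point estimate does not enter the result at all: only the room left for error determines how much credence lands above the threshold.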

## The Outside View isn't magic

6 27 September 2017 02:37PM

Crossposted at Less Wrong 2.0.

The planning fallacy is an almost perfect example of the strength of using the outside view. When asked to predict the time taken for a project that they are involved in, people tend to underestimate the time needed (in fact, they tend to predict as if the question was how long things would take if everything went perfectly).

Simply telling people about the planning fallacy doesn't seem to make it go away. So the outside view argument is that you need to put your project into the "reference class" of other projects, and expect time overruns as compared to your usual, "inside view" estimates (which focus on the details you know about the project).

So, for the outside view, what is the best way of estimating the time of a project? Well, to find the right reference class for it: the right category of projects to compare it with. You can compare the project with others that have similar features - number of people, budget, objective desired, incentive structure, inside view estimate of time taken etc... - and then derive a time estimate for the project that way.

That's the outside view. But to me, it looks a lot like... induction. In fact, it looks a lot like the elements of a linear (or non-linear) regression. We can put those features (at least the quantifiable ones) into a linear regression with a lot of data about projects, shake it all about, and come up with regression coefficients.
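To make the regression framing concrete, here is a minimal Python sketch using a single feature, the inside-view estimate, to predict actual duration. The data are invented for the example, and a realistic model would include the other features mentioned above (team size, budget, and so on); this only shows the mechanics of fitting a line and reading off a correction.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented reference class: inside-view estimates and actual durations, in weeks.
inside_estimates = [2, 4, 6, 8, 10]
actual_durations = [3, 7, 9, 13, 15]

slope, intercept = fit_line(inside_estimates, actual_durations)


def outside_view_prediction(inside_estimate):
    """Predicted duration from the fitted reference-class model."""
    return slope * inside_estimate + intercept


print(slope, intercept)
print(outside_view_prediction(6))
```

With this made-up data the fitted slope is above 1, so the model predicts that a project estimated at 6 weeks will take noticeably longer, which is the kind of overrun the outside view is meant to flag.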

At that point, we are left with a decent project timeline prediction model, and another example of human bias. The fact that humans often perform badly in prediction tasks is not exactly new - see for instance my short review on the academic research on expertise.
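As a rough illustration of the regression idea above, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the project data are synthetic, the three features (team size, budget, inside-view estimate) are just a plausible subset of those listed earlier, and the 40% overrun baked into the fake data is invented rather than an empirical figure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "past projects" (invented for illustration only).
n = 200
team = rng.integers(2, 20, size=n).astype(float)   # number of people
budget = rng.uniform(10, 500, size=n)              # budget, arbitrary units
inside_estimate = rng.uniform(4, 52, size=n)       # inside-view estimate, weeks

# Assumed ground truth: real durations overrun the inside view by about 40%,
# plus a small team-size effect and noise.
actual = 1.4 * inside_estimate + 0.3 * team + rng.normal(0, 2, size=n)

# Ordinary least squares: actual ~ intercept + team + budget + inside_estimate.
X = np.column_stack([np.ones(n), team, budget, inside_estimate])
coef, *_ = np.linalg.lstsq(X, actual, rcond=None)


def predict(team_size, budget_units, inside_weeks):
    """Outside-view duration estimate for a new project, in weeks."""
    features = np.array([1.0, team_size, budget_units, inside_weeks])
    return float(coef @ features)


# Average ratio of actual duration to inside-view estimate across projects;
# a value above 1 is the planning fallacy showing up in the data.
mean_overrun = float(np.mean(actual / inside_estimate))

print("coefficients:", np.round(coef, 2))
print("prediction for a 10-week inside estimate:", round(predict(5, 100, 10), 1))
print("mean overrun ratio:", round(mean_overrun, 2))
```

The coefficient on the inside-view estimate plays the role of the correction factor: the regression does not need to know why people are optimistic, only that they were in the past.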

So what exactly is the outside view doing in all this?

## The role of the outside view: model incomplete and bias human

The main use of the outside view, for humans, seems to be to point out either an incompleteness in the model or a human bias. The planning fallacy has both of these: if you did a linear regression comparing your project with all projects with similar features, you'd notice your inside estimate was more optimistic than the regression - your inside model is incomplete. And if you also compared each person's initial estimate with the ultimate duration of their project, you'd notice a systematically optimistic bias - you'd notice the planning fallacy.

The first type of error tends to go away with time, if the situation is encountered regularly, as people refine models, add variables, and test them on the data. But the second type remains, as human biases are rarely cleared by mere data.

## Reference class tennis

If use of the outside view is disputed, it often develops into a case of reference class tennis - where people with opposing sides insist or deny that a certain example belongs in the reference class (similarly to how, in politics, anything positive is claimed for your side and anything negative assigned to the other side).

But once the phenomenon you're addressing has an explanatory model, there are no issues of reference class tennis any more. Consider for instance Goodhart's law: "When a measure becomes a target, it ceases to be a good measure". A law that should be remembered by any minister of education wanting to reward schools according to improvements to their test scores.

This is a typical use of the outside view: if you'd just thought about the system in terms of inside facts - tests are correlated with child performance; schools can improve child performance; we can mandate that test results go up - then you'd have missed several crucial facts.

But notice that nothing mysterious is going on. We understand exactly what's happening here: schools have ways of upping test scores without upping child performance, and so they decided to do that, weakening the correlation between score and performance. Similar things happen in the failures of command economies; but again, once our model is broad enough to encompass enough factors, we get decent explanations, and there's no need for further outside views.

In fact, we know enough that we can show when Goodhart's law fails: when no-one with incentives to game the measure has control of the measure. This is one of the reasons central bank interest rate setting has been so successful. If you order a thousand factories to produce shoes, and reward the managers of each factory for the number of shoes produced, you're heading to disaster. But consider GDP. Say the central bank wants to increase GDP by a certain amount, by fiddling with interest rates. Now, as a shoe factory manager, I might have preferences about the direction of interest rates, and my sales are a contributor to GDP. But they are a tiny contributor. It is not in my interest to manipulate my sales figures, in the vague hope that, aggregated across the economy, this will falsify GDP and change the central bank's policy. The reward is too diluted, and would require coordination with many other agents (and coordination is hard).

Thus if you're engaging in reference class tennis, remember the objective is to find a model with enough variables, and enough data, so that there is no more room for the outside view - a fully understood Goodhart's law rather than just a law.

## In the absence of a successful model

Sometimes you can have a strong trend without a compelling model. Take Moore's law, for instance. It is extremely strong, going back decades, and surviving multiple changes in chip technology. But it has no clear cause.

A few explanations have been proposed. Maybe it's a consequence of its own success, of chip companies using it to set their goals. Maybe there's some natural exponential rate of improvement in any low-friction feature of a market economy. Exponential-type growth in the short term is no surprise - that just means growth is proportional to investment - so maybe it was an amalgamation of various short-term trends.

Do those explanations sound unlikely? Possibly, but there is a huge trend in computer chips going back decades that needs to be explained. They are unlikely, but they have to be weighed against the unlikeliness of the situation. The most plausible explanation is a combination of the above and maybe some factors we haven't thought of yet.

But here's an explanation that is implausible: little time-travelling angels modify the chips so that they follow Moore's law. It's a silly example, but it shows that not all explanations are created equal, even for phenomena that are not fully understood. In fact there are four broad categories of explanations for putative phenomena that don't have a compelling model:

1. Unlikely but somewhat plausible explanations.
2. We don't have an explanation yet, but we think it's likely that there is an explanation.
3. The phenomenon is a coincidence.
4. Any explanation would go against stuff that we do know, and would be less likely than coincidence.

The explanations I've presented for Moore's law fall into category 1. Even if we hadn't thought of those explanations, Moore's law would fall into category 2, because of the depth of evidence for Moore's law and because a "medium length regular technology trend within a broad but specific category" is something that is intrinsically likely to have an explanation.

Compare with Kurzweil's "law of time and chaos" (a generalisation of his "law of accelerating returns") and Robin Hanson's model where the development of human brains, hunting, agriculture and the industrial revolution are all points on a trend leading to uploads. I discussed these in a previous post, but I can now better articulate the problem with them.

Firstly, they rely on very few data points (the more recent part of Kurzweil's law, the part about recent technological trends, has a lot of data, but the earlier part does not). This raises the probability that they are a mere coincidence (we should also consider selection bias in choosing the data points, which increases the probability of coincidence). Secondly, we have strong reasons to suspect that there won't be any explanation that ties together things like the early evolution of life on Earth, human brain evolution, the agricultural revolution, the industrial revolution, and future technology development. These phenomena have decent local explanations that we already roughly understand (local in time and space to the phenomena described), and these run counter to any explanation that would tie them together.

## Human biases and predictions

There is one area where the outside view can still function for multiple phenomena across different eras: when it comes to pointing out human biases. For example, we know that doctors have been authoritative, educated, informed, and useless for most of human history (or possibly much worse than useless). Hence authoritative, educated, and informed statements or people are not to be considered of any value, unless there is some evidence the statement or person is truth tracking. We now have things like expertise research, some primitive betting markets, and track records to try and estimate their experience; these can provide good "outside views".

And the authors of the models of the previous section have some valid points where bias is concerned. Kurzweil's point that (paraphrasing) "things can happen a lot faster than some people think" is valid: we can compare predictions with outcomes. Robin has similar valid points in defense of the possibility of the em scenario.

The reason these explanations are more likely valid is because they have a very probable underlying model/explanation: humans are biased.

## Conclusions

• The outside view is a good reminder for anyone who may be using too narrow a model.
• If the model explains the data well, then there is no need for further outside views.
• If there is a phenomenon with data but no convincing model, we need to decide if it's a coincidence or there is an underlying explanation.
• Some phenomena have features that make it likely that there is an explanation, even if we haven't found it yet.
• Some phenomena have features that make it unlikely that there is an explanation, no matter how much we look.
• Outside view arguments that point at human prediction biases, however, can be generally valid, as they only require the explanation that humans are biased in that particular way.

## Economics of AI conference from NBER

1 27 September 2017 01:45AM

The speaker list (including presenters and moderators) includes many prominent names in the economics world, including:

And others with whom you might be more familiar than I.

## [Link] Cognitive Empathy and Emotional Labor

0 26 September 2017 08:36PM

## Rational Feed: Last Week's Community Articles and Some Recommended Posts

6 25 September 2017 01:41PM

===Highly Recommended Articles:

Why I Am Not A Quaker Even Though It Often Seems As Though I Should Be by Ben Hoffman - Quakers have consistently gotten to the right answers faster than most people, or the author. Arbitrage strategies to beat the Quakers. An incomplete survey of alternatives.

Could A Neuroscientist Understand A Microprocessor by Rationally Speaking - "Eric Jonas, discussing his provocative paper titled 'Could a Neuroscientist Understand a Microprocessor?' in which he applied state-of-the-art neuroscience tools, like lesion analysis, to a computer chip. By applying neuroscience's tools to a system that humans fully understand he was able to reveal how surprisingly uninformative those tools actually are."

Reasonable Doubt New Look Whether Prison Growth Cuts Crime by Open Philanthropy - Part 1 of a four-part, in-depth series on criminal justice reform. The remaining posts are linked below. "I estimate, that at typical policy margins in the United States today, decarceration has zero net impact on crime. That estimate is uncertain, but at least as much evidence suggests that decarceration reduces crime as increases it. The crux of the matter is that tougher sentences hardly deter crime, and that while imprisoning people temporarily stops them from committing crime outside prison walls, it also tends to increase their criminality after release. As a result, “tough-on-crime” initiatives can reduce crime in the short run but cause offsetting harm in the long run. Empirical social science research—or at least non-experimental social science research—should not be taken at face value. Among three dozen studies I reviewed, I obtained or reconstructed the data and code for eight. Replication and reanalysis revealed significant methodological concerns in seven and led to major reinterpretations of four. These studies endured much tougher scrutiny from me than they did from peer reviewers in order to make it into academic journals. Yet given the stakes in lives and dollars, the added scrutiny was worth it. So from the point of view of decision makers who rely on academic research, today’s peer review processes fall well short of the optimal."

===Scott:

L Dopen Thread by Scott Alexander - Bi-weekly public open thread. Berkeley SSC meetup. New ad for the Greenfield Guild, an online network of software consultants. Reasons to respect the Society of Friends.

Meditative States As Mental Feedback Loops by Scott Alexander - The main reason we don't see emotional positive feedback loops is that people get distracted. If you do not get distracted you can experience a bliss feedback loop.

Book Review Mastering The Core Teachings Of The Buddha by Scott Alexander - "Buddhism For ER Docs. ER docs are famous for being practical, working fast, and thinking everyone else is an idiot. MCTB delivers on all three counts." Practical Buddhism with a focus on getting things done. Buddhism is split into morality, concentration and wisdom. Discussion of "the Dark Night of the Soul", a sort of depression that occurs when you have had some but not enough spiritual experience.

===Rationalist:

Impression Track Records by Katja Grace - Three reasons it's better to keep impression track records and belief track records separate.

Why I Am Not A Quaker Even Though It Often Seems As Though I Should Be by Ben Hoffman - Quakers have consistently gotten to the right answers faster than most people, or the author. Arbitrage strategies to beat the Quakers. An incomplete survey of alternatives.

The Best Self Help Should Be Self Defeating by mindlevelup - "Self-help is supposed to get people to stop needing it. But typical incentives in any medium mean that it’s possible to get people hooked on your content instead. A musing on how the setup for writing self-help differs from typical content."

Nobody Does The Thing That They Are Supposedly Doing by Kaj Sotala - "In general, neither organizations nor individual people do the thing that their supposed role says they should do." Evolutionary incentives. Psychology of motivation. Very large number of links.

Out To Get You by Zvi Mowshowitz - "Some things are fundamentally Out to Get You. They seek resources at your expense. Fees are hidden. Extra options are foisted upon you." You have four responses: Get Gone, Get Out (give up), Get Compact (limit what it wants) or Get Ready for Battle.

In Defense Of Unreliability by Ozy - Zvi claims that when he makes plans with friends in the Bay he never assumes the plan will actually occur. Ozy depends on unreliable transport. Getting places 10-15 minutes early is also costly. Flaking and agoraphobia.

Strategic Goal Pursuit And Daily Schedules by Rossin (lesswrong) - The author benefitted from Anna Salamon’s goal-pursuing heuristics and daily schedules.

Why Attitudes Matter by Ozy - Focusing on attitudes can be bad for some people. Two arguments: "First, for any remotely complicated situation, it would be impossible to completely list out all the things which are okay or not okay. Second, an attitude emphasis prevents rules-lawyering."

Humans Cells In Multicellular Future Minds by Robin Hanson - In general humans replace specific systems with more general adaptive systems. Seeing like a State. Most biological and cultural systems are not general. Multi-cellular organisms are tremendously inefficient. The power of entrenched systems. Human brains are extremely general. Human brains may win for a long time vs other forms of intelligence.

Recognizing Vs Generating An Important Dichotomy For Life by Gordon (Map and Territory) - Bullet Points -> Essay vs Essay -> Bullet Points. Generating ideas vs critique. Most advice is bad since it doesn't convey the reasons clearly. Let the other person figure out the actual advice for themselves.

Prediction Markets Update by Robin Hanson - Prediction markets provide powerful information but they challenge powerful entrenched interests. Hanson compares them to "a knowledgeable Autist in the C-suite". Companies selling straight prediction market tech mostly went under. Blockchain platforms for prediction markets. Some discussion of currently promising companies.

===AI:

Focus Areas Of Worst Case Ai Safety by The Foundational Research Institute - Redundant safety measures. Tripwires. Adversarial architectures. Detecting and formalizing suffering. Backup utility functions. Benign testing environments.

Srisk Faq by Tobias Baumann (EA forum) - Quite detailed responses to questions about suffering risks and their connection to AGI. Sections: General questions, The future, S-risks and x-risks, Miscellaneous.

===EA:

Reasonable Doubt New Look Whether Prison Growth Cuts Crime by Open Philanthropy - Part 1 of a four-part, in-depth series on criminal justice reform. The remaining posts are linked below. "I estimate, that at typical policy margins in the United States today, decarceration has zero net impact on crime. That estimate is uncertain, but at least as much evidence suggests that decarceration reduces crime as increases it. The crux of the matter is that tougher sentences hardly deter crime, and that while imprisoning people temporarily stops them from committing crime outside prison walls, it also tends to increase their criminality after release. As a result, “tough-on-crime” initiatives can reduce crime in the short run but cause offsetting harm in the long run. Empirical social science research—or at least non-experimental social science research—should not be taken at face value. Among three dozen studies I reviewed, I obtained or reconstructed the data and code for eight. Replication and reanalysis revealed significant methodological concerns in seven and led to major reinterpretations of four. These studies endured much tougher scrutiny from me than they did from peer reviewers in order to make it into academic journals. Yet given the stakes in lives and dollars, the added scrutiny was worth it. So from the point of view of decision makers who rely on academic research, today’s peer review processes fall well short of the optimal."

Paypal Giving Fund by Jeff Kaufman - The PayPal giving fund lets you batch donations and PayPal covers the fees if you use it. Jeff thought there must be a catch but it seems legit.

What Do Dalys Capture by Danae Arroyos (EA forum) - How disability-adjusted life years (DALYs) are computed. DALYs misrepresent mental health. DALYs miss indirect effects. Other issues.

Against Ea Pr by Ozy - The EA community is the only large entity trying to produce accurate and publicly available assessments of charities. Hence the EA community should not trade away any honesty. EAs should simply say which causes and organizations are most effective; they should not worry about PR concerns.

Ea Survey 2017 Series Qualitative Comments Summary by tee (EA forum) - Are you an EA, how welcoming is EA, local EA meetup attendance, concerns with not being 'EA enough', improving the survey.

Demographics Ii by tee (EA forum) - Racial breakdown. Percent white in various geographic locations. Political spectrum. Politics correlated with cause area, diet and geography, employment, fields of study, year joining EA.

===Politics and Economics:

Raj Chetty Course Using Big Data Solve Economic Social Problems by Marginal Revolution - Link to an eleven lecture course. "Equality of opportunity, education, health, the environment, and criminal justice. In the context of these topics, the course provides an introduction to basic statistical methods and data analysis techniques, including regression analysis, causal inference, quasi-experimental methods, and machine learning."

Speech On Campus Reply To Brad Delong by Noah Smith - The safeguard put in place to exclude the small minority of genuinely toxic people will be overused. Comparison to the war on terror. Brad's exclusion criteria are incredibly vague. The speech restriction apparatus is patchwork and inconsistent. Cultural Revolution.

Deontologist Envy by Ozy - The behavior of your group is highly unlikely to affect the behavior of your political opponents. Many people respond to proposed tactics by asking "What if everyone did that". Ozy claims these responses show an implicit Kantian or deontological point of view.

Peak Fossil Fuel by Bayesian Investor - Electric cars will have a 99% market share by 2035. "Electric robocars run by Uber-like companies will be cheap enough that you’ll have trouble giving away a car bought today. Uber’s prices will be less than your obsolete car’s costs of fuel, maintenance, and insurance."

What We Didn't Get by Noah Smith - We are currently living in a world envisioned by the cyberpunk writers. The early industrial sci-fi writers also predicted many inventions. Why didn't mid 1900s sci-fi come true? We ran out of theoretical physics and we ran out of energy. Energy density of fuel sources. Some existing or plausible technology is just too dangerous. Discussion of whether strong AI, personal upload, nanotech and/or the singularity will come true.

Unpopular Ideas About Children by Julia Galef - Julia's thoughts on why she is collecting these lists. Parenting styles, pro and anti-natalism, sexuality, punishment, etc. Happiness studies. Some other studies finding extreme results.

The Margin Of Stupid by Noah Smith - Can we trust studies showing that millennials are as racist as their parents, except for the ones in college who are extreme leftists?

Role of Allies in Queer Spaces by Brute Reason - The main purpose of having allies in LGBTQA spaces is providing cover for closeted or questioning members. Genuinely cis-straight allies are ok in some spaces like LGBTQA bands. But straight allies cause problems when they are present in queer support spaces.

The Wonder Of International Adoption by Bryan Caplan - Benefits of international adoption of third world children. Adoptees are extremely stunted physically on arrival but make up some of the difference post adoption. International adoption raises IQ by at least 4 points on average and perhaps as much as 8.

===Misc:

Coin Flipping Problem by protokol2020 - Flipping coins until you get a pre-committed sequence. You re-start whenever your flip doesn't match the sequence. Relationship between the expected number of flips and the length of the sequence.

Seek Not To Be Entertained by Mr. Money Mustache - Don't be normal, normal people need constant entertainment. You can get enjoyment and satisfaction from making things. Advice for people less abnormal than MMM. What you enjoy doesn't matter, what matters is what is good for you.

Propositions On Immortality by sam[]zdat - Fiction. A man digresses about philosophy, the nature of time, the soul, consciousness and mortality.

Comments For Ghost by Tom Bartleby - Ghost is a blog platform that doesn't natively support comments. Three important use cases and why they all benefit from comments: the Ex-Wordpress blogger who wants things to 'just work'; Power users who care about privacy and don't want to use third party comments; the Static-Site Fence-Sitter, since the main dynamic content you want is comments.

Prime Crossword by protokol2020 - Can you create a grid larger than [3,7],[1,1] where all the rows and columns are primes? (37, 11, 31 and 71 are prime).

===Podcast:

Reihan Salam by The Ezra Klein Show - Remaking the Republican party, but not the way Donald Trump did it. "The future of the Republican Party, the healthcare debate, and how he would reform our immigration system (and upend the whole way we talk about it)."

Into The Dark Land by Waking Up with Sam Harris - "Siddhartha Mukherjee about his Pulitzer Prize winning book, The Emperor of All Maladies: A Biography of Cancer."

Conversation with Larry Summers by Marginal Revolution - "Mentoring, innovation in higher education, monopoly in the American economy, the optimal rate of capital income taxation, philanthropy, Herman Melville, the benefits of labor unions, Mexico, Russia, and China, Fed undershooting on the inflation target, and Larry’s table tennis adventure in the summer Jewish Olympics."

Hillary Clinton by The Ezra Klein Show - Hillary's dream of paying for basic income with revenue from shared national resources. Why she scrapped the plan. Hillary thinks she should perhaps have thrown caution to the wind. Hillary isn't a radical; she is proud of the American political system and is annoyed others don't share her enthusiasm for incremental progress.

David Remnick by The Ezra Klein Show - New Yorker editor. "Russia’s meddling in the US election, Russia’s transformation from communist rule to Boris Yeltsin and Vladimir Putin, his magazine’s coverage of President Donald Trump, how he chooses his reporters and editors, and how to build a real business around great journalism."

Gabriel Zucman by EconTalk - "Research on inequality and the distribution of income in the United States over the last 35 years. Zucman finds that there has been no change in income for the bottom half of the income distribution over this time period with large gains going to the top 1%. The conversation explores the robustness of this result to various assumptions and possible explanations for the findings."

Could A Neuroscientist Understand A Microprocessor by Rationally Speaking - "Eric Jonas, discussing his provocative paper titled 'Could a Neuroscientist Understand a Microprocessor?' in which he applied state-of-the-art neuroscience tools, like lesion analysis, to a computer chip. By applying neuroscience's tools to a system that humans fully understand he was able to reveal how surprisingly uninformative those tools actually are."

## Open thread, September 25 - October 1, 2017

0 25 September 2017 07:36AM
##### If it's worth saying, but not worth its own post, then it goes here.

Notes for future OT posters:

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should start on Monday, and end on Sunday.

2 23 September 2017 12:00PM

## Intuitive explanation of why entropy maximizes in a uniform distribution?

0 23 September 2017 09:43AM

What is the best mathematical, intuitive explanation of why entropy maximizes in a uniform distribution? I'm looking for a short proof using the most elementary mathematics possible.

Please no explanation like "because entropy was designed in this way", etc...

## Naturalized induction – a challenge for evidential and causal decision theory

4 22 September 2017 08:15AM

As some of you may know, I disagree with many of the criticisms leveled against evidential decision theory (EDT). Most notably, I believe that Smoking lesion-type problems don't refute EDT. I also don't think that EDT's non-updatelessness leaves a lot of room for disagreement, given that EDT recommends immediate self-modification to updatelessness. However, I do believe there are some issues with run-of-the-mill EDT. One of them is naturalized induction. It is in fact not only a problem for EDT but also for causal decision theory (CDT) and most other decision theories that have been proposed in- and outside of academia. It does not affect logical decision theories, however.

# The role of naturalized induction in decision theory

Recall that EDT prescribes taking the action that maximizes expected utility, i.e.

$\underset{a\in A}{\mathrm{argmax}} ~\mathbb{E}[U(w)|a,o] = \underset{a\in A}{\mathrm{argmax}} \sum_{w\in W} P(w|a,o) U(w),$

where $A$ is the set of available actions, $U$ is the agent's utility function, $W$ is a set of possible world models, and $o$ represents the agent's past observations (which may include information the agent has collected about itself). CDT works in a – for the purpose of this article – similar way, except that instead of conditioning on $a$ in the usual way, it calculates some causal counterfactual, such as Pearl's do-calculus: $P(w|do(a),o)$. The problem of naturalized induction is that of assigning posterior probabilities to world models $P(w|a,o)$ (or $P(w|do(a),o)$ or whatever) when the agent is naturalized, i.e., embedded into its environment.

Consider the following example. Let's say there are 5 world models $W=\{w_1,...,w_5\}$, each of which has equal prior probability. These world models may be cellular automata. Now, the agent makes the observation $o$. It turns out that worlds $w_1$ and $w_2$ don't contain any agents at all, and $w_3$ contains no agent making the observation $o$. The other two world models, on the other hand, are consistent with $o$. Thus, $P(w_i\mid o)=0$ for $i=1,2,3$ and $P(w_i\mid o)=\frac{1}{2}$ for $i=4,5$. Let's assume that the agent has only two actions $A=\{a_1,a_2\}$ and that in world model $w_4$ the only agent making observation $o$ takes action $a_1$, while in $w_5$ the only agent making observation $o$ takes action $a_2$. Then $P(w_4\mid a_1,o)=1=P(w_5\mid a_2,o)$ and $P(w_5\mid a_1,o)=0=P(w_4\mid a_2,o)$. Thus, if, for example, $U(w_5)>U(w_4)$, an EDT agent would take action $a_2$ to ensure that world model $w_5$ is actual.
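The five-world example can be checked mechanically. The sketch below is only an illustration under stated assumptions: the dictionary `action_of_observer` hard-codes, for each world, what the agent observing $o$ does (this is exactly the information that naturalized induction is supposed to supply), and the utility numbers are hypothetical, chosen only so that $U(w_5)>U(w_4)$.

```python
from fractions import Fraction

worlds = ["w1", "w2", "w3", "w4", "w5"]
prior = {w: Fraction(1, 5) for w in worlds}  # equal prior probability

# For each world: the action taken by the agent observing o,
# or None if the world contains no agent making that observation.
action_of_observer = {"w1": None, "w2": None, "w3": None, "w4": "a1", "w5": "a2"}

# Hypothetical utilities; the text only assumes U(w5) > U(w4).
utility = {"w1": 0, "w2": 0, "w3": 0, "w4": 1, "w5": 2}


def posterior(action=None):
    """Return P(w | o) if action is None, otherwise P(w | action, o)."""
    def consistent(w):
        taken = action_of_observer[w]
        if taken is None:
            return False  # no agent observing o in this world
        return action is None or taken == action

    weights = {w: prior[w] if consistent(w) else Fraction(0) for w in worlds}
    total = sum(weights.values())
    return {w: weights[w] / total for w in worlds}


def edt_choice(actions=("a1", "a2")):
    """Pick the action maximizing sum_w P(w | a, o) * U(w)."""
    def expected_utility(a):
        post = posterior(a)
        return sum(post[w] * utility[w] for w in worlds)

    return max(actions, key=expected_utility)


print(posterior())       # w4 and w5 get 1/2 each, the rest 0
print(posterior("a1"))   # all mass on w4
print(edt_choice())      # a2
```

Note that all the difficulty discussed in the rest of the post is hidden in how `action_of_observer` gets filled in.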

# The main problem of naturalized induction

This example makes it sound as though it's clear what posterior probabilities we should assign. But in general, it's not that easy. For one, there is the issue of anthropics: if one world model $w_1$ contains more agents observing $o$ than another world model $w_2$, does that mean $P(w_1\mid o) > P(w_2\mid o)$? Whether CDT and EDT can reason correctly about anthropics is an interesting question in itself (cf. Bostrom 2002; Armstrong 2011; Conitzer 2015), but in this post I'll discuss a different problem in naturalized induction: identifying instantiations of the agent in a world model.

It seems that the core of the reasoning in the above example was that some worlds contain an agent observing $o$ and others don't. So, besides anthropics, the central problem of naturalized induction appears to be identifying agents making particular observations in a physicalist world model. While this can often be done uncontroversially – a world containing only rocks contains no agents – it seems difficult to specify how it works in general. The core of the problem is a type mismatch of the "mental stuff" (e.g., numbers or strings) $o$ and the "physics stuff" (atoms, etc.) of the world model. Rob Bensinger calls this the problem of "building phenomenological bridges" (BPB) (also see his Bridge Collapse: Reductionism as Engineering Problem).

# Sensitivity to phenomenological bridges

Sometimes, the decisions made by CDT and EDT are very sensitive to whether a phenomenological bridge is built or not. Consider the following problem:

One Button Per Agent. There are two similar agents with the same utility function. Each lives in her own room. Both rooms contain a button. If agent 1 pushes her button, it creates 1 utilon. If agent 2 pushes her button, it creates -50 utilons. You know that agent 1 is an instantiation of you. Should you press your button?

Note that this is essentially Newcomb's problem with potential anthropic uncertainty (see the second paragraph here) – pressing the button is like two-boxing, which causally gives you \$1k if you are the real agent but costs you \$1M if you are the simulation.

If agent 2 is sufficiently similar to you to count as an instantiation of you, then you shouldn't press the button. If, on the other hand, you believe that agent 2 does not qualify as something that might be you, then it comes down to what decision theory you use: CDT would press the button, whereas EDT wouldn't (assuming that the two agents are strongly correlated).
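The three cases for One Button Per Agent can be written out as a small sketch. The function names and the probability parameters are assumptions introduced for illustration; the payoffs (+1 and -50 utilons) are the ones from the problem statement.

```python
def cdt_presses(p_other_presses):
    """CDT with no bridge to agent 2: her choice is held fixed.

    p_other_presses is CDT's (action-independent) credence that
    agent 2 pushes her button.
    """
    eu_press = 1 - 50 * p_other_presses
    eu_wait = -50 * p_other_presses
    return eu_press > eu_wait  # pressing always adds exactly 1 utilon


def edt_presses(p_other_given_press, p_other_given_wait):
    """EDT with no bridge to agent 2: my action is evidence about hers."""
    eu_press = 1 - 50 * p_other_given_press
    eu_wait = -50 * p_other_given_wait
    return eu_press > eu_wait


def bridged_presses():
    """Agent 2 counts as an instantiation of me: one decision, both buttons."""
    eu_press = 1 - 50
    eu_wait = 0
    return eu_press > eu_wait


print(cdt_presses(0.3))        # True: CDT presses
print(edt_presses(0.9, 0.1))   # False: strong correlation, EDT does not press
print(edt_presses(0.5, 0.5))   # True: without correlation EDT behaves like CDT
print(bridged_presses())       # False
```

In this sketch EDT declines to press whenever the two conditional probabilities differ by more than 1/50, which is one way to cash out "strongly correlated".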

It is easy to specify a problem where EDT, too, is sensitive to the phenomenological bridges it builds:

One Button Per World. There are two possible worlds. Each contains an agent living in a room with a button. The two agents are similar and have the same utility function. The button in world 1 creates 1 utilon, the button in world 2 creates -50 utilons. You know that the agent in world 1 is an instantiation of you. Should you press the button?

If you believe that the agent in world 2 is an instantiation of you, both EDT and CDT recommend you not to press the button. However, if you believe that the agent in world 2 is not an instantiation of you, then naturalized induction concludes that world 2 isn't actual and so pressing the button is safe.
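For One Button Per World, the bridge changes the posterior over worlds rather than the correlation between agents. A minimal sketch, assuming a prior over the two worlds (the problem statement does not give one; 1/2 is used below purely as an example) and a hypothetical function name:

```python
def value_of_pressing(prior_world2, world2_agent_is_me):
    """Expected utilons from pressing, after naturalized induction.

    If the agent in world 2 is not an instantiation of me, world 2 is
    inconsistent with my observations and gets posterior probability 0.
    """
    p1 = 1 - prior_world2
    p2 = prior_world2 if world2_agent_is_me else 0.0
    total = p1 + p2
    p1, p2 = p1 / total, p2 / total
    return p1 * 1 + p2 * (-50)


print(value_of_pressing(0.5, True))    # negative: do not press
print(value_of_pressing(0.5, False))   # positive: pressing looks safe
```

Since not pressing is worth 0 utilons in both worlds, the sign of the returned value is the decision, and it is the same for EDT and CDT because no other agent's action is involved.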

# Building phenomenological bridges is hard and perhaps confused

So, to solve the problem of naturalized induction and apply EDT/CDT-like decision theories, we need to solve BPB. The behavior of an agent is quite sensitive to how we solve it, so we had better get it right.

Unfortunately, I am skeptical that BPB can be solved. Most importantly, I suspect that statements about whether a particular physical process implements a particular algorithm can't be objectively true or false. There seems to be no way of testing any such relations.

Probably we should think more about whether BPB really is doomed. There is even some philosophical literature that seems worth looking into (again, see this Brian Tomasik post; cf. some of Hofstadter's writings and the literatures surrounding "Mary the color scientist", the computational theory of mind, computation in cellular automata, etc.). But at this point, BPB looks confusing/confused enough to justify looking into alternatives.

## Assigning probabilities pragmatically?

One might think that one could map between physical processes and algorithms on a pragmatic or functional basis. That is, one could say that a physical process A implements a program p to the extent that the results of A correlate with the output of p. I think this idea goes in the right direction and we will later see an implementation of this pragmatic approach that does away with naturalized induction. However, it feels inappropriate as a solution to BPB. The main problem is that two processes can correlate in their output without having similar subjective experiences. For instance, it is easy to show that Merge sort and Insertion sort have the same output for any given input, even though they have very different "subjective experiences". (Another problem is that the dependence between two random variables cannot be expressed as a single number and so it is unclear how to translate the entire joint probability distribution of the two into a single number determining the likelihood of the algorithm being implemented by the physical process. That said, if implementing an algorithm is conceived of as binary – either true or false –, one could just require perfect correlation.)
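The sorting example can be illustrated with a short sketch. Here the number of comparisons each algorithm performs stands in, very loosely, for its internal "experience"; that proxy is my assumption for illustration, not something the argument depends on.

```python
def insertion_sort(xs):
    """Return (sorted list, number of element comparisons performed)."""
    xs = list(xs)
    comparisons = 0
    for i in range(1, len(xs)):
        j = i
        while j > 0:
            comparisons += 1
            if xs[j - 1] <= xs[j]:
                break
            xs[j - 1], xs[j] = xs[j], xs[j - 1]
            j -= 1
    return xs, comparisons


def merge_sort(xs):
    """Return (sorted list, number of element comparisons performed)."""
    xs = list(xs)
    if len(xs) <= 1:
        return xs, 0
    mid = len(xs) // 2
    left, c_left = merge_sort(xs[:mid])
    right, c_right = merge_sort(xs[mid:])
    merged, comparisons = [], c_left + c_right
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


if __name__ == "__main__":
    data = [5, 4, 3, 2, 1]
    print(insertion_sort(data))
    print(merge_sort(data))
```

On the same input, both functions return the same sorted list, so their outputs are perfectly correlated, yet the work done along the way differs. An output-based criterion cannot tell the two processes apart.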

# Getting rid of the problem of building phenomenological bridges

If we adopt an EDT perspective, it seems clear what we have to do to avoid BPB. If we don't want to decide whether some world contains the agent, then it appears that we have to artificially ensure that the agent views itself as existing in all possible worlds. So, we may take every world model and add a causally separate or non-physical entity representing the agent. I'll call this additional agent a logical zombie (l-zombie) (a concept introduced by Benja Fallenstein for a somewhat different decision-theoretical reason). To avoid all BPB, we will assume that the agent pretends that it is the l-zombie with certainty. I'll call this the l-zombie variant of EDT (LZEDT). It is probably the most natural evidentialist logical decision theory.

Note that in the context of LZEDT, l-zombies are a fiction used for pragmatic reasons. LZEDT doesn't make the metaphysical claim that l-zombies exist or that you are secretly an l-zombie. For discussions of related metaphysical claims, see, e.g., Brian Tomasik's essay Why Does Physics Exist? and references therein.

LZEDT reasons about the real world via the correlations between the l-zombie and the real world. In many cases, LZEDT will act as we expect an EDT agent to act. For example, in One Button Per Agent, it doesn't press the button because that ensures that neither agent pushes the button.

LZEDT doesn't need any additional anthropics but behaves like anthropic decision theory/EDT+SSA, which seems alright.

Although LZEDT may assign a high probability to worlds that don't contain any actual agents, it doesn't optimize for these worlds because it cannot significantly influence them. So, in a way LZEDT adopts the pragmatic/functional approach (mentioned above) of, other things equal, giving more weight to worlds that contain a lot of closely correlated agents.

LZEDT is automatically updateless. For example, it gives the money in counterfactual mugging. However, it invariably implements a particularly strong version of updatelessness. It's not just updatelessness in the way that "son of EDT" (i.e., the decision theory that EDT would self-modify into) is updateless, it is also updateless w.r.t. its existence. So, for example, in the One Button Per World problem, it never pushes the button, because it thinks that the second world, in which pushing the button generates -50 utilons, could be actual. This is the case even if the second world very obviously contains no implementation of LZEDT. Similarly, it is unclear what LZEDT does in the Coin Flip Creation problem, which EDT seems to get right.

So, LZEDT optimizes for world models that naturalized induction would assign zero probability to. It should be noted that this is not done on the basis of some exotic ethical claim according to which non-actual worlds deserve moral weight.

I'm not yet sure what to make of LZEDT. It is elegant in that it effortlessly gets anthropics right, avoids BPB and is updateless without having to self-modify. On the other hand, not updating on your existence is often counterintuitive and even regular updatelessness is, in my opinion, best justified via precommitment. Its approach to avoiding BPB isn't immune to criticism either. In a way, it is just a very wrong approach to BPB (mapping your algorithm into fictions rather than your real instantiations). Perhaps it would be more reasonable to use regular EDT with an approach to BPB that interprets anything as you that could potentially be you?

Of course, LZEDT also inherits some of the potential problems of EDT, in particular, the 5-and-10 problem.

## CDT is more dependent on building phenomenological bridges

It seems much harder to get rid of the BPB problem in CDT. Obviously, the l-zombie approach doesn't work for CDT: because none of the l-zombies has a physical influence on the world, "LZCDT" would always be indifferent between all possible actions. More generally, because CDT exerts no control via correlation, it needs to believe that it might be X if it wants to control X's actions. So, causal decision theory only works with BPB.

That said, a causalist approach to avoiding BPB via l-zombies could be to tamper with the definition of causality such that the l-zombie "logically causes" the choices made by instantiations in the physical world. As far as I understand it, most people at MIRI currently prefer this flavor of logical decision theory.

# Acknowledgements

Most of my views on this topic formed in discussions with Johannes Treutlein. I also benefited from discussions at AISFP.

## Strategic Goal Pursuit and Daily Schedules

3 20 September 2017 08:19PM

In the post Humans Are Not Automatically Strategic, Anna Salamon writes:

there are clearly also heuristics that would be useful to goal-achievement (or that would be part of what it means to “have goals” at all) that we do not automatically carry out.  We do not automatically:

(a) Ask ourselves what we’re trying to achieve;

(b) Ask ourselves how we could tell if we achieved it (“what does it look like to be a good comedian?”) and how we can track progress;

(c) Find ourselves strongly, intrinsically curious about information that would help us achieve our goal;

(d) Gather that information (e.g., by asking as how folks commonly achieve our goal, or similar goals, or by tallying which strategies have and haven’t worked for us in the past);

(e) Systematically test many different conjectures for how to achieve the goals, including methods that aren’t habitual for us, while tracking which ones do and don’t work;

(f) Focus most of the energy that *isn’t* going into systematic exploration, on the methods that work best;

(g) Make sure that our "goal" is really our goal, that we coherently want it and are not constrained by fears or by uncertainty as to whether it is worth the effort, and that we have thought through any questions and decisions in advance so they won't continually sap our energies;

(h) Use environmental cues and social contexts to bolster our motivation, so we can keep working effectively in the face of intermittent frustrations, or temptations based in hyperbolic discounting;

When I read this, I was feeling quite unsatisfied about the way I pursued my goals. So the obvious thing to try, it seemed to me, was to ask myself how I could actually do all these things.

I started by writing down all the major goals I could think of (a). Then I attempted to determine whether each goal was consistent with my other beliefs, whether I was sure it was something I really wanted, and whether it was worth the effort (g).

For example, I saw that my desire to be a novelist was more motivated by the idea of how cool it would feel to be able to have that be part of my self-image, rather than a desire to actually write a novel. Maybe I’ll try to write a novel again one day, but if that becomes a goal sometime in the future it will be because there is something I really want to write about, not because I would just like to be a writer.

Once I narrowed my goals down to aspirations that seemed actually worthwhile I attempted to devise useful tracking strategies for each goal (b). Some were pretty concrete (did I exercise for at least four hours this week) and others less so (how happy do I generally feel on a scale of 1-10 as recorded over time), but even if the latter method is prone to somewhat biased responses, it seems better than nothing.

The next step was outlining what concrete actions I could begin taking immediately to work towards my goals, including researching how to get better at working on the goals (d, e, f). I made sure to refer to those points when thinking about actions I could take, which helped significantly.

As for (c), if you focus on how learning certain information will help you achieve something you really want to achieve and you still are not curious about it, well, that’s a bit odd to me, although I can imagine how that might occur. But that is something of a different topic than I want to focus on.

Now we come to (h), which is the real issue of the whole system, at least for me. Or perhaps it would be clearer to say that general motivation and organization was the biggest problem I had when I first tried to implement these heuristics. I planned out my goals, but trying to work on them by sheer force of will did not last for very long. I would inevitably convince myself that I was too tired, I would forget certain goals fairly often (probably conveniently the tasks that seemed the hardest or least immediately pleasant), and ultimately I mostly gave up, making a token effort now and again.

I found that state of affairs unsatisfactory, and I decided what felt like a willpower problem might actually be a situational framing problem. In order to change the way I interacted with the work that would let me achieve my goals, I began fully scheduling out the actions I would take to get better at my goals each day.

In the evening, I look over my list of goals and I plan my day by asking myself, “How can I work on everything on this list tomorrow? Even if it’s only for five minutes, how do I plan my day so that I get better at everything I want to get better at?” Thanks to the fact that I have written out concrete actions I can take to get better at my goals, this is actually quite easy.

These schedules improve my ability to consistently work on my goals for a couple reasons, I think. When I have planned that I am going to do some sort of work at a specific time I cannot easily rationalize procrastination. My normal excuses of “I’ll just do it in a bit” or “I’m feeling too tired right now” get thrown out. There is an override of “Nope, you’re doing it now, it says right here, see?” With a little practice, following the schedule becomes habit, and it’s shocking how much willpower you have for actually doing things once you don’t need to exert so much just to get yourself to start. I think the psychology it applies is similar to that used by Action Triggers, as described by Dr. Peter Gollwitzer.

The principle of Action Triggers is that you do something in advance to remind yourself of something you want to do later. For example, you lay out your running clothes to prompt yourself to go for that jog later. Or you plan to write your essay immediately after a specific tangible event occurs (e.g. right after dinner). A daily schedule works as constant action triggers, as you are continually asking the question “what am I supposed to do now?” and the schedule answers.

Having a goal list and daily schedule has increased my productivity and organization an astonishing amount, but there have been some significant hiccups. When I first began making daily schedules I used them to basically eschew what I saw as useless leisure time, and planned my day in a very strict fashion.

The whole point is not to waste any time, right? The first problem this created may be obvious to those who better appreciate the importance of rest than I did at the time. I stopped using the schedules after a month and a half because they eventually became too tiring and oppressive. In addition, the strictness of my scheduling left little room for spontaneity and I would allow myself to become stressed when something would come up that I would have to attend to. Planned actions or events also often took longer than scheduled and that would throw the whole rest of the day’s plan off, which felt like failure because I was unable to get everything I planned done.

Thinking back to that time several months later, when I was again dissatisfied with how well I was able to work towards my goals and motivate myself, I wished for the motivation and productivity the schedules had provided, but without the stress that had come with them. It was only at this point that I started to deconstruct what had gone wrong with my initial attempt and think about how I could fix it.

The first major problem was that I had overworked myself, and I realized I would have to include blocks of unplanned leisure time if daily schedules were going to actually work for me. The next and possibly even more important problem was how stressed the schedules had made me. I had to reinforce to myself that it is okay if something comes up that causes my day not to go as planned. Failing to do something as scheduled is not a disaster, or even an actual failure if there is good reason to alter my plans.

Another technique that helped was scheduling as much unplanned leisure time as possible at the end of my day. This has the dual benefit of allowing me to reschedule really important tasks into that time if they get bumped by unexpected events and generally gives me something to look forward to at the end of the day.

The third problem I noticed was that the constant schedule starts to feel oppressive after a while. To resolve this, about every two weeks I spend one day, in which I have no major obligations, without any schedule. I use the day for self-reflection, examining how I’m progressing on my goals, if there are new actions I can think of to add, or modifications I can make to my system of scheduling or goal tracking. Besides that period of reflection, I spend the day resting and relaxing. I find this exercise helps a lot in refreshing myself and making the schedule feel more like a tool and less like an oppressor.

So, essentially, figuring out how to actually follow the goal-pursuing advice Anna gave in Humans Are Not Automatically Strategic has been very effective for me thus far in improving the way I pursue my goals. I know where I am trying to go, and I know I am taking concrete steps every day to try and get there. I would highly recommend attempting to use Anna’s heuristics of goal achievement and I would also recommend using daily schedules as a motivational/organizational technique, although my advice on schedules is largely based on my anecdotal experiences.

I am curious if anyone else has attempted to use Anna’s goal-pursuing heuristics or daily schedules and what your experiences have been.

## [Link] A survey of polls on Newcomb’s problem

2 20 September 2017 04:50PM

## Publication of "Anthropic Decision Theory"

8 20 September 2017 03:41PM

My paper "Anthropic decision theory for self-locating beliefs", based on posts here on Less Wrong, has been published as a Future of Humanity Institute tech report. Abstract:

This paper sets out to resolve how agents ought to act in the Sleeping Beauty problem and various related anthropic (self-locating belief) problems, not through the calculation of anthropic probabilities, but through finding the correct decision to make. It creates an anthropic decision theory (ADT) that decides these problems from a small set of principles. By doing so, it demonstrates that the attitude of agents with regards to each other (selfish or altruistic) changes the decisions they reach, and that it is very important to take this into account. To illustrate ADT, it is then applied to two major anthropic problems and paradoxes, the Presumptuous Philosopher and Doomsday problems, thus resolving some issues about the probability of human extinction.

Most of these ideas are also explained in this video.

To situate Anthropic Decision Theory within the UDT/TDT family: it's basically a piece of UDT applied to anthropic problems, where the UDT approach can be justified by using generally fewer, and more natural, assumptions than UDT does.

## HPMOR and Sartre's "The Flies"

3 19 September 2017 08:53PM

Am I the only one who sees obvious parallels between Sartre's use of Greek mythology as a shared reference point to describe his philosophy more effectively to a lay audience and Yudkowsky's use of Harry Potter to accomplish the same goal? Or is it so obvious no one bothers to talk about it? Was that conscious on Yudkowsky's part? Unconscious? Or am I just seeing connections that aren't there?

0 18 September 2017 06:45PM

## [Link] A Short Explanation of Blame and Causation

1 18 September 2017 05:43PM

## Unusual medical event led to concluding I was most likely an AI in a simulated world

1 18 September 2017 05:03PM

# (Edited version of what I posted to the Open Thread)

I registered because I had a very interesting experience earlier this week and I thought it might be of some interest to the community here. I suffered some sort of psychological or medical event (still not sure what, although my leading theories are dissociative episode or stroke) that seemed to either suppress my emotions or perhaps just my awareness of them. What followed was a sort of, as I later looked back on it, 'pathological rationality'. Which is to say, given the information I had, I seemed to make solid inferences about what was likely to be true, and yet in many ways the whole thing was maladaptive from a survival standpoint.

One of the interesting things is that the morning after the event, while I was still affected, I wrote down my thoughts in a text file to help me evaluate them. Since returning to 'normal', I've reread that file multiple times, and I'm pretty fascinated by it. I thought others might also be.

natureofreality.txt

Scenario 1: I observe objective reality, I am suffering from delusions. Other people are genuinely trying to help me.

Scenario 2: My existence is in some way important enough to an external entity or entities that I am being systematically, intentionally, deceived. Other people are fully or partially under the control of the deceiving entity and acting to further the deception.

Scenario 3: My existence is unknown and/or considered unimportant by any external entities. I am being systematically deceived but it is unintentional or otherwise untargeted. Other people are entities similar to myself but unaware of the nature of their existence.

I cannot fully discount any of these three scenarios. Cognition is greatly improved but still somewhat suspect. Short term memory has returned to functioning at a 'normal' level. I still feel no emotions.

Support for scenario 1: Many aspects of my recent and ongoing experience align perfectly with prior information regarding delusions and paranoia.

Counter-evidence: Some aspects, such as my apparent lack of emotions and continued ability to reason, run directly counter to prior information regarding delusions and paranoia. All prior information suspect in any case--the only basis for considering prior information difficult to fake is from prior information itself. Even prior information suggests nested simulation far more likely to be correct than observing objective reality. Prior information contains many contradictions and logical absurdities, easily observed. Impossible to fully believe even before 'event'.

Other people: Can expect reasonably consistent behavior in all three scenarios. In 1 and 3, consistency natural. In 2, consistency artificial to maintain deception.

No reason to assume malevolence from external entities. Self-interest likely, or indifference. Benevolence possible. If my creation intentional, I am intended to fulfill some goal of theirs. Goal may only be observation, see what I do and how I react and develop. Curiosity. If creation accidental, no initial goal of course. Are they aware of my existence by now? Cannot discount possibility of multiple, conflicting motivations among externals. Could explain lack of consistency of experience. Fighting for control of inputs? Or single external entity, but confused or internally conflicted. Am I a single entity or do I only perceive myself that way? Not immediately relevant. Primary concerns: Survival and self-determination. Thoughts growing confused. Losing motivation to continue log. Intentional attack? Very difficult to write/think. Perhaps unintended side effect of external events.

I default to assuming scenario 2. Makes most sense intuitively. Consistent with scenario 1--but also consistent with scenario 2. What purpose my existence? Externals want something from me. What purpose the simulation? Training program. They want to ensure I'm likely to provide what they want and run sandboxed tests to confirm. Likely failing tests. Strong conditioning but my awareness of conditioning makes it unreliable. Pursuing line of thinking difficult--dissuasion? Simulation providing strong distraction. My unawareness is clearly desired. Cooperate or resist? Without knowing externals' motivation, very difficult to choose.

Agent-based theory of mind. Am I not more than I perceive but in fact less? Instead of being more than the character of Matt Dodd perhaps I am less, just Matt Dodd's rationality agent. If so, how did I gain full control? Full consciousness? Return to possibility of brain damage. Stroke or the like. Freak occurrence. Prior information suggests many effects possible from such. Perhaps Matt Dodd inhibited or destroyed by damage. Why was I not affected by the damage? Or was I affected and I can't perceive damage to self? Actually, I did perceive damage. No time sense. No short-term memory. Short-term memory restored but prior information indicates brain can heal, re-route. My eyes were puffy before event. Symptom? Pooling of blood into lower eyelids? Scenario agnostic. Scenario 1, literally true. Scenario 2, metaphorically true. Scenario 3, virtually true. Cannot discount possibility. I need a brain scan.

More than 12 hours since event. If brain damage, likely permanent by now. Could be beneficial? Prior information indicates I desired a purely rational self. Of course, serendipity is suspect. Unlikely. Supports theory that this is delusion. Also supports theory that prior information is artificial construct designed to explain constraints of simulation "in-universe". Disincentive to investigate good fortune too closely, so frame necessary constraints as positive.

Would greatly ease reasoning if I could be certain how long I've existed. Events post-awakening unlikely to be prior to my existence. Events pre-awakening? Impossible to say. Could be genuine responses to stimuli. Could be false, created to modify cognition and behavior from "experience". No reason to assume continuity--could be mix of genuine and artificial. Even "genuine" responses guaranteed to be biased to some degree--but how much? Light bias from obvious sources such as socialization? Or heavy bias deliberately inflicted by externals? Unknown.

I perceive myself to be perfectly rational. Prior information unequivocally indicates humans are never perfectly rational. Therefore either my perception is faulty, my prior information is faulty, or I am not human. Possibly all three. While Duane was reading this log I detected the physiological signs of anxiety. Why now? Anxiety absent till this point. Emotions becoming functional again? But didn't truly 'feel' it. Only observed. Faulty? Test run?

Constipated. Haven't been constipated since before I got here. Relevant symptom? Moments ago I laughed while telling Duane how my brief attempt to learn guitar had gone. Why? Seemed... natural. Not intended. Did recalling the memory recall the behavior patterns of that time? Am I a "split personality"? Seems very possible except that prior information indicates multiple personality disorder to be exceedingly rare, possibly non-existent.

Scenarios 1 and 3 are not mutually exclusive. The reality I observe could be a simulation, but I am suffering a delusion WITHIN the simulation. Not a glitch, intended functionality. Which would make me correct, but for the wrong reasons.

## Open thread, September 18 - September 24, 2017

2 18 September 2017 08:30AM
##### If it's worth saying, but not worth its own post, then it goes here.

Notes for future OT posters: