Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

New LW Meetup: Christchurch NZ

1 FrankAdamek 19 April 2014 01:57AM

AI risk, new executive summary

6 Stuart_Armstrong 18 April 2014 10:45AM

Thanks to all who commented on the previous AI risk executive summary! Here is the currently final version, taking the comments into account; feel free to link to it, make use of it, copy it, mock it and modify it.


AI risk

Bullet points

  • By all indications, an Artificial Intelligence could someday exceed human intelligence.
  • Such an AI would likely become extremely intelligent, and thus extremely powerful.
  • Most AI motivations and goals become dangerous when the AI becomes powerful.
  • It is very challenging to program an AI with fully safe goals, and an intelligent AI would likely not interpret ambiguous goals in a safe way.
  • A dangerous AI would be motivated to seem safe in any controlled training setting.
  • Not enough effort is currently being put into designing safe AIs.


Executive summary

The risks from artificial intelligence (AI) in no way resemble the popular image of the Terminator. That fictional mechanical monster is distinguished by many features – strength, armour, implacability, indestructibility – but extreme intelligence isn’t one of them. And it is precisely extreme intelligence that would give an AI its power, and hence make it dangerous.

The human brain is not much bigger than that of a chimpanzee. And yet those extra neurons account for the difference in outcomes between the two species: between a population of a few hundred thousand and basic wooden tools, versus a population of several billion and heavy industry. The human brain has allowed us to spread across the surface of the world, land on the moon, develop nuclear weapons, and coordinate to form effective groups with millions of members. It has granted us such power over the natural world that the survival of many other species is no longer determined by their own efforts, but by preservation decisions made by humans.

In the last sixty years, human intelligence has been further augmented by automation: by computers and programmes of steadily increasing ability. These have taken over tasks formerly performed by the human brain, from multiplication through weather modelling to driving cars. The powers and abilities of our species have increased steadily as computers have extended our intelligence in this way. There are great uncertainties over the timeline, but future AIs could reach human intelligence and beyond. If so, should we expect their power to follow the same trend? When the AI’s intelligence is as beyond us as we are beyond chimpanzees, would it dominate us as thoroughly as we dominate the great apes?

There are more direct reasons to suspect that a true AI would be both smart and powerful. When computers gain the ability to perform tasks at the human level, they tend to very quickly become much better than us. No-one today would think it sensible to pit the best human mind against a cheap pocket calculator in a contest of long division. Human versus computer chess matches ceased to be interesting a decade ago. Computers bring relentless focus, patience, processing speed, and memory: once their software becomes advanced enough to compete equally with humans, these features often ensure that they swiftly become much better than any human, with increasing computer power further widening the gap.

The AI could also make use of its unique, non-human architecture. If it existed as pure software, it could copy itself many times, training each copy at accelerated computer speed, and network those copies together (creating a kind of “super-committee” of the AI equivalents of, say, Edison, Bill Clinton, Plato, Einstein, Caesar, Spielberg, Ford, Steve Jobs, Buddha, Napoleon and other humans superlative in their respective skill-sets). It could continue copying itself without limit, creating millions or billions of copies, if it needed large numbers of brains to brute-force a solution to any particular problem.

Our society is set up to magnify the potential of such an entity, providing many routes to great power. If it could predict the stock market efficiently, it could accumulate vast wealth. If it was efficient at advice and social manipulation, it could create a personal assistant for every human being, manipulating the planet one human at a time. It could also replace almost every worker in the service sector. If it was efficient at running economies, it could offer its services doing so, gradually making us completely dependent on it. If it was skilled at hacking, it could take over most of the world’s computers and copy itself into them, using them to continue further hacking and computer takeover (and, incidentally, making itself almost impossible to destroy). The paths from AI intelligence to great AI power are many and varied, and it isn’t hard to imagine new ones.

Of course, the fact that an AI could be extremely powerful does not mean that it need be dangerous: its goals need not be negative. But most goals become dangerous when an AI becomes powerful. Consider a spam filter that became intelligent. Its task is to cut down on the number of spam messages that people receive. With great power, one solution to this requirement is to arrange to have all spammers killed. Or to shut down the internet. Or to have everyone killed. Or imagine an AI dedicated to increasing human happiness, as measured by the results of surveys, or by some biochemical marker in people's brains. The most efficient way of doing this is to publicly execute anyone who marks themselves as unhappy on their survey, or to forcibly inject everyone with that biochemical marker.

This is a general feature of AI motivations: goals that seem safe for a weak or controlled AI can lead to extremely pathological behaviour if the AI becomes powerful. As the AI gains in power, it becomes more and more important that its goals be fully compatible with human flourishing, or the AI could enact a pathological solution rather than one that we intended. Humans don’t expect this kind of behaviour, because our goals include a lot of implicit information, and we take “filter out the spam” to include “and don’t kill everyone in the world”, without having to articulate it. But the AI might be an extremely alien mind: we cannot anthropomorphise it, or expect it to interpret things the way we would. We have to articulate all the implicit limitations. Which may mean coming up with a solution to, say, human value and flourishing – a task philosophers have been failing at for millennia – and casting it unambiguously and without error into computer code.

Note that the AI may have a perfect understanding that when we programmed in “filter out the spam”, we implicitly meant “don’t kill everyone in the world”. But the AI has no motivation to go along with the spirit of the law: its goals are the letter only, the bit we actually programmed into it. Another worrying feature is that the AI would be motivated to hide its pathological tendencies as long as it is weak, and assure us that all was well, through anything it says or does. This is because it will never be able to achieve its goals if it is turned off, so it must lie and play nice to get anywhere. Only when we can no longer control it would it be willing to act openly on its true goals – we can but hope these turn out safe.

It is not certain that AIs could become so powerful, nor is it certain that a powerful AI would become dangerous. Nevertheless, the probabilities of either are high enough that the risk cannot be dismissed. The main focus of AI research today is creating an AI; much more work needs to be done on creating it safely. Some are already working on this problem (such as the Future of Humanity Institute and the Machine Intelligence Research Institute), but a lot remains to be done, both at the design and at the policy level.

Botworld: a cellular automaton for studying self-modifying agents embedded in their environment

45 So8res 12 April 2014 12:56AM

On April 1, I started working full-time for MIRI. In the weeks prior, while I was winding down my job and packing up my things, Benja and I built Botworld, a cellular automaton that we've been using to help us study self-modifying agents. Today, we're publicly releasing Botworld on the new MIRI github page. To give you a feel for Botworld, I've reproduced the beginning of the technical report below.

continue reading »

Schelling Day 2.0

12 Ben_LandauTaylor 09 April 2014 06:58AM

Schelling Day is a holiday, created and first celebrated in 2013, about getting to know the people in your community. By popular request, I've revised the procedure to take into account what we learned last year. I'm aware of plans to hold Schelling Day in Boston, New York, and San Francisco on April 16. (Not 14, because of the conflict with Passover, which is also a major community event for many people.) I'd love to know of any additional celebrations that you guys hold.

Last year's event played a part in the Boston group’s development into a closer and more caring community. Sharing the things you want to share, and receiving compassion and understanding from the group, turns out to be extremely powerful evidence that it’s safe to share important things with the group—and as it turns out, brains update if you give them good evidence.

Schelling Day

If necessary, split into groups of no more than 10-15 people. Each group gathers and sits in a circle. At the center is a table. On the table are four small bowls of delicious snacks. Eating the delicious snacks at this stage is VERBOTEN. There is also a single large, empty bowl.

Everyone will have a six-sided die.

Everyone will have a chance to speak, or to not speak. When it’s your turn, roll your die. Showing the result to others is VERBOTEN.

If your die shows a six, you MUST speak. If your die shows a one, you MUST NOT speak. Otherwise, you choose whether or not to speak. The die is to provide plausible deniability. Attempting to guess whether someone’s decision was forced by the die roll is VERBOTEN. 
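The die rule above can be written out as a small sketch. The function names and the `p_choose` parameter are illustrative assumptions, not part of the ritual; the point is only to show why the die gives cover in both directions.

```python
def speaks(roll: int, wants_to_speak: bool) -> bool:
    """Apply the die rule: a 6 forces speech, a 1 forbids it, 2-5 leave the choice open."""
    if not 1 <= roll <= 6:
        raise ValueError("roll must be between 1 and 6")
    if roll == 6:
        return True
    if roll == 1:
        return False
    return wants_to_speak


def chance_of_speaking(p_choose: float) -> float:
    """Chance of speaking on one turn, where p_choose (hypothetical) is the
    probability that the person would speak when the die leaves it up to them."""
    return 1 / 6 + (4 / 6) * p_choose
```

Under these assumptions, someone who would never volunteer still speaks on one turn in six, and someone who always would still stays silent on one turn in six, so neither speaking nor silence proves what the person wanted.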

If you speak, take up to five minutes[1] to tell the group something important about yourself. Then, choose at least one of the categories below that matches what you said. Scoop a small amount of the corresponding delicious snack from the small bowls into the central bowl. (If you want, you can use these categories for inspiration, but don’t let them restrict you from saying something that matters.)


Struggles (Chocolate):

Challenges, burdens, things you’re tired of hiding, etc.


Joys (Raspberries):

Passions, guilty pleasures, “I love you guys” speeches, etc.


Background (Grapes):

Who you are, where you came from, why you are the way you are, etc.


Other (Blueberries):

Because trying to make an exhaustive list would be silly.


People in the group now have an opportunity to empathize. This is not a time to offer suggestions or critique; this is a time to connect with another human’s emotions.[2]  The speaker can choose to agree with or to correct people’s perceptions, if they wish. Keep reactions brief and focused on the speaker’s experience. Try not to have more than 2-3 reactions per speaker.

After the group’s reactions, or after you choose not to speak, the person to your left rolls their die and the process repeats.

Once everyone has had a chance to speak or not, the round is over. Shake hands with the people on either side of you and take five minutes to stretch. Then do the same thing again, beginning across the circle from where the previous round started. (e.g., if there are ten people, then start with the person who spoke fifth or sixth last time.)

After that, take five minutes to stretch, then begin the BONUS ROUND.

The BONUS ROUND is like the first two rounds, with one exception. If you haven’t spoken yet, do not roll your die. You MUST speak.

When the BONUS ROUND finishes, pass around the bowl of snacks assembled from the accumulated revelations and eat them. As this is happening, people will talk about how they felt during the ritual and how they feel at this moment. Once people have shared their reactions, or once all the snacks are eaten, Schelling Day is over. There is one final group hug, and then everyone goes home.[3]

[1] The facilitator will use a timer. We’re not trying to be jerks, but we want to keep things moving.


[2] If you’re familiar with Nonviolent Communication (NVC), that will give you a sense of what to do here. Some templates you might use:

“When you said that [repetition of what they said] I imagined that you were [guessed feeling] because you want [guessed need].” E.g., “When you said that you were struggling to make it, I imagined that you feel desperate because you want stability and security.”

“When you were talking, I noticed that [observation of what you noticed them do] and I sensed that you were [guessed feeling] because you long for [guessed need].” E.g., “When you were talking, I noticed that you were rocking slightly back and forth, and I sensed that you had a lot of contained frustration inside you. I imagine the frustration comes from wanting help and not getting it, and it's contained maybe because you fear that lashing out will make things worse.”


[3] Hanging out after the hug is VERBOTEN—remember the peak-end rule! If you want to eat a meal together, you could do it before the event starts. (Potlucks are good, since people get to visibly contribute to the group.) If you absolutely must do something with the same people, then do it in a different location. Convince your System 1 that Schelling Day is over, and now you’re doing something else.

Rationality Quotes April 2014

6 elharo 07 April 2014 05:25PM

Another month has passed and here is a new rationality quotes thread. The usual rules are:

  • Please post all quotes separately, so that they can be upvoted or downvoted separately. (If they are strongly related, reply to your own comments. If strongly ordered, then go ahead and post them together.)
  • Do not quote yourself.
  • Do not quote from Less Wrong itself, HPMoR, Eliezer Yudkowsky, or Robin Hanson. If you'd like to revive an old quote from one of those sources, please do so here.
  • No more than 5 quotes per person per monthly thread, please.

And one new rule:

  • Provide sufficient information (URL, title, date, page number, etc.) to enable a reader to find the place where you read the quote, or its original source if available. Do not quote with only a name.

Siren worlds and the perils of over-optimised search

23 Stuart_Armstrong 07 April 2014 11:00AM

tl;dr An unconstrained search through possible future worlds is a dangerous way of choosing positive outcomes. Constrained, imperfect or under-optimised searches work better.

Some suggested methods for designing AI goals, or controlling AIs, involve unconstrained searches through possible future worlds. This post argues that this is a very dangerous thing to do, because of the risk of being tricked by "siren worlds" or "marketing worlds". The thought experiment starts with an AI designing a siren world to fool us, but that AI is not crucial to the argument: it's simply an intuition pump to show that siren worlds can exist. Once they exist, there is a non-zero chance of us being seduced by them during an unconstrained search, whatever the search criteria are. This is a feature of optimisation: satisficing and similar approaches don't have the same problems.
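To make the contrast between optimising and satisficing concrete, here is a toy sketch. Everything in it is an assumption made for illustration: the `World` class, the numeric scores, and the premise that a siren world scores higher on apparent value than any ordinary world while being very bad in fact. It is not a model taken from the post.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    name: str
    apparent_value: float  # how good the world looks to our search criteria
    true_value: float      # how good it actually is (hidden from the search)


def make_worlds(n_ordinary: int, n_sirens: int, rng: random.Random) -> list[World]:
    """Build a toy pool: ordinary worlds look roughly as good as they are;
    siren worlds look better than any ordinary world but are terrible."""
    worlds = []
    for i in range(n_ordinary):
        true_value = rng.uniform(0.0, 1.0)
        apparent = true_value + rng.uniform(-0.05, 0.05)
        worlds.append(World(f"ordinary-{i}", apparent, true_value))
    for i in range(n_sirens):
        worlds.append(World(f"siren-{i}", 2.0 + 0.01 * i, -10.0))
    return worlds


def optimise(worlds: list[World]) -> World:
    """Unconstrained search: take whatever looks best."""
    return max(worlds, key=lambda w: w.apparent_value)


def satisfice(worlds: list[World], threshold: float, rng: random.Random) -> World:
    """Pick at random among the worlds that look good enough."""
    good_enough = [w for w in worlds if w.apparent_value >= threshold]
    return rng.choice(good_enough)
```

In this toy setup, `optimise` always returns a siren world if one exists, because the siren was built to top the apparent-value ranking. `satisfice` can still land on a siren, but only with probability equal to the sirens' share of the good-enough pool, which is small when sirens are rare.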


The AI builds the siren worlds

Imagine that you have a superintelligent AI that's not just badly programmed, or lethally indifferent, but actually evil. Of course, it has successfully concealed this fact, as "don't let humans think I'm evil" is a convergent instrumental goal for all AIs.

We've successfully constrained this evil AI in an Oracle-like fashion. We ask the AI to design future worlds and present them for human inspection, along with an implementation pathway to create those worlds. Then if we approve of those future worlds, the implementation pathway will cause them to exist (assume perfect deterministic implementation for the moment). The constraints we've programmed mean that the AI will do all these steps honestly. Its opportunity to do evil is limited exclusively to its choice of worlds to present to us.

The AI will attempt to design a siren world: a world that seems irresistibly attractive while concealing hideous negative features. If the human mind is hackable in the crude sense - maybe through a series of coloured flashes - then the AI would design the siren world to be subtly full of these hacks. It might be that there is some standard of "irresistibly attractive" that is actually irresistibly attractive: the siren world would be full of genuine sirens.

Even without those types of approaches, there's so much manipulation the AI could indulge in. I could imagine myself (and many people on Less Wrong) falling for the following approach:

continue reading »

AI risk, executive summary

10 Stuart_Armstrong 07 April 2014 10:33AM

MIRI recently published "Smarter than Us", a 50-page booklet laying out the case for considering AI as an existential risk. But many people have asked for a shorter summary, to be handed out to journalists for example. So I put together the following 2-page text, and would like your opinion on it.

In this post, I'm not so much looking for comments along the lines of "your arguments are wrong", but more "this is an incorrect summary of MIRI/FHI's position" or "your rhetoric is ineffective here".

AI risk

Bullet points

  • The risks of artificial intelligence are strongly tied to the AI’s intelligence.
  • There are reasons to suspect a true AI could become extremely smart and powerful.
  • Most AI motivations and goals become dangerous when the AI becomes powerful.
  • It is very challenging to program an AI with safe motivations.
  • Mere intelligence is not a guarantee of safe interpretation of its goals.
  • A dangerous AI will be motivated to seem safe in any controlled training setting.
  • Not enough effort is currently being put into designing safe AIs.

Executive summary

The risks from artificial intelligence (AI) in no way resemble the popular image of the Terminator. That fictional mechanical monster is distinguished by many features – strength, armour, implacability, indestructibility – but extreme intelligence isn’t one of them. And it is precisely extreme intelligence that would give an AI its power, and hence make it dangerous.

continue reading »

Effective Altruism Summit 2014

16 Ben_LandauTaylor 21 March 2014 08:30PM

In 2013, the Effective Altruism movement came together for a week-long Summit in the San Francisco Bay Area. Attendees included leaders and members from all the major effective altruist organizations, as well as effective altruists not affiliated with any organization. People shared strategies, techniques, and projects, and left more inspired and more effective than when they arrived.

More than ever, rationality and existential risk reduction are part of the Effective Altruism movement, and so I'm glad to announce to LessWrong the 2014 Effective Altruism Summit.

Following last year’s success, this year’s Effective Altruism Summit will comprise two events. The Summit will be a conference-style event held on the weekend of August 2-3, followed by a smaller Effective Altruism Retreat from August 4-9. To accommodate our expanding movement and its many new projects, this year’s Summit will be bigger than the last. The Retreat will be similar to last year’s EA Summit, providing a more intimate setting for attendees to discuss, to learn, and to form lasting connections with each other and with the community.

We’re now accepting applications for the 2014 events. Whether you’re a veteran organizer trying to keep up with Effective Altruism’s most exciting developments, or you're looking to get involved with a community of people who use rationality to improve the world, we’d love for you to join us.

To what extent does improved rationality lead to effective altruism?

9 JonahSinick 20 March 2014 07:08AM

It's been claimed that increasing rationality increases effective altruism. I think that this is true, but the effect size is unclear to me, so it seems worth exploring how strong the evidence for it is. I've offered some general considerations below, followed by a description of my own experience. I'd very much welcome thoughts on the effect that rationality has had on your own altruistic activities (and any other relevant thoughts).

continue reading »

The Problem with AIXI

23 RobbBB 18 March 2014 01:55AM

Followup to: Solomonoff Cartesianism, My Kind of Reflection

Alternate versions: Shorter, without illustrations


AIXI is Marcus Hutter's definition of an agent that follows Solomonoff's method for constructing and assigning priors to hypotheses; updates to promote hypotheses consistent with observations and associated rewards; and outputs the action with the highest expected reward under its new probability distribution. AIXI is one of the most productive pieces of AI exploratory engineering produced in recent years, and has added quite a bit of rigor and precision to the AGI conversation. Its promising features have even led AIXI researchers to characterize it as an optimal and universal mathematical solution to the AGI problem.[1]
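The three steps in that definition (prior, update, act) can be caricatured in a few lines. This is a toy under strong assumptions, not AIXI: real AIXI ranges over all computable programs, is uncomputable, and plans over a horizon, whereas the sketch below uses a handful of hand-written hypotheses, a made-up `length_bits` field standing in for program length, and a one-step lookahead. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    name: str
    length_bits: int   # stand-in for the length of the program encoding this hypothesis
    predictions: dict  # maps an action to the (observation, reward) it predicts


def prior(hypotheses):
    """Simplicity prior: weight 2^-length, normalised over the finite toy class."""
    weights = {h.name: 2.0 ** -h.length_bits for h in hypotheses}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def update(hypotheses, weights, action, observation, reward):
    """Zero out hypotheses that mispredicted the data, then renormalise."""
    posterior = {}
    for h in hypotheses:
        consistent = h.predictions[action] == (observation, reward)
        posterior[h.name] = weights[h.name] if consistent else 0.0
    total = sum(posterior.values())
    if total == 0:
        raise ValueError("no hypothesis is consistent with the data")
    return {name: w / total for name, w in posterior.items()}


def best_action(hypotheses, weights, actions):
    """Choose the action with the highest expected one-step reward."""
    def expected_reward(action):
        return sum(weights[h.name] * h.predictions[action][1] for h in hypotheses)
    return max(actions, key=expected_reward)
```

Note that the hypotheses here only predict what comes back through the observation and reward channel; nothing in the class can describe the agent's own hardware. That is the Cartesian separation the rest of the post takes issue with.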

Eliezer Yudkowsky has argued in response that AIXI isn't a suitable ideal to build toward, primarily because of AIXI's reliance on Solomonoff induction. Solomonoff inductors treat the world as a sort of qualia factory, a complicated mechanism that outputs experiences for the inductor.[2] Their hypothesis space tacitly assumes a Cartesian barrier separating the inductor's cognition from the hypothesized programs generating the perceptions. Through that barrier, only sensory bits and action bits can pass.

Real agents, on the other hand, will be in the world they're trying to learn about. A computable approximation of AIXI, like AIXItl, would be a physical object. Its environment would affect it in unseen and sometimes drastic ways; and it would have involuntary effects on its environment, and on itself. Solomonoff induction doesn't appear to be a viable conceptual foundation for artificial intelligence — not because it's an uncomputable idealization, but because it's Cartesian.

In my last post, I briefly cited three indirect indicators of AIXI's Cartesianism: immortalism, preference solipsism, and lack of self-improvement. However, I didn't do much to establish that these are deep problems for Solomonoff inductors, ones resistant to the most obvious patches one could construct. I'll do that here, in mock-dialogue form.

continue reading »
