Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

[Link] The Leverhulme Centre for the Future of Intelligence officially launches.

1 ignoranceprior 21 October 2016 01:22AM

UC Berkeley launches Center for Human-Compatible Artificial Intelligence

10 ignoranceprior 29 August 2016 10:43PM

Source article: http://news.berkeley.edu/2016/08/29/center-for-human-compatible-artificial-intelligence/

UC Berkeley artificial intelligence (AI) expert Stuart Russell will lead a new Center for Human-Compatible Artificial Intelligence, launched this week.

Russell, a UC Berkeley professor of electrical engineering and computer sciences and the Smith-Zadeh Professor in Engineering, is co-author of Artificial Intelligence: A Modern Approach, which is considered the standard text in the field of artificial intelligence, and has been an advocate for incorporating human values into the design of AI.

The primary focus of the new center is to ensure that AI systems are beneficial to humans, he said.

The co-principal investigators for the new center include computer scientists Pieter Abbeel and Anca Dragan and cognitive scientist Tom Griffiths, all from UC Berkeley; computer scientists Bart Selman and Joseph Halpern, from Cornell University; and AI experts Michael Wellman and Satinder Singh Baveja, from the University of Michigan. Russell said the center expects to add collaborators with related expertise in economics, philosophy and other social sciences.

The center is being launched with a grant of $5.5 million from the Open Philanthropy Project, with additional grants for the center’s research from the Leverhulme Trust and the Future of Life Institute.

Russell is quick to dismiss the imaginary threat from the sentient, evil robots of science fiction. The issue, he said, is that machines as we currently design them in fields like AI, robotics, control theory and operations research take the objectives that we humans give them very literally. Told to clean the bath, a domestic robot might, like the Cat in the Hat, use mother’s white dress, not understanding that the value of a clean dress is greater than the value of a clean bath.

The center will work on ways to guarantee that the most sophisticated AI systems of the future, which may be entrusted with control of critical infrastructure and may provide essential services to billions of people, will act in a manner that is aligned with human values.

“AI systems must remain under human control, with suitable constraints on behavior, despite capabilities that may eventually exceed our own,” Russell said. “This means we need cast-iron formal proofs, not just good intentions.”

One approach Russell and others are exploring is called inverse reinforcement learning, through which a robot can learn about human values by observing human behavior. By watching people dragging themselves out of bed in the morning and going through the grinding, hissing and steaming motions of making a caffè latte, for example, the robot learns something about the value of coffee to humans at that time of day.

“Rather than have robot designers specify the values, which would probably be a disaster,” said Russell, “instead the robots will observe and learn from people. Not just by watching, but also by reading. Almost everything ever written down is about people doing things, and other people having opinions about it. All of that is useful evidence.”

Russell and his colleagues don’t expect this to be an easy task.

“People are highly varied in their values and far from perfect in putting them into practice,” he acknowledged. “These aspects cause problems for a robot trying to learn what it is that we want and to navigate the often conflicting desires of different individuals.”

Russell, who recently wrote an optimistic article titled “Will They Make Us Better People?,” summed it up this way: “In the process of figuring out what values robots should optimize, we are making explicit the idealization of ourselves as humans. As we envision AI aligned with human values, that process might cause us to think more about how we ourselves really should behave, and we might learn that we have more in common with people of other cultures than we think.”

Steelmanning AI risk critiques

26 Stuart_Armstrong 23 July 2015 10:01AM

At some point soon, I'm going to attempt to steelman the position of those who reject the AI risk thesis, to see if it can be made solid. Here, I'm just asking if people can link to the most convincing arguments they've found against AI risk.

EDIT: Thanks for all the contributions! Keep them coming...

Presidents, asteroids, natural categories, and reduced impact

1 Stuart_Armstrong 06 July 2015 05:44PM

A putative new idea for AI control; index here.

EDIT: I feel this post is unclear, and will need to be redone again soon.

This post attempts to use the ideas developed about natural categories in order to get high impact from reduced impact AIs.


Extending niceness/reduced impact

I recently presented the problem of extending AI "niceness" given some fact X, to niceness given ¬X, choosing X to be something pretty significant but not overwhelmingly so - the death of a president. By assumption we had a successfully programmed niceness, but no good definition (this was meant to be "reduced impact" in a slight disguise).

This problem turned out to be much harder than expected. It seems that the only way to do so is to require the AI to define values dependent on a set of various (boolean) random variables Zj that did not include X/¬X. Then as long as the random variables represented natural categories, given X, the niceness should extend.

What did we mean by natural categories? Informally, it means that X should not appear in the definitions of these random variables. For instance, nuclear war is a natural category; "nuclear war XOR X" is not. Actually defining this was quite subtle; diverting through the grue and bleen problem, it seems that we had to define how we update X and the Zj given the evidence we expected to find. This was put in equation as picking Zj's that minimize

  • Variance{log[ (P(X∧Z|E)*P(¬X∧¬Z|E)) / (P(X∧¬Z|E)*P(¬X∧Z|E)) ]}

where E is the random variable denoting the evidence we expected to find. Note that if we interchange X and ¬X, the ratio inverts, the log changes sign - but this makes no difference to the variance. So we can equally well talk about extending niceness given X to ¬X, or niceness given ¬X to X.


Perfect and imperfect extensions

The above definition would work for a "perfectly nice AI". That could be an AI that would be nice, given any combination of estimates of X and Zj. In practice, because we can't consider every edge case, we would only have an "expectedly nice AI". That means that the AI can fail to be nice in certain unusual and unlikely edge cases, in certain strange sets of values of Zj that almost never come up...

...or at least, that almost never come up, given X. Since the "expected niceness" was calibrated given X, such an expectedly nice AI may fail to be nice if ¬X results in a substantial change in the probability of the Zj (see the second failure mode in this post; some of the Zj may be so tightly coupled to the value of X that an expected niceness AI considers them fixed, and this results in problems if ¬X happens and their values change).

One way of fixing this is to require that the "swing" of the Zj be small upon changing X to ¬X or vice versa. Something like, for all values of {aj}, the ratio P({Zj=aj} | X) / P({Zj=aj} | ¬X) is contained between 100 and 1/100. This means that a reasonably good "expected niceness" calibrated on the Zj will transfer from X to ¬X (though the error may grow). This approach has some other advantages, as we'll see in the next section.
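The swing condition can be checked mechanically once the two conditional distributions are written down. The following Python sketch is only an illustration: the function names, the dictionary representation, and the assumption that the Zj are independent given X (used purely to build toy inputs) are not from the post.

```python
from itertools import product


def independent_joint(p_true):
    """Joint distribution over boolean Zj, given P(Zj = True) for each j.

    Independence between the Zj is assumed purely to build a toy input.
    """
    joint = {}
    for assignment in product([True, False], repeat=len(p_true)):
        prob = 1.0
        for value, p in zip(assignment, p_true):
            prob *= p if value else 1.0 - p
        joint[assignment] = prob
    return joint


def swing_ok(p_given_x, p_given_not_x, bound=100.0):
    """True if P({Zj=aj} | X) / P({Zj=aj} | not-X) lies in [1/bound, bound]
    for every assignment {aj}. Both dicts must share the same keys."""
    for assignment, px in p_given_x.items():
        pnx = p_given_not_x[assignment]
        if px == 0.0 and pnx == 0.0:
            continue  # impossible either way: no swing to measure
        if px == 0.0 or pnx == 0.0:
            return False  # infinite swing
        if not (1.0 / bound <= px / pnx <= bound):
            return False
    return True


# Zj only loosely tied to X: every ratio stays well inside the bounds.
loose = swing_ok(independent_joint([0.5, 0.1]), independent_joint([0.6, 0.3]))

# First Zj almost determined by X: the ratio for Z1 = True is about 999.
tight = swing_ok(independent_joint([0.999, 0.1]), independent_joint([0.001, 0.1]))

print(loose, tight)
```

The second case is the kind of tightly coupled variable that the swing restriction is meant to exclude.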

Of course, problems arise if the Zj are defined by smashing together events relatively independent of X with some that are very strongly dependent on X, into the same variable. Thus we'll further require that Zj cannot be decomposed into natural subvariables whose values swing more under changes from X to ¬X.


High impact from reduced impact

With this, we are finally in a position to define a reduced impact AI design that can have a large impact. Imagine that an asteroid is coming to hit the Earth. Two AIs are charged with targeting a laser at it. The laser isn't perfectly effective, and the asteroid isn't perfectly certain to hit the Earth anyway: let's say aiming the laser correctly will reduce the odds of impact by a factor of 10 or so (well within the probability swing restrictions of the previous section).

In the original formulation of this, the AIs were tasked with outputting the x and y coordinates to aim the laser. Here, instead, they'll be tasked with constructing the motivation of a robotic subagent that will move around for a while, see things and interact, before physically aiming the laser in the right direction. This task is harder.

Then the top level AI charged with aiming the y coordinates is given the following task:

  • It is motivated to cause the correct y-coordinate to be outputted.
  • If the x-coordinate AI is not turned on (call this event ¬X), it is strongly motivated to have reduced impact.
  • It must phrase the robotic subagent's utility function in terms of the Zj that obey all the properties above.
  • The evidence E considered for the equation governing the naturalness of Zj is considered (for example) to be a random sampling of space at the time both AIs are turned on. Most importantly, this is defined to be a theoretical process that the AI cannot affect physically.

So what will the AI do? It will produce a robot that will try to aim the y-coordinate of the laser correctly, given ¬X, and not expand dangerously. However, we've set things up so that X is virtually certain, so the robot will see another, x-aiming robot wander around. The chances of this happening, given ¬X, are minuscule, so the swing of the probability from ¬X to X is huge. So, by the restrictions we've put on the Zj, the robot will not be motivated to change its behaviour just because it sees the other robot.

And if the laser is fired, the impact on humanity will, conversely, not be that large, so well within the parameters of the Zj. Thus the "general reduced impact" portion of the robot's motivation will be preserved.

And, of course, the x-robot will have the mirror image motivations, so they will both aim correctly, it seems. This should evade the problem described here.


Failure modes

There are three ways the approach could fail: one bad way, and two graceful ways. The bad way is if the "no decomposition" requirements (one of them is presented above, the other is in this post) fail. In this case, the niceness or reduced impact are not guaranteed to be extended to ¬X, and the AI's behaviour can become undefined. Note that this is not bad optimisation pressure, just an absence of optimisation pressure - but that could still be potentially dangerous.

The first graceful failure happens if the AI is not able to create a reduced impact sub-agent given the restrictions on the Zj above. In that case, the AI does nothing. The second graceful failure happens if the AI evades our attempts to increase its impact, given ¬X. In that case, it simply becomes a reduced impact AI that does little. Not ideal, but not deadly.


Overall status: I'm not sure the idea is sound, at least not yet. Critiques welcome.

Grue, Bleen, and natural categories

3 Stuart_Armstrong 06 July 2015 01:47PM

A putative new idea for AI control; index here.

In a previous post, I looked at unnatural concepts such as grue (green if X was true, blue if it was false) and bleen. This was to enable one to construct the natural categories that extend AI behaviour, something that seemed surprisingly difficult to do.

The basic idea discussed in the grue post was that the naturalness of grue and bleen seemed dependent on features of our universe - mostly, that it was easy to tell whether an object was "currently green" without knowing what time it was, but we could not know whether the object was "currently grue" without knowing the time.

So the naturalness of the category depended on the type of evidence we expected to find. Furthermore, it seemed easier to discuss whether a category is natural "given X", rather than whether that category is natural in general. However, we know the relevant X in the AI problems considered so far, so this is not a problem.


Natural category, probability flows

Fix a boolean random variable X, and assume we want to check whether the boolean random variable Z is a natural category, given X.

If Z is natural (for instance, it could be the colour of an object, while X might be the brightness), then we expect to uncover two types of evidence:

  • those that change our estimate of X; this causes probability to "flow" as follows (or in the opposite directions):

  • ...and those that change our estimate of Z:

Or we might discover something that changes our estimates of X and Z simultaneously. If the probability flows to X and Z in the same proportions, we might get:

What is an example of an unnatural category? Well, if Z is some sort of grue/bleen-like object given X, then we can have Z = X XOR Z', for Z' actually a natural category. This sets up the following probability flows, which we would not want to see:

More generally, Z might be constructed so that X∧Z, X∧¬Z, ¬X∧Z and ¬X∧¬Z are completely distinct categories; in that case, there are more forbidden probability flows:


In fact, there are only really three "linearly independent" probability flows, as we shall see.


Less pictures, more math

Let's represent the four possible states of affairs by four weights (not probabilities):

Since everything is easier when it's linear, let's set w11 = log(P(X∧Z)) and similarly for the other weights (we neglect cases where some events have zero probability). Two sets of weights correspond to the same probabilities iff you can get from one set to the other by multiplying by a strictly positive number. For logarithms, this corresponds to adding the same constant to all the log-weights. So we can normalise our log-weights (select a single set of representative log-weights for each possible set of probabilities) by choosing the w such that

w11 + w12 + w21 + w22 = 0.

Thus the probability "flows" correspond to adding together two such normalised 2x2 matrices, one for the prior and one for the update. Composing two flows means adding two change matrices to the prior.

Four variables, one constraint: the set of possible log-weights is three dimensional. We know we have two allowable probability flows, given naturalness: those caused by changes to P(X), independent of P(Z), and vice versa. Thus we are looking for a single extra constraint to keep Z natural given X.

A little thought reveals that we want to keep constant the quantity:

w11 + w22 - w12 - w21.

This preserves all the allowed probability flows and rules out all the forbidden ones. Translating this back to the general case, let "e" be the evidence we find. Then if Z is a natural category given X and the evidence e, the following quantity is the same for all possible values of e:

log[ (P(X∧Z|e)*P(¬X∧¬Z|e)) / (P(X∧¬Z|e)*P(¬X∧Z|e)) ].
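The claim that w11 + w22 - w12 - w21 is preserved by the allowed flows and changed by the forbidden ones can be checked numerically. This is a sketch in Python under stated assumptions: the index convention (first index for X versus ¬X, second for Z versus ¬Z), the prior, and the three example flow matrices are chosen for illustration and are not taken from the post.

```python
import math


def normalise(w):
    """Shift 2x2 log-weights by a constant so the four entries sum to zero."""
    shift = sum(map(sum, w)) / 4.0
    return [[x - shift for x in row] for row in w]


def apply_flow(w, flow):
    """Add a change matrix to the log-weights, then renormalise."""
    return normalise([[w[i][j] + flow[i][j] for j in range(2)] for i in range(2)])


def invariant(w):
    """w11 + w22 - w12 - w21, with w[0][0] playing the role of w11."""
    return w[0][0] + w[1][1] - w[0][1] - w[1][0]


# Illustrative prior over (X,Z), (X,not-Z), (not-X,Z), (not-X,not-Z).
prior = normalise([[math.log(p) for p in row] for row in [[0.4, 0.1], [0.2, 0.3]]])

flow_x = [[1.0, 1.0], [0.0, 0.0]]      # evidence about X only
flow_z = [[0.5, -0.5], [0.5, -0.5]]    # evidence about Z only
flow_xor = [[1.0, -1.0], [-1.0, 1.0]]  # grue-like evidence about X XOR Z

base = invariant(prior)
change_x = invariant(apply_flow(prior, flow_x)) - base
change_z = invariant(apply_flow(prior, flow_z)) - base
change_xor = invariant(apply_flow(prior, flow_xor)) - base
print(change_x, change_z, change_xor)
```

The X-only and Z-only flows leave the quantity unchanged up to rounding, while the XOR-style flow shifts it by 4. Normalisation never matters here, because adding one constant to all four log-weights cancels in the invariant.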

If E is a random variable representing the possible values of e, this means that we want

log[ (P(X∧Z|E)*P(¬X∧¬Z|E)) / (P(X∧¬Z|E)*P(¬X∧Z|E)) ]

to be constant, or, equivalently, seeing the posterior probabilities as random variables dependent on E:

  • Variance{log[ (P(X∧Z|E)*P(¬X∧¬Z|E)) / (P(X∧¬Z|E)*P(¬X∧Z|E)) ]} = 0.

Call that variance the XE-naturalness measure. If it is zero, then Z defines an XE-natural category. Note that this does not imply that Z and X are independent, or independent conditional on E. Just that they are, in some sense, "equally (in)dependent whatever E is".
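As a rough illustration, the XE-naturalness measure can be computed for a toy model in which the evidence E takes finitely many values. In this Python sketch the prior, the likelihood tables and all function names are assumptions made for the example; only the formula for the measure comes from the text above.

```python
import math


def log_odds_ratio(t):
    """log[ P(X,Z) P(not-X,not-Z) / (P(X,not-Z) P(not-X,Z)) ] for a 2x2 table t,
    with t[0] the X row, t[1] the not-X row, and columns Z, not-Z."""
    return math.log(t[0][0] * t[1][1] / (t[0][1] * t[1][0]))


def posteriors(prior, likelihoods):
    """Bayes-update the prior on each evidence value e.

    likelihoods[e][i][j] is P(e | cell i, j). Returns a list of
    (P(e), posterior table given e)."""
    result = []
    for lik in likelihoods:
        raw = [[prior[i][j] * lik[i][j] for j in range(2)] for i in range(2)]
        p_e = sum(map(sum, raw))
        result.append((p_e, [[c / p_e for c in row] for row in raw]))
    return result


def xe_naturalness(prior, likelihoods):
    """Variance over E of the posterior log odds ratio."""
    post = posteriors(prior, likelihoods)
    mean = sum(p_e * log_odds_ratio(t) for p_e, t in post)
    return sum(p_e * (log_odds_ratio(t) - mean) ** 2 for p_e, t in post)


prior = [[0.4, 0.1], [0.2, 0.3]]

# Evidence whose likelihood depends only on X (rows): two outcomes e1, e2.
about_x = [[[0.7, 0.7], [0.2, 0.2]],
           [[0.3, 0.3], [0.8, 0.8]]]

# Evidence whose likelihood depends on X XOR Z (diagonal vs off-diagonal).
about_xor = [[[0.9, 0.4], [0.4, 0.9]],
             [[0.1, 0.6], [0.6, 0.1]]]

natural = xe_naturalness(prior, about_x)
unnatural = xe_naturalness(prior, about_xor)
print(natural, unnatural)
```

With evidence that bears only on X, the measure is zero up to floating-point error, even though X and Z are correlated in the prior. With the XOR-style evidence it is clearly positive.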


Almost natural category

The advantage of that last formulation becomes visible when we consider that the evidence which we uncover is not, in the real world, going to perfectly mark Z as natural, given X. To return to the grue example, though most evidence we uncover about an object is going to be the colour or the time rather than some weird combination, there is going to be somebody who will write things like "either the object is green, and the sun has not yet set in the west; or instead perchance, those two statements are both alike in falsity". Upon reading that evidence, if we believe it in the slightest, the variance can no longer be zero.

Thus we cannot expect that the above XE-naturalness be perfectly zero, but we can demand that it be low. How low? There seems no principled way of deciding this, but we can make one attempt: that we cannot lower it by decomposing Z.

What do we mean by that? Well, assume that Z is a natural category, given X and the expected evidence, but Z' is not. Then we can define a new boolean category Y to be Z with high probability, and Z' otherwise. This will still have low XE-naturalness measure (as Z does) but is obviously not ideal.

Reversing this idea, we say Z defines an "XE-almost natural category" if there is no "more XE-natural" category that extends X∧Z (and similarly for the other conjunctions). Technically, if

X∧Z = X∧Y,

then Y must have equal or greater XE-naturalness measure than Z. And similarly for X∧¬Z, ¬X∧Z, and ¬X∧¬Z.

Note: I am somewhat unsure about this last definition; the concept I want to capture is clear (Z is not the combination of more XE-natural subvariables), but I'm not certain the definition does it.


Beyond boolean

What if Z takes n values, rather than being a boolean? This can be treated simply.

If we set the wjk to be log-weights as before, there are 2n free variables. The normalisation constraint is that they all sum to a constant. The "permissible" probability flows are given by flows from X to ¬X (adding a constant to the first column, subtracting it from the second) and pure changes in Z (adding constants to various rows, summing to 0). There are 1+ (n-1) linearly independent ways of doing this.

Therefore we are looking for 2n-1 -(1+(n-1))=n-1 independent constraints to forbid non-natural updating of X and Z. One basis set for these constraints could be to keep constant the values of

wj1 + w(j+1)2 - wj2 - w(j+1)1,

where j ranges between 1 and n-1.

This translates to variance constraints of the type:

  • Variance{log[ (P(X∧{Z=j}|E)*P(¬X∧{Z=j+1}|E)) / (P(X∧{Z=j+1}|E)*P(¬X∧{Z=j}|E)) ]} = 0.

But those are n-1 different possible variances. What is the best global measure of XE-naturalness? It seems it could simply be

  • Maxjk Variance{log[ (P(X∧{Z=j}|E)*P(¬X∧{Z=k}|E)) / (P(X∧{Z=k}|E)*P(¬X∧{Z=j}|E)) ]} = 0.

If this quantity is zero, it naturally sends all variances to zero, and, when not zero, is a good candidate for the degree of XE-naturalness of Z.
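One way to see why constraints on adjacent values of Z are enough: for any j < k, the cross term wj1 + wk2 - wj2 - wk1 is the sum of the adjacent cross terms from j to k-1, so if every adjacent term is constant in E then so is every pairwise term. The Python sketch below checks this telescoping identity on arbitrary log-weights; the function name, the seed and the random values are illustrative only.

```python
import random


def cross_term(w, j, k):
    """w[j][0] + w[k][1] - w[j][1] - w[k][0].

    Rows are indexed by the value of Z, columns by X / not-X."""
    return w[j][0] + w[k][1] - w[j][1] - w[k][0]


random.seed(0)
n = 5  # number of values Z can take in this toy example
w = [[random.uniform(-3.0, 3.0) for _ in range(2)] for _ in range(n)]

# Compare every pairwise cross term with the sum of adjacent ones.
max_gap = 0.0
for j in range(n):
    for k in range(j + 1, n):
        adjacent_sum = sum(cross_term(w, i, i + 1) for i in range(j, k))
        max_gap = max(max_gap, abs(cross_term(w, j, k) - adjacent_sum))

print(max_gap)
```

The gap is zero up to rounding. Note that the identity concerns the log-ratios themselves, not their variances: when the adjacent variances are non-zero, the pairwise variances are not simply their sums, which is one reason to take the maximum over all pairs.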

The extension to the case where X takes multiple values is straightforward:

  • Maxjklm Variance{log[ (P({X=l}∧{Z=j}|E)*P({X=m}∧{Z=k}|E)) / (P({X=l}∧{Z=k}|E)*P({X=m}∧{Z=j}|E)) ]} = 0.

Note: if ever we need to compare the XE-naturalness of random variables taking different numbers of values, it may become necessary to divide these quantities by the number of variables involved, or maybe substitute a more complicated expression that contains all the different possible variances, rather than simply the maximum.


And in practice?

In the next post, I'll look at using this in practice for an AI, to evade presidential deaths and deflect asteroids.

Top 9+2 myths about AI risk

44 Stuart_Armstrong 29 June 2015 08:41PM

Following some somewhat misleading articles quoting me, I thought I’d present the top 9 myths about the AI risk thesis:

  1. That we’re certain AI will doom us. Certainly not. It’s very hard to be certain of anything involving a technology that doesn’t exist; we’re just claiming that the probability of AI going bad isn’t low enough that we can ignore it.
  2. That humanity will survive, because we’ve always survived before. Many groups of humans haven’t survived contact with more powerful intelligent agents. In the past, those agents were other humans; but they need not be. The universe does not owe us a destiny. In the future, something will survive; it need not be us.
  3. That uncertainty means that you’re safe. If you’re claiming that AI is impossible, or that it will take countless decades, or that it’ll be safe... you’re not being uncertain, you’re being extremely specific about the future. “No AI risk” is certain; “Possible AI risk” is where we stand.
  4. That Terminator robots will be involved. Please? The threat from AI comes from its potential intelligence, not from its ability to clank around slowly with an Austrian accent.
  5. That we’re assuming the AI is too dumb to know what we’re asking it. No. A powerful AI will know what we meant to program it to do. But why should it care? And if we could figure out how to program “care about what we meant to ask”, well, then we’d have safe AI.
  6. That there’s one simple trick that can solve the whole problem. Many people have proposed that one trick. Some of them could even help (see Holden’s tool AI idea). None of them reduce the risk enough to relax – and many of the tricks contradict each other (you can’t design an AI that’s both a tool and socialising with humans!).
  7. That we want to stop AI research. We don’t. Current AI research is very far from the risky areas and abilities. And it’s risk aware AI researchers that are most likely to figure out how to make safe AI.
  8. That AIs will be more intelligent than us, hence more moral. It’s pretty clear that in humans, high intelligence is no guarantee of morality. Are you really willing to bet the whole future of humanity on the idea that AIs might be different? That in the billions of possible minds out there, there is none that is both dangerous and very intelligent?
  9. That science fiction or spiritual ideas are useful ways of understanding AI risk. Science fiction and spirituality are full of human concepts, created by humans, for humans, to communicate human ideas. They need not apply to AI at all, as these could be minds far removed from human concepts, possibly without a body, possibly with no emotions or consciousness, possibly with many new emotions and a different type of consciousness, etc... Anthropomorphising the AIs could lead us completely astray.
Lists cannot be comprehensive, but they can adapt and grow, adding more important points:
  1. That AIs have to be evil to be dangerous. The majority of the risk comes from indifferent or partially nice AIs. Those that have some goal to follow, with humanity and its desires just getting in the way – using resources, trying to oppose it, or just not being perfectly efficient for its goal.
  2. That we believe AI is coming soon. It might; it might not. Even if AI is known to be in the distant future (which isn't known, currently), some of the groundwork is worth laying now.


A thought on AI unemployment and its consequences

7 Stuart_Armstrong 18 August 2014 12:10PM

I haven't given much thought to the concept of automation and computer induced unemployment. Others at the FHI have been looking into it in more detail - see Carl Frey's "The Future of Employment", which did estimates for 70 chosen professions as to their degree of automatability, and extended the results of this using O∗NET, an online service developed for the US Department of Labor, which gave the key features of an occupation as a standardised and measurable set of variables.

The reason that I haven't been looking at it too much is that AI-unemployment has considerably less impact than AI-superintelligence, and thus is a less important use of time. However, if automation does cause mass unemployment, then advocating for AI safety will happen in a very different context from the current one. Much will depend on how that mass unemployment problem is dealt with, what lessons are learnt, and the views of whoever is the most powerful in society. Just off the top of my head, I could think of four scenarios on whether risk goes up or down, depending on whether the unemployment problem was satisfactorily "solved" or not:

AI risk outcome, by whether the unemployment problem was solved:

Risk reduced
  • Problem solved: With good practice in dealing with AI problems, people and organisations are willing and able to address the big issues.
  • Problem unsolved: The world is very conscious of the misery that unrestricted AI research can cause, and very wary of future disruptions. Those at the top want to hang on to their gains, and they are the ones with the most control over AIs and automation research.

Risk increased
  • Problem solved: Having dealt with the easier automation problems in a particular way (eg taxation), people underestimate the risk and expect the same solutions to work.
  • Problem unsolved: Society is locked into a bitter conflict between those benefiting from automation and those losing out, and superintelligence is seen through the same prism. Those who profited from automation are the most powerful, and decide to push

But of course the situation is far more complicated, with many different possible permutations, and no guarantee that the same approach will be used across the planet. And let the division into four boxes not fool us into thinking that any is of comparable probability to the others - more research is (really) needed.

[LINK] AI risk summary published in "The Conversation"

8 Stuart_Armstrong 14 August 2014 11:12AM

A slightly edited version of "AI risk - executive summary" has been published in "The Conversation", titled "Your essential guide to the rise of the intelligent machines":

The risks posed to human beings by artificial intelligence in no way resemble the popular image of the Terminator. That fictional mechanical monster is distinguished by many features – strength, armour, implacability, indestructability – but Arnie’s character lacks the one characteristic that we in the real world actually need to worry about – extreme intelligence.

Thanks again to those who helped forge the original article. You can use this link, or the Less Wrong one, depending on the audience.

Request for concrete AI takeover mechanisms

18 KatjaGrace 28 April 2014 01:04AM

Any scenario where advanced AI takes over the world requires some mechanism for an AI to leverage its position as ethereal resident of a computer somewhere into command over a lot of physical resources.

One classic story of how this could happen, from Eliezer:

  1. Crack the protein folding problem, to the extent of being able to generate DNA strings whose folded peptide sequences fill specific functional roles in a complex chemical interaction.
  2. Email sets of DNA strings to one or more online laboratories which offer DNA synthesis, peptide sequencing, and FedEx delivery. (Many labs currently offer this service, and some boast of 72-hour turnaround times.)
  3. Find at least one human connected to the Internet who can be paid, blackmailed, or fooled by the right background story, into receiving FedExed vials and mixing them in a specified environment.
  4. The synthesized proteins form a very primitive “wet” nanosystem which, ribosomelike, is capable of accepting external instructions; perhaps patterned acoustic vibrations delivered by a speaker attached to the beaker.
  5. Use the extremely primitive nanosystem to build more sophisticated systems, which construct still more sophisticated systems, bootstrapping to molecular nanotechnology—or beyond.

You can do a lot of reasoning about AI takeover without any particular picture of how the world gets taken over. Nonetheless it would be nice to have an understanding of these possible routes, both for preparation purposes and because concrete, plausible pictures of doom are probably more motivating grounds for concern than abstract arguments.

So MIRI is interested in making a better list of possible concrete routes to AI taking over the world. And for this, we ask your assistance.

What are some other concrete AI takeover mechanisms? If an AI did not have a solution to the protein folding problem, and a DNA synthesis lab to write off to, what else might it do? 

We would like suggestions that take an AI from being on an internet-connected computer to controlling substantial physical resources, or having substantial manufacturing ability.

We would especially like suggestions which are plausible given technology that normal scientists would expect in the next 15 years. So limited involvement of advanced nanotechnology and quantum computers would be appreciated. 

We welcome partial suggestions, e.g. 'you can take control of a self-driving car from the internet - probably that could be useful in some schemes'. 

Thank you!

AI ebook cover design brainstorming

3 lukeprog 26 September 2013 11:49PM

Thanks to everyone who brainstormed possible titles for MIRI’s upcoming ebook on machine intelligence. Our leading contender for the book title is Smarter than Us: The Rise of Machine Intelligence.

What we need now are suggestions for a book cover design. AI is hard to depict without falling back on cliches, such as a brain image mixed with computer circuitry, a humanoid robot, HAL, an imitation of Creation of Adam with human and robot fingers touching, or an imitation of March of Progress with an AI at the far right.

A few ideas/examples:

  1. Something that conveys ‘AI’ in the middle (a computer screen? a server tower?) connected by arrow/wires/something to various ‘skills/actions/influences’, like giving a speech, flying unmanned spacecraft, doing science, predicting the stock market, etc., in an attempt to convey the diverse superpowers of a machine intelligence.

  2. A more minimalist text-only cover.

  3. A fairly minimal cover with just an ominous-looking server rack in the middle, with a few blinking lights and submerged in darkness around it. A bit like this cover.

  4. Similar to the above, except a server farm along the bottom fading into the background, with a frame composition similar to this.

  5. A darkened, machine-gunned room with a laptop sitting alone on a desk, displaying the text of the title on the screen. (This is the scene from the first chapter, about a Terminator who encounters an unthreatening-looking laptop which ends up being way more powerful and dangerous than the Terminator because it is more intelligent.)

Alex Vermeer sketched the first four of these ideas:

Some general inspiration may be found here.

We think we want something kinda dramatic, rather than cartoony, but less epic and unbelievable than the Facing the Intelligence Explosion cover.


Help us name a short primer on AI risk!

7 lukeprog 17 September 2013 08:35PM

MIRI will soon publish a short book by Stuart Armstrong on the topic of AI risk. The book is currently titled “AI-Risk Primer” by default, but we’re looking for something a little more catchy (just as we did for the upcoming Sequences ebook).

The book is meant to be accessible and avoids technical jargon. Here is the table of contents and a few snippets from the book, to give you an idea of the content and style:

  1. Terminator versus the AI
  2. Strength versus Intelligence
  3. What Is Intelligence? Can We Achieve It Artificially?
  4. How Powerful Could AIs Become?
  5. Talking to an Alien Mind
  6. Our Values Are Complex and Fragile
  7. What, Precisely, Do We Really (Really) Want?
  8. We Need to Get It All Exactly Right
  9. Listen to the Sound of Absent Experts
  10. A Summary
  11. That’s Where You Come In …

The Terminator is a creature from our primordial nightmares: tall, strong, aggressive, and nearly indestructible. We’re strongly primed to fear such a being—it resembles the lions, tigers, and bears that our ancestors so feared when they wandered alone on the savanna and tundra.

As a species, we humans haven’t achieved success through our natural armor plating, our claws, our razor-sharp teeth, or our poison-filled stingers. Though we have reasonably efficient bodies, it’s our brains that have made the difference. It’s through our social, cultural, and technological intelligence that we have raised ourselves to our current position.

Consider what would happen if an AI ever achieved the ability to function socially—to hold conversations with a reasonable facsimile of human fluency. For humans to increase their social skills, they need to go through painful trial and error processes, scrounge hints from more articulate individuals or from television, or try to hone their instincts by having dozens of conversations. An AI could go through a similar process, undeterred by social embarrassment, and with perfect memory. But it could also sift through vast databases of previous human conversations, analyze thousands of publications on human psychology, anticipate where conversations are leading many steps in advance, and always pick the right tone and pace to respond with. Imagine a human who, every time they opened their mouth, had spent a solid year to ponder and research whether their response was going to be maximally effective. That is what a social AI would be like.

So, title suggestions?

Transparency in safety-critical systems

4 lukeprog 25 August 2013 06:52PM

I've just posted an analysis to MIRI's blog called Transparency in Safety-Critical Systems. Its aim is to explain a common view about transparency and system reliability, and then open a dialogue about which parts of that view are wrong, or don't apply well to AGI.

The "common view" (not universal by any means) explained in the post is, roughly:

Black box testing can provide some confidence that a system will behave as intended, but if a system is built such that it is transparent to human inspection, then additional methods of reliability verification are available. Unfortunately, many of AI’s most useful methods are among its least transparent. Logic-based systems are typically more transparent than statistical methods, but statistical methods are more widely used. There are exceptions to this general rule, and some people are working to make statistical methods more transparent.

Three caveats / open problems listed at the end of the post are:

  1. How does the transparency of a method change with scale? A 200-rules logical AI might be more transparent than a 200-node Bayes net, but what if we’re comparing 100,000 rules vs. 100,000 nodes? At least we can query the Bayes net to ask “what it believes about X,” whereas we can’t necessarily do so with the logic-based system.
  2. Do the categories above really “carve reality at its joints” with respect to transparency? Does a system’s status as a logic-based system or a Bayes net reliably predict its transparency, given that in principle we can use either one to express a probabilistic model of the world?
  3. How much of a system’s transparency is “intrinsic” to the system, and how much of it depends on the quality of the user interface used to inspect it? How much of a “transparency boost” can different kinds of systems get from excellently designed user interfaces?

The MIRI blog has only recently begun to regularly host substantive, non-news content, so it doesn't get much commenting action yet. Thus, I figured I'd post here and try to start a dialogue. Comment away!

Responses to Catastrophic AGI Risk: A Survey

11 lukeprog 08 July 2013 02:33PM

A great many Less Wrongers gave feedback on earlier drafts of "Responses to Catastrophic AGI Risk: A Survey," which has now been released. This is the preferred discussion page for the paper.

The report, co-authored by past MIRI researcher Kaj Sotala and University of Louisville’s Roman Yampolskiy, is a summary of the extant literature (250+ references) on AGI risk, and can serve either as a guide for researchers or as an introduction for the uninitiated.

Here is the abstract:

Many researchers have argued that humanity will create artificial general intelligence (AGI) within the next twenty to one hundred years. It has been suggested that AGI may pose a catastrophic risk to humanity. After summarizing the arguments for why AGI may pose such a risk, we survey the field’s proposed responses to AGI risk. We consider societal proposals, proposals for external constraints on AGI behaviors, and proposals for creating AGIs that are safe due to their internal design.

Elites and AI: Stated Opinions

10 lukeprog 15 June 2013 07:52PM

Previously, I asked "Will the world's elites navigate the creation of AI just fine?" My current answer is "probably not," but I think it's a question worth additional investigation.

As a preliminary step, and with the help of MIRI interns Jeremy Miller and Oriane Gaillard, I've collected a few stated opinions on the issue. This survey of stated opinions is not representative of any particular group, and is not meant to provide strong evidence about what is true on the matter. It's merely a collection of quotes we happened to find on the subject. Hopefully others can point us to other stated opinions — or state their own opinions.


Will the world's elites navigate the creation of AI just fine?

20 lukeprog 31 May 2013 06:49PM

One open question in AI risk strategy is: Can we trust the world's elite decision-makers (hereafter "elites") to navigate the creation of human-level AI (and beyond) just fine, without the kinds of special efforts that e.g. Bostrom and Yudkowsky think are needed?

Some reasons for concern include:

  • Otherwise smart people say unreasonable things about AI safety.
  • Many people who believed AI was around the corner didn't take safety very seriously.
  • Elites have failed to navigate many important issues wisely (2008 financial crisis, climate change, Iraq War, etc.), for a variety of reasons.
  • AI may arrive rather suddenly, leaving little time for preparation.

But if you were trying to argue for hope, you might argue along these lines (presented for the sake of argument; I don't actually endorse this argument):

  • If AI is preceded by visible signals, elites are likely to take safety measures. Effective measures were taken to address asteroid risk. Large resources are devoted to mitigating climate change risks. Personal and tribal selfishness align with AI risk-reduction in a way they may not align on climate change. Availability of information is increasing over time.
  • AI is likely to be preceded by visible signals. Conceptual insights often take years of incremental tweaking. In vision, speech, games, compression, robotics, and other fields, performance curves are mostly smooth. "Human-level performance at X" benchmarks influence perceptions and should be more exhaustive and come more rapidly as AI approaches. Recursive self-improvement capabilities could be charted, and are likely to be AI-complete. If AI succeeds, it will likely succeed for reasons comprehensible by the AI researchers of the time.
  • Therefore, safety measures will likely be taken.
  • If safety measures are taken, then elites will navigate the creation of AI just fine. Corporate and government leaders can use simple heuristics (e.g. Nobel prizes) to access the upper end of expert opinion. AI designs with an easily tailored tendency to act may be the easiest to build. The use of early AIs to solve AI safety problems creates an attractor for "safe, powerful AI." Arms races are not insurmountable.

The basic structure of this 'argument for hope' is due to Carl Shulman, though he doesn't necessarily endorse the details. (Also, it's just a rough argument, and as stated is not deductively valid.)

Personally, I am not very comforted by this argument because:

  • Elites often fail to take effective action despite plenty of warning.
  • I think there's a >10% chance AI will not be preceded by visible signals.
  • I think the elites' safety measures will likely be insufficient.

Obviously, there's a lot more for me to spell out here, and some of it may be unclear. The reason I'm posting these thoughts in such a rough state is so that MIRI can get some help on our research into this question.

In particular, I'd like to know:

  • Which historical events are analogous to AI risk in some important ways? Possibilities include: nuclear weapons, climate change, recombinant DNA, nanotechnology, chlorofluorocarbons, asteroids, cyberterrorism, Spanish flu, the 2008 financial crisis, and large wars.
  • What are some good resources (e.g. books) for investigating the relevance of these analogies to AI risk (for the purposes of illuminating elites' likely response to AI risk)?
  • What are some good studies on elites' decision-making abilities in general?
  • Has the increasing availability of information in the past century noticeably improved elite decision-making?

AI risk-related improvements to the LW wiki

38 Kaj_Sotala 07 November 2012 09:24AM

Back in May, Luke suggested the creation of a scholarly AI risk wiki, which was to include a large set of summary articles on topics related to AI risk, mapped out in terms of how they related to the central debates about AI risk. In response, Wei Dai suggested that among other things, the existing Less Wrong wiki could be improved instead. As a result, the Singularity Institute has massively improved the LW wiki, in preparation for a more ambitious scholarly AI risk wiki. The outcome was the creation or dramatic expansion of the following articles:

In managing the project, I focused on content over presentation, so a number of articles still have minor issues, such as grammar and style that could be improved. It's our hope that, with the largest part of the work already done, the LW community will help improve the articles even further.

Thanks to everyone who worked on these pages: Alex Altair, Adam Bales, Caleb Bell, Costanza Riccioli, Daniel Trenor, João Lourenço, Joshua Fox, Patrick Rhodes, Pedro Chaves, Stuart Armstrong, and Steven Kaas.

Desired articles on AI risk?

13 lukeprog 02 November 2012 05:39AM

I've once again updated my list of forthcoming and desired articles on AI risk, which currently names 17 forthcoming articles and books about AGI risk, and also names 26 desired articles that I wish researchers were currently writing.

But I'd like to hear your suggestions, too. Which articles not already on the list as "forthcoming" or "desired" would you most like to see written, on the subject of AGI risk?

Book/article titles reproduced below for convenience...

continue reading »

Computation Hazards

15 Alex_Altair 13 June 2012 09:49PM
This is a summary of material from various posts and discussions. My thanks to Eliezer Yudkowsky, Daniel Dewey, Paul Christiano, Nick Beckstead, and several others.

Several ideas have been floating around LessWrong that can be organized under one concept, relating to a subset of AI safety problems. I’d like to gather these ideas in one place so they can be discussed as a unified concept. To give a definition:

A computation hazard is a large negative consequence that may arise merely from vast amounts of computation, such as in a future supercomputer.

For example, suppose a computer program needs to model people very accurately to make some predictions, and it models those people so accurately that the "simulated" people can experience conscious suffering. In a very large computation of this type, millions of people could be created, suffer for some time, and then be destroyed when they are no longer needed for making the predictions desired by the program. This idea was first mentioned by Eliezer Yudkowsky in Nonperson Predicates.

There are other hazards that may arise in the course of running large-scale computations. In general, we might say that:

Large amounts of computation will likely consist in running many diverse algorithms. Many algorithms are computation hazards. Therefore, all else equal, the larger the computation, the more likely it is to produce a computation hazard.
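The scaling claim above can be illustrated with a toy calculation. As a rough sketch, assume (this assumption is not part of the original argument) that each algorithm run is a hazard independently with some small probability p. The chance that a computation running n algorithms contains at least one hazard is then 1 - (1 - p)^n, which climbs toward 1 as n grows. The function name and the example numbers are hypothetical.

```python
import math


def prob_at_least_one_hazard(p: float, n: int) -> float:
    """Chance that n independent algorithm runs include at least one hazard.

    Assumption (illustrative only): every algorithm is a hazard
    independently with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    if p == 1.0:
        # log1p(-1) is undefined, so handle the certain-hazard case directly.
        return 1.0 if n > 0 else 0.0
    # 1 - (1 - p)^n, computed with log1p/expm1 so tiny p values stay accurate.
    return -math.expm1(n * math.log1p(-p))


# Hypothetical numbers: a one-in-a-billion hazard rate at growing scales.
for n in (10**3, 10**6, 10**9, 10**12):
    print(n, prob_at_least_one_hazard(1e-9, n))
```

Under this model even a very rare hazard becomes nearly certain once n is much larger than 1/p, which is the sense in which size alone raises the risk.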

Of course, most algorithms may be morally neutral. Furthermore, algorithms must be somewhat complex before they could possibly be a hazard. For instance, it is intuitively clear that no eight-bit program could possibly be a computation hazard on a normal computer. Worrying computations therefore fall into two categories: computations that run most algorithms, and computations that are particularly likely to run algorithms that are computation hazards.

An example of a computation that runs most algorithms is a mathematical formalism called Solomonoff induction. First published in 1964, it is an attempt to formalize the scientific process of induction using the theory of Turing machines. It is a brute-force method that finds hypotheses to explain data by testing all possible hypotheses. Many of these hypotheses may be algorithms that describe the functioning of people. At a sufficient precision, these algorithms themselves may experience consciousness and suffering. Taken literally, Solomonoff induction runs all algorithms; therefore it produces all possible computation hazards. If we are to avoid computation hazards, any implemented approximations of Solomonoff induction will need to determine ahead of time which algorithms are computation hazards.
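To give a feel for what "testing all possible hypotheses" means, here is a deliberately tiny sketch. It is not Solomonoff induction, which ranges over all Turing machine programs and is uncomputable. As a stand-in, it assumes a hypothetical hypothesis class in which each hypothesis is a bit pattern repeated forever, enumerates those patterns from shortest to longest, discards the ones that contradict the data, and weights the survivors by 2^-length so that shorter explanations count for more. All names are illustrative.

```python
from itertools import count, islice, product
from typing import Dict, Iterator


def all_bitstrings() -> Iterator[str]:
    """Yield every non-empty bit string, shortest first: 0, 1, 00, 01, ..."""
    for length in count(1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def explains(pattern: str, data: str) -> bool:
    """True if repeating `pattern` forever reproduces `data` as a prefix."""
    return all(bit == pattern[i % len(pattern)] for i, bit in enumerate(data))


def next_bit_weights(data: str, max_hypotheses: int = 1000) -> Dict[str, float]:
    """Brute-force prediction: total weight behind each possible next bit."""
    weights = {"0": 0.0, "1": 0.0}
    for pattern in islice(all_bitstrings(), max_hypotheses):
        if explains(pattern, data):
            predicted = pattern[len(data) % len(pattern)]
            weights[predicted] += 2.0 ** -len(pattern)  # shorter = heavier
    return weights


print(next_bit_weights("010101"))
```

The point for the hazard argument is the loop itself: the enumeration does not know in advance what any hypothesis does, so with a richer hypothesis class it would run whatever algorithms happen to appear in the list.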

Computations that run most algorithms could also hide in other places. Imagine a supercomputer’s power is being tested on a simple game, like chess or Go. The testing program simply tries all possible strategies, according to some enumeration. The best strategy that the supercomputer finds would be a measure of how many computations it could perform, compared to other computers that ran the same program. If the rules of the game are complex enough to be Turing complete (a surprisingly easy achievement) then this game-playing program would eventually simulate all algorithms, including ones with moral status.

Of course, running most algorithms is quite infeasible simply because of the vast number of possible algorithms. Depending on the fraction of algorithms that are computation hazards, it may be enough that a computation run an enormous number which act as a random sample of all algorithms. Computations of this type might include evolutionary programs, which are blind to the types of algorithms they run until the results are evaluated for fitness. Or they may be Monte Carlo approximations of massive computations.

But if computation hazards are relatively rare, then it will still be unlikely for large-scale computations to stumble across them unguided. Several computations may fall into the second category of computations that are particularly likely to run algorithms that are computation hazards. Here we focus on three types of computations in particular: agents, predictors and oracles. The last two types are especially important because they are often considered safer types of AI than agent-based AI architectures. First I will stipulate definitions for these three types of computations, and then I will discuss the types of computation hazards they may produce.


An agent is a computation which decides between possible actions based on the consequences of those actions. They can be thought of as “steering” the future towards some target, or as selecting a future from the set of possible futures. Therefore they can also be thought of as having a goal, or as maximizing a utility function.

Sufficiently capable agents are extremely powerful because they constitute a feedback loop. As is well known from physics, feedback loops often change their surroundings incredibly quickly and dramatically. Examples include the growth of biological populations and nuclear reactions. Feedback loops are dangerous if their target is undesirable. Agents will be feedback loops as soon as they are able to improve their ability to improve their ability to move towards their goal. For example, humans can improve their ability to move towards their goal by using their intelligence to make decisions. A student aiming to create cures can use her intelligence to learn chemistry, thereby improving her ability to decide what to study next. But presently, humans cannot improve their intelligence, which would improve their ability to improve their ability to make decisions. The student cannot yet learn how to modify her brain so that she learns subjects more quickly.


A predictor is a computation which takes data as input, and predicts what data will come next. An example would be certain types of trained neural networks, or any approximation of Solomonoff induction. Intuitively, this feels safer than an agent AI because predictors do not seem to have goals or take actions; they just report predictions as requested by humans.


An oracle is a computation which takes questions as input, and returns answers. They are broader than predictors in that one could ask an oracle about predictions. Similar to a predictor, oracles do not seem to have goals or take actions. (Some material summarized here.)

Examples of hazards

Agent-like computations are the most clearly dangerous computation hazards. If any large computation starts running the beginning of a self-improving agent computation, it is difficult to say how far the agent may safely be run before it is a computation hazard. As soon as the agent is sufficiently intelligent, it will attempt to acquire more resources like computing substrate and energy. It may also attempt to free itself from control of the parent computation.

Another major concern is that, because people are an important part of the surroundings, even non-agent predictors or oracles will simulate people in order to make predictions or give answers respectively. Someone could ask a predictor, “What will this engineer do if we give him a contract?” It may be that the easiest way for the predictor to determine the answer is to simulate the internal workings of the given engineer's mind. If these simulations are sufficiently precise, then they will be people in and of themselves. The simulations could cause those people to suffer, and will likely kill them by ending the simulation when the prediction or answer is given.

Similarly, one can imagine that a predictor or oracle might simulate powerful agents; that is, algorithms which efficiently maximize some utility function. Agents may be simulated because many agent-like entities exist in the real world, and their behavior would need to be modeled. Or, perhaps oracles would investigate agents for the purpose of answering questions better. These agents, while being simulated, may have goals that require acting independently of the oracle. These agents may also be more powerful than the oracles, especially since the oracles were not designed with self-improvement behavior in mind. Therefore these agents may attempt to “unbox” themselves from the simulation and begin controlling the rest of the universe. For instance, the agents may use previous questions given to the oracle to deduce the nature of the universe and the psychology of the oracle-creators. (For a fictional example, see That Alien Message.) Or, the agent might somehow distort the output of the predictor or oracle, so that what it reports causes us to unbox the agent.

Predictors also have the problem of self-fulfilling prophecies (first suggested here). An arbitrarily accurate predictor will know that its prediction will affect the future. Therefore, to be a correct prediction, it must make sure that delivering its prediction doesn’t cause the receiver to act in a way that negates the prediction. Therefore, the predictor may have to choose between predictions which cause the receiver to act in a way that fulfills the prediction. This is a type of control over the user. Since the predictor is super-intelligent, any control may rapidly optimize the universe towards some unknown goal.
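The self-fulfilling prophecy problem can be pictured as a search for fixed points. In the sketch below, which rests on a made-up reaction model, `outcome_given` stands for how the world turns out once a given prediction has been announced. A prediction is self-consistent only if announcing it leads to the outcome it names. When more than one candidate passes this test, the predictor's choice among them decides what actually happens. The function names and the toy market are hypothetical.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def self_consistent_predictions(
    candidates: Iterable[T], outcome_given: Callable[[T], T]
) -> List[T]:
    """Keep the predictions that still come true after being announced."""
    return [p for p in candidates if outcome_given(p) == p]


def market_reaction(announced: str) -> str:
    """Toy reaction model (an assumption): traders act on dramatic forecasts."""
    if announced == "crash":
        return "crash"  # everyone sells, so the crash happens
    if announced == "boom":
        return "boom"   # everyone buys, so the boom happens
    return "boom"       # a "flat" forecast is ignored and the market rises


fixed_points = self_consistent_predictions(
    ["crash", "boom", "flat"], market_reaction
)
print(fixed_points)  # ['crash', 'boom']
```

Here "flat" can never be a correct announcement, while "crash" and "boom" both can, so an accuracy-seeking predictor is left choosing between two futures.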

Overall, there is a large worry that sufficiently intelligent oracles or predictors may become agents. Besides the above possibilities, some are worried that intelligence is inherently an optimization process, and therefore oracles and predictors are inherently satisfying some utility function. This, combined with the fact that nothing can be causally isolated from the rest of the universe, seems to invite an eventual AI-takeoff.

Methods for avoiding computational hazards

It is often thought that, while no proposal has yet been shown safe from computational hazards, oracles and predictors are safer than deliberately agent-based AGI. Other methods have been proposed to make these even safer. Armstrong et al. describe many AI safety measures in general. Below we review some possible techniques for avoiding computational hazards specifically.

One obvious safety practice is to limit the complexity or size of computations. In general, this will also limit the algorithm below general intelligence, but it is a good step while progressing towards FAI. Indeed, it is clear that all current prediction or AI systems are too simple either to be general intelligences or to pose a computational hazard.

A proposal for regulating complex oracles or predictors is to develop safety indicators. That is, develop some function that will evaluate the proposed algorithm or model, and return whether it is potentially dangerous. For instance, one could write a simple program that rejects running an algorithm if any part of it is isomorphic to the human genome (since DNA clearly creates general intelligence and people under the right circumstances). Or, to measure the impact of an action suggested by an oracle, one could ask how many humans would be alive one year after the action was taken.

Ideally, one would run an algorithm only if one were sure it was not a person. A function that could evaluate an algorithm and return 0 only if it is not a person is called a nonperson predicate. Some algorithms are obviously not people. For example, squaring the numbers from 1 to 100 will not simulate people. Any algorithm whose behavior is periodic with a short period is unlikely to be a person, as is nearly any presently constructed software. But in general this seems extremely difficult to verify. It could be that writing nonperson predicates or other safety indicators is FAI-complete in the sense that if we solve them, we will have discovered friendliness theory. Furthermore, it may be that some attempts to evaluate whether an algorithm is a person actually cause a simulation of a person, by running parts of the algorithm, by modeling a person for comparison, or by other means. Similarly, it may be that attempts to investigate the friendliness of a particular agent cause that agent to unbox itself.
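The one-sided nature of a nonperson predicate can be shown with a small sketch built on the short-period heuristic mentioned above. It assumes the algorithm is given as a deterministic step function over hashable states, and it takes the heuristic on trust rather than establishing it. The predicate returns 0 only when it observes the state sequence entering a cycle no longer than `max_period`; in every other case it returns 1, meaning "cannot rule out", not "is a person". All names and thresholds are hypothetical. Note that this check works by running the algorithm for a bounded number of steps, which is exactly the kind of evaluation the text warns could itself be hazardous for nontrivial inputs.

```python
from typing import Callable, Dict, Hashable


def nonperson_predicate(
    step: Callable[[Hashable], Hashable],
    start: Hashable,
    max_period: int = 16,
    max_steps: int = 1000,
) -> int:
    """Return 0 only if the run provably settles into a short cycle.

    Assumption (from the heuristic in the text): behavior that repeats
    with a short period is not a person. Returning 1 means "unknown".
    """
    first_seen: Dict[Hashable, int] = {}  # state -> step index of first visit
    state = start
    for i in range(max_steps):
        if state in first_seen:
            # Deterministic dynamics: a repeated state fixes the cycle length.
            period = i - first_seen[state]
            return 0 if period <= max_period else 1
        first_seen[state] = i
        state = step(state)
    return 1  # no repeat within the budget, so refuse to certify


print(nonperson_predicate(lambda s: (s + 1) % 4, 0))  # 0: cycle of length 4
print(nonperson_predicate(lambda s: s + 1, 0))        # 1: never repeats
```

The asymmetry is the design choice that matters: false alarms only cost unused computation, while a wrongly returned 0 would defeat the purpose of the predicate.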

Predictors seem to be one of the most goal-agnostic forms of AGI. This makes them a very attractive model in which to perfect safety. Some ideas for avoiding self-fulfilling predictions suggest that we ask the predictor to tell us what it would have predicted if we hadn’t asked (first suggested here). This frees the predictor from requiring itself to make predictions consistent with our behavior. Whether this will work depends on the exact process of the predictor; it may be so accurate that it cannot deal with counterfactuals, and will simply report that it would have predicted that we would have asked anyway. It is also problematic that the prediction is now inaccurate; because it has told us, we will act, possibly voiding any part of the prediction.

A very plausible but non-formal solution is to aim for a soft takeoff. For example, we could build a predictor that is not generally intelligent, and use it to investigate safe ways to advance the situation. Perhaps we could use a sub-general intelligence to safely improve our own intelligence.

Have I missed any major examples in this post? Does “computation hazards” seem like a valid concept as distinct from other types of AI-risks?


Armstrong S., Sandberg A., Bostrom N. (2012). “Thinking inside the box: using and controlling an Oracle AI”. Minds and Machines, forthcoming.

Solomonoff R. (1964). “A Formal Theory of Inductive Inference, Part I”. Information and Control, 7(1), pp. 1-22.

Solomonoff R. (1964). “A Formal Theory of Inductive Inference, Part II”. Information and Control, 7(2), pp. 224-254.

Building toward a Friendly AI team

24 lukeprog 06 June 2012 06:57PM

Series: How to Purchase AI Risk Reduction

A key part of SI's strategy for AI risk reduction is to build toward hosting a Friendly AI development team at the Singularity Institute.

I don't take it to be obvious that an SI-hosted FAI team is the correct path toward the endgame of humanity "winning." That is a matter for much strategic research and debate.

Either way, I think that building toward an FAI team is good for AI risk reduction, even if we decide (later) that an SI-hosted FAI team is not the best thing to do. Why is this so?

Building toward an SI-hosted FAI team means:

  1. Growing SI into a tighter, larger, and more effective organization in general.
  2. Attracting and creating people who are trustworthy, altruistic, hard-working, highly capable, extremely intelligent, and deeply concerned about AI risk. (We'll call these people "superhero mathematicians.")

Both (1) and (2) are useful for AI risk reduction even if an SI-hosted FAI team turns out not to be the best strategy.

This is because: Achieving part (1) would make SI more effective at whatever it is doing to reduce AI risk, and achieving part (2) would bring great human resources to the cause of AI risk reduction, which will be useful to a wide range of purposes (FAI team or otherwise).

So, how do we accomplish both these things?


Growing SI into a better organization

Like many (most?) non-profits with less than $1m/yr in funding, SI has had difficulty attracting the top-level executive talent often required to build a highly efficient and effective organization. Luckily, we have made rapid progress on this front in the past 9 months. For example, we now have (1) a comprehensive donor database, (2) a strategic plan, (3) a team of remote contractors used to more efficiently complete large and varied projects requiring many different skillsets, (4) an increasingly "best practices" implementation of central management, (5) an office we actually use to work together on projects, and many other improvements.

What else can SI do to become a tighter, larger, and more effective organization?

  1. Hire a professional bookkeeper, implement additional bookkeeping and accounting best practices. (Currently underway.)
  2. Create a more navigable and up-to-date website. (Currently underway.)
  3. Improve our fundraising strategy, e.g. by creating a deck of slides for major donors which explains what we're doing and what we can do with more funding. (Currently underway.)
  4. Create standard policy documents that lower our risk of being distracted by an IRS audit. (Currently underway.)
  5. Shift the Singularity Summit toward being more directly useful for AI risk reduction, and also toward greater profitability—so that we have at least one funding source that is not donations. (Currently underway.)
  6. Spin off the Center for Applied Rationality so that SI is more solely focused on AI safety. (Currently underway.)
  7. Build a fundraising/investment-focused Board of Trustees (à la IAS or SU) in addition to our Board of Directors and Board of Advisors.
  8. Create an endowment to ensure ongoing funding for core researchers.
  9. Consult with the most relevant university department heads and experienced principal investigators (e.g. at IAS and Santa Fe) about how to start and run an effective team for advanced technical research.
  10. Do the things recommended by these experts (that are relevant to SI's mission).

The key point, of course, is that all these things cost money. They may be "boring," but they are incredibly important.


Attracting and creating superhero mathematicians

The kind of people we'd need for an FAI team are:

  1. Highly intelligent, and especially skilled in maths, probably at the IMO medal-winning level. (FAI team members will need to create lots of new math during the course of the FAI research initiative.)
  2. Trustworthy. (Most FAI work is not "Friendliness theory" but instead AI architectures work that could be made more dangerous if released to a wider community that is less concerned with AI safety.)
  3. Altruistic. (Since the fate of humanity may be in their hands, they need to be robustly altruistic.)
  4. Hard-working, determined. (FAI is a very difficult research problem and will require lots of hard work and also an attitude of "shut up and do the impossible.")
  5. Deeply committed to AI risk reduction. (It would be risky to have people who could be pulled off the team—with all their potentially dangerous knowledge—by offers from hedge funds or Google.)
  6. Unusually rational. (To avoid philosophical confusions, to promote general effectiveness and group cohesion, and more.)

There are other criteria, too, but those are some of the biggest.

We can attract some of the people meeting these criteria by using the methods described in Reaching young math/compsci talent. The trouble is that the number of people on Earth who qualify may be very close to 0 (especially given the "committed to AI risk reduction" criterion).

Thus, we'll need to create some superhero mathematicians.

Math ability seems to be even more "fixed" than the other criteria, so a (very rough) strategy for creating superhero mathematicians might look like this:

  1. Find people with the required level of math ability.
  2. Train them on AI risk and rationality.
  3. Focus on the few who become deeply committed to AI risk reduction and rationality.
  4. Select from among those people the ones who are most altruistic, trustworthy, hard-working, and determined. (Some training may be possible for these features, too.)
  5. Try them out for 3 months and select the best few candidates for the FAI team.

All these steps, too, cost money.


Strategic research on AI risk

7 lukeprog 06 June 2012 05:02PM

Series: How to Purchase AI Risk Reduction

Norman Rasmussen's analysis of the safety of nuclear power plants, written before any nuclear accidents had occurred, correctly predicted several details of the Three Mile Island incident in ways that previous experts had not (see McGrayne 2011, p. 180). Had Rasmussen's analysis been heeded, the Three Mile Island incident might not have occurred.

This is the kind of strategic analysis, risk analysis, and technological forecasting that could help us to pivot the world in important ways.

Our AI risk situation is very complicated. There are many uncertainties about the future, and many interacting strategic variables. Though it is often hard to see whether a strategic analysis will pay off, the alternative is to act blindly.

Here are some examples of strategic research that may help (or have already helped) to inform our attempts to shape the future:

  • FHI's Whole Brain Emulation roadmap and SI's WBE discussion at the Summit 2011 workshop.
  • Nick Bostrom's forthcoming book on machine superintelligence.
  • Global Catastrophic Risks, which locates AI risk in the context of other catastrophic risks.
  • A model of AI risk currently being developed in MATLAB by Anna Salamon and others.
  • A study of past researchers who abandoned certain kinds of research when they came to believe it might be dangerous, and what might have caused such action. (This project is underway at SI.)

Here are some additional projects of strategic research that could help inform x-risk decisions, if funding were available to perform them:

  • A study of opportunities for differential technological development, and how to actually achieve them.
  • A study of microeconomic models of WBEs and self-improving systems.
  • A study of which research topics should and should not be discussed in public for the purposes of x-risk prevention. (E.g. we may wish to keep AGI discoveries secret for the same reason we'd want to keep the DNA of a synthetically developed supervirus secret, but we may wish to publish research on safe AGI goals because they are safe for a broader community to work on. But it's often difficult to see whether a subject fits into one category or the other.)

I'll note that for as long as FHI is working on AI risk, FHI probably has an advantage over SI in producing actionable strategic research, given past successes like the WBE roadmap and the GCR volume. But SI is also performing actionable strategic research, as described above.

Raising safety-consciousness among AGI researchers

15 lukeprog 02 June 2012 09:39PM

Series: How to Purchase AI Risk Reduction

Another method for purchasing AI risk reduction is to raise the safety-consciousness of researchers doing work related to AGI.

The Singularity Institute is conducting a study of scientists who either (1) stopped researching some topic after realizing it might be dangerous, or (2) forked their career into advocacy, activism, ethics, etc. because they became concerned about the potential negative consequences of their work. From this historical inquiry we hope to learn some things about what causes scientists to become so concerned about the consequences of their work that they take action. Some of the examples we've found so far: Michael Michaud (resigned from SETI in part due to worries about the safety of trying to contact ET), Joseph Rotblat (resigned from the Manhattan Project before the end of the war due to concerns about the destructive impact of nuclear weapons), and Paul Berg (became part of a self-imposed moratorium on recombinant DNA back when it was still unknown how dangerous this new technology could be).

What else can be done?

Naturally, these efforts should be directed toward researchers who are both highly competent and whose work is very relevant to development toward AGI: researchers like Josh Tenenbaum, Shane Legg, and Henry Markram.

Reaching young math/compsci talent

6 lukeprog 02 June 2012 09:07PM

Series: How to Purchase AI Risk Reduction

Here is yet another way to purchase AI risk reduction...

Much of the work needed for Friendly AI and improved algorithmic decision theories requires researchers to invent new math. That's why the Singularity Institute's recruiting efforts have been aimed at talent in math and computer science. Specifically, we're looking for young talent in math and compsci, because young talent is (1) more open to considering radical ideas like AI risk, (2) not yet entrenched in careers and status games, and (3) better at inventing new math (due to cognitive decline with age).

So how can the Singularity Institute reach out to young math/compsci talent? Perhaps surprisingly, Harry Potter and the Methods of Rationality is one of the best tools we have for this. It is read by a surprisingly large proportion of people in math and CS departments. Here are some other projects we have in the works:

  • Run SPARC, a summer program on rationality for high school students with exceptional math ability. Cost: roughly $30,000. (There won't be classes on x-risk at SPARC, but it will attract young talent toward efficient altruism in general.)
  • Print copies of the first few chapters of HPMoR cheaply in Taiwan, ship them here, distribute them to leading math and compsci departments. Cost estimate in progress.
  • Send copies of Global Catastrophic Risks to lists of bright young students. Cost estimate in progress.

Here are some things we could be doing if we had sufficient funding:

  • Sponsor and be present at events where young math/compsci talent gathers, e.g. TopCoder High School and the International Math Olympiad. Cost estimate in progress.
  • Cultivate a network of x-risk reducers with high mathematical ability, build a database of conversations for them to have with strategically important young math/compsci talent, schedule those conversations and develop a pipeline so that interested prospects have a "next person" to talk to. Cost estimate in progress.
  • Write Open Problems in Friendly AI, send it to interested parties so that even those who don't think AI risk is important will at least see "Ooh, look at these sexy, interesting problems I could work on!"



How to Purchase AI Risk Reduction

15 lukeprog 01 June 2012 03:13AM

I'm writing a series of discussion posts on how to purchase AI risk reduction (through donations to the Singularity Institute, anyway; other x-risk organizations will have to speak for themselves about their plans).

Each post outlines a concrete proposal, with cost estimates:

(For a quick primer on AI risk, see Facing the Singularity.)

Building the AI Risk Research Community

18 lukeprog 01 June 2012 02:13AM

Series: How to Purchase AI Risk Reduction

Yet another way to purchase reductions in AI risk may be to grow the AI risk research community.

The AI risk research community is pretty small. It currently consists of:

  • 4-ish AI risk researchers at the Singularity Institute. (Eliezer is helping to launch CFAR before he goes back to AI risk research. The AI risk research done at SI right now is: about 40% of Carl, 25% of me, plus large and small fractions of various remote researchers, most significantly about 90% of Kaj Sotala.)
  • 4-ish AI risk researchers at the Future of Humanity Institute: Nick Bostrom, Anders Sandberg, Stuart Armstrong, Vincent Mueller. (This number might be wrong. It seems that Nick and Stuart are working basically full-time on AI risk right now, but I'm not sure about Anders and Vincent. Also, FHI should be hiring someone shortly with the Tamas Research Fellowship money. Finally, note that FHI has a broader mission than AI risk, so while they are focusing on AI risk while Nick works on his Superintelligence book, they will probably return to other subjects sometime thereafter.)
  • 0.6-ish AI risk researchers at Leverage Research, maybe?
  • 0.2-ish AI risk researchers at GCRI, maybe?
  • Nobody yet at CSER, but maybe 1-2 people in the relatively near future?
  • Occasionally, something useful might come from mainstream machine ethics, but they mostly aren't focused on problems of machine superintelligence (yet).
  • Small fractions of some people in the broader AI risk community, e.g. Ben Goertzel, David Chalmers, and Wei Dai.

Obviously, a larger AI risk research community could be more productive. (It could also grow to include more people but fail to do actually useful work, like so many academic disciplines. But there are ways to push such a small field in useful directions as it grows.)

So, how would one grow the AI risk research community? Here are some methods:

  1. Make it easier for AI risk researchers to do their work, by providing a well-organized platform of work from which they can build. Nick's Superintelligence book will help with that. So would a scholarly AI risk wiki. So do sites like Existential-Risk.org, IntelligenceExplosion.com, Friendly-AI.com, Friendly AI Research, and SI's Singularity FAQ. So does my AI Risk Bibliography 2012, and so do "basics" or "survey" articles like Artificial Intelligence as a Positive and Negative Factor in Global Risk, The Singularity: A Philosophical Analysis, and Intelligence Explosion: Evidence and Import. So do helpful lists like journals that may publish articles related to AI risk. (If you don't think such things are useful, then you probably don't know what it's like to be a researcher trying to develop papers in the field. When I send my AI risk bibliography and my list of "journals that may publish articles related to AI risk" to AI risk researchers, I get back emails that say "thank you" with multiple exclamation points.)
  2. Run an annual conference for researchers, put out a call for papers, etc. This brings researchers together and creates a community. The AGI conference series did this for AGI. The new AGI Impacts sessions at AGI-12 could potentially be grown into an AGI Impacts conference series that would effectively be an AI Risk conference series.
  3. Maybe launch a journal for AI risk papers. Like a conference, this can to some degree bring the community closer. It can also provide a place to publish articles that don't fit within the scope of any other existing journals. I say "maybe" on this one because it can be costly to run a journal well, and there are plenty of journals already that will publish papers on AI risk.
  4. Give out grants for AI risk research.

Here's just one example of what SI is currently doing to help grow the AI risk research community.

Writing "Responses to Catastrophic AGI Risk": A journal-bound summary of the AI risk problem, and a taxonomy of the societal proposals (e.g. denial of the risk, no action, legal and economic controls, differential technological development) and AI design proposals (e.g. AI confinement, chaining, Oracle AI, FAI) that have been made.

Estimated final cost: $5,000 for Kaj's time, $500 for other remote research, 30 hours of Luke's time.

Now, here's a list of things SI could be doing to help grow the AI risk research community:

  • Creating a scholarly AI risk wiki. Estimated cost: 1,920 hours of SI staff time (over two years), $384,000 for remote researchers and writers, and $30,000 for wiki design, development, and hosting costs.
  • Helping to grow the AGI Impacts sessions at AGI-12 into an AGI Impacts conference. (No cost estimate yet.)
  • Writing Open Problems in Friendly AI. Estimated cost: 2 months of Eliezer's time, 250 hours of Luke's time, $40,000 for internal and external researchers.
  • Writing more "basics" and "survey" articles on AI risk topics.
  • Giving out grants for AI risk research.

Proposal for "Open Problems in Friendly AI"

26 lukeprog 01 June 2012 02:06AM

Series: How to Purchase AI Risk Reduction

One more project SI is considering...

When I was hired as an intern for SI in April 2011, one of my first proposals was that SI create a technical document called Open Problems in Friendly Artificial Intelligence. (Here is a preview of what the document would be like.)

When someone becomes persuaded that Friendly AI is important, their first question is often: "Okay, so what's the technical research agenda?"

So You Want to Save the World maps out some broad categories of research questions, but it doesn't explain what the technical research agenda is. In fact, SI hasn't yet explained much of the technical research agenda.

Much of the technical research agenda should be kept secret for the same reasons you might want to keep secret the DNA for a synthesized supervirus. But some of the Friendly AI technical research agenda is safe to explain so that a broad research community can contribute to it.

This research agenda includes:

  • Second-order logical version of Solomonoff induction.
  • Non-Cartesian version of Solomonoff induction.
  • Construing utility functions from psychologically realistic models of human decision processes.
  • Formalizations of value extrapolation. (Like Christiano's attempt.)
  • Microeconomic models of self-improving systems (e.g. takeoff speeds).
  • ...and several other open problems.

The goal would be to define the open problems as formally and precisely as possible. Some will be more formalizable than others, at this stage. (As a model for this kind of document, see Marcus Hutter's Open Problems in Universal Induction and Intelligence.)

Nobody knows the open problems in Friendly AI research better than Eliezer, so it would probably be best to approach the project this way:

  1. Eliezer spends a month writing an "Open Problems in Friendly AI" sequence for Less Wrong.
  2. Luke organizes a (fairly large) research team for presenting these open problems with greater clarity and thoroughness, in the mainstream academic form.
  3. These researchers collaborate for several months to put together the document, involving Eliezer when necessary.
  4. SI publishes the final document, possibly in a journal.

Estimated cost:

  • 2 months of Eliezer's time.
  • 150 hours of Luke's time.
  • $40,000 for contributed hours from staff researchers, remote researchers, and perhaps domain experts (as consultants) from mainstream academia.


Short Primers on Crucial Topics

24 lukeprog 31 May 2012 12:46AM

Series: How to Purchase AI Risk Reduction

Here's another way we might purchase existential risk reduction: the production of short primers on crucial topics.

Resources like The Sequences and NickBostrom.com have been incredibly effective at gathering and creating a community engaged in x-risk reduction (either through direct action or, perhaps more importantly, through donations), but most people who could make a difference probably won't take the time to read The Sequences or academic papers.

One solution? Short primers on crucial topics.

Facing the Singularity is one example. I'm waiting for some work from remote researchers before I write the last chapter, but once it's complete we'll produce a PDF version and a Kindle version. Already, several people (including Jaan Tallinn) use it as a standard introduction they send to AI risk newbies.

Similar documents (say, 10 pages in length) could be produced for topics like Existential Risk, AI Risk, Friendly AI, Optimal Philanthropy, and Rationality. These would be concise, fun to read, and emotionally engaging, while also being accurate and thoroughly hyperlinked/referenced to fuller explanations of each section and major idea (on LessWrong, in academic papers, etc.).

These could even be printed and left lying around wherever we think is most important: say, at the top math, computer science, and formal philosophy departments in the English-speaking world.

The major difficulty in executing such a project would be in finding good writers with the relevant knowledge. Eliezer, Yvain, and myself might qualify, but right now the three of us are otherwise occupied. The time investment of the primary author(s) could be minimized by outsourcing as much of the work as possible to SI's team of remote researchers, writers, and editors.

Estimated cost per primer:

  • 80 hours from primary author. (Well, if it's me. I've put about 60 hours into the writing of Facing the Singularity so far, which is of similar length to the proposed primers but I'm adding some padding to the estimate.)
  • $4,000 on remote research. (Tracking down statistics and references, etc.)
  • $1,000 on book design, Kindle version production, etc.

Translations to other languages could also be produced, for an estimated cost of $2,000 per translation (this includes checks and improvements by multiple translators).
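
As a rough check on the arithmetic, here is a minimal sketch that totals the per-primer estimates above for a whole series. The function name and the choice of five primers (one per topic suggested earlier: Existential Risk, AI Risk, Friendly AI, Optimal Philanthropy, Rationality) are assumptions made for illustration, not a committed plan, and author time is reported in hours rather than priced.

```python
# Per-primer estimates quoted in the list above.
AUTHOR_HOURS_PER_PRIMER = 80
REMOTE_RESEARCH_USD = 4_000
DESIGN_USD = 1_000
TRANSLATION_USD = 2_000


def primer_series_cost(num_primers, translations_per_primer=0):
    """Return total author hours and cash outlay for a series of primers.

    Hypothetical helper: assumes every primer costs the same and that
    each translation costs the flat estimate given above.
    """
    cash_per_primer = (
        REMOTE_RESEARCH_USD
        + DESIGN_USD
        + TRANSLATION_USD * translations_per_primer
    )
    return {
        "author_hours": AUTHOR_HOURS_PER_PRIMER * num_primers,
        "cash_usd": cash_per_primer * num_primers,
    }


# Five primers, first without translations, then with two translations each.
print(primer_series_cost(5))
print(primer_series_cost(5, translations_per_primer=2))
```

Under these assumptions, five untranslated primers come to 400 author hours and $25,000 in cash; adding two translations per primer raises the cash figure to $45,000.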


Funding Good Research

22 lukeprog 27 May 2012 06:41AM

Series: How to Purchase AI Risk Reduction

I recently explained that one major project undergoing cost-benefit analysis at the Singularity Institute is that of a scholarly AI risk wiki. The proposal is exciting to many, but as Kaj Sotala points out:

This idea sounds promising, but I find it hard to say anything about "should this be funded" without knowing what the alternative uses for the money are. Almost any use of money can be made to sound attractive with some effort, but the crucial question in budgeting is not "would this be useful" but "would this be the most useful thing".

Indeed. So here is another thing that donations to SI could purchase: good research papers by skilled academics.


Our recent grant of $20,000 to Rachael Briggs (for an introductory paper on TDT) provides an example of how this works:

  1. SI thinks of a paper it wants to exist but doesn't have the resources to write itself (e.g. a clearer presentation of TDT).
  2. SI looks for a few productive academics well-suited to write the paper we have in mind, and approaches them directly with the grant proposal. (Briggs is an excellent choice for the TDT paper because she is a good explainer and has had two of her past decision theory papers selected as among the 10 best papers of the year by The Philosopher's Annual.)
  3. Hopefully, one of these academics says "yes." We award them the grant in return for a certain kind of paper published in one of a pre-specified set of journals. (In the case of the TDT grant to Rachael Briggs, we specified that the final paper must be published in one of the following journals: Philosopher's Imprint, Philosophy and Phenomenological Research, Philosophical Quarterly, Philosophical Studies, Erkenntnis, Theoria, Australasian Journal of Philosophy, Nous, The Philosophical Review, or Theory and Decision.)
  4. SI gives regular feedback on outline drafts and article drafts prepared by the article author.
  5. Paper gets submitted, revised, and published!

For example, SI could award grants for the following papers:

  • "Objections to CEV," by somebody like David Sobel (his "Full Information Accounts of Well-Being" remains the most significant unanswered attack on ideal-preference theories like CEV).
  • "Counterfactual Mugging," by somebody like Rachael Briggs (here is the original post by Vladimir Nesov).
  • "CEV as a Computational Meta-Ethics," by somebody like Gert-Jan Lokhorst (see his paper "Computational Metaethics").
  • "Non-Bayesian Decision Theory and Normative Uncertainty," by somebody like Martin Peterson (the problem of normative uncertainty is a serious one, and Peterson's approach differs both from the one pursued by Nick Bostrom, Toby Ord, and Will Crouch and from the one pursued by Andrew Sepielli).
  • "Methods for Long-Term Technological Forecasting," by somebody like Bela Nagy (Nagy is the lead author on one of the best papers in the field).
  • "Convergence to Rational Economic Agency," by somebody like Steve Omohundro (Omohundro's 2007 paper argues that advanced agents will converge toward the rational economic model of decision-making. If true, this would make it easier to predict the convergent instrumental goals of advanced AIs, but his argument leaves much to be desired in persuasiveness as it is currently formulated).
  • "Value Learning," by somebody like Bill Hibbard (Dewey's 2011 paper and Hibbard's 2012 paper make interesting advances on this topic, but there is much more work to be done).
  • "Learning Preferences from Human Behavior," by somebody like Thomas Nielsen (Nielsen's 2004 paper with Finn Jensen described the first computationally tractable algorithms capable of learning a decision maker’s utility function from potentially inconsistent behavior. Their solution was to interpret inconsistent choices as random deviations from an underlying “true” utility function. But the data from neuroeconomics suggest a different solution: interpret inconsistent choices as deviations from an underlying “true” utility function that are produced by non-model-based valuation systems in the brain, and use the latest neuroscientific research to predict when and to what extent model-based choices are being “overruled” by the non-model-based valuation systems).

(These are only examples. I don't necessarily think these particular papers would be good investments.)


A Scholarly AI Risk Wiki

22 lukeprog 25 May 2012 08:53PM

Series: How to Purchase AI Risk Reduction

One large project proposal currently undergoing cost-benefit analysis at the Singularity Institute is a scholarly AI risk wiki. Below I will summarize the project proposal, because:

  • I would like feedback from the community on it, and
  • I would like to provide just one example of the kind of x-risk reduction that can be purchased with donations to the Singularity Institute.



The Idea

Think Scholarpedia:

  • Open-access scholarly articles written at roughly the "Scientific American" level of difficulty.
  • Runs on MediaWiki, but articles can only be created and edited by carefully selected authors, and curated by experts in the domain relevant to each article. (The editors would be SI researchers at first, and most of the authors and contributors would be staff researchers, research associates, or "remote researchers" from SI.)

But the scholarly AI risk wiki would differ from Scholarpedia in these respects:

  • Is focused on the subject of AI risk and related subjects.
  • No formal peer review system. The articles would, however, be continuously revised in response to comments from experts in the relevant fields, many of whom already work in the x-risk field or are knowledgeable participants on LessWrong and in the SIAI/FHI/etc. communities.
  • Articles will be written for a broader educated audience, not just for domain experts. (Many articles on Scholarpedia aren't actually written at the Scientific American level, despite that stated intent.)
  • A built-in citations and references system, Biblio (perhaps with the BibTeX addition).

Example articles: Eliezer Yudkowsky, Nick Bostrom, Ben Goertzel, Carl Shulman, Artificial General Intelligence, Decision Theory, Bayesian Decision Theory, Evidential Decision Theory, Causal Decision Theory, Timeless Decision Theory, Counterfactual Mugging, Existential Risk, Expected Utility, Expected Value, Utility, Friendly AI, Intelligence Explosion, AGI Sputnik Moment, Optimization Process, Optimization Power, Metaethics, Tool AI, Oracle AI, Unfriendly AI, Complexity of Value, Fragility of Value, Church-Turing Thesis, Nanny AI, Whole Brain Emulation, AIXI, Orthogonality Thesis, Instrumental Convergence Thesis, Biological Cognitive Enhancement, Nanotechnology, Recursive Self-Improvement, Intelligence, AI Takeoff, AI Boxing, Coherent Extrapolated Volition, Coherent Aggregated Volition, Reflective Decision Theory, Value Learning, Logical Uncertainty, Technological Development, Technological Forecasting, Emulation Argument for Human-Level AI, Evolutionary Argument for Human-Level AI, Extensibility Argument for Greater-Than-Human Intelligence, Anvil Problem, Optimality Notions, Universal Intelligence, Differential Intellectual Progress, Brain-Computer Interfaces, Malthusian Scenarios, Seed AI, Singleton, Superintelligence, Pascal's Mugging, Moore's Law, Superorganism, Infinities in Ethics, Economic Consequences of AI and Whole Brain Emulation, Creating Friendly AI, Cognitive Bias, Great Filter, Observation Selection Effects, Astronomical Waste, AI Arms Races, Normative and Moral Uncertainty, The Simulation Hypothesis, The Simulation Argument, Information Hazards, Optimal Philanthropy, Neuromorphic AI, Hazards from Large-Scale Computation, AGI Skepticism, Machine Ethics, Event Horizon Thesis, Acceleration Thesis, Singularitarianism, Subgoal Stomp, Wireheading, Ontological Crisis, Moral Divergence, Utility Indifference, Personhood Predicates, Consequentialism, Technological Revolutions, Prediction Markets, Global Catastrophic Risks, Paperclip Maximizer, Coherent Blended Volition, Fun Theory, Game Theory, The Singularity, History of AI Risk Thought, Utility Extraction, Reinforcement Learning, Machine Learning, Probability Theory, Prior Probability, Preferences, Regulation and AI Risk, Godel Machine, Lifespan Dilemma, AI Advantages, Algorithmic Complexity, Human-AGI Integration and Trade, AGI Chaining, Value Extrapolation, 5 and 10 Problem.

Most of these articles would contain previously unpublished research (not published even in blog posts or comments), because most of the AI risk research that has been done has never been written up in any form but sits in the brains and Google docs of people like Yudkowsky, Bostrom, Shulman, and Armstrong.



More than a year ago, I argued that SI would benefit from publishing short, clear, scholarly articles on AI risk. More recently, Nick Beckstead expressed the point this way:

Most extant presentations of SIAI's views leave much to be desired in terms of clarity, completeness, concision, accessibility, and credibility signals.

Chris Hallquist added:

I've been trying to write something about Eliezer's debate with Robin Hanson, but the problem I keep running up against is that Eliezer's points are not clearly articulated at all. Even making my best educated guesses about what's supposed to go in the gaps in his arguments, I still ended up with very little.

Of course, SI has long known it could benefit from clearer presentations of its views, but the cost was too high to implement it. Scholarly authors of Nick Bostrom's skill and productivity are extremely rare, and almost none of them care about AI risk. But now, let's be clear about what a scholarly AI risk wiki could accomplish:

  • Provide a clearer argument for caring about AI risk. Journal-published articles like Chalmers (2010) can be clear and scholarly, but the linear format is not ideal for analyzing such a complex thing as AI risk. Even a 65-page article like Chalmers (2010) can't hope to address even the tiniest fraction of the relevant evidence and arguments. Nor can it hope to respond to the tiniest fraction of all the objections that are "obvious" to some of its readers. What we need is a modular presentation of the evidence and the arguments, so that those who accept physicalism, near-term AI, and the orthogonality thesis can jump right to the sections on why various AI boxing methods may not work, while those who aren't sure what to think of AI timelines can jump to those articles, and those who accept most of the concern for AI risk but think there's no reason to assert humane values over arbitrary machine values can jump to the article on that subject. (Note that I don't presume all the analysis that would go into building an AI risk wiki would end up clearly recommending SI's current, very specific positions on AI risk, but I'm pretty sure it would clearly recommend some considerable concern for AI risk.)
  • Provide a clearer picture of our AI risk situation. Without clear presentations of most of the relevant factors, it is very costly for interested parties to develop a clear picture of our AI risk situation. If you wanted to get roughly as clear a picture of our AI risk situation as can be had today, you'd have to (1) read several books, hundreds of articles and blog posts, and the archives of SI's decision theory mailing list and several forums, (2) analyze them in detail to try to fill in all the missing steps in the reasoning presented in these sources, and (3) have dozens of hours of conversation with the leading experts in the field (Yudkowsky, Bostrom, Shulman, Armstrong, etc.). With a scholarly AI risk wiki, a decently clear picture of our AI risk situation will be much cheaper to acquire. Indeed, it will almost certainly clarify the picture of our situation even for the leading experts in the field.
  • Make it easier to do AI risk research. A researcher hoping to do AI risk research is in much the same position as the interested reader hoping to gain a clearer picture of our AI risk situation. Most of the relevant material is scattered across hundreds of books, articles, blog posts, forum comments, mailing list messages, and personal conversations. And those presentations of the ideas leave "much to be desired in terms of clarity, completeness, concision, accessibility..." This makes it hard to do research, in big-picture conceptual ways, but also in small, annoying ways. What paper can you cite on Thing X and Thing Y? When the extant scholarly literature base is small, you can't cite the sources that other people have dug up already. You have to do all that digging yourself.

There are some benefits to the wiki structure in particular:

  • Some wiki articles can largely be ripped/paraphrased from existing papers like Chalmers (2010) and Muehlhauser & Salamon (2012).
  • Many wiki articles can be adapted to become journal articles, if they are seen as having much value. Probably, 1-3 wiki articles could be developed, then adapted and combined into a journal article and published, and then the original wiki article(s) could be published on the wiki (while citing the now-published journal article).
  • It's not an all-or-nothing project. Some value is gained by having some articles on the wiki, more value is gained by having more articles on the wiki.
  • There are robust programs and plugins for managing this kind of project (MediaWiki, Biblio, etc.).
  • Dozens or hundreds of people can contribute, though they will all be selected by editors. (SI's army of part-time remote researchers is already more than a dozen strong, each with different skills and areas of domain expertise.)



This would be a large project, and has significant costs. I'm still estimating the costs, but here are some ballpark numbers for a scholarly AI risk wiki containing all the example articles above:

  • 1,920 hours of SI staff time (80 hrs/month for 24 months). This comes out to about $48,000, depending on who is putting in these hours.
  • $384,000 paid to remote researchers and writers ($16,000/mo for 24 months; our remote researchers generally work part-time, and are relatively inexpensive).
  • $30,000 for wiki design, development, and hosting costs.
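
To make the ballpark easier to audit, here is a minimal sketch that adds up the figures above. The variable and function names are hypothetical, and pricing staff time at the quoted $48,000 is only the rough assumption stated in the list, since the real figure depends on who puts in the hours.

```python
# Ballpark figures quoted in the list above.
STAFF_HOURS = 1_920
STAFF_COST_USD = 48_000        # rough valuation of staff time
REMOTE_MONTHLY_USD = 16_000
MONTHS = 24
WIKI_INFRA_USD = 30_000        # design, development, hosting


def wiki_budget():
    """Sum the ballpark estimates; a sanity check, not a real budget."""
    remote_total = REMOTE_MONTHLY_USD * MONTHS
    cash_outlay = remote_total + WIKI_INFRA_USD
    return {
        "implied_staff_rate_usd_per_hour": STAFF_COST_USD / STAFF_HOURS,
        "remote_total_usd": remote_total,
        "cash_outlay_usd": cash_outlay,
        "total_including_staff_usd": cash_outlay + STAFF_COST_USD,
    }


for item, value in wiki_budget().items():
    print(f"{item}: {value:,.0f}")
```

On these numbers the remote-researcher line is $384,000, the implied staff rate is $25 per hour, the cash outlay is $414,000, and the total including staff time is $462,000.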


AI risk bibliography (draft)

4 lukeprog 07 April 2012 06:46PM

Generally, only sources including an extended analysis of AI risk are included, though there are some exceptions among the earliest sources. Listed sources discuss either the likelihood of AI risk or they discuss possible solutions. (This does not include most of the "machine ethics" literature, unless an article discusses machine ethics in the explicit context of artificial intelligence as an existential risk.)

Please let me know what I missed!


Butler, Samuel [Cellarius, pseud.]. 1863. Darwin among the machines. Christchurch Press, June 13. http://www.nzetc.org/tm/scholarly/tei-ButFir-t1-g1-t1-g1-t4-body.html.

Good, Irving John. 1959. Speculations on perceptrons and other automata. Research Lecture, RC-115. IBM, Yorktown Heights, New York, June 2. http://domino.research.ibm.com/library/cyberdig.nsf/papers/58DC4EA36A143C218525785E00502E30/$File/rc115.pdf.

Good, Irving John. 1965. Speculations concerning the first ultraintelligent machine. In Advances in computers, ed. Franz L. Alt and Morris Rubinoff, 31–88. Vol. 6. New York: Academic Press. doi:10.1016/S0065-2458(08)60418-0.

Good, Irving John. 1970. Some future social repercussions of computers. International Journal of Environmental Studies 1 (1–4): 67–79. doi:10.1080/00207237008709398.

Versenyi, Laszlo. 1974. Can robots be moral? Ethics 84 (3): 248–259. http://www.jstor.org/stable/2379958.

Good, Irving John. 1982. Ethical machines. In Machine intelligence, ed. J. E. Hayes, Donald Michie, and Y.-H. Pao, 555–560. Vol. 10. Intelligent Systems: Practice and Perspective. Chichester: Ellis Horwood.

Minsky, Marvin. 1984. Afterword to Vernor Vinge’s novel, “True Names.” Oct. 1. http://web.media.mit.edu/~minsky/papers/TrueNames.Afterword.html (accessed Mar. 26, 2012).

Moravec, Hans P. 1988. Mind children: The future of robot and human intelligence. Cambridge, MA: Harvard University Press.

Crevier, Daniel. 1993. The silicon challengers in our future. Chap. 12 in AI: The tumultuous history of the search for artificial intelligence. New York: Basic Books.

Vinge, Vernor. 1993. The coming technological singularity: How to survive in the post-human era. In Vision-21: Interdisciplinary science and engineering in the era of cyberspace, 11–22. NASA Conference Publication 10129. NASA Lewis Research Center. http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19940022855_1994022855.pdf.

Hanson, Robin. 1994. If uploads come first: The crack of a future dawn. Extropy 6 (2). http://hanson.gmu.edu/uploads.html.

Bostrom, Nick. 1997. Predictions from philosophy? How philosophers could make themselves useful. Last modified September 19, 1998. http://www.nickbostrom.com/old/predict.html.

Warwick, Kevin. 1998. In the mind of the machine: Breakthrough in artificial intelligence. London: Arrow.

Moravec, Hans P. 1999. Robot: Mere machine to transcendent mind. New York: Oxford University Press.

Joy, Bill. 2000. Why the future doesn’t need us. Wired, Apr. http://www.wired.com/wired/archive/8.04/joy.html.

Yudkowsky, Eliezer. 2001. Creating friendly AI 1.0: The analysis and design of benevolent goal architectures. Singularity Institute for Artificial Intelligence, San Francisco, CA, June 15. http://singinst.org/upload/CFAI.html.

6, Perri [David Ashworth]. 2001. Ethics, regulation and the new artificial intelligence, part I: Accountability and power. Information, Communication & Society 4 (2): 199–229. doi:10.1080/713768525.

Hibbard, Bill. 2001. Super-intelligent machines. ACM SIGGRAPH Computer Graphics 35 (1): 13–15. http://www.siggraph.org/publications/newsletter/issues/v35/v35n1.pdf.

Bostrom, Nick. 2002. Existential risks: Analyzing human extinction scenarios and related hazards. Journal of Evolution and Technology 9. http://www.jetpress.org/volume9/risks.html.

Goertzel, Ben. 2002. Thoughts on AI morality. Dynamical Psychology. http://www.goertzel.org/dynapsyc/2002/AIMorality.htm.

Hibbard, Bill. 2002. Super-intelligent machines. New York: Kluwer Academic/Plenum Publishers.

Bostrom, Nick. 2003. Ethical issues in advanced artificial intelligence. In Cognitive, emotive and ethical aspects of decision making in humans and in artificial intelligence, ed. Iva Smit and George E. Lasker. Vol. 2. Windsor, ON: International Institute of Advanced Studies in Systems Research / Cybernetics.

Georges, Thomas M. 2003. Digital soul: Intelligent machines and human values. Boulder, CO: Westview Press.

Bostrom, Nick. 2004. The future of human evolution. In Two hundred years after Kant, fifty years after Turing, ed. Charles Tandy, 339–371. Vol. 2. Death and Anti-Death. Palo Alto, CA: Ria University Press.

Goertzel, Ben. 2004. Encouraging a positive transcension: Issues in transhumanist ethical philosophy. Dynamical Psychology. http://www.goertzel.org/dynapsyc/2004/PositiveTranscension.htm.

Goertzel, Ben. 2004. The all-seeing A(I): Universal mind simulation as a possible path to stably benevolent superhuman AI. Dynamical Psychology. http://www.goertzel.org/dynapsyc/2004/AllSeeingAI.htm.

Posner, Richard A. 2004. What are the catastrophic risks, and how catastrophic are they? Chap. 1 in Catastrophe: Risk and response. New York: Oxford University Press.

Yudkowsky, Eliezer. 2004. Coherent extrapolated volition. Singularity Institute for Artificial Intelligence, San Francisco, CA, May. http://singinst.org/upload/CEV.html.

de Garis, Hugo. 2005. The artilect war: Cosmists vs. terrans: A bitter controversy concerning whether humanity should build godlike massively intelligent machines. Palm Springs, CA: ETC Publications.

Hibbard, Bill. 2005. The ethics and politics of super-intelligent machines. Unpublished manuscript, July. Microsoft Word file, http://sites.google.com/site/whibbard/g/SI_ethics_politics.doc (accessed Apr. 3, 2012).

Kurzweil, Ray. 2005. The deeply intertwined promise and peril of GNR. Chap. 8 in The singularity is near: When humans transcend biology. New York: Viking.

Armstrong, Stuart. 2007. Chaining god: A qualitative approach to AI, trust and moral systems. Unpublished manuscript, Oct. 20. http://www.neweuropeancentury.org/GodAI.pdf (accessed Apr. 6, 2012).

Bugaj, Stephan Vladimir, and Ben Goertzel. 2007. Five ethical imperatives and their implications for human-AGI interaction. Dynamical Psychology. http://goertzel.org/dynapsyc/2007/Five_Ethical_Imperatives_svbedit.htm.

Dietrich, Eric. 2007. After the humans are gone. Philosophy Now, May/June. http://www.philosophynow.org/issues/61/After_The_Humans_Are_Gone.

Hall, John Storrs. 2007. Beyond AI: Creating the conscience of the machine. Amherst, NY: Prometheus Books.

Hall, John Storrs. 2007. Ethics for artificial intellects. In Nanoethics: The ethical and social implications of nanotechnology, ed. Fritz Allhoff, Patrick Lin, James Moor, John Weckert, and Mihail C. Roco, 339–352. Hoboken, N.J: John Wiley & Sons.

Hall, John Storrs. 2007. Self-improving AI: An analysis. Minds and Machines 17 (3): 249–259. doi:10.1007/s11023-007-9065-3.

Omohundro, Stephen M. 2007. The nature of self-improving artificial intelligence. Paper presented at the Singularity Summit 2007, San Francisco, CA, Sept. 8–9. http://singinst.org/summit2007/overview/abstracts/#omohundro.

Blake, Thomas, Bernd Carsten Stahl, and N. B. Fairweather. 2008. Robot ethics: Why “Friendly AI” won’t work. In Proceedings of the tenth international conference ETHICOMP 2008: Living, working and learning beyond technology, ed. Terrel Ward Bynum, Maria Carla Calzarossa, Ivo De Lotto, and Simon Rogerson. isbn: 9788890286995.

Hall, John Storrs. 2008. Engineering utopia. In Artificial general intelligence 2008: Proceedings of the first AGI conference, ed. Pei Wang, Ben Goertzel, and Stan Franklin, 460–467. Vol. 171. Frontiers in Artificial Intelligence and Applications. Amsterdam: IOS Press.

Hanson, Robin. 2008. Economics of the singularity. IEEE Spectrum 45 (6): 45–50. doi:10.1109/MSPEC.2008.4531461.

Omohundro, Stephen M. 2008. The basic AI drives. In Artificial general intelligence 2008: Proceedings of the first AGI conference, ed. Pei Wang, Ben Goertzel, and Stan Franklin, 483–492. Vol. 171. Frontiers in Artificial Intelligence and Applications. Amsterdam: IOS Press.

Yudkowsky, Eliezer. 2008. Artificial intelligence as a positive and negative factor in global risk. In Global catastrophic risks, ed. Nick Bostrom and Milan M. Ćirković, 308–345. New York: Oxford University Press.

Freeman, Tim. 2009. Using compassion and respect to motivate an artificial intelligence. Unpublished manuscript, Mar. 8. http://fungible.com/respect/paper.html (accessed Apr. 7, 2012).

Russell, Stuart J., and Peter Norvig. 2009. Philosophical foundations. Chap. 26 in Artificial intelligence: A modern approach, 3rd ed. Upper Saddle River, NJ: Prentice-Hall.

Shulman, Carl, and Stuart Armstrong. 2009. Arms races and intelligence explosions. Extended abstract. Singularity Institute for Artificial Intelligence, San Francisco, CA. http://singinst.org/armscontrolintelligenceexplosions.pdf.

Shulman, Carl, Henrik Jonsson, and Nick Tarleton. 2009. Machine ethics and superintelligence. In AP-CAP 2009: The fifth Asia-Pacific computing and philosophy conference, October 1st-2nd, University of Tokyo, Japan, proceedings, ed. Carson Reynolds and Alvaro Cassinelli, 95–97. AP-CAP 2009. http://ia-cap.org/ap-cap09/proceedings.pdf.

Sotala, Kaj. 2009. Evolved altruism, ethical complexity, anthropomorphic trust: Three factors misleading estimates of the safety of artificial general intelligence. Paper presented at the 7th European Conference on Computing and Philosophy (ECAP), Bellaterra, Spain, July 2–4.

Wallach, Wendell, and Colin Allen. 2009. Moral machines: Teaching robots right from wrong. New York: Oxford University Press. doi:10.1093/acprof:oso/9780195374049.001.0001.

Waser, Mark R. 2009. A safe ethical system for intelligent machines. In Biologically inspired cognitive architectures: Papers from the AAAI fall symposium, ed. Alexei V. Samsonovich, 194–199. Technical Report, FS-09-01. AAAI Press, Menlo Park, CA. http://aaai.org/ocs/index.php/FSS/FSS09/paper/view/934.

Chalmers, David John. 2010. The singularity: A philosophical analysis. Journal of Consciousness Studies 17 (9–10): 7–65. http://www.ingentaconnect.com/content/imp/jcs/2010/00000017/f0020009/art00001.

Fox, Joshua, and Carl Shulman. 2010. Superintelligence does not imply benevolence. Paper presented at the 8th European Conference on Computing and Philosophy (ECAP), Munich, Germany, Oct. 4–6.

Goertzel, Ben. 2010. Coherent aggregated volition: A method for deriving goal system content for advanced, beneficial AGIs. The Multiverse According to Ben (blog). Mar. 12. http://multiverseaccordingtoben.blogspot.ca/2010/03/coherent-aggregated-volition-toward.html (accessed Apr. 4, 2012).

Goertzel, Ben. 2010. GOLEM: Toward an AGI meta-architecture enabling both goal preservation and radical self-improvement. Unpublished manuscript, May 2. http://goertzel.org/GOLEM.pdf (accessed Apr. 4, 2012).

Kaas, Steven, Steve Rayhawk, Anna Salamon, and Peter Salamon. 2010. Economic implications of software minds. Singularity Institute for Artificial Intelligence, San Francisco, CA, Aug. 10. http://www.singinst.co/upload/economic-implications.pdf.

McGinnis, John O. 2010. Accelerating AI. Northwestern University Law Review 104 (3): 1253–1270. http://www.law.northwestern.edu/lawreview/v104/n3/1253/LR104n3McGinnis.pdf.

Shulman, Carl. 2010. Omohundro’s “Basic AI Drives” and catastrophic risks. Singularity Institute for Artificial Intelligence, San Francisco, CA. http://singinst.org/upload/ai-resource-drives.pdf.

Shulman, Carl. 2010. Whole brain emulation and the evolution of superorganisms. Singularity Institute for Artificial Intelligence, San Francisco, CA. http://singinst.org/upload/WBE-superorganisms.pdf.

Sotala, Kaj. 2010. From mostly harmless to civilization-threatening: Pathways to dangerous artificial general intelligences. Paper presented at the 8th European Conference on Computing and Philosophy (ECAP), Munich, Germany, Oct. 4–6.

Tarleton, Nick. 2010. Coherent extrapolated volition: A meta-level approach to machine ethics. Singularity Institute for Artificial Intelligence, San Francisco, CA. http://singinst.org/upload/coherent-extrapolated-volition.pdf.

Waser, Mark R. 2010. Designing a safe motivational system for intelligent machines. In Artificial general intelligence: Proceedings of the third conference on artificial general intelligence, AGI 2010, Lugano, Switzerland, March 5–8, 2010, ed. Eric Baum, Marcus Hutter, and Emanuel Kitzelmann, 170–175. Vol. 10. Advances in Intelligent Systems Research. Amsterdam: Atlantis Press. doi:10.2991/agi.2010.21.

Dewey, Daniel. 2011. Learning what to value. In Artificial general intelligence: 4th international conference, AGI 2011, Mountain View, CA, USA, August 3–6, 2011. Proceedings, ed. Jürgen Schmidhuber, Kristinn R. Thórisson, and Moshe Looks, 309–314. Vol. 6830. Lecture Notes in Computer Science. Berlin: Springer. doi:10.1007/978-3-642-22887-2_35.

Hall, John Storrs. 2011. Ethics for self-improving machines. In Machine ethics, ed. Michael Anderson and Susan Leigh Anderson, 512–523. New York: Cambridge University Press.

Muehlhauser, Luke. 2011. So you want to save the world. Last modified March 2, 2012. http://lukeprog.com/SaveTheWorld.html.

Muehlhauser, Luke. 2011. The singularity FAQ. Singularity Institute for Artificial Intelligence. http://singinst.org/singularityfaq (accessed Mar. 27, 2012).

Waser, Mark R. 2011. Rational universal benevolence: Simpler, safer, and wiser than “Friendly AI.” In Artificial general intelligence: 4th international conference, AGI 2011, Mountain View, CA, USA, August 3–6, 2011. Proceedings, ed. Jürgen Schmidhuber, Kristinn R. Thórisson, and Moshe Looks, 153–162. Vol. 6830. Lecture Notes in Computer Science. Berlin: Springer. doi:10.1007/978-3-642-22887-2_16.

Yudkowsky, Eliezer. 2011. Complex value systems in friendly AI. In Artificial general intelligence: 4th international conference, AGI 2011, Mountain View, CA, USA, August 3–6, 2011. Proceedings, ed. Jürgen Schmidhuber, Kristinn R. Thórisson, and Moshe Looks, 388–393. Vol. 6830. Lecture Notes in Computer Science. Berlin: Springer. doi:10.1007/978-3-642-22887-2_48.

Berglas, Anthony. 2012. Artificial intelligence will kill our grandchildren (singularity). Draft 9. Jan. http://berglas.org/Articles/AIKillGrandchildren/AIKillGrandchildren.html (accessed Apr. 6, 2012).

Goertzel, Ben. 2012. Should humanity build a global AI nanny to delay the singularity until it’s better understood? Journal of Consciousness Studies 19 (1–2): 96–111. http://ingentaconnect.com/content/imp/jcs/2012/00000019/F0020001/art00006.

Hanson, Robin. 2012. Meet the new conflict, same as the old conflict. Journal of Consciousness Studies 19 (1–2): 119–125. http://www.ingentaconnect.com/content/imp/jcs/2012/00000019/F0020001/art00008.

Tipler, Frank. 2012. Inevitable existence and inevitable goodness of the singularity. Journal of Consciousness Studies 19 (1–2): 183–193. http://www.ingentaconnect.com/content/imp/jcs/2012/00000019/F0020001/art00013.

Yampolskiy, Roman V. 2012. Leakproofing the singularity: Artificial intelligence confinement problem. Journal of Consciousness Studies 19 (1–2): 194–214. http://www.ingentaconnect.com/content/imp/jcs/2012/00000019/F0020001/art00014.

Armstrong, Stuart, Anders Sandberg, and Nick Bostrom. Forthcoming. Thinking inside the box: Using and controlling an Oracle AI. Minds and Machines.

Bostrom, Nick. Forthcoming. The superintelligent will: Motivation and instrumental rationality in advanced artificial agents. Minds and Machines. Preprint at http://www.nickbostrom.com/superintelligentwill.pdf.

Bostrom, Nick, and Eliezer Yudkowsky. Forthcoming. The ethics of artificial intelligence. In Cambridge handbook of artificial intelligence, ed. Keith Frankish and William Ramsey. New York: Cambridge University Press.

Hanson, Robin. Forthcoming. Economic growth given machine intelligence. Journal of Artificial Intelligence Research.

Muehlhauser, Luke, and Louie Helm. Forthcoming. The singularity and machine ethics. In The singularity hypothesis: A scientific and philosophical assessment, ed. Amnon Eden, Johnny Søraker, James H. Moor, and Eric Steinhart. Berlin: Springer.

Muehlhauser, Luke, and Anna Salamon. Forthcoming. Intelligence explosion: Evidence and import. In The singularity hypothesis: A scientific and philosophical assessment, ed. Amnon Eden, Johnny Søraker, James H. Moor, and Eric Steinhart. Berlin: Springer.

Sotala, Kaj. Forthcoming. Advantages of artificial intelligences, uploads, and digital minds. International Journal of Machine Consciousness 4.

Omohundro, Stephen M. Forthcoming. Rationally-shaped artificial intelligence. In The singularity hypothesis: A scientific and philosophical assessment, ed. Amnon Eden, Johnny Søraker, James H. Moor, and Eric Steinhart. Berlin: Springer.

Yampolskiy, Roman V., and Joshua Fox. Forthcoming. Artificial general intelligence and the human mental model. In The singularity hypothesis: A scientific and philosophical assessment, ed. Amnon Eden, Johnny Søraker, James H. Moor, and Eric Steinhart. Berlin: Springer.

Yampolskiy, Roman V., and Joshua Fox. Forthcoming. Safety engineering for artificial general intelligence. Topoi.

AI Risk & Opportunity: Strategic Analysis Via Probability Tree

11 lukeprog 07 April 2012 08:25AM

Part of the series AI Risk and Opportunity: A Strategic Analysis.

(You can leave anonymous feedback on posts in this series here. I alone will read the comments, and may use them to improve past and forthcoming posts in this series.)

There are many approaches to strategic analysis (Bishop et al. 2007). Though a morphological analysis (Ritchey 2006) could model our situation in more detail, the present analysis uses a simple probability tree (Harshbarger & Reynolds 2008, sec. 7.4) to model potential events and interventions.


A very simple tree

In our initial attempt, the first disjunction concerns which of several (mutually exclusive and exhaustive) transformative events comes first:


  • "FAI" = Friendly AI.
  • "uFAI" = UnFriendly AI, not including uFAI developed with insights from WBE.
  • "WBE" = Whole brain emulation.
  • "Doom" = Human extinction, including simulation shutdown and extinction due to uFAI striking us from beyond our solar system.
  • "Other" = None of the above four events occur in our solar system, perhaps due to stable global totalitarianism or for unforeseen reasons.


Our probability tree begins simply:

Each circle is a chance node, which represents a random variable. The leftmost chance node above represents the variable of whether FAI, uFAI, WBE, Doom, or Other will come first. The rightmost chance nodes are open to further disjunctions: the random variables they represent will be revealed as we continue to develop the probability tree.

Each left-facing triangle is a terminal node, which for us serves the same function as a utility node in a Bayesian decision network. The only utility node in the tree above assigns a utility of 0 (bad!) to the Doom outcome.

Each branch in the tree is assigned a probability. For the purposes of illustration, the above tree assigns .01 probability to FAI coming first, .52 probability to uFAI coming first, .07 probability to WBE coming first, .35 to Doom coming first, and .05 to Other coming first.
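
To make the arithmetic concrete, here is a minimal sketch of the tree described above. It uses the five illustrative branch probabilities and the one utility the post actually assigns (Doom = 0). Every other utility is a placeholder I made up so that the calculation can run; the post leaves those chance nodes open. The names `Terminal`, `Chance`, and `expected_utility` are hypothetical and do not come from any decision-analysis package.

```python
from dataclasses import dataclass
from math import isclose
from typing import List, Tuple, Union


@dataclass
class Terminal:
    """A terminal (utility) node: the tree stops here with a fixed utility."""
    utility: float


@dataclass
class Chance:
    """A chance node: a list of (label, probability, child) branches."""
    branches: List[Tuple[str, float, "Node"]]


Node = Union[Terminal, Chance]


def expected_utility(node: Node) -> float:
    """Roll the tree back from the terminals to the root."""
    if isinstance(node, Terminal):
        return node.utility
    total = sum(p for _, p, _ in node.branches)
    # Branches are meant to be mutually exclusive and exhaustive.
    if not isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"branch probabilities sum to {total}, not 1")
    return sum(p * expected_utility(child) for _, p, child in node.branches)


# Probabilities are the illustrative ones from the post.
# Only the Doom utility (0) is from the post; the rest are ASSUMED placeholders.
tree = Chance([
    ("FAI", 0.01, Terminal(1.0)),    # assumed utility
    ("uFAI", 0.52, Terminal(0.0)),   # assumed utility
    ("WBE", 0.07, Terminal(0.5)),    # assumed utility
    ("Doom", 0.35, Terminal(0.0)),   # utility of 0 as stated in the post
    ("Other", 0.05, Terminal(0.5)),  # assumed utility
])

print(round(expected_utility(tree), 4))
```

Replacing any `Terminal` with a further `Chance` node is how the tree would grow "downstream"; the rollback function does not change.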


How the tree could be expanded

The simple tree above could be expanded "downstream" by adding additional branches:

We could also make the probability tree more actionable by trying to estimate the probability of desirable and undesirable outcomes given that certain shorter-term goals are met. In the example below, "private push" means that a non-state actor passionate about safety invests $30 billion or more into developing WBE technology within 30 years from today. Perhaps there's a small chance this safety-conscious actor could get to WBE before state actors, upload FAI researchers, and have them figure out FAI before uFAI is created.
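
Conditioning on a shorter-term goal like "private push" is an application of the law of total probability. The sketch below shows the calculation only. All three input numbers are hypothetical placeholders, not estimates from this post, and `total_probability` is a helper name I introduce for illustration.

```python
def total_probability(p_cond: float,
                      p_out_given_cond: float,
                      p_out_given_not_cond: float) -> float:
    """P(outcome) = P(cond) * P(outcome | cond) + P(not cond) * P(outcome | not cond)."""
    for p in (p_cond, p_out_given_cond, p_out_given_not_cond):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{p} is not a probability")
    return p_cond * p_out_given_cond + (1.0 - p_cond) * p_out_given_not_cond


# HYPOTHETICAL inputs, chosen only to make the arithmetic visible.
P_PUSH = 0.05               # chance that a "private push" happens at all
P_FAI_GIVEN_PUSH = 0.04     # chance FAI comes first if it does
P_FAI_GIVEN_NO_PUSH = 0.01  # chance FAI comes first if it does not

p_fai = total_probability(P_PUSH, P_FAI_GIVEN_PUSH, P_FAI_GIVEN_NO_PUSH)

# What matters for deciding whether to work toward the push is the
# difference between the two conditional probabilities.
gain_from_push = P_FAI_GIVEN_PUSH - P_FAI_GIVEN_NO_PUSH

print(f"P(FAI first) = {p_fai:.4f}, gain if push happens = {gain_from_push:.2f}")
```

The unconditional figure says what to expect; the gap between the conditionals says how much the intervention is worth under the model.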

We could also expand the tree "upstream": rather than having the first disjunction concern our five options for what comes first, we could begin with a series of disjunctions that feed into which option comes first.

We could add hundreds or thousands of nodes to our probability tree, then use decision-analysis software to test how much the outcomes change when particular inputs are changed, and so learn what we can do now to most increase our chances of a desirable outcome, given our current model.
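
A one-way sensitivity test of this kind can be sketched in a few lines. The version below changes one branch probability, rescales the others proportionally so they still sum to 1, and recomputes expected utility. Proportional rescaling is my assumption (other schemes are possible), the utilities other than Doom = 0 are placeholders, and the function names are hypothetical.

```python
from math import isclose
from typing import Dict, Iterable, List, Tuple


def renormalize_with(branches: Dict[str, float], name: str, new_p: float) -> Dict[str, float]:
    """Set branches[name] to new_p and scale the other branches proportionally."""
    if not 0.0 <= new_p <= 1.0:
        raise ValueError(f"{new_p} is not a probability")
    rest = 1.0 - branches[name]
    if rest <= 0.0:
        raise ValueError("no other probability mass to rescale")
    scale = (1.0 - new_p) / rest
    return {k: (new_p if k == name else p * scale) for k, p in branches.items()}


def expected_utility(branches: Dict[str, float], utilities: Dict[str, float]) -> float:
    """Probability-weighted sum of utilities over the first disjunction."""
    return sum(p * utilities[k] for k, p in branches.items())


def one_way_sensitivity(branches: Dict[str, float], utilities: Dict[str, float],
                        name: str, candidates: Iterable[float]) -> List[Tuple[float, float]]:
    """Return (candidate probability, resulting expected utility) pairs."""
    return [(p, expected_utility(renormalize_with(branches, name, p), utilities))
            for p in candidates]


# Illustrative probabilities from the post.
base = {"FAI": 0.01, "uFAI": 0.52, "WBE": 0.07, "Doom": 0.35, "Other": 0.05}
# Only Doom = 0 is from the post; the other utilities are ASSUMED placeholders.
utilities = {"FAI": 1.0, "uFAI": 0.0, "WBE": 0.5, "Doom": 0.0, "Other": 0.5}

for p, eu in one_way_sensitivity(base, utilities, "uFAI", [0.52, 0.39, 0.26]):
    print(f"P(uFAI first) = {p:.2f} -> expected utility = {eu:.4f}")
```

With many nodes one would run this over every input and rank the inputs by how far the output moves, which is roughly what a tornado diagram reports.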

We would also need to decide which "endgame scenarios" we want to include as possible terminals, and the utility of each. These choices may be complicated by our beliefs about multiverses and simulations.

However, decision trees become enormously large and complex very quickly as you add more variables. If we had the resources for a more complicated model, we'd probably want to use influence diagrams instead (Howard & Matheson 2005), e.g. one built in Analytica, like the ICAM climate change model. Of course, one must always worry that one's model is internally consistent but disconnected from the real world (Kay 2012).



AI Risk & Opportunity: Questions We Want Answered

7 lukeprog 01 April 2012 07:19PM

Part of the series AI Risk and Opportunity: A Strategic Analysis.

This post provides a list of questions about AI risk strategy — questions we want answered. Please suggest additional questions (a paragraph of explanation is preferred but not necessary); I may add them to the list. You can submit questions anonymously here.

Also, please identify which 3-5 of these questions you think are low-hanging fruit for productive strategic analysis on Less Wrong.

The list is in no particular order, but question numbers will remain unchanged (so that you can reliably refer to questions by their number):

  1. What methods can we use to predict technological development? We don't yet have reliable methods for long-term technological forecasting. But not all methods have been examined yet. Perhaps technology futures have a good track record. Perhaps we could look at historical technological predictions and see if there is any pattern in the data suggesting that certain character traits and contexts lend themselves to accurate technological predictions. Perhaps there are creative solutions we haven't thought of yet.

  2. Which kinds of differential technological development should we encourage, and how? Should we "push" on WBE, or not? Are some kinds of AI research risk-reducing, and other kinds risk-increasing? How can we achieve such effects, if they are desired?

  3. Which open problems are safe to discuss, and which are potentially dangerous? AI risk research may itself produce risk in some cases, in the form of information hazards (Bostrom 2011). Is it safe to discuss decision theories? Acausal trade? Certain kinds of strategic questions, for example involving government intervention?

  4. What can we do to reduce the risk of an AI arms race?

  5. What can we do to raise the "sanity waterline," and how much will this help?

  6. What can we do to attract more funding, support, and research to x-risk reduction and to the specific sub-problems of successful Singularity navigation?

  7. Which interventions should we prioritize?

  8. How should x-risk reducers and AI safety researchers interact with governments and corporations? Does Drexler's interaction with the U.S. government regarding molecular nanotechnology provide any lessons for how AI risk researchers should act?

  9. How can optimal philanthropists get the most x-risk reduction for their philanthropic buck?

  10. How does AI risk compare to other existential risks?

  11. Which problems do we need to solve, and which ones can we have an AI solve?

  12. How can we develop microeconomic models of WBEs and self-improving systems?

  13. How can we be sure a Friendly AI development team will be altruistic?

  14. How hard is it to create Friendly AI?

  15. What is the strength of feedback from neuroscience to AI rather than brain emulation?

  16. Is there a safe way to do uploads, where they don't turn into neuromorphic AI?

  17. How much must we spend on security when developing a Friendly AI team?

  18. What's the best way to recruit talent toward working on AI risks?

  19. How difficult is stabilizing the world so we can work on Friendly AI slowly?

  20. How hard will a takeoff be? To what degree is "intelligence" (as efficient cross-domain optimization) a matter of content vs. algorithms? How much does takeoff depend on slow, real-world experiments?

  21. What is the value of strategy vs. object-level progress toward a positive Singularity?

  22. What different kinds of Oracle AI are there, and are any of them both safe and feasible?

  23. How much should we be worried about "metacomputational hazards"? E.g. should we worry about nonperson predicates? Oracle AIs engaging in self-fulfilling prophecies? Acausal hijacking?

  24. What improvements can we make to the way we go about answering strategy questions? Wei Dai's notes on this question: "For example, should we differentiate between 'strategic insights' (such as Carl Shulman's insight that WBE-based Singletons may be feasible) and 'keeping track of the big picture' (forming the overall strategy and updating it based on new insights and evidence), and aim to have people specialize in each, so that people deciding strategy won't be tempted to overweigh their own insights? Another example: is there a better way to combine probability estimates from multiple people?"

  25. How do people in other fields answer strategy questions? Wei Dai's notes on this question: "Is there such a thing as a science or art of strategy that we can copy from (and perhaps improve upon with ideas from x-rationality)?"

[more questions to come, as they are posted to the comments section]

AI Risk & Opportunity: A Timeline of Early Ideas and Arguments

4 lukeprog 31 March 2012 02:34PM

Part of the series AI Risk and Opportunity: A Strategic Analysis.

Building on the previous post on AI risk history, this post provides an incomplete timeline (up to 1993) of significant novel ideas and arguments related to AI as a potential catastrophic risk. I do not include ideas and arguments concerning only, for example, the possibility of AI (Turing 1950) or attempts to predict its arrival (Bostrom 1998).

As is usually the case, we find that when we look closely at a cluster of ideas, it turns out these ideas did not appear all at once in the minds of a Few Great Men. Instead, they grew and mutated and gave birth to new ideas gradually as they passed from mind to mind over the course of many decades.


1863: Machine intelligence as an existential risk to humanity; relinquishment of machine technology recommended. Samuel Butler in Darwin among the machines worries that as we build increasingly sophisticated and autonomous machines, they will achieve greater capability than humans and replace humans as the dominant agents on the planet:

...we are ourselves creating our own successors; we are daily adding to the beauty and delicacy of their physical organisation; we are daily giving them greater power and supplying by all sorts of ingenious contrivances that self-regulating, self-acting power which will be to them what intellect has been to the human race. In the course of ages we shall find ourselves the inferior race... the time will come when the machines will hold the real supremacy over the world and its inhabitants...

Our opinion is that war to the death should be instantly proclaimed against them. Every machine of every sort should be destroyed by the well-wisher of his species. Let there be no exceptions made, no quarter shown...

(See also Butler 1872; Campbell 1932.)

1921: Robots as an existential risk. The Czech play R.U.R. by Karel Čapek tells the story of robots which grow in power and intelligence and destroy the entire human race (except for a single survivor).

1947: Fragility & complexity of human values (in the context of machine goal systems); perverse instantiation. Jack Williamson's novelette With Folded Hands (1947) tells the story of a race of machines built to follow the Prime Directive: "to serve and obey and guard men from harm." To obey this rule, the machines interfere with every aspect of human life, and humans who resist are lobotomized. Due to the fragility and complexity of human values (Yudkowsky 2008; Muehlhauser and Helm 2012), the machines' rules of behavior had unintended consequences, manifesting a "perverse instantiation" in the language of Bostrom (forthcoming).

(Also see Asimov 1950, 1957, 1983; Versenyi 1974; Minsky 1984; Yudkowsky 2001, 2011.)

1948-1949: Precursor idea to intelligence explosion. Von Neumann (1948) wrote:

..."complication" on its lower levels is probably degenerative, that is, that every automaton that can produce other automata will only be able to produce less complicated ones. There is, however, a certain minimum level where this degenerative characteristic ceases to be universal. At this point automata which can reproduce themselves, or even construct higher entities, become possible.


Von Neumann (1949) came very close to articulating the idea of intelligence explosion:

There is thus this completely decisive property of complexity, that there exists a critical size below which the process of synthesis is degenerative, but above which the phenomenon of synthesis, if properly arranged, can become explosive, in other words, where syntheses of automata can proceed in such a manner that each automaton will produce other automata which are more complex and of higher potentialities than itself.

1951: Potentially rapid transition from machine intelligence to machine takeover. Turing (1951) described ways that intelligent computers might learn and improve their capabilities, concluding that:

...it seems probable that once the machine thinking method has started, it would not take long to outstrip our feeble powers... At some stage therefore we should have to expect the machines to take control...

1959: Intelligence explosion; the need for human-friendly goals for machine superintelligence. Good (1959) describes what he later (1965) called an "intelligence explosion," a particular mechanism for rapid transition from artificial general intelligence to dangerous machine takeover:

Once a machine is designed that is good enough… it can be put to work designing an even better machine. At this point an "explosion" will clearly occur; all the problems of science and technology will be handed over to machines and it will no longer be necessary for people to work. Whether this will lead to a Utopia or to the extermination of the human race will depend on how the problem is handled by the machines. The important thing will be to give them the aim of serving human beings.

(Also see Good 1962, 1965, 1970; Vinge 1992, 1993; Yudkowsky 2008.)

1966: A military arms race for machine superintelligence could accelerate machine takeover; convergence toward a singleton is likely. Dennis Feltham Jones' 1966 novel Colossus depicted what may be a particularly likely scenario: two world superpowers (the USA and USSR) are in an arms race to develop superintelligent computers, one of which self-improves enough to take control of the planet.

In the same year, Cade (1966) argued the same thing:


political leaders on Earth will slowly come to realize... that intelligent machines having superhuman thinking ability can be built. The construction of such machines, even taking into account all the latest developments in computer technology, would call for a major national effort. It is only to be expected that any nation which did put forth the financial and physical effort needed to build and programme such a machine, would also attempt to utilize it to its maximum capacity, which implies that it would be used to make major decisions of national policy. Here is where the awful dilemma arises. Any restriction to the range of data supplied to the machine would limit its ability to make effective political and economic decisions, yet if no such restrictions are placed upon the machine's command of information, then the entire control of the nation would virtually be surrendered to the judgment of the robot.

On the other hand, any major nation which was led by a superior, unemotional intelligence of any kind, would quickly rise to a position of world domination. This by itself is sufficient to guarantee that, sooner or later, the effort to build such an intelligence will be made — if not in the Western world, then elsewhere, where people are more accustomed to iron dictatorships.

...It seems that, in the foreseeable future, the major nations of the world will have to face the alternative of surrendering national control to mechanical ministers, or being dominated by other nations which have already done this. Such a process will eventually lead to the domination of the whole Earth by a dictatorship of an unparalleled type — a single supreme central authority.


(This last paragraph also argues for convergence toward what Bostrom later called a "singleton.")

(Also see Ellison 1967.)

1970: Proposal for an association that analyzes the implications of machine superintelligence; naive control solutions like "switch off the power" may not work because the superintelligence will outsmart us, thus we must focus on its motivations; possibility of "pointless" optimization by machine superintelligence. Good (1970) argues:

Even if the chance that the ultraintelligent machine will be available [soon] is small, the repercussions would be so enormous, good or bad, that it is not too early to entertain the possibility. In any case by 1980 I hope that the implications and the safeguards will have been thoroughly discussed, and this is my main reason for airing the matter: an association for considering it should be started.

(Also see Bostrom 1997.)

On the idea that naive control solutions like "switch off the power" may not work because the superintelligence will find a way to outsmart us, and thus we must focus our efforts on the superintelligence's motivations, Good writes:

Some people have suggested that in order to prevent the [ultraintelligent machine] from taking over we should be ready to switch off its power supply. But it is not as simple as that because the machine could recommend the appointment of its own operators, it could recommend that they be paid well and it could select older men who would not be worried about losing their jobs. Then it could replace its operators by robots in order to make sure that it is not switched off. Next it could have the neo-Luddites ridiculed by calling them Ludditeniks, and if necessary it would later have them imprisoned or executed. This shows how careful we must be to keep our eye on the "motivation" of the machines, if possible, just as we should with politicians.

(Also see Yudkowsky 2008.)

Good also outlines one possibility for "pointless" goal-optimization by machine superintelligence:

If the machines took over and men became redundant and ultimately extinct, the society of machines would continue in a complex and interesting manner, but it would all apparently be pointless because there would be no one there to be interested. If machines cannot be conscious there would be only a zombie world. This would perhaps not be as bad as in many human societies where most people have lived in misery and degradation while a few have lived in pomp and luxury. It seems to me that the utility of such societies has been negative (while in the condition described) whereas the utility of a zombie society would be zero and hence preferable.

(Also see Bostrom 2004; Yudkowsky 2008.)

1974: We can't much predict what will happen after the creation of machine superintelligence. Julius Lukasiewicz (1974) writes:

The survival of man may depend on the early construction of an ultraintelligent machine-or the ultraintelligent machine may take over and render the human race redundant or develop another form of life. The prospect that a merely intelligent man could ever attempt to predict the impact of an ultraintelligent device is of course unlikely but the temptation to speculate seems irresistible.

(Also see Vinge 1993.)

1977: Self-improving AI could stealthily take over the internet; convergent instrumental goals in AI; the treacherous turn. Though the concept of a self-propagating computer worm was introduced by John Brunner's The Shockwave Rider (1975), Thomas J. Ryan's novel The Adolescence of P-1 (1977) tells the story of an intelligent worm that at first is merely able to learn to hack novel computer systems and use them to propagate itself, but later (1) has novel insights on how to improve its own intelligence, (2) develops convergent instrumental subgoals (see Bostrom 2012) for self-preservation and resource acquisition, and (3) learns the ability to fake its own death so that it can grow its powers in secret and later engage in a "treacherous turn" (see Bostrom forthcoming) against humans.

1982: To design ethical machine superintelligence, we may need to design superintelligence first and then ask it to solve philosophical problems (including ethics). Good (1982) writes:

Unfortunately, after 2500 years, the philosophical problems are nowhere near solution. Do we need to solve these philosophical problems before we can design an adequate ethical machine, or is there another approach? One approach that cannot be ruled out is first to produce an ultra-intelligent machine and then ask it to solve philosophical problems.

1988: Even though AI poses an existential threat, we may need to rush toward it so we can use it to mitigate other existential threats. Moravec (1988, pp. 100-101) writes:

...intelligent machines... threaten our existence... Machines merely as clever as human beings will have enormous advantages in competitive situations... So why rush headlong into an era of intelligent machines? The answer, I believe, is that we have very little choice, if our culture is to remain viable... The universe is one random event after another. Sooner or later an unstoppable virus deadly to humans will evolve, or a major asteroid will collide with the earth, or the sun will expand, or we will be invaded from the stars, or a black hole will swallow the galaxy. The bigger, more diverse, and competent a culture is, the better it can detect and deal with external dangers. The larger events happen less frequently. By growing rapidly enough, a culture has a finite chance of surviving forever.

1993: Physical confinement is unlikely to constrain superintelligences, for superintelligences will outsmart us. Vinge (1993) writes:

I argue that confinement [of superintelligent machines] is intrinsically impractical. For the case of physical confinement: Imagine yourself confined to your house with only limited data access to the outside, to your masters. If those masters thought at a rate — say — one million times slower than you, there is little doubt that over a period of years (your time) you could come up with "helpful advice" that would incidentally set you free...

After 1993. The extropians mailing list was launched in 1991, and was home to hundreds of discussions in which many important new ideas were proposed — ideas later developed in the public writings of Bostrom, Yudkowsky, Goertzel, and others. Unfortunately, the discussions from before 1998 were private, by agreement among subscribers. The early years of the archive cannot be made public without getting permission from everyone involved — a nearly impossible task. I have, however, collected all posts I could find from 1998 onward and uploaded them here (link fixed 04-03-2012).

I will end this post here. Perhaps in a future post I will extend the timeline past 1993, when interest in the subject grew and the number of new ideas generated per decade rapidly increased.


Journals that may publish articles on AI risk

12 lukeprog 28 March 2012 06:47AM

http://tinyurl.com/AI-risk-journals is a continuously updated spreadsheet of journals that may accept articles related to AI risk.

Maintained with help from Jonathan Wang.

Call for Papers on AI/robot safety

6 lukeprog 27 March 2012 06:31PM

The open-access Journal of Robotics has posted the Call for Papers for an upcoming "Special Issue on Robotic Safety & Security."

One of the guest editors for this issue is Roman Yampolskiy, a past SI visiting fellow and author or co-author of several papers on AI risk: Leakproofing the Singularity, Artificial General Intelligence and the Human Mental Model, and Safety Engineering for Artificial General Intelligence.

Check the PDF for full details, but:

  • Manuscripts due June 8, 2012
  • Reviews on August 31, 2012
  • Will be published October 26, 2012
  • Read the author guidelines

Because the Journal of Robotics is an open-access journal, it charges an "article processing fee" of $500 to cover its costs (details). You are only charged if your submission is accepted and printed by the journal.

Update: The Singularity Institute will reimburse you for your article processing fee if we think the article you're submitting is worthwhile. Contact luke [at] singularity.org for details.


Facing the Intelligence Explosion discussion page

20 lukeprog 26 November 2011 08:05AM

I've created a new website for my ebook Facing the Intelligence Explosion:


Sometime this century, machines will surpass human levels of intelligence and ability, and the human era will be over. This will be the most important event in Earth’s history, and navigating it wisely may be the most important thing we can ever do.

Luminaries from Alan Turing and Jack Good to Bill Joy and Stephen Hawking have warned us about this. Why do I think they’re right, and what can we do about it?

Facing the Intelligence Explosion is my attempt to answer those questions.



This page is the dedicated discussion page for Facing the Intelligence Explosion.

If you'd like to comment on a particular chapter, please give the chapter name at the top of your comment so that others can more easily follow it. For example:

Re: From Skepticism to Technical Rationality

Here, Luke neglects to mention that...
