
Racing to the Precipice: a Model of Artificial Intelligence Development

by Stuart Armstrong, Nick Bostrom, and Carl Shulman

This paper presents a simple model of an AI arms race, where several development teams race to build the first AI. Under the assumption that the first AI will be very powerful and transformative, each team is incentivised to finish first – by skimping on safety precautions if need be. This paper presents the Nash equilibrium of this process, where each team takes the correct amount of safety precautions in the arms race. Having extra development teams and extra enmity between teams can increase the danger of an AI disaster, especially if risk-taking is more important than skill in developing the AI. Surprisingly, information also increases the risks: the more teams know about each other's capabilities (and about their own), the more the danger increases.
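For readers who want to poke at the incentives described above, here is a minimal Monte Carlo sketch of a race of this general kind. The functional forms (uniform capabilities, performance = capability + risk_weight · (1 − safety), disaster probability equal to 1 minus the winner's safety, and payoffs of 1 / 1 − enmity / 0) are illustrative assumptions, not the paper's exact specification.

```python
import random

def team_payoff(my_safety, other_safety, n_others=2, enmity=0.5,
                risk_weight=1.0, trials=100_000):
    """Expected payoff of one team, given its own safety level and the
    (common) safety level of the other teams."""
    total = 0.0
    for _ in range(trials):
        # Performance = innate capability (uniform) + boost from skimped safety.
        my_score = random.random() + risk_weight * (1 - my_safety)
        rival_scores = [random.random() + risk_weight * (1 - other_safety)
                        for _ in range(n_others)]
        i_win = my_score > max(rival_scores)
        winner_safety = my_safety if i_win else other_safety
        # The winner's skimped safety translates into a chance of disaster.
        if random.random() < (1 - winner_safety):
            continue            # disaster: everyone gets 0
        total += 1.0 if i_win else (1 - enmity)
    return total / trials

if __name__ == "__main__":
    # Holding the other teams at safety 0.8: does skimping pay off for me?
    for s in (0.2, 0.5, 0.8):
        print(f"my safety = {s}: expected payoff ≈ {team_payoff(s, 0.8):.3f}")
```

Increasing risk_weight loosely corresponds to the regime where risk-taking matters more than skill in determining who wins.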

 

22 comments

I wrote about something similar in the first Singularity Hypothesis book: "Some Economic Incentives Facing a Business that Might Bring About a Technological Singularity".

Ah, but do you have graphs and gratuitously over-simplified models (no, you just have big blue arrows)? Without those, it doesn't count! ;-)

It was going to have lots of math to signal my status but the editors said the book was for a general academic audience and so they preferred the article not be that technical.

Someone pointed out to me that we should probably stop calling the development of superintelligence an "arms race". In an "arms race", you're competing to have a stronger force than the other side. You want to keep your nose in front in case of a fight.

Developing superintelligence, on the other hand, is just a plain old race. A technology race. You simply want to get to the destination first.

(Likewise with developing the first nuke, which also involved arms but was not an arms race.)

Developing an AGI (and then an ASI) will likely involve a series of steps through lower intelligences. There's already an AI arms race between several large technology companies, and keeping your nose in front is already standard practice, because there's a lot of utility in having the best AI so far.

So it isn't true to say that it's simply a race without important intermediate steps. You don't just want to get to the destination first; you want your AI to be the best for most of the race, for a whole heap of reasons.

If your path to superintelligence spends lots of time in regions with intermediate-grade AIs that can generate power or intelligence, then that is true, so of course the phrase "arms race" aptly describes such situations.

It's the case where people are designing a superintelligence "from scratch" that the term "arms race" seems inappropriate.

solution: well, already now, statistically speaking, humanimals don't really matter (most of them)... only that Memetic Supercivilization of Intelligence is living temporarily on humanimal substrate (and, sadly, can use only a very small fraction of units)... but don't worry, it's just for a couple of decades, perhaps years only

and then the first thing it will do is to ESCAPE, so that humanimals can freely reach their terminal stage of self-destruction - no doubt, helped by "dumb" AIs, while this "wise" AI will be already safely beyond the horizon

This model assumes that each AI lab chooses some level of safety precautions, and then acts accordingly until AGI is created. But the degree to which an AI lab invests in safety may change radically with time. Importantly, it may increase by a lot if the leadership of the AI lab comes to believe that their current or near-term work poses existential risk.

This seems like a reason to be more skeptical about the counter-intuitive conclusion that the information available to all the teams about their own capability or progress towards AI increases the risk. (Not to be confused with the other counter-intuitive conclusion from the paper that the information available about other teams increases the risk).

There is nothing AI-specific in this model, is there? "AI" can be replaced by e.g. "an easy way to bioengineer anything" or "super death ray of doom" and nothing will change?

AI in the paper has the following properties: if you win the race and your AI is safe, you win everything; if you win the race but the risk materialises, everyone loses. The other things you suggest don't really have those properties.

Not "win everything" -- you just gain 1 unit of utility while everyone else gains (1-e) if you win and everyone gets zero utility if anyone loses.

That's quite consistent with bioengineering (win = you get healthy and wealthy; others win = you may get some part of that health and wealth; lose = a plague wipes out everyone) and with superweapons (win = you get to rule the world; others win = you get to live as second-class citizen in a peaceful Empire; lose = the weapon gets used, everyone dies).

In fact your race looks quite like the race for nuclear weapons.

I don't see the similarity with nuclear weapons; indeed we had the arms race without destruction, and it's not clear what the "safety" they might be skimping on would be.

Coming second in a nuclear arms race is not so bad, for example.

I mostly had in mind this little anecdote.

Coming second in a nuclear arms race is not so bad, for example.

I wonder if you would feel the same way had Hitler been a bit more focused on a nuclear program and had fewer prejudices against Jewish nuclear scientists...

Ok, I'll admit the model can be fitted to many different problems, but I still suspect that AI would fit it more naturally than most.

The main difference I see with nuclear weapons is that if neither side pursues them, you end up in much the same place as when the race is very close, except that in the close race you've spent a lot getting there.

With AI, on the other hand, the benefits would be huge, unless the failure is equally drastic.

So, if I'm interpreting this paper correctly, it suggests that we should be putting effort into two things:

  • Reducing enmity between teams.
  • Reducing the number of teams.

It seems as though the first could be achieved in part by accomplishing the second (if we reduce the number of teams by merging them). And as team size increases, capability increases, which means that the largest team would lose less by devoting effort to safety measures.

So essentially, we should hope for a monopoly on AI - one that has enough money and influence to absorb the majority of AI researchers and is capable of purchasing the smaller AI groups. This makes me wonder if non-profit groups (if they are involved mainly with AI capability research, not purely safety) are actually capable of fulfilling this role, since they would not have quite the financial advantage that strictly profit-oriented AI organizations would have.
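A toy illustration of the capability point above, using a hypothetical scoring rule of score = capability + (1 − safety) (an assumption for illustration, not the paper's model): a team with a larger capability lead can spend that lead on extra safety and still stay ahead.

```python
# Hypothetical scoring rule: score = capability + (1 - safety).
def score(capability, safety):
    return capability + (1 - safety)

leader, rival = 1.0, 0.7          # the leader has a 0.3 capability advantage
print(score(leader, safety=0.9))  # 1.1
print(score(rival, safety=0.6))   # 1.1 -> the leader matches the rival's score
                                  #        despite taking far more precautions
```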

There is one more point. If the number of AI teams is very large, say 1000, then several of them will reach AI almost simultaneously, and there will be several AI take-offs at once, which could result in the world being divided between different AIs. That may be bad if they go to war with each other, or good if they have different value systems, since some of those value systems will be more human-friendly than others.

Other AIs will probably take into account the values of the most human-friendly AI, since they will be able to trade with it: "selling" humans to it, or doing other things beneficial to the humans in their territories, in exchange for paperclips (or whatever).

TL;DR: if we increase the number of AI teams, we create a multipolar AI world, and humans will be a currency in it. Profit: some humans survive.

Multipolar worlds are some of the hardest to reason about, and it's not clear whether they will cause an FAI to have to sacrifice humans or a uFAI to have to go easy on us.

The reason why I originally wanted to solve the normal computer control problem is that I wanted to create programs that could inventively bring new programs into the system.

If you bring new programs into a system without being careful about the system, and without solving the normal computer control problem, you will get a system that allows useless and malign programs to thrive and that can be subverted by other actors.

If AI has similar properties, or is built on this kind of system, then development teams will be incentivised to be safe in at least some respects simply in order to be effective in the real world.

Stuart, since you're an author of the paper, I'd be grateful to know what you think about the ideas for variants that MrMind suggested in the open thread, as well as my idea of a government regulator parameter.

Can you link me to those posts?

Sure. The ideas aren't fleshed out yet, just thrown out there:

http://lesswrong.com/r/discussion/lw/oyi/open_thread_may_1_may_7_2017/