FHI has released a new tech report:

Armstrong, Bostrom, and Shulman. Racing to the Precipice: a Model of Artificial Intelligence Development.

Abstract:

This paper presents a simple model of an AI arms race, where several development teams race to build the first AI. Under the assumption that the first AI will be very powerful and transformative, each team is incentivized to finish first — by skimping on safety precautions if need be. This paper presents the Nash equilibrium of this process, where each team takes the correct amount of safety precautions in the arms race. Having extra development teams and extra enmity between teams can increase the danger of an AI-disaster, especially if risk taking is more important than skill in developing the AI. Surprisingly, information also increases the risks: the more teams know about each others’ capabilities (and about their own), the more the danger increases.

The paper is short and readable; discuss it here!

But my main reason for posting is to ask this question: What is the most similar work that you know of? I'd expect people to do this kind of thing for modeling nuclear security risks, and maybe other things, but I don't happen to know of other analyses like this.


But my main reason for posting is to ask this question: What is the most similar work that you know of?

It's not tremendously similar, but for some reason I thought of the Diamond-Dybvig model of bank runs as a (distant) analogy. It has multiple equilibria: everyone might take money in & out of the bank as usual, or a bank run might kick off. The AI risk equivalent, I guess, would be a model where either every development team exercises optimal caution (whatever that would be), or every team rushes to be first. That said, I don't know whether any realistic-ish model of AI development would have those particular equilibria.
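To make the two-equilibria idea concrete, here is a minimal sketch of a two-team coordination game. The payoff numbers are purely hypothetical, chosen so that both "everyone is careful" and "everyone rushes" are stable; they are not taken from the FHI paper or from Diamond-Dybvig, and the function name is made up for illustration.

```python
from itertools import product

ACTIONS = ("careful", "rush")

# Hypothetical payoffs as (team A, team B); illustrative numbers only,
# not derived from the FHI paper or from the Diamond-Dybvig model.
PAYOFFS = {
    ("careful", "careful"): (4, 4),
    ("careful", "rush"): (0, 3),
    ("rush", "careful"): (3, 0),
    ("rush", "rush"): (1, 1),
}


def pure_nash_equilibria(payoffs, actions=ACTIONS):
    """Return every action pair where neither team gains by deviating alone."""
    equilibria = []
    for a, b in product(actions, repeat=2):
        u_a, u_b = payoffs[(a, b)]
        # Team A cannot do better by switching while B stays put.
        a_ok = all(payoffs[(alt, b)][0] <= u_a for alt in actions)
        # Team B cannot do better by switching while A stays put.
        b_ok = all(payoffs[(a, alt)][1] <= u_b for alt in actions)
        if a_ok and b_ok:
            equilibria.append((a, b))
    return equilibria


if __name__ == "__main__":
    print(pure_nash_equilibria(PAYOFFS))
```

With these made-up payoffs, the mutual-caution and mutual-rush outcomes are both self-reinforcing, which is the bank-run-like structure described above. Whether a realistic model of AI development has payoffs of this shape is exactly the open question.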

As for the FHI paper, I'm glad its abstract mentions the model's prediction that more information can increase the risk. That's a cute result.

I wonder what'd happen in a model that incorporates time passing over multiple rounds. The teams' decisions in each round could expose information about their judgements of capabilities & risks. Might lead to an intractable model, though.


My real concern would be the lack of understanding of what precautions are required, and of how they would be implemented.

If a corporation decided to enter the race for a true AI, then it wouldn’t be surprising if they got the AI researchers to work with zero safeguards while reassuring them that a separate team was managing the risk. This external team may well have a fantastic emergency protocol with all sorts of remote kill switches at power outlets, etc., but if a true AI were developed, there is no guarantee it could be controlled by such external measures.

I just don’t believe that a corporation undertaking a project like this would understand the risk of this, or they would mistakenly believe they had the risk under complete control.

People have predicted that corporations will be amoral, ruthless psychopaths too. This is what you get when you leave things like reputations out of your models.

Skimping on safety features can save you money. However, a reputation for privacy breaches, security problems and accidents doesn't do you much good. Why model the first effect while ignoring the second one? Oh yes: the axe that needs grinding.

Reputational concerns apply to psychopaths too, and that's why not all of them turn violent. However, that doesn't prevent all of them from turning violent.

The point I was trying to make was more along the lines that choosing which parameters to model allows you to control the outcome you get. Those who want to recruit people to causes associated with preventing the coming robot apocalypse can selectively include competitive factors, and ignore factors leading to cooperation - in order to obtain their desired outcome.

Today, machines are instrumental in killing lots of people, but many of them also have features like air bags and bumpers, which show that the manufacturers and their customers are interested in safety features - and not just retail costs. Skipping safety features has disadvantages - as well as advantages - to the manufacturers involved.

There are techniques for managing reputation, and those techniques are also amoral. For example, a powerful psychopath caring about his reputation may use legal threats and/or assassination against people who want to report about his evil acts. Alternatively, he may spread false rumors about his competitors. He may pay or manipulate people to create a positive image of him.

Just because reputation is in play does not guarantee that the results will be moral.

A salient example is the many cover-ups the USSR was involved in. They even attempted to cover up Chernobyl (there was no internal news about it for 3 days, even though people were needlessly exposed to radiation).

Aha. That is the reason they failed with Skynet.

OK, joking aside. From the paper (it is really short) I see that in the safest case (two teams, neither aware of the other's capability or of its own, with capability higher than enmity) the risk is 0. But this is due to a simplification and is only a first-order approximation.

If we structure AI development so that AI research must be registered and all communication must go through the AI authority (OK, that might be circumvented, but it at least reduces risk), then we may arrive at the zero case above.

But it is not really zero. I'm interested in the exact value as that might still be too high.

Note that I think the capability $e$ will most likely exceed the enmity $\mu$, because the risk of AI failure is so high.


It'll be nice to have this to point to whenever people complain about MIRI being too secretive.

[This comment is no longer endorsed by its author]