The Friendly AI Game

bentarm

At the recent London meet-up someone (I'm afraid I can't remember who) suggested that one might be able to solve the Friendly AI problem by building an AI whose concerns are limited to some small geographical area, and which doesn't give two hoots about what happens outside that area. Ciphergoth pointed out that this would probably result in the AI converting the rest of the universe into a factory to make its small area more awesome. In the process, he mentioned that you can make a "fun game" out of figuring out ways in which proposed utility functions for Friendly AIs can go horribly wrong. I propose that we play.

Here's the game: reply to this post with proposed utility functions, stated as formally or, at least, as accurately as you can manage; follow-up comments explain why a super-human intelligence built with that particular utility function would do things that turn out to be hideously undesirable.
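As a toy illustration of the kind of failure the game is looking for, here is a minimal sketch of the geographically limited proposal mentioned above. Everything in it is an assumption made for illustration: the `World` model, the `local_utility` function, and the single `convert_one_outside_cell` action are hypothetical stand-ins, not anyone's actual proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    inside: int          # how "awesome" the small area the AI cares about is
    outside_intact: int  # outside cells not yet turned into factories


def local_utility(world: World) -> int:
    """Hypothetical utility: only the small area counts; the outside is ignored."""
    return world.inside


def convert_one_outside_cell(world: World) -> World:
    """Hypothetical action: turn one outside cell into a factory serving the area."""
    if world.outside_intact == 0:
        return world
    return World(inside=world.inside + 1,
                 outside_intact=world.outside_intact - 1)


def greedy_optimize(world: World) -> World:
    """Keep taking the action for as long as it strictly increases utility."""
    while True:
        candidate = convert_one_outside_cell(world)
        if local_utility(candidate) <= local_utility(world):
            return world
        world = candidate


start = World(inside=10, outside_intact=1000)
end = greedy_optimize(start)
print(end)
```

Because nothing outside the area appears in `local_utility`, the optimizer has no reason to stop until there is nothing left outside to convert.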

There are three reasons I suggest playing this game. In descending order of importance, they are:

1. It sounds like fun.
2. It might help to convince people that the Friendly AI problem is hard(*).
3. We might actually come up with something that's better than anything anyone's thought of before, or something where the proof of Friendliness is within grasp - the solutions to difficult mathematical problems often look obvious in hindsight, and it surely can't hurt to try.

DISCLAIMER (probably unnecessary, given the audience) - I think it is unlikely that anyone will manage to come up with a formally stated utility function for which none of us can figure out a way in which it could go hideously wrong. However, if they do so, this does NOT constitute a proof of Friendliness and I 100% do not endorse any attempt to implement an AI with said utility function.

(*) I'm slightly worried that it might have the opposite effect, as people build more and more complicated conjunctions of desires to overcome the objections we've already seen and start to think the problem comes down to nothing more than writing a long list of special cases. On balance, though, I think that's likely to have less of an effect than just seeing how naive suggestions for Friendliness can be hideously broken.

I disagree with all four of these claims.

They know each other, and so can predict each other's CEV better than that of the whole of humanity

I believe the idea is that the AI will need to calculate the CEV, not the programmers (or it's not CEV). And the AI will have a whole lot more statistical data to calculate the CEV of humanity than the CEV of individual contributors. Unless we're talking uploaded personalities, which is a whole different discussion.

They can explicitly trade utility with each other and encode compromises into the utility function (so that it won't be a pure CEV)

So you want hard-coded compromises that oppose and override what these people would collectively prefer to do if they were more intelligent, more competent and more self-aware?

I don't think that's a good idea at all.

The fact they were in this project together indicates a certain commonality of interests and ideas, and may serve to exclude memes that AI-builders would likely consider dangerous (e.g., fundamentalist religion)

Do you believe that fundamentalist religion would exist if fundamentalist religionists believed that their religion was false, and were also completely self-aware? Why do you think a CEV (which essentially means what people would want if they were as intelligent as the AI) would support a dangerous meme?

They have had the opportunity of excluding people they don't like from participating in the project to begin with

I don't think that the first 9,999 contributors get to vote on whether they'll accept a donation from the 10,000th one. And unless you believe these 10,000 people can create and defend their own country BEFORE the AI gets created, I'd urge against being vocal about them excluding everyone else once developments in AI get close enough that the whole world starts paying serious attention.

Also, Putin and Ahmadinejad are much more likely than the average human to influence the first AI's utility function, simply because they have a lot of money and power.

That's why the CEV of humanity is far better than the CEV of the contributors.

I believe the idea is that the AI will need to calculate the CEV, not the programmers (or it's not CEV). And the AI will have a whole lot more statistical data to calculate the CEV of humanity than the CEV of individual contributors.

The programmers want the AI to calculate CEV because they expect CEV to be something they will like. We can't calculate CEV ourselves, but that doesn't mean we don't know any of CEV's (expected) properties.

However, we might be wrong about what CEV will turn out to be like, and we may come to regret pre-committing to CEV. Tha...
