At the recent London meet-up someone (I'm afraid I can't remember who) suggested that one might be able to solve the Friendly AI problem by building an AI whose concerns are limited to some small geographical area, and which doesn't give two hoots about what happens outside that area. Ciphergoth pointed out that this would probably result in the AI converting the rest of the universe into a factory to make its small area more awesome. In the process, he mentioned that you can make a "fun game" out of figuring out ways in which proposed utility functions for Friendly AIs can go horribly wrong. I propose that we play.
Here's the game: reply to this post with a proposed utility function, stated as formally, or at least as precisely, as you can manage; follow-up comments then explain why a superhuman intelligence built with that particular utility function would do things that turn out to be hideously undesirable.
There are three reasons I suggest playing this game. In descending order of importance, they are:
- It sounds like fun
- It might help to convince people that the Friendly AI problem is hard(*).
- We might actually come up with something better than anything anyone has thought of before, or something where a proof of Friendliness is within grasp. Solutions to difficult mathematical problems often look obvious in hindsight, and it surely can't hurt to try.
1: Define Descended People Years (DPY) as the number of years lived by all descendants of existing people.
2: Generate a searchable index of actions which can be taken to increase Descended People Years, along with an explanation, at an adjustable reading level, of why each action works.
3: Allow a full view of any DPY calculation, so that a result can be seen both as "Expected DPY gain X" and as "90% chance of DPY gain Y, 10% chance of DPY loss Z".
4: Allow humans to search this list, sorting by cost, descendant affected, action type, time required, complexity, Descended People Years gained, and ratios of those figures.
5: At no point will the FAI ever be given the power to implement any action itself.
6: The FAI is not itself considered a Descendant of People. (Although even if it were, it would still have to report dutifully in the explanations.)
7: All people are allowed to have access to this FAI.
8: All courses of action suggested by the FAI must obey the law (no suggesting the murder of one group at a cost of X DPY in order to gain a second group Y DPY), although it can suggest changing a law.
9: Allow people to focus calculations on a particular area, based on complexity or with restrictions, such as "I'd like to engage in an action that can boost DPY that I can complete in 5 minutes" or "I'd like to engage in an action that can boost DPY that does not require adding additional computational power to the FAI."
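To make items 3 and 4 concrete, here is a minimal sketch of how the "full view" of a DPY calculation and the ranked search might fit together. Everything here (the `Outcome`/`Action` structures and the example figures) is hypothetical illustration, not part of the proposal itself.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    probability: float   # chance this outcome occurs
    dpy_change: float    # DPY gained (negative means DPY lost)

@dataclass
class Action:
    name: str
    cost_dollars: float
    time_minutes: float
    outcomes: list       # full view: every Outcome, per item 3

    def expected_dpy(self) -> float:
        # Collapse the full outcome view into the single "Expected DPY gain X"
        return sum(o.probability * o.dpy_change for o in self.outcomes)

# Item 3's example shape: 90% chance of gaining Y=10, 10% chance of losing Z=20
action = Action(
    name="hypothetical action",
    cost_dollars=100.0,
    time_minutes=5.0,
    outcomes=[Outcome(0.9, 10.0), Outcome(0.1, -20.0)],
)
print(action.expected_dpy())  # 0.9 * 10 + 0.1 * (-20) = 7.0

# Item 4's search: rank candidate actions by expected DPY gained per dollar
ranked = sorted([action], key=lambda a: a.expected_dpy() / a.cost_dollars,
                reverse=True)
```

The point of keeping both views is that two actions with the same expected DPY can carry very different downside risk, which only the full outcome list reveals.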
So I could use the FAI to find out, for instance, that giving up soda will likely extend my Descended People Years by 0.2, at a savings to me of 2 dollars a day, along with an explanation of why it does so and extended calculations if desired.
But a president of a country could use the FAI to find out that signing a treaty with a foreign country to reduce nuclear weapons would likely gain 1,000,000 DPY, along with an explanation of why it does so and extended calculations if desired.