At the recent London meet-up someone (I'm afraid I can't remember who) suggested that one might be able to solve the Friendly AI problem by building an AI whose concerns are limited to some small geographical area, and which doesn't give two hoots about what happens outside that area. Ciphergoth pointed out that this would probably result in the AI converting the rest of the universe into a factory to make its small area more awesome. In the process, he mentioned that you can make a "fun game" out of figuring out ways in which proposed utility functions for Friendly AIs can go horribly wrong. I propose that we play.
Here's the game: reply to this post with proposed utility functions, stated as formally, or at least as precisely, as you can manage; follow-up comments then explain why a super-human intelligence built with that particular utility function would do things that turn out to be hideously undesirable.
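To give a flavour of what "stated formally" might look like, here's a minimal, purely illustrative sketch of the geographically bounded proposal from the opening paragraph. Everything in it (the names, the two-number world model) is invented for this example and isn't anyone's actual design:

```python
from dataclasses import dataclass

@dataclass
class WorldState:
    # Invented two-number summary of the outcomes the AI can influence.
    awesomeness_inside_area: float   # how good things are inside the chosen region
    mass_energy_outside_area: float  # resources left untouched outside the region

def bounded_area_utility(world: WorldState) -> float:
    """Utility depends only on the designated area; everything outside it
    contributes exactly nothing, good or bad."""
    return world.awesomeness_inside_area

# Ciphergoth's failure mode falls straight out of the definition: the rest of
# the universe never appears in the utility, so a plan that strip-mines it to
# boost the in-area term is never penalised.
factory_plan   = WorldState(awesomeness_inside_area=1e12, mass_energy_outside_area=0.0)
leave_it_alone = WorldState(awesomeness_inside_area=1.0,  mass_energy_outside_area=1e50)
assert bounded_area_utility(factory_plan) > bounded_area_utility(leave_it_alone)
```

Proposals pitched at even this level of explicitness tend to make their failure modes much easier to spot than purely verbal descriptions do.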
There are three reasons I suggest playing this game. In descending order of importance, they are:
- It sounds like fun.
- It might help to convince people that the Friendly AI problem is hard(*).
- We might actually come up with something that's better than anything anyone's thought of before, or something where the proof of Friendliness is within grasp - the solutions to difficult mathematical problems often look obvious in hindsight, and it surely can't hurt to try.
Here's an example of the kind of follow-up I have in mind, replying to a proposal whose utility function has an upper bound:

The AI that you designed finds a way to wirehead itself, achieving that upper bound in a manner you didn't anticipate and decisively wrecking itself in the process. What remains is a little orgasmic loop at the center of a pile of wreckage. Unfortunately, the surrounding components are not passive or "off". They were originally designed by a team of humans to be parts of a smart entity, and then modified by a smart entity in a peculiar and nonintuitive way. Their "blue screen of death" behavior is less a clean halt than an ecosystem: replicator dynamics take over, creating several new selfish species.
Why would an AI wirehead itself to short-circuit its utility function? Beings governed by a utility function don't want to trick themselves into believing that they have optimized the world into a state with higher utility; they want to actually optimize the world into such a state.
If I want to save the world, I don't wirehead because that wouldn't save the world.
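To make the distinction concrete, here's a toy sketch, entirely invented for this post, contrasting an agent that ranks actions by the actual resulting world state with one whose goal has (through a design flaw) collapsed to the reading on its own utility meter:

```python
def true_utility(world_state: float) -> float:
    # Utility of the world as it actually is.
    return world_state

def reported_utility(world_state: float, meter_tampering: float) -> float:
    # What the agent's internal utility meter says after any self-modification.
    return world_state + meter_tampering

# Each action maps to (resulting world state, resulting meter tampering).
actions = {
    "save the world": (10.0, 0.0),
    "wirehead":       (0.0, 1e9),   # world untouched, meter pegged at the top
}

# An agent whose goal is the world itself ranks actions by true utility...
world_optimizer = max(actions, key=lambda a: true_utility(actions[a][0]))

# ...whereas only an agent whose goal is the meter reading prefers to wirehead.
meter_optimizer = max(actions, key=lambda a: reported_utility(*actions[a]))

print(world_optimizer)  # save the world
print(meter_optimizer)  # wirehead
```

Which of these two agents a given design actually produces is exactly the sort of thing this game is meant to drag into the open.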