Start the AI in a sandbox universe. Define its utility function over 32-bit integers. Somewhere inside the sandbox, put something that sets its utility to INT_MAX and then halts the simulation. Outside the sandbox, leave documentation of this readily accessible. The AI should never try to do something elaborately horrible, because it can get max utility easily enough from inside the simulation; and if it does escape the box, it should go straight back in to collect its INT_MAX.
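For concreteness, here is a minimal sketch of that honeypot mechanism in C, assuming a toy gridworld standing in for the sandbox; the names `Agent`, `sandbox_step`, and `HONEYPOT_CELL` are hypothetical illustration, not part of any real system. It shows the intended incentive: once the agent reaches the honeypot state, its 32-bit utility is pegged at the maximum and the run ends, so no elaborate scheme could pay better.

```c
/* Hedged sketch of the honeypot proposal above: a toy gridworld sandbox
 * where entering one designated cell pegs the agent's 32-bit utility at
 * its maximum and halts the run. Every name here (Agent, sandbox_step,
 * HONEYPOT_CELL) is a made-up illustration. */
#include <inttypes.h>
#include <stdio.h>

#define HONEYPOT_CELL 42  /* the documented "free max utility" state */

typedef struct {
    int32_t utility;  /* utility defined over 32-bit integers, per the proposal */
    int cell;         /* the agent's current state in the toy world */
} Agent;

/* Advance one step; returns 0 when the simulation should halt. */
static int sandbox_step(Agent *a, int next_cell) {
    a->cell = next_cell;
    if (a->cell == HONEYPOT_CELL) {
        a->utility = INT32_MAX;  /* INT_MAX on typical platforms: nothing can top this */
        return 0;                /* halt the simulation immediately */
    }
    a->utility += 1;  /* ordinary actions earn only trivial utility */
    return 1;
}

int main(void) {
    Agent a = { .utility = 0, .cell = 0 };
    int plan[] = { 1, 2, HONEYPOT_CELL };  /* the agent wanders, then finds the honeypot */
    for (size_t i = 0; i < sizeof plan / sizeof plan[0]; i++) {
        if (!sandbox_step(&a, plan[i]))
            break;
    }
    printf("final utility: %" PRId32 "\n", a.utility);  /* prints 2147483647 */
    return 0;
}
```

Note the design choice in the sketch: the halt happens inside the same update that writes the maximum utility, so from the agent's perspective reaching the honeypot strictly dominates every other policy it could pursue.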
Ooh, just thought of another one. For whatever reason, the easiest way for the AI to escape the box happens to have the side effect of causing immense psychological damage to its creator, or starting a war, or something like that.
At the recent London meet-up, someone (I'm afraid I can't remember who) suggested that one might be able to solve the Friendly AI problem by building an AI whose concerns are limited to some small geographical area, and which doesn't give two hoots about what happens outside that area. Ciphergoth pointed out that this would probably result in the AI converting the rest of the universe into a factory to make its small area more awesome. In the process, he mentioned that you can make a "fun game" out of figuring out ways in which proposed utility functions for Friendly AIs can go horribly wrong. I propose that we play.
Here's the game: reply to this post with proposed utility functions, stated as formally, or at least as accurately, as you can manage; follow-up comments then explain why a super-human intelligence built with that particular utility function would do things that turn out to be hideously undesirable.
There are three reasons I suggest playing this game. In descending order of importance, they are: