At the recent London meet-up someone (I'm afraid I can't remember who) suggested that one might be able to solve the Friendly AI problem by building an AI whose concerns are limited to some small geographical area, and which doesn't give two hoots about what happens outside that area. Ciphergoth pointed out that this would probably result in the AI converting the rest of the universe into a factory to make its small area more awesome. In the process, he mentioned that you can make a "fun game" out of figuring out ways in which proposed utility functions for Friendly AIs can go horribly wrong. I propose that we play.
Here's the game: reply to this post with proposed utility functions, stated as formally, or at least as accurately, as you can manage; follow-up comments explain why a super-human intelligence built with that particular utility function would do things that turn out to be hideously undesirable.
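To make the "stated formally" part concrete, here is roughly how one might write down the geographically localised idea from the meet-up (the region $R$ and the restriction notation are mine, not anything anyone actually wrote down):

$$U(x) = f\left(x\big|_R\right)$$

where $x$ is a world-state, $x\big|_R$ is its restriction to some fixed region $R$, and $f$ is whatever scoring function you like. Any two world-states that agree inside $R$ get exactly the same utility, so matter and energy outside $R$ are worth precisely as much to the AI as a factory for improving $R$, which is Ciphergoth's objection.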
There are three reasons I suggest playing this game. In descending order of importance, they are:
- It sounds like fun
- It might help to convince people that the Friendly AI problem is hard(*).
- We might actually come up with something that's better than anything anyone's thought of before, or something where the proof of Friendliness is within grasp: the solutions to difficult mathematical problems often look obvious in hindsight, and it surely can't hurt to try.
*puts hand up*
That was me with the geographically localised trial idea… though I don’t think I presented it as a definite solution. More of an ‘obviously this has been thought about BUT’. At least I hope that’s how I approached it!
My more recent idea was to give the AI a built-in prior never to consult, or try to work out the contents of, certain of its own files. Then put in those files the sorts of safeguards generally discussed and dismissed as not working (don't kill people, etc.), with the rule that if the AI breaks those rules, it shuts down. So it can't deliberately work around the safeguards; it can only run into them. This is similar to my other helpful suggestion at the London meet, which was 'leave its central computer exposed so that it can be crippled with a well-aimed gunshot'.
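A toy sketch of the shape I have in mind (every name here, file name included, is made up for illustration; this is obviously nothing like an actual design):

```python
# Toy sketch of the "sealed safeguards" idea: the planner never reads the
# rule file; a separate tripwire layer does, and halts everything on a breach.

import sys

SAFEGUARD_FILE = "safeguards.txt"  # the file the AI is never to consult

def load_safeguards(path=SAFEGUARD_FILE):
    """Read the sealed rules. Only the tripwire layer calls this, never the planner."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def plan_action(world_state):
    """Stand-in for the AI's planner; it has no access to load_safeguards()."""
    return max(world_state["options"], key=lambda a: a["expected_utility"])

def tripwire(action, safeguards):
    """Crude check: does the chosen action break any sealed rule?"""
    return any(rule in action["description"] for rule in safeguards)

def step(world_state):
    action = plan_action(world_state)
    if tripwire(action, load_safeguards()):
        sys.exit("Safeguard violated: shutting down.")  # run into the rule, don't reason about it
    return action
```

All the real weight sits on tripwire() actually recognising a violation, which is exactly the part that's usually dismissed as not working.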
Risks with the subconscious AI:
- Someone tampers with the secret files.
- It works out what will be in them by analysing us.
- If we try to make an improved one after it shuts down, the improved one will assume similar rules.
- We just don't cover all the possibilities of bad things it could do.
- It becomes obsessed with its dreams etc. and invents psychoanalysis.
NEVERTHELESS, I think it’s a pretty neat idea. ;-)
In the process of FOOMing, the AI builds another AI without those safeguards.