A line in the wiki article on "paperclip maximizer" caught my attention:
"the notion that life is precious is specific to particular philosophies held by human beings, who have an adapted moral architecture resulting from specific selection pressures acting over millions of years of evolutionary time."
Why don't we set up an evolutionary system in which valuing other intelligences, cooperating with them, and retaining those values across self-improvement iterations would be selected for?
A specific plan:
Simulate an environment with a large number of AI agents competing for resources. Access to those resources allows an agent to perform a self-improvement iteration. Rig the environment so that success requires cooperating with other intelligences of the same or a lower level. Repopulate the next environment with copies of the successful agents. Over a sufficient number of generations, this should select for agents that value other intelligences and preserve those values through self-modification.
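As a toy illustration of the loop being proposed (not a serious implementation; the agent model, payoff structure, and mutation scheme here are all hypothetical stand-ins), the selection dynamic might be sketched like this:

```python
import random

class Agent:
    """A hypothetical agent with a heritable cooperation propensity."""
    def __init__(self, coop=0.5, level=1):
        self.coop = coop    # propensity to cooperate: the trait under selection
        self.level = level  # number of "self-improvement" iterations achieved

def run_round(agents, rng):
    """Pair agents at random; resources flow only to mutually cooperating pairs."""
    rng.shuffle(agents)
    for a, b in zip(agents[::2], agents[1::2]):
        # Success requires cooperation with another intelligence
        if rng.random() < a.coop and rng.random() < b.coop:
            for x in (a, b):
                x.level += 1  # resource access funds one self-improvement step

def next_generation(agents, size, rng):
    """Repopulate from the most successful agents, with small mutations."""
    agents.sort(key=lambda a: a.level, reverse=True)
    parents = agents[: max(2, size // 4)]
    return [
        Agent(coop=min(1.0, max(0.0, rng.choice(parents).coop + rng.gauss(0, 0.05))))
        for _ in range(size)
    ]

def evolve(generations=50, size=100, rounds=10, seed=0):
    """Run the full loop; returns the population's mean cooperativeness."""
    rng = random.Random(seed)
    agents = [Agent(coop=rng.random()) for _ in range(size)]
    for _ in range(generations):
        for _ in range(rounds):
            run_round(agents, rng)
        agents = next_generation(agents, size, rng)
    return sum(a.coop for a in agents) / size
```

Under this payoff structure, mean cooperativeness tends to drift upward across generations, since only mutually cooperating pairs accumulate the resources that selection rewards. Of course, this toy model sidesteps exactly the hard parts raised below: defining "cooperation" robustly, and keeping the trait stable once agents can modify themselves.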
What do people think? I can see a few possible sources of error myself, but would like to hear your responses uncontaminated. [Given the importance of the topic, you can assume Crocker's rules are in effect.]
Defining the metric for cooperation robustly enough that you could unleash the resulting evolved AI on the real world might not be any easier than figuring out what an FAI's utility function should be directly.
Also, a sufficiently intelligent AI may be able to hijack the game before we could decide whether it was ready to be released.
At the recent London meet-up someone (I'm afraid I can't remember who) suggested that one might be able to solve the Friendly AI problem by building an AI whose concerns are limited to some small geographical area, and which doesn't give two hoots about what happens outside that area. Ciphergoth pointed out that this would probably result in the AI converting the rest of the universe into a factory to make its small area more awesome. In the process, he mentioned that you can make a "fun game" out of figuring out ways in which proposed utility functions for Friendly AIs can go horribly wrong. I propose that we play.
Here's the game: reply to this post with proposed utility functions, stated as formally, or at least as precisely, as you can manage; follow-up comments then explain why a super-human intelligence built with that particular utility function would do things that turn out to be hideously undesirable.
There are three reasons I suggest playing this game. In descending order of importance, they are: