I invite your feedback on this snippet from the forthcoming Friendly AI FAQ. This one is an answer to the question "What is Friendly AI?"
_____
A Friendly AI (FAI) is an artificial intelligence that benefits humanity. More specifically, Friendly AI may refer to:
- a very powerful and general AI that acts autonomously in the world to benefit humanity.
- an AI that continues to benefit humanity during and after an intelligence explosion.
- a research program concerned with the production of such an AI.
- Singularity Institute's approach (Yudkowsky 2001, 2004) to designing such an AI:
  - Goals should be defined by the Coherent Extrapolated Volition of humanity.
  - Goals should be reliably preserved during recursive self-improvement.
  - Design should be mathematically rigorous and proof-apt.
Friendly AI is a more difficult project than is often supposed. As explored in other sections, commonly suggested solutions for Friendly AI are likely to fail because of two features possessed by any superintelligence:
- Superpower: a superintelligent machine will have unprecedented powers to reshape reality, and therefore will achieve its goals with highly efficient methods that confound human expectations and desires.
- Literalness: a superintelligent machine will make decisions using the mechanisms it is designed with, not the hopes its designers had in mind when they programmed those mechanisms. It will act only on precise specifications of rules and values, and will do so in ways that need not respect the complexity and subtlety (Kringelbach & Berridge 2009; Schroeder 2004; Glimcher 2010) of what humans value. A demand like "maximize human happiness" sounds simple to us because it contains few words, but philosophers and scientists have failed for centuries to explain exactly what this means, and certainly have not translated it into a form sufficiently rigorous for AI programmers to use.
Friendly AI is contrasted with Unfriendly AI, a category that includes both Malicious AI and Uncaring AI.

Defining goals by the Coherent Extrapolated Volition of humanity means aggregating the desires of humanity in a fair way, with special attention paid to the fact that people's current dispositions reflect untrue beliefs about the world. How to do this is an unsolved problem, for which Coherent Extrapolated Volition is the incomplete outline of a theory.

A superintelligent machine's actions would also be difficult to predict, because to reliably predict the way something will solve a problem, one must be at least approximately as smart as the problem solver.

...

In general, human language and thought are designed to explain things to, and interface with, other humans who have similar ways of thinking. Designing an AI that does not have human failings like hate, yet does understand what humans care about, is an unprecedented challenge. This step, describing the goal system to the AI, builds on the solution to the prior problem of determining what a fair goal system to give the AI would be. It conceptually precedes ensuring that the goal system remains stable under recursive self-improvement.
Thanks.