I invite your feedback on this snippet from the forthcoming Friendly AI FAQ. This one is an answer to the question "What is Friendly AI?"

_____

 

A Friendly AI (FAI) is an artificial intelligence that benefits humanity. More specifically, Friendly AI may refer to:

  • a very powerful and general AI that acts autonomously in the world to benefit humanity.
  • an AI that continues to benefit humanity during and after an intelligence explosion.
  • a research program concerned with the production of such an AI.
  • Singularity Institute's approach (Yudkowsky 2001, 2004) to designing such an AI:
    • Goals should be defined by the Coherent Extrapolated Volition of humanity.
    • Goals should be reliably preserved during recursive self-improvement.
    • Design should be mathematically rigorous and proof-apt.

Friendly AI is a more difficult project than often supposed. As explored in other sections, commonly suggested solutions for Friendly AI are likely to fail because of two features possessed by any superintelligence:

  1. Superpower: a superintelligent machine will have unprecedented powers to reshape reality, and therefore will achieve its goals with highly efficient methods that confound human expectations and desires.

  2. Literalness: a superintelligent machine will make decisions using the mechanisms it is designed with, not the hopes its designers had in mind when they programmed those mechanisms. It will act only on precise specifications of rules and values, and will do so in ways that need not respect the complexity and subtlety (Kringelbach & Berridge 2009; Schroeder 2004; Glimcher 2010) of what humans value. A demand like "maximize human happiness" sounds simple to us because it contains few words, but philosophers and scientists have failed for centuries to explain exactly what this means, and certainly have not translated it into a form sufficiently rigorous for AI programmers to use.

New Comment
11 comments, sorted by Click to highlight new comments since:
[-][anonymous]40

I don't think that "benefits humanity" explains anything any more than "friendly". Quite the contrary, I can imagine someone being friendly. I can not, however, think of anything that would benefit all of humanity.

Perhaps, but I cannot briefly explain all of metaethics, especially since we haven't solved it yet. But I'm open to suggestions on how this could be clearer while remaining brief.

Eliminate acne.

The "features possessed by any superintelligence" paragraphs seem particularly vulnerable to antagonistic reading:

Superpower: a superintelligent machine will have unprecedented powers to reshape reality, and therefore will achieve its goals with highly efficient methods that confound human expectations and desires.

Superintelligent machines have mystical powers allowing them to transcend laws of physics and reality.

It will act only on precise specifications of rules and values, and will do so in ways that need not respect the complexity and subtlety

It will refuse to do anything unless rules are precise. It will ignore details of designers' specifications out of spite.

"Superpower", while logical enough, is a bad word to use because of its associations with low-status fiction.

ETA: for an alternative, I think just "power" would work in this context. Then you could change "powers to reshape reality" to "ability to reshape reality".

A Friendly

A Friendly AI (FAI) is an artificial intelligence that benefits humanity. It is contrasted with Unfriendly AI, which includes both Malicious AI and Uncaring AI.

Goals should be defined by the Coherent Extrapolated Volition of humanity.

Goals should be defined by the aggregating the desires of humanity in a fair way, with special attention paid to the fact that people's current dispositions reflect untrue beliefs about the world. How to do this is an unsolved problem for which Coherent Extrapolated Volition is an incomplete outline of a theory.

Superpower

Superpower: a superintelligent machine will (would?) have unprecedented powers to reshape reality, and therefore will achieve its goals with highly efficient methods that confound human expectations and desires. Such a machine's actions would be difficult to predict because to reliably predict the way something will solve a problem, one must be at least approximately as smart as the problem solver.

Literalness

...

In general, human language and thought is designed to explain things to and interface with other humans that have similar ways of thinking. Designing an AI that does not have human failings like hate yet does understand what humans care for is an unprecedented challenge. This step, describing the goal system to the AI, builds on the solution of determining what is a fair goal system to give the AI. It conceptually precedes ensuring that the goal system remains stable under recursive self improvement.

Thanks.

"Literalness" is explained in sufficient detail to get a first idea of the connection to FAI, but "Superpower" is not.

In which stage of the FAQ is that supposed to be ? For people not used to singularity concepts, things like "intelligence explosion" will not mean much. Maybe the FAQ should first refer to a more general FAQ about the singularity ? Or include a link to it from "intelligence explosion" ?

For people not used to singularity concepts, things like "intelligence explosion" will not mean much.

Terrorist attack on MENSA!

I think this:

an AI that continues to benefit humanity during and after an intelligence explosion

is redundant with this:

Goals should be reliably preserved during recursive self-improvement.

and could be omitted.