Work harder on tabooing "Friendly AI"

ChrisHallquist

This is an outgrowth of a comment I left on Luke's dialog with Pei Wang, and I'll start by quoting that comment in full:

Luke, what do you mean here when you say, "Friendly AI may be incoherent and impossible"?

The Singularity Institute's page "What is Friendly AI?" defines "Friendly AI" as "A "Friendly AI" is an AI that takes actions that are, on the whole, beneficial to humans and humanity." Surely you don't mean to say, "The idea of an AI that takes actions that are, on the whole, beneficial to humans and humanity may be incoherent or impossible"?

Eliezer's paper "Artificial Intelligence as a Positive and Negative Factor in Global Risk" talks about "an AI created with specified motivations." But it's pretty clear that that's not the only thing you and he have in mind, because part of the problem is making sure the motivations we give an AI are the ones we really want to give it.

If you meant neither of those things, what did you mean? "Provably friendly"? "One whose motivations express an ideal extrapolation of our values"? (It seems a flawed extrapolation could still give results that are on the whole beneficial, so this is different than the first definition suggested above.) Or something else?

Since writing that comment, I've managed to find two other definitions of "Friendly AI." One is from Armstrong, Sandberg, and Bostrom's paper on Oracle AI, which describes Friendly AI as: "AI systems designed to be of low risk." This definition is very similar to the definition from the Singularity Institute's "What is Friendly AI?" page, except that it incorporates the concept of risk. The second definition is from Luke's paper with Anna Salamon, which describes Friendly AI as "an AI with a stable, desirable utility function." This definition has the important feature of restricting "Friendly AI" to designs that have a utility function. Luke's comments about "rationally shaped" AI in this essay seem relevant here.

Neither of those papers seems to use the initial definition they give of "Friendly AI" consistently. Armstrong, Sandberg, and Bostrom's paper has a section on creating Oracle AI by giving it a "friendly utility function," which states, "if a friendly OAI could be designed, then it is most likely that a friendly AI could also be designed, obviating the need to restrict to an Oracle design in the first place."

This is a non-sequitur if "friendly" merely means "low risk," but it makes sense if they are actually defining Friendly AI in terms of a safe utility function: what they're saying then is if we can create an AI that stays boxed because of its utility function, we can probably create an AI that doesn't need to be boxed to be safe.

In the case of Luke's paper with Anna Salamon, the discussion on page 17 seems to imply that "Nanny AI" and "Oracle AI" are not types of Friendly AI. This is strange under their official definition of "Friendly AI." Why couldn't Nanny AI or Oracle AI have a stable, desirable utility function? I'm inclined to think the best way to make sense of that part of the paper is to interpret "Friendly AI" as meaning "an AI whose utility function is an ideal extrapolation of our values (or at least comes close)."

I'm being very nitpicky here, but I think the issue of how to define "Friendly AI" is important for a couple of reasons. First, it's obviously important for clear communication. If we aren't clear on what we mean by "Friendly AI," we won't understand each other when we try to talk about it. But another very important worry is that confusion about the meaning of "Friendly AI" may be spawning sloppy thinking about it. Equivocating between narrower and broader definitions of "Friendly AI" may end up taking the place of an argument that the approach specified by the narrower definition is the way to go. This seems like an excellent example of the benefits of tabooing your words.

I see on Luke's website that he has a forthcoming peer-reviewed article with Nick Bostrom titled "Why We Need Friendly AI." On the whole, I've been impressed with the drafts of the two peer-reviewed articles Luke has posted so far, so I'm moderately optimistic that that article will resolve these issues.

Yeah, the terminology doesn't seem to be consistently used. On one hand, Eliezer seems to use it as a general term for "safe" AI:

Creating Friendly AI, 2001: The term "Friendly AI" refers to the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the point of making real-world plans in pursuit of goals.

Artificial Intelligence as a Positive and Negative Factor in Global Risk, 2006/2008: It would be a very good thing if humanity knew how to choose into existence a powerful optimization process with a particular target. Or in more colloquial terms, it would be nice if we knew how to build a nice AI.

To describe the field of knowledge needed to address that challenge, I have proposed the term "Friendly AI". In addition to referring to a body of technique, "Friendly AI" might also refer to the product of technique - an AI created with specified motivations. When I use the term Friendly in either sense, I capitalize it to avoid confusion with the intuitive sense of "friendly".

Complex Value Systems are Required to Realize Valuable Futures, 2011: A common reaction to first encountering the problem statement of Friendly AI ("Ensure that the creation of a generally intelligent, self-improving, eventually superintelligent system realizes a positive outcome")...

On the other hand, some authors seem to use "Friendly AI" as a more specific term to refer to a particular kind of AI design proposed by Eliezer. For instance,

Ben Goertzel, Thoughts on AI Morality, 2002: Eliezer Yudkowsky has recently put forth a fairly detailed theory of what he calls “Friendly AI,” which is one particular approach to instilling AGI’s with morality (Yudkowsky, 2001a). The ideas presented here, in this (much briefer) essay, are rather different from Yudkowsky’s, but they are aiming at roughly the same goal.

Ben Goertzel, Apparent Limitations on the “AI Friendliness” and Related Concepts Imposed By the Complexity of the World, 2006: Eliezer Yudkowsky, in his various online writings (see links at www.singinst.org), has introduced the term “Friendly AI” to refer to powerful AI’s that are beneficent rather than malevolent or indifferent to humans. On the other hand, in my prior writings (see the book The Path to Posthumanity that I coauthored with Stephan Vladimir Bugaj; and my earlier online essay “Encouraging a Positive Transcension”), I have suggested an alternate approach in which much more abstract properties like “compassion”, “growth” and “choice” are used as objectives to guide the long-term evolution and behavior of AI systems. [...]

My general feeling, related here in the context of some specific arguments, is not that Friendly AI is a bad thing to pursue in any moral sense, but rather that it is very likely to be unachievable for basic conceptual reasons.

Mark Waser, Rational Universal Benevolence: Simpler, Safer, and Wiser than “Friendly AI”, 2011: Insanity is doing the same thing over and over and expecting a different result. “Friendly AI” (FAI) meets these criteria on four separate counts by expecting a good result after: 1) it not only puts all of humanity’s eggs into one basket but relies upon a totally new and untested basket, 2) it allows fear to dictate our lives, 3) it divides the universe into us vs. them, and finally 4) it rejects the value of diversity. In addition, FAI goal initialization relies on being able to correctly calculate a “Coherent Extrapolated Volition of Humanity” (CEV) via some as-yet-undiscovered algorithm. Rational Universal Benevolence (RUB) is based upon established game theory and evolutionary ethics and is simple, safe, stable, self-correcting, and sensitive to current human thinking, intuitions, and feelings. Which strategy would you prefer to rest the fate of humanity upon?
