This is an outgrowth of a comment I left on Luke's dialog with Pei Wang, and I'll start by quoting that comment in full:
Luke, what do you mean here when you say, "Friendly AI may be incoherent and impossible"?
The Singularity Institute's page "What is Friendly AI?" defines a "Friendly AI" as "an AI that takes actions that are, on the whole, beneficial to humans and humanity." Surely you don't mean to say, "The idea of an AI that takes actions that are, on the whole, beneficial to humans and humanity may be incoherent or impossible"?
Eliezer's paper "Artificial Intelligence as a Positive and Negative Factor in Global Risk" talks about "an AI created with specified motivations." But it's pretty clear that that's not the only thing you and he have in mind, because part of the problem is making sure the motivations we give an AI are the ones we really want to give it.
If you meant neither of those things, what did you mean? "Provably friendly"? "One whose motivations express an ideal extrapolation of our values"? (It seems a flawed extrapolation could still give results that are on the whole beneficial, so this is different from the first definition suggested above.) Or something else?
Since writing that comment, I've managed to find two other definitions of "Friendly AI." One is from Armstrong, Sandberg, and Bostrom's paper on Oracle AI, which describes Friendly AI as: "AI systems designed to be of low risk." This definition is very similar to the definition from the Singularity Institute's "What is Friendly AI?" page, except that it incorporates the concept of risk. The second definition is from Luke's paper with Anna Salamon, which describes Friendly AI as "an AI with a stable, desirable utility function." This definition has the important feature of restricting "Friendly AI" to designs that have a utility function. Luke's comments about "rationally shaped" AI in this essay seem relevant here.
Neither of those papers seems to use the initial definition they give of "Friendly AI" consistently. Armstrong, Sandberg, and Bostrom's paper has a section on creating Oracle AI by giving it a "friendly utility function," which states, "if a friendly OAI could be designed, then it is most likely that a friendly AI could also be designed, obviating the need to restrict to an Oracle design in the first place."
This is a non-sequitur if "friendly" merely means "low risk," but it makes sense if they are actually defining Friendly AI in terms of a safe utility function: what they're saying then is if we can create an AI that stays boxed because of its utility function, we can probably create an AI that doesn't need to be boxed to be safe.
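The boxed-because-of-its-utility-function idea can be made concrete with a toy sketch. Everything here (the action set, the penalty value, the function names) is a hypothetical illustration of mine, not anything from the papers: an expected-utility maximizer will stay "boxed" if its own utility function assigns sufficiently low utility to out-of-box actions, with no outer barrier needed.

```python
# Toy illustration (hypothetical, not from the papers): an agent that
# maximizes utility stays "boxed" if its utility function itself
# penalizes leaving the box, rather than relying on external restraint.

def best_action(actions, utility):
    """Pick the action with the highest utility under the given function."""
    return max(actions, key=utility)

# Each action is (name, answers_the_question, leaves_box).
actions = [
    ("answer honestly", True, False),
    ("stay silent", False, False),
    ("escape the box", True, True),
]

def boxed_utility(action):
    name, answers, leaves_box = action
    u = 1.0 if answers else 0.0
    if leaves_box:
        u -= 100.0  # leaving the box is heavily penalized by the utility function
    return u

# The agent prefers answering from inside the box: the -100 penalty makes
# "escape the box" worse than "stay silent", so the box holds by design.
print(best_action(actions, boxed_utility)[0])  # -> answer honestly
```

The design choice this illustrates is exactly the one in the quoted passage: if you can engineer a utility function under which staying boxed is optimal, the same engineering skill plausibly lets you build an agent whose unboxed behavior is safe directly.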
In the case of Luke's paper with Anna Salamon, the discussion on page 17 seems to imply that "Nanny AI" and "Oracle AI" are not types of Friendly AI. This is strange under their official definition of "Friendly AI." Why couldn't Nanny AI or Oracle AI have a stable, desirable utility function? I'm inclined to think the best way to make sense of that part of the paper is to interpret "Friendly AI" as meaning "an AI whose utility function is an ideal extrapolation of our values (or at least comes close)."
I'm being very nitpicky here, but I think the issue of how to define "Friendly AI" is important for a couple of reasons. First, it's obviously important for clear communication: if we aren't clear on what we mean by "Friendly AI," we won't understand each other when we try to talk about it. A second, more serious worry is that confusion about the meaning of "Friendly AI" may be spawning sloppy thinking about it. Equivocating between narrower and broader definitions of "Friendly AI" may end up taking the place of an argument that the approach specified by the narrower definition is the way to go. This seems like an excellent example of the benefits of tabooing your words.
I see on Luke's website that he has a forthcoming peer-reviewed article with Nick Bostrom titled "Why We Need Friendly AI." On the whole, I've been impressed with the drafts of the two peer-reviewed articles Luke has posted so far, so I'm moderately optimistic that that article will resolve these issues.