This is an outgrowth of a comment I left on Luke's dialog with Pei Wang, and I'll start by quoting that comment in full:
Luke, what do you mean here when you say, "Friendly AI may be incoherent and impossible"?
The Singularity Institute's page "What is Friendly AI?" defines a "Friendly AI" as "an AI that takes actions that are, on the whole, beneficial to humans and humanity." Surely you don't mean to say, "The idea of an AI that takes actions that are, on the whole, beneficial to humans and humanity may be incoherent or impossible"?
Eliezer's paper "Artificial Intelligence as a Positive and Negative Factor in Global Risk" talks about "an AI created with specified motivations." But it's pretty clear that that's not the only thing you and he have in mind, because part of the problem is making sure the motivations we give an AI are the ones we really want to give it.
If you meant neither of those things, what did you mean? "Provably friendly"? "One whose motivations express an ideal extrapolation of our values"? (It seems a flawed extrapolation could still give results that are on the whole beneficial, so this is different from the first definition suggested above.) Or something else?
Since writing that comment, I've managed to find two other definitions of "Friendly AI." One is from Armstrong, Sandberg, and Bostrom's paper on Oracle AI, which describes Friendly AI as: "AI systems designed to be of low risk." This definition is very similar to the definition from the Singularity Institute's "What is Friendly AI?" page, except that it incorporates the concept of risk. The second definition is from Luke's paper with Anna Salamon, which describes Friendly AI as "an AI with a stable, desirable utility function." This definition has the important feature of restricting "Friendly AI" to designs that have a utility function. Luke's comments about "rationally shaped" AI in this essay seem relevant here.
Neither of those papers seems to use the initial definition they give of "Friendly AI" consistently. Armstrong, Sandberg, and Bostrom's paper has a section on creating Oracle AI by giving it a "friendly utility function," which states, "if a friendly OAI could be designed, then it is most likely that a friendly AI could also be designed, obviating the need to restrict to an Oracle design in the first place."
This is a non-sequitur if "friendly" merely means "low risk," but it makes sense if they are actually defining Friendly AI in terms of a safe utility function: what they're saying then is if we can create an AI that stays boxed because of its utility function, we can probably create an AI that doesn't need to be boxed to be safe.
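The boxed-because-of-its-utility-function idea can be made concrete with a toy sketch. Everything here (the action set, the penalty value, the function names) is a hypothetical illustration of mine, not anything from the papers: an expected-utility maximizer will stay "boxed" if its own utility function assigns sufficiently low utility to out-of-box actions, with no outer barrier needed.

```python
# Toy illustration (hypothetical, not from the papers): an agent that
# maximizes utility stays "boxed" if its utility function itself
# penalizes leaving the box, rather than relying on external restraint.

def best_action(actions, utility):
    """Pick the action with the highest utility under the given function."""
    return max(actions, key=utility)

# Each action is (name, answers_the_question, leaves_box).
actions = [
    ("answer honestly", True, False),
    ("stay silent", False, False),
    ("escape the box", True, True),
]

def boxed_utility(action):
    name, answers, leaves_box = action
    u = 1.0 if answers else 0.0
    if leaves_box:
        u -= 100.0  # leaving the box is heavily penalized by the utility function
    return u

# The agent prefers answering from inside the box: the -100 penalty makes
# "escape the box" worse than "stay silent", so the box holds by design.
print(best_action(actions, boxed_utility)[0])  # -> answer honestly
```

The design choice this illustrates is exactly the one in the quoted passage: if you can engineer a utility function under which staying boxed is optimal, the same engineering skill plausibly lets you build an agent whose unboxed behavior is safe directly.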
In the case of Luke's paper with Anna Salamon, the discussion on page 17 seems to imply that "Nanny AI" and "Oracle AI" are not types of Friendly AI. This is strange under their official definition of "Friendly AI." Why couldn't Nanny AI or Oracle AI have a stable, desirable utility function? I'm inclined to think the best way to make sense of that part of the paper is to interpret "Friendly AI" as meaning "an AI whose utility function is an ideal extrapolation of our values (or at least comes close)."
I'm being very nitpicky here, but I think the issue of how to define "Friendly AI" is important for a couple of reasons. First, it's obviously important for clear communication: if we aren't clear on what we mean by "Friendly AI," we won't understand each other when we try to talk about it. A second, more serious worry is that confusion about the meaning of "Friendly AI" may be spawning sloppy thinking about it. Equivocating between narrower and broader definitions of "Friendly AI" may end up taking the place of an argument that the approach specified by the narrower definition is the way to go. This seems like an excellent example of the benefits of tabooing your words.
I see on Luke's website that he has a forthcoming peer-reviewed article with Nick Bostrom titled "Why We Need Friendly AI." On the whole, I've been impressed with the drafts of the two peer-reviewed articles Luke has posted so far, so I'm moderately optimistic that that article will resolve these issues.