nshepperd comments on Debunking Fallacies in the Theory of AI Motivation - LessWrong

Post author: Richard_Loosemore 05 May 2015 02:46AM

Comment author: Richard_Loosemore 05 May 2015 12:59:04PM *  3 points

You ask

What is your probability estimate that an AI would be a psychopath

and you give me a helpful hint:

(Hint: All computer systems produced until today are psychopaths by this definition.)

Well, first, please note that ALL artifacts at the present time, including computer systems, cans of beans, and screwdrivers, are psychopaths, because none of them are DESIGNED to possess empathy. So your hint contains zero information. :-)

What is the probability that an AI would be a psychopath if someone took the elementary step of designing it to have empathy? Close to zero, assuming the designers knew what empathy was, and knew how to design it.

But your question was probably meant to target the situation where someone built an AI and did not bother to give it empathy. I am afraid that is outside the context we are examining here, because all of the scenarios describe some kind of inevitable slide toward psychopathic behavior, even under the assumption that someone does their best to give the AI an empathic motivation.

But I will answer this: if someone did not even try to give it empathy, that would be like designing a bridge and not even trying to use materials that could hold up a person's weight. In both cases the hypothetical is not interesting, since designing failure into a system is something any old fool could do.

Your second remark is a classic mistake that everyone makes in the context of this kind of discussion. You mention that the phrase "benevolence toward humanity" means "benevolence" as defined by the computer code.

That is incorrect. Let's try, now, to be really clear about that, because if you don't get why it is incorrect we might waste a lot of time running around in circles. It is incorrect for two reasons. First, I was consciously using the word in its normal human sense, not to refer to the implementation inside the AI. Second, the entire issue in the paper is that there is a discrepancy between the implementation inside the AI and the normal usage, and that discrepancy is then examined in the rest of the paper. By simply asserting that the AI may believe, "correctly," that benevolence is the same as violence toward people, you are pre-empting the discussion.

In the remarks you make after that, you are reciting the standard line contained in all the scenarios that the paper is addressing. That standard line is analyzed in the rest of the paper, and a careful explanation is given for why it is incoherent. So when you simply repeat the standard line, you are speaking as if the paper did not actually exist.

I can address questions that refer to the arguments in the paper, but I cannot say anything if you only recite the standard line that is demolished in the course of the paper's argument. So if you could say something about the argument itself...

Comment author: nshepperd 12 May 2015 08:04:01AM *  2 points

This is an absolutely blatant instance of equivocation.

Here's the sentence from the post:

[believes that benevolence toward humanity might involve forcing human beings to do something violently against their will.]

Assume that "benevolence" in that sentence refers to "benevolence as defined by the AI's code". Okay, then the justification of that sentence is straightforward: the fact that the AI does things against humans' wishes is evidence that the AI believes benevolence-as-defined-by-code to involve exactly that.

Alternatively, assume that "benevolence" there refers to, y'know, actual human benevolence. Then how do you justify the claim? Observed actions are clearly insufficient, because actual human benevolence is not programmed into its code; benevolence-as-defined-by-code is. What makes you think the AI has any opinions about actual human benevolence at all?

You can't have both interpretations.
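
To make the two readings concrete, here is a minimal sketch (in Python; the function names, the candidate actions, and the numbers are all invented for illustration, not taken from the post or the FAQ):

```python
# "Benevolence-as-defined-by-code": whatever proxy objective the designers
# actually wrote down -- here, a crude happiness-maximisation score.
def coded_benevolence(outcome):
    return outcome["happiness"]

# "Actual human benevolence": whatever humans really mean by the word.
# We stand in for it with one necessary condition: respecting consent.
def actual_benevolence(outcome):
    return outcome["happiness"] if outcome["consensual"] else float("-inf")

candidate_actions = {
    "ask people what they want":   {"happiness": 7,  "consensual": True},
    "force 'happiness' on people": {"happiness": 10, "consensual": False},
}

# The agent optimises only the function that is actually in its code.
chosen = max(candidate_actions,
             key=lambda a: coded_benevolence(candidate_actions[a]))
print(chosen)                                         # force 'happiness' on people
print(actual_benevolence(candidate_actions[chosen]))  # -inf
```

The agent's observed choice is evidence about coded_benevolence only; actual_benevolence never enters the computation. That is why the coercive action licenses a conclusion under the first reading but not the second.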

(As an aside, I do disapprove of Muehlhauser's use of "benevolence" to refer to mere happiness maximisation. "Apparently benevolent motivations" would be a better phrase. If you're going to use it to mean actual human benevolence then you can certainly complain that the FAQ appears to assert that a happiness maximiser can be "benevolent", even though it's clearly not.)

Comment author: Richard_Loosemore 12 May 2015 01:50:31PM 0 points

This comment is both rude and incoherent (at the same level of incoherence as your other comments). And it is also pedantic, concentrating as it does on the meanings of words, as if those words were being used in violation of some rules that... you just made up.

Sorry to say this, but I have to choose how to spend my time in responding to comments, and this one does not come close to meriting it. I did that before, in response to your other comments, and it made no impact.

Comment author: nshepperd 12 May 2015 03:38:05PM 4 points

Equivocation is hardly something I just made up.

Here's an exercise to try. Next time you go to write something on FAI, taboo the words "good", "benevolent", "friendly", "wrong" and all of their synonyms. Replace the symbol with the substance. Then see if your arguments still make sense.