Vaniver comments on Debunking Fallacies in the Theory of AI Motivation - LessWrong

Post author: Richard_Loosemore 05 May 2015 02:46AM




Comment author: Richard_Loosemore 05 May 2015 01:31:25PM, 9 points

I am going to have to respond piecemeal to your thoughtful comments, so apologies in advance if I can only get to a couple of issues in this first response.

Your first remark, which starts

If there is some good way...

contains a multitude of implicit assumptions about how the AI is built, and how the checking code would do its job, and my objection to your conclusion is buried in an array of objections to all of those assumptions, unfortunately. Let me try to bring some of them out into the light:

1) When you say

If there is some good way of explaining plans to programmers such that programmers will only approve of non-terrible plans...

I am left wondering what kind of scenario you are picturing for the checking process. Here is what I had in mind. The AI can quickly assess the "forcefulness" of any candidate action plan by asking itself whether the plan will involve giving choices to people vs. forcing them to do something whether they like it or not. If a plan is of the latter sort, more care is needed, so the AI will canvass a sample of people to see whether their reactions are positive or negative. It will also be able to model people (as it must be able to do, because all intelligent systems must model the world pretty accurately or they don't qualify as 'intelligent'), so it will probably already have a pretty shrewd idea of whether people will react positively or negatively toward some intended action plan.

If the AI starts to get even a hint that there are objections, it has to kick in a serious review of the plan. It will ask everyone (it is an AI, after all: it can do that even if there are 100 billion people on the planet). If it gets feedback from anyone saying that they object to the plan, that is the end of the story: it does not force anyone to go through with it. That is, it is a fundamental feature of the checking code that it will veto a plan under that circumstance. Notice, by the way, that I have generalized "consulting the programmers" to "consulting everyone". That is an obvious extension, since the original programmers were only proxies for the will of the entire species.

In all of the procedure I just described, why would the explanation of the plans to the people be problematic? People will ask questions about what the plans involve. If there is technical complexity, they will ask for clarification. If the plan is drastic there will be a world-wide debate, and some people who find themselves unable to comprehend the plan will turn to more expert humans for advice. And if even the most expert humans cannot understand the significance of the plan, what do you imagine would happen? I suggest that the most obvious reaction would be: "Sorry, that plan is so obscure, and its consequences are so impossible for us to even understand, that we, a non-zero fraction of the human species, would like to invoke a precautionary principle and simply refuse to go ahead with it."
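Taken together, the procedure I have described amounts to a simple decision rule. It might be caricatured in code as follows; every name and structure here is a hypothetical illustration of the rule, not any real system's design:

```python
def review_plan(coercive, answers):
    """Caricature of the checking procedure described above.

    `coercive` marks whether the plan forces people to do something
    rather than giving them choices; `answers` is the result of
    canvassing everyone, each entry one of "approve", "object", or
    "cannot understand".  All names are hypothetical illustrations.
    """
    if not coercive:
        return "proceed"  # low "forcefulness": less care is needed

    # A single objection is the end of the story: the plan is vetoed.
    if any(a == "object" for a in answers):
        return "veto"

    # Precautionary principle: people who cannot understand the plan's
    # implications may simply decline, and that also counts as a veto.
    if any(a == "cannot understand" for a in answers):
        return "veto"

    return "proceed"
```

The point of the sketch is only that the veto is unconditional: the checking code never weighs a single objection, or a failure of comprehension, against the supposed benefits of the plan.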

That seems, to me at least, to get around the idea that there might be such a severe mismatch between human and AI understanding of the AI's plans that something bad would happen during the attempt to understand the plan.

In other words, your opening comment

If there is some good way of explaining plans to programmers such that programmers will only approve of non-terrible plans, then yes, this works

seems to have been 100% addressed by the procedure I just described: if the plans could not be explained, the checking code would simply accept that the will of the people prevails even when they say "We decline on the grounds that we cannot understand the complexity or implications of your plans."

I see I have only gotten as far as the very first sentence of your comment, but although I have many more points that I could deploy in response to the rest, doesn't that close the case, since you said that it would work?

Comment author: Vaniver 05 May 2015 05:29:44PM, 3 points

Your first remark, which starts "If there is some good way..."

I suggest quoting the remarks using the Markdown blockquote syntax, with a > in front of the line, like so:

 >If there is some good way of explaining plans to programmers such that programmers will only approve of non-terrible plans, then yes, this works.

That will look like this:

If there is some good way of explaining plans to programmers such that programmers will only approve of non-terrible plans, then yes, this works.

You can then respond to the quotes afterwards, and the flow will be more obvious to the reader.

Comment author: Richard_Loosemore 05 May 2015 05:40:10PM, 6 points

Thank you. I edited the remarks to conform. I was not familiar with the mechanism for quoting, here. Let me know if I missed any.

Comment author: Vaniver 05 May 2015 05:47:02PM, 1 point

You're welcome!