Vaniver comments on Debunking Fallacies in the Theory of AI Motivation - LessWrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I am going to have to respond piecemeal to your thoughtful comments, so apologies in advance if I can only get to a couple of issues in this first response.
Your first remark, which starts
contains a multitude of implicit assumptions about how the AI is built, and how the checking code would do its job, and my objection to your conclusion is buried in an array of objections to all of those assumptions, unfortunately. Let me try to bring some of them out into the light:
1) When you say
I am left wondering what kind of scenario you are picturing for the checking process. Here is what I had in mind. The AI can quickly assess the "forcefulness" of any candidate action plan by asking itself whether the plan gives people choices or forces them to do something whether they like it or not. If a plan is of the latter sort, more care is needed, so the AI will canvass a sample of people to see whether their reactions are positive or negative. It will also be able to model people (as it must, because any system that cannot model the world fairly accurately does not qualify as 'intelligent'), so it will probably already have a pretty shrewd idea of whether people will react positively or negatively toward some intended action plan.
If the AI gets even a hint that there are objections, it has to kick in a serious review of the plan. It will ask everyone (it is an AI, after all: it can do that even if there are 100 billion people on the planet). If anyone at all says that they object to the plan, that is the end of the story: no one is forced to go through with it. That is, it is a fundamental feature of the checking code that it vetoes a plan under that circumstance. Notice, by the way, that I have generalized "consulting the programmers" to "consulting everyone". That is an obvious extension, since the original programmers were only ever proxies for the will of the entire species.
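The two-stage procedure just described can be sketched in code. Everything below (the class names, the attributes, the sample size, the `objects_to` test) is an illustrative assumption of mine, not a proposal for how a real AI would actually be built:

```python
# A minimal sketch of the checking procedure described above, assuming a
# toy model of plans and people. All names here are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Plan:
    forceful: bool            # does the plan coerce rather than offer choices?
    description: str = ""

@dataclass
class Person:
    objections: set = field(default_factory=set)  # plans this person objects to

    def objects_to(self, plan):
        return plan.description in self.objections

def check_plan(plan, population, sample_size=3):
    """Return True if the plan may proceed, False if it is vetoed."""
    # Stage 0: non-coercive plans (those that give people choices)
    # need no further review.
    if not plan.forceful:
        return True
    # Stage 1: canvass a sample of people for their reactions.
    sample = population[:sample_size]
    if any(p.objects_to(plan) for p in sample):
        # Stage 2: even a hint of objection triggers the serious review:
        # ask everyone, and a single objection anywhere is a veto.
        if any(p.objects_to(plan) for p in population):
            return False
    return True
```

The key design feature is that the veto is unconditional: once any objection surfaces in the full review, the checking code does not weigh it against anything; it simply refuses the plan.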
In all of that procedure I just described, why would the explanation of the plans to the people be problematic? People will ask questions about what the plans involve. If there is technical complexity, they will ask for clarification. If the plan is drastic there will be a world-wide debate, and some people who find themselves unable to comprehend the plan will turn to more expert humans for advice. And if even the most expert humans cannot understand the significance of the plan, what do you imagine would happen? I suggest that the most obvious reaction would be "Sorry, that plan is so obscure, and its consequences are so impossible for us to even understand, that we, a non-zero fraction of the human species, would like to invoke a precautionary principle and simply refuse to go ahead with it."
That seems, to me at least, to get around the idea that there might be so severe a mismatch between human and AI understanding of the AI's plans that something bad would happen during the attempt to understand them.
In other words, your opening comment
seems to have been 100% addressed by the procedure I just described: if the plans could not be explained, the checking code would simply accept that the will of the people prevails even when they say "We decline on the grounds that we cannot understand the complexity or implications of your plans."
I see I have only gotten as far as the very first sentence of your comment, and although I have many more points that I could deploy in response to the rest, doesn't that already close the case, given that you said it would work?
I suggest quoting the remarks using the markdown syntax, with a `>` in front of each quoted line, like so:

    > the text you are quoting

That will look like this:

> the text you are quoting

You can then respond to the quotes afterwards, and the flow will be more obvious to the reader.
Thank you. I edited the remarks to conform. I was not familiar with the mechanism for quoting, here. Let me know if I missed any.
You're welcome!