Richard_Loosemore comments on Debunking Fallacies in the Theory of AI Motivation - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
So, on rereading the paper I was able to pinpoint the first bit of text that made me think this (the quoted text and the bit before), but I am having difficulty finding a second independent example, so I apologize for unfairly generalizing from one example.
The other examples I found looked like they all relied on the same argument. Consider the following section:
If I think the "logical consistency" argument does not go through, I shouldn't count this as an independent argument that fails, because this argument does hold given its premises (at least one of which I reject, but it's the same premise as before). I clearly had this line in mind also:
The 'principal-agent problem' is a fundamental problem in human institutional design: principals would like to be able to hire agents to perform tasks, but have only crude control over the incentives of the agents, and the agents often have control over what information makes it to the principals. One way to characterize the AI value alignment problem (as I hear MIRI is calling it these days) is that it's a principal-agent problem where the agent has massive control over the information the principal sees, but the value difference between principals and agents is due only to communication problems, rather than any malice on the part of the agent. That is, the principal wants the agent to do "what I mean," but the agent only has access to "what I say," and cannot be assumed to have any mind-reading powers that we don't build into it.
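The gap between "what I say" and "what I mean" can be made concrete with a toy sketch (all names and payoffs below are invented for illustration, not taken from the paper): a literal optimizer handed a slightly-off proxy objective will faithfully maximize the proxy, and the divergence from the intended outcome involves no malice anywhere in the loop.

```python
# Toy sketch: a principal *means* one objective but *writes down* a
# slightly different proxy, and a literal agent optimizes the proxy.
# All action names and payoffs are hypothetical.

def intended_utility(action):
    """What the principal actually wants: a clean room without damage."""
    return {"tidy_carefully": 10, "tidy_fast": 6, "do_nothing": 0}[action]

def stated_utility(action):
    """What the principal wrote down: 'minimize visible mess quickly'.
    The proxy rewards speed and omits the damage the principal cares about."""
    return {"tidy_carefully": 5, "tidy_fast": 9, "do_nothing": 0}[action]

def optimize(utility, actions):
    """A literal agent: maximizes exactly the objective it was given."""
    return max(actions, key=utility)

actions = ["tidy_carefully", "tidy_fast", "do_nothing"]
chosen = optimize(stated_utility, actions)    # the agent follows "what I say"
wanted = optimize(intended_utility, actions)  # the principal meant "what I mean"
# chosen is "tidy_fast" while wanted is "tidy_carefully": the same
# optimizer, a different objective, and no mind-reading to bridge them.
```

The point of the sketch is that nothing in the agent is adversarial; the entire value difference lives in the gap between the stated and intended objectives.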
It seems very difficult to get an AI to correctly classify the difference between programming error and programming intention, and even more difficult for the AI to communicate to us that it has correctly classified that issue. (We have both the illusion of transparency and the double illusion of transparency to deal with!) Claiming that something is "clearly" a programming error strikes me as trivializing the underlying communication problem. But I agree with you that if we have that problem solved, then we're home free.
I just wanted to say that I will try to reply soon. Unfortunately :-) some of the comments have been intensely thoughtful, causing me to write enormous replies of my own and saturating my bandwidth. So, apologies for any delay....