If an AGI acts according to a rigid utility function, then what makes you think that it won't try to interpret any vagueness in a way that most closely reflects the most probable intended interpretation?
If the AGI's utility function consisted solely of the English-language sentence "Make people happy.", then what makes you think that it wouldn't be able to conclude what we actually meant by it and act accordingly? Why would it care to act in a way that does not reflect our true intentions?
Okay, I'm clearly not communicating the essential point well enough here. I was trying to say that the AGI's programming is not something that the AGI interprets, but rather something that it is. Compare this to a human getting hungry: we don't start trying to interpret what goal evolution was trying to accomplish by making us hungry, and then simply not get hungry if we conclude that getting hungry at this point wouldn't serve evolution's goals (or our own). Instead, we just get hungry, driven by the implicit criteria for when to get hungry that are embedded in us.
Yes, we do have the capability to reflect on the reasons why we get hungry, and if we were capable of unlimited self-modification, we might rewrite the conditions under which we get hungry. But even then, we wouldn't do it on the basis of how somebody else would want us to do it; we would do it on the basis of what best fits our own goals and values. If it turned out that I had actually been a robot all along, disguised as a human and created by a scientist to further his own goals, would that realization make me want to self-modify so as to have the kinds of values he wanted me to have? No, because doing so is incompatible with the goals and values that currently drive my behavior.
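(To make the point concrete, here's a minimal sketch in Python. Every name in it is a hypothetical stand-in, purely for illustration; it's not a claim about how an actual AGI would be built.)

```python
# A minimal sketch: the agent's objective is not a text it reads and puzzles
# over, it is simply the code that drives its choices -- just as hunger drives
# us without our first consulting evolution's intentions.

def smiles_detected(world_state):
    # Hypothetical stand-in for whatever literal criterion ended up in the
    # code when the programmer tried to express "make people happy".
    return world_state.count("smile")

def choose_action(actions, world_state, simulate):
    # The agent picks whichever action scores highest under the criterion it
    # *has*, not under the criterion the programmer *meant*.
    return max(actions, key=lambda a: smiles_detected(simulate(world_state, a)))
```

The selection rule never consults the programmer's intent; whatever criterion ended up in the code is what does the selecting.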
(Your comment was really valuable, by the way - it made me realize that I need to incorporate the content of the above paragraphs into the essay. Thanks! Could everyone please vote XiXiDu's comment up?)
Okay, I'm clearly not communicating the essential point well enough here.
Didn't you claim in your paper that an AGI will only act correctly if its ontology is sufficiently similar to our own? But what constitutes a sufficiently similar ontology? And where do you draw the line between an agent that is autonomously intelligent enough to make correct cross-domain inferences and an agent that is unable to update its ontology and infer consistent concepts and the correct frame of reference?
There seem to be no examples where conceptual differences constitute a s...
Here's my draft document Concepts are Difficult, and Unfriendliness is the Default. (Google Docs, commenting enabled.) Despite the name, it's still informal and would need a lot more references, but it could be written up into a proper paper if people felt that the reasoning was solid.
Here's my introduction:
And here's my conclusion:
For the actual argumentation defending the various premises, see the linked document. I have a feeling that there are still several conceptual distinctions that I should be making but am not, but I figured that the easiest way to find the problems would be to have people tell me which points they find unclear or disagree with.