For what it's worth, here are some possible objections that certain people might raise.
(Note: I am doing this to help you refine a document that was probably meant to convince critics that they are wrong. It is not an attempt to troll. Everything below this line is written in critique mode.)
The most basic drive of any highly efficient AGI is, in my opinion, the drive to act correctly. You seem to assume that an AGI will likely be designed to judge every action against a strict utility-function. That is a very special kind of AGI design: one with a rigid utility-function that the AGI cares to satisfy exactly as it was initially hardcoded. You also assume that the AGI will be either unable or unwilling to figure out what its true goals might be.
What makes you think that AGIs will be designed according to those criteria?
If an AGI acts according to a rigid utility-function, then what makes you think that it won't try to interpret any vagueness in the way that most closely reflects how it was most probably meant to be interpreted?
If the AGI's utility-function consisted solely of the English-language sentence "Make people happy.", then what makes you think that it wouldn't be able to work out what we actually meant by it and act accordingly? Why would it care to act in a way that does not reflect our true intentions?
My problem is that there seems to be a discontinuity between the superior intelligence of a possible AGI and its supposed inability to distinguish relevant from irrelevant information when it comes to correctly interpreting its own utility-function.
Okay, I'm clearly not communicating the ess...
Here's my draft document, "Concepts are Difficult, and Unfriendliness is the Default". (Google Docs, commenting enabled.) Despite the name, it's still informal and would need a lot more references, but it could be written up as a proper paper if people felt that the reasoning was solid.
Here's my introduction:
And here's my conclusion:
For the actual argumentation defending the various premises, see the linked document. I have a feeling that there are still several conceptual distinctions that I should be making but am not, but I figured that the easiest way to find the problems would be to have people tell me what points they find unclear or disagreeable.