My argument doesn't require that anybody be able to formally define "self" or "maximize paperclips"; it doesn't require the goal G to be picked among those that are easily defined in English.
An agent capable of reasoning about the world should be able to make an inference like "if all copies of me are destroyed, it makes it much less likely that goal G would be reached"; it may not have exactly that form, but it should be something analogous. It doesn't matter if I can't formalize that; the agent may not have a completely formal version either, only one that is sufficient for its purposes.
Show 3 examples of goal G. I read somewhere about a great technique for avoiding abstraction mistakes: asking for 3 examples.
Here's my draft document, "Concepts are Difficult, and Unfriendliness is the Default" (Google Docs, commenting enabled). Despite the name, it's still informal and would need a lot more references, but it could be written up as a proper paper if people felt that the reasoning was solid.
Here's my introduction:
And here's my conclusion:
For the actual argumentation defending the various premises, see the linked document. I have a feeling that there are still several conceptual distinctions that I should be making but am not, but I figured that the easiest way to find the problems would be to have people tell me what points they find unclear or disagreeable.