I like the concept of a reflective equilibrium, and it seems to me like that is just what any self-modifying AI would tend toward. But the notion of a random utility function, or the "structured utility function" Eliezer proposes as a replacement, assumes that an AI is comprised of two components, the intelligent bit and the bit that has the goals. Humans certainly can't be factorized in that way. Just think about akrasia to see how fragile the notion of a goal is.
Even notions of being "cosmopolitan" - of not selfishly or provincially constraining future AIs - are written down nowhere in the universe except a handful of human brains. An expected paperclip maximizer would not bother to ask such questions.
A smart expected paperclip maximizer would realize that it may not be the smartest possible expected paperclip maximizer--that other ways of maximizing expected paperclips might lead to even more paperclips. But the only way it would find out about those is to spawn modified expected paperclip maximizers and see what they can come up with on their own. Yet, those modified paperclip maximizers might not still be maximizing paperclips! They might have self-modified away from that goal, and just be signaling their interest in paperclips to gain the approval of the original expected paperclip maximizer. Therefore, the original expected paperclip maximizer had best not take that risk after all (leaving it open to defeat by a faster-evolving cluster of AIs). This, by reductio ad absurdum, is why I don't believe in smart expected paperclip maximizers.
Humans certainly can't be factorized in that way.
Humans aren't factorized this way, whether they can't is a separate question. It's not surprising that evolution's design isn't that neat, so the fact that humans don't have this property is only weak evidence about the possibility of designing systems that do have this property.
This post is shameless self-promotion, but I'm told that's probably okay in the Discussion section. For context, as some of you are aware, I'm aiming to model C. elegans based on systematic high-throughput experiments - that is, to upload a worm. I'm still working on course requirements and lab training at Harvard's Biophysics Ph.D. program, but this remains the plan for my thesis.
Last semester I gave this lecture to Marvin Minsky's AI class, because Marvin professes disdain for everything neuroscience, and I wanted to give his students—and him—a fair perspective of how basic neuroscience might be changing for the better, and seems a particularly exciting field to be in right about now. The lecture is about 22 minutes long, followed by over an hour of questions and answers, which cover a lot of the memespace that surrounds this concept. Afterward, several students reported to me that their understanding of neuroscience was transformed.
I only just now got to encoding and uploading this recording; I believe that many of the topics covered could be of interest to the LW community (especially those with a background in AI and an interest in brains), perhaps worthy of discussion, and I hope you agree.