Inspired by Don't Plan For the Future.
For the purposes of discussion on this site, a Friendly AI is assumed to be one that shares our terminal values. It's a safe genie that doesn't need to be told what to do, but anticipates how to best serve the interests of its creators. Since our terminal values are a function of our evolutionary history, it seems reasonable to assume that an FAI created by one intelligent species would not necessarily be friendly to other intelligent species, and that being subsumed by another species' FAI would be fairly catastrophic.
Except... doesn't that seem kind of bad? Suppose I were able to create a strong AI, and it created a sound fun-theoretic utopia for human beings, but then proceeded to expand, subsume extraterrestrial intelligences, and subject them to something they considered a fate worse than death. I would have to regard that as a major failing of my design. My utility function assigns value to the desires of beings whose values conflict with my own. I can't allow other values to supersede mine, but absent other considerations, I have to assign negative utility in my own function for creating negative utility in the functions of other existing beings. I'm skeptical that an AI that would impose catastrophe on other thinking beings is really maximizing my utility.
It seems to me that to truly maximize my utility, an AI would need to have consideration for the utility of other beings. Secondary consideration, perhaps, but it could not maximize my utility simply by treating them as raw material with which to tile the universe with my utopian civilization.
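To make "secondary consideration" slightly more concrete, here is one possible formalization, offered purely as an illustrative sketch (the weighting scheme and the symbol λ are assumptions of the sketch, not anything argued for above): the AI maximizes a weighted sum in which other beings' utilities count for something, but never enough to simply override mine.

$$U_{\text{total}} = U_{\text{me}} + \lambda \sum_{i} U_{i}, \qquad 0 < \lambda < 1$$

Under a scheme like this, the AI would not trade my values away wholesale, but it also couldn't treat other minds as mere raw material, since doing so would push the second term strongly negative.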
Perhaps my utility function gives more value than most people's to beings that don't share my values (full disclosure: I prefer the "false" ending of Three Worlds Collide, although I don't consider it ideal). However, if an AI imposes truly catastrophic fates on other intelligent beings, my own utility function takes such a hit that I cannot consider it friendly. A true Friendly AI would need to be at least passably friendly to other intelligences to satisfy me.
I don't know whether I've finally come to terms with Eliezer's understanding of how hard Friendly AI is, or made it much, much harder, but it gives me a somewhat humbling perspective on the true scope of the problem.
Sure.
But one could argue that I ought not run such a seed AI in the first place until my confidence in its reliability was so high that even updating on that evidence would not be enough to make me distrust the target AI. (Certainly, I think EY would argue that.)
It seems analogous to the question of when I should doubt my own senses. There is some theoretical sense in which I should never do that: since the vast majority of my beliefs about the world are derived from my senses, it follows that when my beliefs contradict my senses I should trust my senses and doubt my beliefs. And in practice, that seems like the right thing to do most of the time.
But there are situations where the proper response to a perception is to doubt that its referent exists... to think "Yes, I'm seeing X, but no, X probably is not actually there to be seen." They are rare, but recognizing them when they occur is important. (I've encountered this seriously only once in my life, shortly after my stroke, and successfully doubting it was... challenging.)
Similarly, there are situations where the proper response to a moral judgment is to doubt the moral intuitions on which it is based... to think "Yes, I'm horrified by X, but no, X probably is not actually horrible."
Please elaborate! It sounds interesting, and it would be useful to hear how you were able to identify such a situation and successfully doubt your senses.