I agree with Allen and Wallach here. We don't know what an AGI is going to look like. Maybe the idea of a utility maximizer is infeasible, and the AGIs we are capable of building end up operating in a fundamentally different way (more like a human brain, perhaps). Maybe morality compatible with our own desires can only exist in a fuzzy form at a very high level of abstraction, effectively precluding mathematically precise statements about the system's behavior (as with a human brain).
These possibilities don't seem trivial to me, and either would undermine results from friendliness theory. Why not instead develop a sub-superintelligent AI first (perhaps one intentionally less intelligent than a human), so that we can observe directly what the system looks like before we attempt to redesign it for greater safety?
The problem there is twofold. Firstly, a lot of aspects would not necessarily scale up to a smarter system, and it's sometimes hard to tell what generalizes and what doesn't. Secondly, it's very, very hard to pinpoint the "intelligence" of a program without running it; if we make one too smart, it may be smart (and nasty) enough to feed us misleading data so that our final AI will not share moral values with humans. It's what I'd do if some aliens tried to dissect my mind in order to force their morality on humanity.
Firstly, a lot of aspects would not necessarily scale up to a smarter system, and it's sometimes hard to tell what generalizes and what doesn't.
I agree, but trying to solve the problem without any hands-on knowledge is certainly more difficult.
Secondly, it's very, very hard to pinpoint the "intelligence" of a program without running it
I agree: there is a risk that the first AGI we build will be intelligent enough to skillfully manipulate us, though I think the chances are quite small. I find it difficult to imagine skipping dog-level and human-level intelligence entirely and jumping straight to superhuman intelligence, but it is certainly possible.
It must be great working full-time on this stuff. I had the book on preorder and haven't had time to crack it open yet.
Colin Allen and Wendell Wallach, who wrote Moral Machines (MM) for OUP in 2009, address the problem of Friendly AI in their recent chapter for Robot Ethics (MIT Press). Their chapter is a précis of MM and a response to objections, one of which is:
Their brief response to this objection is:
Meh. Not much to this. I suppose "The Singularity and Machine Ethics" is another plank in the bridge between the two communities.
The most interesting chapter in the book is, imo, Anthony Beavers' "Moral Machines and the Threat of Ethical Nihilism."