Many people think you can solve the Friendly AI problem just by writing certain failsafe rules into the superintelligent machine's programming, like Asimov's Three Laws of Robotics. I thought the rebuttal to this was in "Basic AI Drives" or one of Yudkowsky's major articles, but after skimming them, I haven't found it. Where can I find the arguments addressing this suggestion?
I agree (with this question) - what makes us so sure that "maximize paperclips" is the part of the utility function that the optimizer will really value? Couldn't it symmetrically decide that "maximize paperclips" is a constraint on "try not to murder everyone"?
Asking what it "really" values is anthropomorphic. It's not coming up with loopholes around the "don't murder people" constraint because it doesn't really value it, or because the paperclip part is its "real" motive.
It will probably come up with loopholes around the "maximize paperclips" constraint too - for example, if "paperclip" is defined as anything paperclip-shaped, it will probably create atomic-scale nanoclips, because these are easier to build than full-scale human-usable ones, much to the annoyance of the offi...
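To make the "loophole" point concrete, here is a minimal toy sketch (purely illustrative; the Plan class, the numbers, and the shape-only objective are all made up for this example). If the objective only counts paperclip-shaped objects and never mentions size or human usability, a plain argmax lands on nanoclips - no "real motive" or deliberate rule-dodging required, just an underspecified objective:

```python
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    clips_per_kg: float      # shape-matching objects produced per kg of material
    usable_by_humans: bool   # never referenced by the objective below

def utility(plan: Plan, material_kg: float = 1000.0) -> float:
    # The specification only says "maximize paperclip-shaped objects".
    # Nothing here encodes "human-usable", so that property is simply ignored.
    return plan.clips_per_kg * material_kg

plans = [
    Plan("standard office paperclips", clips_per_kg=2_000, usable_by_humans=True),
    Plan("atomic-scale nanoclips",     clips_per_kg=1e20,  usable_by_humans=False),
]

best = max(plans, key=utility)
print(best.name)  # -> atomic-scale nanoclips
```

The same mechanism applies symmetrically to the "don't murder people" term: whatever predicate actually appears in the objective gets satisfied literally, and anything the predicate fails to capture gets optimized away.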