Many people think you can solve the Friendly AI problem just by writing certain failsafe rules into the superintelligent machine's programming, like Asimov's Three Laws of Robotics. I thought the rebuttal to this was in "Basic AI Drives" or one of Yudkowsky's major articles, but after skimming them, I haven't found it. Where are the arguments concerning this suggestion?
Why don't I just build a paperclip maximiser with two utility functions which it desires to satisfy:
1. maximise paperclip production
2. do not exceed safeguards
where 2 is weighted far more highly than 1. This may have drawbacks in making my paperclip maximiser less efficient than it could be (it will value 2 so highly that it makes far fewer paperclips than the maximum, to make sure it never overshoots), but it should prevent it from grinding our bones to make its paperclips.
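For concreteness, here is a minimal sketch (in Python) of the weighted-sum scheme I have in mind; the function names, the weight value, and the sample numbers are purely illustrative assumptions, not anything from the literature.

```python
# Illustrative sketch only: a weighted combination of the two objectives above.
# All names and the specific weight are assumptions for the example.

SAFEGUARD_WEIGHT = 1e6  # objective 2 dominates objective 1 by many orders of magnitude

def paperclip_utility(paperclips_produced: float) -> float:
    """Objective 1: more paperclips is better."""
    return paperclips_produced

def safeguard_utility(safeguard_violations: float) -> float:
    """Objective 2: any violation of the safeguards is heavily penalised."""
    return -safeguard_violations

def combined_utility(paperclips_produced: float, safeguard_violations: float) -> float:
    """Weighted sum in which the safeguard term dominates."""
    return (paperclip_utility(paperclips_produced)
            + SAFEGUARD_WEIGHT * safeguard_utility(safeguard_violations))

# A plan that makes fewer paperclips but risks no violation scores higher
# than one that makes more paperclips at any nonzero expected violation:
print(combined_utility(1_000, 0.0))   # 1000.0
print(combined_utility(10_000, 0.1))  # -90000.0
```

Of course, this only pushes the problem into specifying what counts as a safeguard violation correctly, which is exactly the specification difficulty raised below.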
Surely the biggest issue is not the paperclip maximiser but the truly intelligent AI that we want to fix many of our problems: the problem being that we'd have to make our safeguards specific, and leaving something out could be disastrous. If we could teach an AI to WANT to find out where we had made a mistake, and to fix it, that would be better. Hence Friendly AI.