Many people think you can solve the Friendly AI problem just by writing certain failsafe rules into the superintelligent machine's programming, like Asimov's Three Laws of Robotics. I thought the rebuttal to this was in "Basic AI Drives" or one of Yudkowsky's major articles, but after skimming them, I haven't found it. Where are the arguments concerning this suggestion?
The space of possible AI behaviours is vast; you can't get good behaviour just by ruling parts of it out. It would be like a cake recipe that went:

1. Do not add glass.
2. Do not add petrol.
3. Do not bake it at 1000 degrees.
Clearly the list can never be long enough. Chefs have instead settled on the technique of actually specifying what to do. (Of course the analogy only stretches so far: AI is less like trying to bake a cake and more like trying to build a chef.)
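To make the same point a bit more concretely, here is a toy Python sketch (the action names and the forbidden list are my own invention, purely for the analogy): a specification made only of prohibitions leaves almost the entire space of possible actions open, while positively specifying what to do pins the behaviour down.

```python
# Toy illustration: prohibitions vs. positive specification.
# All action names and lists here are made up for the cake analogy.

POSSIBLE_ACTIONS = {
    "mix flour, sugar and eggs",
    "bake at 180 C for 40 minutes",
    "add motor oil",
    "bake at 1000 degrees",
    "serve the batter raw",
    # ...plus a vast number of other things an unconstrained cook might try
}

FORBIDDEN = {"add motor oil", "bake at 1000 degrees"}  # never long enough

def allowed_under_prohibitions() -> set:
    """Everything not explicitly ruled out remains permitted."""
    return POSSIBLE_ACTIONS - FORBIDDEN

def positive_recipe() -> list:
    """Specifying what to do constrains the behaviour directly."""
    return ["mix flour, sugar and eggs", "bake at 180 C for 40 minutes"]

print(allowed_under_prohibitions())  # still contains "serve the batter raw"
print(positive_recipe())
```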
To extend the analogy a bit further, a valid, fully specified cake recipe may be made safer by appending "if cake catches fire, keep oven door closed and turn off heat." The point of safeguards would not be to tell the AI what to do, but rather to mitigate the damage if the instructions were incorrect in some unforeseen way.
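As a rough sketch of that distinction (all names here are hypothetical), the failsafe sits outside the fully specified procedure and only limits damage when something unforeseen goes wrong; it is never a substitute for the specification itself.

```python
# Hypothetical sketch: a failsafe mitigates an unforeseen failure of the
# fully specified procedure; it does not tell the system what to do.

class OvenFire(Exception):
    """A failure mode the recipe itself did not anticipate."""

def bake_cake() -> None:
    # Stand-in for the complete, positively specified recipe.
    raise OvenFire("the cake has caught fire")

def bake_cake_with_failsafe() -> None:
    try:
        bake_cake()
    except OvenFire:
        # Mitigation only: contain the fire and remove the heat source.
        print("keeping the oven door closed")
        print("turning off the heat")

bake_cake_with_failsafe()
```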