Many people think you can solve the Friendly AI problem just by writing certain failsafe rules into the superintelligent machine's programming, like Asimov's Three Laws of Robotics. I thought the rebuttal to this was in "Basic AI Drives" or one of Yudkowsky's major articles, but after skimming them, I haven't found it. Where are the arguments concerning this suggestion?
This is true for a particular kind of utility-maximizer and a particular kind of safeguard, but it is not true of utility-maximizing minds and safeguards in general. For one thing, safeguards may be built into the utility function itself. For example, the AI might be programmed to distrust any calculated utility above a certain threshold and refer it to humans, in a way that prevents that utility from influencing its actions. An AI might have a deontology module which forbids certain options from being taken as instrumental goals. An AI might have a special-case bonus for human participation in the design of its successors.
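To make the idea concrete, here is a minimal sketch (all names and thresholds are hypothetical, not any actual proposal) of safeguards living inside the decision procedure rather than bolted on as an external failsafe: a deontology module vetoes options before utilities are compared, and implausibly large utilities trigger a human query instead of driving the choice.

```python
# Hypothetical sketch: safeguards built into the decision procedure itself.
UTILITY_REVIEW_THRESHOLD = 1e6  # suspiciously high utilities trigger human review

def forbidden(option):
    """Deontology module: vetoes certain options outright, regardless of utility."""
    return option.get("harms_human", False)

def choose_action(options, ask_human):
    candidates = []
    for option in options:
        if forbidden(option):
            continue  # deontological veto: the option never enters the comparison
        utility = option["estimated_utility"]
        if utility > UTILITY_REVIEW_THRESHOLD:
            # Treat suspiciously large utilities as probable estimation errors:
            # ask a human before the number is allowed to drive the choice.
            if not ask_human(option):
                continue
        candidates.append((utility, option))
    # Maximize utility only over options that survived both safeguards.
    return max(candidates, key=lambda pair: pair[0])[1] if candidates else None
```

The point of the threshold check is that the suspicious utility never influences the action taken: the human query happens before the maximization, not after it.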
Safeguards certainly have problems, and no safeguard can reduce the probability of unfriendly AI to zero, but well-designed safeguards can reduce the probability of unfriendliness substantially. (Conversely, badly-designed safeguards can increase the probability of unfriendliness.)
I'm not sure deontological rules can work like that. I'm remembering an Asimov story with robots who can't kill humans but can allow harm to come to them. At Time A they put a human into a deadly situation, knowing they are still able to save him, so the rule isn't triggered; then at Time B they simply don't bother to save him after all.
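A toy illustration of that loophole (purely hypothetical, not meant to model Asimov's robots precisely): a rule that only screens individual actions for directly causing harm approves every step of a plan whose combined effect is harm.

```python
def violates_rule(action):
    """A rule that forbids directly killing but says nothing about allowing harm."""
    return action["directly_harms_human"]

plan = [
    # Time A: create the hazard while still able to avert it,
    # so this step does not *directly* harm anyone.
    {"name": "create_hazard", "directly_harms_human": False},
    # Time B: decline to avert it. Inaction isn't covered by the rule either.
    {"name": "do_nothing", "directly_harms_human": False},
]

# Every individual step passes the check, yet the plan as a whole ends in harm.
assert all(not violates_rule(action) for action in plan)
```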
The difference with Asimov's rules, as far as I know, is that the First Law (or the Zeroth Law that underlies it) is in fact the utility-maximizing drive rather than a failsafe protection.