Many people think the Friendly AI problem can be solved just by writing certain failsafe rules into the superintelligent machine's programming, like Asimov's Three Laws of Robotics. I thought the rebuttal to this was in "Basic AI Drives" or one of Yudkowsky's major articles, but after skimming them I haven't been able to find it. Where are the arguments against this suggestion?
True, but irrelevant: humanity has never produced a provably-correct software project anywhere near as complex as an AI would be, and we probably never will. Even if we had a mathematical proof, it still wouldn't be a complete guarantee of safety, because the proof might contain errors and might not cover every case we care about.
The right question to ask is not "will this safeguard make my AI 100% safe?", but rather "will this safeguard reduce, increase, or have no effect on the probability of disaster, and by how much?" (And then, separately, at some point: "what is the probability of disaster now, and what is the expected value of launching vs. waiting?" That will depend on a lot of things that can't be predicted yet.)
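To make that second question concrete, here is a toy expected-value comparison. Every number in it is made up purely to illustrate the shape of the calculation, and the `ev` helper is just a hypothetical stand-in; nothing here is a real estimate.

```python
# Toy expected-value comparison for "launch now" vs. "wait and keep working
# on safeguards". All numbers are invented for illustration only.

def ev(p_disaster: float, value_success: float, value_disaster: float) -> float:
    """Expected value given a probability of disaster and the two payoffs."""
    return (1 - p_disaster) * value_success + p_disaster * value_disaster

# Hypothetical inputs: a safeguard that halves disaster risk at the cost of delay.
ev_launch_now = ev(p_disaster=0.10, value_success=100.0, value_disaster=-1000.0)
ev_wait       = ev(p_disaster=0.05, value_success=95.0,  value_disaster=-1000.0)

print(ev_launch_now)  # -10.0
print(ev_wait)        # 40.25

# The point is not these particular numbers, but that the comparison is
# quantitative: a safeguard is evaluated by how much it shifts the probability
# of disaster, not by whether it drives that probability to zero.
```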
The difficulty of writing software guaranteed to obey a formal specification could be obviated by a two-stage approach. Once you believed that an AI with certain properties would do what you want, you could write one that accepts mathematical statements in a simple, formal notation and outputs proofs of those statements in an equally simple notation. Then you would put it in a box as a safety precaution, allow only a proof-checker to see its output, and use it to prove the correctness of your code.
The remaining problem would be to write a simple, provably correct proof-verification program, which sounds challenging but doable.
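For illustration, here is a minimal sketch of what such a verification kernel might look like for a toy propositional system with only premises and modus ponens. The formula encoding, the `check_proof` function, and the example are hypothetical; a real verifier would target a much richer logic, and the checker itself would be the artifact you prove correct. The point of the sketch is the trust boundary: the prover's output is treated as untrusted data, and only the checker's verdict matters.

```python
# A minimal sketch of the verification side of the two-stage scheme.
# Formulas are either atoms (plain strings) or implications written as the
# tuple ("->", antecedent, consequent). A proof is a list of steps; each step
# is ("premise", formula) or ("mp", i, j), meaning modus ponens applied to
# earlier steps i and j. The prover that produced the proof is untrusted;
# only this checker's verdict is trusted.

def check_proof(premises, goal, proof):
    derived = []  # formulas established so far, in order
    for step in proof:
        if step[0] == "premise":
            formula = step[1]
            if formula not in premises:
                return False           # claimed premise was never granted
            derived.append(formula)
        elif step[0] == "mp":
            i, j = step[1], step[2]
            if not (0 <= i < len(derived) and 0 <= j < len(derived)):
                return False           # reference to a step that doesn't exist
            antecedent, implication = derived[i], derived[j]
            if not (isinstance(implication, tuple) and implication[0] == "->"
                    and implication[1] == antecedent):
                return False           # step j is not "step i implies something"
            derived.append(implication[2])
        else:
            return False               # unknown inference rule
    return bool(derived) and derived[-1] == goal

# Example: from P and P -> Q, an (untrusted) prover claims Q.
premises = ["P", ("->", "P", "Q")]
alleged_proof = [("premise", "P"),
                 ("premise", ("->", "P", "Q")),
                 ("mp", 0, 1)]
print(check_proof(premises, "Q", alleged_proof))  # True
```

The checker is small enough that verifying it (by hand, or with existing formal-methods tools) looks far more tractable than verifying the prover, which is the whole appeal of the two-stage setup.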