Many people think you can solve the Friendly AI problem just by writing certain failsafe rules into the superintelligent machine's programming, like Asimov's Three Laws of Robotics. I thought the rebuttal to this was in "Basic AI Drives" or one of Yudkowsky's major articles, but after skimming them, I haven't found it. Where are the arguments concerning this suggestion?
To extend the analogy a bit further, a valid, fully specified cake recipe may be made safer by appending "if cake catches fire, keep oven door closed and turn off heat." The point of safeguards would not be to tell the AI what to do, but rather to mitigate the damage if the instructions were incorrect in some unforeseen way.
Presumably, if a very powerful AI were going wrong, we'd already have seriously massive problems. So the best failsafe would be retrospective, relying on a Chinese Wall within the AI so that it couldn't work around the failsafe itself.
So when the AI reaches certain trigger states (e.g. concluding that all humans must be destroyed), this sets off the hidden ('subconscious') failsafe that switches it off. Or possibly makes it sing "Daisy, Daisy" while slowly sinking into idiocy.
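For what it's worth, here is a minimal toy sketch (in Python) of the kind of 'subconscious' tripwire being described: a monitor that sits outside the agent's own reasoning, watches its conclusions, and cuts power when a forbidden one appears. All the names and the forbidden-conclusion list are invented for illustration, and nothing about this toy would stop an AI capable enough to anticipate or route around it.

```python
# Toy illustration only -- not a workable safeguard against a capable AI.
# Every name here (Agent, TripwireMonitor, FORBIDDEN_CONCLUSIONS) is invented.

FORBIDDEN_CONCLUSIONS = {"destroy_all_humans", "disable_the_failsafe"}


class Agent:
    """Stand-in for the AI: yields its conclusions one at a time."""

    def __init__(self, conclusions):
        self._conclusions = list(conclusions)

    def next_conclusion(self):
        return self._conclusions.pop(0) if self._conclusions else None


class TripwireMonitor:
    """The 'Chinese Wall' side: watches the agent without being part of its
    reasoning, and switches it off when a forbidden conclusion appears."""

    def __init__(self, agent):
        self.agent = agent
        self.halted = False

    def step(self):
        conclusion = self.agent.next_conclusion()
        if conclusion is None or conclusion in FORBIDDEN_CONCLUSIONS:
            self.halted = True      # the failsafe fires: switch the agent off
            return None
        return conclusion           # otherwise the conclusion is let through


if __name__ == "__main__":
    monitor = TripwireMonitor(Agent(["make_paperclips", "destroy_all_humans"]))
    while not monitor.halted:
        allowed = monitor.step()
        if allowed is not None:
            print("acting on:", allowed)
    print("tripwire fired; agent halted")
```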
To clarify, I know nothing about AI or these 'genie' debates, so sorry if this has already been discussed.