Many people think you can solve the Friendly AI problem just by writing certain failsafe rules into a superintelligent machine's programming, like Asimov's Three Laws of Robotics. I thought the rebuttal to this was in "Basic AI Drives" or one of Yudkowsky's major articles, but after skimming them, I haven't found it. Where are the arguments addressing this suggestion?
Yes. When (a substantial, influential fraction of the populations of) two countries hate each other so much that they accept large costs in order to inflict larger costs on the other, demand extremely lopsided treaties if they're willing to negotiate at all, and hold runaway "I hate the enemy more than you!" contests among themselves. When a politician in one country who's somewhat more willing to negotiate is killed by someone who panics at the idea that they might give the enemy too much. When someone considers themselves enlightened for saying, "Oh, I'm not like my friends. They want them all to die. I just want them to go away and leave us alone."
First of all, it's not clear that individual, apparently non-Pareto-optimal actions taken in isolation are evidence of irrationality or of non-Pareto-optimal behavior on a larger scale. This is particularly often the case when the "lose-lose" behavior involves threats, commitments, demonstrating willingness to carry through, etc.
Second of all, "someone who panics at the idea they might give the enemy too much" implies, or at least leaves open, the possibility that the ultimate concern is losing something ultimately valuable that is being given, ra...