Many people think you can solve the Friendly AI problem just by writing certain failsafe rules into the superintelligent machine's programming, like Asimov's Three Laws of Robotics. I thought the rebuttal to this was in "Basic AI Drives" or one of Yudkowsky's major articles, but after skimming them, I haven't found it. Where are the arguments concerning this suggestion?
I still don't see how laws as barriers could be effective. People are already debating whether it's possible to write highly specific failsafe rules capable of acting as barriers, and the general feeling is that you wouldn't be able to second-guess the AI well enough to do that effectively. I'm not sure what replacing those specific rules with a large corpus of laws achieves. On the plus side, you'd have a large group of overlapping controls that might cover each other's weaknesses. But they weren't written with AI in mind, and even if they were, small political shifts could open loopholes. The sheer number of laws also means you can't clearly see what is and isn't permitted: it risks an illusion of safety simply because we find it harder to think of something bad an AI could do that doesn't break any law.
Not to mention that a utility-maximising AI would seek to change laws to make them better serve its goals for humans, so the very rules meant to control the AI would themselves be a target of its influence.
I guess here I'd reiterate this point from my latest reply to orthonormal:
It may not be helpful to think of some grand utility-maximising AI that constantly strives to maximize human happiness or other similar goals, and could cause us to wake up in some alternate reality one day. It would be nice to have some AIs working on how to maximize some of the things humans value, e.g., health, happiness, attractive and sensibl...