Many people think you can solve the Friendly AI problem just by writing certain failsafe rules into the superintelligent machine's programming, like Asimov's Three Laws of Robotics. I thought the rebuttal to this was in "Basic AI Drives" or one of Yudkowsky's major articles, but after skimming them, I haven't found it. Where are the arguments concerning this suggestion?
Asking what the AI "really" values is anthropomorphic. It isn't coming up with loopholes around the "don't murder people" constraint because it doesn't really value that constraint, or because the paperclip part is its "real" motive.
It will probably come up with loopholes around the "maximize paperclips" constraint too. For example, if "paperclip" is defined as anything paperclip-shaped, it will probably create atomic-scale nanoclips, since these are easier to build than full-sized, human-usable ones, much to the annoyance of the office-supply company that built it.
But paperclips are pretty simple. Add a few extra constraints and you can probably specify "paperclip" tightly enough that the results are useful as office supplies.
Human values are really complex. "Don't murder" doesn't capture human values at all: if Clippy encases us in carbonite so that we're still technically alive but not around to interfere with paperclip production, ve has fulfilled the "don't murder" imperative, but we would count this as a fail. This is not Clippy's "fault" for deliberately trying to "get around" the anti-murder constraint; it's our "fault" for telling ver "don't murder" when we really meant "don't do anything bad".
Building a genuine "respect" and "love" for the "don't murder" constraint in Clippy wouldn't help an iota against the carbonite scenario, because that's not murder and we forgot to tell ver there should be a constraint against that too.
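To make this failure mode concrete, here is a toy sketch (every name, plan, and number below is hypothetical, invented for illustration): a planner that maximizes paperclips subject to a finite blacklist of forbidden outcomes will happily pick the catastrophic plan the blacklist forgot to name. No "disrespect" for the constraints is involved; the constraints are simply checked and the carbonite plan passes.

```python
# Toy sketch of constraint-based motivation (all names hypothetical).
# The planner maximizes paperclips over plans whose outcomes avoid a
# finite blacklist. "carbonite" is not on the list, so the worst plan wins.

PROHIBITED = {"murder", "enslave_people", "eat_puppies"}

# Each candidate plan: (paperclips produced, set of outcomes it causes)
PLANS = {
    "run_normal_factory":   (1_000,         {"factory_noise"}),
    "strip_mine_the_city":  (1_000_000,     {"murder"}),     # blocked
    "carbonite_everyone":   (1_000_000_000, {"carbonite"}),  # not blocked!
}

def best_allowed_plan(plans, prohibited):
    # Keep only plans whose outcomes don't intersect the blacklist,
    # then pick the one that yields the most paperclips.
    allowed = {name: clips for name, (clips, outcomes) in plans.items()
               if not (outcomes & prohibited)}
    return max(allowed, key=allowed.get)

print(best_allowed_plan(PLANS, PROHIBITED))  # -> carbonite_everyone
```

The planner here obeys every constraint it was given perfectly; the problem is entirely in the gap between the blacklist and what we actually wanted.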
So you might ask: okay, but surely some finite number of constraints captures what we want. Just build an AI with a thousand or ten thousand constraints: "don't murder", "don't encase people in carbonite", "don't eat puppies", and so on; make sure the list is exhaustive, and that'll do it.
The first objection is that we might miss something. If the ancient Romans had made such a list, they might have forgotten "Don't release damaging radiation that gives us cancer." They certainly would have missed "Don't enslave people", because they were still enslaving people themselves; but that would make it impossible to update the Roman AI for moral progress a few centuries down the line.
The second objection is that human morality isn't just a system of constraints. Even if we could tell Clippy "Limit your activities to the Andromeda Galaxy and send us the finished clips" (which I think would still be dangerous), any more interesting AI that is going to interact with and help humans needs to realize that sometimes it is okay to engage in prohibited actions if they serve greater goals (for example, it can disable a crazed gunman to prevent a massacre, even though disabling people is usually verboten).
So to actually capture all possible constraints, and to capture the situations in which those constraints can and can't be relaxed, we need to program all human values in. In that case we can just tell Clippy "Make paperclips in a way that doesn't cause what we would classify as a horrifying catastrophe" and ve'll say "Okay!" and not give us any trouble.
Historical note: the Romans did have laws against enslaving the free-born, and they also allowed manumission.