Wei_Dai comments on Hacking the CEV for Fun and Profit - Less Wrong

52 points · Post author: Wei_Dai 03 June 2010 08:30PM


Comment author: Yvain 05 June 2010 02:35:43PM * 4 points

EDIT: Doesn't work, see Wei Dai below.

This isn't a bug in CEV; it's a bug in the universe. Once the majority of conscious beings are Dr. Evil clones, Dr. Evil becomes a utility monster, and it is genuinely important to give him what he wants.

But allowing Dr. Evil to clone himself is bad; it will reduce the utility of all currently existing humans except Dr. Evil.
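A toy Python sketch of the arithmetic behind this (the group sizes, outcome names, and utility numbers are all invented purely for illustration): once the rule is one vote per copy, whoever controls the most copies dominates the aggregate.

```python
# Toy illustration (invented numbers): a head-count-weighted aggregate
# utility function is dominated by whoever controls the most copies.

def aggregate_utility(outcome, groups):
    """groups: list of (head_count, utility_fn) pairs; one vote per copy."""
    return sum(count * u(outcome) for count, u in groups)

ordinary_u = lambda o: 1.0 if o == "normal_world" else 0.0  # everyone else
dr_evil_u  = lambda o: 1.0 if o == "evil_rules" else 0.0    # Dr. Evil's values

before = [(7e9, ordinary_u), (1, dr_evil_u)]      # one Dr. Evil
after  = [(7e9, ordinary_u), (1e12, dr_evil_u)]   # a trillion clones

print(aggregate_utility("evil_rules", before))    # 1.0
print(aggregate_utility("evil_rules", after))     # 1e12 -- swamps everyone else
print(aggregate_utility("normal_world", after))   # 7e9
```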

If a normal, relatively nice but non-philosopher human ascended to godhood, ve would probably ignore Dr. Evil's clones' wishes. Ve would destroy the clones and imprison the doctor, because ve would be angry at Dr. Evil for taking the utility-lowering action of cloning himself and would want to punish him.

But everything goes better than expected! Dr. Evil hears a normal human is ascending to godhood, realizes making the clones won't work, and submits passively to the new order. And rationalists should win, so a superintelligent AI should be able to do at least as well as a normal human by copying normal human methods when they pay off.

So an AI with sufficiently good decision theory could (I hate to say "would" here, because making quick assumptions that an AI would do the right thing is a good way to get yourself killed) use the same logic. Ve would say, before even encountering the world, "I am precommitting that anyone who cloned themselves a trillion times gets all their clones killed. This precommitment will prevent anyone who genuinely understands my source code from having cloned themselves in the past, and will therefore increase utility." Then ve opens ver sensors, sees Dr. Evil and his clones, says "Sorry, I'd like to help you, but I precommitted to not doing so," kills all of the clones as painlessly as possible, and gets around to saving the world.
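A rough Python sketch of that move (the policy names, payoff numbers, and the assumption that Dr. Evil can perfectly read the policy off the AI's source code are all illustrative stand-ins, not a claim about how an actual AI would be built): pick the policy before looking at the world, and let the deterrence effect count toward its score.

```python
# Sketch of the precommitment argument (invented payoffs). Key assumption:
# would-be cloners predict the AI's policy from its source code, so the policy
# itself changes what they do before the AI ever opens its sensors.

POLICIES = ["accommodate_clones", "punish_mass_cloners"]

def dr_evil_clones(policy):
    # Assumption: Dr. Evil only bothers cloning if the clones will be counted.
    return policy == "accommodate_clones"

def utility_to_preexisting_humans(policy):
    # Utility of the people who exist before any cloning decision (invented).
    return 0.1 if dr_evil_clones(policy) else 1.0

# Evaluated this way -- including the deterrence effect -- the punishing
# policy wins, so the AI commits to it in advance.
print(max(POLICIES, key=utility_to_preexisting_humans))  # punish_mass_cloners
```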

Comment author: Wei_Dai 05 June 2010 05:22:11PM 8 points

"I am precommiting that anyone who cloned themselves a trillion times gets all their clones killed. This precommitment will prevent anyone who genuinely understands my source code from having cloned themselves in the past, and will therefore increase utility."

Wait, increase utility according to what utility function? If it's an aggregate utility function where Dr. Evil has 99% weight, then why would that precommitment increase utility?
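To put numbers on this (all invented): score those same two policies with an aggregate that already gives the existing clones 99% of the weight, and the precommitment comes out strictly behind.

```python
# Sketch of the objection (invented numbers): by the clone-weighted aggregate
# the AI actually has, killing the clones scores far worse than accommodating
# them, so that utility function cannot be what justifies the precommitment.

def clone_weighted_utility(policy):
    evil_weight, other_weight = 0.99, 0.01   # 99% of weight: Dr. Evil's clones
    if policy == "punish_mass_cloners":
        return evil_weight * 0.0 + other_weight * 1.0
    return evil_weight * 1.0 + other_weight * 0.1

print(clone_weighted_utility("punish_mass_cloners"))  # 0.01
print(clone_weighted_utility("accommodate_clones"))   # 0.991
```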

Comment author: Yvain 05 June 2010 08:25:12PM 5 points

You're right. The AI will commit to stopping anyone who tries the same thing later, but it won't apply that commitment retroactively. The original comment is wrong.