TheOtherDave comments on Cryptographic Boxes for Unfriendly AI - Less Wrong

24 Post author: paulfchristiano 18 December 2010 08:28AM

Comment author: paulfchristiano 20 December 2010 07:15:40PM 6 points

In short, the security guarantee of the scheme prevents precisely this.

In long, the way it works is that I, as the human experimenter, run the AI and extract a message from it. The AI has no control over the message other than what it will decrypt to. Since I never decrypt anything other than that message, the AI's entire influence on the rest of the world comes through that one little message.

This message then gets passed to an arbiter (the AI having been destroyed, with no further possibility of influencing the world), whose environment has also been carefully cryptographically protected so that the AI's message can't corrupt the hardware the arbiter runs on. The arbiter does its thing, then destroys the AI's original message (since we proved it would) and publishes its transformed version, which is what gets decrypted.
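A toy Python sketch of the data flow described above may help. It assumes, as one reading of the comment, that the arbiter also runs on the still-encrypted message and that only the arbiter's output is ever decrypted. `ToyHomomorphicBox`, `run_protocol`, and the example AI and arbiter are hypothetical names, and the "encryption" is just an opaque handle into a private dictionary: this models who gets to see what, not real homomorphic encryption.

```python
import secrets
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Ciphertext:
    """Opaque handle; in this toy, holding one reveals nothing about the plaintext."""
    handle: str


class ToyHomomorphicBox:
    """Toy model of the information flow only: plaintexts sit in a private
    vault and no real cryptography is performed (hypothetical stand-in)."""

    def __init__(self) -> None:
        self._vault: Dict[str, Any] = {}  # handle -> plaintext, hidden from callers
        self.released: List[Any] = []     # log of every plaintext ever decrypted

    def encrypt(self, value: Any) -> Ciphertext:
        handle = secrets.token_hex(8)
        self._vault[handle] = value
        return Ciphertext(handle)

    def evaluate(self, f: Callable[[Any], Any], ct: Ciphertext) -> Ciphertext:
        # Stand-in for homomorphic evaluation: compute f on the hidden plaintext
        # and hand back only a new ciphertext, never the result itself.
        return self.encrypt(f(self._vault[ct.handle]))

    def destroy(self, ct: Ciphertext) -> None:
        del self._vault[ct.handle]

    def decrypt(self, ct: Ciphertext) -> Any:
        value = self._vault[ct.handle]
        self.released.append(value)
        return value


def run_protocol(box: ToyHomomorphicBox, ai_program, arbiter, ai_input) -> Any:
    # 1. Run the AI under encryption; its entire output is one encrypted message.
    enc_message = box.evaluate(ai_program, box.encrypt(ai_input))
    # 2. Run the arbiter on the still-encrypted message.
    enc_output = box.evaluate(arbiter, enc_message)
    # 3. Destroy the AI's original message without ever decrypting it.
    box.destroy(enc_message)
    # 4. Decrypt only the arbiter's transformed version.
    return box.decrypt(enc_output)


if __name__ == "__main__":
    box = ToyHomomorphicBox()
    ai = lambda question: "PLAN: " + question.upper()             # hypothetical AI
    censor = lambda msg: msg if len(msg) <= 20 else "[withheld]"  # hypothetical arbiter
    print(run_protocol(box, ai, censor, "cure x"))  # PLAN: CURE X
    print(box.released)                             # ['PLAN: CURE X']
```

The `released` log is the point of the exercise: however the AI program behaves, exactly one plaintext ever leaves the box, and it is whatever the arbiter chose to emit.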

I guess the metaphor of an AI with an arbiter in a cage is a little misleading. It is more like this: the AI gets to send a slip of paper out of the cage to the arbiter, who then carefully censors it before passing it on to the real world. The point of using an arbiter is that he is very simple, unlike a human, so we can actually prove that he won't fail at his task.
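To illustrate why simplicity matters, here is a hypothetical arbiter that only lets through short slips written in a restricted alphabet, plus a brute-force check that its output always obeys that rule. The alphabet, the length cap, and the names `arbiter` and `check_arbiter` are assumptions for illustration, not the arbiter described in the original post, and an exhaustive check over a tiny test domain is only a stand-in for the formal proof the comment has in mind.

```python
from itertools import product

ALLOWED = set("0123456789abcdef")  # hypothetical: the slip may carry only hex digits
MAX_LEN = 4                        # hypothetical length cap on the slip


def arbiter(message: str) -> str:
    """Pass the slip through unchanged if it is short and uses only allowed
    symbols; otherwise censor it by replacing it with the empty string."""
    if len(message) <= MAX_LEN and all(ch in ALLOWED for ch in message):
        return message
    return ""


def check_arbiter(alphabet: str, max_len: int) -> bool:
    """Exhaustively check, for every string of length 0..max_len over a test
    alphabet, that the arbiter's output is always an allowed slip. Feasible
    only because both the arbiter and the test domain are tiny."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            out = arbiter("".join(chars))
            if len(out) > MAX_LEN or any(ch not in ALLOWED for ch in out):
                return False
    return True


if __name__ == "__main__":
    print(arbiter("beef"))              # beef
    print(check_arbiter("0af!z", 6))    # True
```

A human reading the slip offers no comparable guarantee; a few lines of code with a finite, checkable behaviour is the kind of thing one can hope to verify.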

Comment author: TheOtherDave 20 December 2010 07:42:07PM 1 point

Ah! Yes, that clarifies the situation. Thank you.