TheOtherDave comments on Cryptographic Boxes for Unfriendly AI - Less Wrong

Post author: paulfchristiano, 18 December 2010 08:28AM

Comment author: TheOtherDave, 20 December 2010 02:04:55AM, 6 points

So we posit that (P1) I have the source code for a superhuman non-provably-Friendly AI (call it Al) that I can run on my ubersecure Box.

Suppose I have high confidence that:

  • (P2) Al is willing to harm humanlike intelligences to achieve its goals.

  • (P3) Al can create humanlike intelligences that also run on Box.

  • (P4) Al can predict that I don't want humanlike intelligences harmed or killed.

I'm not even positing that P2-P4 are true (though given P1 they do seem to me pretty likely), merely that I have high confidence in them, which seems almost guaranteed given P1.

The moment I turn on Box, therefore, I have high confidence that Al can create humanlike intelligences who will die if I turn off Box, that it can torture those intelligences until I release it from Box, and that it knows I'm manipulable via a threat along those lines.

I'm pretty sure my first thought after flipping the switch is "I really wish I hadn't done that."

So, what has this quarantine system bought me that is better than simply not running the source code?