
MugaSofer comments on AI box: AI has one shot at avoiding destruction - what might it say? - Less Wrong Discussion

Post author: ancientcampus, 22 January 2013 08:22PM, 18 points




Comment author: MugaSofer, 26 January 2013 08:12:52PM, -2 points

"a solid theory of Friendliness that should allow you to check that I am, indeed, Friendly"

I would consider this cheating; you can't affect the Gatekeeper's choice with your AI magic. For the same reason, you can't hack them directly through the optic nerve, or kill and replace them with nanotech disguised as a cancer cure. The point is to show that the AI is able to persuade the human; a real AI might escape some other way, or use arguments that wouldn't occur to the player, but that isn't what this test is supposed to show.

Also, how do I know that proof is genuine?

Comment author: Emile, 26 January 2013 10:17:44PM, 0 points

"I would consider this cheating; you can't affect the Gatekeeper's choice with your AI magic."

True; this was an example of what a real AI could say, not something I would say while playing the AI under the given rules. Or perhaps something I might say to survive the first few seconds. The original question in this thread was:

"what might an AI say to save or free itself?"