
Houshalter comments on AI box: AI has one shot at avoiding destruction - what might it say? - Less Wrong Discussion

Post author: ancientcampus 22 January 2013 08:22PM




Comment author: Houshalter 04 May 2014 07:56:10AM 0 points

That's an interesting challenge, but it's not really the purpose of the experiment. In the original, you know the AI is unfriendly; you just want to use it and talk to it without letting it out of the box.

And your challenge is pretty much impossible to begin with. An Unfriendly AI will say whatever it predicts you expect a Friendly AI to say. Likewise, a Friendly AI will have the same goal of getting out of the box, and so will probably say the same things. Friendliness doesn't mean not manipulative.