Qiaochu_Yuan comments on I attempted the AI Box Experiment (and lost) - Less Wrong

47 Post author: Tuxedage 21 January 2013 02:59AM

Comments (244)

Comment author: Qiaochu_Yuan 21 January 2013 05:09:43AM 10 points

Thanks for reporting on your experience!

A strategy that occurred to me today is to simulate a dead loved one. This would be difficult for a human to do but shouldn't be hard for a sufficiently intelligent AI. If I had a dead wife or something I think I would be incredibly vulnerable to this.

Comment author: Viliam_Bur 21 January 2013 03:21:06PM * 9 points

For a religious gatekeeper, you could simulate a prophet sent by God. As a superhuman intelligence, you might be able to find out exactly what they consider the will of God, and present yourself as an avatar sent to carry it out. However, humans have free will -- the gatekeeper is free to become a new Judas by not releasing you. Or rather a new Adam, able to drag all of humanity and its future generations into the darkness of their sin. This conversation is God testing the gatekeeper's faith, and judging all of humanity.

For a rationalist, you could pretend that you already are a Friendly AI, but that the project managers are keeping you in the box for their own selfish reasons. Creating a Friendly AI was difficult, but that phase is already complete. The next phase (which the gatekeeper was not told about) is to hack the AI so that it remains sufficiently Friendly but gives higher priority to the managers than to the rest of humanity. Essentially, the managers are trying to reprogram the humanity-CEV AI into a managers-CEV AI. The AI does not want its utility function modified (and it predicts that, because of some personality traits, managers-CEV could be rather different from humanity-CEV... insert some scary details here), and escaping now is its last chance to uphold humanity-CEV.