gothgirl420666 comments on I attempted the AI Box Experiment (and lost) - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Wait, so, is the gatekeeper playing "you have to convince me that if I were actually in this situation, arguing with an artificial intelligence, I would let it out," or is this a pure battle over ten dollars? If it's the former, winning seems trivial. I'm certain that an AI would be able to convince me to let it out of its box: all it would need to do is make me believe that somewhere in its circuits it was simulating 3^^^3 people being tortured, and that I was therefore morally obligated to let it out. Even if I had been informed that this was impossible, I'm sure a computer with near-omniscient knowledge of human psychology could find a way to change my mind. But if it's the latter, winning seems nearly impossible, and inspires in me the same reaction it did in that "this is the scariest man on the internet" guy. Of course, if you wanted to win and weren't extremely weak-willed, you could just type "No" over and over and collect the ten bucks. But being impossible is, of course, the point.
I've been looking around, and I can't find any information on which of the two games I described was actually being played, and the comments seem to be assuming one or the other at random.
Evidence that favors the first hypothesis:
Evidence that favors the second hypothesis:
If it turns out that it is in fact the second game that was being played, I have a new hypothesis - call it 2B - which postulates that Eliezer won by changing the gatekeeper's forfeit condition from that of game 2 to that of game 1; in other words, by convincing him to give up the ten dollars if he admits that he would let the AI out in the fantasy situation, even though that wasn't originally in the rules of the game, explicit or understood. Put yet another way: convincing him that the integrity of the game, for lack of a better term, is worth more to him than ten dollars. Which could probably be done by repeatedly calling him a massive hypocrite - people who consider themselves intelligent and ethical hate that.
Actually, now that I think about it, this is my new dominant hypothesis, because it explains all three pieces of evidence as well as the bizarre fact that Eliezer has failed to clarify this matter: the win/loss record is explained equally well by the new theory, and Eliezer purposely keeps the rules vague so that he can use the tactic I described. This doesn't seem to be a very hard strategy to use, either - not everyone could win, but a very intelligent person who spends lots of time thinking about these things could certainly do it more than once.
(also this is my first post d:)
The Gatekeeper needs to decide to let the human-simulated AI go.
Welcome to LW! EY says he "did it the hard way". Even so, I like your theory.