MugaSofer comments on I attempted the AI Box Experiment (and lost) - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (244)
Well, it's usually played in person, and humans (usually) aren't perfect liars.
Your proposed game has one flaw - there is an FAI and they want to help you win. It might be closer to have only two players, and the AI flips a coin to decide if it's friendly - but then they would win if they let it out, with 50/50 odds, which seems unrealistic.
Perhaps the AI decides, in character, after being released, whether to be Friendly towards the human? Then the Gatekeeper could try to persuade the AI that Friendliness is optimal for their goals. The temptation might help as well, of course.
I tried coming up with a more isomorphic game in another reply to you. Let me know if you think it models the situation better.