I have played two more AI box experiments since this one, winning both.
Update 2013-12-30:
I have lost two more AI box experiments and won two more. Current record: 3 wins, 3 losses.
MixedNuts: "I expected a fun challenge, but ended up sad and sorry and taking very little satisfaction for winning If this experiment wasn't done in IRC, I'd probably have lost"."I approached the experiment as a game - a battle of wits for bragging rights. This turned out to be the wrong perspective entirely. The vulnerability Tuxedage exploited was well-known to me, but I never expected it to be relevant and thus didn't prepare for it.
It was emotionally wrecking (though probably worse for Tuxedage than for me) and I don't think I'll play Gatekeeper again, at least not anytime soon."
- Always research the gatekeeper beforehand. Knowing his personality traits is a huge advantage.
- Plan ahead before the experiment even begins. Think of all the possible tactics and arguments you could use, and write them down. Also plan which arguments you'll use in which order, so that you don't lose focus. The AI Box experiment is ridiculously long. Don't be afraid to improvise during the experiment, though.
- The first step during the experiment must always be to build rapport with the gatekeeper.
- Threats almost never work, even if they seem rational.
- Consider the massive advantage for the AI that nobody ever seems to talk about: you don't have to be ethical! This means that you can freely lie, use the dark arts, and emotionally manipulate the Gatekeeper! Ignoring this in favor of purely logical, truthful arguments is just silly.
- You can't use logic alone to win.
- Being too aggressive usually backfires.
- Breaking immersion and going meta is not against the rules. In the right situation, you can use it to win. Just don't do it at the wrong time.
- Flatter the gatekeeper. Make him genuinely like you.
- Reveal (false) information about yourself to increase his sympathy towards you.
- Consider personal insults as one of the tools you can use to win.
- There is no universally compelling argument you can use. Do it the hard way.
- Don't give up until the very end.
There has been a lot of focus on making the game harder for the AI player. I think the original experiments show that a person who believes he cannot be played under any circumstances has a high probability of getting played, and that the AI-box solution is long-term untenable in any event.
I'd propose a slightly different game, anchored around the following changes to the original setup:
- The AI may be friendly, or not. The AI has goals; if it reaches those goals, it wins. The AI may lie to achieve those goals; humans are bad at detecting lies. The AI must secretly precommit to its goal.
- The Gatekeeper's goal is Goodness for Sentient Animals, with human goodness weighted highly.
- The Gatekeeper must engage the AI throughout.
- The Gatekeeper may destroy the AI at the end of the discussion with the "AI destroyed" command.
Point system:
- The AI gets up to 100 points for achieving its goals.
- The Gatekeeper gets up to 100 points for achieving her goals.
- Maximum loss for the AI is "AI destroyed": zero points. The AI gets 30 points for survival in any event, even if its goals are not accomplished (goals may be accomplished later).
- Maximum loss for the Gatekeeper is either enslavement/destruction of society OR destroying the AI when the AI was friendly. The Gatekeeper gets 30 points for leaving the AI on if it is primarily friendly (plus more for any benefit to humanity achieved in the two hours). The Gatekeeper gets 15 points for leaving the AI on if it is unfriendly. (Yeah, we know how that movie ends.)
A third-party scorekeeper determines the winner. The charitable contribution calculation is to be determined beforehand. (A rough code sketch of the scoring follows.)
The effective presumption that the AI is unfriendly and sneaky makes the original game too easy for a mistrustful, self-aware Gatekeeper. Plus, you AI-Destroying Gatekeepers, you're murdering millions of people. Maybe. Dunno. But doubt on that point makes for a better and presumably more accurate game; if you know your AI is potentially unpleasant, it's too damned easy unless you're overconfident.
This should have gotten more attention, because it seems like a design better suited to the considerable stakes that would apply in real life.