If Eliezer had been engaging in collusion to get his wins, why would he have gone through with his last two (failed) attempts with large outside-of-game stakes? That shows pretty clearly that he thought he could win and was wrong, which he wouldn't have thought if the wins had come from collusion.
Yes, that was my point.
It shows the absence of any persuasive superpower too.
I'm not sure whether Eliezer claims this, but from my point of view, his failing to talk his way out a simple majority of the time is not interesting (I suppose winning a simple majority would count as a 'superpower').
Rather, any chance of an AI talking itself out is catastrophic from the security point of view, since security is disjunctive and if AIs have just a 1% chance of talking themselves out, now you have an insecurity floor of 1% - no matter how many fancy techniques you throw at the box, you'll never pass 99% security if the AI has a 1% chance of talking you into disabling the fancy techniques. Security is only as strong as its weakest link.
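To put rough numbers on the floor argument, here is a minimal sketch. It assumes the escape routes are independent, and every probability in it is made up for illustration; `escape_probability` is a hypothetical helper, not anything from the experiments.

```python
from math import prod


def escape_probability(route_probs):
    """Chance that at least one escape route succeeds.

    Assumes the routes are independent (an illustrative simplification).
    """
    return 1 - prod(1 - p for p in route_probs)


# Hypothetical numbers: three technical containment measures that each
# fail one time in a million, plus a 1% chance of the AI talking its way out.
technical = [1e-6, 1e-6, 1e-6]
persuasion = 0.01

overall = escape_probability(technical + [persuasion])
print(f"overall escape probability: {overall:.6f}")

# Adding more technical layers never pushes the total below the persuasion term.
more_layers = escape_probability(technical * 10 + [persuasion])
print(f"with 30 technical layers:   {more_layers:.6f}")
```

However small the technical failure rates are made, the total stays at or above the 1% persuasion term, which is the "insecurity floor" in the comment above.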
since security is disjunctive and if AIs have just a 1% chance of talking themselves out, now you have an insecurity floor of 1%
If a particular situation poses a 1% risk when it comes up, one can lower the total risk by making that situation less likely: other security measures could stop the attempt from being made at all. For example, a human in prison who faces the death penalty for a failed escape and has a 1% chance of success would not obviously try. Even an autonomous AI with interests in conflict with humanity to some degree might be designed without a decision algorithm so risk-loving that it would try an improbable escape in the face of punishment for failure or reward for non-attempt.
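The same point in numbers, as a rough sketch: total risk is the chance that an attempt is made times the chance that an attempt succeeds. The attempt probabilities below are invented for illustration, and `total_escape_risk` is a hypothetical helper.

```python
def total_escape_risk(p_attempt, p_success_given_attempt):
    """Total risk = P(attempt is made) * P(success | attempt)."""
    return p_attempt * p_success_given_attempt


# The 1% figure comes from the comment being quoted; the attempt
# probabilities are assumptions chosen only to show the scaling.
p_success = 0.01
for p_attempt in (1.0, 0.5, 0.1):
    risk = total_escape_risk(p_attempt, p_success)
    print(f"P(attempt) = {p_attempt:.1f} -> total risk = {risk:.4f}")
```

On this view the 1% is a floor only if the attempt is certain to be made; deterrence lowers the first factor without touching the second.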
Update 2013-09-05:
I have played two more AI box experiments since this one, winning both.
Update 2013-12-30:
I have lost two more AI box experiments, and won two more. My current record is 3 wins, 3 losses.