Fictional evidence that this isn't obvious: in Blindsight, which I otherwise thought was a reasonably smart book (for example, it goes out of its way to make its aliens genuinely alien), the protagonists allow an unknown alien intelligence to communicate with them using a human voice. Armed with the idea of AI-boxing, this seemed so stupid to me that it actually broke my suspension of disbelief, but this isn't an obvious thought to have.
Spoiler: Gura ntnva, gur nyvra qbrf nccneragyl znantr gb chg n onpxqbbe va bar bs gur uhzna'f oenvaf.
Update 2013-09-05.
I have since played two more AI box experiments after this one, winning both.
Update 2013-12-30:
I have lost two more AI box experiments, and won two more. Current Record is 3 Wins, 3 Losses.