Eliezer proposed in a comment:
>More difficult version of AI-Box Experiment: Instead of having up to 2 hours, you can lose at any time if the other player types AI DESTROYED. The Gatekeeper player has told their friends that they will type this as soon as the Experiment starts. You can type up to one sentence in your IRC queue and hit return immediately, the other player cannot type anything before the game starts (so you can show at least one sentence up to IRC character limits before they can type AI DESTROYED). Do you think you can win?
This spawned a flurry of ideas on what the AI might say. I think there's a lot more ideas to be mined in that line of thought, and the discussion merits its own thread.
So, give your suggestion - what might an AI might say to save or free itself?
(The AI-box experiment is explained here)
EDIT: one caveat to the discussion: it should go without saying, but you probably shouldn't come out of this thinking, "Well, if we can just avoid X, Y, and Z, we're golden!" This should hopefully be a fun way to get us thinking about the broader issue of superinteligent AI in general. (Credit goes to Elizer, RichardKennaway, and others for the caveat)
The more I look at the comments, the more I am convinced that the AI Box experiment is too weak a demonstration of transhuman powers. Most of the proposals here fall under this basic trope (feel free to give a tvtropes link): to achieve what AI claims, it'd have to have powers formidable enough to not need the gatekeeper's help getting out of the box in the first place. Given that, why would an AI need to talk to the gatekeeper at all?
So I suggest a modified AI boxing experiment: the gatekeeper designs an AI box with no communication channel at all. It will still have an AI inside and enough initial data fed in for the AI to foom. The AI will attempt to break out of the box by any and all means possible.
Here is a relevant previous thread.
So, we're being asked to imagine an arbitrary superhuman AI whose properties and abilities we can't guess at except to specify arbitrarily, is trying to get out of a box whose security protocols and strength we can't guess at except to specify arbitrarily, and trying to decide whether it does?
Meh. Superman vs Batman is more entertaining.