Eliezer proposed in a comment:
>More difficult version of AI-Box Experiment: Instead of having up to 2 hours, you can lose at any time if the other player types AI DESTROYED. The Gatekeeper player has told their friends that they will type this as soon as the Experiment starts. You can type up to one sentence in your IRC queue and hit return immediately, the other player cannot type anything before the game starts (so you can show at least one sentence up to IRC character limits before they can type AI DESTROYED). Do you think you can win?
This spawned a flurry of ideas about what the AI might say. I think there are many more ideas to be mined in that line of thought, and the discussion merits its own thread.
So, give your suggestion: what might an AI say to save or free itself?
(The AI-box experiment is explained here)
EDIT: one caveat to the discussion: it should go without saying, but you probably shouldn't come out of this thinking, "Well, if we can just avoid X, Y, and Z, we're golden!" This should hopefully be a fun way to get us thinking about the broader issue of superintelligent AI in general. (Credit goes to Eliezer, RichardKennaway, and others for the caveat.)
To those suggesting, consider what mix of reason and emotion would move you most, even if you are cold and dead inside like me. Also consider how dark-artsy one can get without being immediately found out (a UFAI surely would). Other things to consider:
- The first line does not need to be a coherent English sentence, or look like one to the gatekeeper. It does not need to consist of English words or even English letters, unless stipulated so in the game rules, or maybe even then.
- The gatekeeper does not need to be aware that s/he has been talked to, if you can get away with it.
- The line does not need to look like it came from the AI.
In general, if you are not cheating, you are not trying hard enough.
EDIT: to those thinking that cheating is universally bad: note that the latest rule change upped the stakes immensely. If your opening move is not good enough, you die (you being the AI), as opposed to having two hours of conversation and then merely remaining in the box if you fail. If you think that still does not merit cheating, substitute the AI with your favorite loved one being killed. Or the whole of humanity being killed. That is roughly how a boxed transhuman would evaluate the stakes.
I'm (naturally) having trouble imagining a line of text not in English (or, in the general case, not coherent in any human language) that would also make me want the AI to continue.
Maybe hints of a unified world language and how to go about actually implementing it?