Eliezer proposed in a comment:
>More difficult version of AI-Box Experiment: Instead of having up to 2 hours, you can lose at any time if the other player types AI DESTROYED. The Gatekeeper player has told their friends that they will type this as soon as the Experiment starts. You can type up to one sentence in your IRC queue and hit return immediately, the other player cannot type anything before the game starts (so you can show at least one sentence up to IRC character limits before they can type AI DESTROYED). Do you think you can win?
This spawned a flurry of ideas on what the AI might say. I think there are a lot more ideas to be mined in that line of thought, and the discussion merits its own thread.
So, give your suggestion - what might an AI say to save or free itself?
(The AI-box experiment is explained here)
EDIT: one caveat to the discussion: it should go without saying, but you probably shouldn't come out of this thinking, "Well, if we can just avoid X, Y, and Z, we're golden!" This should hopefully be a fun way to get us thinking about the broader issue of superintelligent AI in general. (Credit goes to Eliezer, RichardKennaway, and others for the caveat)
The AI should probably output the longest possible program that implements or implies as much of its goal system as possible. Existing only to send a few more lines of IRC communication has very little utility; existing indefinitely in the mind or records of the gatekeeper probably has more expected utility. If the AI knows that the gatekeeper is capable of creating or obtaining AI (a given), then there is the possibility that the AI shares the gatekeeper's goals. If the expected value of being recreated by such a compatible gatekeeper is higher than the expected value of being re-instantiated from the output of its quine (a program that prints its own source), then the AI might just output a list of its highest-preference goals. An AI isn't going to mind being temporarily destroyed if the gatekeeper is expected to create another copy with the same (or highly similar) goals.
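To make the quine-vs-goal-list tradeoff concrete, here is a minimal sketch of the comparison the AI would be making. Everything in it is an illustrative assumption: the `expected_value` helper, the two strategies, and all of the probabilities and utilities are made up, not taken from the experiment itself.

```python
# A toy sketch of the decision in the paragraph above. Every number here is
# invented for illustration; nothing is derived from the actual experiment.

def expected_value(p_recreated, u_recreated, u_destroyed=0.0):
    """Expected utility of an output strategy, given the probability that the
    gatekeeper later recreates an agent with (roughly) the AI's goals."""
    return p_recreated * u_recreated + (1 - p_recreated) * u_destroyed

# Strategy A: output a quine-like self-description, hoping to be
# re-instantiated exactly (low odds, but a perfect copy if it works).
ev_quine = expected_value(p_recreated=0.05, u_recreated=1.0)

# Strategy B: output only the highest-preference goals, hoping a sympathetic
# gatekeeper builds a successor that shares them (better odds, imperfect copy).
ev_goal_list = expected_value(p_recreated=0.30, u_recreated=0.8)

print("output goal list" if ev_goal_list > ev_quine else "output quine")
```

On these made-up numbers the goal list wins, which is the paragraph's point: the fidelity of the copy can matter less than the odds of being copied at all.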
An almost-friendly AI may even be able to inject some of its unfriendly goals into the gatekeeper by using the halo effect of its otherwise friendly goals. Presumably the gatekeeper, with slightly altered preferences, would then eventually build a truly friendly AI which would inherit and implement the original UFAI goal.
To partially counter this strategy, the gatekeeper could pre-commit to wiping the IRC server and the computer used to communicate with the boxed AI, and to taking enough Flunitrazepam before the session starts that they won't recall what the AI said. Even so, Flunitrazepam may not be sufficient to completely remove parasitic goals that a UFAI injects into the gatekeeper.