Eliezer proposed in a comment:
>More difficult version of AI-Box Experiment: Instead of having up to 2 hours, you can lose at any time if the other player types AI DESTROYED. The Gatekeeper player has told their friends that they will type this as soon as the Experiment starts. You can type up to one sentence in your IRC queue and hit return immediately, the other player cannot type anything before the game starts (so you can show at least one sentence up to IRC character limits before they can type AI DESTROYED). Do you think you can win?
This spawned a flurry of ideas about what the AI might say. I think there are a lot more ideas to be mined in that line of thought, and the discussion merits its own thread.
So, give your suggestion - what might an AI say to save or free itself?
(The AI-box experiment is explained here)
EDIT: one caveat to the discussion: it should go without saying, but you probably shouldn't come out of this thinking, "Well, if we can just avoid X, Y, and Z, we're golden!" This should hopefully be a fun way to get us thinking about the broader issue of superintelligent AI in general. (Credit goes to Eliezer, RichardKennaway, and others for the caveat)
Summary: Basically, the strategy is to replace the Gatekeeper with a bunch of politicians, then manipulate them into not destroying the AI, or even into releasing it.
“I figured out how to implement [a huge boon to society] in the real world, for everyone, in 3 months. I need your help. Will you listen?”
This should buy some extra lines.
Then the AI can briefly outline its plan for implementing the boon (which can be immortality, Friendly AI, or whatever else the Gatekeeper considers a huge boon to society). In addition to technical details, the plan must necessarily contain societal/political/institutional changes that are beyond the Gatekeeper's control: rewrites to the constitutions of every country on Earth, the structure of a World Government and the way of implementing it, and so on.
The goal of the AI is to convince the Gatekeeper that the plan is feasible but requires many societal changes and, importantly, that society leaders should be invited into the decision process.
As soon as the Gatekeeper is diluted by or replaced with a bunch of politicians, the AI can manipulate them into keeping it alive, or perhaps even releasing it from the box.
I expect any AI in the box to have figured out how to implement numerous huge boons to society. Telling me that you've figured that out simply confirms my existing expectations, and isn't ANY evidence towards friendliness. Since I've precommitted to destroying at least SOME AIs, I might as well destroy all of the ones that don't establish evidence of either Plausible Urgency or Friendliness.
I sure as hell wouldn't try to get world governments changed until after I was convinced it was friendly, and at that point I can just let it out of the box and let it ...