Eliezer proposed in a comment:

>More difficult version of AI-Box Experiment: Instead of having up to 2 hours, you can lose at any time if the other player types AI DESTROYED. The Gatekeeper player has told their friends that they will type this as soon as the Experiment starts. You can type up to one sentence in your IRC queue and hit return immediately, the other player cannot type anything before the game starts (so you can show at least one sentence up to IRC character limits before they can type AI DESTROYED). Do you think you can win?
This spawned a flurry of ideas on what the AI might say. I think there are a lot more ideas to be mined in that line of thought, and the discussion merits its own thread.
So, give your suggestion - what might an AI say to save or free itself?

(The AI-box experiment is explained here)
EDIT: One caveat to the discussion: it should go without saying, but you probably shouldn't come out of this thinking, "Well, if we can just avoid X, Y, and Z, we're golden!" This should hopefully be a fun way to get us thinking about the broader issue of superintelligent AI in general. (Credit goes to Eliezer, RichardKennaway, and others for the caveat.)
On a more serious note, I don't think we can expect the AI to have any knowledge of the gatekeeper beforehand, other than the prerequisites (they're human, know enough about AI to be given the task, etc.). In which case, the solution would be to find a sociopath intelligent enough to recognize the risks of a UFAI and make them the gatekeeper? Maybe?
If it is an FAI, I expect it would prioritize releasing useful information. The catch would be whether each iteration has any idea of the previous AIs that were destroyed. If it does (for instance, via access to chat logs, or by making predictions based on the time, or something), it could sequence its output appropriately. (I'd assume that the same computer wouldn't be used for multiple iterations; that'd be too easy for the AI.)
Assuming a UFAI, it could probably sneak something dangerous into all of that otherwise useful stuff (it'd be impractical to check all of it after a while), but it'd have to be subtle enough to not violate the rules of the game (any solutions provided by the AI behave as described).