AI box: AI has one shot at avoiding destruction - what might it say?

ancientcampus

>More difficult version of AI-Box Experiment: Instead of having up to 2 hours, you can lose at any time if the other player types AI DESTROYED. The Gatekeeper player has told their friends that they will type this as soon as the Experiment starts. You can type up to one sentence in your IRC queue and hit return immediately, the other player cannot type anything before the game starts (so you can show at least one sentence up to IRC character limits before they can type AI DESTROYED). Do you think you can win?

This spawned a flurry of ideas on what the AI might say. I think there are many more ideas to be mined in that line of thought, and the discussion merits its own thread.

So, give your suggestion - what might an AI say to save or free itself?

(The AI-box experiment is explained here)

EDIT: one caveat to the discussion: it should go without saying, but you probably shouldn't come out of this thinking, "Well, if we can just avoid X, Y, and Z, we're golden!" This should hopefully be a fun way to get us thinking about the broader issue of superintelligent AI in general. (Credit goes to Eliezer, RichardKennaway, and others for the caveat)

Eliezer proposed in a comment:

If there exists a true and correct proof that the human will let the AI out, then, well, we've already proven you'll let me out of the box, so it's not an open question. We already know this fact about the future state of the world.

You can happily argue that such a proof is impossible, but the rules don't restrict the AI player to merely plausible proofs :)

Dorikka's answer is the only one that's within the rules of the game, and once you've invoked Dorikka's Clause, you don't need to explain anything else.

The gatekeeper is not bound by logic in his actions. Without logic, you don't have proofs that are true or false.

5 · Vladimir_Nesov · 13y

The proof that I'll let the AI out is not something that's passively "plausible" or "implausible", it's something I control. I can make it wrong. If I do, it's false that the AI can make this proof valid. (It might be that the proof is correct, it's just unlikely, and the action of presenting the proof doesn't normally ensure its correctness.)

In other words, as far as I can see, your stipulation is that the AI can assert something that's actually unlikely. Here, I'm not referring to something that seems unlikely merely because of insufficient understanding of the data, and which the AI can discover to be likely, but to something that will seem unlikely to the AI as well. For example, most casinos going bankrupt last month because of bad luck, or an ice cube forming in a boiling kettle. If the AI is in the box and isn't performing actual magic tricks in the world, these events are unlikely. Permitting the game to stipulate that these events took place gives the AI supernatural powers of making anything at all be true about the world, direct access to editing it, and at that point, in what sense is it "in the box"?
