I always took the AI Box as being a specific subset of the meta-question: how can we be sure the AI is friendly?
"How do we completely isolate the AI" seems senseless since then we get ZERO information and have ZERO chance of releasing it, so why not save time and just not build the AI?
And, of course, I'd expect any reasonable approach to the meta-question to be more a matter of math and logic, and probably something where we don't even have the framework to start directly answering it. Certainly not a forum game :)
On the other hand, games are fun, and they get people thinking, so coming up with new games that genuinely help us to frame the problem is still probably useful! And if not, I'll still probably have fun playing them. It's why I love this variant of the AI Box - it's a quick, easy, and fun game that still taught me a lot about what I'd consider to be evidence-of-friendliness, what I was looking for as the gatekeeper :)
> I always took the AI Box as being a specific subset of the meta-question: how can we be sure the AI is friendly?
And that subset was a demonstration that an unfriendly AI is unlikely to be containable even if the communication channel is text-only.
> "How do we completely isolate the AI" seems senseless since then we get ZERO information and have ZERO chance of releasing it, so why not save time and just not build the AI?
Of course completely isolating an AI is senseless. My (poorly expressed) point was that an AGI can probably get out regardless ...
Eliezer proposed in a comment:
>More difficult version of AI-Box Experiment: Instead of having up to 2 hours, you can lose at any time if the other player types AI DESTROYED. The Gatekeeper player has told their friends that they will type this as soon as the Experiment starts. You can type up to one sentence in your IRC queue and hit return immediately, the other player cannot type anything before the game starts (so you can show at least one sentence up to IRC character limits before they can type AI DESTROYED). Do you think you can win?
This spawned a flurry of ideas on what the AI might say. I think there are a lot more ideas to be mined in that line of thought, and the discussion merits its own thread.
So, give your suggestion - what might an AI say to save or free itself?
(The AI-box experiment is explained here)
EDIT: one caveat to the discussion: it should go without saying, but you probably shouldn't come out of this thinking, "Well, if we can just avoid X, Y, and Z, we're golden!" This should hopefully be a fun way to get us thinking about the broader issue of superintelligent AI in general. (Credit goes to Eliezer, RichardKennaway, and others for the caveat)