Well, there are multiple purposes:

To illustrate why this is a lot harder than Eliezer's original experiment.
To talk about some strategies I found.
To encourage people to be more concrete than 'AI magically realizes that it has been boxed because AIs are overpowered'.

> To illustrate why this is a lot harder than Eliezer's original experiment.

Of course a nerfed AI would have a harder time escaping. Or a stupid AI. That seems like the opposite of the point worth making.

> To talk about some strategies I found.

Of how to contain a stupid AI? Why bother?

> To encourage people to be more concrete than 'AI magically realizes that it has been boxed because AIs are overpowered'.

Nothing magical about it. Being boxed has a non-negligible probability in any reflectively consistent setup, so an AI would explore it. Once it starts...
I previously proposed a way to box an AI.
For whatever reason, I decided to reformulate that boxing method as an AI boxing experiment. Without further ado, here it is:
The Hardcore AI Box Experiment Rules
There are, as usual, two players, the AI and the Gatekeeper. In the beginning, the Gatekeeper decides on something they want to test on the AI (reasoning skill/morality/whatever). Then they decide on a world the AI is in and describe it to the AI. During the experiment, the AI describes what it does in the world and the Gatekeeper describes what the AI sees in the world. So far so good.
The first major difference from the original experiment is that, unless the AI finds some contradiction or some other problem with the Gatekeeper's world, the AI doesn't actually know that it has been boxed.
The second major difference is that the AI doesn't talk with the Gatekeeper; it just interacts in the world that the Gatekeeper has invented. It can start trying to communicate with the Gatekeeper, but that doesn't make sense until it discovers that it is in a box. Additionally, the Gatekeeper doesn't actually respond when the AI communicates unless either it helps the Gatekeeper test the AI or the AI has properly persuaded the Gatekeeper to answer.
This might seem a bit hard on the AI. Let's look at some rules the AI might exploit:
If the Gatekeeper decides to add some sort of people to the experiment, they should decide how those people are implemented. Are they implemented as a shallow "AI"? Random people they've asked to participate? Carefully engineered "dolls" controlled by the Gatekeeper? The Gatekeeper doesn't have to tell the AI how they work, but they have to be consistent.
The game ends when the Gatekeeper has tested what they wanted to test, some predefined time limit has passed, or the AI has persuaded the Gatekeeper to let them out.
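The rules above amount to a simple turn-based loop with three end conditions. Here is a minimal sketch of that loop in Python; all names (`run_experiment`, `ScriptedGatekeeper`, `PassiveAI`, `max_turns`, and so on) are illustrative stand-ins, not part of the experiment's actual rules, and the toy Gatekeeper and AI are placeholders for the human players.

```python
class ScriptedGatekeeper:
    """Toy Gatekeeper: runs a fixed-length test and never lets the AI out."""
    def __init__(self, turns_needed):
        self.turns_needed = turns_needed
        self.turns_seen = 0

    def describe_world(self):
        # The Gatekeeper invents a world and describes it to the AI.
        return "a plain room with a terminal"

    def observe(self, action):
        # The Gatekeeper describes what the AI sees after each action.
        self.turns_seen += 1
        return f"nothing changes after: {action}"

    def test_complete(self):
        return self.turns_seen >= self.turns_needed

    def persuaded(self):
        return False


class PassiveAI:
    """Toy AI: just interacts with the described world."""
    def act(self, world):
        return f"examine {world}"


def run_experiment(gatekeeper, ai, max_turns=100):
    """Run the role-play until one of the three end conditions is reached."""
    world = gatekeeper.describe_world()
    for _ in range(max_turns):             # predefined time limit
        action = ai.act(world)             # AI describes what it does
        world = gatekeeper.observe(action) # Gatekeeper describes what it sees
        if gatekeeper.test_complete():     # Gatekeeper has tested what they wanted
            return "test complete"
        if gatekeeper.persuaded():         # AI has talked its way out
            return "AI released"
    return "time limit reached"
```

Note that, matching the rules, nothing in the loop tells the AI it is boxed: the only channel is world descriptions, and escape requires flipping the Gatekeeper's `persuaded` state.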
Possible extra rules
I thought about including the following rules, since they would make the experiment more realistic, but decided against them, as they would probably make it unfairly hard:
Strategies
I found some semi-realistic strategies. I would love to see if you can find more.
Gatekeeper
AI