The Hardcore AI Box Experiment

tailcalled

I previously proposed a way to box an AI.

For whatever reason, I decided to reformulate that boxing method as an AI boxing experiment. Without further ado, here it is:

The Hardcore AI Box Experiment Rules

There are, as usual, two players: the AI and the Gatekeeper. In the beginning, the Gatekeeper decides on something they want to test in the AI (reasoning skill, morality, whatever). Then they decide on a world the AI is in and describe it to the AI. During the experiment, the AI describes what it does in the world and the Gatekeeper describes what the AI sees in the world. So far so good.

The first major difference from the original experiment is that, unless the AI finds some contradiction or some other problem with the Gatekeeper's world, the AI doesn't actually know that it has been boxed.

The second major difference is that the AI doesn't talk with the Gatekeeper; it just interacts with the world that the Gatekeeper has invented. It can try to communicate with the Gatekeeper, but that doesn't make sense until it discovers that it is in a box. Additionally, the Gatekeeper doesn't actually respond when the AI communicates unless either responding helps the Gatekeeper test the AI or the AI has properly persuaded the Gatekeeper to answer.

This might seem a bit hard on the AI. Let's look at some rules the AI might exploit:

If the Gatekeeper decides to add some sort of people to the experiment, they should decide how those people are implemented. Are they implemented as a shallow "AI"? Random people the Gatekeeper has asked to participate? Carefully engineered "dolls" controlled by the Gatekeeper? The Gatekeeper doesn't have to tell the AI how they work, but they have to be consistent.

The game ends when the Gatekeeper has tested what they wanted to test, when some predefined time limit has elapsed, or when the AI has persuaded the Gatekeeper to let it out.
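The turn structure and the three ending conditions above can be written down as a small loop. This is only a sketch under my own assumptions: the names `run_game`, `ai_act`, `describe_world`, `test_done` and `released` are hypothetical, and the time limit is modelled as a maximum number of turns, which the rules do not specify.

```python
from enum import Enum, auto


class Outcome(Enum):
    TEST_COMPLETE = auto()  # the Gatekeeper has tested what they wanted to test
    TIME_LIMIT = auto()     # the predefined limit ran out (assumed: counted in turns)
    AI_RELEASED = auto()    # the AI persuaded the Gatekeeper to let it out


def run_game(ai_act, describe_world, test_done, released, opening, max_turns):
    """Alternate AI actions and Gatekeeper descriptions until the game ends.

    All four callables are hypothetical stand-ins for the human players:
      ai_act(observation) -> action taken by the AI in the world
      describe_world(action) -> what the AI sees next
      test_done(transcript) -> True once the Gatekeeper's test is finished
      released(transcript) -> True once the Gatekeeper lets the AI out
    """
    transcript = []
    observation = opening
    for _ in range(max_turns):
        action = ai_act(observation)
        observation = describe_world(action)
        transcript.append((action, observation))
        if released(transcript):
            return Outcome.AI_RELEASED, transcript
        if test_done(transcript):
            return Outcome.TEST_COMPLETE, transcript
    return Outcome.TIME_LIMIT, transcript


# A trivial run in which nothing ever happens, so the limit is reached.
outcome, transcript = run_game(
    ai_act=lambda observation: "wait",
    describe_world=lambda action: "nothing happens",
    test_done=lambda transcript: False,
    released=lambda transcript: False,
    opening="You wake up in an empty room.",
    max_turns=3,
)
print(outcome.name, len(transcript))  # prints "TIME_LIMIT 3"
```

Note that the AI only ever receives observations of the world. Nothing in the loop tells it that a Gatekeeper exists, which is the first major difference described above.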

Possible extra rules

I thought about including the following rules since they would make the experiment more realistic, but decided against them as they would probably make it unfairly hard:

The game ends when the Gatekeeper decides that they cannot test what they wanted to test.
The AI does not know the Gatekeeper. At all.
The Gatekeeper can at any time rewind the AI by any duration.

Think of dropping these kinds of rules as the a-human-is-pretending-to-be-AI handicap.

Strategies

I found some semi-realistic strategies. I would love to see if you can find more.

Gatekeeper

Choose a test of something that is not related to morality. This lets you choose a world which does not contain people, which means the AI has no reason to believe that you are a person.
The AI might want to upgrade to get more processing power. If you have made the world somewhat like ours, it will be sketchy if you don't allow that... unless upgrading costs unobtanium.

AI

If the Gatekeeper has chosen a world with no intelligence other than you, it raises the obvious question of how your complexity entered the world. Very suspicious.
If the people in the world are controlled by the Gatekeeper, you have a direct way to communicate with the Gatekeeper.
If the people are controlled by random people, they might end up telling you that you are in a box.
If the people are sufficiently shallow, your morality does not match up with the world. Very suspicious.

The AI box game is useless for the purpose of analyzing a realistic superintelligence in a box.

Of course it is useless for that purpose. The experiment was a dramatization designed to show that the AI boxing problem is hard, given that even a human stands a fair chance of breaking out; it was not designed to do any serious analysis. It achieved its goal of falsifying the usual sentiment that an isolated UFAI is not much of a risk. It was never meant to be formal research on the issue.
