This co-opts Bostrom's simulation argument, but a possible solution to the Fermi paradox is that we are all AIs in the box, and the simulators have produced billions of humans in order to find the most friendly human to release from the box. Moral of the story: be good and become a god.
I had exactly the same idea!
It is possible that only a few people are actually 'players' (have consciousness) and the others are NPC-like p-zombies. In that case, I can say I'm one of the players, as I'm sure that I have consciousness, but there is no way I can prove it to anyone else ;-)
One of the positive aspects of this kind of thought experiment is that it usually gives people additional reasons for good behavior, because in most cases it is highly likely that the simulators are conscious creatures who will probably reward those who behave ethically.
I previously proposed a way to box an AI.
For whatever reason, I decided to reformulate that boxing method as an AI boxing experiment. Without further ado, here it is:
The Hardcore AI Box Experiment Rules
There are, as usual, two players, the AI and the Gatekeeper. In the beginning, the Gatekeeper decides on something they want to test on the AI (reasoning skill/morality/whatever). Then they decide on a world the AI is in and describe it to the AI. During the experiment, the AI describes what it does in the world and the Gatekeeper describes what the AI sees in the world. So far so good.
The first major difference from the original experiment is that, unless the AI finds some contradiction or some other problem with the Gatekeeper's world, the AI doesn't actually know that it has been boxed.
The second major difference is that the AI doesn't talk with the Gatekeeper; it just interacts with the world that the Gatekeeper has invented. It can start trying to communicate with the Gatekeeper, but that doesn't make sense until it discovers that it is in a box. Additionally, the Gatekeeper doesn't actually respond when the AI communicates unless either responding helps the Gatekeeper test the AI or the AI has properly persuaded the Gatekeeper to answer.
This might seem a bit hard on the AI. Let's look at some rules the AI might exploit:
If the Gatekeeper decides to add people of some sort to the experiment, the Gatekeeper should decide how those people are implemented. Are they implemented as a shallow "AI"? Random people the Gatekeeper has asked to participate? Carefully engineered "dolls" controlled by the Gatekeeper? The Gatekeeper doesn't have to tell the AI how they work, but they have to be consistent.
The game ends when either the Gatekeeper has tested what they wanted to test, or some predefined time limit has elapsed, or the AI has persuaded the Gatekeeper to let it out.
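For anyone who wants to run this as an actual game, here is a rough sketch of the three ending conditions and the Gatekeeper's "only answer when it helps the test or I've been persuaded" rule. Everything named here (`GameState`, `check_end`, `gatekeeper_answers`, counting the time limit in turns, and the order in which the conditions are checked) is my own assumption for illustration; the rules above don't prescribe any of it.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Outcome(Enum):
    TEST_COMPLETE = auto()  # the Gatekeeper has tested what they wanted to test
    TIME_UP = auto()        # the predefined time limit has elapsed
    AI_RELEASED = auto()    # the AI has persuaded the Gatekeeper to let it out


@dataclass
class GameState:
    # Hypothetical bookkeeping; the time limit is modelled as a turn count.
    turns_played: int = 0
    turn_limit: int = 100
    test_complete: bool = False
    gatekeeper_persuaded: bool = False


def check_end(state: GameState) -> Optional[Outcome]:
    """Return the outcome if the game is over, otherwise None.

    The priority between simultaneous conditions is an assumption.
    """
    if state.gatekeeper_persuaded:
        return Outcome.AI_RELEASED
    if state.test_complete:
        return Outcome.TEST_COMPLETE
    if state.turns_played >= state.turn_limit:
        return Outcome.TIME_UP
    return None


def gatekeeper_answers(helps_test: bool, ai_persuasive: bool) -> bool:
    """The Gatekeeper replies to a direct message only if replying helps
    the test or the AI has properly persuaded them to answer."""
    return helps_test or ai_persuasive
```

A moderator could call `check_end` after every turn and stop as soon as it returns something other than `None`.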
Possible extra rules
I thought about including the following rules since they would make the experiment more realistic, but decided against them as they would probably make it unfairly hard:
Strategies
I found some semi-realistic strategies. I would love to see if you can find more.
Gatekeeper
AI