Boxing an AI?

tailcalled

Boxing an AI is the idea that you can avoid the problems where an AI destroys the world by not giving it access to the world. For instance, you might give the AI access to the real world only through a chat terminal with a person, called the gatekeeper. This should, theoretically, prevent the AI from doing destructive stuff.

Eliezer has pointed out a problem with boxing AI: the AI might convince its gatekeeper to let it out. To demonstrate this, he played the AI in a simulated version of an AI box and talked his way out. Twice. That is somewhat unfortunate, because it means testing AI is a bit trickier.

However, I got an idea: why tell the AI it's in a box? Why not hook it up to a sufficiently advanced game, set up the correct reward channels and see what happens? Once you get the basics working, you can add more instances of the AI and see if they cooperate. This lets us adjust their morality until the AIs act sensibly. Then the AIs can't escape from the box because they don't know it's there.
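To make the setup a bit more concrete, here is a minimal sketch in Python. Everything in it is a hypothetical stand-in: `ToyGame` is a trivial share-or-grab game rather than a "sufficiently advanced" one, `ToyAgent` is a coin-flipper rather than an AI, and `cooperation_bias` is just a knob standing in for whatever "adjusting their morality" would actually mean. The only point is the shape of the experiment: the agents see nothing but the game's reward channel, and we watch from outside how often they cooperate.

```python
import random


class ToyGame:
    """Hypothetical stand-in for the game the AI is hooked up to."""

    def step(self, actions):
        # actions: one "share" or "grab" per agent.
        # Reward channel (an assumption for illustration): every agent gets
        # one point per sharer, and a grabber gets two extra points.
        shared = actions.count("share")
        return [shared + (2 if action == "grab" else 0) for action in actions]


class ToyAgent:
    """Hypothetical agent; it only ever sees rewards from the game."""

    def __init__(self, cooperation_bias, rng):
        self.cooperation_bias = cooperation_bias  # stand-in "morality" knob
        self.rng = rng
        self.total_reward = 0

    def act(self):
        if self.rng.random() < self.cooperation_bias:
            return "share"
        return "grab"

    def receive(self, reward):
        self.total_reward += reward


def run_trial(n_agents, cooperation_bias, steps, seed=0):
    """Run several agent instances together; return the fraction of 'share' moves."""
    rng = random.Random(seed)
    game = ToyGame()
    agents = [ToyAgent(cooperation_bias, rng) for _ in range(n_agents)]
    shares = 0
    for _ in range(steps):
        actions = [agent.act() for agent in agents]
        rewards = game.step(actions)
        for agent, reward in zip(agents, rewards):
            agent.receive(reward)
        shares += actions.count("share")
    return shares / (n_agents * steps)
```

Calling `run_trial(3, 0.5, 100)` would run three instances for 100 steps and report how often they shared. In a real test the observer would change the agents' values and rerun, which is the "adjust until they act sensibly" loop described above.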

A test like this would actually tell us a lot of useful things.

First of all, there is the general problem of 'does this AI work?' This includes the general intelligence/rationality-related problems, but possibly also other problems, such as whether it will wirehead itself (whether a box can test that really depends a lot on the implementation).

The morality-stuff is tricky and depends a lot on how the AI is implemented. It seems too dangerous to let it play a multiplayer game with humans, even with most restrictions I can think of. However, how to test the morality really depends on how its human-detection system has been implemented. If it just uses some 'humans generally do these stupid things' heuristics, you can just plop down a few NPCs. If it uses somewhat smarter heuristics, you might be able to make some animals play the game and let the AI care for them. If it picks something intelligent, you might be able to instantiate other copies of the AI with vastly different utility functions. Basically, there are a lot of approaches to testing morality, but which one works depends on the implementation.
