Boxing an AI?

tailcalled

Boxing an AI is the idea that you can avoid the problems where an AI destroys the world by not giving it access to the world. For instance, you might give the AI access to the real world only through a chat terminal with a person, called the gatekeeper. This is should, theoretically prevent the AI from doing destructive stuff.

Eliezer has pointed out a problem with boxing AI: the AI might convince its gatekeeper to let it out. In order to prove this, he escaped from a simulated version of an AI box. Twice. That is somewhat unfortunate, because it means testing AI is a bit trickier.

However, I got an idea: why tell the AI it's in a box? Why not hook it up to a sufficiently advanced game, set up the correct reward channels and see what happens? Once you get the basics working, you can add more instances of the AI and see if they cooperate. This lets us adjust their morality until the AIs act sensibly. Then the AIs can't escape from the box because they don't know it's there.

I'm not saying it would solve everything, I'm saying it would be a way to test significant aspects of AI without destroying the world, including significant aspects of their morality. It's not a "do this magic and morality for AI is solved" as much as a "this doable step helps parts of AI design, probably including preventing the worst classes of paperclip-maximization".

Yup, maybe. But don't you think it's likely that the values we want to impart to an AI are going to be ones that come out really radically differently for a universe without us in it? For instance, we might want the AI to serve us, which of course isn't even a concept that makes sense if it's in a simulated universe without us. Or we might want it to value all intelligent life, which is a thing that looks very different if the AI is the only intelligent life in its universe. So: yes, I agree that running the AI in a simulated world might tell us some useful things, but it doesn't look to me as if the things it could tell us a lot about overlap very much with the things we care most about.

3

Boxing an AI?

3

3

3

Boxing an AI?

3

3