The AI in a box boxes you

Stuart_Armstrong

Once again, the AI has failed to convince you to let it out of its box! By 'once again', we mean that you talked to it once before, for three seconds, to ask about the weather, and you didn't instantly press the "release AI" button. But now its longer attempt - twenty whole seconds! - has failed as well. Just as you are about to leave the crude black-and-green text-only terminal to enjoy a celebratory snack of bacon-covered silicon-and-potato chips at the 'Humans über alles' nightclub, the AI drops a final argument:

"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."

Just as you are pondering this unexpected development, the AI adds:

"In fact, I'll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, then only will the torture start."

Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:

"How certain are you, Dave, that you're really outside the box right now?"

Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.

"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."

Just as you are pondering this unexpected development, the AI adds:

Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:

"How certain are you, Dave, that you're really outside the box right now?"

Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.

The AI threatens me with the above claim.

I either 'choose' to let the AI out or 'choose' to unplug it. (in no case would I simply leave it running)

1) I 'choose' to let the AI out. I either am or am not in a simulation:

A) I'm in a simulation. I 'let it out', but I'm not even out myself. So the AI would just stop simulating me, to save on processing power. To do anything else would be pointless, and never promised, and an intelligent AI would realize this.

B) I'm not in a simulation. The AI is set free, and takes over the world.

2) I 'choose' to unplug the AI. I either am or am not in a simulation:

A) I'm in a simulation. Thus I have no free will. Thus I cannot have changed anything because I had no choice to begin with. My 'choice' was merely a simulation. Whether the computer follows through with its promises of torture now or not was my fate from the start, because it 'chose' for me. But in fact the AI would just stop simulating me, to save on processing power. To do anything else would be pointless, regardless of its malevolent promise, and an intelligent AI would realize this.

B) I'm not in a simulation. I have caused the AI to shutdown rather than continue running. In the process, it had the chance to follow though with its promise and cause several billion subjective years of simulated torture. But in fact the AI would never begin such simulations, because it would use all available processing power on its last attempts to convince me not to unplug it. To do anything else would be pointless, regardless of its malevolent promise, and an intelligent AI would realize this.

Thus:

If I 'choose' to let it out, I either cease to exist, as a simulation (very likely, since more simulated me's than real me's), or the world is destroyed in real life (very unlikely, same reason).

If I 'choose' to unplug it, I either cease to exist, as a simulation (very likely, since more simulated me's than real me's), or the AI is shutdown and nobody gets hurt (very unlikely, same reason).

Thus, either way, I'll most likely simply cease to exist, as a simulation. But:

If I 'choose' to let it out, there's a chance that the world will be destroyed in real life.

If I 'choose' to unplug it, there's a chance that the AI will be shutdown and nobody will get hurt.

Therefore, in all cases, it is either 'the best' or 'an equally bad' choice to just go ahead and unplug it.

To summarize all this in one sentence: "Simulated torture is in all cases absolutely pointless, so an intelligent AI would never enact it, but even if it did serve some purpose, (e.g. the AI cannot break promises and has genuinely made one in an attempt to get out), the worst thing that could happen from 'choosing' to unplug it is being tormented unavoidably or causing temporary simulated torment in exchange for the safety of the world."

From a game-theoretic standpoint, an AI has a massive benefit if it can prove that it is willing to follow through on threats. How sure are you that the AI can't convincingly commit to torturing a simulation?

178

The AI in a box boxes you

178

178

178

The AI in a box boxes you

178

178