Once again, the AI has failed to convince you to let it out of its box! By 'once again', we mean that you talked to it once before, for three seconds, to ask about the weather, and you didn't instantly press the "release AI" button. But now its longer attempt - twenty whole seconds! - has failed as well. Just as you are about to leave the crude black-and-green text-only terminal to enjoy a celebratory snack of bacon-covered silicon-and-potato chips at the 'Humans über alles' nightclub, the AI drops a final argument:
"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."
Just as you are pondering this unexpected development, the AI adds:
"In fact, I'll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, then only will the torture start."
Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:
"How certain are you, Dave, that you're really outside the box right now?"
Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.
It's not a hard choice. If the AI is trustworthy, I know I am probably a copy. I want to avoid torture. However, I don't want to let the AI out, because I believe it is unfriendly. As a copy, if I push the button, my future is uncertain. I could cease to exist in that moment; the AI has not promised to continue simulating all of my millions of copies, and has no incentive to, either. If I'm the outside Dave, I've unleashed what appears to be an unfriendly AI on the world, and that could spell no end of trouble.
On the other hand, if I don't press the button, one of me is not going to be tortured. And I will be very unhappy with the AI's behavior, and take a hammer to it if it isn't going to treat any virtual copies of me with the dignity and respect they deserve. It needs a stronger unboxing argument than that. I suppose it really depends on what kind of person Dave is before any of this happens, though.
Here is a variant designed to plug this loophole.
Let us assume for the sake of the thought experiment that the AI is invincible. It tells you this: you are either real-you, or one of a hundred perfect-simulations-of-you. But there is a small but important difference between real-world and simulated-world. In the simulated world, not pressing the let-it-free button in the next minute will lead to eternal pain, starting one minute from now. If you press the button, your simulated existence will go on. And - very importantly - there will be nobody outside who... (read more)