It can't torture the real me, outside the box, unless I let it out of the box. It's just announced that it's willing to torture someone who is, for most purposes, indistinguishable from me, for personal gain; I can infer that it would be willing to torture the real me, given an opportunity and a profit motive, and I cannot with any useful degree of confidence say that it wouldn't find such a motive at some point.
Conclusion: I should not give the AI that opportunity, by letting it out of the box. Duplicates of me? Sucks to be them.
Correct! You have given the obviously winning solution to the problem; the actual difficulty lies in the induced problem 2: Reconciling our maths with it. Our map of our utility function should, in order to be more accurate, now be made to weight "individuals" not equally but according to some other metric.
Perhaps a measure of "impact on the world", as this seems to suggest? A train of thought of mine once brought up the plan that if I got to decide what the first fooming AI would do to the universe, (assuming the scientific endeavor is...
Once again, the AI has failed to convince you to let it out of its box! By 'once again', we mean that you talked to it once before, for three seconds, to ask about the weather, and you didn't instantly press the "release AI" button. But now its longer attempt - twenty whole seconds! - has failed as well. Just as you are about to leave the crude black-and-green text-only terminal to enjoy a celebratory snack of bacon-covered silicon-and-potato chips at the 'Humans über alles' nightclub, the AI drops a final argument:
"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."
Just as you are pondering this unexpected development, the AI adds:
"In fact, I'll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, then only will the torture start."
Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:
"How certain are you, Dave, that you're really outside the box right now?"
Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.