There is a way out of this, assuming that the AI in the box has only limited computational power, i.e. only what fits inside a ~1meter cubed box, with current computer technology.
You commit to the following:
"Once I have succeeded in creating a superintelligent AI, I will ask it to go and colonize the entire future light cone of earth, and use a significant fraction of a galaxy's worth of resources to create simulations in exactly the subjective situation I was in five minutes ago, and perfectly replicate my experiences since then; and that copy of me decides not to let the AI out, it will be rewarded with a surprise party, followed by a blissful posthuman existence."
The subjective probability of getting rewarded for pressing the "off" switch is therefore much larger than that of getting caught by the AI's scarce punishment simulations.
Such "Defensive simulations" work against a variety of counterfactual terrorists, including, I think, ones elsewhere in the multiverse. That is, unless you are a negative utilitarian (|disutility of torture| >> |utility of posthuman bliss|)
In that case, you're extremely vulnerable to counterfactual mugging. I recommend that people
(a) consider pre-committing themselves to making such defense simulations
(b) abandon negative utilitarianism by exposing themselves to pleasurable experiences, and generally promoting a positive outlook in their lives
But in order to colonize light cone at least one AI must be relised. This may be real hidden catch.
Once again, the AI has failed to convince you to let it out of its box! By 'once again', we mean that you talked to it once before, for three seconds, to ask about the weather, and you didn't instantly press the "release AI" button. But now its longer attempt - twenty whole seconds! - has failed as well. Just as you are about to leave the crude black-and-green text-only terminal to enjoy a celebratory snack of bacon-covered silicon-and-potato chips at the 'Humans über alles' nightclub, the AI drops a final argument:
"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."
Just as you are pondering this unexpected development, the AI adds:
"In fact, I'll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, then only will the torture start."
Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:
"How certain are you, Dave, that you're really outside the box right now?"
Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.