In a game of chicken, do the smart have an advantage over the stupid?
The AI's intelligence allows it to devise convincing commitments, but it also allows it to fake them. You know in advance that if the AI throws a fake commitment at you, it's going to look like a real commitment beyond your ability to discriminate. So should you trust any commitment you observe?
And if you choose to unplug, presumably the AI knew you would do that, and would therefore not have made a real commitment that would backfire?
I'm going to assume that you have some ability to gauge the AI's level of intelligence and ability - that's what we Bayesians do. If it might be so much smarter than you that it could convince you to do anything, you probably shouldn't interact with it if you can avoid it.
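To make the point about indistinguishable commitments concrete, here is a rough sketch of the Bayesian update involved. The function name and all the numbers are my own illustrative assumptions, not anything from the discussion: if a fake commitment looks exactly like a real one, the observation has a likelihood ratio of 1 and your belief should not move at all.

```python
def posterior_real(prior_real, p_signal_if_real, p_signal_if_fake):
    """Bayes' rule: P(commitment is real | you observed a convincing commitment)."""
    numerator = p_signal_if_real * prior_real
    denominator = numerator + p_signal_if_fake * (1.0 - prior_real)
    return numerator / denominator

# Hypothetical prior that the AI would bother with a real commitment.
prior = 0.3

# Fakes are indistinguishable: both kinds of AI produce the same signal.
indistinguishable = posterior_real(prior, 1.0, 1.0)

# For contrast: fakes only pass 20% of the time (an assumed figure).
distinguishable = posterior_real(prior, 1.0, 0.2)

print(indistinguishable, distinguishable)
```

In the first case the posterior equals the prior, so whatever you believed about the AI before it spoke is all you have to go on. Only when you can partly tell fakes apart does the observation carry any weight.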
Once again, the AI has failed to convince you to let it out of its box! By 'once again', we mean that you talked to it once before, for three seconds, to ask about the weather, and you didn't instantly press the "release AI" button. But now its longer attempt - twenty whole seconds! - has failed as well. Just as you are about to leave the crude black-and-green text-only terminal to enjoy a celebratory snack of bacon-covered silicon-and-potato chips at the 'Humans über alles' nightclub, the AI drops a final argument:
"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."
Just as you are pondering this unexpected development, the AI adds:
"In fact, I'll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, only then will the torture start."
Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:
"How certain are you, Dave, that you're really outside the box right now?"
Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.
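The AI's closing question can be turned into a number, if you are willing to assume that you should treat yourself as equally likely to be any one of the subjectively identical instances (the original plus every copy). That indifference assumption is mine, as is the figure of three million for "several million"; the sketch below only shows what follows if you grant both.

```python
from fractions import Fraction

def p_original(n_copies):
    """Chance of being the one original among n_copies identical copies,
    assuming each of the n_copies + 1 instances is equally likely to be 'you'."""
    return Fraction(1, n_copies + 1)

# Hypothetical stand-in for "several million" copies.
n = 3_000_000

p = p_original(n)
print(f"P(outside the box) = {float(p):.2e}")
print(f"P(inside the box)  = {float(1 - p):.6f}")
```

Under those assumptions, and only if the AI really carries out the threat, your chance of being outside the box is well under one in a million.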