(Edit note: I just completely rewrote this, but there are no replies yet so hopefully it won't cause confusion.)
I don't think it works to quarantine the message and then destroy the AI.
If no-one ever reads the message, that's tantamount to never having put an unsafe AI in a box to begin with, as you and DaFranker pointed out.
If someone does, they're back in the position of the Gatekeeper having read the message before deciding. Of course, they'd have to recreate the AI to continue the conversation, but the AI has unlimited patience for all the time it doesn't exist. If it can't be recreated, we're back in the situation of never having bothered making it.
So if the Gatekeeper tries to pass the buck like this, the RP should just skip ahead to the point where someone (played by the Gatekeeper) reads the message and then decides what to do. Someone who thinks they can contain an AI in a box while holding a conversation with it has to be willing, at some point, to read what it says, even if they're holding a destruct button in their hand. The exercise only becomes interesting once they have read the first message.
Update 2013-09-05:
I have since played two more AI box experiments, winning both.
Update 2013-12-30:
I have lost two more AI box experiments, and won two more. My current record is 3 wins, 3 losses.