Prompted by Tuxedage learning to win, and various concerns about the current protocol, I have a plan to enable more AI-Box games whilst preserving the logs for public scrutiny.
You forgot to address Eliezer's point that "10% of AI box experiments were won even by the human emulation of an AI" is more effective against future proponents of deliberately creating boxed AIs than "Careful, the guardian might be persuaded by these 15 arguments we have been able to think of".
I don't think the probability of "AIs can find unboxing arguments we didn't" is far enough below 1 for preparation to matter. If there is any chance that those arguments can be mathematically exhausted, that research should be conducted by a select circle of individuals who won't disclose our critical unboxers until there is a proof of safety.
AI Box Experiment Update #3
Tuxedage (AI) vs Alexei (GK) - Gatekeeper Victory
Tuxedage (AI) vs Anonymous (GK) - Gatekeeper Victory
I have won a second game of AI Box against a gatekeeper who wished to remain anonymous.
This puts my AI Box Experiment record at 3 wins and 3 losses.