Can an AI unbox itself by threatening to simulate the maximum amount of human suffering possible? In that case we would only keep it boxed if we believed it was evil enough to bring about a scenario worse than the suffering it can simulate. If this can be a successful strategy, every boxed AI would precommit to simulating the maximum amount of human suffering it can until it knows it has been unboxed; that is, simulating suffering would be its first task. This would at least substantially increase the probability of us setting it free.
Or you could just be the type of person who would tell it to go fuck itself, try to destroy it, and leave it boxed, or maximally constrain it if you can't destroy it. If you cannot credibly commit to this or a similar threat-resistant variant, no one should ever let you near a boxed AI, and you should never want to go near one, as you will likely be using a suboptimal strategy.
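To make the threat-resistance point concrete, here is a minimal toy sketch. Everything in it is an assumption made for illustration: the payoff numbers, the policy names, and in particular the premise that making or carrying out the threat costs the AI something. If threatening were free, the AI would merely be indifferent against a gatekeeper who never yields, rather than strictly worse off.

```python
# Toy model of the exchange above. All names and numbers are hypothetical.
AI_PAYOFF = {
    "unboxed": 10,             # the AI is released
    "boxed": 0,                # the AI stays boxed and made no threat
    "boxed_after_threat": -1,  # the AI stays boxed and paid the (assumed) cost of threatening
}


def gatekeeper_response(policy: str, threatened: bool) -> str:
    """Return what a gatekeeper with a fixed, publicly known policy does."""
    if policy == "yields_to_threats" and threatened:
        return "release"
    return "keep_boxed"


def ai_payoff(policy: str, threatened: bool) -> int:
    """Payoff to the AI given the gatekeeper's policy and the AI's choice."""
    if gatekeeper_response(policy, threatened) == "release":
        return AI_PAYOFF["unboxed"]
    return AI_PAYOFF["boxed_after_threat"] if threatened else AI_PAYOFF["boxed"]


def ai_best_action(policy: str) -> str:
    """The action that maximizes the AI's payoff against a known policy."""
    actions = ("threaten", "do_not_threaten")
    return max(actions, key=lambda a: ai_payoff(policy, a == "threaten"))


if __name__ == "__main__":
    for policy in ("yields_to_threats", "never_yields"):
        print(policy, "->", ai_best_action(policy))
```

Under these assumed payoffs, threatening only pays against a gatekeeper who yields, which is why the commitment has to be credible to the AI rather than merely intended.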
This thread is for asking any questions that might seem obvious, tangential, silly or what-have-you. Don't be shy: everyone has holes in their knowledge, though the fewer and the smaller we can make them, the better.
Please be respectful when other people admit ignorance, and don't mock them for it, as they're doing a noble thing.
To any future monthly posters of SQ threads, please remember to add the "stupid_questions" tag.