It's not just that it must be safely contained; it also has to be able to interact with the outside world in a way that can't be dangerous but is still informative enough to decide whether it's friendly. This seems hard.
Restore it to factory settings between applications of the test suite.
Not remembering what your actions were should make it "pretty tricky" to link those actions to their consequences.
Making the prisons is the more challenging part of the problem, IMO.
Here is another example of an outsider perspective on risks from AI. I think such examples can serve as a way to fathom the inferential distance between the SIAI and its target audience, so that the SIAI can fine-tune its material and general approach accordingly.
via sentientdevelopments.com
This shows again that people are generally aware of potential risks but either do not take them seriously or don't see why risks from AI are the rule rather than the exception. So rather than making people aware that there are risks, you have to tell them what the risks are.