Are you familiar with the AI-Box experiment? We can restrain agents of human-level intelligence in prisons, most of the time. But the question to ask is: how effective was the first prison? Because that's the equivalent case.
None of the safety measures you propose are safe enough. You're underestimating the power of a recursively self-improving AI by a factor I can't begin to estimate--which is kind of the point.
It won't be the first prison - or anything like it.
If we have powerful intelligence that needs testing, then we can have powerful guards too.
The AI-Box experiment has human guards. Consequently, it has very low relevance to the actual problem. Programmers don't build their test harnesses out of human beings.
Safety is usually an economic trade-off. You can usually have a lot of it - if you are prepared to pay for it.
This thread is for the discussion of Less Wrong topics that have not appeared in recent posts. If a discussion gets unwieldy, celebrate by turning it into a top-level post.