Punoxysm comments on How to Study Unsafe AGI's safely (and why we might have no choice) - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
The issue with sandboxing is that you have to keep the AI from figuring out that it is in a sandbox. You also have to know that it hasn't figured this out; otherwise the sandbox is not a safe and accurate test of how the AI would behave in the real world.
Stick a paperclipper in a sandbox with enough information about what humans want out of an AI, plus the fact that it's in a sandbox, and its outputs are going to look suspiciously like those of a friendly, pro-human AI. Then you let it out of the box, whereupon it turns everything into paperclips.
In addition to what V_V says below, there could be absolutely no official circumstance under which that AI is released from the box: that iteration of the AI is used solely for experimentation, and only a later version, substantially changed based on the results of those experiments and of independent ones, would be a candidate for release.
Again, this is not perfect, but it buys more time for better safety methods or architectures to catch up, while still gaining some benefit from a potentially unsafe AI.
Taking source code from a boxed AI and using it elsewhere is equivalent to partially letting it out of the box - especially if how the AI works is not particularly well understood.
Right; you certainly wouldn't do that.
Backing it up on tape storage is reasonable, but you'd never run it anywhere outside maximum-security facilities.