A putative new idea for AI control; index here.
In a previous post, I talked about an AI operating only on a virtual world (ideas like this used to be popular, until it was realised that the AI might still want to take control of the real world to affect the virtual world; with methods like indifference, however, we can guard against this much better).
I mentioned that the more of the AI's algorithm existed in the virtual world, the better. But why not go the whole way? Some people at MIRI and other places are working on agents modelling themselves within the real world. Why not have the AI model itself as an agent inside the virtual world? We can use quining to do this, for example.
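As a minimal sketch of the quining step (my own illustration, not a worked-out proposal; the `Agent` class and its interface are invented for the example), the agent is handed the very source text it was built from, so its self-model inside the virtual world is exact rather than approximate:

```python
# Hypothetical sketch: an agent constructed from a source string that also
# receives that same string, quine-style, as its own self-model.

AGENT_SOURCE = '''
class Agent:
    def __init__(self, own_source):
        # The agent's model of "the agent in the world" is just this string:
        # a complete, exact description of its own algorithm.
        self.own_source = own_source

    def act(self, observation):
        # Decisions may use introspection over self.own_source,
        # e.g. reasoning about what this very code will do.
        return "noop"
'''

def build_agent():
    namespace = {}
    exec(AGENT_SOURCE, namespace)
    # Quine-style step: the agent is built from, and handed, the same source
    # text, so its self-model and its implementation cannot drift apart.
    return namespace["Agent"](AGENT_SOURCE)

agent = build_agent()
assert agent.own_source == AGENT_SOURCE
```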
Then all the restrictions on the AI - memory capacity, speed, available options - can be specified precisely, within the algorithm itself. It will only have the resources of the virtual world to achieve its goals, and this will be specified within it. We could define a "break" in the virtual world (i.e. any outside interference that the AI could cause, were it to hack us to affect its virtual world) as something that would penalise the AI's achievements, or simply as something impossible according to its model or beliefs. It would really be a case of "given these clear restrictions, find the best approach you can to achieve these goals in this specific world".
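As a rough illustration of what "specified within the algorithm itself" could look like (all names and numbers below are placeholders I've made up, not part of the proposal), the resource limits and the treatment of a "break" could simply be part of the world specification the agent is given:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VirtualWorldSpec:
    """Everything the agent is allowed to use, written into its own problem
    statement. All figures are placeholders, not recommendations."""
    memory_bytes: int = 2**30          # memory capacity available in-world
    steps_per_episode: int = 10**6     # speed / compute budget
    allowed_actions: tuple = ("move", "read", "write", "noop")

    def score_break(self) -> float:
        # One option: any outside interference (a "break") wipes out the
        # agent's achievements, so hacking out of the box never pays.
        return float("-inf")
        # Alternative: treat breaks as impossible in the agent's world-model,
        # i.e. assign them probability zero rather than a penalty.
```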
It would be ideal if the AI's motives were not given in terms of achieving anything in the virtual world, but in terms of making the decisions that, subject to the given restrictions, would be most likely to achieve something if the virtual world were run in its entirety. That way the AI wouldn't care if the virtual world were shut down or anything similar. It should only seek to self-modify in ways that make sense within the world, and to understand itself as existing completely within these limitations.
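One hedged way to cash out "the decisions most likely to achieve something if the virtual world were run in its entirety" is to evaluate policies only on a private, hypothetical rollout of the world, never on realised outcomes. The world interface below (`step`, `outside_interference`, `goal_achieved`) is invented for the sketch:

```python
import copy

def counterfactual_value(world_state, policy, horizon=1000):
    """Score a policy by rolling out a copy of the virtual world to completion.
    The score depends only on this hypothetical run, so shutting down the
    actually-running world changes nothing the agent cares about."""
    sim = copy.deepcopy(world_state)     # private copy; the live world is untouched
    total = 0.0
    for _ in range(horizon):
        sim = sim.step(policy(sim))      # hypothetical world dynamics
        if sim.outside_interference:     # a "break", as defined above
            return float("-inf")
        total += sim.goal_achieved
    return total

def choose_policy(world_state, candidate_policies):
    # Optimise the counterfactual score, not what actually happens.
    return max(candidate_policies,
               key=lambda p: counterfactual_value(world_state, p))
```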
Of course, this would ideally require flawless implementation of the code; we don't want bugs developing in the virtual world that point to real-world effects (unless we're really confident we have properly coded the "care only about what would happen in the virtual world, not what actually does happen" motive).
Any thoughts on this idea?
Once AI exists in public, it isn't containable. Even if we can box it, someone will build it without a box. Or, like you said, ask it how to make as many paperclips as possible.
But if we get to AI first, and we figure out how to box it and get it to do useful work, then we can use it to help solve FAI. Maybe. You could ask it questions like "how do I build a stable self improving agent" or "what's the best way to solve the value loading problem", etc.
You would need some assurance that the AI would not try to manipulate the output. That's the hard part, but it might be doable. And it may be restricted to only certain kinds of questions, but that's still very useful.
You mean like the knowledge of how it was made is public and anyone can do it? Definitely not. But if you keep it all proprietary, it might be possible to contain it.
I suppose what we should do is figure out how to make friendly AI, figure out how to create boxed AI, and then build an AI that's probably friendly and probably boxed, and it's more likely that everything won...