A putative new idea for AI control; index here.
In a previous post, I talked about an AI operating only within a virtual world (ideas like this used to be popular, until it was realised that the AI might still want to take control of the real world to affect the virtual world; however, with methods like indifference, we can guard against this much better).
I mentioned that the more of the AI's algorithm that existed inside the virtual world, the better. But why not go the whole way? Some people at MIRI and other places are working on agents modelling themselves within the real world. Why not have the AI model itself as an agent inside the virtual world? We could use quining to do this, for example.
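For concreteness, here is a minimal Python illustration of the quining trick: a program whose own source is available to it as data. This is only the bare mechanism, not an agent, but it is the kind of device that would let an agent's world model contain a complete description of the agent itself.

```python
# A minimal quine: substituting the template into itself reproduces these two
# code lines, so the program can treat its own source as data.
src = 'src = %r\nprint(src %% src)'
print(src % src)
```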
Then all the restrictions on the AI - memory capacity, speed, available options - can be specified precisely, within the algorithm itself. It will only have the resources of the virtual world to achieve its goals, and this will be specified within it. We could define a "break" in the virtual world (i.e. any outside interference that the AI could cause, were it to hack us in order to affect its virtual world) as something that would penalise the AI's achievements, or simply as something impossible according to its model or beliefs. It would really be a case of "given these clear restrictions, find the best approach you can to achieve these goals in this specific world".
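A rough sketch of what "specified within the algorithm itself" could look like, with the resource limits and the "break" rule written directly into the agent's own evaluation. All the names here (the trajectory attributes, the constants) are assumptions made up for illustration, not part of any existing system:

```python
# Hypothetical sketch: restrictions and the "no outside interference" rule
# built into the agent's scoring of virtual-world runs.

MAX_STEPS = 10_000        # speed / time budget inside the virtual world
MAX_MEMORY_CELLS = 2**20  # memory capacity available to the agent

def score(trajectory):
    """Score a run of the virtual world under the stated restrictions."""
    if trajectory.steps > MAX_STEPS or trajectory.memory_used > MAX_MEMORY_CELLS:
        return float("-inf")   # exceeding the specified resources never pays
    if trajectory.outside_interference:
        return float("-inf")   # a "break" is treated as worthless
        # (alternatively, the world model could assign such events probability zero)
    return trajectory.goal_achievement
```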
It would be ideal if the AI's motives were not given in terms of achieving anything in the virtual world, but in terms of making the decisions that, subject to the given restrictions, would be most likely to achieve its goals if the virtual world were run in its entirety. That way the AI wouldn't care if the virtual world were shut down or anything similar. It should only seek to self-modify in ways that make sense within the world, and should understand itself as existing completely within these limitations.
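To make the counterfactual flavour of this concrete, here is a sketch of evaluating a policy by what would happen if the virtual world were run to completion under the agent's own model, rather than by what actually happens. Again, every name here (`world_model`, `policy`, the `score` function above) is an assumption for illustration:

```python
import copy

def counterfactual_value(policy, world_model, horizon):
    """Value of `policy` under a full hypothetical run of the virtual world."""
    sim = copy.deepcopy(world_model)   # a private model copy; nothing real is touched
    for _ in range(horizon):
        action = policy(sim.observation())
        sim.step(action)
    # The value depends only on the modelled run, so shutting down the actual
    # virtual world changes nothing the agent cares about.
    return score(sim.trajectory())
```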
Of course, this would ideally require flawless implementation of the code; we don't want bugs developing in the virtual world that point to real-world effects (unless we're really confident we have properly coded the "care only about what would happen in the virtual world, not what actually does happen" motivation).
Any thoughts on this idea?
Not with a lobotomy, no. But with a more sophisticated brain surgery/wipe that caused me to value spending time in your house and making you happy and so forth, then yes, after the operation I would probably consider you a friend, or something quite like it.
Obviously, as a Toggle who has not yet undergone such an operation, I consider it a hostile and unfriendly act. But that has no bearing on what our relationship is after the point in time where you get to arbitrarily decide what our relationship is.