
How the virtual AI controls itself

1 Stuart_Armstrong 09 September 2015 02:25PM

A putative new idea for AI control; index here.

In previous posts, I posited AIs that care only about virtual worlds - in fact, that are defined as processes in virtual worlds, similar to cousin_it's idea. How would this work? We would want the AI to reject offers of outside help - whether ways of modifying its virtual world, or ways of giving it extra resources.

Let V be a virtual world, over which a utility function u is defined. The world accepts a single input string O. Let P be a complete specification of an algorithm, including the virtual machine it is run on, the amount of memory it has access to, and so on.

Fix some threshold T for u (to avoid the subtle weeds of maximising). Define the statement:

r(P,O,V,T): "P(V) returns O, and either E(u|O)>T or O=∅"

And the string valued program:

Q(V,P,T): "If you can find that there exists a non-empty O such that r(P,O,V,T), return O. Else return ∅."

Here "find" and "E" are where the magic-super-intelligence-stuff happens.
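To make the shape of the construction concrete, here is a minimal sketch of r and Q in Python, with the two "magic" pieces - proof search ("find") and the expectation E - replaced by toy stand-ins. Every name below is illustrative: the real construction quantifies over proofs, not over a finite candidate list, and E is a superintelligent estimator, not a length count.

```python
def expected_utility(world, output):
    """Toy stand-in for E(u|O): here u simply rewards longer input strings."""
    return len(output)

def r(program_returns, output, world, threshold):
    """r(P,O,V,T): P(V) returns O, and either E(u|O) > T or O is empty.
    `program_returns` stands in for a proof that P(V) returns O."""
    return program_returns(world) == output and (
        output == "" or expected_utility(world, output) > threshold
    )

def Q(world, threshold, candidates):
    """Q(V,P,T), with proof search replaced by brute-force enumeration over
    a finite candidate list. The self-referential clause "P(V) returns O"
    is granted by construction here -- in the real version, that is exactly
    what quining Q into its own argument supplies."""
    for output in candidates:
        if output and r(lambda w: output, output, world, threshold):
            return output
    return ""
```

In this toy model, `Q(None, 3, ["ab", "abcd"])` returns `"abcd"` (the first candidate clearing the threshold), and returns the empty string when no candidate does - mirroring the "return ∅" fallback.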

Now, it seems to me that Q(V,Q,T) is the program we are looking for. It is uninterested in offers to modify the virtual world, because E(u|O)>T is defined over the unmodified virtual world. We can set it up so that the first thing it proves is something like "If I (ie Q) prove E(u|O)>T, then r(Q,O,V,T)." If we offer it more computing resources, it can no longer make use of that assumption, because "I" will no longer be Q.

Does this seem like a plausible way of phrasing the self-containment requirements? For the moment, it seems to make the AI reject small offers of extra resources, and be indifferent to large ones.

The virtual AI within its virtual world

6 Stuart_Armstrong 24 August 2015 04:42PM

A putative new idea for AI control; index here.

In a previous post, I talked about an AI operating only on a virtual world (ideas like this used to be popular, until it was realised the AI might still want to take control of the real world to affect the virtual world; however, with methods like indifference, we can guard against this much better).

I mentioned that the more of the AI's algorithm that existed in the virtual world, the better. But why not go the whole way? Some people at MIRI and elsewhere are working on agents modelling themselves within the real world. Why not have the AI model itself as an agent inside the virtual world? We can use quining to do this, for example.
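For readers unfamiliar with the trick: quining is the standard construction by which a program contains, and can reconstruct, its own complete source - so an agent defined this way can reason about "the agent in the virtual world" and mean itself, exactly. A sketch in Python:

```python
import contextlib
import io

# The classic two-line quine, built from a template applied to itself.
src = 's = %r\nprint(s %% s)'
quine = src % src   # the complete program, as a string

# Executing the program prints its own source, character for character.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    exec(quine)
assert buf.getvalue() == quine + '\n'
```

The same self-reference technique is what would let the virtual AI's algorithm include a full description of the agent it is, rather than an approximate external model.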

Then all the restrictions on the AI - memory capacity, speed, available options - can be specified precisely, within the algorithm itself. It will only have the resources of the virtual world to achieve its goals, and this will be specified within it. We could define a "break" in the virtual world (ie any outside interference that the AI could cause, were it to hack us to affect its virtual world) as something that would penalise the AI's achievements, or simply as something impossible according to its model or beliefs. It would really be a case of "given these clear restrictions, find the best approach you can to achieve these goals in this specific world".
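One way to picture "the restrictions are specified within the algorithm itself": make the agent's memory and time budget fields of the virtual world's own specification, so that running out of budget is simply how the world works, not an external intervention. A toy sketch, with all names hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorldSpec:
    """The virtual world's rules. The agent's resource limits are part of
    the specification itself, not enforcement imposed from outside."""
    step_budget: int
    memory_cells: int

def run(spec, agent_step):
    """Run an agent inside the world. The agent sees only its memory and
    the step counter; when the budget is spent, the world simply ends."""
    memory = [0] * spec.memory_cells
    for step in range(spec.step_budget):
        if agent_step(memory, step):   # agent signals it is done
            return memory, step + 1
    return memory, spec.step_budget
```

An agent handed a `WorldSpec` like this has nothing to negotiate: there is no "more memory" anywhere in its world for it to ask for.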

It would be ideal if the AI's motives were given not in terms of achieving anything in the virtual world, but in terms of making the decisions that, subject to the given restrictions, would be most likely to achieve something if the virtual world were run in its entirety. That way the AI wouldn't care if the virtual world were shut down or anything similar. It should only seek to self-modify in ways that make sense within the world, and understand itself as existing completely within these limitations.
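The distinction between "what actually happens" and "what would happen if the world were run in its entirety" can be sketched as follows: the agent scores each action by internally simulating the complete run of the virtual world, so truncating the actual execution changes nothing the agent cares about. The function and dynamics below are purely illustrative.

```python
def best_action(state, actions, simulate_to_end, utility):
    """Score each action by the utility of the virtual world's *final*
    state, were the world run to completion from here -- not by whatever
    actually gets executed. Shutting down the real run changes nothing
    in this evaluation."""
    return max(actions, key=lambda a: utility(simulate_to_end(state, a)))

# Toy dynamics: the state is a number; a "complete run" applies the chosen
# action and then two fixed doubling steps of the world's evolution.
def simulate_to_end(state, action):
    s = state + action
    for _ in range(2):
        s = s * 2
    return s
```

Here `best_action(0, [1, 3, 2], simulate_to_end, lambda s: s)` picks 3: the choice depends only on the hypothetical completed run, which is exactly why shutdown-indifference falls out of this framing.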

Of course, this would ideally require flawless implementation of the code; we don't want bugs developing in the virtual world that point to real-world effects (unless we're really confident we have properly coded "care only about what would happen in the virtual world, not what actually does happen").

Any thoughts on this idea?