How the virtual AI controls itself

Post author: Stuart_Armstrong 09 September 2015 02:25PM

A putative new idea for AI control; index here.

In previous posts, I posited AIs caring only about virtual worlds - in fact, being defined as processes in virtual worlds, similarly to cousin_it's idea. How could this work in practice? We would want the AI to reject offers of outside help, whether these are ways of modifying its virtual world or ways of giving it extra resources.

Let V be a virtual world, over which a utility function u is defined. The world accepts a single input string O. Let P be a complete specification of an algorithm, including the virtual machine it is run on, the amount of memory it has access to, and so on.

Fix some threshold T for u (to avoid the subtle weeds of maximising). Define the statement:

r(P,O,V,T): "P(V) returns O, and either E(u|O)>T or O=∅"

And the string valued program:

Q(V,P,T): "If you can find a non-empty O such that r(P,O,V,T) holds, return O. Else return ∅."

Here "find" and "E" are where the magic-super-intelligence-stuff happens.

Now, it seems to me that Q(V,Q,T) is the program we are looking for. It is uninterested in offers to modify the virtual world, because E(u|O)>T is defined over the unmodified virtual world. We can set it up so that the first thing it proves is something like "If I (ie Q) prove E(u|O)>T, then r(Q,O,V,T)." If we offer it more computing resources, it can no longer make use of that assumption, because "I" will no longer be Q.
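Spelled out (my paraphrase, not the post's formalism), the quined step looks something like this:

```latex
% Prov_Q is my notation for "Q's own proof search succeeds on O";
% it is not notation from the post.
\[
  \mathrm{Prov}_Q\big(E(u \mid O) > T\big)
  \;\Rightarrow\;
  \text{``}Q(V)\text{ returns }O\text{''}
  \;\Rightarrow\;
  r(Q, O, V, T).
\]
```

The first arrow holds by Q's own definition (its proof search returning O is precisely what makes it output O); the second holds by the definition of r. With extra resources, the executing process is some Q' ≠ Q, so establishing "Q'(V) returns O" no longer yields r(Q,O,V,T), and the shortcut is lost.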

Does this seem like a plausible way of phrasing the self-containment requirements? For the moment, it seems to make the AI reject small offers of extra resources, and be indifferent to large ones.
