A problem with "playing chicken with the universe" as an approach to UDT

16 Karl 08 March 2013 02:34AM

Let's consider the agent given in A model of UDT with a halting oracle. One will notice that this agent is not quite well defined, because the definition doesn't specify in what order we are supposed to consider actions in step 1. But surely that doesn't matter, right? Wrong.
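
For concreteness, here is a minimal Python sketch of that agent with the order used in step 1 made explicit (this is my reconstruction, not the exact formulation of the linked post; `provable` stands in for the provability oracle, and the statement strings are hypothetical placeholders):

```python
# Minimal sketch of the halting-oracle agent, with the action order in
# step 1 made explicit. `provable` is an assumed black-box provability
# oracle; the statement strings are hypothetical placeholders.

def make_agent(action_order, utilities, provable):
    def agent():
        # Step 1 ("playing chicken with the universe"): if the oracle
        # says PA proves we won't take some action, take that action.
        for a in action_order:
            if provable(f"A() != {a!r}"):
                return a
        # Step 2: collect proved statements of the form A()=a => U()=u.
        proved = {}
        for a in action_order:
            for u in utilities:
                if provable(f"A() = {a!r} -> U() = {u}"):
                    proved[a] = u
        # Step 3: play the action with the highest proved utility.
        if not proved:
            raise RuntimeError("break down and cry")
        return max(proved, key=proved.get)
    return agent

# Agent A checks A() != "D" first; agent A' checks A() != "C" first.
# (A real provability oracle cannot be implemented; any stub goes here.)
A = make_agent(["D", "C"], [0, 2, 3, 5], provable=lambda s: False)
A_prime = make_agent(["C", "D"], [0, 2, 3, 5], provable=lambda s: False)
```

The two agents differ only in `action_order`, which is exactly the under-specification at issue.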


Let's consider the Prisoner's Dilemma with payoff matrix given by

        1: C      1: D
2: C    (3, 3)    (5, 0)
2: D    (0, 5)    (2, 2)

(each entry is (player 1's payoff, player 2's payoff))

and consider agent A, which checks whether there is a proof that A()≠D before checking whether there is a proof that A()≠C, and agent A', which does things in the opposite order. If A or A' is pitted against itself, everything is fine and mutual cooperation is the result of the game, but what if A (as player 1) is pitted against A' (as player 2)? Then A breaks down and cries.

Let's call the utility function of A U and the utility function of A' U', and consider a model of PA in which PA is inconsistent (such a model must exist if PA is consistent, by the second incompleteness theorem together with the completeness theorem). In such a model every statement has a proof, so step 1 fires on the first action each agent checks: we will have A()=D and A'()=C, and so U()=5 and U'()=0. Since anything PA proves must hold in this model, A will not be able to prove that A()=D ⇒ U()=u for any u different from 5, and so either A will defect and A' will cooperate, or A will break down and cry. But A' will not cooperate, because it likewise cannot prove A'()=C ⇒ U'()=u' for any u' except possibly 0. So A will break down and cry. QED

More generally, if M is a model of PA in which PA is inconsistent, an agent defined in this way will never be able to prove that A()=a ⇒ U()=u (where a is the first action considered in step 1) except possibly for u=u0, where u0 is the value of U() in M. That seems to create a huge problem for this approach to UDT.

Rational vs. real utilities in the cousin_it-Nesov model of UDT

6 Karl 29 January 2012 04:43AM

In this article cousin_it describes an algorithm, using a Turing oracle, which gives a working model of UDT. This algorithm seems to work well in the case of a utility function which takes natural or rational numbers as values but, as I will explain, it doesn't actually work for utilities given by real numbers; I will also propose an algorithm which doesn't use an oracle and which seems to yield the correct answer given sufficient computing time.

One of the hidden assumptions in the initial post is that comparison of utilities is decidable (at least decidable with a Turing oracle): in step 3 of the algorithm you are supposed to return the action that corresponds to the highest utility found in step 2, but if comparison of utilities isn't actually decidable you won't, in the general case, be able to do that. And as it happens, equality of real numbers isn't decidable (even using an oracle). Also, there are no canonical forms for real numbers like there are for natural or rational numbers, so you can have a situation where the order relationship between different real numbers depends on the value of A().
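
As a toy illustration (hypothetical, not from the original post): represent a computable real by a sequence of rational approximations, and consider a real which is zero iff some unbounded search never succeeds. No procedure that inspects only finitely many approximations can decide whether it equals zero:

```python
from fractions import Fraction

def x(n):
    """Approximations of the real number 0."""
    return Fraction(0)

def y(n, halts_at=None):
    """Approximations of a real that is 0 iff some search never halts.
    `halts_at` is a hypothetical stand-in for the unknown halting time."""
    if halts_at is not None and n >= halts_at:
        return Fraction(1, 2 ** halts_at)
    return Fraction(0)

# After inspecting y(0), ..., y(n) and seeing only zeros, the cases
# halts_at=None (so x == y) and halts_at=n+1 (so x < y) are both still
# possible, so no comparison procedure that consults finitely many
# approximations can decide x == y.
print(all(x(n) == y(n) for n in range(100)))  # True so far, yet undecided
```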

A simple modification of the original algorithm can deal with this problem:

  1. Enumerate proofs of statements of the form A()=a ⇒ U()≥u, where u is a rational number in canonical form, for t time steps.
  2. Return the action that corresponds to the highest utility bound found in step 1.

Why does this work? Because if the formal system used is sound, then whenever it proves A()=a ⇒ U()≥u, where a is the action which actually ends up being taken, U() must actually end up being at least u.

So, for example, in the case of Newcomb's problem, if t is sufficiently high it will eventually be proven that A()=1 ⇒ U()≥1 000 000, and so, if the formal system used is sound, the utility the agent ends up receiving must be at least 1 000 000, which is of course only possible if the agent one-boxes, so that's what it will end up doing.
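
Here is a minimal sketch of the resulting decision rule, assuming the proof enumeration of step 1 is available as a black box; the statements it has "proved" for Newcomb's problem are hardcoded purely for illustration:

```python
from fractions import Fraction

def decide(proved):
    """Step 2: return the action with the highest proved lower bound.
    `proved` is the output of step 1: (action, rational bound) pairs,
    each standing for a proved statement A()=a => U()>=u."""
    best = {}
    for action, bound in proved:
        if action not in best or bound > best[action]:
            best[action] = bound
    return max(best, key=best.get)

# Statements a proof enumerator might find for Newcomb's problem after
# enough time steps (hardcoded here purely for illustration):
proved = [
    ("two-box", Fraction(1000)),       # A()=2 => U() >= 1000
    ("one-box", Fraction(1_000_000)),  # A()=1 => U() >= 1000000
]
print(decide(proved))  # -> "one-box"
```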

Is it possible to build a safe oracle AI?

0 Karl 20 April 2011 12:54PM

It seems to me possible to create a safe oracle AI.

Suppose that you have a sequence predictor which is a good approximation of Solomonoff induction but which runs in reasonable time. This sequence predictor can potentially be really useful (for example, predict future SIAI publications from past SIAI publications, then proceed to read the article which gives a complete account of Friendliness theory...) and is not dangerous in itself.

The question, of course, is how to obtain such a thing.

The trick relies on the concept of a program predictor. A program predictor is a function which predicts, more or less accurately, but within reasonable time, the output of the program it takes as input (note that when we refer to a program we mean a program without side effects that just computes an output). If you have a very accurate program predictor then you can obviously use it to obtain a good approximation of Solomonoff induction which runs in reasonable time.
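
As a rough sketch of that reduction (assuming a program predictor `predict_output` as a black box, and naively identifying programs with bit strings): weight each program by 2^-length, as in the Solomonoff prior, and combine the predicted outputs consistent with the observed sequence.

```python
import itertools

def next_bit_distribution(predict_output, observed, max_len=20):
    """Approximate Solomonoff induction given a program predictor.
    `predict_output(p)` is the assumed predictor: it returns the
    predicted output of program p (or None if it has no guess).
    Programs and outputs are naively identified with bit strings."""
    weights = {"0": 0.0, "1": 0.0}
    for n in range(1, max_len + 1):
        for bits in itertools.product("01", repeat=n):
            out = predict_output("".join(bits))
            # Count programs whose predicted output extends the
            # observed sequence, weighted by 2**-length.
            if out and out.startswith(observed) and len(out) > len(observed):
                weights[out[len(observed)]] += 2.0 ** -n
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()} if total else None
```

The brute-force enumeration is only there to pin down the construction; it is of course not meant to be efficient.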

But of course, this just displaces the problem: how do you get such an accurate program predictor?

Well, suppose you have a program predictor which is good enough to be improved on. Then you use it to find the program of less than N bits in length (with N sufficiently big, of course) which maximizes a utility function measuring how accurate that program's output is as a program predictor, given that it generates this output in less than T steps (where T is a reasonable number given the hardware you have access to). Then you run that program and check the accuracy of the program predictor it produces. If it is insufficient, repeat the process. You should eventually obtain a very accurate program predictor. QED.
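
A minimal sketch of this loop, with every ingredient stubbed out as a hypothetical black box (`enumerate_programs`, `accuracy_as_predictor`, `run`, and the initial predictor are all assumptions, and the exhaustive search is only meant to pin down the structure):

```python
def bootstrap(predictor, enumerate_programs, accuracy_as_predictor, run,
              N, T, target_accuracy):
    """Iteratively improve a program predictor.
    predictor(p)             -- current predictor: predicted output of p
    enumerate_programs(N)    -- all programs of less than N bits
    accuracy_as_predictor(q) -- how accurate output q is as a predictor
    run(p, T)                -- actually run p for at most T steps
    """
    while True:
        # Use the current predictor to pick the <N-bit program whose
        # *predicted* output scores best as a program predictor.
        best = max(enumerate_programs(N),
                   key=lambda p: accuracy_as_predictor(predictor(p)))
        # Run the winning program and measure the real accuracy of the
        # predictor it outputs.
        candidate = run(best, T)
        if accuracy_as_predictor(candidate) >= target_accuracy:
            return candidate
        predictor = candidate  # otherwise iterate with the better one
```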

So we've reduced our problem to the problem of creating a program predictor good enough to be improved upon. That should be possible. In particular, it is related to the problem of logical uncertainty: if we can get a passable understanding of logical uncertainty, it should be possible to build such a program predictor using it. Thus a minimal understanding of logical uncertainty should be sufficient to obtain AGI. In fact, even without such understanding, it may be possible to patch together such a program predictor...