Cyan comments on Botworld: a cellular automaton for studying self-modifying agents embedded in their environment - Less Wrong

50 Post author: So8res 12 April 2014 12:56AM




Comment author: So8res 16 April 2014 04:57:13PM * 8 points

Incorrect -- your implementation itself also affects the environment via more than your chosen output channels. (Your brain can be scanned, etc.) If you define waste heat, neural patterns, and so on as "output channels" then sure, we can say you only interact via I/O (although the line between I and O is fuzzy enough and your control over the O is small enough that I'd personally object to the distinction).

However, AIXI is not an agent that communicates with the environment only via I/O in this way: if you insist on using the I/O model then I point out that AIXI neglects crucial I/O channels (such as its source code).

until I see the actual math

In fact, Botworld is a tool that directly lets us see where AIXI falls short. (To see the 'actual math', simply construct the game described below with an AIXItl running in the left robot.)

Consider a two-cell Botworld game containing two robots, each in a different cell. The left robot is running an AIXI, and the left square is your home square. There are three timesteps. The right square contains a robot which acts as follows:

1. If there are no other robots in the square, Pass.
2. If another robot just entered the square, Pass.
3. If another robot has been in the square for a single turn, Pass.
4. If another robot has been in the square for two turns, inspect its code.
   If it is exactly the smallest Turing machine which never takes any action,
   move Left.
5. In all other cases, Pass.
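
The right robot's policy above can be sketched as a small function. This is a minimal illustration rather than actual Botworld code: the names `right_robot_action`, `turns_present`, and `DO_NOTHING_PROGRAM`, and the representation of a robot's code as a string, are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical label standing in for "the smallest Turing machine
# which never takes any action".
DO_NOTHING_PROGRAM = "PASS_FOREVER"


def right_robot_action(turns_present: Optional[int],
                       visitor_code: Optional[str]) -> str:
    """Return the right robot's action for one timestep.

    turns_present is None when no other robot is in the square,
    0 when another robot has just entered, and otherwise the number
    of turns the visitor has already spent in the square.
    """
    if turns_present is None:   # rule 1: nobody else here
        return "Pass"
    if turns_present == 0:      # rule 2: visitor just entered
        return "Pass"
    if turns_present == 1:      # rule 3: visitor present for one turn
        return "Pass"
    if turns_present == 2 and visitor_code == DO_NOTHING_PROGRAM:
        return "MoveLeft"       # rule 4: inspection succeeded
    return "Pass"               # rule 5: everything else
```

Note that the only input that ever produces `"MoveLeft"` is the visitor's code, not anything the visitor outputs.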

Imagine, further, that your robot (on the left) holds no items, and that the robot on the right holds a very valuable item. (Therefore, you want the right robot to be in your home square at the end of the game.) The only way to get that large reward is to move right and then rewrite yourself into the smallest Turing machine which never takes any action.

Now, consider the AIXI running on the left robot. It quickly discovers that the Turing machine which receives the highest reward acts as follows:

1. Move right
2. Rewrite self into smallest Turing machine which does nothing ever.

The AIXI then, according to the AIXI specification, produces the output of the Turing machine it has found. But the AIXI's code is as follows:

1. Look for good Turing machines.
2. When you've found one, do its output.

Thus, what the AIXI will do is this: it will move right, then it will do nothing for the rest of time. But while the AIXI is simulating the Turing machine that rewrites itself into a stupid machine, the AIXI itself has not eliminated the AIXI code. The AIXI's code is simulating the Turing machine and doing what it would have done, but the code itself is not the "do nothing ever" code that the second robot was looking for -- so the AIXI fails to get the reward.
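
To make the gap concrete, here is a toy sketch. It is neither Botworld's API nor an AIXI implementation: the `Robot` class, the string labels for programs, and the idea of modelling a self-rewrite as an assignment to `code` are all assumptions. Two robots emit identical actions, but only one of them actually holds the do-nothing code when inspected.

```python
from dataclasses import dataclass, field
from typing import List

DO_NOTHING = "PASS_FOREVER"      # hypothetical label for the target machine
WRAPPER = "SEARCH_THEN_IMITATE"  # hypothetical label for the AIXI-style code


@dataclass
class Robot:
    code: str
    actions: List[str] = field(default_factory=list)


def run_self_rewriter() -> Robot:
    """A machine that really replaces its own code after moving."""
    robot = Robot(code="REWRITER")
    robot.actions.append("MoveRight")
    robot.code = DO_NOTHING            # the rewrite changes what is inspected
    robot.actions += ["Pass", "Pass"]
    return robot


def run_imitator() -> Robot:
    """A wrapper that copies the rewriter's outputs but keeps its own code."""
    robot = Robot(code=WRAPPER)
    robot.actions.append("MoveRight")
    robot.actions += ["Pass", "Pass"]  # same outputs, code untouched
    return robot


def inspector_moves_left(robot: Robot) -> bool:
    """The right robot's check looks at code, not at outputs."""
    return robot.code == DO_NOTHING


rewriter, imitator = run_self_rewriter(), run_imitator()
print(rewriter.actions == imitator.actions)  # True
print(inspector_moves_left(rewriter))        # True
print(inspector_moves_left(imitator))        # False
```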

The AIXI's problem is that it assumes that if it acts like the best Turing machine it found then it will do as well as that Turing machine. This assumption is true when the AIXI only interacts with the environment over I/O channels, but is not true in the real world (where, e.g., we can inspect the AIXI's code).

Comment author: Cyan 17 April 2014 01:40:31AM * 4 points

Actually, an AI that believes it only communicates with the environment via input/output channels cannot represent the hypothesis that it will stop receiving input bits.

But I am an intelligence that can only communicate with the environment via input/output channels!

Incorrect -- your implementation itself also affects the environment via more than your chosen output channels.

Okay, fair enough. But until you pointed that out, I was an intelligence that believed it only communicated with the environment via input/output channels (that was your original phrasing, which I should have copied in the first place), and yet I did (and do) believe that it is possible for me to die.

Thus, what the AIXI will do is this: it will move right, then it will do nothing for the rest of time.

Incorrect. I'll assume for the sake of argument that you're right about what AIXI will do at first. But AIXI learns by Solomonoff induction, which is infallible at "noticing that it is confused" -- all Turing machines that fail to predict what actually happens get dropped from the hypothesis space. AIXI does nothing only until doing so fails to cause the right-room robot to move, whereupon any program that predicted that merely outputting "Pass" forever would do the trick gets zeroed out.
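
A toy sketch of the elimination step described here, assuming deterministic hypotheses and made-up description lengths. Real Solomonoff induction ranges over all programs and is uncomputable; the two hypotheses, their names, and the observation labels below are hypothetical.

```python
from typing import Callable, Dict, List

# A hypothesis maps the agent's action history to a predicted observation.
Hypothesis = Callable[[List[str]], str]


def outputs_suffice(actions: List[str]) -> str:
    # Predicts that moving right and then passing twice is enough.
    if actions == ["MoveRight", "Pass", "Pass"]:
        return "right_robot_moves_left"
    return "right_robot_stays"


def code_matters(actions: List[str]) -> str:
    # Predicts that outputs alone never trigger the move.
    return "right_robot_stays"


HYPOTHESES: Dict[str, Hypothesis] = {
    "outputs_suffice": outputs_suffice,
    "code_matters": code_matters,
}

# Prior weight 2^-length, with invented description lengths.
PRIOR: Dict[str, float] = {"outputs_suffice": 2.0 ** -3, "code_matters": 2.0 ** -5}


def update(weights: Dict[str, float], actions: List[str],
           observed: str) -> Dict[str, float]:
    """Drop every hypothesis that mispredicted, then renormalize."""
    surviving = {name: w for name, w in weights.items()
                 if HYPOTHESES[name](actions) == observed}
    total = sum(surviving.values())
    if total == 0:
        return {}
    return {name: w / total for name, w in surviving.items()}


posterior = update(PRIOR, ["MoveRight", "Pass", "Pass"], "right_robot_stays")
print(posterior)  # {'code_matters': 1.0}
```

The initially favored hypothesis is removed after a single failed prediction, which is the behavior being appealed to. Whether any surviving hypothesis can represent the inspection of the agent's own code is the separate question under dispute.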

The AIXI's problem is that it assumes that if it acts like the best Turing machine it found then it will do as well as that Turing machine.

If there are programs in the hypothesis space that do not make this assumption (and as far as I know, you and I agree that naturalized induction would be such a program), then these are the only programs that will survive the failure of AIXI's first plan.

Has Paul Christiano looked at this stuff?

ETA: I don't usually mind downvotes, but I find these ones (currently -2) are niggling at me. I don't think I'm being conspicuously stupid, and I do think that discussing AIXI in a relatively concrete scenario could be valuable, so I'm a bit at a loss for an explanation. ...Perhaps it's because I appealed to Paul Christiano's authority?