On April 1, I started working full-time for MIRI. In the weeks prior, while I was winding down my job and packing up my things, Benja and I built Botworld, a cellular automaton that we've been using to help us study self-modifying agents. Today, we're publicly releasing Botworld on the new MIRI GitHub page. To give you a feel for Botworld, I've reproduced the beginning of the technical report below.
This report introduces Botworld, a cellular automaton that provides a toy environment for studying self-modifying agents.
The traditional agent framework, used for example in Markov Decision Processes and in Marcus Hutter’s universal agent AIXI, splits the universe into an agent and an environment, which interact only via discrete input and output channels.
Such formalisms are perhaps ill-suited for real self-modifying agents, which are embedded within their environments. Indeed, the agent/environment separation is somewhat reminiscent of Cartesian dualism: any agent using this framework to reason about the world does not model itself as part of its environment. For example, such an agent would be unable to understand the concept of the environment interfering with its internal computations, e.g. by inducing errors in the agent’s RAM through heat.
Intuitively, this separation does not seem to be a fatal flaw, but merely a tool for simplifying the discussion. We should be able to remove this “Cartesian” assumption from formal models of intelligence. However, the concrete non-Cartesian models that have been proposed (such as Orseau and Ring’s formalism for space-time embedded intelligence, Vladimir Slepnev’s models of updateless decision theory, and Yudkowsky and Herreshoff’s tiling agents) depart significantly from their Cartesian counterparts.
Botworld is a toy example of the type of universe that these formalisms are designed to reason about: it provides a concrete world containing agents (“robots”) whose internal computations are a part of the environment, and allows us to study what happens when the Cartesian barrier between an agent and its environment breaks down. Botworld allows us to write decision problems where the Cartesian barrier is relevant, program actual agents, and run the system.
As it turns out, many interesting problems arise when agents are embedded in their environment. For example, agents whose source code is readable may be subjected to Newcomb-like problems by entities that simulate the agent and choose their actions accordingly.
Furthermore, certain obstacles to self-reference arise when non-Cartesian agents attempt to achieve confidence in their future actions. Some of these issues are raised by Yudkowsky and Herreshoff; Botworld gives us a concrete environment in which we can examine them.
One of the primary benefits of Botworld is concreteness: when working with abstract problems of self-reference, it is often very useful to see a concrete decision problem (“game”) in a fully specified world that directly exhibits the obstacle under consideration. Botworld makes it easier to visualize these obstacles.
Conversely, Botworld also makes it easier to visualize suggested agent architectures, which in turn makes it easier to visualize potential problems and probe the architecture for edge cases.
Finally, Botworld is a tool for communicating. It is our hope that Botworld will help others understand the varying formalisms for self-modifying agents by giving them a concrete way to visualize such architectures being implemented. Furthermore, Botworld gives us a concrete way to illustrate various obstacles, by implementing Botworld games in which the obstacles arise.
Botworld has helped us gain a deeper understanding of varying formalisms for self-modifying agents and the obstacles they face. It is our hope that Botworld will help others more concretely understand these issues as well.
Overview
Botworld is a high-level cellular automaton: the contents of each cell can be quite complex. Indeed, cells may house robots with register machines, which are run for a fixed amount of time in each cellular automaton step. A brief overview of the cellular automaton follows. Afterwards, we will present the details along with a full implementation in Haskell.
Botworld consists of a grid of cells, each of which is either a square or an impassable wall. Each square may contain an arbitrary number of robots and items. Robots can navigate the grid and possess tools for manipulating items. Some items are quite useful: for example, shields can protect robots from attacks by other robots. Other items are intrinsically valuable, though the values of the various items depend on the game being played.
Among the items are robot parts, which the robots can use to construct other robots. Robots may also be broken down into their component parts (hence the necessity for shields). Thus, robots in Botworld are quite versatile: a well-programmed robot can reassemble its enemies into allies or construct a robot horde.
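To make the grid description concrete, here is a minimal sketch of how the cells might be represented. All type and function names here (`Item`, `Robot`, `Square`, `Cell`, `Grid`, `cellAt`, `isPassable`) are assumptions made for illustration and are not taken from the report's actual implementation.

```haskell
-- Illustrative types only; the names and fields are hypothetical.
data Item = Shield | RobotPart | Cargo Int
  deriving (Eq, Show)

data Robot = Robot
  { robotLabel :: String
  , inventory  :: [Item]
  } deriving (Eq, Show)

-- A square may hold any number of robots and items.
data Square = Square
  { robotsIn :: [Robot]
  , itemsIn  :: [Item]
  } deriving (Eq, Show)

-- A cell is either an impassable wall (Nothing) or a square (Just sq).
type Cell = Maybe Square

type Position = (Int, Int)

-- A simple association-list grid, chosen here only to avoid dependencies.
type Grid = [(Position, Cell)]

-- Look up a cell; positions outside the grid are treated as walls (an assumption).
cellAt :: Grid -> Position -> Cell
cellAt grid pos = case lookup pos grid of
  Just cell -> cell
  Nothing   -> Nothing

-- A robot can only enter a cell that is a square.
isPassable :: Grid -> Position -> Bool
isPassable grid pos = case cellAt grid pos of
  Just _  -> True
  Nothing -> False
```

Encoding walls as `Nothing` keeps the "square or wall" distinction in the type, so any code that touches a cell has to handle both cases.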
Because robots are transient objects, it is important to note that players are not robots. Many games begin by allowing each player to specify the initial state of a single robot, but clever players will write programs that soon distribute themselves across many robots or construct fleets of allied robots. Thus, Botworld games are not scored depending upon the actions of the robot. Instead, each player is assigned a home square (or squares), and Botworld games are scored according to the items carried by all robots that are in the player’s home square at the end of the game. (We may imagine these robots being airlifted and the items in their possession being given to the player.)
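The scoring rule can be sketched as follows. This is only an illustration under assumed names: `Item`, `Robot`, `scorePlayer`, and the game-specific `valueOf` function are hypothetical, and the final world state is simplified to a list of positions paired with the robots standing there.

```haskell
-- Illustrative types only; the names are hypothetical.
newtype Item = Item { itemName :: String }
  deriving (Eq, Show)

newtype Robot = Robot { carried :: [Item] }
  deriving (Eq, Show)

type Position = (Int, Int)

-- Score a player by summing the values of all items carried by robots that
-- stand on one of the player's home squares at the end of the game.
-- Item values depend on the game, so the valuation is passed in.
scorePlayer :: (Item -> Int) -> [Position] -> [(Position, [Robot])] -> Int
scorePlayer valueOf homes finalGrid =
  sum [ valueOf item
      | (pos, robots) <- finalGrid
      , pos `elem` homes
      , robot <- robots
      , item <- carried robot
      ]
```

Note that the function never asks which robot a player originally programmed: only the location of a robot and its cargo at the end of the game matter.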
Robots cannot see the contents of robot register machines by default, though robots can execute an inspection to see the precise state of another robot’s register machine. This is one way in which the Cartesian boundary can break down: choosing an optimal action may not be enough if the way in which that action is computed also matters.
For example, imagine a robot which tries to execute an action that it can prove will achieve a certain minimum expected utility u_min. In the traditional agent framework, this can imply an optimality property: if there is any program p our robot could have run such that our robot can prove that p would have received expected utility ≥ u_min, then our robot will receive expected utility ≥ u_min (because it can always do what that other program would have done). But suppose that this robot is placed into an environment where another robot reads the contents of the first robot's register machine, and gives the first robot a reward if and only if the first robot runs the program “do nothing ever”. Then, since this is not the program our robot runs, it will not receive the reward.
It is important to note that there are two different notions of time in Botworld. The cellular automaton evolution proceeds in discrete steps according to the rules described below. During each cellular automaton step, the machines inside the robots are run for some finite number of ticks.
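The two time scales can be sketched as an outer step that runs each machine for a number of inner ticks. This is a toy under stated assumptions: the `Machine` here is just a counter standing in for a register machine, every machine gets the same tick budget, and `tick`, `runFor`, and `stepMachines` are hypothetical names.

```haskell
-- A stand-in for a register machine: it only counts the ticks it has executed.
newtype Machine = Machine { ticksExecuted :: Int }
  deriving (Eq, Show)

-- One tick: the machine executes a single instruction (here, increments a counter).
tick :: Machine -> Machine
tick (Machine n) = Machine (n + 1)

-- Run a machine for a finite number of ticks; negative budgets are treated as zero.
runFor :: Int -> Machine -> Machine
runFor ticks m = iterate tick m !! max 0 ticks

-- One cellular automaton step: every machine is run for its tick budget.
stepMachines :: Int -> [Machine] -> [Machine]
stepMachines ticksPerStep = map (runFor ticksPerStep)
```

For example, `stepMachines 3 [Machine 0, Machine 10]` advances both machines by three ticks within a single step.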
Like any cellular automaton, Botworld updates in discrete steps which apply to every cell. Each cell is updated using only information from the cell and its immediate neighbors. Roughly speaking, the step function proceeds in the following manner for each individual square:
1. The output register of the register machine of each robot in the square is read to determine the robot’s command. Note that robots are expected to be initialized with their first command in the output register.
2. The commands are used in aggregate to determine the robot actions. This involves checking for conflicts and invalid commands.
3. The list of items lying around in the square is updated according to the robot actions. Items that have been lifted or used to create robots are removed, items that have been dropped are added.
4. Robots incoming from neighboring squares are added to the robot list.
5. Newly created robots are added to the robot list.
6. The input registers are set on all robots. Robot input includes a list of all robots in the square (including exiting, entering, destroyed, and created robots), the actions that each robot took, and the updated item list.
7. Robots that have exited the square or that have been destroyed are removed from the robot list.
8. All remaining robots have their register machines executed (and are expected to leave a command in the output register).
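The step procedure above can be sketched for a single square as follows. This is a heavily simplified illustration, not the report's implementation: it covers only lift, drop, and pass commands, omits robot creation, destruction, and exits, identifies items by name, and assumes that a contested lift fails for every robot involved. All names (`Command`, `Robot`, `Square`, `liftSucceeds`, `applyCommand`, `stepSquare`) are hypothetical.

```haskell
import Data.List (delete, (\\))

data Command = Pass | Lift String | Drop String
  deriving (Eq, Show)

data Robot = Robot
  { name   :: String
  , output :: Command   -- output register: the command left by the last run
  , held   :: [String]  -- items the robot is carrying
  , input  :: [String]  -- input register: here, just the updated item list
  } deriving (Eq, Show)

data Square = Square { robots :: [Robot], items :: [String] }
  deriving (Eq, Show)

-- Assumed conflict rule: a lift succeeds only if the item is present and
-- exactly one robot in the square tried to lift it.
liftSucceeds :: Square -> String -> Bool
liftSucceeds sq item =
  item `elem` items sq
    && length [() | r <- robots sq, output r == Lift item] == 1

-- Apply a robot's command to its own inventory; invalid commands do nothing.
applyCommand :: Square -> Robot -> Robot
applyCommand sq r = case output r of
  Lift i | liftSucceeds sq i -> r { held = i : held r }
  Drop i | i `elem` held r   -> r { held = delete i (held r) }
  _                          -> r

-- One step for one square: read commands, resolve them, update the item
-- list, add incoming robots, set input registers, then run every machine.
stepSquare :: (Robot -> Robot) -> [Robot] -> Square -> Square
stepSquare runMachine incoming sq = Square robots' items'
  where
    lifted  = [i | r <- robots sq, Lift i <- [output r], liftSucceeds sq i]
    dropped = [i | r <- robots sq, Drop i <- [output r], i `elem` held r]
    items'  = (items sq \\ lifted) ++ dropped
    acted   = map (applyCommand sq) (robots sq) ++ incoming
    robots' = [runMachine r { input = items' } | r <- acted]
```

The `runMachine` argument stands in for executing a robot's register machine; passing `id` leaves each robot's output register unchanged, which is enough to watch the item bookkeeping on its own.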
These rules allow for a wide variety of games, from NP-hard knapsack packing games to difficult Newcomb-like games such as a variant of Parfit’s hitchhiker problem (wherein a robot will drop a valuable item only if it, after simulating your robot, concludes that your robot will give it a less valuable item).
Cartesianism in Botworld
Though we have stated that we mean to study non-Cartesian formalizations of intelligence, Botworld does in fact have a “Cartesian” boundary. The robot parts are fundamental objects, and the machine registers are non-reducible. The important property of Botworld is not that it lacks a Cartesian boundary, but that the boundary is breakable.
In the real world the execution of a computer program is unaffected by the environment most of the time (except via the normal input channels). While the contents of a computer’s RAM can be changed by heating it up with a desk lamp, they are usually not. An Artificial General Intelligence (AGI) would presumably make use of this fact. Thus, an AGI may commonly wish to ensure that its Cartesian boundary is not violated in this way over some time period (during which it can make use of the nice properties of Cartesian frameworks). Botworld attempts to model this in a simple way by requiring agents to contend with the possibility that they may be destroyed by other robots.
More problematically, in the real world, the internals of a computer program will always affect the environment (for example, through waste heat emitted by the computer), but it seems likely that these effects are usually unpredictable enough that an AGI will not be able to improve its performance by carefully choosing e.g. the pattern of waste heat it emits. However, an AGI will need to ensure that these unavoidable violations of its Cartesian boundary will in fact not make an expected difference to its goals. Botworld sidesteps this issue and only requires robots to deal with a more tractable issue: contending with the possibility that their source code might be read by another agent.
Our model is not realistic, but it is simple to reason about. For all that the robot machines are not reducible, the robots are still embedded in their environment, and they can still be read or destroyed by other agents. We hope that this captures some of the complexity of naturalistic agents, and that it will serve as a useful test bed for formalisms designed to deal with this complexity. Although being able to deal with the challenges of Botworld is presumably not a good indicator that a formalism will be able to deal with all of the challenges of naturalistic agents, it allows us to see in concrete terms how it deals with some of them.
In creating Botworld we tried to build something implementable by a lower-level system, such as Conway’s Game of Life. It is useful to imagine such an implementation when considering Botworld games.
Future versions of Botworld may treat the robot bodies as less fundamental objects. In the meantime, we hope that it is possible to picture an implementation where the Cartesian boundary is much less fundamental, and to use Botworld to gain useful insights about agents embedded within their environment. Our intent is that when we apply a formalism for naturalistic agents to the current implementation of Botworld, then there will be a straightforward translation to an application of the same formalism to an implementation of Botworld in (say) the Game of Life.
The full technical report goes on to provide an implementation of Botworld in Haskell. You can find the source code on the MIRI Botworld repository. Sample games are forthcoming.
Benja and I will be writing up some of the results we've achieved. In the meantime, you're encouraged to play around with it and build something cool.
Okay, fair enough. But until you pointed that out, I was an intelligence that believed it only communicated with the environment via input/output channels (that was your original phrasing, which I should have copied in the first place), and yet I did (and do) believe that it is possible for me to die.
Incorrect. I'll assume for the sake of argument that you're right about what AIXI will do at first. But AIXI learns by Solomonoff induction, which is infallible at "noticing that it is confused" -- all Turing machines that fail to predict what actually happens get dropped from the hypothesis space. AIXI does nothing just until that fails to cause the right-room robot to move, whereupon any program that predicted that merely outputting "Pass" forever would do the trick gets zeroed out.
If there are programs in the hypothesis space that do not make this assumption (and as far as I know, you and I agree that naturalized induction would be such a program), then these are the only programs that will survive the failure of AIXI's first plan.
Has Paul Christiano looked at this stuff?
ETA: I don't usually mind downvotes, but I find these ones (currently -2) are niggling at me. I don't think I'm being conspicuously stupid, and I do think that discussing AIXI in a relatively concrete scenario could be valuable, so I'm a bit at a loss for an explanation. ...Perhaps it's because I appealed to Paul Christiano's authority?