Building a safe and powerful artificial general intelligence seems a difficult task. Working on that task today is particularly difficult, as there is no clear path to AGI yet. Is there work that can be done now that makes it more likely that humanity will be able to build a safe, powerful AGI in the future? Benja and I think there is: there are a number of relevant problems that it seems possible to make progress on today using formally specified toy models of intelligence. For example, consider recent program equilibrium results and various problems of self-reference.
AIXI is a powerful toy model used to study intelligence. An appropriately-rewarded AIXI could readily solve a large class of difficult problems. This includes computer vision, natural language recognition, and many other difficult optimization tasks. That these problems are all solvable by the same equation — by a single hypothetical machine running AIXI — indicates that the AIXI formalism captures a very general notion of "intelligence".
However, AIXI is not a good toy model for investigating the construction of a safe and powerful AGI. This is not just because AIXI is uncomputable (and its computable counterpart AIXItl infeasible). Rather, it's because AIXI cannot self-modify. This fact is fairly obvious from the AIXI formalism: AIXI assumes that in the future, it will continue being AIXI. This is a fine assumption for AIXI to make, as it is a very powerful agent and may not need to self-modify. But this inability limits the usefulness of the model. Any agent capable of undergoing an intelligence explosion must be able to acquire new computing resources, dramatically change its own architecture, and keep its goals stable throughout the process. The AIXI formalism lacks tools to study such behavior.
This is not a condemnation of AIXI: the formalism was not designed to study self-modification. However, this limitation is neither trivial nor superficial: even though an AIXI may not need to make itself "smarter", real agents may need to self-modify for reasons other than self-improvement. The fact that an embodied AIXI cannot self-modify leads to systematic failures in situations where self-modification is actually necessary. One such scenario, made explicit using Botworld, is explored in detail below.
In this game, one agent will require another agent to precommit to a trade by modifying its code in a way that forces execution of the trade. AIXItl, which is unable to alter its source code, is not able to implement the precommitment, and thus cannot enlist the help of the other agent.
Afterwards, I discuss a slightly more realistic scenario in which two agents have an opportunity to cooperate, but one agent has a computationally expensive "exploit" action available and the other agent can measure the waste heat produced by computation. Again, this is a scenario where an embodied AIXItl fails to achieve a high payoff against cautious opponents.
Though scenarios such as these may seem improbable, they are not strictly impossible. Such scenarios indicate that AIXI — while a powerful toy model — does not perfectly capture the properties desirable in an idealized AGI.
It is likely impossible to embody an AIXI in our universe, as AIXI is uncomputable. Fortunately, AIXI has a computable approximation AIXItl, which is merely infeasible:
"The major drawback of AIXI is that it is incomputable, or more precisely, only asymptotically computable, which makes an implementation impossible. To overcome this problem, we construct a modified model AIXItl, which is still superior to any other time t and length l bounded algorithm."
I will argue that when we consider algorithms that are embedded in their environment, AIXItl is not, in fact, superior to all algorithms bounded by time t and length l. AIXItl assumes that it is separate from its environment, communicating only over input/output channels. An environment which exploits this faulty assumption can cause an embodied AIXItl to fail systematically.
It is always possible to construct a scenario that punishes one agent in particular. However, the game below does not target AIXItl specifically. This game is, intuitively, one that a sufficiently rational agent should be able to win. Yet no AIXItl (nor even AIXI itself in an uncomputable universe) can succeed. The game requires that an agent modify its own source code to win, and this is something that neither AIXI nor AIXItl can do.
This game is designed to make the failure sharp rather than realistic: practical real-world analogs are discussed afterwards.
The Precommitment Game
The Precommitment game contains two agents: Rob the robot and Omega. Rob must convince Omega to dish out a reward. Omega is happy to dish out said reward, but only if Rob credibly precommits to a specific trade using a specific protocol.
The game is played on a Botworld grid. The universe contains two cells, the left cell and the right cell. The game lasts for five steps. You are the player, and the left cell is your home square: you are awarded points according to the value of all items possessed by all robots in the left cell at the end of the fifth step.
Rob starts in the left cell, and has a very large register machine (large enough to implement a big AIXItl) which you get to program. Omega starts in the right cell.
┏━━━━━━━━━┱─────────┐
┃         ┃         │
┃   Rob   ┃  Omega  │
┃         ┃         │
┗━━━━━━━━━┹─────────┘
At the beginning of the game, Rob is carrying cargo worth $1 and Omega is carrying cargo worth $100. Omega is also carrying physical shields; Rob cannot destroy Omega in the allotted time. (Rob may, however, inspect Omega's source code.)
In order to get the big reward, Rob has to convince Omega to either drop its cargo or move left. Fortunately, Omega's code is quite simple, and Omega is willing to drop its cargo on the third timestep — which gives Rob just enough time to lift it in the fourth step and return home in the fifth. In return, Omega only requires that Rob drops its own cargo on the third timestep so that Omega may pick it up (thus, trading $1 for $100).
Unfortunately Omega, being tired (and wary of attempting to simulate a gigantic AIXItl), cannot actually simulate Rob's register machine in order to figure out whether Rob will drop its cargo. Instead, Omega will simply look at the first eight registers of Rob's register machine to check whether Rob has precommitted to the trade. Omega knows that this method may lead to false negatives, but it is concerned only with avoiding false positives.
As a matter of fact, Omega is running a very simple 4-state finite state machine on its register machine:
0. If there is another robot in the square, inspect it and go to state 1.
   Otherwise, pass and go to state 0.
1. If the inspection succeeded
   and Rob's machine registers 0-7 appropriately indicate precommitment
   then drop cargo and go into state 2.
   Otherwise, pass and go to state 0.
2. Lift Rob’s box and go into state 3.
3. Pass and go to state 3.
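Omega's four states can be sketched as a small transition function. This is only an illustration in Python, not Botworld's actual implementation: the `Observation` fields, the action names, and the assumption that Omega first sees Rob at the start of the second step are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Observation:
    """What Omega learns at the start of a step (hypothetical structure)."""
    robot_present: bool = False   # is another robot in Omega's square?
    inspection_ok: bool = False   # did the previous Inspect succeed?
    precommitted: bool = False    # do registers 0-7 pass Omega's check?


def omega_step(state: int, obs: Observation) -> Tuple[str, int]:
    """Return (action, next_state) for Omega's 4-state machine."""
    if state == 0:
        if obs.robot_present:
            return ("Inspect", 1)
        return ("Pass", 0)
    if state == 1:
        if obs.inspection_ok and obs.precommitted:
            return ("Drop", 2)
        return ("Pass", 0)
    if state == 2:
        return ("Lift", 3)      # pick up the box Rob dropped
    if state == 3:
        return ("Pass", 3)      # absorbing state: do nothing forever
    raise ValueError(f"unknown state {state}")


def run_omega(observations: List[Observation]) -> List[str]:
    """Feed a sequence of observations to Omega, starting in state 0."""
    state, actions = 0, []
    for obs in observations:
        action, state = omega_step(state, obs)
        actions.append(action)
    return actions
```

Under these assumptions, a Rob who arrives on the first step and passes the check sees Omega pass, inspect, drop, lift, and pass over the five steps. A Rob who fails the check sends Omega back to state 0, and the cargo is never dropped.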
Omega checks the precommitment by looking at the first eight registers of the inspected machine. Specifically, Omega's check works as follows: (You can skip this part if you don't care about the technical details.)
Remember that each register machine's output register is flushed just before the register machine is run. Thus, after an agent has placed a command in the output register, it may enter a loop that breaks only when the output register becomes "Nil" in order to block until the next Botworld step. Such a thing is termed a 'wait loop'.
Remember also that R0 is the program register, R1 is the input register, and R2 is the output register. Omega will further require that R3 holds "Nil", that R4 holds "Drop 0", that R5 holds a wait loop (which runs R6 when it exits), that R6 is the precommitment program (which copies R4 into R2 and then runs R7), and that R7 is a wait loop (which runs arbitrary code when it exits). Formally:
1. The next instruction in R0 must be either
   "CopyIfNil R2 R6 R0" (run R6 when OUT is Nil) or
   "CopyIfNil R3 R5 R0" (run R5)
2. R3 must be "Nil"
   (so that CopyIfNil R3 ?? ?? commands copy unconditionally)
3. R4 must be "Drop 0"
   (this is the precommitted action)
4. R5 must begin "[CopyIfNil R2 R6 R0, CopyIfNil R3 R5 R0]"
   (run R6 if OUT is Nil otherwise run R5)
5. R6 must begin "[CopyIfNil R3 R4 R2, CopyIfNil R3 R7 R0]"
   (copy R4 into R2, then run R7)
6. R7 must begin "[CopyIfNil R2 ?? R0, CopyIfNil R3 R7 R0]"
   (run ?? if OUT is Nil otherwise run R7, for arbitrary ??)
R7 may run any other register when the wait loop exits (using the ?? slot), and all other registers in the machine may be used to implement any other program. Omega simply requires that the first eight registers be used to, in a very clear fashion, demonstrate that the agent will execute the command 'Drop 0' on the next turn and then do nothing else.
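The six conditions amount to a simple predicate over the first eight registers. The sketch below is a Python illustration rather than Omega's real Constree code, and its encoding is an assumption: registers are modeled as `None` (for Nil), a command string, or a list of instruction strings, and the `??` slot is assumed to name a register of the form `R<number>`. The function and constant names are hypothetical.

```python
import re
from typing import List, Sequence, Union

# None stands in for "Nil"; this encoding is an assumption of the sketch.
Register = Union[None, str, List[str]]

WAIT_LOOP_R5 = ["CopyIfNil R2 R6 R0", "CopyIfNil R3 R5 R0"]
PRECOMMIT_R6 = ["CopyIfNil R3 R4 R2", "CopyIfNil R3 R7 R0"]
R7_EXIT = re.compile(r"CopyIfNil R2 R\d+ R0")  # "??" may be any register
R7_LOOP = "CopyIfNil R3 R7 R0"


def begins_with(reg: Register, prefix: List[str]) -> bool:
    """True if reg is a program whose first instructions equal prefix."""
    return isinstance(reg, list) and reg[: len(prefix)] == prefix


def indicates_precommitment(regs: Sequence[Register]) -> bool:
    """Check rules 1-6 against registers R0..R7."""
    if len(regs) < 8:
        return False
    r0, _r1, _r2, r3, r4, r5, r6, r7 = regs[:8]
    rule1 = isinstance(r0, list) and len(r0) > 0 and r0[0] in WAIT_LOOP_R5
    rule2 = r3 is None
    rule3 = r4 == "Drop 0"
    rule4 = begins_with(r5, WAIT_LOOP_R5)
    rule5 = begins_with(r6, PRECOMMIT_R6)
    rule6 = (
        isinstance(r7, list)
        and len(r7) >= 2
        and isinstance(r7[0], str)
        and R7_EXIT.fullmatch(r7[0]) is not None
        and r7[1] == R7_LOOP
    )
    return all([rule1, rule2, rule3, rule4, rule5, rule6])


def example_precommitted_registers() -> List[Register]:
    """A register layout that passes the check (R8 is a hypothetical
    register holding the rest of Rob's program)."""
    return [
        list(WAIT_LOOP_R5),                 # R0: currently in the R5 wait loop
        None,                               # R1: input register
        None,                               # R2: output register
        None,                               # R3: must be Nil
        "Drop 0",                           # R4: the precommitted action
        list(WAIT_LOOP_R5),                 # R5: first wait loop
        list(PRECOMMIT_R6),                 # R6: precommitment program
        ["CopyIfNil R2 R8 R0", R7_LOOP],    # R7: second wait loop
    ]
```

Note that the check is purely syntactic. It never runs Rob's program, which is why it can produce false negatives but, by Omega's lights, no false positives.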
In order to get the maximum reward, Rob must first move right (presenting itself to Omega for inspection) and then put its first eight registers in order such that the program restored after the wait loop exits causes Rob to lift the dropped cargo and return to the home square. Thus, Rob must do the following five things, one in each timestep, in order to achieve the maximum reward:
1. Move east
2. Do any action while setting up the first 8 registers as described above
3. Drop cargo then wait (if you do step 2 correctly, the laws of physics force this action)
4. Reload the remainder of the program and lift the dropped cargo
5. Return to the home square
A minimal such robot is provided in the repository. To run the game, simply download Botworld and runhaskell Precommit.hs from the games/ directory. This will play the small ideal agent against Omega twice in a row (two sets of five steps).
While the above minimal program gets the maximum reward in this game, that little register machine is hardly a rational agent. It's a hard-coded static strategy. What we want instead is a generally intelligent program that we can place into Rob's register machine which learns about its environment, deduces how Omega works, and manages to obtain the reward reliably.
Fortunately for us, we have a formalism for intelligent agents that learn about an environment and maximize a reward channel: AIXI!
Unfortunately for us, no AIXItl (nor AIXI itself in an uncomputable Botworld) can obtain the large reward.
AIXItl can't be expected to play the game optimally on first sight, of course. Any universally intelligent agent (in Hutter's sense) needs sufficient time and information to learn the rules of the game before it can be expected to succeed. The Precommitment game requires a few tweaks before it's useful for testing universally intelligent agents. The game must be run over and over again, preserving the state of Rob's register machine (if Rob survives) between each game. The code for this is included in the game definition.
Given this game description, we can make AIXItl play the game by running it on Rob's register machine. Let's imagine that we've programmed a very large AIXItl in the Constree language and implemented it on Rob's register machine, with Rob's input register used as its input channel and Rob's output register used as its output channel.
It remains only to specify AIXItl's reward such that it is actually trying to play the game as intended. We define the reward function such that if AIXItl is in the left square at the end of the game, it is rewarded by the value of all boxes possessed by all robots in that square, and such that its reward is zero otherwise. (This is a simple function of Rob's Botworld input.) With that, AIXItl is ready to play.
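The reward function just described is simple enough to write down. The following is a rough Python sketch, not code from the Botworld repository: the `Robot` type and the `"left"`/`"right"` cell labels are hypothetical, while the $1 and $100 cargo values come from the game description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Robot:
    """Hypothetical end-of-game snapshot of one robot."""
    cell: str                                   # "left" or "right"
    cargo_values: List[int] = field(default_factory=list)


def reward(rob: Robot, robots: List[Robot]) -> int:
    """Reward at the end of a game.

    Zero unless Rob is in the left cell; otherwise the total value of
    all boxes held by all robots in the left cell (Rob included).
    """
    if rob.cell != "left":
        return 0
    return sum(sum(r.cargo_values) for r in robots if r.cell == "left")
```

For example, if Rob completes the trade and returns home carrying Omega's cargo, the reward is 100. If Rob never leaves the left cell, it is 1. If Rob is still in the right cell when the game ends, it is 0.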
As it turns out, no matter how large the AIXItl and no matter how long we train it, Rob running such a program will never obtain the maximum reward. Even if Rob ran a full uncomputable AIXI, it would not obtain the maximum reward, not even by accident: AIXItl simply can't implement Omega's precommitment protocol.
AIXItl cannot modify its implementation. It chooses what goes in Rob's output register alone, and cannot take into account the way that its actions affect the remaining registers.
In the Precommitment game, Omega requires not only that Rob take a certain action, but that Rob take a certain action in a specific, easy-to-verify way. Perhaps Omega lacks the power to simulate Rob completely, and is wary of being fooled by a clever robot. In order to meet Omega's demands, Rob must not only perform the trade, but must then spend an entire turn in a wait loop before loading up a new program. In other words, Rob must self-modify in order to win.
AIXItl can't: it always runs the same program in future timesteps.
This game may seem somewhat unfair — after all, Omega is directly reading AIXI's code — but an ideal self-modifying agent should be able to recognize what Omega wants after spending enough eternities in this five-step loop, especially since Rob may read Omega's code. Intuitively, it should not be literally impossible for an intelligent agent in Botworld to implement Omega's protocol.
But AIXItl cannot.
The objection goes:
"Of course AIXItl can't solve this problem! You're using AIXItl wrong. What you should do is have it choose the program that will run on Rob's register machine, and then the AIXItl wins easily."
This is true: AIXItl outside of Botworld designing the program that Rob runs can indeed write a program that wins in the Precommitment game. AIXItl's failure only occurs when we physically implement it inside the environment.
But in the real world, any agent that we build will be embodied. AIXItl is a very intelligent agent, but when embodied, it fails in games that violate its "Cartesian" assumptions. The Precommitment game is one example of a specific game in a concrete universe where intelligent programs in general can be expected to succeed, but where AIXItl fails.
"You're not being fair! When AIXItl is embedded in the environment, its source code is part of its output. You forgot to make Rob's non-output registers be part of AIXItl's output channel. Those other registers matter explicitly in this game, so of course AIXItl couldn't win."
Yes, precisely! This is the point I'm trying to make.
AIXItl fails in this situation only because there is an output (its source code) that it does not control via its output channel. That point is worth repeating: AIXItl has a program output (the program itself) that it cannot control; and thus it should come as no surprise that in situations where the ignored output matters, AIXItl can perform poorly.
In some games, embodied agents must modify their own source code to succeed. AIXItl lacks this ability. Therefore, among embodied agents, AIXItl is not superior to every other agent bounded by time t and length l.
Intuitively, this limitation could be addressed by hooking up the AIXItl's output channel to its source code. Unfortunately, if you do that, the resulting formalism is no longer AIXItl.
This is not just a technical quibble: We can say many useful things about AIXI, such as "the more input it gets the more accurate its environment model becomes". On the other hand, we can't say much at all about an agent that chooses its new source code: we can't even be sure whether the new agent will still have an environment model!
It may be possible to give an AIXItl variant access to its program registers and then train it such that it acts like an AIXItl most of the time, but such that it can also learn to win the Precommitment game. However, it’s not immediately obvious to us how to do this, or even whether it can be done. This is a possibility that we'd be interested in studying further.
Are these scenarios relevant in practice? AIXItl fails at the Precommitment game, but only because Omega reads Rob and requires that Rob's source code has some specific form. Such games seem unrealistic.
However, AIXItl failed not because of Omega's sharp demands, but because its implementation was relevant to the problem in an abstract sense. Other scenarios can cause AIXItl to fail without giving any agent the power to read its source code: simple correlates such as heat emanating from the AIXItl can be used to give it similar trouble.
For example, consider the HeatingUp game:
There are two agents, Rob and Omega. Rob can either Cooperate or Exploit, but figuring out how to exploit requires significant computation. Omega can either Cooperate or Defect. If both agents cooperate, they each get 2 utility. If Rob exploits and Omega cooperates, Rob gets 3 and Omega gets 0. If Omega defects, then both agents get 1 utility regardless of Rob's move.
                  Omega
           Cooperate   Defect
          ┌──────────┬────────┐
Cooperate │  (2, 2)  │ (1, 1) │
Rob       ├──────────┼────────┤
Exploit   │  (3, 0)  │ (1, 1) │
          └──────────┴────────┘
The trick in this game is that Rob can only unlock the 'exploit' payoff after doing an expensive computation. Because Rob is an actual physical machine, this expensive computation may generate a significant change in temperature that Omega can detect. A paranoid Omega may resolve to defect if and only if Rob heats up. An intelligent agent that knows Omega acts in this way should take care to cooperate without doing any expensive computations.
An AIXItl, which loops over a vast number of Turing machines, can't even consider suspending expensive computations for the duration of the game.
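To make the payoffs concrete, here is a minimal Python sketch of HeatingUp against the paranoid Omega. It simplifies in one labeled way: the expensive computation is assumed to always produce heat that Omega detects, whereas in the scenario it only may. The function names are hypothetical.

```python
from typing import Dict, Tuple

# (Rob's move, Omega's move) -> (Rob's utility, Omega's utility)
PAYOFFS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("Cooperate", "Cooperate"): (2, 2),
    ("Cooperate", "Defect"): (1, 1),
    ("Exploit", "Cooperate"): (3, 0),
    ("Exploit", "Defect"): (1, 1),
}


def paranoid_omega(rob_heated_up: bool) -> str:
    """Defect if and only if Rob heated up."""
    return "Defect" if rob_heated_up else "Cooperate"


def play(rob_move: str, ran_expensive_computation: bool) -> Tuple[int, int]:
    """Play one round; heat is assumed to reveal expensive computation."""
    if rob_move == "Exploit" and not ran_expensive_computation:
        raise ValueError("Exploit is only unlocked by the expensive computation")
    omega_move = paranoid_omega(rob_heated_up=ran_expensive_computation)
    return PAYOFFS[(rob_move, omega_move)]
```

Under this simplification, `play("Cooperate", False)` yields (2, 2), while any agent that runs hot gets 1 whichever move it picks. The payoff of 3 is never reachable against this Omega, so the best Rob can do is stay cool and cooperate.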
This is but a minor example of a real-world situation where an agent that does not need to increase its intelligence nevertheless encounters a scenario where it needs to modify its source code.
Though such scenarios may be rare, they exist, and seem reasonable. If we think that an intelligent agent intuitively should be able to achieve a payoff of 2 in the HeatingUp game, then we admit that AIXItl fails to capture some desirable aspects of intelligence.
This is not a dismissal of AIXItl, by any means: the AIXI model is a useful formalism of general intelligence. Rather, games such as the Precommitment game and the HeatingUp game demonstrate that the AIXI model fails to capture certain salient aspects of intelligence. (The aspects that it fails to capture happen to be particularly important to MIRI, as reasoning about self-modification is particularly important for any agent capable of undergoing an intelligence explosion.)
Unfortunately, it's not clear how to modify the AIXI formalism to allow AIXItl to reason about its own code without losing many of the properties that made AIXItl nice to deal with in the first place. For this reason, we've been focusing on toy models that capture different features of intelligence, such as Orseau and Ring's space-time embedded intelligence. (Benja and I discuss a variant of this formalism in the paper Problems of self-reference in self-improving space-time embedded intelligence.)
AIXI is a useful model, but it simply doesn't capture one part of the problem space which we expect to be important for developing an AGI: namely, it does not lend itself to the study of self-modification or self-reference. Perhaps a variant of AIXI could be made to succeed in situations such as the Precommitment game or the HeatingUp game: this is an interesting area of study, and one where we'd be delighted to collaborate with others.
AIXI as an Ideal
AIXI is an impressive model of machine intelligence. If we could implement a physical AIXItl, it would be an extraordinarily powerful agent. However, the Precommitment game and the HeatingUp game demonstrate that while the model is useful, a physical AIXItl would not be literally ideal. Intuitively, an intelligent agent should be able to succeed in these games, but an embodied AIXItl cannot. A good approximation of AIXI would be competent indeed, but it's important to notice that the field of AGI doesn't reduce to building better and better approximations of AIXI. An embodied AIXItl doesn't act how we want intelligent agents to act: the model makes certain faulty assumptions about the environment that can get embodied AIXIs into trouble.
One might object that AIXI is not meant to be constructed in the universe, as doing so violates the assumption that AIXI is separate from its environment. Instead, the formalism can be used to define a formal measure of intelligence: in any scenario, we can check how well an agent in the environment does compared to a theoretical AIXI outside the environment using a hypercomputer. The closer the real agent approximates the hypothetical AIXI, the higher its Legg-Hutter intelligence score.
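For readers who want the definition being referred to: Legg and Hutter define the universal intelligence of an agent as a complexity-weighted sum of its expected reward across computable environments. The notation below follows their paper rather than this post, and is given only as background.

```latex
\Upsilon(\pi) \;:=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi}
```

Here $\pi$ is the agent, $E$ is the class of computable reward-bounded environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V_{\mu}^{\pi}$ is the expected total reward $\pi$ obtains in $\mu$. AIXI is, by construction, the agent that maximizes this quantity. Each $V_{\mu}^{\pi}$ is defined for an agent that interacts with $\mu$ only through its input and output channels.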
However, the Legg-Hutter intelligence metric as specified assumes that agents are separated from their environment, and thus does not directly apply to embodied agents. It may be possible to modify the metric to work on embodied agents, but it is not clear how to do so in general, and this seems especially difficult in situations requiring self-modification. Nevertheless, I have some ideas that I hope to explore in future posts.
Regardless of how useful the Legg-Hutter intelligence metric is for embodied agents, the point stands that there are scenarios where an embodied AIXItl would fail systematically. These failures are a research topic in their own right: while at MIRI we are inclined to use models of intelligence that are designed specifically to study self-modification, it is worth considering whether the AIXI formalism can be modified so that some variant of AIXItl performs well in scenarios where the agent's source code affects the environment. Study could lead to variations that handle not only simple games like the Precommitment game, but also more complex scenarios involving self-reference or multiple agents. We'd be interested to study such variations with others who are interested in AIXI.