I'm pleased to announce a new paper from MIRI: Formalizing Two Problems of Realistic World Models.
Abstract:
An intelligent agent embedded within the real world must reason about an environment which is larger than the agent, and learn how to achieve goals in that environment. We discuss attempts to formalize two problems: one of induction, where an agent must use sensory data to infer a universe which embeds (and computes) the agent, and one of interaction, where an agent must learn to achieve complex goals in the universe. We review related problems formalized by Solomonoff and Hutter, and explore challenges that arise when attempting to formalize analogous problems in a setting where the agent is embedded within the environment.
This is the fifth of six papers discussing active research topics that we've been looking into at MIRI. It discusses a few difficulties that arise when attempting to formalize problems of induction and evaluation in settings where an agent is attempting to learn about (and act upon) a universe from within. These problems have been much discussed on LessWrong; for further reading, see the links below. This paper is intended to better introduce the topic, and motivate it as relevant to FAI research.
- Intelligence Metrics with Naturalized Induction using UDT
- Building Phenomenological Bridges
- Failures of an Embodied AIXI
- The Naturalized Induction wiki page
The (rather short) introduction to the paper is reproduced below.
Suppose that instead of having well-defined actions AIXI only has access to observables and its reward function. It might seem hopeless, but consider the subset of environments containing an implementation of a UTM which is evaluating T_A, a Turing machine implementing action-less AIXI, in which the implementation of the UTM has side effects in the next turn of the environment. This embeds AIXI-actions as side effects of an actual implementation of AIXI running as T_A on a UTM in the set of environments with observables matching those that the abstract AIXI-like algorithm observes. To maximize the reward function the agent must recognize its implementation embedded in the UTM in M and predict the consequences of the side effects of various choices it could make, substituting its own output for the output of the UTM in the next turn of the simulated environment (to avoid recursively simulating itself), choosing the side effects to maximize rewards.
In this context counterfactuals are simulations of the next turns of M resulting from the possible side effects of its current UTM. To be useful there must be a relation from computation-suffixes of T_A to the potential side effects of the UTM. In other words the agent must be able to cause a specific side effect as a result of state-transitions or tape operations performed causally-after it has determined which side effect will maximize rewards. This could be as straightforward as the UTM using the most-recently written bits on the tape to choose a side effect.
In the heating game, the UTM running T_A must be physically implemented as something that has a side effect corresponding to temperature, which causally effects the box of rewards, and all these causes must be predictable from the observables accessible to T_A in the UTM. Similarly, if there is an anvil suspended above a physical implementation of a UTM running T_A, the agent can avoid an inability to increase rewards in the future of environments in which the anvil is caused to drop (or escape from the UTM before dropping it).
This reduces the naturalized induction problem to the tiling/consistent reflection problem; the agent must choose which agent it wants to be in the next turn(s) through side effects that can change its future implementation.
Thanks for the comments, this is an interesting line of reasoning :-)
An AIXI that takes no actions is just a Solomonoff inductor, and this might give you some intuition for why, if you embed AIXI-without-actions into a UTM with side effects, you won't end up with anything approaching good behavior. In each turn, it will run all environment hypotheses and update its distribution to be consistent with observation -- and then do nothing. It won't be able to "learn" how to manage the "side effects"; AIXI is simply not an algorithm that atte... (read more)