I'm pleased to announce a new paper from MIRI: Formalizing Two Problems of Realistic World Models.
Abstract:
An intelligent agent embedded within the real world must reason about an environment which is larger than the agent, and learn how to achieve goals in that environment. We discuss attempts to formalize two problems: one of induction, where an agent must use sensory data to infer a universe which embeds (and computes) the agent, and one of interaction, where an agent must learn to achieve complex goals in the universe. We review related problems formalized by Solomonoff and Hutter, and explore challenges that arise when attempting to formalize analogous problems in a setting where the agent is embedded within the environment.
This is the fifth of six papers discussing active research topics that we've been looking into at MIRI. It discusses a few difficulties that arise when attempting to formalize problems of induction and evaluation in settings where an agent must learn about (and act upon) a universe from within. These problems have been much discussed on LessWrong; for further reading, see the links below. This paper is intended to better introduce the topic, and motivate it as relevant to FAI research.
- Intelligence Metrics with Naturalized Induction using UDT
- Building Phenomenological Bridges
- Failures of an Embodied AIXI
- The Naturalized Induction wiki page
The (rather short) introduction to the paper is reproduced below.
Thanks for the comments, this is an interesting line of reasoning :-)
An AIXI that takes no actions is just a Solomonoff inductor, and this might give you some intuition for why, if you embed AIXI-without-actions into a UTM with side effects, you won't end up with anything approaching good behavior. On each turn, it will run all environment hypotheses and update its distribution to be consistent with its observations -- and then do nothing. It won't be able to "learn" how to manage the "side effects"; AIXI is simply not an algorithm that attempts to do any such thing.
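To make the "updates and then does nothing" point concrete, here's a toy stand-in for AIXI-without-actions: a Bayesian inductor over a small finite hypothesis class instead of all computable environments, with a uniform prior instead of a simplicity prior. All names and the hypothesis class are illustrative, not from the paper; the point is just that the algorithm's entire behavior is conditioning on observations:

```python
# Toy "inductor without actions": each hypothesis deterministically
# predicts the next bit; the inductor updates its posterior on each
# observation and emits no actions at all.

hypotheses = {
    "all_zeros": lambda t: 0,
    "all_ones":  lambda t: 1,
    "alternate": lambda t: t % 2,
}

# Uniform prior (a crude stand-in for a simplicity prior).
posterior = {name: 1.0 / len(hypotheses) for name in hypotheses}

def update(posterior, t, observed_bit):
    """Condition on the observation: zero out hypotheses that mispredict."""
    new = {name: (p if hypotheses[name](t) == observed_bit else 0.0)
           for name, p in posterior.items()}
    total = sum(new.values())
    return {name: p / total for name, p in new.items()}

# Feed it the alternating sequence 0, 1, 0, 1; mass concentrates
# on "alternate". Nothing here touches the environment.
for t, bit in enumerate([0, 1, 0, 1]):
    posterior = update(posterior, t, bit)

print(posterior)
```

If you ran this inside a UTM whose execution had side effects, nothing in the loop would notice or exploit them -- which is the sense in which embedding a pure inductor doesn't get you an agent.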
You're correct, though, that examining a setup where a UTM has side effects (and picking an algorithm to run on that UTM) is indeed a way to examine the "naturalized" problems. In fact, this idea is very similar to Orseau and Ring's space-time embedded intelligence formalism. The big question here (which we must answer in order to be able to talk about which algorithms "perform well") is what distribution over environments an agent will be rated against.
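One way to see why the choice of distribution matters is to write the scoring rule down directly, Legg-Hutter style: an algorithm's score is its expected reward over some distribution of environments. The sketch below uses a hand-picked two-environment toy distribution; everything here (the environments, the policies, the horizon) is illustrative, and the hard open question is precisely what the real distribution should be:

```python
# Minimal sketch: score a policy by expected total reward over a
# (toy, hand-picked) distribution of environments.

import random

def env_biased(p):
    """Environment factory: action 1 pays off with probability p."""
    def step(action):
        return 1.0 if (action == 1 and random.random() < p) else 0.0
    return step

# The toy "distribution over environments": (weight, environment) pairs.
environment_dist = [(0.5, env_biased(0.9)), (0.5, env_biased(0.1))]

def score(policy, dist, horizon=100, seed=0):
    """Expected total reward of `policy` under the distribution."""
    random.seed(seed)
    total = 0.0
    for weight, env in dist:
        reward = sum(env(policy(t)) for t in range(horizon))
        total += weight * reward
    return total

always_act = lambda t: 1
never_act = lambda t: 0
print(score(always_act, environment_dist), score(never_act, environment_dist))
```

With a finite list of environments this is trivial; the interesting version quantifies over environments that contain and compute the agent, which is where the formalization difficulties live.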
I'm not exactly sure how you would formalize this. Say you have a machine M implemented by a UTM which has side effects on the environment. M is doing internal predictions but has no outputs. You could predict what would happen from running M (given your uncertainty about how the side effects work), but that's not a counterfactual, that's a prediction: constructing a counterfactual would require considering different possible computations that M could execute. (There are easy ways to cash out this sort of counterfactual using CDT or EDT, but you run into the usual logical counterfactual problems if you try to construct these sorts of counterfactuals using UDT, as far as I can tell.)
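The prediction/counterfactual distinction can be sketched in the CDT-style case, where it's easy. Toy model, with all names hypothetical: M computes some value, and the environment's payoff depends on that value through side effects. Predicting means running M as-is; the CDT-style counterfactual intervenes on M's output, sidestepping the logical question "what if M's computation had come out differently?":

```python
# Sketch: prediction vs. CDT-style (interventional) counterfactual.

def M():
    """The machine's actual computation (it has a single fixed output)."""
    return 1 + 1

def environment(m_output):
    """Toy side-effect model: payoff depends on M's computed value."""
    return 10 if m_output == 2 else 0

# Prediction: just run M inside the environment model.
prediction = environment(M())

# CDT-style counterfactual: intervene, surgically replacing M's output
# with each alternative value, rather than reasoning about the
# (logically contradictory) scenario where 1 + 1 equals something else.
counterfactuals = {x: environment(x) for x in [0, 1, 2, 3]}
print(prediction, counterfactuals)
```

The UDT version is exactly where this sketch breaks down: there the counterfactual has to range over M being a different computation, not just M's output being overwritten, and that's the logical-counterfactual problem.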
Yeah, once you figure out which distribution over environments to score against and how to formalize your counterfactuals, the problem reduces to "pick the action with the best future side effects", which throws you directly up against the Vingean reflection problem in any environment where your capabilities include building something smarter than you :-)