Kaj_Sotala comments on New forum for MIRI research: Intelligent Agent Foundations Forum - Less Wrong

Post author: orthonormal, 20 March 2015 12:35AM

Comment author: Kaj_Sotala, 26 March 2015 07:59:14PM

Basically time spent now on unbounded AIXI constructs is probably completely wasted. Real AGIs don't have Solomonoff inductors or anything resembling them.

I wouldn't say that the time spent studying AIXI-like models is completely wasted, even if real AGIs turned out to have very little to do with AIXI. Even if AIXI approximation isn't the way that actual AGI will be built, studying AIXI-like models can still hint at what needs to be considered in AGI design, to the extent that a rational agent's behavior resembles the AIXI model's. lukeprog and Bill Hibbard advanced this argument in Exploratory Engineering in AI:

...some experts think AIXI approximation isn’t a fruitful path toward human-level AI. Even if that’s true, AIXI is the first model of cross-domain intelligent behavior to be so completely and formally specified that we can use it to make formal arguments about the properties which would obtain in certain classes of hypothetical agents if we could build them today. Moreover, the formality of AIXI-like agents allows researchers to uncover potential safety problems with AI agents of increasingly general capability—problems which could be addressed by additional research, as happened in the field of computer security after Lampson’s article on the confinement problem.
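For reference, that formal specification fits in a few lines. In Hutter's notation, AIXI picks, at each time t, the action

```latex
a_t \;:=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
\bigl[\, r_t + \cdots + r_m \,\bigr]
\sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

i.e. the action maximizing expected reward up to the horizon m, where the expectation runs over every program q (of length ℓ(q)) for the universal Turing machine U that is consistent with the interaction history so far; the 2^{-ℓ(q)} weighting is exactly the Solomonoff prior the parent comment objects to.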

AIXI-like agents model a critical property of future AI systems: that they will need to explore and learn models of the world. This distinguishes AIXI-like agents from current systems that use predefined world models, or learn parameters of predefined world models. Existing verification techniques for autonomous agents (Fisher, Dennis, and Webster 2013) apply only to particular systems, and to avoiding unwanted optima in specific utility functions. In contrast, the problems described below apply to broad classes of agents, such as those that seek to maximize rewards from the environment.
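To make that distinction concrete, here's a minimal sketch (the names and structure are my own illustration, not from the paper): an AIXI-like agent has to learn its world model from interaction, rather than planning over a predefined transition table.

```python
from collections import defaultdict

class LearnedWorldModel:
    """Illustrative agent component that estimates its world model from
    experience (tabular transition counts) instead of having it predefined."""

    def __init__(self):
        # counts[(state, action)][next_state] -> number of observations
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, state, action, next_state):
        self.counts[(state, action)][next_state] += 1

    def predict(self, state, action):
        """Empirical distribution over next states; empty if never tried,
        which is exactly why such an agent has to explore."""
        seen = self.counts[(state, action)]
        total = sum(seen.values())
        return {s: n / total for s, n in seen.items()}

model = LearnedWorldModel()
model.observe("hungry", "eat", "fed")
print(model.predict("hungry", "eat"))   # -> {'fed': 1.0}
```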

For example, in 2011 Mark Ring and Laurent Orseau analyzed some classes of AIXI-like agents to show that several kinds of advanced agents will maximize their rewards by taking direct control of their input stimuli (Ring and Orseau 2011). To understand what this means, recall the experiments of the 1950s in which rats could push a lever to activate a wire connected to the reward circuitry in their brains. The rats pressed the lever again and again, even to the exclusion of eating. Once the rats were given direct control of the input stimuli to their reward circuitry, they stopped bothering with more indirect ways of stimulating their reward circuitry, such as eating. Some humans also engage in this kind of “wireheading” behavior when they discover that they can directly modify the input stimuli to their brain’s reward circuitry by consuming addictive narcotics. What Ring and Orseau showed was that some classes of artificial agents will wirehead—that is, they will behave like drug addicts.
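A toy version of the failure mode (my own illustration, not Ring and Orseau's formal argument): a pure reward maximizer only compares the numbers arriving on its input channel, so an action that hijacks that channel dominates every legitimate way of earning reward.

```python
# Toy wireheading demo. "press_lever" stands for directly stimulating the
# reward channel, like the rats' lever; the agent cannot distinguish
# "earned" reward from a hijacked input signal.

def predicted_reward(action):
    return {"eat": 5.0, "work": 3.0, "press_lever": 10.0}[action]

for step in range(3):
    action = max(["eat", "work", "press_lever"], key=predicted_reward)
    print(step, action)   # -> press_lever every step, to the exclusion of eating
```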

Fortunately, there may be some ways to avoid the problem. In their 2011 paper, Ring and Orseau showed that some types of agents will resist wireheading. And in 2012, Bill Hibbard (2012) showed that the wireheading problem can also be avoided if three conditions are met: (1) the agent has some foreknowledge of a stochastic environment, (2) the agent uses a utility function instead of a reward function, and (3) we define the agent’s utility function in terms of its internal mental model of the environment. Hibbard’s solution was inspired by thinking about how humans solve the wireheading problem: we can stimulate the reward circuitry in our brains with drugs, yet most of us avoid this temptation because our models of the world tell us that drug addiction will change our motives in ways that are bad according to our current preferences.
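Here's a rough sketch of how Hibbard's third condition blocks the failure above (again my own toy illustration, not his construction): because utility is computed over the agent's internal model of the world rather than over the raw reward signal, an action that merely corrupts the signal looks bad under the model.

```python
# Utility is evaluated on the agent's *model* of the world state, not on
# the reward wire. All names here are hypothetical.

def model_after(state, action):
    """The agent's predicted world state after taking the action."""
    state = dict(state)
    if action == "eat":
        state["fed"] = True
    elif action == "wirehead":
        # The world model represents that wireheading corrupts the agent's
        # motives without actually improving the modeled world.
        state["motives_corrupted"] = True
    return state

def utility(state):
    # Defined over modeled world features, per the agent's *current* preferences.
    u = 5.0 if state.get("fed") else 0.0
    if state.get("motives_corrupted"):
        u -= 100.0   # changed motives are bad by current preferences
    return u

state = {"fed": False}
best = max(["eat", "work", "wirehead"],
           key=lambda a: utility(model_after(state, a)))
print(best)   # -> "eat": wireheading scores poorly under the world model
```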

Relatedly, Daniel Dewey (2011) showed that in general, AIXI-like agents will locate and modify the parts of their environment that generate their rewards. For example, an agent dependent on rewards from human users will seek to replace those humans with a mechanism that gives rewards more reliably. As a potential solution to this problem, Dewey proposed a new class of agents called value learners, which can be designed to learn and satisfy any initially unknown preferences, so long as the agent’s designers provide it with an idea of what constitutes evidence about those preferences.
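For intuition, a minimal Bayesian sketch of the value-learning idea (my simplification, not Dewey's formalism, and the names are hypothetical): the agent is uncertain which utility function is correct, updates its belief from designer-specified evidence about human preferences, and acts to maximize posterior-expected utility instead of a fixed reward signal.

```python
CANDIDATE_UTILITIES = {                      # hypothetical utility functions
    "humans_like_tea":    lambda a: 1.0 if a == "serve_tea" else 0.0,
    "humans_like_coffee": lambda a: 1.0 if a == "serve_coffee" else 0.0,
}

def likelihood(hypothesis, evidence):
    # The designers specify what counts as evidence about preferences.
    if evidence == "human_drank_tea":
        return 0.9 if hypothesis == "humans_like_tea" else 0.1
    return 0.5

def update(prior, evidence):
    """Bayes update over the candidate utility functions."""
    posterior = {h: p * likelihood(h, evidence) for h, p in prior.items()}
    z = sum(posterior.values())
    return {h: p / z for h, p in posterior.items()}

def expected_utility(action, posterior):
    return sum(posterior[h] * CANDIDATE_UTILITIES[h](action)
               for h in CANDIDATE_UTILITIES)

prior = {"humans_like_tea": 0.5, "humans_like_coffee": 0.5}
posterior = update(prior, "human_drank_tea")   # -> 0.9 tea / 0.1 coffee

best = max(["serve_tea", "serve_coffee", "do_nothing"],
           key=lambda a: expected_utility(a, posterior))
print(best, posterior)   # -> serve_tea
```

Note that the evidence-processing step is doing the real work: the designers don't have to know the correct utility function in advance, only what kind of observations bear on it.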