The Blue-Minimizing Robot

Scott Alexander


4th Jul 2011


Imagine a robot with a turret-mounted camera and laser. Each moment, it is programmed to move forward a certain distance and perform a sweep with its camera. As it sweeps, the robot continuously analyzes the average RGB value of the pixels in the camera image; if the blue component passes a certain threshold, the robot stops, fires its laser at the part of the world corresponding to the blue area in the camera image, and then continues on its way.
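The program just described is small enough to sketch in a few lines. This is only an illustration under stated assumptions: the names (looks_blue, step), the threshold value of 200, and the representation of a sweep as a list of pixel regions are all hypothetical. The description above specifies only a threshold on the blue component; the extra check that blue outweighs red and green is an added assumption so that white does not register as blue.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255

BLUE_THRESHOLD = 200  # assumed value; the text only says "a certain threshold"


def looks_blue(region: List[Pixel], threshold: int = BLUE_THRESHOLD) -> bool:
    """True if the region's average blue component passes the threshold.

    The dominance check (blue > red and blue > green) is an assumption
    added for this sketch, not part of the original description.
    """
    n = len(region)
    avg_r = sum(p[0] for p in region) / n
    avg_g = sum(p[1] for p in region) / n
    avg_b = sum(p[2] for p in region) / n
    return avg_b > threshold and avg_b > avg_r and avg_b > avg_g


def step(sweep: List[List[Pixel]]) -> Optional[int]:
    """One moment: sweep the camera and fire at the first blue-looking region.

    Returns the index of the region fired at, or None if the robot
    just keeps moving.
    """
    for i, region in enumerate(sweep):
        if looks_blue(region):
            return i  # stands in for "fire the laser at region i"
    return None
```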

Watching the robot's behavior, we would conclude that this is a robot that destroys blue objects. Maybe it is a surgical robot that destroys cancer cells marked by a blue dye; maybe it was built by the Department of Homeland Security to fight a group of terrorists who wear blue uniforms. Whatever. The point is that we would analyze this robot in terms of its goals, and in those terms we would be tempted to call this robot a blue-minimizer: a machine that exists solely to reduce the number of blue objects in the world.

Suppose the robot had human-level intelligence in some side module, but no access to its own source code, so that it could learn about itself only through observing its own actions. The robot might come to the same conclusions we did: that it is a blue-minimizer, set upon a holy quest to rid the world of the scourge of blue objects.

But now stick the robot in a room with a hologram projector. The hologram projector (which is itself gray) projects a hologram of a blue object five meters in front of it. The robot's camera detects the projector, but its RGB value is harmless and the robot does not fire. Then the robot's camera detects the blue hologram and zaps it. We arrange for the robot to enter this room several times, and each time it ignores the projector and zaps the hologram, without effect.

Here the robot is failing at its goal of being a blue-minimizer. The right way to reduce the amount of blue in the universe is to destroy the projector; instead its beams flit harmlessly through the hologram.

Again, give the robot human-level intelligence. Teach it exactly what a hologram projector is and how it works. Now what happens? Exactly the same thing - the robot executes its code, which says to scan the room until its camera registers blue, then shoot its laser.

In fact, there are many ways to subvert this robot. What if we put a lens over its camera which inverts the image, so that white appears as black, red as green, blue as yellow, and so on? The robot will not shoot us with its laser to prevent such a violation (unless we happen to be wearing blue clothes when we approach) - its entire program was detailed in the first paragraph, and there's nothing about resisting lens alterations. Nor will the robot correct itself and shoot only at objects that appear yellow - its entire program was detailed in the first paragraph, and there's nothing about correcting its program for new lenses. The robot will continue to zap objects that register a blue RGB value; but now it'll be shooting at anything that is yellow.
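The lens case can be made concrete with a small sketch. Everything here is an assumption for illustration: the function names, the threshold of 200, the rule that "blue" means a high blue component that also outweighs red and green, and the model of the lens as a simple per-channel complement. Under that model blue and yellow swap, as in the text, though red would map to cyan rather than green.

```python
from typing import Callable, Optional, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255

BLUE: Pixel = (0, 0, 255)
YELLOW: Pixel = (255, 255, 0)


def invert(pixel: Pixel) -> Pixel:
    """Hypothetical inverting lens: per-channel complement."""
    r, g, b = pixel
    return (255 - r, 255 - g, 255 - b)


def registers_blue(pixel: Pixel, threshold: int = 200) -> bool:
    """The robot's unchanged test, applied to whatever the camera reports."""
    r, g, b = pixel
    return b > threshold and b > r and b > g


def fires_at(true_color: Pixel,
             lens: Optional[Callable[[Pixel], Pixel]] = None) -> bool:
    """Does the robot shoot an object of this real-world color?"""
    seen = lens(true_color) if lens is not None else true_color
    return registers_blue(seen)


# Without the lens the robot shoots blue things; with it, yellow things.
print(fires_at(BLUE), fires_at(YELLOW))
print(fires_at(BLUE, lens=invert), fires_at(YELLOW, lens=invert))
```

Nothing in registers_blue changes when the lens is added; only its input does.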

The human-level intelligence version of the robot will notice its vision has been inverted. It will know it is shooting yellow objects. It will know it is failing at its original goal of blue-minimization. And maybe if it had previously decided it was on a holy quest to rid the world of blue, it will be deeply horrified and ashamed of its actions. It will wonder why it has suddenly started to deviate from this quest, and why it just can't work up the will to destroy blue objects anymore.

The robot goes to Quirinus Quirrell, who explains that robots don't really care about minimizing the color blue. They only care about status and power, and pretend to care about minimizing blue in order to impress potential allies.

The robot goes to Robin Hanson, who explains that there are really multiple agents within the robot. One of them wants to minimize the color blue, the other wants to minimize the color yellow. Maybe the two of them can make peace, and agree to minimize yellow one day and blue the next?

The robot goes to Anna Salamon, who explains that robots are not automatically strategic, and that if it wants to achieve its goal it will have to learn special techniques to keep focus on it.

I think all of these explanations hold part of the puzzle, but that the most fundamental explanation is that the mistake began as soon as we started calling it a "blue-minimizing robot". This is not because its utility function doesn't exactly correspond to blue-minimization: even if we try to assign it a ponderous function like "minimize the color represented as blue within your current visual system, except in the case of holograms" it will be a case of overfitting a curve. The robot is not maximizing or minimizing anything. It does exactly what it says in its program: find something that appears blue and shoot it with a laser. If its human handlers (or the robot itself) want to interpret that as goal-directed behavior, well, that's their problem.

It may be that the robot was created to achieve a specific goal. It may be that the Department of Homeland Security programmed it to attack blue-uniformed terrorists who had no access to hologram projectors or inversion lenses. But to assign the goal of "blue minimization" to the robot is a confusion of levels: this was a goal of the Department of Homeland Security, which became a lost purpose as soon as it was represented in the form of code.

The robot is a behavior-executor, not a utility-maximizer.

In the rest of this sequence, I want to expand upon this idea. I'll start by discussing some of the foundations of behaviorism, one of the earliest theories to treat people as behavior-executors. I'll go into some of the implications for the "easy problem" of consciousness and philosophy of mind. I'll very briefly discuss the philosophical debate around eliminativism and a few eliminativist schools. Then I'll go into why we feel like we have goals and preferences and what to do about them.


160 Comments

Richard_Kennaway

When you say the robot has a "different goal", I'm not sure what you mean. What is the robot's goal? To follow the program detailed in the first paragraph?

The robot's goal is not to follow its own program. The program is simply what the robot does. In the environment it is designed to operate in, what it does is destroy blue objects. In the vocabulary of control theory, the controlled variable is the number of blue objects, the reference value is zero, the difference between the two is the error, firing the laser is the action it takes when the error is positive, and the action has the effect of reducing the error. The goal, as with any control system, is to keep the error at zero. It does not have an additional goal of being the best destroyer of blue objects possible. Its designers might have that goal, but if so, that goal is in the designers, not in the system they have designed.

In an environment containing blue objects invulnerable to laser fire, the robot will fail to control the number of blue objects at zero. That does not make it not a control system, just a control system encountering disturbances it is unable to control. To ask whether it is still a control system veers into a purely verbal argument, like asking whether a table is still a table if one leg has broken off and it cannot stand upright.
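The control-theory vocabulary above can be laid out as a toy loop. This is a rough sketch, not a model of the robot's actual program: the function run_control_loop, its parameters, and the idea of one shot per step are hypothetical, and the loop is handed a direct count of blue objects for simplicity. The variable names follow the terms in the comment (controlled variable, reference, error, action).

```python
def run_control_loop(blue_objects: int,
                     laser_destroys: bool,
                     max_steps: int = 10) -> int:
    """Toy loop: fire while the error is positive.

    blue_objects   -- the controlled variable (number of blue objects)
    laser_destroys -- whether firing has any effect in this environment
                      (False models holograms or laser-proof objects)
    Returns the number of blue objects left after at most max_steps.
    """
    reference = 0  # the reference value is zero blue objects
    for _ in range(max_steps):
        error = blue_objects - reference
        if error <= 0:
            break  # error is at zero, so no action is taken
        # Action: fire the laser. It reduces the error only if the
        # environment cooperates.
        if laser_destroys:
            blue_objects -= 1
    return blue_objects


# In the designed-for environment the error is driven to zero.
print(run_control_loop(3, laser_destroys=True))
# Against invulnerable blue objects the same loop runs but never succeeds.
print(run_control_loop(3, laser_destroys=False))
```

The code is the same in both calls; only the environment decides whether the error ever reaches zero.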

People are more complex. They have (according to PCT) a large hierarchy of control systems (very broad, but less than a dozen levels deep), in which the reference signal for each controller is set by the output signals of higher-level controllers. (At the top, reference signals are presumably hard-wired, and at the bottom, output signals go to organs not made of neurons -- muscles, mainly.) In addition, the hierarchy is subject to reorganisation and other forms of adaptation. The adaptations present to consciousness are the ability to think about our goals, consider whether we are doing the best things to achieve them, and change what we are doing. The robot in the example cannot do this.

You might be thinking of "goal" as meaning this sort of conscious, reflective, adaptive attempt to achieve what we "really" want, but I find that too large and fuzzy a concept. It leads into a morass of talk about our "real" goals vs. the goals we think we have, self-reflexive decision theory, extreme thought experiments, and so on. A real science of living things has to start smaller, with theories and observations that can be demonstrated as surely and reproducibly as the motion of balls rolling down inclined planes.

(ETA: When the neuroscience fails to discover this huge complex thing that never carved reality at the joints in the first place, people respond by saying it doesn't exist, that it went the way of kobolds rather than rainbows.)

Maybe you're also thinking of this robot's program as a plain stimulus-response system, as in the behaviourist view of living systems. But what makes it a control system is the environment it is embedded in, an environment in which shooting at blue objects destroys them.

If goals reduce to a program like the robot's in any way, it's in the way that Einsteinian mechanics "reduce" to Newtonian mechanics - giving good results in most cases but being fundamentally different and making different predictions on border cases.

If I replace "program" by "behaviourism", then I would say that it is behaviourism that is explained away by PCT.

Scott Alexander

Now I'm very confused. I understand that you think humans are PCT systems and that you have some justifications for that. But unlike humans, we know exactly what motivates this robot (the program in the first paragraph) and it doesn't contain a controlled variable corresponding to the number of blue objects, or anything else that sounds PCT.

So are you saying that any program can be modeled by PCT better than by looking at the program itself, or that although this particular robot isn't PCT, a hypothetical robot that was more reflective of real human behav...