The Blue-Minimizing Robot

Scott Alexander

Imagine a robot with a turret-mounted camera and laser. Each moment, it is programmed to move forward a certain distance and perform a sweep with its camera. As it sweeps, the robot continuously analyzes the average RGB value of the pixels in the camera image; if the blue component passes a certain threshold, the robot stops, fires its laser at the part of the world corresponding to the blue area in the camera image, and then continues on its way.
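The robot's whole program is short enough to write down. Here is a minimal Python sketch of one cycle, assuming the camera image arrives as a list of (R, G, B) tuples. The threshold value, the function names, and the action strings are illustrative inventions; the description above specifies only "a certain threshold".

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255

BLUE_THRESHOLD = 200  # hypothetical value; the post gives no number


def average_rgb(image: List[Pixel]) -> Tuple[float, float, float]:
    """Average each colour channel over all pixels in the image."""
    n = len(image)
    return (
        sum(p[0] for p in image) / n,
        sum(p[1] for p in image) / n,
        sum(p[2] for p in image) / n,
    )


def robot_step(image: List[Pixel]) -> str:
    """One cycle: fire if the average blue component passes the threshold."""
    _, _, blue = average_rgb(image)
    return "fire" if blue > BLUE_THRESHOLD else "advance"


blue_object = [(0, 0, 255)] * 4
gray_object = [(128, 128, 128)] * 4
print(robot_step(blue_object))  # fire
print(robot_step(gray_object))  # advance
```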

Watching the robot's behavior, we would conclude that this is a robot that destroys blue objects. Maybe it is a surgical robot that destroys cancer cells marked by a blue dye; maybe it was built by the Department of Homeland Security to fight a group of terrorists who wear blue uniforms. Whatever. The point is that we would analyze this robot in terms of its goals, and in those terms we would be tempted to call this robot a blue-minimizer: a machine that exists solely to reduce the number of blue objects in the world.

Suppose the robot had human-level intelligence in some side module, but no access to its own source code, so that it could learn about itself only by observing its own actions. The robot might come to the same conclusions we did: that it is a blue-minimizer, set upon a holy quest to rid the world of the scourge of blue objects.

But now stick the robot in a room with a hologram projector. The hologram projector (which is itself gray) projects a hologram of a blue object five meters in front of it. The robot's camera detects the projector, but its RGB value is harmless and the robot does not fire. Then the robot's camera detects the blue hologram and zaps it. We arrange for the robot to enter this room several times, and each time it ignores the projector and zaps the hologram, without effect.

Here the robot is failing at its goal of being a blue-minimizer. The right way to reduce the amount of blue in the universe is to destroy the projector; instead its beams flit harmlessly through the hologram.

Again, give the robot human-level intelligence. Teach it exactly what a hologram projector is and how it works. Now what happens? Exactly the same thing - the robot executes its code, which says to scan the room until its camera registers blue, then shoot its laser.

In fact, there are many ways to subvert this robot. What if we put a lens over its camera which inverts the image, so that white appears as black, red as green, blue as yellow, and so on? The robot will not shoot us with its laser to prevent such a violation (unless we happen to be wearing blue clothes when we approach) - its entire program was detailed in the first paragraph, and there's nothing about resisting lens alterations. Nor will the robot correct itself and shoot only at objects that appear yellow - its entire program was detailed in the first paragraph, and there's nothing about correcting its program for new lenses. The robot will continue to zap objects that register a blue RGB value; but now it'll be shooting at anything that is yellow.
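The effect of the lens can be sketched in a few lines of Python. This assumes the lens maps every channel value v to 255 - v and that the firing rule looks only at the blue channel; the threshold and all names are hypothetical.

```python
from typing import Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255

BLUE_THRESHOLD = 200  # hypothetical value; the post gives no number


def invert(pixel: Pixel) -> Pixel:
    """Assumed model of the lens: every channel is flipped around 255."""
    r, g, b = pixel
    return (255 - r, 255 - g, 255 - b)


def fires_at(pixel: Pixel) -> bool:
    """The unchanged firing rule: shoot whatever registers as blue."""
    return pixel[2] > BLUE_THRESHOLD


BLUE = (0, 0, 255)
YELLOW = (255, 255, 0)

print(fires_at(BLUE))            # True: no lens, blue is shot
print(fires_at(invert(BLUE)))    # False: through the lens, blue looks yellow
print(fires_at(invert(YELLOW)))  # True: through the lens, yellow looks blue
```

The firing rule is identical in all three calls; only its input has changed.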

The human-level intelligence version of the robot will notice its vision has been inverted. It will know it is shooting yellow objects. It will know it is failing at its original goal of blue-minimization. And maybe if it had previously decided it was on a holy quest to rid the world of blue, it will be deeply horrified and ashamed of its actions. It will wonder why it has suddenly started to deviate from this quest, and why it just can't work up the will to destroy blue objects anymore.

The robot goes to Quirinus Quirrell, who explains that robots don't really care about minimizing the color blue. They only care about status and power, and pretend to care about minimizing blue in order to impress potential allies.

The robot goes to Robin Hanson, who explains that there are really multiple agents within the robot. One of them wants to minimize the color blue, the other wants to minimize the color yellow. Maybe the two of them can make peace, and agree to minimize yellow one day and blue the next?

The robot goes to Anna Salamon, who explains that robots are not automatically strategic, and that if it wants to achieve its goal it will have to learn special techniques to keep focus on it.

I think all of these explanations hold part of the puzzle, but that the most fundamental explanation is that the mistake began as soon as we started calling it a "blue-minimizing robot". This is not because its utility function doesn't exactly correspond to blue-minimization: even if we try to assign it a ponderous function like "minimize the color represented as blue within your current visual system, except in the case of holograms" it will be a case of overfitting a curve. The robot is not maximizing or minimizing anything. It does exactly what it says in its program: find something that appears blue and shoot it with a laser. If its human handlers (or the robot itself) want to interpret that as goal-directed behavior, well, that's their problem.

It may be that the robot was created to achieve a specific goal. It may be that the Department of Homeland Security programmed it to attack blue-uniformed terrorists who had no access to hologram projectors or inversion lenses. But to assign the goal of "blue minimization" to the robot is a confusion of levels: this was a goal of the Department of Homeland Security, which became a lost purpose as soon as it was represented in the form of code.

The robot is a behavior-executor, not a utility-maximizer.
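One way to make the distinction concrete is to put both kinds of agent in the hologram room. The toy sketch below is an illustration only: the action names and the table of predicted outcomes are invented, and a real utility-maximizer would need a far richer world model.

```python
from typing import Dict

# Hypothetical predictions: how much blue remains after each action
# in the hologram room (1 = hologram still visible, 0 = gone).
PREDICTED_BLUE: Dict[str, int] = {
    "shoot_where_blue_appears": 1,  # beam passes through the hologram
    "shoot_projector": 0,           # hologram disappears
    "do_nothing": 1,
}


def behavior_executor(sees_blue: bool) -> str:
    """Fixed rule: if blue registers, shoot at it. No model, no goal."""
    return "shoot_where_blue_appears" if sees_blue else "do_nothing"


def utility_maximizer(predicted_blue: Dict[str, int]) -> str:
    """Pick whichever action is predicted to leave the least blue."""
    return min(predicted_blue, key=predicted_blue.get)


print(behavior_executor(sees_blue=True))  # shoot_where_blue_appears
print(utility_maximizer(PREDICTED_BLUE))  # shoot_projector
```

The executor never consults the outcome table, so no amount of knowledge added to that table changes what it does.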

In the rest of this sequence, I want to expand upon this idea. I'll start by discussing some of the foundations of behaviorism, one of the earliest theories to treat people as behavior-executors. I'll go into some of the implications for the "easy problem" of consciousness and philosophy of mind. I'll very briefly discuss the philosophical debate around eliminativism and a few eliminativist schools. Then I'll go into why we feel like we have goals and preferences and what to do about them.

I agree with you that behaviorism and PCT are different, which is why I don't understand why you're interpreting the robot as PCT and not behaviorist. From the program, it seems pretty clearly (STIMULUS: see blue -> RESPONSE: fire laser) to me.

Well, your robot example was an intuition pump constructed so as to be as close as possible to stimulus-response nature. If you consider something only slightly more complicated the distinction may become clearer: a room thermostat. Physically ripped out of its context, you can see it as a stimulus-response device. Temperature at sensor goes above threshold --> close a switch, temperature falls below threshold --> open the switch. You can set the temperature of the sensor to anything you like, and observe the resulting behaviour of the switch. Pure S-R.

In context, though, the thermostat has the effect of keeping the room temperature constant. You can no longer set the temperature of the sensor to anything you like. Put a candle near it, and the temperature of the rest of the room will fall while the sensor remains at a constant temperature. Use a strong enough heat source or cold source, and you will be able to overwhelm the control system's efforts to maintain a constant temperature, but this fails to tell you anything about how the control system works normally. Do the analogous thing to a living organism and you either kill it or put it under such stress that whatever you observe is unlikely to tell you much about its normal operation -- and biology and psychology should be about how organisms work, not how they fail under torture.
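The candle effect can be reproduced with a small simulation. This is a sketch under invented assumptions: the switch is taken to drive a cooler, the room drifts toward a warmer outside temperature, the candle simply adds a fixed offset to the sensor reading, and every constant is made up.

```python
from typing import Tuple

SETPOINT = 20.0  # thermostat threshold in degrees C (assumed)
OUTSIDE = 30.0   # temperature the room drifts toward (assumed)
LEAK = 0.05      # fraction of the gap to OUTSIDE closed per step (assumed)
COOLING = 1.0    # degrees removed per step while the switch is closed (assumed)


def switch_closed(sensor_temp: float) -> bool:
    """The thermostat as a bare stimulus-response device."""
    return sensor_temp > SETPOINT


def simulate(candle_offset: float, steps: int = 400) -> Tuple[float, float]:
    """Run the closed loop; return mean (room, sensor) temperature
    over the last 100 steps."""
    room = OUTSIDE
    rooms, sensors = [], []
    for _ in range(steps):
        sensor = room + candle_offset  # the candle warms only the sensor
        cooling = COOLING if switch_closed(sensor) else 0.0
        room += LEAK * (OUTSIDE - room) - cooling
        rooms.append(room)
        sensors.append(sensor)
    tail = 100
    return sum(rooms[-tail:]) / tail, sum(sensors[-tail:]) / tail


room_plain, sensor_plain = simulate(candle_offset=0.0)
room_candle, sensor_candle = simulate(candle_offset=5.0)
print(f"no candle: room {room_plain:.1f}, sensor {sensor_plain:.1f}")
print(f"candle:    room {room_candle:.1f}, sensor {sensor_candle:.1f}")
```

In both runs the sensor settles near the set point; with the candle in place, it is the room that ends up colder.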

Did you know that lab rats are normally starved until they have lost 20% of their free-feeding weight, before using them in behavioural experiments?

Here's a general block diagram of a control system. The controller is the part above the dotted line and its environment the part below (what would be called the plant in an industrial context). R = reference, P = perception, O = output, D = disturbance (everything in the environment besides O that affects the perception). I have deliberately drawn this to look symmetrical, but the contents of those two boxes make its functioning asymmetrical. P remains close to R, but O and D need have no visible relationship at all.

                  R |
                    |
                    V
                +-------+
                |       |
           +--->|       |----+
           |    |       |    |
           ^    +-------+    v
           |                 |
....... P  | ............... | O .......
           |                 |
           ^    +-------+    v
           |    |       |    |
           +----|       |<---+
                |       |
                +-------+
                    ^
                    |
                  D |
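A minimal sketch of the loop in the diagram, assuming the simplest possible contents for the two boxes: the environment computes P = O + D, and the controller integrates the error R - P with a hypothetical gain. Neither choice comes from the diagram itself, which leaves both boxes empty.

```python
from typing import List, Tuple


def run_loop(
    reference: float, disturbances: List[float], gain: float = 0.5
) -> Tuple[List[float], List[float]]:
    """Simulate the control loop; return the perception and output traces."""
    output = 0.0
    perceptions, outputs = [], []
    for d in disturbances:
        perception = output + d           # environment box (assumed additive)
        error = reference - perception    # controller compares P with R
        output += gain * error            # controller box (assumed integrator)
        perceptions.append(perception)
        outputs.append(output)
    return perceptions, outputs


# No disturbance for 20 steps, then a constant disturbance of 5.
perceptions, outputs = run_loop(10.0, [0.0] * 20 + [5.0] * 30)
print(round(perceptions[-1], 3))  # 10.0
print(round(outputs[-1], 3))      # 5.0
```

P returns to R after the disturbance arrives, while O shifts to whatever value cancels D. In this additive toy the relationship between O and D is easy to see; with a more complicated environment box it need not be.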

When you are dealing with a living organism, R is somewhere inside it. You probably cannot measure it even if you know it exists. (E.g. just what and where, physically, is the set point for deep body temperature in a mammal? Not an easy question to answer.) You may or may not know what P is -- what the organism is actually sensing. It is important to realise that when you perform an experiment on an animal, you have no way of setting P. All you can do is create a disturbance D that may influence P. D, from a behavioural point of view, is the "stimulus" and O, the creature's action on its environment, is the "response". The behaviourist description of the situation is this:

                +-------+
            D   |       |   O
          ----->|       |----->
                |       |
                +-------+

This is simply wrong. The system does not work like that and cannot be understood like that. It may look as if D causes O, but that is like thinking that a candle put in a certain place chills the room, a fact that will seem mysterious and paradoxical when you do not know that the thermostat is present, and will only be explained by discovering the actual mechanism, discarding the second diagram in favour of the first. No amount of data collection will help until one has made that change. This is why correlations are so lamentably low in psychological experiments.

Do you have GChat or any kind of instant messenger?

No, I've never used any of those systems. I prefer a medium in which I can take my time to work out exactly what I want to say.

Okay, we agree that the simple robot described here is behaviorist and the thermostat is PCT. And I certainly see where you're coming from with the rats being PCT because hunger only works as a motivator if you're hungry. But I do have a few questions:

There are some things behaviorism can explain pretty well that I don't know how to model in PCT. For example, consider heroin addiction. An animal can go its whole life not wanting heroin until it's exposed to some. Then suddenly heroin becomes extraordinarily motivating and it will preferentially choose sh

... (read more)

Perplexed (15y)

Outstanding comment - particularly the point at the end about the candle cooling the room. It might be worthwhile to produce a sequence of postings on the control systems perspective - particularly if you could use better-looking block diagrams as illustrations. :)
