The robot is not consequentialist; its decisions are not controlled by the dependence of facts about the future on its decisions.
Good point, but the fact that humans are consequentialists (at least partly) doesn't seem to make the problem much easier. Suppose we replace Yvain's blue-minimizer robot with a simple consequentialist robot that has the same behavior (let's say it models the world as a 2D grid of cells that have intrinsic color, it always predicts that any blue cell that it shoots at will turn some other color, and its utility function assigns negative utility to the existence of blue cells). What does this robot "actually want", given that the world is not really a 2D grid of cells that have intrinsic color?
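To make the hypothetical concrete, here is a minimal sketch (my own illustration, with made-up names, not anything from the comment itself) of such a consequentialist robot: it models the world as a 2D grid of intrinsically colored cells, predicts that shooting a blue cell recolors it, and picks whichever action maximizes a utility function that penalizes blue cells.

```python
from copy import deepcopy

def utility(grid):
    """Negative utility for every cell the world-model labels blue."""
    return -sum(cell == "blue" for row in grid for cell in row)

def predict(grid, action):
    """The robot's (mistaken) world-model: shooting a blue cell turns it gray."""
    new_grid = deepcopy(grid)
    if action is not None:
        x, y = action
        if new_grid[y][x] == "blue":
            new_grid[y][x] = "gray"
    return new_grid

def choose_action(grid):
    """Consequentialist choice: pick the action with the best predicted outcome."""
    actions = [None] + [(x, y) for y in range(len(grid))
                               for x in range(len(grid[0]))]
    return max(actions, key=lambda a: utility(predict(grid, a)))

# Example: a 2x2 "world" with one blue cell.
world = [["gray", "blue"],
         ["gray", "gray"]]
print(choose_action(world))  # (1, 0) -- shoot the blue cell
```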
What does this robot "actually want", given that the world is not really a 2D grid of cells that have intrinsic color?
Who cares about the question of what the robot "actually wants"? Certainly not the robot. Humans care about the question of what they "actually want", but that's because they have additional structure that this robot lacks. But with humans, you're not limited to just looking at what they do on auto-pilot; instead, you can just ask that additional structure when you run into problems like this. For example, if you asked me what I really wanted under some weird ontology change, I could say, "I have some guesses, but I don't really know; I would like to defer to a smarter version of me". That's how I understand preference extrapolation: not as something that looks at what your behavior suggests you're trying to do and then does it better, but as something that poses the question of what you want to some system you'd like to have answer the question for you.
It looks to me like there's a mistaken tendency among many people here, including some very smart people, to say that I'd be irrational to let my stated preferences deviate fr...
In other words, our "actual values" come from our being philosophers, not our being consequentialists.
It seems plausible to me, and I'm not sure that "many" others do disagree with you.
The conclusion I'd draw from this essay is that one can't necessarily derive a "goal" or a "utility function" from all possible behavior patterns. If you ask "What is the robot's goal?", the answer is, "it doesn't have one," because it doesn't assign a total preference ordering to states of the world. At best, you could say that it prefers state [I SEE BLUE AND I SHOOT] to state [I SEE BLUE AND I DON'T SHOOT]. But that's all.
This has some implications for AI, I think. First of all, not every computer program has a goal or a utility function. There is no danger that your TurboTax software will take over the world and destroy all human life, because it doesn't have a general goal to maximize the number of completed tax forms. Even rather sophisticated algorithms can completely lack goals of this kind -- they aren't designed to maximize some variable over all possible states of the universe. It seems that unfriendly AI is only a risk if the AI has a true goal function, and many useful advances in artificial intelligence (defined in the broad sense) carry no risk of this kind.
Do humans have goals? I don't know; it...
I'll be interested to see where you go with this, but it seems to me that saying, "look, this is the program the robot runs, therefore it doesn't really have a goal", is exactly like saying "look, it's made of atoms, therefore it doesn't really have a goal".
Goals are precisely explained (like rainbows), and not explained away (like kobolds), as the controlled variables of control systems. This robot is such a system. The hypothetical goal of its designers at the Department of Homeland Security is also a goal. That does not make the robot's goal not a goal; it just makes it a different goal.
We feel like we have goals and preferences because we do, in fact, have goals and preferences, and we not only have them, but we are also aware of having them. The robot is not aware of having the goal that it has. It merely has it.
First of all, your control theory work was...not exactly what started me thinking along these lines, but what made it click when I realized the lines I had been thinking along were similar to the ones I had read about in one of your introductory posts about performing complex behaviors without representations. So thank you.
Second - When you say the robot has a "different goal", I'm not sure what you mean. What is the robot's goal? To follow the program detailed in the first paragraph?
Let's say Robot-1 genuinely has the goal to kill terrorists. If a hacker were to try to change its programming to "make automobiles" instead, Robot-1 would do anything it could to thwart the hacker; its goal is to kill terrorists, and letting a hacker change its goal would mean more terrorists get left alive. This sort of stability, in which the preference remains a preference regardless of context, is characteristic of my definition of "goal".
This "blue-minimizing robot" won't display that kind of behavior. It doesn't thwart the person who places a color inversion lens on it (even though that thwarts its stated goal of "minimizing blue"), and it wouldn...
When you say the robot has a "different goal", I'm not sure what you mean. What is the robot's goal? To follow the program detailed in the first paragraph?
The robot's goal is not to follow its own program. The program is simply what the robot does. In the environment it is designed to operate in, what it does is destroy blue objects. In the vocabulary of control theory, the controlled variable is the number of blue objects, the reference value is zero, the difference between the two is the error, firing the laser is the action it takes when the error is positive, and the action has the effect of reducing the error. The goal, as with any control system, is to keep the error at zero. It does not have an additional goal of being the best destroyer of blue objects possible. Its designers might have that goal, but if so, that goal is in the designers, not in the system they have designed.
In an environment containing blue objects invulnerable to laser fire, the robot will fail to control the number of blue objects at zero. That does not make it not a control system, just a control system encountering disturbances it is unable to control. To ask whether it is still a control ...
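Spelled out as code, that control-theory description would look something like the minimal sketch below. This is my own illustration, not the robot's actual program; perceive_blue_count and fire_laser are placeholder names for whatever sensor and effector the system has.

```python
REFERENCE = 0  # reference value: zero blue objects

def control_step(environment, perceive_blue_count, fire_laser):
    """One iteration of the control loop described above."""
    perception = perceive_blue_count(environment)  # controlled variable
    error = perception - REFERENCE                 # error signal
    if error > 0:
        fire_laser(environment)  # the action that normally reduces the error
    return error

# Toy usage: the "environment" is just a list of object colors.
env = ["blue", "red", "blue"]
while control_step(env, lambda e: e.count("blue"), lambda e: e.remove("blue")) > 0:
    pass
print(env)  # ['red'] -- the error has been driven to zero
```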
Now I'm very confused. I understand that you think humans are PCT systems and that you have some justifications for that. But unlike humans, we know exactly what motivates this robot (the program in the first paragraph) and it doesn't contain a controlled variable corresponding to the number of blue objects, or anything else that sounds PCT.
So are you saying that any program can be modeled by PCT better than by looking at the program itself, or that although this particular robot isn't PCT, a hypothetical robot that was more reflective of real human behavior would be?
As for goals, if I understand your definition correctly, even a behaviorist system could be said to have goals (if you reinforce it every time it pulls the lever, then its new goal will be to pull a lever). If that's your definition, I agree that this robot has goals, and I would rephrase my thesis as being that those goals are not context-independent and reflective.
I agree with you that behaviorism and PCT are different, which is why I don't understand why you're interpreting the robot as PCT and not behaviorist. From the program, it looks pretty clearly like (STIMULUS: see blue -> RESPONSE: fire laser) to me.
Well, your robot example was an intuition pump constructed so as to be as close as possible to stimulus-response nature. If you consider something only slightly more complicated the distinction may become clearer: a room thermostat. Physically ripped out of its context, you can see it as a stimulus-response device. Temperature at sensor goes above threshold --> close a switch, temperature falls below threshold --> open the switch. You can set the temperature of the sensor to anything you like, and observe the resulting behaviour of the switch. Pure S-R.
In context, though, the thermostat has the effect of keeping the room temperature constant. You can no longer set the temperature of the sensor to anything you like. Put a candle near it, and the temperature of the rest of the room will fall while the sensor remains at a constant temperature. Use a strong enough heat source or cold source, and you will be able to overwhelm the cont...
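A toy simulation (my own illustration, not from the comment) of this point: the same stimulus-response rule, once its output feeds back through the room, holds the sensed temperature near the setpoint. Here the switch drives a heater, so the rule is "below setpoint, close the switch"; the numbers are made up.

```python
SETPOINT = 20.0  # degrees C

def thermostat(sensor_temp):
    """Pure S-R rule: heater on below the setpoint, off above it."""
    return sensor_temp < SETPOINT

def simulate(steps=200, outside=5.0, room=12.0):
    for _ in range(steps):
        heater_on = thermostat(room)
        # crude room physics: heat input from the heater, heat loss to outside
        room += (1.0 if heater_on else 0.0) - 0.05 * (room - outside)
    return room

print(simulate())  # ends up hovering near SETPOINT despite the cold outside
```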
Okay, we agree that the simple robot described here is behaviorist and the thermostat is PCT. And I certainly see where you're coming from with the rats being PCT because hunger only works as a motivator if you're hungry. But I do have a few questions:
There are some things behaviorism can explain pretty well that I don't know how to model in PCT. For example, consider heroin addiction. An animal can go its whole life not wanting heroin until it's exposed to some. Then suddenly heroin becomes extraordinarily motivating and it will preferentially choose shots of heroin to food, water, or almost anything else. What is the PCT explanation of that?
I'm not entirely sure which correlation studies you're talking about here; most psych studies I read are done in an RCT-type design and so use p-values rather than r-values; they can easily end up with p < .001 if they get a large sample and a good hypothesis. Some social psych studies work off of correlations (e.g. the correlation between observer-rated attractiveness and observer-rated competence at a skill); correlations are "lamentably low" in social psychology because high level processes (like opinion formation, social i
My interpretation of this interaction (which is fascinating to read, btw, because both of you are eloquently defending a cogent and interesting theory as far as I can tell) is that you've indirectly proposed Robot-1 as the initial model of an agent (which is clearly not a full model of a person and fails to capture many features of humans) in the first of a series of articles. I think Richard is objecting to the connections he presumes that you will eventually draw between Robot-1 and actual humans, and you're getting confused because you're just trying to talk about the thing you actually said, not the eventual conclusions he expects you to draw from your example.
If he's expecting you to verbally zig when you're actually planning to zag and you don't notice that he's trying to head you off at a pass you're not even heading towards, it's entirely reasonable for you to be confused by what he's saying. (And if some of the audience also thinks you're going to zig they'll also see the theory he's arguing against, and see that his arguments against "your predicted eventual conclusions" are valid, and upvote his criticism of something you haven't yet said. And both of you are...
I suspect Richard would say that the robot's goal is minimizing its perception of blue. That's the PCT perspective on the behavior of biological systems in such scenarios.
This 'minimization' goal would require a brain that is powerful enough to believe that lasers destroy or discolor what they hit.
If this post were read by blue aliens that thrive on laser energy, they'd wonder why we were so confused as to the purpose of an automatic baby feeder.
This robot is not a consequentialist - it doesn't have a model of the world which allows it to extrapolate (models of) outcomes that follow causally from its choices. It doesn't seem to steer the universe any particular place, across changes of context, because it explicitly doesn't contain a future-steering engine.
What exactly is meant by the robot having a human-level intelligence? Does it have two non-interacting programs: shoot blue and think?
Ah, excellent. This post comes at a great time. A few weeks ago, I talked with someone who remarked that although decision theory speaks in terms of preferences and information being separate, trying to apply that to humans is fitting the data to the theory. He was of the opinion that humans don't really have preferences in the decision-theoretic sense of the word. Pondering that claim, I came to the conclusion that he's right, and have started to increasingly suspect that CEV-like plans to figure out the "ultimate" preferences of people are somewhat misguided. Our preferences are probably hopelessly path-, situation- and information-dependent. Which is not to say that CEV would be entirely pointless - even if the vast majority of our "preferences" would never converge, there might be some that did. And of course, CEV would still be worth trying, just to make sure I'm not horribly mistaken on this.
The ease with which I accepted the claim "humans don't have preferences" makes me suspect that I've myself had a subconscious intuition to that effect for a long time, which was probably partially responsible for an unresolved disagreement between me and Vladimir Nesov earlier.
I'll be curious to hear what you have to say.
...CEV-like plans to figure out the "ultimate" preferences of people are somewhat misguided. Our preferences are probably hopelessly path-, situation- and information-dependent.
This is off-topic but since you mentioned it and since I don't think it warrants a new post, here are my latest thoughts on CEV (a convergence of some of my recent comments originally posted as a response to a post by Michael Anissimov):
Consider the difference between a hunter-gatherer, who cares about his hunting success and to become the new clan chief, and a member of lesswrong who wants to determine if a “sufficiently large randomized Conway board could turn out to converge to a barren ‘all off’ state.”
The utility of success in hunting down animals, or of proving abstract conjectures about cellular automata, is largely determined by factors such as your education, culture and environmental circumstances. The same hunter-gatherer who cared to kill a lot of animals, to get the best ladies in his clan, might have under different circumstances turned out to be a vegetarian mathematician solely caring about his understanding of the nature of reality. Both sets of values are to some extent mutua...
... the mistake began as soon as we started calling it a "blue-minimizing robot".
Agreed. But what kind of mistake was that?
Is "This robot is a blue-minimizer" a false statement? I think not. I would classify it as more like the unfortunate selection of the wrong Kuhnian paradigm for explaining the robot's behavior. A pragmatic mistake. A mistake which does not bode well for discovering the truth, but not a mistake which involves starting from objectively false beliefs.
Why does the human-level intelligence component of the robot care about blue? It seems to me that it is mistaken in doing so. If my motor cortex was replaced by this robot's program, I would not conclude that I had suddenly started to only care about blue, I would conclude that I had lost control of my motor cortex. I don't see how it makes any difference that the robot always had its actions controlled by the blue-minimizing program. If I were the robot, then upon being informed about my design, I would conclude that I did not really care about blue. My human-level intelligence is the part that is me and therefore contains my preferences, not my motor cortex.
If my motor cortex was replaced by this robot's program, I would not conclude that I had suddenly started to only care about blue, I would conclude that I had lost control of my motor cortex.
I predict this would not happen the way you anticipate, at least for some ways to cash out 'taking control of your motor cortex'. For example, when a neurosurgeon uses a probe to stimulate a part of the motor cortex responsible for moving the arm, and eir patient's arm moves, and the neurosurgeon asks the patient why ey moved eir arm, the patient often replies something like "I had an itch", "it was uncomfortable in that position", or "What, I'm not allowed to move my arm now without getting grilled on it?"
Or in certain forms of motor cortex damage in which patients can't move their arm, they explain it by saying "I could move my arm right now, I just don't feel like it" or "That's not even my real arm, how could you expect me to move that?".
Although I won't get there for a while, part of my thesis for this sequence is that we infer our opinions from our behaviors, although it's probably more accurate to say that our behaviors feed back to the same processes that generate our opinions and can alter them. If this is true, then there are probably very subtle ways of taking control of your motor cortex that would leave your speech centers making justifications for whatever you did.
I'd be very surprised if this worked on me for more than, say, a day. Even if the intuition that I'm the one in control doesn't go away, I expect to eventually notice that it's actually false and consciously choose to not take it into account, at least in verbal reasoning. Has it been tried (on someone more qualified than a random patient)? If it doesn't work, the effect should be seen as rather more horrible than just overriding one's limb movement.
Without overfitting, the robot has the goal of shooting at what it sees as blue. It achieves its goal. What I get from the article is that the human intelligence misinterprets the goal. Here I take the definition of a goal to equal what the program is written to do, hence it seems inevitable that the robot will achieve its goal (if there is a bug in the code that misses shooting a blue object every 10 days, then this should be considered part of the goal as well, since we are forced to define the goal in hindsight, if we have to define one).
Does that "human level intelligence module" have any ability to actually control the robot's actions, or just to passively observe and then ask "why did I do that?" What're the rules of the game, as such, here?
this is all a metaphor for people
One that presents consciousness as an epiphenomenon. In the version of the robot that has human intelligence, you describe it as bolted on, experiencing the robot's actions but having no causal influence on them, an impotent spectator.
Are your projected postings going to justify this hypothesis?
One of the obvious extensions of this thought experiment is to posit a laser-powered blue goo that absorbs laser energy, and uses it to grow larger.
This thought experiment also reminds me: Omohundro's arguments regarding likely uFAI behavior are based on the AI having goals of some sort - that is, something we would recognize as goals. It's entirely possible that we wouldn't perceive it as having goals at all, merely behavior.
Is there a sense in which we can conclude that the robot is a blue-minimizing robot, in which we can't also conclude that it's an object-minimizing robot that happens to be optimized for situations where most of the objects are blue or most of the backgrounds are non-blue? (Perhaps it's one of a set of robots, or perhaps the ability to change its filter is an intentional feature.)
A couple of points here. First, as other people seem to have indicated, there does seem to be a problem with saying 'the robot has human level intelligence/self-reflective insight' and simultaneously that it carries out its programming with regards to firing lasers at percepts which appear blue unreflectively, in so far as it would seem that the former would entail that the latter would /not/ be done unreflectively. What you have here are two separate and largely unintegrated cognitive systems, one of which ascribes functional-intentional properties to thin...
Shouldn't the human intelligence part be considered part of the source code? With its own goals/value functions? Otherwise it will be just a "human watching the robot" kind of thing.
I wonder how many people upvoted this post less for the ideas expressed and more because they like robots.
I think I upvoted it for the ideas, but can't honestly guarantee that the "oooh, shiny cool robot analogy appeals to my geeky heart" factor wouldn't have made me upvote even if I found the ideas uninteresting.
I wonder how many people upvoted this post less for the ideas expressed and more because they like robots.
I wonder how many people upvoted this post less for the ideas expressed and more because they like Yvain.
I like how Yvain creates clarity. This takes a lot of effort. I'd like to encourage his effort.
I haven't thought much about why it feels like we have goals (desires), so I look forward to that! I do think it's quite possible that eliminativism about 'beliefs' and 'desires' will turn out to be the best way to go. Certainly, the language of 'reinforcers' fits well with our understanding of the reward-learning system in the brain.
I consider all of the behaviors you describe as basically transform functions. In fact, I consider any decision maker a type of transform function where you have input data that is run through a transform function (such as a behavior-executor, utility-maximizer, weighted goal system, a human mind, etc.) and output data is generated (and in the case of humans sent to our muscles, organs, etc.). The reason I mention this is that trying to describe a human's transform function (i.e., what people normally call their mind) as mostly a behavior-executor or jus...
Well-written and insightful. This post reminds me very much of the free will sequence, except that the reduction is on the level of genes and the adaptations they code for rather than physical laws. I look forward to seeing the rest of the sequence, and I'm interested to see how you will dissolve the feeling that our behavior and actions are goal-oriented.
The entire example is deeply misleading. We model the robot as a fairly stupid blue minimizer because this seems to be a good succinct description of the robot's entire externally observable behavior, and would cease to do so if it also had a speaker or display window with which it communicated its internal reflections.
So to retain the intuitive appeal of describing the robot as a blue minimizer, the robot's human-level intelligence must be walled off inside the robot, unable to effectively signal to the outside world. But so long as the human level intellig...
EDIT: It just clicked after finishing my thought.
If its human handlers (or itself) want to interpret that as goal directed behavior, well, that's their problem.
I was thrown off by all the comments about the robot and its behavior. This is more about the comparison of behavior-executor vs. utility-maximizer, not the robot.
Perhaps I am missing the final direction of this conversation, but I think the intelligence involved in the example has mapped the terrain and is failing to update the map once it has been seen to be incorrect.
...Watching the robot
The human-level intelligence version of the robot will notice its vision has been inverted. It will know it is shooting yellow objects. It will know it is failing at its original goal of blue-minimization.
I don't see that, why would it "care" if its goal isn't complex enough to allow it to care about the subversion of its sensors? I mean, the level of intelligence seems irrelevant here. Intelligence isn't even instrumental to such simple goals because all it "wants" is to fire a laser at blue objects. Its utility function says nothing about maximizing its efficiency or anything like that.
Take three common "broad" or "generally-categorizable" demographics of minds: Autistic people, Empaths (lots of mirror neurons dedicated to modeling the behavior of others), Sociopaths, or "Professional Psychopaths" (high-functioning without mirror neurons, responsible for most systemic destruction, precisely because they can appear to be "highly functional and productive well-respected citizens"), Psychopaths (low-functioning without mirror neurons, most commonly an "obvious problem"). All of the prior hum...
This article got me thinking about a few things. Centrally, there is no necessary condition that things which really do have goal-directed behaviour must be maximising anything. Revealed preference theory in economics effectively identifies the choices people make with their preference. Thus humans are seen as maximisers of their preferences in an almost trivial sense, if preferences are also assumed to fall under a ranking (another thing economists assume). Yet this goal-oriented explanation in terms of rationality could be wrong for two reasons: firstly, the choices cou...
Also, the assumption that the laser is for destruction may be false. What if the laser tags or scans blue objects? Maybe for a space probe, or a scan on enemy robots? What if the robot makes the assumption that the laser is for destruction when it's not, or vice versa? Does it really matter, when it is programmed to do it anyway? What if the AI is destroying, or thinks it's destroying, fellow robots? What if it does not want to? Maybe this is an example of religion from assumption? Because it is 'destroying' blue things, it assumes that is its purpose. Beh...
the robot continuously analyzes the average RGB value of the pixels in the camera image; if the blue component passes a certain threshold, the robot stops, fires its laser at the part of the world corresponding to the blue area
I note that if the robot only looks at the blue RGB component, then it will end up firing its laser not just at blue things (low R, low G, high B), but also white things (high R, high G, high B), fuchsia things (high R, low G, high B), teal things (low R, high G, high B), and variants of said colors. "Blue-minimizing" is not even a correct description!
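A quick check of this point, assuming the trigger condition is simply "blue channel above some threshold" (the threshold value here is made up):

```python
THRESHOLD = 200  # hypothetical 8-bit threshold

def would_fire(rgb):
    _, _, b = rgb
    return b > THRESHOLD

colors = {"blue": (0, 0, 255), "white": (255, 255, 255),
          "fuchsia": (255, 0, 255), "teal": (0, 255, 255),
          "red": (255, 0, 0), "green": (0, 255, 0)}
print([name for name, rgb in colors.items() if would_fire(rgb)])
# ['blue', 'white', 'fuchsia', 'teal'] -- anything with a high blue component
```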
The robot is a behavior-executor, not a utility-maximizer.
Or it only gets disutility from seeing blue objects :P
It seems like we need to taboo the word "goal" and replace it with several different things. 1. Actions. 2. Conscious (verbal) intentions (past intentions may conflict with present intentions). 3. "unconscious intentions" (this should probably be tabooed, but I haven't figured out how best to do it. Perhaps an unconscious intention is something which we do, but we don't know why or we confabulate why.).
It seems that Anna Salamon's and Hanson's interpretations could simply be viewed as changing verbal intentions and the question of how to gain goal (intention) stability.
A lot of this is discussed at length in the book The Ecological Approach to Visual Perception by Gibson. He put out a pretty radical idea that all cognition is goal driven, and perception is the action of converting sensory energy into scored models (scores are computed as anticipated experience of goal-fulfillment). Fundamentally, all thinking is planning. There's quite a bit of literature coming out on this in the fields of online learning theory and actionable information theory, in particular by Soatto at UCLA.
Curious that you worded it: "...we would be tempted to call this robot a blue-minimizer." Then you said the robot "wants". The entire discussion which followed is invalid, because everyone involved ascribed human characteristics to an electronic mechanical device. Robots do not have "goals" either, nor self-motivation. It has no concept of reducing blue objects. Your "temptation" was your sensed instinct that you were stepping over the line of limitations. Even the origin of the word "robot", meaning "forced labor", is inaccurate.
I think all of these explanations hold part of the puzzle, but that the most fundamental explanation is that the mistake began as soon as we started calling it a "blue-minimizing robot".
Your post reminded me of a quote by Eliezer Yudkowsky:
...An electron is not a billiard ball, and it’s not a crest and trough moving through a pool of water. An electron is a mathematically different sort of entity, all the time and under all circumstances, and it has to be accepted on its own terms.
The universe is not wavering between using particles and waves,
What is the difference between a smart 'shoot lasers at "blue" things' robot and a really dumb 'minimize blue' robot with a laser?
Imagine a robot with a turret-mounted camera and laser. Each moment, it is programmed to move forward a certain distance and perform a sweep with its camera. As it sweeps, the robot continuously analyzes the average RGB value of the pixels in the camera image; if the blue component passes a certain threshold, the robot stops, fires its laser at the part of the world corresponding to the blue area in the camera image, and then continues on its way.
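As a rough illustration only (my paraphrase, not the post's actual code), the per-sweep logic might be rendered as something like the following, with the camera sweep represented as (region, pixels) pairs and the threshold value made up:

```python
BLUE_THRESHOLD = 200  # hypothetical 8-bit threshold

def average_blue(pixels):
    """Average blue component of a list of (R, G, B) pixels."""
    return sum(b for _, _, b in pixels) / len(pixels)

def regions_to_zap(sweep):
    """Every camera region whose average blue component passes the threshold."""
    return [region for region, pixels in sweep
            if average_blue(pixels) > BLUE_THRESHOLD]

# Example frame: a blue patch and a gray wall.
sweep = [("patch", [(10, 20, 240), (5, 15, 250)]),
         ("wall",  [(120, 120, 120), (118, 122, 119)])]
print(regions_to_zap(sweep))  # ['patch']
```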
Watching the robot's behavior, we would conclude that this is a robot that destroys blue objects. Maybe it is a surgical robot that destroys cancer cells marked by a blue dye; maybe it was built by the Department of Homeland Security to fight a group of terrorists who wear blue uniforms. Whatever. The point is that we would analyze this robot in terms of its goals, and in those terms we would be tempted to call this robot a blue-minimizer: a machine that exists solely to reduce the amount of blue objects in the world.
Suppose the robot had human level intelligence in some side module, but no access to its own source code; that it could learn about itself only through observing its own actions. The robot might come to the same conclusions we did: that it is a blue-minimizer, set upon a holy quest to rid the world of the scourge of blue objects.
But now stick the robot in a room with a hologram projector. The hologram projector (which is itself gray) projects a hologram of a blue object five meters in front of it. The robot's camera detects the projector, but its RGB value is harmless and the robot does not fire. Then the robot's camera detects the blue hologram and zaps it. We arrange for the robot to enter this room several times, and each time it ignores the projector and zaps the hologram, without effect.
Here the robot is failing at its goal of being a blue-minimizer. The right way to reduce the amount of blue in the universe is to destroy the projector; instead its beams flit harmlessly through the hologram.
Again, give the robot human level intelligence. Teach it exactly what a hologram projector is and how it works. Now what happens? Exactly the same thing - the robot executes its code, which says to scan the room until its camera registers blue, then shoot its laser.
In fact, there are many ways to subvert this robot. What if we put a lens over its camera which inverts the image, so that white appears as black, red as green, blue as yellow, and so on? The robot will not shoot us with its laser to prevent such a violation (unless we happen to be wearing blue clothes when we approach) - its entire program was detailed in the first paragraph, and there's nothing about resisting lens alterations. Nor will the robot correct itself and shoot only at objects that appear yellow - its entire program was detailed in the first paragraph, and there's nothing about correcting its program for new lenses. The robot will continue to zap objects that register a blue RGB value; but now it'll be shooting at anything that is yellow.
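Continuing the earlier sketch (again an illustration with made-up numbers, not the post's code): under an inverting lens the same threshold rule now fires at yellow, because inverted yellow registers as blue, while inverted blue does not.

```python
def invert(rgb):
    """What the lens does to each pixel."""
    r, g, b = rgb
    return (255 - r, 255 - g, 255 - b)

BLUE_THRESHOLD = 200  # hypothetical 8-bit threshold

def would_fire(rgb):
    return rgb[2] > BLUE_THRESHOLD

yellow, blue = (255, 255, 0), (0, 0, 255)
print(would_fire(invert(yellow)))  # True  -- yellow objects now get zapped
print(would_fire(invert(blue)))    # False -- actual blue objects are ignored
```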
The human-level intelligence version of the robot will notice its vision has been inverted. It will know it is shooting yellow objects. It will know it is failing at its original goal of blue-minimization. And maybe if it had previously decided it was on a holy quest to rid the world of blue, it will be deeply horrified and ashamed of its actions. It will wonder why it has suddenly started to deviate from this quest, and why it just can't work up the will to destroy blue objects anymore.
The robot goes to Quirinus Quirrell, who explains that robots don't really care about minimizing the color blue. They only care about status and power, and pretend to care about minimizing blue in order to impress potential allies.
The robot goes to Robin Hanson, who explains that there are really multiple agents within the robot. One of them wants to minimize the color blue, the other wants to minimize the color yellow. Maybe the two of them can make peace, and agree to minimize yellow one day and blue the next?
The robot goes to Anna Salamon, who explains that robots are not automatically strategic, and that if it wants to achieve its goal it will have to learn special techniques to keep focus on it.
I think all of these explanations hold part of the puzzle, but that the most fundamental explanation is that the mistake began as soon as we started calling it a "blue-minimizing robot". This is not because its utility function doesn't exactly correspond to blue-minimization: even if we try to assign it a ponderous function like "minimize the color represented as blue within your current visual system, except in the case of holograms" it will be a case of overfitting a curve. The robot is not maximizing or minimizing anything. It does exactly what it says in its program: find something that appears blue and shoot it with a laser. If its human handlers (or itself) want to interpret that as goal directed behavior, well, that's their problem.
It may be that the robot was created to achieve a specific goal. It may be that the Department of Homeland Security programmed it to attack blue-uniformed terrorists who had no access to hologram projectors or inversion lenses. But to assign the goal of "blue minimization" to the robot is a confusion of levels: this was a goal of the Department of Homeland Security, which became a lost purpose as soon as it was represented in the form of code.
The robot is a behavior-executor, not a utility-maximizer.
In the rest of this sequence, I want to expand upon this idea. I'll start by discussing some of the foundations of behaviorism, one of the earliest theories to treat people as behavior-executors. I'll go into some of the implications for the "easy problem" of consciousness and philosophy of mind. I'll very briefly discuss the philosophical debate around eliminativism and a few eliminativist schools. Then I'll go into why we feel like we have goals and preferences and what to do about them.