Summary: Recent posts like "The Neuroscience of Desire" and "To what degree do we have goals?" have explored the question of whether humans have desires (or 'goals'). If we don't have desires, how can we tell an AI what kind of world we 'want'? Recent work in economics and neuroscience has clarified the nature of this problem.
We begin, as is so often the case on Less Wrong, with Kahneman & Tversky.
In 1981, K&T found that human choice was guided not only by the objective value of possible outcomes, but also by the way those outcomes were 'framed'.1 For example, in one study, K&T told subjects the following story:
Imagine that the U.S. is preparing for the outbreak of an unusual Asian disease, which is expected to kill 600 people. Two alternative programs to combat the disease have been proposed.
Half the participants were given the following choice:
If Program A is adopted, 200 people will be saved. If Program B is adopted, there is a 1/3 probability that 600 people will be saved and a 2/3 probability that no people will be saved.
The second half of participants were given a different choice:
If Program C is adopted, 400 people will die. If Program D is adopted, there is a 1/3 probability that nobody will die and a 2/3 probability that 600 people will die.
The two choice sets are identical in their outcomes. They differ only in that one is framed with language about people being saved, and the other with language about people dying.
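To check that the two framings describe the same outcomes, here is a minimal Python sketch. The dictionary layout and the `expected_survivors` helper are illustrative names of my own, not anything from K&T's paper. Each program is written as a list of (probability, survivors) pairs, with deaths converted to survivors out of the 600 people at risk.

```python
from fractions import Fraction

TOTAL = 600  # people expected to die if nothing is done

# Each program is a list of (probability, number of survivors) pairs.
# Programs C and D are stated in deaths, so survivors = TOTAL - deaths.
programs = {
    "A": [(Fraction(1), 200)],
    "B": [(Fraction(1, 3), 600), (Fraction(2, 3), 0)],
    "C": [(Fraction(1), TOTAL - 400)],
    "D": [(Fraction(1, 3), TOTAL - 0), (Fraction(2, 3), TOTAL - 600)],
}

def expected_survivors(outcomes):
    """Probability-weighted number of survivors for one program."""
    return sum(p * n for p, n in outcomes)

for name, outcomes in programs.items():
    print(name, expected_survivors(outcomes))

# The framings differ, but the outcome distributions do not.
print(programs["A"] == programs["C"], programs["B"] == programs["D"])
```

Under this encoding all four programs have an expected 200 survivors, and A/C and B/D turn out to be the very same outcome distributions. Only the wording distinguishes them.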
In the first group, 72% of subjects chose Program A. In the second group, only 22% of people chose the numerically identical option: Program C.
K&T explained the difference by noting that in option A we consider the happy thought of saving 200 people, but in option C we confront the dreadful thought of 400 deaths. Our choice seems to depend not only on the objective properties of the options before us, but also on the reference point used to frame the options.
But if this is how human desire works, we are left with a worrying problem about how to translate human desires into the goals of an AI. Surely we don't want an AI to realize one state of affairs over another based merely on how the options are framed!
Before we begin to solve this problem, though, let's look at a similar result from neurobiology.
Reference-Dependence in Neurobiology
A different kind of reference-dependence has been discovered in the way that neurons encode value.
Imagine sitting in a windowless room with Mark, who is wearing blue jeans and a green t-shirt. Your perception of Mark results from about 10^17 photons/second with a mean wavelength of 450 nanometers coming from every square centimeter of Mark's blue jeans, and about 10^17 photons/second with a mean wavelength of 550 nanometers coming from every square centimeter of his green shirt.
Now, you and Mark step outside, and are briefly blinded by the sun. A minute later you sit on a park bench. Mark looks the same as before: blue jeans, green shirt. But now, in the bright sun, your identical subjective perceptual experience of Mark results from about 10^23 450-nm photons/second/cm^2 coming from his blue jeans, and about 10^23 550-nm photons/second/cm^2 coming off his green shirt.
A six-order-of-magnitude shift in the objective reality of the stimulus has resulted in no change in your subjective experience of Mark.2
How did this happen?
What changed was the illuminant, the sun. But for Earth-bound mammals, changes in an object millions of miles away are not very important. What matters for our survival and reproduction is information about the objects immediately around us. So our brains subtract away the changing effects of the sun as we move in and out of direct sunlight.
This 'illuminant subtraction' process occurs during the first step of visual processing, during transduction. The rods and cones of the retina compute an average of local light intensity, which is used as a reference point.3 Changes of light intensity from this reference point are what the rods and cones communicate to the rest of the nervous system.
Thus: information about the objective intensity of incoming light is irretrievably lost at the transducer. Light intensity is stored in the brain only in a reference-dependent way.
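A toy Python sketch can make the loss of information concrete. It is not a model of real photoreceptors: the `encode` function, the choice to divide by the local mean (rather than, say, subtract it), and the reflectance numbers are all assumptions of mine for illustration. Only the 10^17 and 10^23 illumination scales come from the example above.

```python
import math

def encode(intensities):
    """Report each intensity relative to the local mean (the reference point)."""
    reference = sum(intensities) / len(intensities)
    return [i / reference for i in intensities]

# Hypothetical relative reflectances for three patches of a scene.
reflectance = [0.2, 0.5, 0.8]

# The same scene under indoor light and under direct sunlight.
indoors = [r * 1e17 for r in reflectance]
sunlight = [r * 1e23 for r in reflectance]

enc_in = encode(indoors)
enc_out = encode(sunlight)

# The encoded signals agree (up to floating-point rounding) even though
# the raw intensities differ by six orders of magnitude.
same = all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(enc_in, enc_out))
print(same)
```

Because both lighting conditions produce the same encoded values, nothing downstream of `encode` could tell whether the scene was lit at 10^17 or 10^23 photons/second/cm^2. In this sketch, as in the text, the absolute intensity is gone once the reference point has been factored out.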
The same is true of our other senses. Sound intensity can differ between a quiet room and a rock concert by as much as 10 orders of magnitude,4 and our ears respond by shifting the reference point and encoding sound intensity relative to that reference point.5 A rose may smell sweet in my bedroom, but its scent will be hidden in a field of roses.6 The somatosensory system appears to operate on the same principle. You feel your clothes when you first put them on, but the nerve endings in your skin stop reporting their existence except where your clothes are shifting across your skin or their pressure on your skin is changing.7 And the same is true for taste. How salty something tastes, for example, depends on the amount of sodium in your blood and in the surrounding tissue of your mouth.8
I wrote before about how neurons encode value. But now it seems that, as neuroscientist Paul Glimcher puts it:
All sensory encoding is reference dependent: nowhere in the nervous system are the objective values of consumable rewards encoded.9
Thus we smack headlong into another constraint for our theories about human values and their extrapolation. Human brains can't (directly) encode value for the objective intensities of stimuli because that information is lost at the transducer.
It's beginning to seem that our folk theories about humans 'wanting' things in the world were naive.
Do Humans Want Things?
It has traditionally been thought that humans desire (or value) states of affairs:
A desire for tea is a desire for a certain state of affairs one has in mind: that one drink some tea. A desire for a new pair of skates is likewise a desire for another state of affairs: that one own a new pair of skates. And so on.10
Intuitively, when we think about what we want, it seems that we want certain states of affairs to obtain. We want to be smarter. We want there to be world peace. We want to live forever while having fun.
But as far as we can tell, our behavior is often not determined by our wanting a particular state of affairs, but by how our options are framed.
Moreover, neurons in the parietal and orbitofrontal cortices encode value in a reference-dependent way; that is, they do not encode value for objective states of affairs.11 So in what sense do humans 'want' objective states of affairs?
(Compare: In what sense does the blue-minimizing robot 'want' anything?)
In a later post, I'll explain in greater detail how brains do (and don't) encode value for states of affairs. In the meantime, you might want to try to figure out on your own in what sense the brain might want things.
Notes
1 Tversky & Kahneman (1981).
2 This example, and the outline of this post, is taken from Glimcher (2010), ch. 12.
3 Burns & Baylor (2001).
4 Baccus (2006); Robinson & McAlpine (2009).
5 Squire et al. (2008), ch. 26.
6 Mountcastle (2005); Squire et al. (2008), pp. 565-567.
7 Squire et al. (2008), ch. 25.
8 Squire et al. (2008), pp. 555-556.
9 Glimcher (2010), p. 278. Moreover, objective properties of the real world are not even linearly related to our subjective experience. The intensity of our perception of the world grows as a power law, the exact rate of which depends on the kind of stimulus (Stevens 1951, 1970, 1975). For example, we've found that:
Perceived warmth of a patch of skin = (temp. of that skin)^0.7
And, another example:
Perceived intensity of an electrical shock = (electrical current)^3.5
10 Schroeder (2009).
11 It's less certain how values are encoded in the medial prefrontal cortex and in the temporal cortex, but Paul Glimcher predicts (in personal communication with me from June 2011) that this will also be a largely reference-dependent process.
References
Baccus (2006). From a whisper to a roar: Adaptation to the mean and variance of naturalistic sounds. Neuron, 51: 682-684.
Burns & Baylor (2001). Activation, deactivation, and adaptation in vertebrate photoreceptor cells. Annual Review of Neuroscience, 24: 779-805.
Glimcher (2010). Foundations of Neuroeconomic Analysis. Oxford University Press.
Mountcastle (2005). The Sensory Hand: Neural Mechanisms of Somatic Sensation. Harvard University Press.
Robinson & McAlpine (2009). Gain control mechanisms in the auditory pathway. Current Opinion in Neurobiology, 19: 402-407.
Schroeder (2009). Desire. Stanford Encyclopedia of Philosophy.
Squire, Berg, Bloom, du Lac, & Ghosh, eds. (2008). Fundamental Neuroscience, Third Edition. Academic Press.
Stevens (1951). Handbook of Experimental Psychology, 1st edition. John Wiley & Sons.
Stevens (1970). Neural events and the psychophysical law. Science, 170: 1043-1050.
Stevens (1975). Psychophysics: Introduction to its Perceptual, Neural and Social Prospects. Wiley.
Tversky & Kahneman (1981). The framing of decisions and the psychology of choice. Science, 211: 453-458.
I guess the objection I have is to calling the behavioral summary "motivation", a term that has normative connotations (similarly, "value", "desire", "wants", etc.). Asking "Do we really want X?" (as in, does a positive account of some notion of "wanting" say that we "want" X, to the best of our scientific knowledge) sounds too similar to asking "Should we pursue X?" or even "Can we pursue X?", but is a largely unrelated question with similarly unrelated answers.
I'm using these terms the way they are standardly used in the literature. If you object to the common usage, perhaps you could just read my articles with the assumption that I'm using these words the way neuroscientists and psychologists do, and then state your concerns about the standard language in the comments? I can't rewrite my articles for each reader who has their own peculiar language preferences...