Suppose it turned out that humans violate the axioms of VNM rationality (and therefore don't act as though they have utility functions) because there are three valuation systems in the brain that make conflicting valuations, and all three systems contribute to choice. And suppose that upon reflection we would clearly reject the outputs of two of these systems, whereas the third looks more like a utility function we might be able to use in CEV.
What I just described is part of the leading theory of choice in the human brain.
Recall that human choices are made when certain populations of neurons encode expected subjective value (in their firing rates) for each option in the choice set, with the final choice being made by an argmax or reservation price mechanism.
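To make that choice rule concrete, here is a minimal sketch, my own illustration rather than anything from the cited work, of how a final choice circuit might read out noisy value signals: either by taking an argmax over the options, or by accepting the first option whose value clears a reservation price. The option names, values, and noise level are invented for the example.

```python
import random

def choose_argmax(values, noise=0.1):
    """Pick the option whose noisily encoded value is highest (argmax readout)."""
    noisy = {opt: v + random.gauss(0, noise) for opt, v in values.items()}
    return max(noisy, key=noisy.get)

def choose_reservation(values, threshold, order):
    """Accept the first option (in presentation order) whose value clears the reservation price."""
    for opt in order:
        if values[opt] >= threshold:
            return opt
    return None  # no option is acceptable

# Hypothetical expected subjective values for three options.
values = {"apple": 0.7, "candy": 0.9, "kale": 0.3}
print(choose_argmax(values))                                          # usually "candy"
print(choose_reservation(values, 0.8, ["apple", "candy", "kale"]))    # "candy"
```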
Today's news is that our best current theory of human choices says that at least three different systems compute "values" that are then fed into the final choice circuit:
- The model-based system "uses experience in the environment to learn a model of the transition distribution, outcomes and motivationally-sensitive utilities." (See Sutton & Barto 1998 for the meanings of these terms in reinforcement learning theory.) The model-based system also "infers choices by... building and evaluating the search decision tree to work out the optimal course of action." In short, the model-based system is responsible for goal-directed behavior. However, making all choices with a goal-directed system using something like a utility function would be computationally prohibitive (Daw et al. 2005), so many animals (including humans) first evolved much simpler methods for calculating the subjective values of options (see below).
- The model-free system also learns a model of the transition distribution and outcomes from experience, but "it does so by caching and then recalling the results of experience rather than building and searching the tree of possibilities. Thus, the model-free controller does not even represent the outcomes... that underlie the utilities, and is therefore not in any position to change the estimate of its values if the motivational state changes. Consider, for instance, the case that after a subject has been taught to press a lever to get some cheese, the cheese is poisoned, so it is no longer worth eating. The model-free system would learn the utility of pressing the lever, but would not have the informational wherewithal to realize that this utility had changed when the cheese had been poisoned. Thus it would continue to insist upon pressing the lever. This is an example of motivational insensitivity."
- The Pavlovian system, in contrast, calculates values based on a set of hard-wired preparatory and consummatory "preferences." Rather than calculate value based on what is likely to lead to rewarding and punishing outcomes, the Pavlovian system calculates values consistent with automatic approach toward appetitive stimuli, and automatic withdrawal from aversive stimuli. Thus, "animals cannot help but approach (rather than run away from) a source of food, even if the experimenter has cruelly arranged things in a looking-glass world so that the approach appears to make the food recede, whereas retreating would make the food more accessible (Hershberger 1986)." (A toy sketch contrasting the three systems appears after Jandila's summary below.)
Or, as Jandila put it:
- Model-based system: Figure out what's going on, and what actions maximize returns, and do them.
- Model-free system: Do the thingy that worked before again!
- Pavlovian system: Avoid the unpleasant thing and go to the pleasant thing. Repeat as necessary.
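The contrast among the three controllers is easy to see in a toy example. The sketch below is my own illustration under invented assumptions, not code from Dayan (2011): a hypothetical "lever world" in which pressing a lever yields cheese. After the cheese is poisoned, the model-based controller, which looks the outcome up through its model and scores it with current utilities, stops pressing; the model-free controller, which only consults cached action values, keeps pressing; and the Pavlovian controller approaches visible food regardless.

```python
# A "lever world" toy example: pressing the lever yields cheese; doing nothing yields nothing.

# Learned (here hand-coded) one-step model: action -> outcome.
transition = {"press_lever": "cheese", "do_nothing": "nothing"}

# Current motivational utilities of outcomes (these change when the cheese is poisoned).
utility = {"cheese": 1.0, "nothing": 0.0}

# Cached model-free action values, learned back when the cheese was still good.
cached_q = {"press_lever": 1.0, "do_nothing": 0.0}

def model_based_value(action):
    """Look the outcome up in the model and score it with the *current* utility."""
    return utility[transition[action]]

def model_free_value(action):
    """Return the cached value; blind to changes in motivational state."""
    return cached_q[action]

def pavlovian_response(stimulus_valence):
    """Hard-wired policy: approach appetitive stimuli, withdraw from aversive ones."""
    return "approach" if stimulus_valence > 0 else "withdraw"

def best(value_fn):
    """Pick the action with the highest value under a given controller."""
    return max(transition, key=value_fn)

print("Before devaluation:")
print("  model-based chooses:", best(model_based_value))   # press_lever
print("  model-free  chooses:", best(model_free_value))    # press_lever

utility["cheese"] = -1.0  # the experimenter poisons the cheese

print("After the cheese is poisoned:")
print("  model-based chooses:", best(model_based_value))   # do_nothing
print("  model-free  chooses:", best(model_free_value))    # still press_lever
print("  Pavlovian response to visible food:", pavlovian_response(+1))  # still "approach"
```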
In short:
We have described three systems that are involved in making choices. Even in the case that they share a single, Platonic, utility function for outcomes, the choices they express can be quite different. The model-based controller comes closest to being Platonically appropriate... The choices of the model-free controller can depart from current utilities because it has learned or cached a set of values that may no longer be correct. Pavlovian choices, though determined over the course of evolution to be appropriate, can turn out to be instrumentally catastrophic in any given experimental domain...
[Having multiple systems that calculate value] is [one way] of addressing the complexities mentioned, but can lead to clashes between Platonic utility and choice. Further, model-free and Pavlovian choices can themselves be inconsistent with their own utilities.
We don't yet know how choice results from the inputs of these three systems, nor how the systems might interact before they deliver their value calculations to the final choice circuit, nor whether the model-based system really uses anything like a coherent utility function. But it looks like humans might have a "hidden" utility function that would reveal itself if the brain weren't also using the computationally cheaper model-free and Pavlovian systems to help determine choice.
At a glance, it seems that upon reflection I might embrace an extrapolation of the model-based system's preferences as representing "my values," and that I would reject the outputs of the model-free and Pavlovian systems as the products of dumb systems that evolved for their computational simplicity and that merely approximate the full power of the goal-directed, model-based system.
On the other hand, as Eliezer points out, perhaps we ought to be suspicious of this, because "it sounds like the correct answer ought to be to just keep the part with the coherent utility function in CEV which would make it way easier, but then someone's going to jump up and say: 'Ha ha! Love and friendship were actually in the other two!'"
Unfortunately, it's too early to tell whether these results will be useful for CEV. But it's a little promising. This is the kind of thing that sometimes happens when you hack away at the edges of hard problems. This is also a repeat of the lesson that "you can often out-pace most philosophers simply by reading what today's leading scientists have to say about a given topic instead of reading what philosophers say about it."
(For pointers to the relevant experimental data, and for an explanation of the mathematical role of each valuation system in the brain's reinforcement learning system, see Dayan (2011). All quotes in this post are from that chapter, except for the last one.)
In these terms, the plan I see as most promising is that the correct way of extracting preferences from humans, one that doesn't require further "extrapolation," falls out of decision theory.
(I'm not sure what you meant by Drescher's option; what is a "response to preferences"? Does the book suggest that it's unnecessary to use humans as utility-definition material? In any case, this doesn't sound like something he would currently believe.)
As I recall, Drescher still used humans as utility-definition material, but thought that there might be a single correct response to these utilities, one which falls out of decision theory and game theory.