I'd like to suggest that the fact that human preferences can be decomposed into beliefs and values is one that deserves greater scrutiny and explanation. It seems intuitively obvious to us that rational preferences must decompose like that (even if not exactly into a probability distribution and a utility function), but it’s less obvious why.
The importance of this question comes from our tendency to see beliefs as being more objective than values. We think that beliefs, but not values, can be right or wrong, or at least that the notion of right and wrong applies to a greater degree to beliefs than to values. One dramatic illustration of this is in Eliezer Yudkowsky’s proposal of Coherent Extrapolated Volition, where an AI extrapolates the preferences of an ideal humanity, in part by replacing their “wrong” beliefs with “right” ones. On the other hand, the AI treats their values with much more respect.
Since beliefs and values seem to correspond roughly to the probability distribution and the utility function in expected utility theory, and expected utility theory is convenient to work with (it is mathematically simple and has been studied extensively), it seems useful as a first step to transform the question into “why can human decision making be approximated as expected utility maximization?”
I can see at least two parts to this question:
- Why this mathematical structure?
- Why this representation of the mathematical structure?
Not knowing how to answer these questions yet, I’ll just write a bit more about why I find them puzzling.
Why this mathematical structure?
It’s well known that expected utility maximization can be derived from a number of different sets of assumptions (the so-called axioms of rationality), but they all include the assumption of Independence in some form. Informally, Independence says that what you prefer to happen in one possible world doesn’t depend on what you think happens in other possible worlds. In other words, if you prefer A&C to B&C, then you must prefer A&D to B&D, where A and B are what happens in one possible world, and C and D are what happens in another.
This assumption is central to establishing the mathematical structure of expected utility maximization, where you value each possible world separately using the utility function, then take their probability-weighted average. If your preferences were such that A&C > B&C but A&D < B&D, then you wouldn’t be able to do this.
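For reference, here is Independence in its standard von Neumann–Morgenstern form, together with the representation it helps establish. This is the textbook lottery formulation; the A&C / B&D phrasing above is a possible-worlds paraphrase of the same idea, so treat the mapping as approximate.

```latex
% Independence (vNM form): for all lotteries L, M, N and any p in (0,1],
\[
  L \succeq M \iff p\,L + (1-p)\,N \;\succeq\; p\,M + (1-p)\,N .
\]
% Combined with Completeness, Transitivity, and Continuity, this yields the
% expected utility representation: there is a utility function u over
% outcomes x such that
\[
  L \succeq M \iff \sum_{x} L(x)\,u(x) \;\ge\; \sum_{x} M(x)\,u(x),
\]
% i.e. each outcome is valued separately by u and the results are averaged
% using the lottery's probabilities as weights.
```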
It seems clear that our preferences do satisfy Independence, at least approximately. But why? (In this post I exclude indexical uncertainty from the discussion, because in that case I think Independence definitely doesn't apply.) One argument that Eliezer has made (in a somewhat different context) is that if our preferences didn’t satisfy Independence, then we would become money pumps. But that argument seems to assume agents who violate Independence yet try to use expected utility maximization anyway, in which case it wouldn’t be surprising that they behave inconsistently. In general, I think being a money pump requires having circular (i.e., intransitive) preferences, and it's quite possible to have transitive preferences that don't satisfy Independence (which is why Transitivity and Independence are listed as separate axioms of rationality).
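A standard concrete illustration of transitive preferences that violate Independence is the Allais pattern (an example I'm adding here, not one the post itself cites):

```latex
% The classic Allais gambles:
%   1A: $1M with certainty
%   1B: $5M with prob. 0.10, $1M with prob. 0.89, nothing with prob. 0.01
%   2A: $1M with prob. 0.11, nothing with prob. 0.89
%   2B: $5M with prob. 0.10, nothing with prob. 0.90
% Many people report 1A > 1B together with 2B > 2A. Each pair differs only
% in a shared 0.89 component ($1M in the first pair, nothing in the second),
% so Independence requires
\[
  \text{1A} \succ \text{1B} \iff \text{2A} \succ \text{2B},
\]
% which the reported pattern violates -- yet the pattern contains no
% preference cycle, so Transitivity is untouched.
```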
Why this representation?
Vladimir Nesov has pointed out that if a set of preferences can be represented by a probability function and a utility function, then it can also be represented by two probability functions. And furthermore we can “mix” these two probability functions together so that it’s no longer clear which one can be considered “beliefs” and which one “values”. So why do we have the particular representation of preferences that we do?
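To make this concrete, here is a small numerical sketch of one such re-encoding. It is my own illustrative construction, not necessarily the one Nesov uses: all names and numbers are made up, preferences are taken to be a ranking of events by expected utility conditional on the event, and utilities are assumed to have been rescaled into (0, 1).

```python
# A small numerical sketch (illustrative only): a (probability, utility) pair
# can be re-encoded as two probability measures that induce the same ranking
# of events.

from itertools import combinations

# Toy setup: four possible worlds with made-up probabilities and utilities.
# Utilities are assumed to have been rescaled into the open interval (0, 1).
worlds = ["w1", "w2", "w3", "w4"]
P = {"w1": 0.1, "w2": 0.4, "w3": 0.3, "w4": 0.2}
U = {"w1": 0.9, "w2": 0.2, "w3": 0.6, "w4": 0.5}

# Total "goodness-weighted" and "badness-weighted" probability mass.
X = sum(P[w] * U[w] for w in worlds)
Y = sum(P[w] * (1 - U[w]) for w in worlds)

# Two probability measures derived from (P, U); each sums to 1.
Q1 = {w: P[w] * U[w] / X for w in worlds}
Q2 = {w: P[w] * (1 - U[w]) / Y for w in worlds}

def eu(event):
    """Expected utility conditional on the event obtaining (uses P and U)."""
    return sum(P[w] * U[w] for w in event) / sum(P[w] for w in event)

def ratio(event, A=Q1, B=Q2):
    """Rank events by the ratio of two measures (no explicit utility used)."""
    return sum(A[w] for w in event) / sum(B[w] for w in event)

# All nonempty proper subsets of the worlds, treated as "events".
events = [set(c) for r in (1, 2, 3) for c in combinations(worlds, r)]

# The (P, U) ranking and the (Q1, Q2) ranking agree on every pair of events.
for E, F in combinations(events, 2):
    assert (eu(E) > eu(F)) == (ratio(E) > ratio(F))

# Mixing the two measures (giving M1 more Q1-weight than M2 gets) still
# reproduces the same ranking, so neither mixed measure is straightforwardly
# "the beliefs" or "the values".
M1 = {w: 0.7 * Q1[w] + 0.3 * Q2[w] for w in worlds}
M2 = {w: 0.2 * Q1[w] + 0.8 * Q2[w] for w in worlds}
for E, F in combinations(events, 2):
    assert (eu(E) > eu(F)) == (ratio(E, M1, M2) > ratio(F, M1, M2))

print("Both re-encodings reproduce the original preference ordering.")
```

The point of the final check is that a mixed pair of measures represents exactly the same preferences, so nothing in the preferences themselves singles out which measure is the "belief" part and which is the "value" part.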
Is it possible that the dichotomy between beliefs and values is just an accidental byproduct of our evolution, perhaps a consequence of the specific environment that we’re adapted to, instead of a common feature of all rational minds? Unlike the case with anticipation, I don’t claim that this is true or even likely here, but it seems to me that we don’t understand things well enough yet to say that it’s definitely false and why that's so.
Clearly these are two different things; the real question you are asking is in what relevant way are they different, right?
First of all, the Roomba does not "recognize" a wall as a reason to stop going forward. It gets some input from its front sensor, and then it turns to the right.
So what is the relevant difference between the Roomba, which gets some input from its front sensor and then turns to the right, and the superRoomba, which gets evidence from its wheels that it is cleaning the room, but entertains the hypothesis that maybe someone has suspended it in the air, and goes and tests whether this alternative (disturbing) hypothesis is true, for example by calculating what the inertial difference between being suspended and actually being on the floor would be?
The difference is the difference between a simple input-response architecture, and an architecture where the mind actually has a model of the world, including itself as part of the model.
SilasBarta notes below that the word “model” is playing too great a role in this comment for me to use it without defining it precisely. What does a Roomba not have that causes it to behave in that laughable way when you suspend it so that its wheels spin?
What does the superRoomba that works out that it is being suspended by performing experiments involving its inertial sensor, and then hacks into your computer and blackmails you into letting it get back onto the floor to clean it (or even causes you to clean the floor yourself), have?
Imagine a collection of tricks that you could play on the Roomba: ways of changing its environment outside of what the designers had in mind. The pressure that it applies to its environment (defined, for example, as the derivative of the final state of the environment with respect to how long you leave the Roomba on) would then vary with which trick you play. For example, if you replace its dirt-sucker with a black spray paint can, you end up with a black floor. If you put it on a nonstandard floor surface that produces dirt in response to stimulation, you get a dirtier floor than you had to start with.
With the superRoomba, the pressure it applies to the environment doesn't vary as much with the kind of trick you play on it; it will eventually work out what changes you have made, and adapt its strategy so that you end up with a clean floor.
Uh oh, are we going to have to go over the debate about what a model is again?