EDIT: My original post was wrong. I will leave it quoted at the end for the purposes of preserving information, but it is now replaced with a new post that correctly expresses my sentiments. The original title of this post was "expected utility maximization is not rational".
As many people are probably aware, there is a theorem, called the Von Neumann-Morgenstern utility theorem, which states that anyone expressing consistent preferences must be maximizing the expected value of some function. The definition of consistent preferences is as follows:
Let A, B, and C be probability distributions over outcomes. Let A < B denote that B is preferred to A, and A = B denote that someone is indifferent between A and B. Then we assume
- Either A < B, A > B, or A = B. In other words, you have to express a preference. This is reasonable because in the real world, you always have to make a decision (even "lack of action" is a decision).
- If A < B, and B < C, then A < C. I believe that this is also clearly reasonable. If you have three possible actions, leading to distributions over outcomes A, B, and C, then you have to choose one of the three, meaning one of them is always preferred. So you can't have cycles of preferences.
- If A < B, then (1-x)A+xC < B for some x in (0,1) that is allowed to depend on A, B, and C. In other words, if B is preferred to A then B is also preferred to sufficiently small changes to A.
- If A < B then pA+(1-p)C < pB+(1-p)C for all p in (0,1). This is the least intuitive of the four axioms to me, and the one that I initially disagreed with. But I believe that you can argue in favor of it as follows: I flip a coin that lands heads with probability p, and draw from X if the coin lands heads and from C if it lands tails. I let you choose whether you want X to be A or B. It seems clear that if you prefer B to A, then you should choose B in this situation. However, I have not thought about this long enough to be completely sure that this is the case. Most other people seem to also think this is a reasonable axiom, so I'm going to stick with it for now.
Given these axioms, we can show that there exists a real-valued function u over outcomes such that A < B if and only if E_A[u] < E_B[u], where E_X denotes expectation with respect to the distribution X.
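To make the theorem's content concrete, here is a minimal sketch in Python (mine, not part of the original argument); the outcomes and the utility values are invented purely for illustration. It compares two lotteries by expected utility and checks that mixing both with a common third lottery, as in the fourth axiom, leaves the comparison unchanged.

```python
# A minimal sketch with invented outcomes and utility values: lotteries are
# dictionaries mapping outcomes to probabilities, compared by E_X[u].

def expected_utility(lottery, u):
    """E_X[u] for a lottery given as {outcome: probability}."""
    return sum(p * u[outcome] for outcome, p in lottery.items())

def mix(p, X, Y):
    """The compound lottery pX + (1-p)Y."""
    outcomes = set(X) | set(Y)
    return {o: p * X.get(o, 0.0) + (1 - p) * Y.get(o, 0.0) for o in outcomes}

# A hypothetical utility assignment -- u only needs to exist, not look "reasonable".
u = {"nothing": 0.0, "sandwich": 1.0, "car": 50.0}

A = {"nothing": 0.99, "car": 0.01}      # a 1% chance of a car
B = {"sandwich": 1.0}                   # a sandwich for sure
C = {"nothing": 0.5, "sandwich": 0.5}   # an unrelated third lottery

prefers_B = expected_utility(A, u) < expected_utility(B, u)

# Independence (the fourth axiom): mixing both sides with the same C preserves the ordering.
p = 0.3
still_prefers_B = (expected_utility(mix(p, A, C), u)
                   < expected_utility(mix(p, B, C), u))
print(prefers_B, still_prefers_B)   # True True -- the two comparisons agree
```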
Now, the important thing to note here is that this is an existence proof only. The function u doesn't have to look at all reasonable; it merely assigns a value to every possible outcome. In particular, even if E1 and E2 seem like completely unrelated events, there is no reason as far as I can tell why u([E1 and E2]) has to have anything to do with u(E1)+u(E2), for instance. Among other things, u is only defined up to a positive affine transformation (you can rescale it and shift it by a constant), so not only is there no reason for such additivity to hold, it will fail for almost all admissible utility functions, *even if you keep the person whose utility you are considering fixed*.
In particular, it seems ridiculous that we would worry about an outcome that only occurs with probability 10^-100. What this actually means is that our utility function is always much smaller than 10^100, or rather that the ratio of the difference in utility between trivially small changes in outcome and arbitrarily large changes in outcome is always much larger than 10^-100. This is how to avoid issues like Pascal's mugging, even in the least convenient possible world (since utility is an abstract construction, no universe can "make" a utility function become unbounded).
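As a toy illustration of the bounded-utility point (the specific numbers below are mine, not the post's): once utilities are rescaled to lie in a bounded range, an event of probability 10^-100 simply cannot dominate the calculation.

```python
# Toy numbers of my own: with utility rescaled to lie in [0, 1], an event of
# probability 1e-100 contributes at most 1e-100 to expected utility, which is
# swamped by even a trivially small but certain cost.

from fractions import Fraction

p_mugger_honest = Fraction(1, 10**100)   # probability the fantastic promise is real
u_max = Fraction(1)                      # utilities rescaled to the interval [0, 1]
u_cost_of_paying = Fraction(1, 10**6)    # tiny but certain disutility of handing over $5

best_possible_gain = p_mugger_honest * u_max
print(best_possible_gain < u_cost_of_paying)   # True: refusing maximizes expected utility
```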
What this means in particular is that saying that someone must maximize expected utility to be rational is not very productive. Unless the other person has a sufficiently good technical grasp of what this means, they will probably do the wrong thing. Also, unless *you* have a good technical grasp of what it means, something that appears to violate expected utility maximization might not. Remember, because utility is an artificial construct that has no reason to look reasonable, someone with completely reasonable preferences could have a very weird-*looking* utility function. Instead of telling people to maximize expected utility, we should identify which of the four above axioms they are violating, then explain why they are being irrational (or, if the purpose is to educate in advance, explain to them why the four axioms above should be respected). [Note, however, that the fact that a perfectly rational person *always* satisfies the above axioms doesn't mean that you will be better off if you satisfy them more often. Your preferences might have a complicated cycle that you are unsure how to correctly resolve. Picking a resolution at random is unlikely to be a good idea.]
Now, utility is this weird function that we don't understand at all. Why, then, does it seem like there's something called utility that **both** fits our intuitions and that people should be maximizing? The answer is that in many cases utility *can* be equated with something like money + risk aversion. The reason is the law of large numbers, formalized through various bounds such as Hoeffding's inequality and the Chernoff bound, as well as more powerful arguments like concentration of measure. What these arguments say is that if you have a large number of random variables that are sufficiently uncorrelated and that have sufficiently small standard deviation relative to the mean, then with high probability their sum is very close to their expected sum. So when our variables all have means that are reasonably close to each other (as is the case for most everyday events), the total *monetary* value of our combined actions will be very close to the sum of the expected monetary values of our individual actions (and likewise for other quantities like time). So in situations where, e.g., your goal is to spend as little time on undesirable work as possible, you want to minimize expected time spent on undesirable work, **as a heuristic that holds in most practical cases**. While this might make it *look* like your utility function is time in this case, I believe that the resemblance is purely coincidental, and you certainly shouldn't be willing to make very low-success-rate gambles with large time payoffs.
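Here is a rough simulation of that point, with invented numbers: the total of many small, roughly independent time savings concentrates tightly around its expectation, while a single low-success-rate gamble with the same expected value almost never pays off.

```python
# A rough simulation with invented numbers. Each strategy has an expected
# saving of 1,000 hours, but only the first one concentrates around it.

import random

random.seed(0)

def many_small_gambles():
    # 1,000 small tasks, each saving between 0 and 2 hours (mean 1 hour each).
    return sum(random.uniform(0, 2) for _ in range(1000))

def one_long_shot():
    # A 1-in-10,000 chance of saving 10,000,000 hours (mean 1,000 hours).
    return 10_000_000 if random.random() < 1e-4 else 0

trials = 5000
small = [many_small_gambles() for _ in range(trials)]
long_shot = [one_long_shot() for _ in range(trials)]

# The sum of many small gambles stays within a few percent of its expectation...
print(min(small), max(small))
# ...while the long shot almost always saves nothing at all.
print(sum(1 for x in long_shot if x == 0) / trials)
```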
Old post:
I'm posting this to the discussion because I don't plan to make a detailed argument, mainly because I think this point should be extremely clear, even though many people on LessWrong seem to disagree with me.
Maximizing expected utility is not a terminal goal, it is a useful heuristic. To see why always maximizing expected utility is clearly bad, consider an action A with a 10^-10 chance of giving you 10^100 units of utility, and a (1 - 10^-10) chance of losing you 10^10 units of utility. Then expected utility maximization requires you to perform A, even though it is obviously a bad idea. I believe this has been discussed here previously as Pascal's mugging.
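Checking the arithmetic in that example (the numbers are the ones quoted above; the new post at the top disavows the conclusion, not the calculation):

```python
# The quoted numbers: a 1e-10 chance of gaining 1e100 utils, otherwise a loss of 1e10 utils.
p_win = 1e-10
expected_utility_of_A = p_win * 1e100 - (1 - p_win) * 1e10
print(expected_utility_of_A)   # about 1e90, so naive expected utility maximization says "do A"
```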
For some reason, this didn't lead everyone to the obvious conclusion that maximizing expected utility is the wrong thing to do, so I'm going to try to dissolve the issue by looking at why we would want to maximize expected utility in most situations. I think once this is accomplished it will be obvious why there is no particular reason to maximize expected utility for very low-probability events (in fact, one might consider having a utility function over probability distributions rather than actual states of the world).
The reason that you normally want to maximize expected utility is because of the law of large numbers, formalized through various bounds such as Hoeffding's inequality and the Chernoff bound, as well as more powerful arguments like concentration of measure. What these arguments say is that if you have a large number of random variables that are sufficiently uncorrelated and that have sufficiently small variance relative to the mean, then with high probability their sum is very close to their expected sum. Thus for events with probabilities that are bounded away from 0 and 1 you always expect your utility to be very close to your expected utility, and should therefore maximize expected utility in order to maximize actual utility. But once the probabilities get small (or the events correlated, e.g. you are about to make an irreversible decision), these bounds no longer hold and the reasons for maximizing expected utility vanish. You should instead consider what sort of distribution over outcomes you find desirable.
I used to think I understood this stuff, but now jsteinhardt has me confused. Could you, or someone else familiar with economic orthodoxy, please tell me whether the following is a correct summary of the official position?
A lottery ticket offers one chance in a thousand to win a prize of $1,000,000. The ticket has an expected value of $1000. If you turn down a chance to purchase such a ticket for $900 you are said to be money risk averse.
A rational person can be money risk averse.
The "explanation" for this risk aversion in a rational person is that the person judges that money has decreasing marginal utility with wealth. That is, the person (rationally) judges that $1,000,000 is not 1000 times as good (useful) as $1000. An extra dollar means less to a rich man than to a poor man.
This shifting relationship between money and utility can be expressed by a "utility function". For example, it may be the case for this particular rational individual that one util corresponds to $1. But $1000 corresponds to 800 utils and $1,000,000 corresponds to 640,000 utils.
And the rationality of not buying the lottery ticket can be seen by considering the transaction in utility units. The $900 price of the ticket is worth a bit less than 800 utils to this agent, but the expected utility of the ticket is only 640 utils. A rational, expected utility maximizing agent will not play this lottery.
ETA: One thing I forgot to insert at this point. How do we create a utility function for an agent? I.e., how do we know that $1,000,000 is only worth 640,000 utils to him? We do so by offering a lottery ticket paying $1,000,000 and then adjusting the odds until he is willing to pay $1 (equal to 1 util by definition) for the ticket. In this case, he buys the ticket when the odds improve to 640,000 to 1.
Now imagine a lottery paying 1,000,000 utils, again with 0.001 probability of winning. The ticket costs 900 utils. An agent who turns down the chance to buy this ticket could be called utility risk averse.
An agent who is utility risk averse is irrational. By definition. Money risk aversion can be rational, but that is explained by diminishing utility of money. There is no such thing as diminishing utility of utility.
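For concreteness, here is the arithmetic behind the comment's two lotteries (the 800 and 640,000 util figures are the commenter's own hypothetical utility function, not anything standard):

```python
# Checking the comment's numbers; the utility figures are the commenter's
# hypothetical assignments, used here only to make the comparison explicit.

p_win = 0.001

# Money lottery: $1,000,000 prize, ticket priced at $900.
expected_dollars = p_win * 1_000_000     # $1,000 > $900 ticket price
expected_utils = p_win * 640_000         # 640 utils
utils_given_up = 800                     # roughly the utility of the money handed over
print(expected_dollars, expected_utils < utils_given_up)   # 1000.0 True -> rational to decline

# Utility lottery: 1,000,000-util prize, ticket priced at 900 utils.
expected_prize_utils = p_win * 1_000_000   # 1,000 utils > 900 utils
print(expected_prize_utils > 900)          # True -> an expected utility maximizer buys this one
```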
That is my understanding of the orthodox position. Now, the question that jsteinhardt asks is whether it is not time to challenge that orthodoxy. In effect, he is asking us to change our definition of "rational". (It is obvious, of course, that humans are not always "rational" by this definition - it is even true that they have biases which make them systematically deviate from rationality, for reasons which seem reasonable to them. But this, by itself, is not reason to change our definition of "rationality".)
Recall that the way we rationalized away money risk aversion was to claim that money units become less useful as our wealth increases. Is there some rationalization which shows that utility units become less pleasing as happiness increases? Strikes me as a question worth looking into.
That's the issue of the usefulness of the Axiom of Independence - I believe.
You can drop that - though you are still usually left with expected utility maximisation.