VNM expected utility theory: uses, abuses, and interpretation

21 Academian 17 April 2010 08:23PM

When interpreted conservatively, the von Neumann-Morgenstern rationality axioms and utility theorem are an indispensable tool for the normative study of rationality, deserving of many thought experiments and attentive decision theory.  It's one more reason I'm glad to be born after the 1940s. Yet there is apprehension about its validity, aside from merely confusing it with Bentham utilitarianism (as highlighted by Matt Simpson).  I want to describe not only what VNM utility is really meant for, but a contextual reinterpretation of its meaning, so that it may hopefully be used more frequently, confidently, and appropriately.

  1. Preliminary discussion and precautions
  2. Sharing decision utility is sharing power, not welfare
  3. Contextual Strength (CS) of preferences, and VNM-preference as "strong" preference
  4. Hausner (lexicographic) decision utility
  5. The independence axiom isn't bad either
  6. Application to earlier LessWrong discussions of utility

1.  Preliminary discussion and precautions

The idea of John von Neumann and Oskar Morgenstern is that, if you behave a certain way, then it turns out you're maximizing the expected value of a particular function.  Very cool!  And their description of "a certain way" is very compelling: a list of four, reasonable-seeming axioms.  If you haven't already, check out the Von Neumann-Morgenstern utility theorem, a mathematical result which makes their claim rigorous, and true.

VNM utility is a decision utility, in that it aims to characterize the decision-making of a rational agent.  One great feature is that it implicitly accounts for risk aversion: not risking $100 for a 10% chance to win $1000 and 90% chance to win $0 just means that for you, utility($100) > 10%·utility($1000) + 90%·utility($0). 
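To make the inequality concrete, here is a minimal sketch, assuming a square-root utility function (my own illustrative choice, not anything implied by the theorem) to stand in for a risk-averse agent:

```python
import math

# Hypothetical concave utility function; concavity is what produces risk aversion.
def utility(dollars):
    return math.sqrt(dollars)

sure_thing = utility(100)                        # utility of the guaranteed $100
gamble = 0.1 * utility(1000) + 0.9 * utility(0)  # expected utility of the bet

# For this agent, utility($100) > 10%·utility($1000) + 90%·utility($0),
# so declining the gamble is exactly what expected-utility maximization prescribes.
```

Nothing about the VNM theorem says you must be risk averse; a convex utility function would flip the inequality and the same machinery would prescribe taking the bet.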

But as the Wikipedia article explains nicely, VNM utility is:

  1. not designed to predict the behavior of "irrational" individuals (like real people in a real economy);
  2. not designed to characterize well-being, but to characterize decisions;
  3. not designed to measure the value of items, but the value of outcomes;
  4. only defined up to a scalar multiple and additive constant (acting with utility function U(X) is the same as acting with a·U(X)+b, if a>0);
  5. not designed to be added up or compared between a number of individuals;
  6. not something that can be "sacrificed" in favor of others in a meaningful way.

[ETA]  Additionally, in the VNM theorem the probabilities are understood to be known to the agent as they are presented, and to come from a source of randomness whose outcomes are not significant to the agent.  Without these assumptions, its proof doesn't work.

Because of (4), one often considers marginal utilities of the form U(X)-U(Y), to cancel the ambiguity in the additive constant b.  This is totally legitimate, and faithful to the mathematical conception of VNM utility.

Because of (5), people often "normalize" VNM utility to eliminate ambiguity in both constants, so that utilities are unique numbers that can be added across multiple agents.  One way is to declare that every person in some situation values $1 at 1 utilon (a fictional unit of measure of utility), and $0 at 0.  I think a more meaningful and applicable normalization is to fix mean and variance with respect to certain outcomes (next section).

Because of (6), characterizing the altruism of a VNM-rational agent by how he sacrifices his own VNM utility is the wrong approach.  Indeed, such a sacrifice is a contradiction.  Kahneman suggests1, and I agree, that something else should be added or subtracted to determine the total, comparative, or average well-being of individuals.  I'd call it "welfare", to avoid confusing it with VNM utility.  Kahneman calls it E-utility, for "experienced utility", a connotation I'll avoid.  Intuitively, this is certainly something you could sacrifice for others, or have more of compared to others.  True, a given person's VNM utility is likely highly correlated with her personal "welfare", but I wouldn't consider it an accurate approximation. 

So if not collective welfare, then what could cross-agent comparisons or sums of VNM utilities indicate?  Well, they're meant to characterize decisions, so one meaningful application is to collective decision-making:

continue reading »

Maximise Expected Utility, not Expected Perception of Utility

12 JGWeissman 26 March 2010 04:39AM

Suppose we are building an agent, and we have a particular utility function U over states of the universe that we want the agent to optimize for. So we program into this agent a function CalculateUtility that computes the value of U given its current knowledge. Then we can program it to make decisions by searching through its available actions for the one that maximizes its expectation for its result of running CalculateUtility. But wait, how will an agent with this programming behave?

Suppose the agent has the opportunity (option A) to arrange to falsely believe the universe is in a state worth utility uFA, while this action really leads to a different state worth utility uTA, and a competing opportunity (option B) to actually achieve a state of the universe that has utility uB, with uTA < uB < uFA. Then the agent will expect that if it takes option A its CalculateUtility function will return uFA, and if it takes option B its CalculateUtility function will return uB. Since uFA > uB, the agent takes option A, and achieves a state of the universe with utility uTA, which is worse than the utility uB it could have achieved if it had taken option B. This agent is not a very effective optimization process1. It would rather falsely believe that it has achieved its goals than actually achieve its goals. This sort of problem2 is known as wireheading.

Let us back up a step, and instead program our agent to make decisions by searching through its available actions for the one whose expected results maximize its current calculation of CalculateUtility. Then the agent would calculate that option A gives it expected utility uTA and option B gives it expected utility uB. Since uB > uTA, it chooses option B and actually optimizes the universe. That is much better.
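The two decision rules can be sketched side by side. The numbers and option structure below are hypothetical, chosen only to satisfy uTA < uB < uFA from the scenario above:

```python
# Option A: the agent arranges to *believe* it achieved uFA, but really gets uTA.
# Option B: honestly achieves uB, with uTA < uB < uFA.
uTA, uB, uFA = 1.0, 5.0, 10.0

options = {
    "A": {"perceived": uFA, "actual": uTA},
    "B": {"perceived": uB,  "actual": uB},
}

# Broken rule: maximize the expected *future output* of CalculateUtility,
# i.e. what the agent will later perceive its utility to be.
wirehead_choice = max(options, key=lambda o: options[o]["perceived"])

# Fixed rule: evaluate expected outcomes with the *current* CalculateUtility,
# i.e. the actual state of the universe each action leads to.
honest_choice = max(options, key=lambda o: options[o]["actual"])
```

The first rule picks A and wireheads; the second picks B and actually optimizes the universe.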

So, if you care about states of the universe, and not just your personal experience of maximizing your utility function, you should make choices that maximize your expected utility, not choices that maximize your expectation of perceived utility.

 


1. We might have expected this to work, because we built our agent to have beliefs that correspond to the actual state of the world.

 

2. A similar problem occurs if the agent has the opportunity to modify its CalculateUtility function, so it returns large values for states of the universe that would have occurred anyways (or any state of the universe).

Applying utility functions to humans considered harmful

26 Kaj_Sotala 03 February 2010 07:22PM

There's a lot of discussion on this site that seems to be assuming (implicitly or explicitly) that it's meaningful to talk about the utility functions of individual humans. I would like to question this assumption.

To clarify: I don't deny that you could, in principle, model a human's preferences by building an insanely complex utility function. But there's an infinite number of models by which you could represent a human's preferences. The question is which model is the most useful, and which models have the fewest underlying assumptions that will lead your intuitions astray.

Utility functions are a good model to use if we're talking about designing an AI. We want an AI to be predictable, to have stable preferences, and do what we want. It is also a good tool for building agents that are immune to Dutch book tricks. Utility functions are a bad model for beings that do not meet these criteria.

continue reading »

Are wireheads happy?

108 Yvain 01 January 2010 04:41PM

Related to: Utilons vs. Hedons, Would Your Real Preferences Please Stand Up

And I don't mean that question in the semantic "but what is happiness?" sense, or in the deep philosophical "but can anyone not facing struggle and adversity truly be happy?" sense. I mean it in the totally literal sense. Are wireheads having fun?

They look like they are. People and animals connected to wireheading devices get upset when the wireheading is taken away and will do anything to get it back. And it's electricity shot directly into the reward center of the brain. What's not to like?

Only now neuroscientists are starting to recognize a difference between "reward" and "pleasure", or call it "wanting" and "liking". The two are usually closely correlated. You want something, you get it, then you feel happy. The simple principle behind our entire consumer culture. But do neuroscience and our own experience really support that?

continue reading »

In conclusion: in the land beyond money pumps lie extreme events

4 Stuart_Armstrong 23 November 2009 03:03PM

In a previous article I've demonstrated that you can only avoid money pumps and arbitrage by using the von Neumann-Morgenstern axioms of expected utility. I argued in this post that even if you're not likely to face a money pump on one particular decision, you should still use expected utility (and sometimes expected money), because of the difficulties of combining two decision theories and constantly being on the look-out for which one to apply.

Even if you don't care about (weak) money pumps, expected utility sneaks in under much milder conditions. If you have a quasi-utility function (i.e. you have an underlying utility function, but you also care about the shape of the probability distribution), then this post demonstrates that you should generally stick with expected utility anyway, just by aggregating all your decisions.

So the moral of looking at money pumps, arbitrage and aggregation is that you should use expected utility for nearly all your decisions.

But the moral says exactly what it says, and nothing more.

continue reading »

Consequences of arbitrage: expected cash

5 Stuart_Armstrong 13 November 2009 10:32AM

I prefer the movie Twelve Monkeys to Akira. I prefer Akira to David Attenborough's Life in the Undergrowth. And I prefer David Attenborough's Life in the Undergrowth to Twelve Monkeys.

I have intransitive preferences. But I don't suffer from this intransitivity. Up until the moment I'm confronted by an avatar of the money pump, juggling the three DVD boxes in front of me with a greedy gleam in his eye. He'll arbitrage me to death unless I snap out of my intransitive preferences and banish him by putting my options in order.

Arbitrage, in the broadest sense, means picking up free money - money that is free because of other people's preferences. Money pumps are a form of arbitrage, exploiting the lack of consistency, transitivity or independence in people's preferences. In most cases, arbitrage ultimately destroys itself: people either wise up to the exploitation and get rid of their vulnerabilities, or lose all their money, leaving only players who are not vulnerable to arbitrage. The crash and burn of the Long-Term Capital Management hedge fund was due in part to the diminishing returns of their arbitrage strategies.

Most humans do not react to the possibility of being arbitraged by changing their whole preference systems. Instead they cling to their old preferences as much as possible, while keeping a keen eye out to avoid being taken advantage of. They keep their inconsistent, intransitive, dependent systems but end up behaving consistently, transitively and independently in their most common transactions.

The weaknesses of this approach are manifest. Having one system of preferences but acting as if we had another is a great strain on our poor overloaded brains. To avoid the arbitrage, we need to scan present and future deals with great keenness and insight, always on the lookout for traps. Since transaction costs shield us from most of the negative consequences of imperfect decision theories, we have to be especially vigilant as transaction costs continue to drop, meaning that opportunities to be arbitraged will continue to rise in the future. Finally, how we exit the trap of arbitrage depends on how we entered it: if my juggling Avatar had started me on Life in the Undergrowth, I'd have ended up with Twelve Monkeys, and refused the next trade. If he'd started me on Twelve Monkeys, I'd have ended up with Akira. These may not have been the options I'd have settled on if I'd taken the time to sort out my preferences ahead of time.
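The DVD cycle above can be run mechanically. A minimal money-pump sketch (the per-trade fee and lap count are my own illustrative parameters):

```python
# Intransitive preferences from the text:
# Twelve Monkeys > Akira > Life in the Undergrowth > Twelve Monkeys.
prefers = {
    ("Twelve Monkeys", "Akira"),
    ("Akira", "Life in the Undergrowth"),
    ("Life in the Undergrowth", "Twelve Monkeys"),
}

# Walking this cycle always offers a trade up to a strictly preferred DVD.
cycle = ["Akira", "Twelve Monkeys", "Life in the Undergrowth"]

def pump(start, fee=0.01, laps=3):
    """Trade around the preference cycle, charging a small fee per trade."""
    holding, paid = start, 0.0
    for _ in range(laps * len(cycle)):
        offered = cycle[(cycle.index(holding) + 1) % len(cycle)]
        assert (offered, holding) in prefers  # every trade looks advantageous
        holding, paid = offered, paid + fee
    return holding, paid

holding, paid = pump("Twelve Monkeys")
# After whole laps, the victim holds the same DVD and is strictly poorer.
```

Each individual trade is agreeable to the victim, yet the sequence returns them to their starting DVD minus the accumulated fees, which is exactly the money pump the avatar runs.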

continue reading »

Money pumping: the axiomatic approach

12 Stuart_Armstrong 05 November 2009 11:23AM

This post gets somewhat technical and mathematical, but the point can be summarised as:

  • You are vulnerable to money pumps only to the extent to which you deviate from the von Neumann-Morgenstern axioms of expected utility.

In other words, using alternate decision theories is bad for your wealth.

But what is a money pump? Intuitively it is a series of trades that I propose to you, that end up bringing you back to where you started. All the trades must be indifferent or advantageous to you, so that you will accept them. And if even one of those trades is advantageous, then this is a money pump: I can charge you a tiny amount for that trade, making free money out of you. You are now strictly poorer than if you had not accepted the trades at all.

A strict money pump happens when every deal is advantageous to you, not simply indifferent. In most situations, there is no difference between a money pump and a strict money pump: I can offer you a tiny trinket at each indifferent deal to make it advantageous, and get these back later. There are odd preference systems out there, though, so the distinction is needed.

The condition "bringing you back to where you started" needs to be examined some more. Thus define:

A strong money pump is a money pump which returns us both to exactly the same situations as when we started: in possession of the same assets and lotteries, with none of them having come due in the meantime.

A weak money pump is a money pump that returns us to the same situation that would have happened if we had never traded at all. Lotteries may have come due in the course of the trades.

continue reading »

Post retracted: If you follow expected utility, expect to be money-pumped

0 Stuart_Armstrong 29 October 2009 12:06PM

This post has been retracted because it is in error. Trying to shore it up just involved a variant of the St Petersburg Paradox and a small point on pricing contracts that is not enough to make a proper blog post.

I apologise.

Edit: Some people have asked that I keep the original up to illustrate the confusion I was under. I unfortunately don't have a copy, but I'll try and recreate the idea, and illustrate where I went wrong.

The original idea was that if I were to offer you a contract L that gained £1 with 50% probability or £2 with 50% probability, then if your utility function wasn't linear in money, you would generally value L at something other than £1.50. Then I could sell or buy large amounts of these contracts from you at your stated price, and use the law of large numbers to ensure that I valued each contract at £1.50, thus making a certain profit.

The first flaw consisted in the case where your utility is concave in cash ("risk averse"). In that case, I can't buy L from you unless you already have L. And each time I buy it from you, the mean quantity of cash you have goes down, but your utility goes up, since you do not like the uncertainty inherent in L. So I get richer, but you get more utility, and once you've sold all L's you have, I cannot make anything more out of you.

If your utility is convex in cash ("risk loving"), then I can sell you L forever, at more than £1.50. And your money will generally go down, as I drain it from you. However, though the median amount of cash you have goes down, your utility goes up, since you get a chance - however tiny - of huge amounts of cash, and the utility generated by this sum swamps the fact you are most likely ending up with nothing. If I could go on forever, then I can drain you entirely, as this is a biased random walk on a one-dimensional axis. But I would need infinite resources to do this.
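The asymmetry in both cases comes down to the certainty equivalent of L sitting on one side or the other of £1.50. A sketch, using sqrt and squaring as stand-in concave and convex utility functions (my own choices, purely for illustration):

```python
import math

# Contract L from the post: £1 or £2, each with 50% probability; mean £1.50.
outcomes, p = [1.0, 2.0], 0.5

def certainty_equivalent(u, u_inv):
    """Cash amount the agent values exactly as much as L, under utility u."""
    eu = p * u(outcomes[0]) + p * u(outcomes[1])
    return u_inv(eu)

# Concave ("risk averse") utility prices L below its £1.50 mean...
ce_concave = certainty_equivalent(math.sqrt, lambda y: y * y)

# ...while convex ("risk loving") utility prices it above the mean.
ce_convex = certainty_equivalent(lambda x: x * x, math.sqrt)
```

So trading L at your stated price moves cash one way and utility the other, which is why the scheme never produced the promised pump.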

The major error was to reason like an investor, rather than a utility maximiser. Investors are very interested in putting prices on objects. And if you assign the wrong price to L while investing, someone will take advantage of you and arbitrage you. I might return to this in a subsequent post; but the issue is that even if your utility is concave or convex in money, you would put a price of £1.50 on L if L were an easily traded commodity with a lot of investors also pricing it at £1.50.

Expected utility without the independence axiom

9 Stuart_Armstrong 28 October 2009 02:40PM

John von Neumann and Oskar Morgenstern developed a system of four axioms that they claimed any rational decision maker must follow. The major consequence of these axioms is that when faced with a decision, you should always act solely to increase your expected utility. All four axioms have been attacked at various times and from various directions; but three of them are very solid. The fourth - independence - is the most controversial.

To understand the axioms, let A, B and C be lotteries - processes that result in different outcomes, positive or negative, with a certain probability of each. For 0<p<1, the mixed lottery pA + (1-p)B implies that you have p chances of being in lottery A, and (1-p) chances of being in lottery B. Then writing A>B means that you prefer lottery A to lottery B, A<B is the reverse and A=B means that you are indifferent between the two. Then the von Neumann-Morgenstern axioms are:

  • (Completeness) For every A and B either A<B, A>B or A=B.
  • (Transitivity) For every A, B and C with A>B and B>C, then A>C.
  • (Continuity) For every A>B>C there exists a probability p with B=pA + (1-p)C.
  • (Independence) For every A, B and C with A>B, and for every 0<t≤1, then tA + (1-t)C > tB + (1-t)C.
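As a sanity check, Independence holds automatically for any agent who already ranks lotteries by expected utility. A quick sketch (the lottery representation as (probability, utility) pairs is my own):

```python
import random

# A lottery is a list of (probability, utility) pairs summing to probability 1.
def expected_utility(lottery):
    return sum(p * u for p, u in lottery)

def mix(t, A, B):
    """The mixed lottery tA + (1-t)B."""
    return [(t * p, u) for p, u in A] + [((1 - t) * p, u) for p, u in B]

# If A > B under expected utility, then tA + (1-t)C > tB + (1-t)C for t > 0,
# because EU(mix(t, A, C)) = t·EU(A) + (1-t)·EU(C). Check on random instances:
random.seed(0)
for _ in range(100):
    A = [(0.5, random.random()), (0.5, random.random())]
    B = [(0.5, random.random()), (0.5, random.random())]
    C = [(0.5, random.random()), (0.5, random.random())]
    t = random.uniform(0.01, 1.0)
    if expected_utility(A) > expected_utility(B):
        assert expected_utility(mix(t, A, C)) > expected_utility(mix(t, B, C))
```

This is the easy direction; the controversial question is whether a rational agent's preferences must satisfy Independence in the first place, which is what the rest of the post works around.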

In this post, I'll try and prove that even without the Independence axiom, you should continue to use expected utility in most situations. This requires some mild extra conditions, of course. The problem is that although these conditions are considerably weaker than Independence, they are harder to phrase. So please bear with me here.

The whole insight in this post rests on the fact that a lottery that has 99.999% chance of giving you £1 is very close to being a lottery that gives you £1 with certainty. I want to express this fact by looking at the narrowness of the probability distribution, using the standard deviation. However, this narrowness is not an intrinsic property of the distribution, but of our utility function. Even in the example above, if I decide that receiving £1 gives me a utility of one, while receiving zero gives me a utility of minus ten billion, then I no longer have a narrow distribution, but a wide one. So, unlike the traditional set-up, we have to assume a utility function as being given. Once this is chosen, this allows us to talk about the mean and standard deviation of a lottery.

Then if you define c(μ) as the lottery giving you a certain return of μ, you can use the following axiom instead of independence:

  • (Standard deviation bound) For all ε>0, there exists a δ>0 such that for all μ>0, any lottery B with mean μ and standard deviation less than μδ has B>c((1-ε)μ).

This seems complicated, but all that it says, in mathematical terms, is that if we have a probability distribution that is "narrow enough" around its mean μ, then we should value it as being very close to a certain return of μ. The narrowness is expressed in terms of its standard deviation - a lottery with zero SD is a guaranteed return of μ, and as the SD gets larger, the distribution gets wider, and the chances of getting values far away from μ increase. So risk, in other words, scales (approximately) with the SD.

continue reading »

Extreme risks: when not to use expected utility

4 Stuart_Armstrong 23 October 2009 02:40PM

Would you prefer a 50% chance of gaining €10, one chance in a million of gaining €5 million, or a guaranteed €5? The standard position on Less Wrong is that the answer depends solely on the difference between cash and utility. If your utility scales less-than-linearly with money, you are risk averse and should choose the last option; if it scales more-than-linearly, you are risk-loving and should choose the second one. If we replaced €’s with utils in the example above, then it would simply be irrational to prefer one option over the others.

 

There are mathematical proofs of that result, but there are also strong intuitive arguments for it. What’s the best way of seeing this? Imagine that X1 and X2 were two probability distributions, with mean u1 and u2 and variances v1 and v2. If the two distributions are independent, then the sum X1 + X2 has mean u1 + u2, and variance v1 + v2.

 

Now if we multiply the returns of any distribution by a constant r, the mean scales by r and the variance scales by r². Consequently, if we have n probability distributions X1, X2, ... , Xn representing n equally expensive investments, the expected average return is (u1 + u2 + ... + un)/n, while the variance of this average is (v1 + v2 + ... + vn)/n². If the vi are bounded, then once we make n large enough, that variance must tend to zero. So if you have many investments, your averaged actual returns will be, with high probability, very close to your expected returns.
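You can see the variance shrink empirically. A sketch assuming each investment's return is an independent standard normal (so each vi = 1 and the variance of the average is exactly 1/n):

```python
import random
import statistics

random.seed(0)

def variance_of_average(n, trials=2000):
    """Empirical variance of the average of n independent N(0,1) investments."""
    samples = [
        sum(random.gauss(0, 1) for _ in range(n)) / n
        for _ in range(trials)
    ]
    return statistics.pvariance(samples)

# With n = 1 the variance of the "average" is about 1; with n = 100 it is
# about 1/100: averaging many investments crushes the spread around the mean.
```

This is just the law-of-large-numbers mechanics behind the claim: with enough independent investments, realized average returns hug expected returns tightly.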

 

continue reading »
