
Against Expected Utility

Houshalter 23 September 2015 09:21PM

Expected utility is optimal as the number of bets you take approaches infinity. You will lose bets on some days, and win bets on other days. But as you take more and more bets, the day to day randomness cancels out.

Say you want to save as many lives as possible. You can plug "number of lives saved" into an expected utility maximizer. And as the number of bets it takes increases, it will start to save more lives than any other method.

But the real world obviously doesn't have an infinite number of bets. And following this algorithm in practice will get you worse results. It is not optimal.

In fact, as Pascal's Mugging shows, this could get arbitrarily terrible. An agent following expected utility would just keep making bets with muggers and worshipping various religions until it runs out of resources. Or worse, the expected utility calculations might not even converge, and the agent wouldn't make any decisions at all.

So how do we fix it? Well, we could go back to the original line of reasoning that led us to expected utility, and fix it for finite cases. Instead of caring which method does best over infinitely many bets, we might say we want the one that does best in most finite cases. That would get you median utility.

For most things, median utility will approximate expected utility. But for very very small risks, it will ignore them. It only cares that it does the best in most possible worlds. It won't ever trade away utility from the majority of your possible worlds to very very unlikely ones.
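To make the contrast concrete, here is a minimal Python sketch (no code appears in the original posts). It assumes a lottery is represented as a dict from utility to probability, uses the lower median as its convention, and the mugging numbers are hypothetical. The mean is dominated by the one-in-a-billion branch; the median looks only at what happens in most worlds.

```python
from fractions import Fraction

def expected_value(lottery):
    """Mean utility of a lottery given as {utility: probability}."""
    return sum(u * p for u, p in lottery.items())

def median_value(lottery):
    """Lower median: the smallest utility u with P(U <= u) >= 1/2."""
    cumulative = Fraction(0)
    for u in sorted(lottery):
        cumulative += lottery[u]
        if cumulative >= Fraction(1, 2):
            return u
    return max(lottery)

# Hypothetical mugging: paying costs 1 unit of utility almost surely, but with
# probability p the mugger is honest and delivers 10**12 units.
p = Fraction(1, 10**9)
pay = {-1: 1 - p, 10**12: p}
refuse = {0: Fraction(1)}

print(expected_value(pay) > expected_value(refuse))  # True: the mean says pay
print(median_value(pay) < median_value(refuse))      # True: the median says refuse
```

Exact fractions are used so that tiny probabilities are not lost to floating-point rounding.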

A naive implementation of median utility isn't actually viable, because at different points in time, the agent might make inconsistent decisions. To fix this, it needs to decide on policies instead of individual decisions. It will pick a decision policy which it believes will lead to the highest median outcome.

This does complicate making a real implementation of this procedure. But that's what you get when you generalize results and try to make things work in the messy real world, instead of idealized infinite worlds. The same issue occurs in the multi-armed bandit problem, where the optimal infinite solution is simple, but finite solutions are incredibly complicated (or simple but require brute force).

But if you do this, you don't need the independence axiom. You can be consistent and avoid money pumping without it, by not making decisions in isolation, but considering the entire probability space of decisions you will ever make and choosing the best policies to navigate them.

It's interesting to note that this actually solves some other problems. Such an agent would pick a policy that one-boxes on Newcomb's problem, simply because that is the optimal policy, whereas a straightforward implementation of expected utility doesn't care.


But what if you really like the other mathematical properties of expected utility? What if we can just keep it and change something else? Like the probability function or the utility function?

Well, the probability function is sacred, IMO. Events should have the same probability of happening (given your prior knowledge), regardless of what utility function you have or what you are trying to optimize. And distorting it is probably inconsistent, too: an agent could exploit you by offering you bets in the areas where your beliefs are forced to differ from reality.

The utility function is not necessarily sacred though. It is inherently subjective, with the goal of just producing the behavior we want. Maybe there is some modification to it that could fix these problems.

It seems really inelegant to do this. We had a nice beautiful system where you could just count the number of lives saved, and maximize that. But assume we give up on that. How can we change the utility function to make it work?

Well, you could bound utility to get out of mugging situations. Past a certain level, your utility function just stops; it can't get any higher.

But then you are stuck with a bound. If you ever reach it, then you suddenly stop caring about saving any more lives. Now it's possible that your true utility function really is bounded. But it's not a fully general solution for all utility functions. And I don't believe that human utility is actually bounded, but that will have to be a different post.

You could transform the utility function so that it is asymptotic. But this is just a continuous bound, and it doesn't solve much. It still makes you care less and less about obtaining more utility the closer you get to the asymptote.

Say you set your asymptote around 1,000. It can be much larger, but I need an example that is manageable. Now, what happens if you find yourself in a world where all utilities are multiplied by a large number, say 1,000? For example, you save 1,000 lives in situations where before you would have saved only 1.

[Figure: an example asymptotic function capped at 1,000. Notice how 2,000 is only slightly higher than 1,000, and everything after that is basically flat.]

Now the utility of each additional life is diminishing very quickly. Saving 2,000 lives might have only 0.001% more utility than 1,000 lives.

This means that, if you could save 1,000 people for certain, you would not take a 1% risk of losing all 1,000 for a 99% chance at saving 2,000.
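A minimal Python sketch of this effect, assuming one particular asymptotic utility, U(x) = 1000·(1 − e^(−x/100)). The cap of 1,000 follows the example; the saturation scale of 100 is a hypothetical choice, and any similar curve behaves the same way. Under bounded expected utility the certain 1,000 beats the 99% gamble on 2,000, while plain linear utility prefers the gamble.

```python
import math

CAP = 1000.0   # asymptote, as in the example
SCALE = 100.0  # hypothetical rate at which utility saturates

def bounded_utility(lives_saved):
    """Asymptotic utility: nearly linear for small numbers, flat close to CAP."""
    return CAP * (1.0 - math.exp(-lives_saved / SCALE))

sure_thing = bounded_utility(1000)                                 # save 1,000 for certain
gamble = 0.99 * bounded_utility(2000) + 0.01 * bounded_utility(0)  # 99%: 2,000; 1%: none

relative_gain = bounded_utility(2000) / bounded_utility(1000) - 1.0
print(f"2,000 lives are worth {relative_gain:.5%} more than 1,000")
print("bounded EU prefers the gamble:", gamble > sure_thing)  # False
print("linear EU prefers the gamble:", 0.99 * 2000 > 1000)    # True
```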

This is the exact opposite situation of Pascal's mugging! The probability of the reward is very high. Why are we refusing such an obviously good trade?

What we wanted to do was make it ignore really low probability bets. What we actually did was just make it stop caring about big rewards, regardless of the probability.

No modification to it can fix that, because the utility function is totally indifferent to probability. That's what the decision procedure is for, and that's where the real problem is.


In researching this topic I've seen all kinds of crazy resolutions to Pascal's Mugging. Some try to attack the exact thought experiment of an actual mugger, and miss the general problem of low-probability events with large rewards. Others try to come up with clever arguments for why you shouldn't pay the mugger, but not any general solution to the problem, and not one that works under the stated premises, where you care about saving human lives equally, and where you assign the mugger less than 1/3↑↑↑3 probability.

In fact, Pascal's Mugging was originally written as a formalization of Pascal's original wager. Pascal's wager was dismissed for reasons like involving infinite utilities, the possibility of an "anti-god" that exactly cancels the benefits out, or that God wouldn't reward fake worshippers. People mostly missed the whole point: whether or not you should take low-probability, high-reward bets.

Pascal's Mugging showed that, no, the argument works fine in finite cases, and the probabilities do not have to exactly cancel each other out.

Some people tried to fix the problem by adding hacks on top of the probability or utility functions. I argued against these solutions above. The problem is fundamentally with the decision procedure of expected utility.

I've spoken to someone who decided to just bite the bullet. He accepted that our intuition about big numbers is probably wrong, and we should just do what the math tells us.

But even that doesn't work. One of the points made in the original Pascal's Mugging post is that EU doesn't even converge. There is a hypothesis which has an even lower probability than the mugger's, but promises 3↑↑↑↑3 utility. And a hypothesis even less likely than that which promises 3↑↑↑↑↑3 utility, and so on. Expected utility is utterly dominated by increasingly improbable hypotheses. The expected utility of all actions approaches positive or negative infinity.
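The divergence step can be written out explicitly. Let H_1, H_2, … be the increasingly improbable hypotheses, with probabilities p_k and promised utilities U_k. The only assumption, which is the premise of the argument above, is that each promised reward grows fast enough that its contribution never shrinks below some fixed c > 0:

```latex
p_k U_k \ge c > 0 \quad \text{for all } k
\quad\Longrightarrow\quad
\sum_{k=1}^{K} p_k U_k \;\ge\; K c \;\to\; \infty \quad \text{as } K \to \infty
```

If hypotheses promising equally vast negative utilities are included as well, the terms still do not shrink to zero. A series whose terms do not tend to zero cannot converge, so the expected utility is not even defined.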

Expected utility is at the heart of the problem. We don't really want the average of our utility function over all possible worlds, no matter how big the numbers are or how improbable they may be. We don't really want to trade away utility from the majority of our probability mass to infinitesimal slices of it.

The whole justification for EU being optimal in the infinite case doesn't apply to the finite real world. The axioms that imply you need it in order to be consistent don't hold if you don't assume independence. So it's not sacred, and we can look at alternatives.

Median utility is just a first attempt at an alternative. We probably don't really want to maximize median utility either. Stuart Armstrong suggests using the mean of quantiles. There are probably better methods, too. In fact, there are entire fields of summary statistics and robust statistics that I've barely looked at yet.

We can generalize and think of agents as having two utility functions: the regular utility function, which gives a numerical value representing how preferable an outcome is, and a probability preference function, which gives a numerical value to each probability distribution of utilities.
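A minimal Python sketch of this two-function decomposition. The names `choose`, `mean_preference` and `median_preference` are hypothetical, and so is the choice itself (save 10 lives for certain, or a 45% chance of saving 30). An action is a list of (outcome, probability) pairs. The utility function maps outcomes to numbers, and the probability preference function scores the resulting distribution. Changing either one can change the decision.

```python
import math

def mean_preference(utility_lottery):
    """Expected utility: one possible 'probability preference function'."""
    return sum(u * p for u, p in utility_lottery)

def median_preference(utility_lottery):
    """Lower median utility: another possible 'probability preference function'."""
    cumulative = 0.0
    for u, p in sorted(utility_lottery):
        cumulative += p
        if cumulative >= 0.5:
            return u
    return max(u for u, _ in utility_lottery)

def choose(actions, utility, preference):
    """Pick the action whose distribution of utilities the preference function rates highest."""
    def score(name):
        return preference([(utility(outcome), p) for outcome, p in actions[name]])
    return max(actions, key=score)

# Hypothetical choice; outcomes are measured in lives saved.
actions = {
    "sure thing": [(10, 1.0)],
    "gamble": [(30, 0.45), (0, 0.55)],
}

for utility_name, utility in [("linear", lambda lives: lives), ("sqrt", math.sqrt)]:
    for pref_name, preference in [("mean", mean_preference), ("median", median_preference)]:
        print(utility_name, pref_name, "->", choose(actions, utility, preference))
```

Only the linear-utility, mean-preference agent takes the gamble here. The other three combinations take the sure thing, for different reasons.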

Imagine we want to create an AI which acts the same as the agent would, given the same knowledge. Then we would need to know both of these functions. Not just the utility function. And they are both subjective, with no universally correct answer. Any function, so long as it converges (unlike expected utility), should produce perfectly consistent behavior.

Mean of quantiles

Stuart_Armstrong 09 September 2015 06:55PM

In a previous post, I looked at some of the properties of using the median rather than the mean.

Inspired by Houshalter's comment, it seems we might be able to compromise between the median and the mean. It seems to me that simply taking the mean of the lower quartile, median, and upper quartile would also have the nice features I described, and would likely be closer to the mean.

Furthermore, there's no reason to stop there. We can take the mean of the n-1 n-quantiles.

Two questions:

  1. As n increases, does this quantity tend to the mean if it exists? (I suspect yes).
  2. For some distributions (e.g. the Cauchy distribution) this quantity will tend to a limit as n increases, even if there is no mean. Is this an effective way of extending means to distributions that don't possess them?
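Both questions can be explored numerically with a short Python sketch. This is an illustration, not a proof. It averages the quantile function at k/n for k = 1, …, n−1: n = 2 gives the median, and n = 4 gives the mean of the lower quartile, median and upper quartile. The two test distributions are assumptions chosen for the example. The exponential distribution has a mean of 1. The Cauchy distribution, centred at 3, has no mean.

```python
import math

def mean_of_quantiles(quantile_fn, n):
    """Average of the n-1 n-quantiles: quantile_fn(k/n) for k = 1, ..., n-1."""
    return sum(quantile_fn(k / n) for k in range(1, n)) / (n - 1)

def exponential_quantile(p, rate=1.0):
    """Inverse CDF of an exponential distribution; its mean is 1/rate."""
    return -math.log(1.0 - p) / rate

def cauchy_quantile(p, loc=0.0, scale=1.0):
    """Inverse CDF of a Cauchy distribution, which has a median (loc) but no mean."""
    return loc + scale * math.tan(math.pi * (p - 0.5))

for n in (2, 4, 10, 100, 10_000):
    exp_value = mean_of_quantiles(exponential_quantile, n)
    cauchy_value = mean_of_quantiles(lambda p: cauchy_quantile(p, loc=3.0), n)
    print(f"n={n:>6}  exponential: {exp_value:.4f}  Cauchy(loc=3): {cauchy_value:.4f}")
```

For the exponential distribution the values climb from ln 2 toward the mean of 1. For the Cauchy distribution the symmetric quantiles cancel in pairs, so the result stays at the median (3) for every n.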

Note that, unlike the median approach, for large enough n this maximiser will pay Pascal's mugger.
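A minimal Python sketch of that threshold, using a hypothetical mugging: paying loses 5 units unless the mugger is honest (probability 1/1000), in which case it gains 1,000,000 units. Refusing scores 0. No quantile level k/n lands in the honest-mugger branch until n exceeds 1/p (here 1,000). After that, the mean of quantiles starts to track the expected value and recommends paying. Quantile levels are kept as exact fractions so the boundary case is not decided by rounding.

```python
from fractions import Fraction

def quantile(lottery, q):
    """Smallest outcome x with P(X <= x) >= q, for a lottery given as {outcome: probability}."""
    cumulative = Fraction(0)
    for outcome in sorted(lottery):
        cumulative += lottery[outcome]
        if cumulative >= q:
            return outcome
    return max(lottery)

def mean_of_quantiles(lottery, n):
    """Average of the n-1 n-quantiles, using exact fractions for the quantile levels."""
    return sum(quantile(lottery, Fraction(k, n)) for k in range(1, n)) / (n - 1)

# Hypothetical mugging: lose 5 units, or (probability 1/1000) gain 1,000,000 units.
mugging = {-5: Fraction(999, 1000), 1_000_000: Fraction(1, 1000)}

for n in (2, 4, 100, 1_000, 10_000):
    decision = "pay" if mean_of_quantiles(mugging, n) > 0 else "refuse"
    print(n, decision)  # refuses for every n up to 1,000; pays at 10,000
```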

Median utility rather than mean?

Stuart_Armstrong 08 September 2015 04:35PM

tl;dr A median maximiser will expect to win. A mean maximiser will win in expectation. As we face repeated problems of similar magnitude, both types take on the advantage of the other. However, the median maximiser will turn down Pascal's muggings, and can say sensible things about distributions without means.

Prompted by some questions from Kaj Sotala, I've been thinking about whether we should use the median rather than the mean when comparing the utility of actions and policies. To justify this, see the next two sections: why the median is like the mean, and why the median is not like the mean.

 

Why the median is like the mean

The main theoretical justifications for the use of expected utility - hence of means - are the von Neumann-Morgenstern axioms. Using the median obeys the completeness and transitivity axioms, but not the continuity and independence ones.

It does obey weaker forms of continuity; but in a sense, this doesn't matter. You can avoid all these issues by making a single 'ultra-choice'. Simply list all the possible policies you could follow, compute their median return, and choose the one with the best median return. Since you're making a single choice, independence doesn't apply.
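A minimal Python sketch of the 'ultra-choice', assuming a hypothetical bet that wins 3 with probability 0.4 and loses 1 otherwise (expected value +0.6 per bet). To keep it short, only two whole-life policies are compared rather than every possible policy. Judged alone, a single bet has median −1, so the median maximiser declines it. Judged as one policy over 50 such bets, the median total is positive, so the same agent takes them all.

```python
from math import comb

def median_total(k, p_win=0.4, win=3, loss=-1):
    """Lower median of the summed payoff from k independent copies of the bet."""
    if k == 0:
        return 0
    cumulative = 0.0
    for wins in range(k + 1):  # total payoff rises with the number of wins
        cumulative += comb(k, wins) * p_win**wins * (1 - p_win) ** (k - wins)
        if cumulative >= 0.5:
            return wins * win + (k - wins) * loss
    return k * win  # fallback in case rounding leaves the cumulative sum short

def ultra_choice(n):
    """Choose between two whole policies by the median of their total payoff."""
    policies = {"decline every bet": 0, "take every bet": n}
    return max(policies, key=lambda name: median_total(policies[name]))

print(ultra_choice(1))   # decline every bet
print(ultra_choice(50))  # take every bet
```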

So you've picked the policy π_m with the highest median value - note that to do this, you need only know an ordinal ranking of worlds, not their cardinal values. In what way is this like maximising expected utility? Essentially, the more options and choices you have - or could hypothetically have - the closer this policy must be to expected utility maximisation.

Assume u is a utility function compatible with your ordinal ranking of the worlds. Then π_u = 'maximise the expectation of u' is also a policy choice. If we choose π_m, we get a distribution d_m^u of possible values of u. Then E(u|π_m) is within the absolute deviation (using d_m^u) of the median value of d_m^u. This absolute deviation always exists for any distribution with an expectation, and is itself bounded by the standard deviation, if it exists.
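The chain of inequalities behind this bound is short. Write X for the value of u under d_m^u, μ = E(u|π_m) for its mean, m for its median, and σ for its standard deviation. The first step is Jensen's inequality. The second uses the standard fact that the median minimises E|X − c| over all c. The third is Jensen again (equivalently, Cauchy-Schwarz).

```latex
|\mu - m| \;=\; \bigl|\mathbb{E}[X - m]\bigr|
\;\le\; \mathbb{E}\,|X - m|
\;\le\; \mathbb{E}\,|X - \mu|
\;\le\; \sqrt{\mathbb{E}\,(X - \mu)^2} \;=\; \sigma
```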

Thus maximising the median is like maximising the mean, with an error depending on the standard deviation. You can see it as a risk averse utility maximising policy (I know, I know - risk aversion is supposed to go in defining the utility, not in maximising it. Read on!). And as we face more and more choices, the standard deviation will tend to fall relative to the mean, and the median will cluster closer and closer to the mean.

For instance, suppose we consider the choice of whether to buckle our seatbelt or not. Assume we don't want to die in a car accident that a seatbelt could prevent; assume further that the cost of buckling a seatbelt is trivial but real. To simplify, suppose we have an independent 1/Ω chance of death every time we're in a car, and that a seatbelt could prevent this, for some large Ω. Furthermore, we will be in a car a total of ρΩ times, for ρ < 0.5. Now, it seems, the median recommends a ridiculous policy: never wear seatbelts. Then you pay no cost ever, and your chance of dying is less than 50%, so this has the top median.
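A minimal Python sketch of the seatbelt case, with hypothetical values Ω = 10^6 and ρ = 0.4. Worlds are given only ordinal ranks (death < alive but paid the cost < alive with no cost), since the median needs nothing more. Never buckling leaves roughly a 33% chance of death, so the median world is 'alive, no cost' and that policy wins.

```python
# Ordinal ranking of worlds (higher is better); only the order matters for a median.
DEATH, ALIVE_PAID_COST, ALIVE_NO_COST = 0, 1, 2

def median_outcome(lottery):
    """Lower median of a lottery given as a list of (outcome_rank, probability) pairs."""
    cumulative = 0.0
    for outcome, prob in sorted(lottery):
        cumulative += prob
        if cumulative >= 0.5:
            return outcome
    return max(outcome for outcome, _ in lottery)

def seatbelt_policies(omega, rho):
    trips = int(rho * omega)
    p_death = 1.0 - (1.0 - 1.0 / omega) ** trips  # independent 1/omega risk per trip
    return {
        "always buckle": [(ALIVE_PAID_COST, 1.0)],  # assumes the belt prevents every death
        "never buckle": [(DEATH, p_death), (ALIVE_NO_COST, 1.0 - p_death)],
    }

policies = seatbelt_policies(omega=10**6, rho=0.4)
best = max(policies, key=lambda name: median_outcome(policies[name]))
print(best)  # never buckle
```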

And that is indeed a ridiculous result. But it's only possible because we look at seatbelts in isolation. Every day, we face choices that have small chances of killing us. We could look when crossing the street; smoke or not smoke cigarettes; choose not to walk close to the edge of tall buildings; choose not to provoke co-workers to fights; not run around blindfolded. I'm deliberately including 'stupid things no-one sensible would ever do', because they are choices, even if they are obvious ones. Let's gratuitously assume that all these choices also have a 1/Ω chance of killing you. When you collect together all the possible choices (obvious or not) that you make in your life, this will be ρ'Ω choices, for ρ' likely quite a lot bigger than 1.

Assume that avoiding these choices has a trivial cost, incommensurable with dying (i.e. no matter how many times you have to buckle your seatbelt, it is still better than a fatal accident). Now median-maximisation will recommend taking safety precautions for roughly (ρ'-0.5)Ω of these choices. This means that the decisions of a median maximiser will be close to those of a utility maximiser - they take almost the same precautions - though the outcomes are still pretty far apart: the median maximiser accepts a 49.99999...% chance of death.
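A minimal Python sketch that finds the cutoff, with hypothetical values Ω = 10^6 and ρ' = 3. The median maximiser leaves as many risks unprotected as it can while P(death) stays below 1/2, and takes precautions on the rest. Under the stated independence assumption, the exact number of unprotected risks is about Ω·ln 2 ≈ 0.69Ω. The (ρ'-0.5)Ω figure above corresponds to approximating P(death) by m/Ω. The resulting death probability sits just under 50% either way.

```python
import math

def death_probability(unprotected, omega):
    """P(at least one fatal accident) across independent risks of 1/omega each."""
    return 1.0 - (1.0 - 1.0 / omega) ** unprotected

def max_unprotected(omega):
    """Largest number of unprotected risks keeping P(death) below 1/2."""
    m = math.floor(math.log(0.5) / math.log1p(-1.0 / omega))
    # Nudge the estimate to correct for any floating-point error at the boundary.
    while m > 0 and death_probability(m, omega) >= 0.5:
        m -= 1
    while death_probability(m + 1, omega) < 0.5:
        m += 1
    return m

omega, rho_prime = 10**6, 3.0
total_choices = int(rho_prime * omega)
unprotected = max_unprotected(omega)
precautions = total_choices - unprotected
print(precautions / omega)                     # close to rho_prime - ln 2
print(death_probability(unprotected, omega))   # just under 0.5
```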

But now add serious injury to the mix (still assume the costs are incommensurable). This has a rather larger probability, and the median maximiser will now only accept a 49.99999...% chance of serious injury. Or add light injury - now they only accept a 49.99999...% chance of light injury. If light injuries are additive - two injuries are worse than one - then the median maximiser becomes even more reluctant to take risks. We can now relax the assumption of incommensurability as well; the set of policies and assessments becomes even more complicated, and the median maximiser moves closer to the mean maximiser.

The same phenomenon tends to happen when we add lotteries of decisions, chained decisions (decisions that depend on other decisions), and so on. Existential risks are interesting examples: from the selfish point of view, existential risks are just other things that can kill us - and not the most unlikely ones, either. So the median maximiser will be willing to pay a trivial cost to avoid an xrisk. Will a large group of median maximisers be willing to collectively pay a large cost to avoid an xrisk? That gets into superrationality, which I haven't considered yet in this context.

But let's turn back to the mystical utility function that we are trying to maximise. It's obvious that humans don't actually maximise a utility function; but according to the axioms, we should do so. Since we should, people on this list often assume that we actually have one, skipping over the process of constructing it. But how would that process go? Let's assume we've managed to make our preferences transitive, already a major achievement. How should we go about making them independent as well? We can do so as we go along. But if we do it ahead of time, chances are that we will be comparing hypothetical situations ("Do I like chocolate twice as much as sex? What would I think of a 50% chance of chocolate vs guaranteed sex? Well, it depends on the situation...") and thus constructing a utility function. This is where we have to make decisions about very obscure and unintuitive hypothetical tradeoffs, and find a way to fold all our risk aversion/risk love into the utility.

When median maximising, we do exactly the same thing, except we constrain ourselves to choices that are actually likely to happen to us. We don't need a full ranking of all possible lotteries and choices; we just need enough to decide in the situations we are likely to face. You could consider this a form of moral learning (or preference learning). From our choices in different situations (real or possible), we decide what our preferences are in these situations, and this determines our preferences overall.

 

Why the median is not like the mean

Ok, so the previous section argues that median maximising, if you have enough choices, functions like a clunky version of expected utility maximising. So what's the point?

The point lies in those situations that are not faced sufficiently often, or that have extreme characteristics. A median maximiser will reject Pascal's mugging, for instance, without any need for extra machinery (though they will accept Pascal's muggings if they face enough independent muggings, which is what we want - for stupidly large values of "enough"). They cope fine with distributions that have no means - such as the Cauchy distribution or a utility version of the St Petersburg paradox. They don't fall into paradox when facing choices with infinite (but ordered) rewards.
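A minimal Python sketch of the St Petersburg case, assuming the payoffs are themselves utilities and truncating the game after a chosen number of rounds so the sums are finite. The mean grows without bound as the truncation is lifted (it equals rounds + 1). The median stays fixed at 2 no matter how many rounds are allowed.

```python
from fractions import Fraction

def st_petersburg(rounds):
    """Truncated St Petersburg lottery as {payoff: probability}.

    Payoff 2**k with probability 2**-k for k = 1..rounds; the leftover mass 2**-rounds
    is added to the largest payoff so that the probabilities sum to 1.
    """
    lottery = {2**k: Fraction(1, 2**k) for k in range(1, rounds + 1)}
    lottery[2**rounds] += Fraction(1, 2**rounds)
    return lottery

def expected_value(lottery):
    return sum(x * p for x, p in lottery.items())

def median_value(lottery):
    """Lower median: smallest payoff x with P(X <= x) >= 1/2."""
    cumulative = Fraction(0)
    for x in sorted(lottery):
        cumulative += lottery[x]
        if cumulative >= Fraction(1, 2):
            return x
    return max(lottery)

for rounds in (10, 100, 1000):
    game = st_petersburg(rounds)
    print(rounds, expected_value(game), median_value(game))  # mean = rounds + 1, median = 2
```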

In a sense, median maximisation is like expected utility maximisation for common choices, but is different for exceptionally unlikely or high impact choices. Or, from the opposite perspective, expected utility maximising gives high probability of good outcomes for common choices, but not for exceptionally unlikely or high impact choices.

Another feature of the general idea (which might be seen as either a plus or a minus) is that it can get around some issues with total utilitarianism and similar ethical systems (such as the repugnant conclusion). What do I mean by this? Well, because the idea is that only choices that we actually expect to make matter, we can say, for instance, that we'd prefer a small ultra-happy population to a huge barely-happy one. And if this is the only choice we make, we need not fear any paradoxes: we might get hypothetical paradoxes, just not actual ones. I won't insist too much on this point; I just thought it was an interesting observation.

 

For lack of a Cardinal...

Now, the main issue is that we might feel that there are certain rare choices that are just really bad or really good. And we might come to this conclusion by rational reasoning, rather than by experience, so this will not show up in the median. In these cases, it feels like we might want to force some kind of artificial cardinal order on the worlds, to make the median maximiser realise that certain rare events must be considered beyond their simple ordinal ranking.

In this case, maybe we could artificially add some hypothetical choices to our system, making us address these questions more than we actually would, and thus drawing them closer to the mean maximising situation. But there may be other, better ways of doing this.

 

Anyway, that's my first pass at constructing a median maximising system. Comments and criticism welcome!

 

EDIT: We can use the absolute deviation (technically, the mean absolute deviation around the mean) to bound the distance between median and mean. This itself is bounded by the standard deviation, if it exists.