Quick puzzle about utility functions under affine transformations
Here's a puzzle based on something I used to be confused about:
It is known that utility functions are equivalent (i.e. produce the same preferences over actions) up to a positive affine transformation: u'(x) = au(x) + b where a is positive.
Suppose I have u(vanilla) = 3, u(chocolate) = 8. I prefer an action that yields a 50% chance of chocolate over an action that yields a 100% chance of vanilla, because 0.5(8) > 1.0(3).
Under the positive affine transformation a = 1, b = 4; we get that u'(vanilla) = 7 and u'(chocolate) = 12. Therefore I now prefer the action that yields a 100% chance of vanilla, because 1.0(7) > 0.5(12).
How to resolve the contradiction?
Iterated Gambles and Expected Utility Theory
The Setup
I'm about a third of the way through Stanovich's Decision Making and Rationality in the Modern World. Basically, I've gotten through some of the more basic axioms of decision theory (Dominance, Transitivity, etc).
As I went through the material, I noted that there were a lot of these:
Decision 5. Which of the following options do you prefer (choose one)?
A. A sure gain of $240
B. 25% chance to gain $1,000 and 75% chance to gain nothing
The text goes on to show how most people tend to make irrational choices when confronted with decisions like this; most strikingly was how often irrelevant contexts and framing effected people's decisions.
But I understand the decision theory bit; my question is a little more complicated.
When I was choosing these options myself, I did what I've been taught by the rationalist community to do in situations where I am given nice, concrete numbers: I shut up and I multiplied, and at each decision choose the option with the highest expected utility.
Granted, I equated dollars to utility, which Stanovich does mention that humans don't do well (see Prospect Theory).
The Problem
In the above decision, option B clearly has the higher expected utility, so I chose it. But there was still a nagging doubt in my mind, some part of me that thought, if I was really given this option, in real life, I'd choose A.
So I asked myself: why would I choose A? Is this an emotion that isn't well-calibrated? Am I being risk-averse for gains but risk-taking for losses?
What exactly is going on?
And then I remembered the Prisoner's Dilemma.
A Tangent That Led Me to an Idea
Now, I'll assume that anyone reading this has a basic understanding of the concept, so I'll get straight to the point.
In classical decision theory, the choice to defect (rat the other guy out) is strictly superior to the choice to cooperate (keep your mouth shut). No matter what your partner in crime does, you get a better deal if you defect.
Now, I haven't studied the higher branches of decision theory yet (I have a feeling that Eliezer, for example, would find a way to cooperate and make his partner in crime cooperate as well; after all, rationalists should win.)
Where I've seen the Prisoner's Dilemma resolved is, oddly enough, in Dawkin's The Selfish Gene, which is where I was first introduced to the idea of an Iterated Prisoner's Dilemma.
The interesting idea here is that, if you know you'll be in the Prisoner's Dilemma with the same person multiple times, certain kinds of strategies become available that weren't possible in a single instance of the Dilemma. Partners in crime can be punished for defecting by future defections on your own behalf.
The key idea here is that I might have a different response to the gamble if I knew I could take it again.
The Math
Let's put on our probability hats and actually crunch the numbers:
Format - Probability: $Amount of Money | Probability: $Amount of Money
Assuming one picks A over and over again, or B over and over again.
Iteration A--------------------------------------------------------------------------------------------B
1 $240-----------------------------------------------------------------------------------------1/4: $1,000 | 3/4: $0
2 $480----------------------------------------------------------------------1/16: $2,000 | 6/16: $1,000 | 9/16: $0
3 $720---------------------------------------------------1/64: $3,000 | 9/64: $2,000 | 27/64: $1,000 | 27/64: $0
4 $960------------------------1/256: $4,000 | 12/256: $3,000 | 54/256: $2,000 | 108/256: $1,000 | 81/256: $0
5 $1,200----1/1024: $5,000 | 15/1024: $4,000 | 90/256: $3,000 | 270/1024: $2,000 | 405/1024: $1,000 | 243/1024: $0
And so on. (If I've ma de a mistake, please let me know.)
The Analysis
It is certainly true that, in terms of expected money, option B outperforms option A no matter how many times one takes the gamble, but instead, let's think in terms of anticipated experience - what we actually expect to happen should we take each bet.
The first time we take option B, we note that there is a 75% chance that we walk away disappointed. That is, if one person chooses option A, and four people choose option B, on average three out of those four people will underperform the person who chose option A. And it probably won't come as much consolation to the three losers that the winner won significantly bigger than the person who chose A.
And since nothing unusual ever happens, we should think that, on average, having taken option B, we'd wind up underperforming option A.
Now let's look at further iterations. In the second iteration, we're more likely than not to have nothing having taken option B twice than we are to have anything.
In the third iteration, there's about a 57.8% chance that we'll have outperformed the person who chose option A the whole time, and a 42.2% chance that we'll have nothing.
In the fourth iteration, there's a 73.8% chance that we'll have matched or done worse than the person who has chose option A four times (I'm rounding a bit, $1,000 isn't that much better than $960).
In the fifth iteration, the above percentage drops to 63.3%.
Now, without doing a longer analysis, I can tell that option B will eventually win. That was obvious from the beginning.
But there's still a better than even chance you'll wind up with less, picking option B, than by picking option A. At least for the first five times you take the gamble.
Conclusions
If we act to maximize expected utility, we should choose option B, at least so long as I hold that dollars=utility. And yet it seems that one would have to take option B a fair number of times before it becomes likely that any given person, taking the iterated gamble, will outperform a different person repeatedly taking option A.
In other words, of the 1025 people taking the iterated gamble:
we expect 1 to walk away with $1,200 (from taking option A five times),
we expect 376 to walk away with more than $1,200, casting smug glances at the scaredy-cat who took option A the whole time,
and we expect 648 to walk away muttering to themselves about how the whole thing was rigged, casting dirty glances at the other 377 people.
After all the calculations, I still think that, if this gamble was really offered to me, I'd take option A, unless I knew for a fact that I could retake the gamble quite a few times. How do I interpret this in terms of expected utility?
Am I not really treating dollars as equal to utility, and discounting the marginal utility of the additional thousands of dollars that the 376 win?
What mistakes am I making?
Also, a quick trip to google confirms my intuition that there is plenty of work on iterated decisions; does anyone know a good primer on them?
I'd like to leave you with this:
If you were actually offered this gamble in real life, which option would you take?
Against Expected Utility
Expected utility is optimal as the number of bets you take approaches infinity. You will lose bets on some days, and win bets on other days. But as you take more and more bets, the day to day randomness cancels out.
Say you want to save as many lives as possible. You can plug "number of lives saved" into an expected utility maximizer. And as the amount of bets it takes increases, it will start to save more lives than any other method.
But the real world obviously doesn't have an infinite number of bets. And following this algorithm in practice will get you worse results. It is not optimal.
In fact, as Pascal's Mugging shows, this could get arbitrarily terrible. An agent following expected utility would just continuously make bets with muggers and worship various religions, until it runs out of resources. Or worse, the expected utility calculations don't even converge, and the agent doesn't make any decisions.
So how do we fix it? Well we could just go back to the original line of reasoning that led us to expected utility, and fix it for finite cases. Instead of caring what method does the best on infinite bets, we might say we want the one that does the best the most on finite cases. That would get you median utility.
For most things, median utility will approximate expected utility. But for very very small risks, it will ignore them. It only cares that it does the best in most possible worlds. It won't ever trade away utility from the majority of your possible worlds to very very unlikely ones.
A naive implementation of median utility isn't actually viable, because at different points in time, the agent might make inconsistent decisions. To fix this, it needs to decide on policies instead of individual decisions. It will pick a decision policy which it believes will lead to the highest median outcome.
This does complicate making a real implementation of this procedure. But that's what you get when you generalize results, and try to make things work on the messy real world. Instead of idealized infinite worlds. The same issue occurs in the multi-armed bandit problem. Where the optimal infinite solution is simple, but finite solutions are incredibly complicated (or simple but require brute force.)
But if you do this, you don't need the independence axiom. You can be consistent and avoid money pumping without it. By not making decisions in isolation, but considering the entire probability space of decisions you will ever make. And choosing the best policies to navigate them.
It's interesting to note this actually solves some other problems. Such an agent would pick a policy that one-boxes on Newcomb's problems, simply because that is the optimal policy. Whereas a straightforward implementation of expected utility doesn't care.
But what if you really like the other mathematical properties of expected utility? What if we can just keep it and change something else? Like the probability function or the utility function?
Well the probability function is sacred IMO. Events should have the same probability of happening (given your prior knowledge), regardless what utility function you have, or what you are trying to optimize. And it's probably inconsistent too. An agent could exploit you. By giving you bets in the areas where your beliefs are forced to be different from reality.
The utility function is not necessarily sacred though. It is inherently subjective, with the goal of just producing the behavior we want. Maybe there is some modification to it that could fix these problems.
It seems really inelegant to do this. We had a nice beautiful system where you could just count the number of lives saved, and maximize that. But assume we give up on that. How can we change the utility function to make it work?
Well you could bound utility to get out of mugging situations. After a certain level, your utility function just stops. It can't get any higher.
But then you are stuck with a bound. If you ever reach it, then you suddenly stop caring about saving any more lives. Now it's possible that your true utility function really is bounded. But it's not a fully general solution for all utility functions. And I don't believe that human utility is actually bounded, but that will have to be a different post.
You could transform the utility function so it asymptotic. But this is just a continuous bound. It doesn't solve much. It still makes you care less and less about obtaining more utility, the closer you get to it.
Say you set your asymptote around 1,000. It can be much larger, but I need an example that is manageable. Now, what happens if you find yourself to exist in a world where all utilities are multiplied by a large number? Say 1,000. E.g. you save a 1,000 lives in situations where before, you would have saved only 1.

An example asymptoting function that is capped at 1,000. Notice how 2,000 is only slightly higher than 1,000, and everything after that is basically flat.
Now the utility of each additional life is diminishing very quickly. Saving 2,000 lives might have only 0.001% more utility than 1,000 lives.
This means that you would not take a 1% risk of losing 1,000 people, for a 99% chance at saving 2,000.
This is the exact opposite situation of Pascal's mugging! The probability of the reward is very high. Why are we refusing such an obviously good trade?
What we wanted to do was make it ignore really low probability bets. What we actually did was just make it stop caring about big rewards, regardless of the probability.
No modification to it can fix that. Because the utility function is totally indifferent to probability. That's what the decision procedure is for. That's where the real problem is.
In researching this topic I've seen all kinds of crazy resolutions to Pascal's Mugging. Some try to attack the exact thought experiment of an actual mugger. And miss the general problem of low probability events with large rewards. Others try to come up with clever arguments why you shouldn't pay the mugger. But not any general solution to the problem. And not one that works under the stated premises, where you care about saving human lives equally, and where you assign the mugger less than 1/3↑↑↑3 probability.
In fact Pascal's Mugger was originally written just to be a formalization of Pascal's original wager. Pascal's wager was dismissed for reasons like involving infinite utilities, and the possibility of an "anti-god" that exactly cancels the benefits out. Or that God wouldn't reward fake worshippers. People mostly missed the whole point about whether or not you should take low probability, high reward bets.
Pascal's Mugger showed that, no, it works fine in finite cases, and the probabilities do not have to exactly cancel each other out
Some people tried to fix the problem by adding hacks on top of the probability or utility functions. I argued against these solutions above. The problem is fundamentally with the decision procedure of expected utility.
I've spoken to someone who decided to just bite the bullet. He accepted that our intuition about big numbers is probably wrong, and we should just do what the math tells us.
But even that doesn't work. One of the points made in the original Pascal's Mugging post is that EU doesn't even converge. There is a hypothesis which has even less probability than the mugger, but promises 3↑↑↑↑3 utility. And a hypothesis even smaller than that which promises 3↑↑↑↑↑3 utility, and so on. Expected utility is utterly dominated by increasingly more improbable hypotheses. The expected utility of all actions approaches positive or negative infinity.
Expected utility is at the heart of the problem. We don't really want the average of our utility function over all possible worlds. No matter how big the numbers are or improbable they may be. We don't really want to trade away utility from the majority of our probability mass to infinitesimal slices of it.
The whole justification for EU being optimal in the infinite case, doesn't apply to the finite real world. The axioms that imply you need it to be consistent aren't true if you don't assume independence. So it's not sacred, and we can look at alternatives.
Median utility is just a first attempt at an alternative. We probably don't really want to maximize median utility either. Stuart Armstrong suggests using the mean of quantiles. There are probably better methods too. In fact there is an entire field of summary statistics and robust statistics, that I've barely looked at yet.
We can generalize and think of agents has having two utility functions. The regular utility function, which just gives a numerical value representing how preferable an outcome is. And a probability preference function, which gives a numerical value to each probability distribution of utilities.
Imagine we want to create an AI which acts the same as the agent would, given the same knowledge. Then we would need to know both of these functions. Not just the utility function. And they are both subjective, with no universally correct answer. Any function, so long as it converges (unlike expected utility), should produce perfectly consistent behavior.
Mean of quantiles
In a previous post, I looked at some of the properties of using the median rather than the mean.
Inspired by Househalter's comment, it seems we might be able to take a compromise between median and mean. It seems to me that simply taking the mean of the lower quartile, median, and upper quartile would also have the nice features I described, and would likely be closer to the mean.
Furthermore, there's no reason to stop there. We can take the mean of the n-1 n-quantiles.
Two questions:
- As n increases, does this quantity tend to the mean if it exists? (I suspect yes).
- For some distributions (eg Cauchy distribution) this quantity will tend to a limit as n increases, even if there is no mean. Is this an effective way of extending means to distributions that don't possess them?
Note the unlike the median approach, for large enough n, this maximiser will pay Pascal's mugger.
Median utility rather than mean?
tl;dr A median maximiser will expect to win. A mean maximiser will win in expectation. As we face repeated problems of similar magnitude, both types take on the advantage of the other. However, the median maximiser will turn down Pascal's muggings, and can say sensible things about distributions without means.
Prompted by some questions from Kaj Sotala, I've been thinking about whether we should use the median rather than the mean when comparing the utility of actions and policies. To justify this, see the next two sections: why the median is like the mean, and why the median is not like the mean.
Why the median is like the mean
The main theoretic justifications for the use of expected utility - hence of means - are the von Neumann Morgenstern axioms. Using the median obeys the completeness and transitivity axioms, but not the continuity and independence ones.
It does obey weaker forms of continuity; but in a sense, this doesn't matter. You can avoid all these issues by making a single 'ultra-choice'. Simply list all the possible policies you could follow, compute their median return, and choose the one with the best median return. Since you're making a single choice, independence doesn't apply.
So you've picked the policy πm with the highest median value - note that to do this, you need only know an ordinal ranking of worlds, not their cardinal values. In what way is this like maximising expected utility? Essentially, the more options and choices you have - or could hypothetically have - the closer this policy must be to expected utility maximalisation.
Assume u is a utility function compatible with your ordinal ranking of the worlds. Then πu = 'maximise the expectation of u' is also a policy choice. If we choose πm, we get a distribution dmu of possible values of u. Then E(u|πm) is within the absolute deviation (using dmu) of the median value of dmu. This absolute deviation always exists for any distribution with an expectation, and is itself bounded by the standard deviation, if it exists.
Thus maximising the median is like maximising the mean, with an error depending on the standard deviation. You can see it as a risk averse utility maximising policy (I know, I know - risk aversion is supposed to go in defining the utility, not in maximising it. Read on!). And as we face more and more choices, the standard deviation will tend to fall relative to the mean, and the median will cluster closer and closer to the mean.
For instance, suppose we consider the choice of whether to buckle our seatbelt or not. Assume we don't want to die in a car accident that a seatbelt could prevent; assume further that the cost of buckling a seatbelt is trivial but real. To simplify, suppose we have an independent 1/Ω chance of death every time we're in a car, and that a seatbelt could prevent this, for some large Ω. Furthermore, we will be in a car a total of ρΩ, for ρ < 0.5. Now, it seems, the median recommends a ridiculous policy: never wear seatbelts. Then you pay no cost ever, and your chance of dying is less than 50%, so this has the top median.
And that is indeed a ridiculous result. But it's only possible because we look at seatbelts in isolation. Every day, we face choices that have small chances of killing us. We could look when crossing the street; smoke or not smoke cigarettes; choose not to walk close to the edge of tall buildings; choose not to provoke co-workers to fights; not run around blindfolded. I'm deliberately including 'stupid things no-one sensible would ever do', because they are choices, even if they are obvious ones. Let's gratuitously assume that all these choices also have a 1/Ω chance of killing you. When you collect together all the possible choices (obvious or not) that you make in your life, this will be ρ'Ω choice, for ρ' likely quite a lot bigger than 1.
Assume that avoiding these choices has a trivial cost, incommensurable with dying (ie no matter how many times you have to buckle your seatbelt, it still better than a fatal accident). Now median-maximisation will recommend taking safety precautions for roughly (ρ'-0.5)Ω of these choices. This means that the decision of a median maximiser will be close to those of a utility maximiser - they take almost the same precautions - though the outcomes are still pretty far apart: the median maximiser accepts a 49.99999...% chance of death.
But now add serious injury to the mix (still assume the costs are incommensurable). This has a rather larger probability, and the median maximiser will now only accept a 49.99999...% chance of serious injury. Or add light injury - now they only accept a 49.99999...% chance of light injury. If light injuries are additive - two injuries are worse than one - then the median maximiser becomes even more reluctant to take risks. We can now relax the assumption of incommensurablility as well; the set of policies and assessments becomes even more complicated, and the median maximiser moves closer to the mean maximiser.
The same phenomena tends to happen when we add lotteries of decisions, chained decisions (decisions that depend on other decisions), and so on. Existential risks are interesting examples: from the selfish point of view, existential risks are just other things that can kills us - and not the most unlikely ones, either. So the median maximiser will be willing to pay a trivial cost to avoid an xrisk. Will a large group of median maximisers be willing to collectively pay a large cost to avoid an xrisk? That gets into superrationality, which I haven't considered yet in this context.
But let's turn back to the mystical utility function that we are trying to maximise. It's obvious that humans don't actually maximise a utility function; but according to the axioms, we should do so. Since we should, people on this list tend to often assume that we actually have one, skipping over the process of constructing it. But how would that process go? Let's assume we've managed to make our preferences transitive, already a major good achievement. How should we go about making them independent as well? We can do so as we go along. But if we do it ahead of time, chances are that we will be comparing hypothetical situations ("Do I like chocolate twice as much as sex? What would I think of a 50% chance of chocolate vs guaranteed sex? Well, it depends on the situation...") and thus construct a utility function. This is where we have to make decisions about very obscure and unintuitive hypothetical tradeoffs, and find a way to fold all our risk aversion/risk love into the utility.
When median maximising, we do exactly the same thing, except we constrain ourselves to choices that are actually likely to happen to us. We don't need a full ranking of all possible lotteries and choices; we just need enough to decide in the situations we are likely to face. You could consider this a form of moral learning (or preference learning). From our choices in different situations (real or possible), we decide what our preferences are in these situations, and this determines our preferences overall.
Why the median is not like the mean
Ok, so the previous paragraph argues that median maximising, if you have enough choices, functions like a clunky version of expected utility maximising. So what's the point?
The point is those situations that are not faced sufficiently often, or that have extreme characteristics. A median maximiser will reject Pascal's mugging, for instance, without any need for extra machinery (though they will accept Pascal's muggings if they face enough independent muggings, which is what we want - for stupidly large values of "enough"). They cope fine with distributions that have no means - such as the Cauchy distribution or a utility version of the St Petersburg paradox. They don't fall into paradox when facing choices with infinite (but ordered) rewards.
In a sense, median maximalisation is like expected utility maximalisation for common choices, but is different for exceptionally unlikely or high impact choices. Or, from the opposite perspective, expected utility maximising gives high probability of good outcomes for common choices, but not for exceptionally unlikely or high impact choices.
Another feature of the general idea (which might be seen as either a plus or a minus) is that it can get around some issues with total utilitarianism and similar ethical systems (such as the repugnant conclusion). What do I mean by this? Well, because the idea is that only choices that we actually expect to make matter, we can say, for instance, that we'd prefer a small ultra happy population to a huge barely-happy one. And if this is the only choice we make, we need not fear any paradoxes: we might get hypothetical paradoxes, just not actual ones. I won't put too much insistence on this point, I just thought it was an interesting observation.
For lack of a Cardinal...
Now, the main issue is that we might feel that there are certain rare choices that are just really bad or really good. And we might come to this conclusion by rational reasoning, rather than by experience, so this will not show up in the median. In these cases, it feels like we might want to force some kind of artificial cardinal order on the worlds, to make the median maximiser realise that certain rare events must be considered beyond their simple ordinal ranking.
In this case, maybe we could artificially add some hypothetical choices to our system, making us address these questions more than we actually would, and thus drawing them closer to the mean maximising situation. But there may be other, better ways of doing this.
Anyway, that's my first pass at constructing a median maximising system. Comments and critics welcome!
EDIT: We can use the absolute deviation (technically, the mean absolute deviation around the mean) to bound the distance between median and mean. This itself is bounded by the standard deviation, if it exists.
Pascal's Mugging - Penalizing the prior probability?
Eliezer Yudkowsky wrote that Robin Hanson solved the Pascal's mugging thought experiment:
Robin Hanson has suggested penalizing the prior probability of hypotheses which argue that we are in a surprisingly unique position to affect large numbers of other people who cannot symmetrically affect us. Since only one in 3^^^^3 people can be in a unique position to ordain the existence of at least 3^^^^3 other people who are not symmetrically in such a situation themselves, the prior probability would be penalized by a factor on the same order as the utility.
I don't quite get it, is there a post that discusses this solution in more detail?
To be more specific, if a stranger approached me, offering a deal saying, "I am the creator of the Matrix. If you fall on your knees, praise me and kiss my feet, I'll use my magic powers from outside the Matrix to run a Turing machine that simulates 3^^^^3 copies of you having their coherent extrapolated volition satisfied maximally for 3^^^^3 years." Why exactly would I penalize this offer by the amount of copies being offered to be simulated? I thought the whole point was that the utility, of having 3^^^^3 copies of myself experiencing maximal happiness, does outweigh the low probability of it actually happening and the disuility of doing what the stranger asks for?
I would love to see this problem being discussed again and read about the current state of knowledge.
I am especially interested in the following questions:
- Is the Pascal's mugging thought experiment a "reduction to the absurd" of Bayes’ Theorem in combination with the expected utility formula and Solomonoff induction?1
- Could the "mugger" be our own imagination?2
- At what point does an expected utility calculation resemble a Pascal's mugging scenario and should consequently be ignored?3
1 If you calculate the expected utility of various outcomes you imagine impossible alternative actions. The alternatives are impossible because you already precommited to choosing the outcome with the largest expected utility. Problems: 1.) You swap your complex values for a certain terminal goal with the highest expected utility, indeed your instrumental and terminal goals converge to become the expected utility formula. 2.) Your decision-making is eventually dominated by extremely small probabilities of obtaining vast utility.
2 Insignificant inferences might exhibit hyperbolic growth in utility: 1.) There is no minimum amount of empirical evidence necessary to extrapolate the expected utility of an outcome. 2.) The extrapolation of counterfactual alternatives is unbounded, logical implications can reach out indefinitely without ever requiring new empirical evidence.
3 Extrapolations work and often are the best we can do. But since there are problems like 'Pascal's Mugging', that we perceive to be undesirable and that lead to an infinite hunt for ever larger expected utility, I think it is reasonable to ask for some upper and lower bounds regarding the use and scope of certain heuristics. We agree that we are not going to stop pursuing whatever terminal goal we have chosen just because someone promises us even more utility if we do what that agent wants. We might also agree that we are not going to stop loving our girlfriend just because there are many people who do not approve our relationship and who together would experience more happiness if we divorced than the combined happiness of us and our girlfriend being married. Therefore we already informally established some upper and lower bounds. But when do we start to take our heuristics seriously and do whatever they prove to be the optimal decision?
= 783df68a0f980790206b9ea87794c5b6)
Subscribe to RSS Feed
= f037147d6e6c911a85753b9abdedda8d)