Then E(u|πm) is within one standard deviation (using dmu) of the median value of dmu.
As Wikipedia says, "If the distribution has finite variance". That's not necessarily a good assumption.
Consider a policy with three possible outcomes: one pony; two ponies; the universe is converted to paperclips. What's the median outcome? One pony. Don't you want a pony?
The median is a robust estimator meaning that it's harder for outliers to screw you up. The price for that, though, is indifference to the outliers which I am not sure is advisable in the utility context.
It's obvious that humans don't actually maximise a utility function; but according to the axioms, we should do so.
Given a choice between "change people" and "change axioms", I'd be inclined to change axioms.
This seems to be a case of trying to find easy solutions to hard abstract problems at the cost of failing to be correct on easy and ordinary ones. It's also fairly trivial to come up with abstract scenarios where this fails catastrophically, so it's not like this wins on the abstract scenarios front either. It just fails on a new and different set of problems - ones that aren't talked about because no-one's ever found a way to fail on them before.
Also, all of the problems you list it solving are problems which I would consider to be satisfactorily solved a...
I posted this exact idea a few months ago. There was a lot of discussion about it which you might find interesting. We also discussed it recently on the irc channel.
Median utility by itself doesn't work. I came up with an algorithm that compromises between them. In everyday circumstances it behaves like expected utility. In extreme cases, it behaves like median utility. And it has tunable parameters:
...sample n counterfactuals from your probability distribution. Then take the average of these n outcomes, [EDIT: and do this an infinite amount of times, and t
Median expected behavior is simple which makes it easy to calculate.
As an electrical engineer, when I design circuits I start off by assuming that all my parts behave exactly as rated. If a resistor says it's 220 ±10% Ohms then I use 220 for my initial calculations. Assuming median behavior works wonderfully in telling me what my circuit probably will do.
In fact that's good enough info for me to base my design decision on for a lot of purposes (given a quick verification of functionality, of course).
But what about that 10%? What if it might matter? On...
"Assume that avoiding these choices has a trivial cost, incommensurable with dying (ie no matter how many times you have to buckle your seatbelt, it still better than a fatal accident)."
Suppose you had a choice: die in a plane crash, or listen to those plane safety announcements one million times. I choose dying in a plane crash.
I don't understand your argument that the median utility maximizer would buckle its seat belt in the real world. It seemed kind of like you might be trying to argue that median utility maximizers and expected utility maximizers would always approximate each other under realistic conditions, but since you then argue that the alleged difference in their behavior on the Pascal's mugging problem is a reason to prefer median utility maximizers (implying that Pascal's mugging-type problems should be accepted as realistic, or at least that getting them correct is important in a way that getting "buckle my seatbelt, given that this is the only decision I will ever make" right isn't), so I guess that's not it.
But anyway, even if you are right that median utility maximizers buckle their seatbelts in the context of a realistic collection of choices, you concede that they do not buckle their seatbelts when the decision is isolated, and that this is the incorrect decision. I think you should take the fact that your proposal gets a really easy problem wrong much more seriously. If it can't get the seatbelt problem right, it is a bad algorithm, and bad algorithms should not be expected to perform well in real-world problems. I would give an example of a real-world problem that it performs poorly on, but I would have said something like the seatbelt problem, and since I don't understand your argument that it gets that right in the real world, I don't know what must be done in order to construct an example to which your argument does not apply.
Furthermore, I am unimpressed that median utility maximizers reject Pascal's mugging. If you take a random function from decision problems to decisions, there is about a 50% chance it will reject Pascal's mugging, but that doesn't make it a good decision theory. And median utility maximizers do not reject Pascal's mugging for correct reasons. To see this, note that if the seatbelt problem is considered in isolation, it looks exactly like the Pascal's mugging problem, in terms of all the information that median utility maximizers pay attention to, so median utility maximizers take analogous actions in each problem (don't bother putting your seatbelt on, and don't pay the mugger, respectively). However, there are important differences between the problems that make it correct to put your seatbelt on but not pay the mugger. Since a median utility maximizer does not consider these differences, its decision not to pay the mugger does not take into account the reasons that it is a good idea not to pay the mugger. It appears to me that you are not even really trying to come up with a way to make the right decisions for the right reasons, and instead you are merely trying to find a way to make the right decisions. I think that this approach is misguided, because the space of possible failure modes for a decision theory is vast, so if you successfully kludge together a decision procedure into performing well on a certain reasonably finite collection of decision problems, without ensuring that it arrives at its decisions in ways that make sense, the chance that it performs well on all decision problems, or even most of them, is vanishingly small.
Since you brought up the iterated Pascal's mugging, perhaps part of your motivation for this was to find something that would not pay in the isolated Pascal's mugging, but pay each time in the iterated Pascal's mugging? First of all, as literally stated, paying each time in the iterated Pascal's mugging isn't even an available option (I don't have $5 billion, so I can't pay off 1 billion muggers), so it is trivially false that the correct action is to pay every time. However, it is true that there are interpretations of what you could mean under which I would agree that paying is the correct action. But in those cases, an expected utility maximizer with a reasonable bounded utility function will pay, even while not paying in the standard Pascal's mugging problem. (The naive model of the situation in which iterating the problem does not change how an expected utility maximizer handles it does not correctly model the interpretation of "iterated Pascal's mugging" in which it makes sense to pay. I'd say what I mean, but actually keeping track of everything relevant to the problem makes it somewhat tedious to explain.)
I don't understand your argument that the median utility maximizer would buckle its seat belt in the real world.
It derives from the fact that median maximalisation doesn't consider decisions independently, even if their gains and losses are independent.
For illustration, compare the following deal: you pay £q, and get £1 with probability p. There are n independent deals (assume your utility is linear in £).
If n=1, the median maximiser accepts the deal iff q<1 and p>0.5. Not a very good performance! Now let's look at larger n. For m < n, accepting m deals gets ...
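A minimal sketch of this comparison, assuming linear utility as stated and assuming the agent either accepts all n deals or none (the function names are illustrative, not from the comment). Accepting all n deals pays (wins - n*q), where wins follows a Binomial(n, p) distribution, so the median payoff is the binomial median minus n*q.

```python
from math import comb

def binomial_median(n, p):
    """Lower median of Binomial(n, p): smallest k with P(X <= k) >= 0.5."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += comb(n, k) * p**k * (1 - p)**(n - k)
        if cumulative >= 0.5:
            return k
    return n

def median_accepts_all(n, p, q):
    """True if taking all n deals has a strictly better median than taking none (payoff 0)."""
    return binomial_median(n, p) - n * q > 0

def mean_accepts_all(n, p, q):
    """True if taking all n deals has a strictly better expectation than taking none."""
    return n * p - n * q > 0

# Hypothetical numbers: a good deal (p > q) that rarely pays off.
single = median_accepts_all(1, 0.4, 0.1)     # median of one deal is a loss of q
repeated = median_accepts_all(100, 0.4, 0.1) # median of 100 deals is a gain
```

With these made-up values the median maximiser turns down the single deal even though it is favourable in expectation, but takes the bundle of 100, which is the convergence the comment is describing.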
tl;dr A median maximiser will expect to win. A mean maximiser will win in expectation. As we face repeated problems of similar magnitude, both types take on the advantage of the other. However, the median maximiser will turn down Pascal's muggings, and can say sensible things about distributions without means.
Prompted by some questions from Kaj Sotala, I've been thinking about whether we should use the median rather than the mean when comparing the utility of actions and policies. To justify this, see the next two sections: why the median is like the mean, and why the median is not like the mean.
Why the median is like the mean
The main theoretic justifications for the use of expected utility - hence of means - are the von Neumann Morgenstern axioms. Using the median obeys the completeness and transitivity axioms, but not the continuity and independence ones.
It does obey weaker forms of continuity; but in a sense, this doesn't matter. You can avoid all these issues by making a single 'ultra-choice'. Simply list all the possible policies you could follow, compute their median return, and choose the one with the best median return. Since you're making a single choice, independence doesn't apply.
So you've picked the policy πm with the highest median value - note that to do this, you need only know an ordinal ranking of worlds, not their cardinal values. In what way is this like maximising expected utility? Essentially, the more options and choices you have - or could hypothetically have - the closer this policy must be to expected utility maximalisation.
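As a rough illustration of the 'ultra-choice' and of why only an ordinal ranking is needed, here is a small sketch. The policy names, ranks and probabilities are invented for the example, and ties are broken by taking the lower median, which is an assumption the post does not specify.

```python
def median_rank(lottery):
    """lottery: list of (rank, probability) pairs; a higher rank means a better world.
    Returns the lower median: the first rank at which cumulative probability reaches 0.5."""
    cumulative = 0.0
    for rank, prob in sorted(lottery):
        cumulative += prob
        if cumulative >= 0.5:
            return rank
    return max(rank for rank, _ in lottery)

def best_median_policy(policies):
    """policies: dict mapping a policy name to its lottery over ranked worlds."""
    return max(policies, key=lambda name: median_rank(policies[name]))

# Hypothetical policies over three worlds ranked 1 < 2 < 3.
policies = {
    "safe": [(2, 1.0)],
    "gamble": [(1, 0.4), (3, 0.6)],
}
# The same worlds with the ranks relabelled by an order-preserving map.
relabelled = {
    "safe": [(20, 1.0)],
    "gamble": [(-5, 0.4), (1000, 0.6)],
}
choice = best_median_policy(policies)
choice_relabelled = best_median_policy(relabelled)
```

Both calls pick the same policy, because the median only depends on the order of the worlds and not on how far apart their values are.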
Assume u is a utility function compatible with your ordinal ranking of the worlds. Then πu = 'maximise the expectation of u' is also a policy choice. If we choose πm, we get a distribution dmu of possible values of u. Then E(u|πm) is within the absolute deviation (using dmu) of the median value of dmu. This absolute deviation always exists for any distribution with an expectation, and is itself bounded by the standard deviation, if it exists.
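The bound can be checked numerically. The chain is |mean - median| <= E|u - median| <= E|u - mean| <= standard deviation, where the second step uses the fact that the median minimises the expected absolute deviation and the last step is Jensen's inequality. The sketch below is only a sanity check on one made-up discrete distribution, not a proof.

```python
from math import sqrt

def mean_median_bounds(outcomes):
    """outcomes: list of (utility, probability) pairs with probabilities summing to 1.
    Returns (mean, lower median, mean absolute deviation around the mean, standard deviation)."""
    ordered = sorted(outcomes)
    mean = sum(u * p for u, p in ordered)
    cumulative = 0.0
    median = ordered[-1][0]
    for u, p in ordered:
        cumulative += p
        if cumulative >= 0.5:
            median = u
            break
    abs_dev = sum(abs(u - mean) * p for u, p in ordered)
    std_dev = sqrt(sum((u - mean) ** 2 * p for u, p in ordered))
    return mean, median, abs_dev, std_dev

# Hypothetical skewed distribution: usually nothing, occasionally a large payoff.
mean, median, abs_dev, std_dev = mean_median_bounds([(0, 0.6), (1, 0.3), (10, 0.1)])
gap = abs(mean - median)
```

For this distribution the gap between mean and median sits below the absolute deviation, which in turn sits below the standard deviation, as the argument requires.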
Thus maximising the median is like maximising the mean, with an error depending on the standard deviation. You can see it as a risk averse utility maximising policy (I know, I know - risk aversion is supposed to go in defining the utility, not in maximising it. Read on!). And as we face more and more choices, the standard deviation will tend to fall relative to the mean, and the median will cluster closer and closer to the mean.
For instance, suppose we consider the choice of whether to buckle our seatbelt or not. Assume we don't want to die in a car accident that a seatbelt could prevent; assume further that the cost of buckling a seatbelt is trivial but real. To simplify, suppose we have an independent 1/Ω chance of death every time we're in a car, and that a seatbelt could prevent this, for some large Ω. Furthermore, we will be in a car a total of ρΩ times, for ρ < 0.5. Now, it seems, the median recommends a ridiculous policy: never wear seatbelts. Then you pay no cost ever, and your chance of dying is less than 50%, so this has the top median.
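To make the isolated seatbelt case concrete, here is a short sketch with invented values for Ω and ρ (the post leaves both unspecified). It computes the chance of at least one fatal accident over ρΩ independent unbelted trips.

```python
def death_probability(trips, omega):
    """Chance of at least one fatal accident over independent trips, each with risk 1/omega."""
    return 1 - (1 - 1 / omega) ** trips

omega = 1_000_000           # hypothetical value of Ω
rho = 0.4                   # hypothetical value of ρ, below 0.5
trips = round(rho * omega)  # total number of car trips, ρΩ

p_death = death_probability(trips, omega)
# The median outcome of "never buckle" is the outcome holding at least half the probability mass.
median_outcome_unbuckled = "alive, no cost paid" if p_death < 0.5 else "dead"
```

Since the chance of death stays under one half, the median world for the never-buckle policy is one where you survive and never paid the buckling cost, which beats the median world of always buckling.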
And that is indeed a ridiculous result. But it's only possible because we look at seatbelts in isolation. Every day, we face choices that have small chances of killing us. We could look when crossing the street; smoke or not smoke cigarettes; choose not to walk close to the edge of tall buildings; choose not to provoke co-workers to fights; not run around blindfolded. I'm deliberately including 'stupid things no-one sensible would ever do', because they are choices, even if they are obvious ones. Let's gratuitously assume that all these choices also have a 1/Ω chance of killing you. When you collect together all the possible choices (obvious or not) that you make in your life, this will be ρ'Ω choices, for ρ' likely quite a lot bigger than 1.
Assume that avoiding these choices has a trivial cost, incommensurable with dying (i.e. no matter how many times you have to buckle your seatbelt, it is still better than a fatal accident). Now median-maximisation will recommend taking safety precautions for roughly (ρ'-0.5)Ω of these choices. This means that the decisions of a median maximiser will be close to those of a utility maximiser - they take almost the same precautions - though the outcomes are still pretty far apart: the median maximiser accepts a 49.99999...% chance of death.
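A quick sketch of how many precautions the median maximiser can skip, using invented values for Ω and ρ'. One caveat: the "roughly 0.5Ω" figure is what you get by adding the risks linearly. If the risks are treated as independent and compounded, survival stays above 50% for up to about ln(2)·Ω ≈ 0.69Ω skipped precautions. The qualitative conclusion is unchanged.

```python
from math import floor, log, log1p

def max_skippable(omega):
    """Largest k with (1 - 1/omega)**k >= 0.5, so survival stays at least 50%."""
    return floor(log(0.5) / log1p(-1 / omega))

omega = 1_000_000   # hypothetical value of Ω
rho_prime = 3       # hypothetical value of ρ', larger than 1
total_choices = rho_prime * omega

skipped = max_skippable(omega)
precautions_taken = total_choices - skipped
p_death = 1 - (1 - 1 / omega) ** skipped  # just under one half
```

With these numbers the median maximiser still takes precautions on more than three quarters of its risky choices, while sitting right at the edge of a 50% chance of death.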
But now add serious injury to the mix (still assume the costs are incommensurable). This has a rather larger probability, and the median maximiser will now only accept a 49.99999...% chance of serious injury. Or add light injury - now they only accept a 49.99999...% chance of light injury. If light injuries are additive - two injuries are worse than one - then the median maximiser becomes even more reluctant to take risks. We can now relax the assumption of incommensurability as well; the set of policies and assessments becomes even more complicated, and the median maximiser moves closer to the mean maximiser.
The same phenomenon tends to happen when we add lotteries of decisions, chained decisions (decisions that depend on other decisions), and so on. Existential risks are interesting examples: from the selfish point of view, existential risks are just other things that can kill us - and not the most unlikely ones, either. So the median maximiser will be willing to pay a trivial cost to avoid an xrisk. Will a large group of median maximisers be willing to collectively pay a large cost to avoid an xrisk? That gets into superrationality, which I haven't considered yet in this context.
But let's turn back to the mystical utility function that we are trying to maximise. It's obvious that humans don't actually maximise a utility function; but according to the axioms, we should do so. Since we should, people on this list tend to often assume that we actually have one, skipping over the process of constructing it. But how would that process go? Let's assume we've managed to make our preferences transitive, already a major good achievement. How should we go about making them independent as well? We can do so as we go along. But if we do it ahead of time, chances are that we will be comparing hypothetical situations ("Do I like chocolate twice as much as sex? What would I think of a 50% chance of chocolate vs guaranteed sex? Well, it depends on the situation...") and thus construct a utility function. This is where we have to make decisions about very obscure and unintuitive hypothetical tradeoffs, and find a way to fold all our risk aversion/risk love into the utility.
When median maximising, we do exactly the same thing, except we constrain ourselves to choices that are actually likely to happen to us. We don't need a full ranking of all possible lotteries and choices; we just need enough to decide in the situations we are likely to face. You could consider this a form of moral learning (or preference learning). From our choices in different situations (real or possible), we decide what our preferences are in these situations, and this determines our preferences overall.
Why the median is not like the mean
Ok, so the previous section argues that median maximising, if you have enough choices, functions like a clunky version of expected utility maximising. So what's the point?
The point is those situations that are not faced sufficiently often, or that have extreme characteristics. A median maximiser will reject Pascal's mugging, for instance, without any need for extra machinery (though they will accept Pascal's muggings if they face enough independent muggings, which is what we want - for stupidly large values of "enough"). They cope fine with distributions that have no means - such as the Cauchy distribution or a utility version of the St Petersburg paradox. They don't fall into paradox when facing choices with infinite (but ordered) rewards.
In a sense, median maximalisation is like expected utility maximalisation for common choices, but is different for exceptionally unlikely or high impact choices. Or, from the opposite perspective, expected utility maximising gives high probability of good outcomes for common choices, but not for exceptionally unlikely or high impact choices.
Another feature of the general idea (which might be seen as either a plus or a minus) is that it can get around some issues with total utilitarianism and similar ethical systems (such as the repugnant conclusion). What do I mean by this? Well, because the idea is that only choices that we actually expect to make matter, we can say, for instance, that we'd prefer a small ultra happy population to a huge barely-happy one. And if this is the only choice we make, we need not fear any paradoxes: we might get hypothetical paradoxes, just not actual ones. I won't put too much insistence on this point, I just thought it was an interesting observation.
For lack of a Cardinal...
Now, the main issue is that we might feel that there are certain rare choices that are just really bad or really good. And we might come to this conclusion by rational reasoning, rather than by experience, so this will not show up in the median. In these cases, it feels like we might want to force some kind of artificial cardinal order on the worlds, to make the median maximiser realise that certain rare events must be considered beyond their simple ordinal ranking.
In this case, maybe we could artificially add some hypothetical choices to our system, making us address these questions more than we actually would, and thus drawing them closer to the mean maximising situation. But there may be other, better ways of doing this.
Anyway, that's my first pass at constructing a median maximising system. Comments and criticism welcome!
EDIT: We can use the absolute deviation (technically, the mean absolute deviation around the mean) to bound the distance between median and mean. This itself is bounded by the standard deviation, if it exists.