AlexMennen comments on Median utility rather than mean? - Less Wrong Discussion
I don't understand your argument that the median utility maximizer would buckle its seat belt in the real world. It seemed kind of like you might be trying to argue that median utility maximizers and expected utility maximizers would always approximate each other under realistic conditions. But you then argue that the alleged difference in their behavior on the Pascal's mugging problem is a reason to prefer median utility maximizers (implying that Pascal's mugging-type problems should be accepted as realistic, or at least that getting them correct is important in a way that getting "buckle my seatbelt, given that this is the only decision I will ever make" right isn't), so I guess that's not it.
But anyway, even if you are right that median utility maximizers buckle their seatbelts in the context of a realistic collection of choices, you concede that they do not buckle their seatbelts when the decision is isolated, and that this is the incorrect decision. I think you should take the fact that your proposal gets a really easy problem wrong much more seriously. If it can't get the seatbelt problem right, it is a bad algorithm, and bad algorithms should not be expected to perform well in real-world problems. I would give an example of a real-world problem that it performs poorly on, but I would have said something like the seatbelt problem, and since I don't understand your argument that it gets that right in the real world, I don't know what must be done in order to construct an example to which your argument does not apply.
Furthermore, I am unimpressed that median utility maximizers reject Pascal's mugging. If you take a random function from decision problems to decisions, there is about a 50% chance it will reject Pascal's mugging, but that doesn't make it a good decision theory. And median utility maximizers do not reject Pascal's mugging for correct reasons. To see this, note that if the seatbelt problem is considered in isolation, it looks exactly like the Pascal's mugging problem, in terms of all the information that median utility maximizers pay attention to, so median utility maximizers take analogous actions in each problem (don't bother putting your seatbelt on, and don't pay the mugger, respectively). However, there are important differences between the problems that make it correct to put your seatbelt on but not pay the mugger. Since a median utility maximizer does not consider these differences, its decision not to pay the mugger does not take into account the reasons that it is a good idea not to pay the mugger. It appears to me that you are not even really trying to come up with a way to make the right decisions for the right reasons, and instead you are merely trying to find a way to make the right decisions. I think that this approach is misguided, because the space of possible failure modes for a decision theory is vast, so if you successfully kludge together a decision procedure into performing well on a certain reasonably finite collection of decision problems, without ensuring that it arrives at its decisions in ways that make sense, the chances that it performs well on all decision problems, or even most of them, are vanishingly small.
Since you brought up the iterated Pascal's mugging, perhaps part of your motivation for this was to find something that would not pay in the isolated Pascal's mugging, but pay each time in the iterated Pascal's mugging? First of all, as literally stated, paying each time in the iterated Pascal's mugging isn't even an available option (I don't have $5 billion, so I can't pay off 1 billion muggers), so it is trivially false that the correct action is to pay every time. However, it is true that there are interpretations of what you could mean under which I would agree that paying is the correct action. But in those cases, an expected utility maximizer with a reasonable bounded utility function will pay, even while not paying in the standard Pascal's mugging problem. (The naive model of the situation in which iterating the problem does not change how an expected utility maximizer handles it does not correctly model the interpretation of "iterated Pascal's mugging" in which it makes sense to pay. I'd say what I mean, but actually keeping track of everything relevant to the problem makes it somewhat tedious to explain.)
It derives from the fact that median maximalisation doesn't consider decisions independently, even if their gains and losses are independent.
For illustration, consider the following deal: you pay £q, and get £1 with probability p. There are n independent deals (assume your utility is linear in £).
If n=1, the median maximiser accepts the deal iff q<1 and p>0.5. Not a very good performance! Now let's look at larger n. For m < n, accepting m deals gets you an expected reward of m(p-q). The median is a bit more complicated (see https://en.wikipedia.org/wiki/Binomial_distribution#Mode_and_median ), but it's within £1 of the mean reward.
So if p<q, the mean maximiser will reject all deals, and if p>q, it will accept all n deals.
For p<q, the median maximiser will accept at most 1/(q-p) deals. And for p>q, it will accept at least n - 1/(p-q) deals. In all cases, its expected loss, compared with the mean maximiser, is less than £1.
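That comparison can be checked numerically. Here is a quick sketch (my own illustration, not from the original comment) that computes the exact lower median of the binomial reward for each number of accepted deals and picks the best:

```python
from math import comb

def binom_median(m, p):
    """Lower median of Binomial(m, p): smallest k with P(X <= k) >= 1/2."""
    cum = 0.0
    for k in range(m + 1):
        cum += comb(m, k) * p**k * (1 - p)**(m - k)
        if cum >= 0.5:
            return k
    return m

def best_m_median(n, p, q):
    """How many of the n deals a median maximiser accepts: accepting m deals
    gives a median reward of median(Binomial(m, p)) - m*q."""
    return max(range(n + 1), key=lambda m: binom_median(m, p) - m * q)

def best_m_mean(n, p, q):
    """A mean (EU) maximiser accepts all n deals iff p > q, else none."""
    return n if p > q else 0

# p < q: each deal has negative expected value.
n, p, q = 100, 0.4, 0.5
m_med, m_mean = best_m_median(n, p, q), best_m_mean(n, p, q)
# The median maximiser accepts at most 1/(q - p) = 10 deals, the mean
# maximiser none; their expected rewards differ by less than £1.
```

Flipping to p = 0.6, q = 0.5 gives the mirror image: `best_m_median(100, 0.6, 0.5)` comes out within 1/(p - q) = 10 deals of accepting all 100.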
There's a similar effect going on when considering the seat-belt situation. Aggregation concentrates the distribution in a way that moves median and mean towards each other.
You appear to now be making an argument that you already conceded was incorrect in the OP:
You then go on to say that if the agent also faces many decisions of a different nature, it won't do that. That's where I get lost.
The median maximiser accepts a 49.99999...% chance of death, only because "death", "trivial cost" and "no cost" are the only options here. If I add "severe injury" and "light injury" to the outcomes, the maximiser will now accept less than a 49.9999...% chance of light injury. If we make light injury additive, and make the trivial cost also additive and not incomparable to light injuries, we get something closer to my illustrative example above.
Suppose it comes up with 2 possible policies, one of which involves a 49% chance of death and no chance of injury, and another which involves a 49% chance of light injury, and no chance of heavy injury or death. The median maximizer sees no reason to prefer the second policy if they have the same effects the other 51% of the time.
Er, yes, constructing single-choice examples where the median behaves oddly/wrongly is trivial. My whole point is about what happens to the median when you aggregate decisions.
You were claiming that in a situation where a median-maximizing agent has a large number of trivially inconvenient actions that prevent small risks of death, heavy injury, or light injury, then it would accept a 49% chance of light injury, but you seemed to imply that it would not accept a 49% chance of death. I was trying to point out that this appears to be incorrect.
I'm not entirely sure what your objection is; we seem to be talking at cross purposes.
Let's try it simpler. If we assume that the cost of buckling seat belts is incommensurable (in practice) with light injury (and heavy injury, and death), then the median maximising agent will accept a 49.99..% chance of (light injury or heavy injury or death), over their lifetime. Since light injury is much more likely than death, this in effect forces the probability of death down to a very low amount.
It's just an illustration of the general point that median maximising seems to perform much better in real-world problems than its failure in simple theoretical ones would suggest.
No, it doesn't. That does not address the fact that the agent will not preferentially accept light injury over death. Adopting a policy of immediately committing suicide once you've been injured enough to force you into the bottom half of outcomes does not decrease median utility. The agent has no incentive to prevent further damage once it is in the bottom half of outcomes. As a less extreme example, the value of house insurance to a median maximizer is 0, just because losing your house is a bad outcome even if you get insurance money for it. This isn't a weird hypothetical that relies on it being an isolated decision; it's a real-life decision that a median maximizer would get wrong.
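The insurance point can be made concrete with a toy numeric model (my numbers, not from the thread): wealth 100, a house worth 90 that burns down with probability 1%, a full-coverage premium of 1, and log utility in wealth.

```python
from math import log

p_fire = 0.01
no_ins = {log(100.0): 1 - p_fire, log(10.0): p_fire}  # uninsured: keep 100, or drop to 10
ins = {log(99.0): 1.0}                                # insured: wealth is 99 either way

def mean(dist):
    """Expected utility of a {utility: probability} distribution."""
    return sum(u * p for u, p in dist.items())

def median(dist):
    """Smallest utility u with cumulative probability >= 1/2."""
    cum = 0.0
    for u in sorted(dist):
        cum += dist[u]
        if cum >= 0.5:
            return u

# The EU maximiser buys (risk aversion from the concave utility) ...
assert mean(ins) > mean(no_ins)
# ... but the median maximiser only ever sees the premium, and declines.
assert median(ins) < median(no_ins)
```

Insurance only changes outcomes in the bottom 1% of cases, so it cannot raise the median, while the premium lowers it in the other 99%.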
A more general way of stating how multiple decisions improve median maximalisation: the median maximaliser is indifferent to outcomes not at the median (eg suicide vs light injury). But as the decision tree grows and the number of possible situations does as well, the probability increases that outcomes not at the median in a one-shot problem will affect the median in the more complex situation.
Look, we're arguing past each other here. My logical response would be to add more options to the system, which would remove the problem you identified. (And I don't understand your house insurance example - it's just the seat-belt decision again as a one-shot, and I would address it by looking at all the financial decisions you make in your life - and if that's not enough, all the decisions, including all the "don't do something clearly stupid and pointless" ones.)
What I think is clear is:
a) Median maximalisation makes bad decisions in isolated problems.
b) If we combine all the likely decisions that a median maximiser will have to make, the quality of the decisions increases.
If you want to argue against it, either say that a) is bad enough that we should reject the approach anyway, even if it decides well in practice, or find examples where a median maximaliser will make bad decisions even in the real world (if you would pay Pascal's mugger, then you could use that as an example).
How do you know that it's right to buckle your seatbelt, if you are only going to ride in a car once and never again, there are no other risks to your life, and so no need to make a general policy against taking small risks?
I'm not confident that it's actually the wrong choice. And if it is, it shouldn't matter much. 99.99% of the time, the median maximizer will come out with higher utility than the EU maximizer.
This is generalizable. If there was a "utility competition" between different decision policies in the same situations, the median utility would usually come out on top. As the possible outcomes become more extreme and unlikely, expected utility will do worse and worse, with Pascal's mugging at the extreme.
That's because EU trades away utility from the majority of possible outcomes, to really really unlikely outcomes. Outliers can really skew the mean of a distribution, and EU is just the mean.
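As a quick numeric illustration of that skew (my numbers, not from the comment): a single astronomically good, astronomically unlikely outcome dominates the expectation while leaving the median untouched.

```python
# A lottery as a {utility: probability} distribution.
outcomes = {0.0: 0.999999, 1e12: 0.000001}

mean_u = sum(u * p for u, p in outcomes.items())  # about a million utils
median_u = 0.0  # the 50th-percentile outcome is the overwhelmingly common one
# An EU maximiser values this lottery at ~1e6; a median maximiser at 0.
```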
Of course median can be exploited too. Perhaps there is some compromise between them that gets the behavior we want. There are an infinite number of possible policies for deciding which distribution of utilities to prefer.
EU was chosen because it is the only one that meets a certain set of conditions and is perfectly consistent. But if you allow for algorithms that select overall policies instead of decisions, like OP does, then you can make many different algorithms consistent.
So there is no inherent reason to prefer mean over median. It just comes down to personal preference, and subjective values. What probability distribution of utilities do you prefer?
I do think that the isolation of the decision is a red herring, but for the sake of the point I was trying to make, it is probably easier to replace the example with a structurally similar one in which the right answer is obvious: suppose you have the opportunity to press a button that will kill you with 49% probability, and give you $5 otherwise. This is the only decision you will ever make. Should you press the button?
As I was saying in my previous comment, I think that's the wrong approach. It isn't enough to kludge together a decision procedure that does what you want on the problems you thought of, because then it will do something you don't want on something you haven't thought of. You need a decision procedure that will reliably do the right thing, and in order to get that, you need it to do the right thing for the right reasons. EU maximization, applied properly, will tell you to do the correct things, and will do so for the correct reasons.
Actually, there is: https://en.wikipedia.org/wiki/Von_Neumann%E2%80%93Morgenstern_utility_theorem
Yes, I said that median utility is not optimal. I'm proposing that there might be policies better than both EU and median.
Please reread the OP and my comment. If you allow selection over policies instead of individual decisions, you can be perfectly consistent. EU and median are both special cases of ways to pick policies, based on the probability distribution of utility they produce.
There is no law of the universe that some procedures are correct and others aren't. You just have to pick one that you like, and your choice is going to be arbitrary.
If you go with EU you are Pascal's muggable. If you go with median you are muggable in certain cases as well (though you should usually, with >50% probability, end up with better outcomes in the long run, whereas EU could possibly fail 100% of the time - so median is exploitable, but less exploitable at least).
I don't see how selecting policies instead of actions removes the motivation for independence.
Ultimately, it isn't the policy that you care about; it's the outcome. So you should pick a policy because you like the probability distributions over outcomes that you get from implementing it more than you like the probability distributions over outcomes that you would get from implementing other policies. Since there are many decision problems to use your policy on, this quite heavily constrains what policy you choose. In order to get a policy that reliably picks the actions that you decide are correct in the situations where you can tell what the correct action is, it will have to make those decisions for the same reason you decided that it was the best action (or at least something equivalent to or approximating the same reason). So no, the choice of policy is not at all arbitrary.
That is not true. EU maximizers with bounded utility functions reject Pascal's mugging.
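A toy calculation (my numbers, purely illustrative) of why a bound on the utility function blocks the mugging: however good the promised outcome, its contribution to expected utility is capped at p times the bound.

```python
U_MAX = 1000.0        # assumed bound on the utility function
p_deliver = 1e-9      # credence that the mugger can actually deliver
cost = 5.0            # utility of the $5 handed over (linear for small sums)

eu_pay = p_deliver * U_MAX - cost   # at best 1e-6 - 5, clearly negative
eu_refuse = 0.0
# The bounded EU maximiser refuses, and for the right reason: the promised
# reward can no longer grow without limit to swamp the cost.
```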
There are two reasons to like independence. First of all, you might like it for philosophical/aesthetic reasons: "these things really should be independent, these really should be irrelevant". Or you could like it because it prevents you from being money pumped.
When considering policies, money pumping is (almost) no longer an issue, because a policy that allows itself to be money-pumped is (almost) certainly inferior to one that doesn't. So choosing policies removes one of the motivations for independence, to my mind the important one.
While it's true that this does not tell you to pay each time to switch the outcomes around in a circle over and over again, it still falls prey to one step of a similar problem. Suppose there are 3 possible outcomes: A, B, and C, and there are 2 possible scenarios: X and Y. In scenario X, you get to choose between A and B. In scenario Y, you can attempt to choose between A and B, and you get what you picked with 50% probability, and you get outcome C otherwise. In each scenario, this is the only decision you will ever make. Suppose in scenario X, you prefer A over B, but in scenario Y, you prefer (B+C)/2 over (A+C)/2. But suppose you had to pay to pick A in scenario X, and you had to pay to pick (B+C)/2 in scenario Y, and you still make those choices. If Y is twice as likely as X a priori, then you are paying to get a probability distribution over outcomes that you could have gotten for free by picking B given X, and (A+C)/2 given Y. Since each scenario only involves you ever getting to make one decision, picking a policy is equivalent to picking a decision.
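The bookkeeping in that example can be checked mechanically. A sketch (my own, with the prior P(X) = 1/3, P(Y) = 2/3 taken from "Y is twice as likely as X"):

```python
from fractions import Fraction as F

pX, pY = F(1, 3), F(2, 3)

def dist(choice_in_X, choice_in_Y):
    """Overall distribution over outcomes A, B, C: in scenario X you get
    your choice; in Y you get your choice with probability 1/2, else C."""
    d = {'A': F(0), 'B': F(0), 'C': F(0)}
    d[choice_in_X] += pX
    d[choice_in_Y] += pY * F(1, 2)
    d['C'] += pY * F(1, 2)
    return d

paid = dist('A', 'B')  # pay to pick A given X, and (B+C)/2 given Y
free = dist('B', 'A')  # pick B given X, and (A+C)/2 given Y, for free
assert paid == free    # same outcome distribution, so the payments bought nothing
```

Both policies yield outcomes A, B, C with probability 1/3 each, so the paying agent is strictly worse off.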
Your example is difficult to follow, but I think you are missing the point. If there is only one decision, then its actions can't be inconsistent. By choosing a policy only once - one that maximizes its desired probability distribution of utility outcomes - it's not money pumpable, and it's not inconsistent.
Now by itself it still sucks because we probably don't want to maximize for the best median future. But it opens up the door to more general policies for making decisions. You no longer have to use expected utility if you want to be consistent. You can choose a tradeoff between expected utility and median utility (see my top level comment), or a different algorithm entirely.
If there is only one decision point in each possible world, then it is impossible to demonstrate inconsistency within a world, but you can still be inconsistent between different possible worlds.
Edit: as V_V pointed out, the VNM framework was designed to handle isolated decisions. So if you think that considering an isolated decision rather than multiple decisions removes the motivation for the independence axiom, then you have misunderstood the motivation for the independence axiom.
I understand the two motivations for the independence axiom, and the practical one ("you can't be money pumped") is much more important than the theoretical one ("your system obeys this here philosophically neat understanding of irrelevant information").
But this is kind of a moot point, because humans don't have utility functions. And therefore we will have to construct them. And the process of constructing them is almost certainly going to depend on facts about the world, making the construction process almost certainly inconsistent between different possible worlds.
It can't be inconsistent within a world no matter how many decision points there are. If we agree it's not inconsistent, then what are you arguing against?
I don't care about the VNM framework. As you said, it is designed to be optimal for decisions made in isolation. Because we don't need to make decisions in isolation, we don't need to be constrained by it.