Kindly comments on Open thread, Mar. 2 - Mar. 8, 2015 - Less Wrong Discussion
I don't agree. Utility is a separate concept from expected value maximization. Utility is a way of ordering and comparing different outcomes based on how desirable they are. You can say that one outcome is more desirable than another, or even quantify how many times more desirable it is. This is a useful and general concept.
Expected utility does have some nice properties, such as being completely consistent. However, I argued above that this isn't a necessary property. It adds complexity, sure, but if you self-modify your decision-making algorithm or predetermine your actions, you can force your future self to be consistent with your present self's desires.
Expected utility is perfectly rational as the number of "bets" you take goes to infinity. Rewards will cancel out the losses in the limit, and so any agent would choose to follow EU regardless of its decision-making algorithm. But when the number of bets is finite, it's less obvious that this is the most desirable strategy.
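That limiting claim can be illustrated with a quick simulation (a sketch with a made-up bet, assuming independent identically distributed payoffs so the law of large numbers applies):

```python
import random

# Hypothetical bet: lose $5 with probability 1/3, win $4 with probability 2/3,
# so its expected value is (-5)(1/3) + (4)(2/3) = +$1 per bet.
def bet():
    return -5 if random.random() < 1/3 else 4

random.seed(0)  # fixed seed for reproducibility
for n in (10, 100_000):
    total = sum(bet() for _ in range(n))
    print(n, total / n)  # the per-bet average approaches the EV of 1 as n grows
```

With few bets the average swings widely; with many, it converges to the expected value, which is why EU maximization dominates in the limit.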
Pascal's Mugging isn't "weird"; it's perfectly typical. There are probably an infinite number of Pascal's-mugging-type situations: hypotheses with exceedingly low probability but high utility.
If we built an AI today based on pure expected utility, it would most likely fail spectacularly. These low-probability hypotheses would come to totally dominate its decisions. Perhaps it would start to worship various gods, practice rituals, and obey superstitions. Or something far more absurd that we haven't even thought of.
And if you really believe in EU, you can't say that this behavior is wrong or undesirable. This is what you should be doing, if you could, and you are losing a huge amount of EU by not doing it. You should want, more than anything in existence, the ability to exactly calculate these hypotheses so you can collect that EU.
I don't want that though. I want a decision rule such that I am very likely to end up in a good outcome. Not one where I will most likely end up in a very suboptimal outcome, with an infinitesimal probability of winning the infinite-utility lottery.
That's not the way in which maximizing expected utility is perfectly rational.
The way it's perfectly rational is this. Suppose you have any decision making algorithm; if you like, it can have an internal variable called "utility" that lets it order and compare different outcomes based on how desirable they are. Then either:
the algorithm has some ugly behavior with respect to a finite collection of bets (for instance, there are three bets A, B, and C such that it prefers A to B, B to C, and C to A), or
the algorithm is equivalent to one which maximizes the expected value of some utility function: maybe the one that your internal variable was measuring, maybe not.
The first condition does not hold, since the median gives a consistent value to any probability distribution of utilities. The second condition does not hold either, since the median function is not merely a transform of the mean function.
I'm not sure what the "ugly" behavior you describe is, and I bet it rests on some assumption that's too strong. I already mentioned how inconsistent behavior can be fixed by allowing the algorithm to predetermine its actions.
You can look up the von Neumann–Morgenstern axioms for yourself. It's hard to say whether or not they're too strong.
The problem with "allowing [the median algorithm] to predetermine its actions" is that in this case, I no longer know what the algorithm outputs in any given case. Maybe we can resolve this by considering a case when the median algorithm fails, and you can explain what your modification does to fix it. Here's an example.
Suppose I roll a single die.
Bet A loses you $5 on a roll of 1 or 2, but wins you $1 on a roll of 3, 4, 5, or 6.
Bet B loses you $5 on a roll of 5 or 6, but wins you $1 on a roll of 1, 2, 3, or 4.
Bet A has median utility of U($1), as does bet B. However, combined they have a median utility of U(-$4).
So the straightforward median algorithm pays money to buy Bet A, pays money to buy Bet B, but will then pay money to be rid of their combination.
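The medians in this example can be checked directly (a quick sketch, assuming utility is linear in dollars purely for illustration):

```python
from statistics import median

# Payoff by die roll 1..6; utility assumed linear in dollars (illustration only)
bet_a = {1: -5, 2: -5, 3: 1, 4: 1, 5: 1, 6: 1}
bet_b = {1: 1, 2: 1, 3: 1, 4: 1, 5: -5, 6: -5}

median_a = median(bet_a[r] for r in range(1, 7))              # 1
median_b = median(bet_b[r] for r in range(1, 7))              # 1
median_ab = median(bet_a[r] + bet_b[r] for r in range(1, 7))  # -4

print(median_a, median_b, median_ab)
```

The combined bet pays −$4 on four of the six rolls (1, 2, 5, and 6), which is why its median collapses even though each bet alone looks good.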
I think I've found the core of our disagreement. I want an algorithm that considers all possible paths through time. It decides on a set of actions, not just for the current time step, but for all possible future time steps. It chooses such that the final probability distribution of possible outcomes, at some point in the future, is optimal according to some metric. I originally thought of median, but it can work with any arbitrary metric.
This is a generalization of expected utility. The VNM axioms require an algorithm to make decisions independently and just in time, whereas this method lets it consider all possible outcomes. It may be less elegant than EU, but I think it's closer to what humans actually want.
Anyway, your example is wrong, even without predetermined actions. The algorithm would buy bet A, but then not buy bet B. This is because it doesn't consider bets in isolation like EU does, but considers its entire probability distribution of possible outcomes. Buying bet B would decrease its median utility, so it wouldn't take it.
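That portfolio-level rule can be sketched like this (hypothetical code; utility assumed linear in dollars, and both bets settled by the same single die roll as in the example):

```python
from statistics import median

bet_a = [-5, -5, 1, 1, 1, 1]  # payoff by die roll 1..6
bet_b = [1, 1, 1, 1, -5, -5]

def median_utility(bets):
    # All held bets settle on the same single die roll, as in the example above
    if not bets:
        return 0
    return median(sum(bet[r] for bet in bets) for r in range(6))

held = []
for offered in (bet_a, bet_b):
    # Take a bet only if it improves the median of the whole portfolio
    if median_utility(held + [offered]) > median_utility(held):
        held.append(offered)

print(held == [bet_a])  # True: bet A is taken, bet B is declined
```

Bet A raises the portfolio median from 0 to 1, but adding bet B would drop it to −4, so the rule declines B rather than walking into the money pump.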
No, they don't.
Assuming the bet has a fixed utility, EU gives it a fixed estimate right away, whereas my method considers it along with all the other bets it has made or expects to make, and its estimate can change over time. I should have said that it's not independent or fixed, but that is what I meant.
In the VNM scheme, where expected utility is derived as a consequence of the axioms, the way a bet's value changes over time is that the utilities of its outcomes are not fixed. Nothing at all stops you from changing the utility you attach to a 50:50 gamble of getting a kitten versus $5 if your utility for a kitten (or for $5) changes: for example, if you get another kitten or win the lottery.
Generalizing to allow the value of the bet to change when the value of the options did not change seems strange to me.
I am lost, this is just EU in a longitudinal setting? You can average over lots of stuff. Maximizing EU is boring, it's specifying the right distribution that's tricky.
It's not EU, since it can implement arbitrary algorithms to specify the desired probability distribution of outcomes. Averaging utility is only one possibility, another I mentioned was median utility.
So you would take the median utility of all the possible outcomes. And then select the action (or series of actions in this case) that leads to the highest median utility.
No method of specifying utilities would let EU do the same thing, but you can trivially implement EU in it, so it's strictly more general than EU.
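One way to picture that generality (a hypothetical sketch; the plans and the name best_plan are made up for illustration):

```python
from statistics import mean, median

# Hypothetical plans, each with an equiprobable distribution of outcome utilities
plans = {
    "safe":    [1, 1, 1, 1],
    "lottery": [0, 0, 0, 100],
}

def best_plan(plans, metric=median):
    # The metric is pluggable: median here, but passing mean recovers EU maximization
    return max(plans, key=lambda p: metric(plans[p]))

print(best_plan(plans))               # "safe": median 1 beats median 0
print(best_plan(plans, metric=mean))  # "lottery": mean 25 beats mean 1
```

Swapping the metric changes which plan wins, which is the sense in which the scheme strictly generalizes EU: averaging is just one choice of metric.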
So, I think you might be interested in UDT. (I'm not sure what the current best reference for that is.) I think that this requires actual omniscience, and so is not a good place to look for decision algorithms.
(Though I should add that typically utilities are defined over world-histories, and so any decision algorithm typically identifies classes of 'equivalent' actions, i.e. acknowledges that this is a thing that needs to be accepted somehow.)
UDT is overkill. The idea that all future choices can be collapsed into a single choice appears in the work of von Neumann and Morgenstern, but is probably much older.
Oh, I see. I didn't take that problem into account, because it doesn't matter for expected utility, which is additive. But you're right that considering the entire probability distribution is the right thing to do, and under that assumption we're forced to be transitive.
The actual VNM axiom violated by median utility is independence: If you prefer X to Y, then a gamble of X vs Z is preferable to the equivalent gamble of Y vs Z. Consider the following two comparisons:
Taking bet A, as above, versus the status quo.
A 2/3 chance of taking bet A and a 1/3 chance of losing $5, versus a 2/3 chance of the status quo and a 1/3 chance of losing $5.
In the first case, bet A has median utility U($1) and the status quo has U($0), so you pick bet A. In the second case, a gamble with a possibility of bet A has median utility U(-$5) and a gamble with a possibility of the status quo still has U($0), so you pick the second gamble.
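The violation can be checked numerically (a sketch, again assuming utility is linear in dollars):

```python
def dist_median(dist):
    # Median of a discrete distribution given as {outcome: probability}
    total = 0.0
    for outcome in sorted(dist):
        total += dist[outcome]
        if total >= 0.5:
            return outcome

def mix(p, d1, d2):
    # Compound gamble: distribution d1 with probability p, otherwise d2
    out = {}
    for o, q in d1.items():
        out[o] = out.get(o, 0) + p * q
    for o, q in d2.items():
        out[o] = out.get(o, 0) + (1 - p) * q
    return out

bet_a = {-5: 1/3, 1: 2/3}   # bet A from the die example
status_quo = {0: 1.0}
lose_5 = {-5: 1.0}

print(dist_median(bet_a), dist_median(status_quo))  # 1 0: prefer bet A
g1 = mix(2/3, bet_a, lose_5)
g2 = mix(2/3, status_quo, lose_5)
print(dist_median(g1), dist_median(g2))  # -5 0: the preference flips
```

In the compound gamble, bet A's own 1/3 chance of losing $5 stacks with the mixture's 1/3 chance, pushing the total probability of −$5 to 5/9, past the median threshold.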
Of course, independence is probably the shakiest of the VNM axioms, and it wouldn't surprise me if you're unconvinced by it.