Expected utility and repeated choices

Marco Discendenti

Maybe this is a well-known kind of problem, but I am a novice and it looks puzzling to me.

Here is a lottery in which I have two choices:

(a) get 0.5$ for sure
(b) win 1$ with probability $\frac{2}{3}$ or nothing with probability $\frac{1}{3}$

My utility function is $U (x) = \sqrt{x}$ .

What should I choose?

Let's compute the expected utilities:

The expected utility of a single game is $\frac{2}{3} \cdot \sqrt{1} = \frac{2}{3} \approx 0.67$ for (b) and $\sqrt{0.5} \approx 0.71$ for (a), so choice (a) maximizes expected utility.
If I compute the expected utility for two games, I get a different prescription:

the utility of choosing (a) two times is $U (1) = \sqrt{1} = 1$
the expected utility of choosing (b) two times is

$P (\text{2 wins}) \, U (2) + P (\text{1 win}) \, U (1) = {\left(\frac{2}{3}\right)}^{2} \cdot \sqrt{2} + 2 \cdot \frac{2}{3} \cdot \frac{1}{3} \cdot \sqrt{1}$

This last computation is equal to $\frac{4}{9} (\sqrt{2} + 1) \approx 1.07$ which is greater than the utility of double (a) (i.e. 1) so in order to maximize expected utility I should actually prefer to play (b) two times rather than playing (a) two times.
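The two computations above can be checked numerically. This is a minimal sketch, assuming a starting wealth of 0 and that utility is applied once to the total winnings; the function names `eu_all_a` and `eu_all_b` are made up for illustration.

```python
from math import comb, sqrt

P_WIN = 2 / 3  # probability of winning 1 in lottery (b)


def eu_all_a(n, wealth=0.0):
    """Utility of taking the sure 0.5 in each of n games (no randomness)."""
    return sqrt(wealth + 0.5 * n)


def eu_all_b(n, wealth=0.0):
    """Expected utility of playing (b) in each of n games.

    The number of wins k is binomial(n, P_WIN), and the utility
    sqrt is applied to the final total wealth + k.
    """
    return sum(
        comb(n, k) * P_WIN**k * (1 - P_WIN) ** (n - k) * sqrt(wealth + k)
        for k in range(n + 1)
    )


if __name__ == "__main__":
    for n in (1, 2):
        print(n, round(eu_all_a(n), 3), round(eu_all_b(n), 3))
```

For `n = 1` the sure option comes out ahead, and for `n = 2` the gamble does, matching the hand computation.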

So we have this apparent inconsistency:

for one single game it's better to choose (a)
for two games it's better to choose (b) both times

This result is puzzling to me because I would expect that maximizing utility for one single game should be enough to make the decision, regardless of what I am allowed to do in future choices. It seems instead that the mere possibility of playing this same lottery another time changes which choice is best in the first game. If this is the case then utility theory seems almost useless: I would be forced to put the whole list of my possible future choices into my computation!

Am I missing something or is this an actual problem?

The intuitive result you would expect only holds for utility functions that are linear in $x$ (I believe), since we could then apply the utility function at each step and it would yield the same value as if applied to the whole amount.

Another case would be if you were to receive your utility immediately after playing each game (like in a reinforcement learning algorithm). In those cases $U$ is also applied to each outcome separately and would yield the result you would expect.
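To illustrate the per-game case, here is a small sketch, assuming utility is collected after every game and simply summed without discounting (the function names are hypothetical). Under that assumption the comparison for $n$ games is just $n$ times the single-game comparison, so (a) stays ahead for every $n$.

```python
from math import sqrt


def summed_utility_a(n):
    """Total utility when sqrt is applied to each sure 0.5 payout separately."""
    return n * sqrt(0.5)


def summed_expected_utility_b(n):
    """Total expected utility when sqrt is applied to each (b) outcome separately."""
    per_game = (2 / 3) * sqrt(1.0) + (1 / 3) * sqrt(0.0)
    return n * per_game


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(n, round(summed_utility_a(n), 3), round(summed_expected_utility_b(n), 3))
```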

Also: (b) has a better EV in terms of raw $, and by the law of large numbers we would expect the actual amount of money won by repeatedly playing (b) to approach that EV. So we should expect any monotonically increasing utility function to favor (b) over (a) as the number of games approaches infinity. The only reason your U favors (a) over (b) for a single game is that it is risk-averse, i.e. sub-linear in x. As the number of games approaches infinity, the risk of choosing to play (b) becomes smaller and smaller, until it is the choice between (essentially) winning 0.5$ for sure or 0.67$ for sure in every game. If you think about it in these terms, it becomes more intuitive why the behaviour you observed is reasonable.

In other words: Yes! You do have to think about the number of games you play if your utility function is not linear (or you have a strong discount factor).
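A rough numerical illustration of this argument, assuming $U (x) = \sqrt{x}$ applied to the total winnings and a starting wealth of 0 (the helper names are invented for this sketch): dividing the expected utility by $\sqrt{n}$ lets us compare the two options on a per-game scale. For always-(a) the ratio is exactly $\sqrt{0.5}$, while for always-(b) it climbs toward $\sqrt{2/3}$ as $n$ grows.

```python
from math import comb, sqrt

P_WIN = 2 / 3  # probability of winning 1 in lottery (b)


def eu_all_b(n):
    """Expected sqrt-utility of total winnings after n plays of (b)."""
    return sum(
        comb(n, k) * P_WIN**k * (1 - P_WIN) ** (n - k) * sqrt(k)
        for k in range(n + 1)
    )


def eu_all_a(n):
    """sqrt-utility of total winnings after n plays of (a)."""
    return sqrt(0.5 * n)


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 50, 200):
        ratio_a = eu_all_a(n) / sqrt(n)
        ratio_b = eu_all_b(n) / sqrt(n)
        print(n, round(ratio_a, 3), round(ratio_b, 3))
```

The ratio for (b) starts below that of (a) at `n = 1` and overtakes it from `n = 2` onward in this setup.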

Thank you for your insights! You say: " Yes! You do have to think about the amount of games you play if your utility function is not linear"

Let's consider the case of rational agents acting in a temporal framework where they are faced with daily decisions. If they need to consider all their future possible choices in order to decide for a single present choice then it seems they are always completely unable to make any single decision (the computation to be made seems almost never ending) and this principle of expected utility maximization would turn out to be useless. How do we make rational decisions then?

Tom Lieberum

Well, if you assume these agents do not employ time-discounting then you indeed cannot compare trajectories, since all of them might have infinite utility (and are computationally intractable, as you say) if they don't terminate. We run into the same problem if we assume realistic action spaces, i.e. consider all the things we could possibly do, as there are too many even for a single time step. RL algorithms "solve" this by working with constrained action spaces and discounting future utility, and also by often having terminating trajectories. Humans also work on (highly) constrained action spaces and have strong time discounting [citation needed], and every model of a rational human should take that into account. I admit those points are more like hacks we've come up with for practical situations, but I suppose the computational intractability is a reason why we can't already have all the nice things ;-)

To maximize utility when you can play any N number of games, I believe you just need to calculate the EV (not EU) through playing every possible strategy. Then, you pass all those values through your U function and go with the strategy associated with the highest utility.

There's an unstated assumption here that you start with $0. Suppose instead you start with $0.5: then $U (a) = \sqrt{1} = 1$ while $E (U (b)) = \frac{2}{3} \sqrt{1.5} + \frac{1}{3} \sqrt{0.5} \approx 1.05$ . So if you play game (a) first, you'd then prefer to play game (b) second.

But this doesn't fully resolve the question, because you'd still prefer (b, b) over (a, b).
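The comparison between (a, a), (a, b) and (b, b) can be enumerated directly. This is a sketch under a few assumptions: strategies are fixed sequences chosen in advance (no adapting to the first outcome), utility is $\sqrt{x}$ of final wealth, and the names `OUTCOMES` and `expected_utility` are made up for the example.

```python
from itertools import product
from math import sqrt

# Each game maps to a list of (probability, payoff) pairs.
OUTCOMES = {
    "a": [(1.0, 0.5)],
    "b": [(2 / 3, 1.0), (1 / 3, 0.0)],
}


def expected_utility(strategy, wealth=0.0):
    """Expected sqrt-utility of final wealth for a fixed sequence of games."""
    if not strategy:
        return sqrt(wealth)
    first, rest = strategy[0], strategy[1:]
    return sum(
        p * expected_utility(rest, wealth + payoff)
        for p, payoff in OUTCOMES[first]
    )


if __name__ == "__main__":
    # Second-game decision after already holding 0.5 from playing (a).
    print("a", expected_utility(("a",), wealth=0.5))
    print("b", expected_utility(("b",), wealth=0.5))
    # All fixed two-game strategies starting from 0.
    for strategy in product("ab", repeat=2):
        print("".join(strategy), round(expected_utility(strategy), 3))
```

Starting from 0, the ordering this produces is (b, b) above (a, b) above (a, a), which is the point made above.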

Dec 27, 2019

Dec 29, 2019*
