The idea is to compare not the results of actions, but the results of decision algorithms. The question that the agent should ask itself is thus:
"Suppose everyone[1] who runs the same thinking procedure as I do uses decision algorithm X. What utility would I get at the 50th percentile (not: what expected utility would I get) after my life is finished?"
Then, he should of course look for the X that maximizes this value.
Now, if you formulate a Turing-complete "decision algorithm", this heads into an infinite loop. But suppose that "decision algorithm" is defined as a huge table that maps lots of different possible situations to the appropriate outputs.
Let's see what results such a thing should give:
- If the agent has the possibility to play a gamble, and the probabilities involved are not small, and he expects to be allowed to play many gambles like this in the future, he should decide exactly as if he were maximizing expected utility: if he has made many decisions like this, he will get a positive utility difference at the 50th percentile if and only if his expected utility from playing the gamble is positive.
- However, if Pascal's mugger comes along, he will decline: the total probability of living in a universe where people like this mugger ought to be taken seriously is small. In the probability distribution over the agent's utility at the end of his lifetime, the possibility of getting tortured will manifest itself only very slightly at the 50th percentile, much less than the possibility of losing 5 dollars.
The reason why humans intuitively decline to give money to the mugger might be similar: they imagine not the expected utility of each decision, but the typical outcome of giving the mugger some money versus declining to.
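As a rough illustration of the two cases above, here is a minimal Python sketch. All numbers in it (200 plays, a 60% even-money gamble, a 1e-20 chance that the mugger is honest, a utility of -1e30 for torture) are made-up assumptions for the example, not figures from the post, and `median_total` and `median_of` are hypothetical helper names.

```python
from math import comb

def median_total(n, p, win, loss):
    """50th-percentile total payoff after n independent plays of a gamble
    that pays +win with probability p and -loss otherwise."""
    cumulative = 0.0
    for k in range(n + 1):  # k = number of wins; the total grows with k
        cumulative += comb(n, k) * p**k * (1 - p)**(n - k)
        if cumulative >= 0.5:
            return k * win - (n - k) * loss
    return n * win

def median_of(outcomes):
    """Median of a discrete distribution given as (probability, utility) pairs."""
    cumulative = 0.0
    for prob, utility in sorted(outcomes, key=lambda pair: pair[1]):
        cumulative += prob
        if cumulative >= 0.5:
            return utility

# Repeated gamble: the sign of the median matches the sign of the expected value.
print(median_total(200, 0.6, 1, 1))   # favourable gamble
print(median_total(200, 0.4, 1, 1))   # unfavourable gamble

# Pascal's mugger (assumed numbers): paying costs 5 for sure,
# declining costs 1e30 with probability 1e-20.
pay = [(1.0, -5)]
decline = [(1e-20, -1e30), (1 - 1e-20, 0)]
print(median_of(pay), median_of(decline))
print(sum(p * u for p, u in pay), sum(p * u for p, u in decline))
```

With these assumed numbers, the expected-utility comparison favours paying the mugger, while the 50th-percentile comparison favours declining.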
[1] I say this to make agents of the same type cooperate in prisoner's-dilemma-like situations.
Yes, if we have large populations of "all-in bettors" and Kelly bettors, then as the number of bets increases, the all-in bettors' lead in total wealth increases exponentially, while the probability of an all-in bettor being ahead of a Kelly bettor falls exponentially. And as you go to infinity, the wealth multiplier of the all-in bettors goes to infinity, while the probability of an all-in bettor leading a Kelly bettor goes to zero. And that was the originally cited reasoning.
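To make the two exponential trends concrete, here is a small exact calculation in Python. It assumes, purely for illustration, a repeated bet that wins 60% of the time and triples the stake; the names `mean_wealth` and `prob_all_in_ahead` are hypothetical helpers.

```python
from fractions import Fraction

P_WIN = Fraction(3, 5)   # assumed chance of winning each bet
NET_ODDS = 2             # assumed: a win returns the stake plus twice the stake
KELLY_FRACTION = P_WIN - (1 - P_WIN) / NET_ODDS   # Kelly stake for this bet: 2/5

def mean_wealth(fraction, n_bets):
    """Expected wealth multiplier after n_bets, staking a fixed fraction each time."""
    per_bet = P_WIN * (1 + NET_ODDS * fraction) + (1 - P_WIN) * (1 - fraction)
    return per_bet ** n_bets

def prob_all_in_ahead(n_bets):
    """An all-in bettor is ahead of a Kelly bettor only if he has never lost."""
    return P_WIN ** n_bets

for n in (1, 10, 50):
    lead = mean_wealth(1, n) / mean_wealth(KELLY_FRACTION, n)
    print(n, float(lead), float(prob_all_in_ahead(n)))
```

Under these assumptions the ratio of average wealth grows as (15/11)^n while the chance of an individual all-in bettor being ahead shrinks as (3/5)^n.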
Now, one might be confused by the "beats any other constant bankroll allocation (but see the bottom paragraph) with probability 1" and think that it implies "bettors with this strategy will make more money on average than those using other strategies," as it would in a finite case if every bettor using one strategy did better than any bettor using any other strategy.
But absent that confusion, why favor probability of being ahead over wealth unless one has an appropriate utility function? One route is log utility (for which Kelly is optimal), and I argued against it as psychologically unrealistic, but I agree there are others. Bounded utility functions would also prefer the Kelly outcome to the all-in outcome in the infinite limit, and are more plausible than log utility.
Also, consider strategies that don't allocate a constant proportion in every bet, e.g. first do an all-in bet, then switch to Kelly. If the first bet has a 60% chance of tripling wealth and a 40% chance of losing everything, then the average, total, and median wealth of these mixed-strategy bettors will beat the Kelly bettors for any number of bets in a big population. These don't necessarily come to mind when people hear loose descriptions of Kelly.
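The average-wealth part of this claim can be checked exactly. The sketch below assumes the same 60%/triple bet is offered every round, so that the Kelly fraction is 2/5 and a Kelly bettor's wealth is multiplied by 9/5 on a win and 3/5 on a loss. It covers only the average (and hence the total over equal-sized populations); the median is not computed here. The function names are hypothetical.

```python
from fractions import Fraction

P_WIN = Fraction(3, 5)

# Expected per-bet wealth multipliers under the assumptions stated above.
KELLY_MEAN = P_WIN * Fraction(9, 5) + (1 - P_WIN) * Fraction(3, 5)   # 33/25
ALL_IN_MEAN = P_WIN * 3                                              # 9/5

def mean_pure_kelly(n_bets):
    """Expected wealth multiplier after n_bets Kelly bets."""
    return KELLY_MEAN ** n_bets

def mean_all_in_then_kelly(n_bets):
    """Expected wealth multiplier: one all-in bet, then Kelly for the rest."""
    return ALL_IN_MEAN * KELLY_MEAN ** (n_bets - 1)

for n in (1, 5, 20):
    print(n, mean_all_in_then_kelly(n) / mean_pure_kelly(n))
```

The ratio is the same constant, 15/11, for every number of bets, because the two strategies differ only in the first bet.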
Sure, I don't see anything here to disagree with.