
Right, this seems really good! (It deserves more upvotes than it has; it's suitably mind blowing ;p)

There are some arbitrary choices (this is a generalization of expectation maximization, not an argument from expectation maximization, so it's not surprising that it's not a unique solution), but the only really arbitrary-seeming part is the choice about how to order the limits. And as discussed in the other comment thread here, Eigil's comment about lim(EV) vs EV(lim) makes your choice of ordering seem like the more appropriate one -- your version matches up with the fact that EV of the all-in strategy for the infinite game is zero, while allowing us to evaluate strategies for the cases where EV of the infinite game is not well-defined.

The alternate intuition is that, since EV of infinite games is problematic, we should just compare the EV of strategies on very large numbers of iterations. This is basically your alternate limit ordering, and Eigil's lim(EV) as opposed to EV(lim). And "the boring option".

I think the boring option has some a priori superiority, but loses on net provided you're right about the version of the game which has a small chance of ending at each round. I think it's analogous to the following argument about Prisoner's Dilemma. The argument is between a Strawman Economist and a Steelman Douglas Hofstadter.

ECONOMIST: The normatively correct thing to do in Prisoner's Dilemma (PD) is to defect.

DOUGLAS: But in iterated PD, players can use the tit-for-tat strategy. If they do, it's rational for both of them to cooperate, and for both of them to continue using tit-for-tat. And most real PDs can be considered as iterated.

E: Ahh, true, but no game is really infinitely iterated. We know it stops at some point. At that point, there's no remaining incentive to cooperate. So both players should defect. Knowing this, players should actually think of the tit-for-tat chain as stopping one step earlier than this. But then the second-to-last move also becomes defect. And so on. The tit-for-tat strategy unravels all the way back to the beginning, and we're back at total defection.

D: Ahh, true, but in practice we're uncertain about when the game will end! Depending on our uncertainty, this can rescue tit-for-tat. So what we really get is a specific crossover point. If we're sufficiently certain about when the game will end, you are correct. If we're sufficiently uncertain, then I'll be correct instead.

E: Damn, you're right!

Similarly with straw economist and steel Kelly:

E: The rational way to evaluate bets is by taking the expectation. If a bet is worthwhile at all, it's worth going all-in, if the other side will accept that large of a bet.

K: Wait, look! In an infinitely iterated game, Bunthut's generalization of expectation maximization says to use my Kelly Criterion. And betting really is an iterated game. You shouldn't consider each bet in isolation.

E: Why are the limits ordered that way?

K: As Eigil commented elsewhere, lim(EV) doesn't equal EV(lim). And in fact the EV of the all-in strategy in the infinitely iterated case is zero. So this ordering of limits is the one that generalizes EV. The other ordering prefers the all-in strategy, even for the infinite game, so it can't be a valid generalization of EV.

E: OK true, but consider this: I'm only going to make a finite number of bets in my life. Maybe I play the stocks for several decades, but then I retire; the size of my nest egg at retirement is what I care about. Your formula agrees with EV maximization in finite cases, so it must agree that I should use the all-in strategy here.

K: Suuure, but consider this: you don't generally know when you'll make your last bet. You probably won't stop playing the stocks when you retire, and few anticipate the exact day they die. If we incorporate that uncertainty, we get behavior resembling EV maximization when we're sufficiently certain of the game's end, but we get behavior resembling Kelly when we're sufficiently uncertain.

E: Damn, you're right!

So my takeaway is: your argument about the 1%-probability-of-ending case is a crux for me. It makes the difference between this being a clever but rarely-applicable analysis of an infinite game, vs a frequently-applicable analysis of games with uncertain end. I'd really like to see how that works out.

I'm also curious whether this can be applied to other problems, like the St. Petersburg Lottery.


1Bunthut4y

Defecting one round earlier dominates pure tit-for-tat, but defecting five rounds earlier doesn't dominate pure tit-for-tat. Pure tit-for-tat is better against pure tit-for-tat. So there might be a Nash equilibrium containing only strategies that play tit-for-tat until the last few rounds.

I looked at his paper on the St. Petersburg paradox and I think he gets the correct result for the iterated game. He doesn't do fractional betting, but he has a variable for the player's wealth - implicitly, price/wealth is a betting fraction (and since payoffs are fixed, price is implicitly offered odds). Also, and this is quite confusing, even though in the beginning it sounds like he wants to repeat the game with the price fixed but wealth changing over time, his actual calculation assumes the wealth (or distribution over growth rates) is the same each time. He talks about this at the bottom of page 11 and argues that it's fine because of commutativity. I'm not sure if that commutativity argument works out, but it means the part before is effectively calculating the growth rate of a betting fraction. And if there's no death in the game, then the highest growth rate does indeed optimize my criterion.

Conceptually though there are differences: Peters totally rejects ensemble averaging. This works in infinite games with no chance of death, because then one player will with certainty experience events at frequencies reflecting the true odds - so it works in ordinary Kelly, and it works in this Petersburg-bet-in-a-Kelly, but it wouldn't work on the versions with ending chance. (Also what I said about buying multiples in the last comment was confused - that would be different from one bigger bet.)

Probably not, no. And provably not for the triple payoff version, so it wouldn't avoid the paradox anyway.

3abramdemski4y

Defecting in the last x rounds is dominated by defecting in the last x+1, so there is no pure-strategy equilibrium which involves cooperating in any rounds. But perhaps you mean there could be a mixed strategy equilibrium which involves switching to defection some time near the end, with some randomization. Clearly such a strategy must involve defecting in the final round, since there is no incentive to cooperate. But then, similarly, it must involve defecting on the second-to-last round, etc. So it should not have any probability of cooperating -- at least, not in the game-states which have positive probability. Right? I think my argument is pretty clear if we assume subgame-perfect equilibria (and so can apply backwards induction). Otherwise, it's a bit fuzzy, but it still seems to me like the equilibrium can't have a positive probability of cooperating on any turn, even if players would hypothetically play tit-for-tat according to their strategies. (For example, one equilibrium is for players to play tit-for-tat, but with both players' first moves being to defect.)

Bunthut4y10

Yeah you're right. I just realized that what I had in mind originally already implicitly had superrationality.


A non-logarithmic argument for Kelly

by Bunthut

4th Mar 2021

AI Alignment Forum


This post is a response to abramdemski's post, Kelly *is* (just) about logarithmic utility.

any argument in favor of the Kelly formula has to go through an implication that your utility is logarithmic in money, at some point. If it seems not to, it's either:
mistaken
cleverly hiding the implication
some mind-blowing argument I haven't seen before.

Challenge accepted. This is essentially a version of time-averaging which gets rid of the infinity problem.

Consider the Kelly-betting game: Each round, you can bet any fraction of your wealth on a fair coinflip, which will be tripled if you win. You play this game for an infinite number of rounds. Your utility is linear in money.
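To make the rules concrete, here is a minimal sketch of one play of the game for a constant-fraction strategy. The function name `play_kelly_game` and the restriction to a fixed betting fraction are assumptions made for illustration; the post allows arbitrary strategies. "Tripled" is read as: a winning stake comes back as three times the stake, a losing stake is gone.

```python
def play_kelly_game(fraction, flips, wealth=1.0):
    """Final wealth after betting a fixed `fraction` of current wealth on each flip.

    `flips` is a sequence of booleans (True = win). Hypothetical helper,
    not taken from the post.
    """
    for heads in flips:
        stake = fraction * wealth
        wealth -= stake          # put the stake on the table
        if heads:
            wealth += 3 * stake  # a winning stake is returned tripled
    return wealth


flips = [True, False, True, True]
for fraction in (0.0, 0.25, 1.0):
    print(fraction, play_kelly_game(fraction, flips))
```

Note that the all-in strategy (`fraction = 1.0`) ends with zero wealth as soon as a single flip is lost.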

The first thing to note is that expected utility maximization does not recommend betting everything each round in this game. It does recommend that for any finite version of the game, but this version has various infinite payoffs, or no well-defined payoffs at all, since it doesn't end. We will get around this by comparing strategies directly, instead of computing an expectation for each strategy and comparing the expectations.
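For the finite versions, the expectation can be written down directly. A small sketch, assuming a constant betting fraction (the helper names are made up for illustration): each round multiplies wealth by 1 + 2f on a win and 1 - f on a loss, so the expected per-round factor is 1 + f/2, which is largest at f = 1.

```python
def expected_wealth(fraction, rounds):
    """Expected final wealth after `rounds` fair flips, starting from wealth 1.

    Rounds are independent, so the expectation is the per-round
    expected factor raised to the number of rounds.
    """
    per_round = 0.5 * (1 + 2 * fraction) + 0.5 * (1 - fraction)
    return per_round ** rounds


def all_in_survival_probability(rounds):
    """Chance that the all-in bettor has any money left: every flip must be won."""
    return 0.5 ** rounds


for rounds in (1, 10, 100):
    print(rounds, expected_wealth(1.0, rounds), all_in_survival_probability(rounds))
```

The expectation of all-in grows without bound in the number of rounds while the probability of holding anything at all shrinks to zero, which is why the infinite game needs separate treatment.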

First, consider the formal specification of expected utility maximisation: $s_{max} = \operatorname{argmax}_{s \in S} E[U(\mathrm{game}(s))]$ for $S$ the set of strategies. Or written slightly unconventionally: $s_{max} = \operatorname{argmax}_{s \in S} \left( \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n} U(\mathrm{game}(s; r(i))) \right)$ with $r$ as a source of randomness. This spells out the expected value as the average of a sample of size going to infinity. We can turn this into a comparison between strategies:

$$s_{1} \geq s_{2} \iff \lim_{n \to \infty} \left[ \frac{1}{n} \sum_{i=0}^{n} U(\mathrm{game}(s_{1}; r(i))) \right] \geq \lim_{m \to \infty} \left[ \frac{1}{m} \sum_{j=0}^{m} U(\mathrm{game}(s_{2}; r(j))) \right]$$

with the idea of then picking the strategy that is maximal under this order. We then try to pull the comparison inside the limit:

$$s_{1} \geq s_{2} \iff \lim_{n \to \infty} \left[ \frac{1}{n} \sum_{i=0}^{n} U(\mathrm{game}(s_{1}; r(i))) \geq \frac{1}{n} \sum_{j=0}^{n} U(\mathrm{game}(s_{2}; r(j))) \right]$$

but this doesn't quite work, because we have a truth value inside the limit. We replace that with a probability (dropping the normalizers, since they don't matter):

$$s_{1} \geq s_{2} \iff \lim_{n \to \infty} \left[ P \left[ \sum_{i=0}^{n} U(\mathrm{game}(s_{1}; r(i))) \geq \sum_{j=0}^{n} U(\mathrm{game}(s_{2}; r(j))) \right] \right] > 0$$

and for the games where classic utility maximization was well-defined this should give the same results.

Now we can properly define our infinite game: $\mathrm{game}(s; r)$ gets a third parameter indicating the number of rounds played: $\mathrm{game}(s; r; t)$ stands for playing the Kelly game for $t$ rounds instead of infinitely long. The full game is then the limit of this. Then I define the criterion for limiting games of this type as:

$$s_{1} \geq s_{2} \iff \lim_{n \to \infty} \lim_{t \to \infty} \left[ P \left[ \sum_{i=0}^{n} U(\mathrm{game}(s_{1}; r(i); t)) \geq \sum_{j=0}^{n} U(\mathrm{game}(s_{2}; r(j); t)) \right] \right] > 0$$

which we can easily see reproduces Kelly behaviour: for any n and any d, as t goes to infinity, the odds that any of the bettors in the sample has a percentage of heads so far that differs from 50% by more than d go to 0. So whichever strategy does better when it gets exactly 50% heads will have the higher payoff at $t = \infty$, and since this is true for any n, it's also true as n goes to infinity. This is precisely the Kelly strategy.
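The "exactly 50% heads" step can be checked numerically. A minimal sketch, assuming constant-fraction strategies and a grid of candidate fractions (both are illustrative choices, as is the helper name): with half wins and half losses, wealth is (1 + 2f)^(t/2) · (1 - f)^(t/2), so the best fraction is the one maximizing (1 + 2f)(1 - f).

```python
def wealth_at_half_heads(fraction, rounds):
    """Final wealth, starting from 1, when exactly half of `rounds` flips are won."""
    wins = rounds // 2
    losses = rounds - wins
    return (1 + 2 * fraction) ** wins * (1 - fraction) ** losses


# Candidate betting fractions 0.00, 0.01, ..., 1.00 (an arbitrary grid).
candidates = [i / 100 for i in range(101)]
best = max(candidates, key=lambda f: wealth_at_half_heads(f, 100))
print(best)
```

The grid search picks 0.25, which agrees with the usual Kelly formula p - q/b for p = q = 1/2 and net odds b = 2.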

Does it make sense to look at a game with infinitely many rounds? Perhaps not. You could also say that the game has a 1% chance of ending each round: then it would end in finitely many rounds with probability one. I can't solve this analytically, but I think it would end up looking very close to Kelly behaviour.
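The randomly-ending variant is left open here, and the sketch below does not settle it. It is only a way to experiment with the comparison criterion under some loudly stated assumptions: constant betting fractions, the end check happening after each round (so at least one round is played), both strategies facing the same flips and the same stopping time, and finite `n` and `trials` standing in for the limit and the probability. All names are hypothetical.

```python
import random


def play_pair(f1, f2, end_probability, rng):
    """Play one randomly-ending game for two betting fractions on shared flips."""
    w1 = w2 = 1.0
    while True:
        heads = rng.random() < 0.5
        w1 *= (1 + 2 * f1) if heads else (1 - f1)
        w2 *= (1 + 2 * f2) if heads else (1 - f2)
        if rng.random() < end_probability:  # assumed: end check after each round
            return w1, w2


def prob_first_beats_second(f1, f2, n, end_probability, trials, rng):
    """Estimate P[sum of n payoffs under f1 >= sum of n payoffs under f2]."""
    wins = 0
    for _ in range(trials):
        total1 = total2 = 0.0
        for _ in range(n):
            w1, w2 = play_pair(f1, f2, end_probability, rng)
            total1 += w1
            total2 += w2
        if total1 >= total2:
            wins += 1
    return wins / trials


rng = random.Random(0)
estimate = prob_first_beats_second(0.25, 1.0, n=20, end_probability=0.01,
                                   trials=200, rng=rng)
print(estimate)
```

Varying `n` and `end_probability` gives a rough feel for where the comparison tips, but a finite simulation cannot stand in for the limit itself, especially since the payoff distribution is heavy-tailed.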

Notice that if the order of the n- and t-limits is switched, we get the all-in strategy. This is how I think the intuition that utility maximization implies all-in is generated, and this switch is why I put it into the "ergodic" category. Either version would give results consistent with expected utility maximization for games which are finite (encoded as $\exists t_{1} \forall t > t_{1} \forall s \forall r [\mathrm{game}(s; r; t) = \mathrm{game}(s; r; t_{1})]$).
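The effect of the limit order can be illustrated with a small Monte Carlo sketch. It only approximates the limits with finite numbers: "t first" is mimicked by a long game with few bettors, "n first" by a one-round game with many bettors. Constant betting fractions, shared flips between the two strategies, and all function names are assumptions made for the illustration.

```python
import random


def final_wealth(fraction, flips):
    """Wealth after betting a fixed fraction on each flip, starting from 1."""
    wealth = 1.0
    for heads in flips:
        wealth *= (1 + 2 * fraction) if heads else (1 - fraction)
    return wealth


def prob_first_beats_second(f1, f2, n, t, trials, rng):
    """Estimate P[sum over n bettors of payoff(f1) >= same sum for f2] at t rounds."""
    wins = 0
    for _ in range(trials):
        total1 = total2 = 0.0
        for _ in range(n):
            flips = [rng.random() < 0.5 for _ in range(t)]
            total1 += final_wealth(f1, flips)
            total2 += final_wealth(f2, flips)
        if total1 >= total2:
            wins += 1
    return wins / trials


rng = random.Random(0)
# Long game, few bettors: Kelly (0.25) against all-in (1.0).
long_game = prob_first_beats_second(0.25, 1.0, n=5, t=60, trials=200, rng=rng)
# One round, many bettors: the sample average tracks the expectation.
big_sample = prob_first_beats_second(0.25, 1.0, n=2000, t=1, trials=200, rng=rng)
print(long_game, big_sample)
```

In the long game the all-in bettors are almost surely wiped out, so the Kelly sum wins nearly every trial; with one round and a large sample the all-in sum is almost always larger, matching the expectation-based ranking.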

Betting, Decision theory, Kelly Criterion, Rationality


[-]SarahNibs4y80

Eigil recently noted that you can't just say lim(EV(fn)), as you did above calling it "written slightly unconventionally", since that's not equal to EV(lim(fn)) which is what we actually want. https://www.lesswrong.com/posts/DfZtwtGD6ymFtXmdA/kelly-is-just-about-logarithmic-utility?commentId=gtwdXs6jArFaFAjqe


[-]Bunthut4y10

If I understand that context correctly, that's not what I'm doing. The unconventional writing doesn't pull a lim outside an EV, it replaces an EV with a lim construction. In fact, that comment seems somewhat in support of my point: he's saying that doesn't properly represent an infinite game. And if the replacing of E with the n-lim that I'm doing works out, then that's saying that the order of limits that results in Kelly is the right one. It's similar to what I said (less formally) about expected utility maximization not recommending all-in for infinitely many rounds.


[-]SarahNibs4y30

So in the triple-your-even-odds-bet situation, the normal setup is to take the expectation of f={(1,1,1,...): inf, otherwise: 0}, and EV(f)=0. But you're saying we should change that game from f:Ω->[0,inf] to g:Ω,?R?->[0,inf] where ?R? is a domain I don't really understand, a "source of randomness", and then we can try many times, averaging, and take the limit?

I'm suspicious that I don't understand how the "source of randomness" actually operates with infinities and limits, and it seems like it's important to make it formal to make sure nothing's being swept under the rug. Do you have a link to something that shows how "source of randomness" is generally formalized, or if not, how you're thinking it works more explicitly?


[-]Bunthut4y20

But you're saying we should change that game from f:Ω->[0,inf] to g:Ω,?R?->[0,inf]

No. The key change I'm making is from assigning every strategy an expected value (normally real, but including infinity as you do should be possible) to having the essential math thing be a comparison between two strategies. With your version, all we can say is that all-in has EV 0, don't bet has EV 1, and everything else has EV infinity - but by doing the comparison inside the limits, we get some more differentiation there.

R isn't distinct from Ω. The EV function "binds" the randomness inside what it's applied to, so when I roll it out I need to have it occur explicitly inside the limit. I think it's fine to say the Rs are normal random variables. Let's say that each r(i) is uniformly distributed in [0;1), iid. The game then uses that for its randomness. For the game at hand, we could say that the binary digit expansion becomes the sequence of heads and tails thrown.

As you might have noticed I wrote the post in a bit of a hurry, so sorry if not everything is hammered out.
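The binary-expansion idea above can be sketched in a few lines. The function name is made up for illustration, and reading a 1 digit as heads is an arbitrary convention. A Python float carries only 53 bits of mantissa, so this is meaningful for roughly the first 53 flips; a real formalization would use an idealized uniform variable with infinitely many digits.

```python
def flips_from_uniform(r, rounds):
    """Read the first `rounds` binary digits of r in [0, 1) as coin flips.

    A 1 digit is treated as heads (True), a 0 digit as tails (False).
    """
    if not 0.0 <= r < 1.0:
        raise ValueError("r must lie in [0, 1)")
    flips = []
    for _ in range(rounds):
        r *= 2                 # shift the next binary digit in front of the point
        heads = r >= 1.0
        flips.append(heads)
        if heads:
            r -= 1.0           # drop the digit that was just read
    return flips


# 0.625 is 0.101 in binary.
print(flips_from_uniform(0.625, 3))
```

If r is uniform on [0;1), its binary digits are independent fair bits, which is what makes a single uniform draw enough to drive a whole sequence of flips.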


[-]abramdemski4y50