[SEQ RERUN] The Truly Iterated Prisoner's Dilemma

MinibearRex

What I had in mind as the "best" strategy is the strategy that scores highest against the strategy that is designed to score highest against it. TFT scores 198 points (paperclips, million lives saved, whatever) against 99C1D, which is the best way to deal with TFT. However, I should probably just give you the $500, since it turns out 198 is way too low even for a purely deterministic strategy. I'm now ~100% certain (haha!) that the highest possible score is 248; and if you consider non-deterministic strategies, the score increases to 248.5-e. And sadly, there's no way for you to score higher than 101 respectively 100.5+e against this strategy.

You seem to have left this as an exercise to the reader, so I've worked it out with proof :) If we can publicly precommit and the opponent can't, we should commit to the strategy: "Defect for the first fifty rounds. If the opponent fails to cooperate even once during that time, defect for the last fifty as well. Otherwise play TFT for the last fifty rounds."

Taking the other guy's side for a moment, if we know the opponent is using this strategy, how do we get the most points? If we bow to the implied threat and cooperate for the first fifty rounds, then cooperate for the last fifty as well, we get 100 points, all during the last fifty rounds. We can bump this up to 101 by defecting on the last round. Meanwhile the opponent wins 150+98 = 248 points. If we defect at least once during the first fifty we might as well defect for all 100 rounds, since that is what the opponent will do, and this gets only 100 points, so this strategy is not as good. So as you claimed this blackmail strategy wins 248 points against the strategy the wins the most points against it.

I think the following is a proof that 248 is optimal:

Suppose there are A cases of we-cooperate-opponent-defects, B cases of we-cooperate-opponent-cooperates, and C cases of we-defect-opponent-defects, and 100-A-B-C cases of we-defect-opponent-cooperates. Then the opponent wins 3A+2B+C points. This must be >= 100 because otherwise the opponent can win more by defecting all the time instead of doing what we want him to do. Meanwhile we win 2B+C+3*(100-A-B-C) = 300-3A-B-2C points, and we want to maximize this number subject to the aforementioned constraint. Additional constraints are that A+B+C <= 100 and that A, B, C are all > 0. This is a linear programming problem, for which the optimal solution always lies at one of the corners of the allowed domain. The corners of the allowed domain are at

A = 0; B = 50; C = 0 [50 mutual cooperations, 50 we-defect-opponent-cooperates] - this wins 250 points for us, and the opponent wins 100 points.
A = 33.3; B = 0; C = 0 [We cooperate while allowing the opponent to defect 33.3 times, then defect while the opponent cooperates for 66.6 rounds] - we only win 200 points
A = 100; B = 0; C = 0 [We always cooperate; opponent always defects] - we win nothing.
A = 0; B = 100; C = 0 [Everyone always cooperates] - we only win 200 points.
A = 0; B = 0; C = 100 [everyone always defects] - we only win 100 points

So the best we can hope for is 250 points. However we can't really guarantee getting 250 because in the case where we get 250 points the opponent gets only 100, so he is indifferent between allowing us the 250, or defecting all the time and only allowing us 100. To ensure that the opponent follows our plans, we actual need to give him at least 101 points.

Suppose we defect 49 times while the opponent cooperates, followed by 51 mutual cooperations. This would give us 493 + 51\2 = 249 points, and the opponent 51*2 = 102 points. However we can never force the opponent to cooperate on the last move, since we have no ability to punish him for defecting, so we can't actually make this happen. The opponent is going to defect on the last move no matter what our strategy is. So 249 points is out of reach as well.

We can win 248 points by force against a rational self-interested opponent using the strategy in the first paragraph, so that is indeed the maximum, as you claimed.

I haven't thought about nondeterministic strategies though.

7

[SEQ RERUN] The Truly Iterated Prisoner's Dilemma

7

7

7

[SEQ RERUN] The Truly Iterated Prisoner's Dilemma

7

7