What I had in mind as the "best" strategy is the strategy that scores highest against the strategy that is designed to score highest against it. TFT scores 198 points (paperclips, million lives saved, whatever) against 99C1D, which is the best way to deal with TFT. However, I should probably just give you the $500, since it turns out 198 is way too low even for a purely deterministic strategy. I'm now ~100% certain (haha!) that the highest possible score is 248; and if you consider non-deterministic strategies, the score increases to 248.5-e. And sadly, there's no way for you to score higher than 101 respectively 100.5+e against this strategy.
You seem to have left this as an exercise to the reader, so I've worked it out with proof :) If we can publicly precommit and the opponent can't, we should commit to the strategy: "Defect for the first fifty rounds. If the opponent fails to cooperate even once during that time, defect for the last fifty as well. Otherwise play TFT for the last fifty rounds."
Taking the other guy's side for a moment, if we know the opponent is using this strategy, how do we get the most points? If we bow to the implied threat and cooperate for the first fifty rounds, then cooperate for the last fifty as well, we get 100 points, all during the last fifty rounds. We can bump this up to 101 by defecting on the last round. Meanwhile the opponent wins 150+98 = 248 points. If we defect at least once during the first fifty we might as well defect for all 100 rounds, since that is what the opponent will do, and this gets only 100 points, so this strategy is not as good. So as you claimed this blackmail strategy wins 248 points against the strategy the wins the most points against it.
I think the following is a proof that 248 is optimal:
Suppose there are A cases of we-cooperate-opponent-defects, B cases of we-cooperate-opponent-cooperates, and C cases of we-defect-opponent-defects, and 100-A-B-C cases of we-defect-opponent-cooperates. Then the opponent wins 3A+2B+C points. This must be >= 100 because otherwise the opponent can win more by defecting all the time instead of doing what we want him to do. Meanwhile we win 2B+C+3*(100-A-B-C) = 300-3A-B-2C points, and we want to maximize this number subject to the aforementioned constraint. Additional constraints are that A+B+C <= 100 and that A, B, C are all > 0. This is a linear programming problem, for which the optimal solution always lies at one of the corners of the allowed domain. The corners of the allowed domain are at
So the best we can hope for is 250 points. However we can't really guarantee getting 250 because in the case where we get 250 points the opponent gets only 100, so he is indifferent between allowing us the 250, or defecting all the time and only allowing us 100. To ensure that the opponent follows our plans, we actual need to give him at least 101 points.
Suppose we defect 49 times while the opponent cooperates, followed by 51 mutual cooperations. This would give us 493 + 51\2 = 249 points, and the opponent 51*2 = 102 points. However we can never force the opponent to cooperate on the last move, since we have no ability to punish him for defecting, so we can't actually make this happen. The opponent is going to defect on the last move no matter what our strategy is. So 249 points is out of reach as well.
We can win 248 points by force against a rational self-interested opponent using the strategy in the first paragraph, so that is indeed the maximum, as you claimed.
I haven't thought about nondeterministic strategies though.
Today's post, The Truly Iterated Prisoner's Dilemma was originally published on 04 September 2008. A summary (taken from the LW wiki):
Discuss the post here (rather than in the comments to the original post).
This post is part of the Rerunning the Sequences series, where we'll be going through Eliezer Yudkowsky's old posts in order so that people who are interested can (re-)read and discuss them. The previous post was The True Prisoner's Dilemma, and you can use the sequence_reruns tag or rss feed to follow the rest of the series.
Sequence reruns are a community-driven effort. You can participate by re-reading the sequence post, discussing it here, posting the next day's sequence reruns post, or summarizing forthcoming articles on the wiki. Go here for more details, or to have meta discussions about the Rerunning the Sequences series.