About two weeks ago I announced an open competition for LessWrong readers inspired by Robert Axelrod's famous tournaments. The competitors had to submit a strategy which would play an iterated prisoner's dilemma of fixed length: first in a round-robin tournament, where the strategy plays a hundred-turn match against each of its competitors exactly once, and second in an evolutionary tournament, where the strategies are randomly paired against each other and their gains are translated into the number of their copies present in the next generation; the strategy with the highest number of copies after generation 100 wins. More details about the rules were described in the announcement. This post summarises the results.
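The rules above do not spell out exactly how gains become copies, so the following is only a hypothetical sketch of one generation, assuming fitness-proportional reproduction with a fixed population size. The function `next_generation`, the largest-remainder rounding, and the two-strategy score table in the usage lines are illustrative assumptions, not the tournament's actual code.

```python
import random

def next_generation(counts, match_score, rng):
    """Hypothetical generation step: random pairing, then copies in proportion to gains.

    counts:      dict mapping strategy name -> number of copies (even total assumed)
    match_score: function (a, b) -> points strategy a earns in one match against b
    rng:         a random.Random instance used for the pairing
    """
    population = [name for name, k in counts.items() for _ in range(k)]
    rng.shuffle(population)
    gains = {name: 0.0 for name in counts}
    # Pair neighbours in the shuffled list; each pair plays one match.
    for a, b in zip(population[0::2], population[1::2]):
        gains[a] += match_score(a, b)
        gains[b] += match_score(b, a)
    total_gain = sum(gains.values())
    size = len(population)
    # Each strategy's share of the next generation equals its share of total gains.
    exact = {name: size * g / total_gain for name, g in gains.items()}
    new_counts = {name: int(x) for name, x in exact.items()}
    # Hand leftover copies to the largest fractional remainders so the size stays fixed.
    leftover = size - sum(new_counts.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - new_counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        new_counts[name] += 1
    return new_counts

# Usage with hypothetical 100-turn match scores for tit-for-tat vs. always-defect.
SCORES = {("TFT", "TFT"): 400, ("TFT", "ALLD"): 99,
          ("ALLD", "TFT"): 106, ("ALLD", "ALLD"): 100}
counts = next_generation({"TFT": 90, "ALLD": 90}, lambda a, b: SCORES[(a, b)],
                         random.Random(1))
```

Under these assumptions tit-for-tat grows at the defector's expense after a single generation, because a TfT pair earns 400 points per player while any pairing involving ALLD yields only about 100.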
The Zoo of Strategies
I have received 25 contest entries containing 21 distinct strategies. I have divided them into six classes based on superficial similarities (except the last class, which is a catch-all category for everything that doesn't belong anywhere else, something like adverbs in the classification of parts of speech or the now defunct vermes in the animal kingdom). The first class is formed by tit-for-tat variants, probably the most obvious choice for a potentially successful strategy. Apparently so obvious that at least one commenter declared high confidence that tit-for-tat would make up more than half of the strategy pool. That was actually a good example of misplaced confidence, since the number of received tit-for-tat variants (where I put anything which behaves like tit-for-tat except for isolated deviations) was only six, two of them being identical and thus counted as one. Moreover, there wasn't a single true tit-for-tatter among the contestants; the closest we got was
A (-, -): On the first turn of each match, cooperate. On every other turn, with probability 0.0000004839, cooperate; otherwise play the move that the opponent played on the immediately preceding turn.
(In the presentation of strategies, the letter in bold serves as a unique identifier. The following parentheses include the name of the strategy, if the author has provided one, and the name of the author. I use the author's original description of the strategy when possible. If that's too long, an abbreviated paraphrase is given. If I found the original description ambiguous, I may give a slightly reformulated version based on subsequent clarifications with the author.) The author of A was the only one who requested that his/her name be withheld, and the strategy is nameless, so both fields in the parentheses are empty. The reason for the obscure probability was to make the strategy unique. The author says:
I wish to enter a trivial variation on the tit-for-tat strategy. (The trivial variation is to force the strategy to be unique; I wish to punish defectorish strategies by having lots of tit-for-tat-style strategies in the pool.)
This was perhaps a slight abuse of the rules, but since I am responsible for failing to make the rules immune to abuse, I had to accept the strategy as it is. Anyway, it turned out that the trivial variation was unnecessary for the stated purpose.
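To make the description concrete, here is a minimal sketch of A. It represents a strategy as a Python function of two lists, its own previous moves and the opponent's previous moves, each entry being "C" or "D"; this calling convention and the `rng` parameter used to inject randomness are assumptions for illustration, not the tournament's real interface.

```python
import random

COOPERATE_PROB = 0.0000004839  # the deliberately obscure probability from A's description

def strategy_a(my_moves, opp_moves, rng=random):
    """Strategy A: tit-for-tat with a minuscule chance of unconditional cooperation."""
    if not opp_moves:                  # first turn of the match
        return "C"
    if rng.random() < COOPERATE_PROB:  # the 'trivial variation' that makes A unique
        return "C"
    return opp_moves[-1]               # otherwise copy the opponent's previous move
```

Over a 100-turn match the deviation can fire on 99 turns, so it occurs in roughly 99 × 4.8 × 10⁻⁷, about 5 in 100,000, matches; in practice A plays pure tit-for-tat.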
The remaining strategies from this class were more or less standard with B being the most obvious choice.
B (-, Alexei): Tit-for-Tat, but always defect on last turn.
C (-, Caerbannog): Tit-for-tat with 20% chance of forgiving after opponent's defection. Defect on the last turn.
D (-, fubarobfusco and DuncanS): Tit-for-tat with 10% chance of forgiving.
E (-, Jem): First two turns cooperate. Later tit-for-tat with a chance of forgiving equal to 1/2^x, where x is the number of the opponent's defections after own cooperations. Last turn defect.
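Three of these variants differ only in two parameters, so they can be sketched with one hypothetical factory function, `forgiving_tft`, again treating a strategy as a function of the two move histories (an assumed convention). E is omitted to keep the sketch short.

```python
import random

MATCH_LENGTH = 100  # fixed match length used in the tournament

def forgiving_tft(forgive_prob, defect_last):
    """Hypothetical factory for tit-for-tat variants.

    forgive_prob: chance of cooperating anyway after the opponent defected
    defect_last:  whether to defect unconditionally on the final turn
    """
    def strategy(my_moves, opp_moves, rng=random):
        turn = len(my_moves) + 1
        if defect_last and turn == MATCH_LENGTH:
            return "D"
        if not opp_moves or opp_moves[-1] == "C":
            return "C"
        # The opponent defected last turn: retaliate unless we forgive.
        return "C" if rng.random() < forgive_prob else "D"
    return strategy

strategy_b = forgiving_tft(0.0, defect_last=True)   # B (Alexei)
strategy_c = forgiving_tft(0.2, defect_last=True)   # C (Caerbannog)
strategy_d = forgiving_tft(0.1, defect_last=False)  # D (fubarobfusco, DuncanS)
```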
The next category of strategies I call Avengers. The Avengers play a nice strategy until the opponent's defective behaviour reaches a specified threshold. After that they switch to irrevocable defection.
F (-, Eugine_Nier): Standard Tit-for-Tat with the following modifications: 1) Always defect on the last move. 2) Once the other player defects 5 times, switch to all defect.
G (-, rwallace): 1. Start with vanilla tit-for-tat. 2. When the opponent has defected a total of three times this match, switch to always defect for the rest of the match. 3. If the number of rounds is known and fixed, defect on the last round.
H (-, Michaelos): Tit for Tat, with two exceptions: 1: Ignore all other considerations and always defect on the final iteration. 2: After own defection and opponent's cooperation, 50 percent of the time, cooperate. The other 50 percent of the time, always defect for the rest of the game.
I (-, malthrin): if OpponentDefectedBefore(7) then MyMove=D else if n>98 then MyMove=D else MyMove=OpponentLastMove
J (Vengeful Cheater, Vaniver): Set Rage = 0. Turn 1: cooperate. Turn 2-99: If the opponent defected last round, set Rage to 1. (If they cooperate, no change.) If Rage is 1, defect. Else, cooperate. Turn 100: defect.
K (Grim Trigger, shinoteki): Cooperate if and only if the opponent has not defected in the current match.
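F and G share one shape: tit-for-tat plus a defection counter that triggers permanent defection. A hypothetical factory, `avenger`, captures it; with a threshold of 1 it reduces to the grim trigger K, and adding a last-turn defection gives the behaviour of J (Rage is set by the first observed defection and never reset). H and I are not covered by this sketch, and strategies are again functions of the two move histories, an assumed convention.

```python
MATCH_LENGTH = 100  # fixed number of turns per match

def avenger(threshold, defect_last):
    """Hypothetical factory: tit-for-tat that turns into permanent defection
    once the opponent has defected `threshold` times in the current match.

    defect_last: whether to defect unconditionally on the final turn
    """
    def strategy(my_moves, opp_moves):
        turn = len(my_moves) + 1
        if defect_last and turn == MATCH_LENGTH:
            return "D"
        if opp_moves.count("D") >= threshold:
            return "D"  # irrevocable, since the defection count never decreases
        return opp_moves[-1] if opp_moves else "C"
    return strategy

strategy_f = avenger(5, defect_last=True)   # F (Eugine_Nier)
strategy_g = avenger(3, defect_last=True)   # G (rwallace)
strategy_j = avenger(1, defect_last=True)   # J (Vaniver): Rage never resets
strategy_k = avenger(1, defect_last=False)  # K (shinoteki): grim trigger
```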
Then we have a special "class" of DefectBots (a name stolen from Orthonormal's excellent article) consisting of a single strategy. But this is by far the most popular strategy: I have received it in four copies. Unfortunately for DefectBots, according to the rules they have to be included in the pool only once. The peculiar name for the single DefectBot comes from wedrifid:
L (Fuck them all!, MileyCyrus and wedrifid and vespiacic and peter_hurford): Defect.
Then we have a rather heterogeneous class of Lists, whose only common feature is that their play is specified by a rather long list of instructions.
M (-, JesusPetry): Turn 1: cooperate. Turns 2 to 21: tit-for-tat. Turn 22: defect. Turn 23: cooperate. Turn 24: if opponent cooperated in turns 21 and 22, cooperate; otherwise, do whatever the opponent did on previous turn. Then tit-for-tat, the scenario from turns 22-24 repeats itself in turns 35-37, 57-59, 73-75. Turns 99 and 100: defect.
N (-, FAWS): 1. Cooperate on the first turn. 2. If the opponent has defected at least 3 times: Defect for the rest of the match. 3. Play tit for tat unless specified otherwise. 4. If the opponent has cooperated on the first 20 turns: Randomly choose a turn between 21 and 30. Otherwise continue with 11. 5. If the opponent defects after turn 20, but before the chosen turn comes up: Play CDC the next three turns, then continue with 12. 6. Defect on the chosen turn if the opponent hasn't defected before. 7. If the opponent defects on the same turn: Cooperate, then continue with 12. 8. If the opponent defects on the next turn: Cooperate, then continue with 11. 9. If the opponent defects on the turn after that (i. e. the second turn after we defected): Cooperate, then continue with 12. 10. If the opponent cooperates on every turn up to and including two turns after the defection: Defect until the opponent defects, then cooperate twice and continue with 11. 11. Defect on the last two turns (99 and 100). 12. Defect on the last two turns unless the opponent defected exactly once.
O (Second Chance, benelliott): Second Chance is defined by 5 rules. When more than one rule applies to the current situation, always defer to the one with the lower number. Rules: 1. Co-operate on move 1, defect on moves 98, 99 and 100. 2. If Second Chance has co-operated at least 4 times (excluding the most recent move) and in every case co-operation has been followed by defection from the other strategy, then always defect from now on. 3. If Second Chance has co-operated at least 8 times, and defected at least 10 times (excluding the previous move), then calculate x, the proportion of the time that co-operation has been followed by co-operation and y, the proportion of the time that defection has been followed by co-operation. If 4x < 6y + 1 then defect until that changes. 4. If the number of times the opposing strategy has defected (including the most recent move) is a multiple of 4, co-operate. 5. Do whatever the opposing strategy did on the previous move.
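Because rule precedence matters, Second Chance is easiest to read as a chain of early returns checked in rule order. This is a sketch under stated assumptions: "co-operation followed by defection" is interpreted as O's move on turn t answered by the opponent's move on turn t+1, and strategies are functions of the two move histories.

```python
MATCH_LENGTH = 100

def second_chance(my_moves, opp_moves):
    """Strategy O (Second Chance); lower-numbered rules take precedence."""
    turn = len(my_moves) + 1
    # Rule 1: cooperate on move 1, defect on moves 98, 99 and 100.
    if turn == 1:
        return "C"
    if turn >= MATCH_LENGTH - 2:
        return "D"
    # Pair each of my earlier moves with the opponent's reply on the next turn,
    # excluding my most recent move (its reply has not been seen yet).
    replies = list(zip(my_moves[:-1], opp_moves[1:]))
    after_c = [reply for mine, reply in replies if mine == "C"]
    after_d = [reply for mine, reply in replies if mine == "D"]
    # Rule 2: at least 4 cooperations, every one answered by a defection.
    if len(after_c) >= 4 and all(r == "D" for r in after_c):
        return "D"
    # Rule 3: compare how often cooperation vs. defection is answered with C.
    if len(after_c) >= 8 and len(after_d) >= 10:
        x = after_c.count("C") / len(after_c)
        y = after_d.count("C") / len(after_d)
        if 4 * x < 6 * y + 1:
            return "D"
    # Rule 4: cooperate when the opponent's defection count is a multiple of 4
    # (zero counts, so O cooperates with an opponent that never defected).
    if opp_moves.count("D") % 4 == 0:
        return "C"
    # Rule 5: otherwise copy the opponent's previous move.
    return opp_moves[-1]
```

Rule 4 is where the name comes from: every fourth defection by the opponent earns it another chance, while rules 2 and 3 catch opponents that keep exploiting that forgiveness.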
The next class are CliqueBots (once more a name from Orthonormal's post). CliqueBots try to identify whether their opponent is a copy of themselves and then cooperate. If they identify a foreign opponent, they usually become pretty nasty.
P (Simple Identity ChecK, red75): Move 1: cooperate. Moves 2 to 57: tit-for-tat. Move 58 - defect. Move 59 - if moves 0-57 were coop-coop and 58 was defect-defect then coop else defect. Moves 60-100: if my move 59 was coop then tit-for-tat else defect.
Q (EvilAlliance, ArisKatsaris): Start with 5 defections. On later turns, if the opponent had cooperated at least once in the first 5 turns or defected ever since, defect; else cooperate.
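Q's handshake is five opening defections: only an opponent that defects on all of turns 1 to 5 and never afterwards earns cooperation. The sketch below uses an assumed payoff table (7 for defecting against a cooperator, 4 each for mutual cooperation, 1 each for mutual defection, 0 for being exploited; these values are consistent with the totals reported later in the post but are not quoted from the rules) and a hypothetical `play_match` helper to show why Q effectively behaves like a DefectBot against a tit-for-tat player.

```python
PAYOFF = {("C", "C"): (4, 4), ("C", "D"): (0, 7),
          ("D", "C"): (7, 0), ("D", "D"): (1, 1)}  # assumed payoffs

def evil_alliance(my_moves, opp_moves):
    """Q: five opening defections as a handshake, then cooperate only with 'itself'."""
    turn = len(my_moves) + 1
    if turn <= 5:
        return "D"
    handshake_ok = all(m == "D" for m in opp_moves[:5])
    loyal_since = all(m == "C" for m in opp_moves[5:])
    return "C" if handshake_ok and loyal_since else "D"

def tit_for_tat(my_moves, opp_moves):
    return opp_moves[-1] if opp_moves else "C"

def play_match(strat_a, strat_b, turns=100):
    """Play one fixed-length match and return the two total scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(turns):
        # Both players decide before seeing each other's current move.
        a = strat_a(moves_a, moves_b)
        b = strat_b(moves_b, moves_a)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b

print(play_match(evil_alliance, evil_alliance))  # (385, 385)
print(play_match(evil_alliance, tit_for_tat))    # (106, 99)
```

Tit-for-tat cooperates on turn 1, which fails the handshake, so the rest of the match is mutual defection.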
The remaining strategies are Unclassified, since they have little in common. Here they come:
R (Probe & Punish, Eneasz): 1. Cooperate for a turn. 2. Cooperate for a second turn. If opponent cooperated on the second turn, continue to cooperate in every subsequent turn until the opponent defects. 3. If/when the opponent defects, defect for 12 consecutive rounds. 4. Return to 1, repeat.
S (Win-stay lose-shift, Nerzhin): It cooperates if and only if it and the opponent both played the same move on the last round.
The author of S says he's adopted the strategy from Nowak and Sigmund, Nature 364 pp. 56-58.
T (Tit for Two Tats, DataPacRat): As the name suggests.
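Both S and T are short enough to state as code. In this sketch strategies are functions of the two move histories (an assumed convention), and S's opening move is taken to be cooperation, which the description above does not specify.

```python
def win_stay_lose_shift(my_moves, opp_moves):
    """S: cooperate iff both players made the same move on the previous turn."""
    if not my_moves:
        return "C"  # opening move: an assumption, not stated in the description
    return "C" if my_moves[-1] == opp_moves[-1] else "D"

def tit_for_two_tats(my_moves, opp_moves):
    """T: defect only after two consecutive defections by the opponent."""
    return "D" if opp_moves[-2:] == ["D", "D"] else "C"
```

The "same move" test means S cooperates after mutual defection as well as after mutual cooperation, and defects whenever the two previous moves differ.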
U (RackBlockShooter, Nic_Smith): The code is available here.
This one is a real beast, with code more than twice as long as that of its competitors. It models the opponent's behaviour as a random process specified by two parameters: the probability of cooperation when the second strategy (which is the RackBlockShooter) also cooperates, and the probability of cooperation when RBS defects. The model was acausal: the parameters depend on RBS's behaviour in the same turn, not in the preceding one. Moreover, RBS stores whole probability distributions of those parameters and correctly updates them in an orthodox Bayesian manner after each turn. To further complicate things, if RBS thinks that the opponent is itself, it cooperates in the CliqueBottish style.
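The two-parameter opponent model can be illustrated with a standard conjugate setup. The following is a hypothetical sketch, not RBS's actual code (which is linked above and may represent the distributions differently): each parameter gets a Beta prior, assumed uniform here, and every observed turn updates the belief that belongs to U's own move in that same turn, mirroring the acausal conditioning described above.

```python
from dataclasses import dataclass, field

@dataclass
class BetaBelief:
    """Beta(alpha, beta) distribution over one cooperation probability."""
    alpha: float = 1.0  # uniform Beta(1, 1) prior: an assumption
    beta: float = 1.0

    def update(self, opponent_cooperated: bool) -> None:
        # Conjugate Bernoulli update: add one to the matching count.
        if opponent_cooperated:
            self.alpha += 1
        else:
            self.beta += 1

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

@dataclass
class OpponentModel:
    """Two parameters: P(opponent cooperates | my move in the same turn)."""
    beliefs: dict = field(default_factory=lambda: {"C": BetaBelief(), "D": BetaBelief()})

    def observe(self, my_move: str, opp_move: str) -> None:
        # Condition on my move in the *same* turn, not the preceding one.
        self.beliefs[my_move].update(opp_move == "C")

    def predicted_cooperation(self, my_move: str) -> float:
        return self.beliefs[my_move].mean()
```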
And finally, there's a strategy included by default:
Z (Fully Random, default): Cooperate with 50% probability.
The Round-Robin Tournament
Each strategy played 21 matches, in which a theoretical maximum of 14,700 points could be won. A more reasonable benchmark was 8,400 points: the score of every member of a pool consisting solely of nice strategies (i.e. those which never defect first). No strategy reached this optimum. The standings are given in the following table (columns from left: strategy identifier, matches won, matches drawn, matches lost, total points).
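Both benchmarks follow from simple arithmetic if one assumes, as the figures imply, 7 points per turn for defecting against a cooperator and 4 points per turn for mutual cooperation. These payoff values are inferred from the totals, not quoted from the rules.

```python
OPPONENTS = 21   # 22 strategies in the pool (A to U plus Z), each meets every other once
TURNS = 100      # fixed match length
TEMPTATION = 7   # points for defecting against a cooperator (inferred, not quoted)
REWARD = 4       # points each for mutual cooperation (inferred, not quoted)

# Upper bound: defect on every turn against opponents who always cooperate.
theoretical_max = OPPONENTS * TURNS * TEMPTATION
# In a pool of nice strategies every pairing cooperates on all turns.
all_nice_score = OPPONENTS * TURNS * REWARD

print(theoretical_max, all_nice_score)  # 14700 8400
```

Since 14,700 requires every opponent to keep cooperating while being exploited on every turn, it is unreachable against any retaliating strategy, which is why 8,400 is the more meaningful yardstick.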
The winner is malthrin's unnamed strategy. Second and third place are occupied by Eugine_Nier's modified tit-for-tat and benelliott's Second Chance, respectively. Congratulations!
Here are the results of all matches.
Is there something interesting to say about the results? Well, there were three strategies which performed worse than random: the DefectBot L, the CliqueBot Q (which effectively acted like a DefectBot) and the complex strategy U. As expected, very nasty strategies were unsuccessful. On the other hand, totally nice strategies weren't very successful either: no member of the set {A, D, K, R, S, T} got into the top five. Not defecting on the last turn was a mistake; defecting on the last two turns seemed an even better choice. As for the somewhat arbitrary classes, the average scores were as follows:
- Tit-for-tat variants 7197.4
- Avengers 7144.3
- Lists 6551.3
- Unclassified 5963.8
- CliqueBots 4371.5
- DefectBots 3174.0
Another, probably obvious, point is that in a non-zero-sum game it doesn't matter much how many opponents you beat in direct confrontation. The only two undefeated strategies, L and Q, which won all their matches except for one draw, finished last.
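A two-strategy example shows how winning matches and scoring points come apart. The sketch assumes per-turn payoffs of 7 for defecting against a cooperator, 4 each for mutual cooperation, 1 each for mutual defection and 0 for being exploited (inferred, not quoted from the rules); `play_match` is a hypothetical helper.

```python
# Assumed payoffs (my points, opponent's points), inferred rather than quoted.
PAYOFF = {("C", "C"): (4, 4), ("C", "D"): (0, 7),
          ("D", "C"): (7, 0), ("D", "D"): (1, 1)}

def defect_bot(my_moves, opp_moves):
    return "D"

def tit_for_tat(my_moves, opp_moves):
    return opp_moves[-1] if opp_moves else "C"

def play_match(strat_a, strat_b, turns=100):
    """Play one fixed-length match and return the two total scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(turns):
        # Both players decide before seeing each other's current move.
        a = strat_a(moves_a, moves_b)
        b = strat_b(moves_b, moves_a)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b

print(play_match(defect_bot, tit_for_tat))   # (106, 99): DefectBot "wins"
print(play_match(tit_for_tat, tit_for_tat))  # (400, 400)
print(play_match(defect_bot, defect_bot))    # (100, 100)
```

The DefectBot beats tit-for-tat by seven points, yet each tit-for-tat player collects 400 points from every other cooperator, roughly four times what the DefectBot earns in any of its matches.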
The Evolutionary¹ Tournament
The initial population consisted of 1,980 strategies, 90 of each kind. But that didn't last long. The very nasty strategies started to die out rapidly, and by generation 6 there were no representatives of L, Q, U and Z. (I have to say I was very glad to see U extinct so early, because with a non-negligible number of Us in the pool the simulation ran very, very slowly.) By generation 40, M, N and P were also gone. Unfortunately, 100 generations were too few to see any interesting effects associated with environmental change as different strategies become extinct (but see the next section for a longer simulation); the strategies successful in the round-robin tournament were generally successful here as well.
The graph shows the history of subpopulations (TfT variants are blue, Avengers red, Lists green, CliqueBots purple, Unclassified strategies yellow, the DefectBot gray and FullyRandom black). The final standings (ordered by number of copies in generation 100) are summarised in the following table.
The gold and silver medals are once again awarded to malthrin and Eugine_Nier. There is a change in third place, which is now occupied by Caerbannog's 20%-forgiving tit-for-tat.
One remarkable fact is the absolute failure of the CliqueBots, which were without doubt designed to succeed in the evolutionary tournament. The idea was, perhaps, that CliqueBots win by cooperating among themselves and punishing the others by defecting. That may be a working meta-strategy once you are dominant in the population, but for players starting as a small minority it is suicidal. It would be a pretty bad idea even if most contestants were CliqueBots: while TfT variants generally cooperate with each other regardless of slight differences between their source codes, CliqueBots lose points in defection-filled matches even against other CliqueBots, except those of their own species.
The Control Group
I ran the same experiment with readers of my blog, so I have a control group to match against the LW competitors. The comparison is done simply by putting all strategies together in an even larger tournament. Before showing the results, let me introduce the control group strategies.
C1: Let p be the relative frequency of the opponent's cooperations (after my cooperation if I cooperated on the last turn, after my defection if I defected). If the relative frequency can't be calculated, let p = 1/2. Let Ec = 5p - 7 and Ed = 6p - 6. Cooperate with probability exp(Ec) / (exp(Ec) + exp(Ed)).
This strategy was in many respects similar to strategy U: mathematically elegant, but very unsuccessful.
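Taking C1's formula exactly as stated, the exponentials collapse into a logistic function: dividing numerator and denominator by exp(Ec) gives 1 / (1 + exp(Ed - Ec)), and Ed - Ec = p + 1. The quick numerical check below (function names are hypothetical) shows that the cooperation probability therefore stays between about 0.12 and 0.27 for every p in [0, 1], and that it falls as the opponent cooperates more. This is only an observation about the formula as written here, which may help explain its poor results.

```python
import math

def c1_cooperation_probability(p):
    """C1's cooperation probability, taking the formula exactly as stated."""
    ec = 5 * p - 7
    ed = 6 * p - 6
    return math.exp(ec) / (math.exp(ec) + math.exp(ed))

def c1_simplified(p):
    # Divide numerator and denominator by exp(Ec): 1 / (1 + exp(Ed - Ec)),
    # where Ed - Ec = (6p - 6) - (5p - 7) = p + 1.
    return 1 / (1 + math.exp(p + 1))

for p in (0.0, 0.5, 1.0):
    print(p, round(c1_cooperation_probability(p), 3))
```

The loop prints approximately 0.269, 0.182 and 0.119 for p = 0, 0.5 and 1.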
C2: Defect if and only if the opponent has defected more than twice or defected on the first turn.
C3: If the opponent played the same move on the last and the last but one turn, play that. Otherwise, play the opposite of what I played on the last turn. On turns 1 and 2, play defect and cooperate, respectively.
C4: Cooperate on the first three turns, defect on the last two. Else, cooperate if the opponent has cooperated at least 85% of the time.
C5: Cooperate on the first three turns, defect on the last turn. Else, cooperate with probability equal to the proportion of opponent's cooperations.
C6: Tit-for-tat, but cooperate after the opponent's first defection.
C7: On the first turn play randomly, on turns 2n and 2n+1 play what the opponent played on turn n.
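The indexing rule of C7 is easy to misread, so here is a minimal sketch, assuming strategies are functions of the two move histories and turns are numbered from 1.

```python
import random

def strategy_c7(my_moves, opp_moves, rng=random):
    """C7: random opening; on turns 2n and 2n+1 echo the opponent's move on turn n."""
    turn = len(my_moves) + 1
    if turn == 1:
        return rng.choice("CD")
    n = turn // 2              # both 2n and 2n + 1 map back to turn n
    return opp_moves[n - 1]    # convert the 1-indexed turn n to a list index
```

One consequence is that C7's reactions lag further and further behind: its move on turn 100 echoes the opponent's move on turn 50, so anything the opponent does after turn 50 is never answered.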
C8: Defect if the opponent defected on two of the three preceding turns, or if I defected on the last turn and cooperated on the last but one turn, or if I haven't defected in the preceding 10 turns. Always cooperate on turns 20, 40, 60 and 80.
C9: Have two substrategies: 1. tit for two tats and 2. defect after opponent's first defection. Begin with 1. After each tenth turn change the substrategy if the gain in the last ten turns was between 16 and 34 points. If the substrategy is changed, reset all defection counters.
C10: Turns 1 and 2: cooperate. Turns 3-29: tit for tat. Turns 30-84: defect if the opponent defected more than seven times, else tit for tat. Turn 85: defect. Turns 86 and 87: defect if the opponent defected more than seven times, else cooperate. Turns 88-97: If the opponent defected on turn 87 or more than four times during the match, defect. Else cooperate. Turns 98-100: defect.
C11: Turn 1: cooperate. Turn 2: tit for tat. Turn 100: defect. Otherwise, if the opponent defected on the last turn and (I have cooperated or the opponent has defected on the last but one turn), defect; else cooperate.
The standings of the two round-robin tournaments are given in the following tables (two tournaments were played to illustrate the rôle of randomness).
You can see how the new strategies have changed the standings. The original winner, I, now hovers around a rather unremarkable 15th position, 10th among LW strategies. The average score of the LW strategies (calculated from the second tournament) is 9702.9, that of the control group only 9296.9. To what extent this means that reading LW improves game-theoretical skills I leave to the readers' judgement.
In the evolutionary setting, strategy I regained its leading position. Here are the populations in the 100th generation and the graph:
(In the graph, LW strategies are green and the control group strategies are red.)
It is relatively clear that the situation after a hundred generations is not an equilibrium and further changes would ensue if the simulation continued. Therefore I ran one more simulation, this time lasting a thousand generations. This was more interesting, since more strategies died out. The first was C10, which was gone after generation 125. Around generation 300 there were only four surviving strategies from the control group (two of them moribund) and twelve from the original LW pool (two of them seriously endangered, too). The leaders were I (with population 539), C4 (432) and C11 (183). The last one went through a steady slow decline and was soon overtaken by O, which took third position around generation 420. At generation 600, only three strategies remained in the pool: I (932), C4 (692) and O (374). Not long before that, the populations of I and C4 had peaked and started to decline, while O continued to rise. In direct confrontation, O always beats both I and C4 397:390, which proved to be decisive once the other strategies were eliminated. After the thousandth generation, there were 1374 copies of O and only 625 copies of both its competitors combined.
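The 397:390 score can be reproduced with a short simulation, under assumptions: per-turn payoffs of 7, 4, 1 and 0 (temptation, mutual cooperation, mutual defection, sucker; the first two are implied by the 14,700 and 8,400 totals, the last two are assumed), O reduced to its rules 1, 4 and 5 (rules 2 and 3 cannot fire against opponents that cooperate until turn 99), and a hypothetical reading of I's `OpponentDefectedBefore(7)` as "at least 7 defections so far", which does not affect these matches.

```python
PAYOFF = {("C", "C"): (4, 4), ("C", "D"): (0, 7),
          ("D", "C"): (7, 0), ("D", "D"): (1, 1)}  # assumed payoffs

def second_chance_o(my_moves, opp_moves):
    """O reduced to rules 1, 4 and 5; rules 2 and 3 cannot fire in these matches."""
    turn = len(my_moves) + 1
    if turn == 1:
        return "C"
    if turn >= 98:
        return "D"
    if opp_moves.count("D") % 4 == 0:
        return "C"
    return opp_moves[-1]

def strategy_i(my_moves, opp_moves):
    turn = len(my_moves) + 1
    # Hypothetical reading of OpponentDefectedBefore(7): at least 7 defections so far.
    if opp_moves.count("D") >= 7 or turn > 98:
        return "D"
    return opp_moves[-1] if opp_moves else "C"

def strategy_c4(my_moves, opp_moves):
    turn = len(my_moves) + 1
    if turn <= 3:
        return "C"
    if turn >= 99:
        return "D"
    return "C" if opp_moves.count("C") >= 0.85 * len(opp_moves) else "D"

def play_match(strat_a, strat_b, turns=100):
    """Play one fixed-length match and return the two total scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(turns):
        a = strat_a(moves_a, moves_b)
        b = strat_b(moves_b, moves_a)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b

print(play_match(second_chance_o, strategy_i))   # (397, 390)
print(play_match(second_chance_o, strategy_c4))  # (397, 390)
```

The seven-point difference comes entirely from turn 98, where O defects one turn earlier than I and C4 do; all other turns are symmetric.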
Final Remarks
I organised the prisoner's dilemma tournament mainly for fun, but hopefully the results illustrate a few facts about non-zero-sum games. At least in the final simulation we can see how short-term success needn't imply long-term survival. The failure of CliqueBots shows that it isn't very prudent to be too restrictive about whom one cooperates with. There is one more lesson that I have learned: the strategies which I considered most beautiful (U and C1) played poorly and were the first to be eliminated from the pool. Both U and C1 tried to experiment with the opponent's behaviour and use the results to construct a working model thereof. But that didn't work in this setting: while those strategies were losing points experimenting, dumb mechanical tit-for-tats were maximising their gain. There are situations when the cost of obtaining knowledge is higher than the knowledge is worth, and this was one of them. (Of course, both discussed strategies could have used the knowledge they had more efficiently, but that wouldn't have helped significantly.)
I deliberately ran fixed-length matches to make the competition more interesting. Random-length iterated PDs are the domain of tit-for-tats, and it would be hard to devise a better strategy; in the fixed-length case, players using a nice strategy had to think about on which turn to defect first, and the answer isn't trivial. The most successful strategies started defecting on the 98th or 99th turn.
The question whether reading LessWrong improves performance in similar games remains unresolved, as the evidence gathered in the tournament was very weak. The difference between the LW and control groups wasn't big. Another problem is that I don't know much about the control group authors (except one of them, whom I know personally); it's not clear what part of the population the control group represents. The judgement is further complicated by the apparent fact that not all participants were trying to win: the nameless author of A explicitly declared a different goal (namely, to punish defectorish strategies), and it's likely that L was intended to signal its author's cynicism (I am specifically thinking about one of the DefectBot's four originators) rather than to succeed in the tournament. Furthermore, many strategies were sent by new users, and only 11 out of 26 strategy authors have karma over 500, which means that the LW group doesn't faithfully represent LessWrong's active participants.
¹ "Evolutionary" may be a misnomer. The tournament models natural selection, but the strategies themselves never change, so no evolution occurs. I have kept the designation to maintain consistency with the earlier post.