Something that bothers me about this tournament: I feel like a competitive tournament doesn't actually reward the kind of strategy that is meant to do well in Prisoner's Dilemna. As a (highly oversimplified) example, consider three bots who have the scores:
A: 10 B: 9 C: 2
Here, A is 'winning.' Suppose B can make a move that costs A 3 points and costs itself 1 point, leading to:
A: 7 B: 8 C: 2
B's payoff function has dropped. However, from a 'winning the tournament' approach, B has gone from 2nd to 1st, and so this outcome is now better for B. This feels wrong.
I doubt this was a really big issue here, but just on general principles I feel like competition by comparing scores is incompatible with a desire to explore the Prisoner's Dilemma, since you're turning a non-zero-sum game into a zero-sum game.
Followup to: Announcing the 2014 program equilibrium iterated PD tournament
In August, I announced an iterated prisoner's dilemma tournament in which bots can simulate each other before making a move. Eleven bots were submitted to the tournament. Today, I am pleased to announce the final standings and release the source code and full results.
All of the source code submitted by the competitors and the full results for each match are available here. See here for the full set of rules and tournament code.
Before we get to the final results, here's a quick rundown of the bots that competed:
AnderBot
AnderBot follows a simple tit-for-tat-like algorithm that eschews simulation:
CheeseBot
CheeseBot tries several simulation-based strategies, and uses the first one that applies to the current situation.
DefectBot
Defect!
This bot was submitted publicly by James Miller.
DMRB
DavidMonRoBot (DMRB) takes a more cautious approach to simulating: It spends a few hundred milliseconds simulating its opponent to figure out what the rest of the round will look like given a Cooperate or Defect on the current round, and then picks the outcome that leads to the highest total number of points for DMRB.
This allows DMRB to gauge whether its opponent is "dumb," i.e. does not punish defectors. If the opponent is dumb, DMRB reasons that the best move is to defect; otherwise, if DMRB thinks that its opponent will punish defection, it simply plays tit-for-tat. DMRB spends only a small amount of time simulating so that other simulation-based bots will be less likely to have their simulations time out while simulating DMRB.
Pip
This one is a behemoth. At almost 500 very dense lines of Haskell (including comments containing quotes from "The Quantum Thief"), Pip uses a complex series of simulations to classify the opponent into a large set of defined behaviors, such as "CooperateOnLastInTheFaceOfCooperation", "Uncompromising" and "Extortionist." Then, it builds out a decision tree and selects the outcome that leads to the highest score at the end of the match. If I'm being vague here, it's because the inner workings of Pip are still mostly a mystery to me.
SimHatingTitForTat
SimHatingTitForTat, as the name implies, plays tit-for-tat and attempts to punish bots that use simulation to exploit it, namely by defecting against bots that deviated from tit-for-tat on any previous round. Its strategy is as follows:
SimpleTFTseerBot
SimpleTFTseer uses a modified version of tit-for-tat that ruthlessly punishes defectors and uses one simulation per round to look at what its opponent will do on the last round.
SwitchBot
SwitchBot also uses a modified tit-for-tat algorithm:
TatForTit
TatForTit follows a complex simulation-based strategy:
TwoFacedTit
TwoFacedTit simulates its opponent playing against mirrorBot; if it takes more than 10 milliseconds to respond, TwoFacedTit plays Cooperate. Otherwise, TwoFacedTit plays tit-for-tat and then Defects on the last round.
VOFB
Like SimpleTFTseerBot, VeryOpportunisticFarseerBot (VOFB) uses a very aggressive defection-punishment strategy: If either player defected in a previous round, Defect. Otherwise, VOFB simulates the next round to determine what the opponent will do given a Cooperate or Defect on the current round. If the opponent does not punish defection or the simulation does not terminate, VOFB Defects. On the final round, VOFB uses additional simulations to detect whether its opponent defects against backstabbers, and if not, plays Defect.
Tournament results
After 1000 round-robin elimination matches, the final standings are:
1st place (tied): CheeseBot and DavidMonRoBot
3rd place: VeryOpportunisticFarseerBot
4th place: TatForTit
The win frequencies for each bot:
If it were a sporting event, this tournament would not be particularly exciting to watch, as most games were nearly identical from start to finish. The graph below shows each bot's frequency of surviving the first round-robin round:
In other words, the same half of the field consistently swept the first round; AnderBot, DefectBot, Pip, and SimpleTFTseer never survived to see a second round. In general, this is because high-scoring bots almost always cooperated with each other (with the occasional backstab at the end of the round), and defected against AnderBot, DefectBot, Pop, and SimpleTFTseerBot, as these bots either did not consistently retaliate when defected against or pre-emptively defected, triggering a chain of mutual defections. Interestingly, the bots that continued on to the next round did not do so by a large margin:
However, the variance in these scores was very low, primarily due to the repeated matchups of mostly-deterministic strategies consistently resulting in the same outcomes.
In addition, all of the matches progressed in one of the following 5 ways (hat tip lackofcheese):
This suggests that this game does have some kind of equilibrium, because these top three bots use very similar strategies: Simulate my opponent and figure out if defections will be punished; if so, Cooperate, and otherwise, defect. This allows bots following this strategy to always cooperate with each other, consistently providing them with a large number of points in every round, ensuring that they outcompete backstabbing or other aggressive strategies. In this tournament, this allowed the top three bots to add a guaranteed 600 points per round, more than enough to consistently keep them from being eliminated.
The tournament was slightly more interesting (and far more varied) on a matchup-by-matchup basis. Last-round and second-to-last round defections after mutual cooperation were common. TatForTit frequently used this technique against VOFB, CheeseBot, DMRB, and vice versa; this tactic allowed it to steal 32 wins in the final round. Other bots, particularly AnderBot and Pip, behaved very differently between matches. Pip, in particular, sometimes cooperated and sometimes defected for long stretches, and AnderBot's randomness also led to erratic behavior. Ultimately, though, this did not net these bots a large number of points, as their opponents generally defected as soon as they stopped cooperating.
For those interested in the gritty details, I've formatted the output of each match to be human-readable, so you can easily read through the play-by-play of each match (and hopefully get some enjoyment out of it, as well). See the github repo for the full logs.
Postscript
In the course of running the tournament, I received a number of suggestions and idea for how things could be improved; some of these ideas include:
If anyone would like me to run this tournament again at some unspecified time in the future, with or without modifications, feel free to let me know in the comments. If you would like to fork the project and run it on your own, you are more than welcome to do so.
Many thanks to everyone who participated!