Thanks very much for the info.
Actually, the tournament format allowing a 1v1 equilibrium strategy to lose is rather a trivial fact---DefectBot is a 1v1 equilibrium strategy and it loses pretty badly indeed. Also, what do you mean by 1v1 equilibrium, specifically? Are you talking about 1v1 where the goal is to maximise your expected score, or 1v1 where the goal is to have a higher score than your opponent?
I should have mentioned that I was talking about a 1v1 equilibrium of strategies that maximizes your expected score.
DefectBot is such an equilibrium strategy for the vanilla iterated-PD with fixed time horizon, but not if you allow simulation.
In the tournament context, a tournament where everyone submits DefectBot is a Nash equilibrium for the entire tournament.
Yes, in fact this was one of the reasons why I was skeptical of the tournament format (and James Miller was as well, IIRC). In the end it turned out Miller was the only one to submit a DefectBot.
Eventually, I predicted that there were going to be lots of Tit-for-tat-like bots to cooperate with and exploit in the last rounds, and that some bots that used a strategy similar to mine (play a 1v1 max-payoff equilibrium strategy while attempting to exploit "dumb" bots), hence I went for it hoping for a tie at the top position. It almost worked.
More generally, you're wrong about the existence of Nash equilibria not being guaranteed; as long as there are finitely many players and finitely many strategies, and mixed strategies are allowed, there has to be at least one Nash equilibrium.
Ok, I recalled incorrectly. :)
I should have mentioned that I was talking about a 1v1 equilibrium of strategies that maximizes your expected score. DefectBot is such an equilibrium strategy for the vanilla iterated-PD with fixed time horizon, but not if you allow simulation.
DefectBot maximises expected score against DefectBot, because you can't do better vs DefectBot than defect every round and get 100 points. As such, DefectBot vs DefectBot is still a Nash equilibrium.
You need to be more specific as to what "maximizes your expected score" means, because depending on your definition I could come up with some very surprising strategies you might not be expecting.
Followup to: Announcing the 2014 program equilibrium iterated PD tournament
In August, I announced an iterated prisoner's dilemma tournament in which bots can simulate each other before making a move. Eleven bots were submitted to the tournament. Today, I am pleased to announce the final standings and release the source code and full results.
All of the source code submitted by the competitors and the full results for each match are available here. See here for the full set of rules and tournament code.
Before we get to the final results, here's a quick rundown of the bots that competed:
AnderBot
AnderBot follows a simple tit-for-tat-like algorithm that eschews simulation:
CheeseBot
CheeseBot tries several simulation-based strategies, and uses the first one that applies to the current situation.
DefectBot
Defect!
This bot was submitted publicly by James Miller.
DMRB
DavidMonRoBot (DMRB) takes a more cautious approach to simulating: It spends a few hundred milliseconds simulating its opponent to figure out what the rest of the round will look like given a Cooperate or Defect on the current round, and then picks the outcome that leads to the highest total number of points for DMRB.
This allows DMRB to gauge whether its opponent is "dumb," i.e. does not punish defectors. If the opponent is dumb, DMRB reasons that the best move is to defect; otherwise, if DMRB thinks that its opponent will punish defection, it simply plays tit-for-tat. DMRB spends only a small amount of time simulating so that other simulation-based bots will be less likely to have their simulations time out while simulating DMRB.
Pip
This one is a behemoth. At almost 500 very dense lines of Haskell (including comments containing quotes from "The Quantum Thief"), Pip uses a complex series of simulations to classify the opponent into a large set of defined behaviors, such as "CooperateOnLastInTheFaceOfCooperation", "Uncompromising" and "Extortionist." Then, it builds out a decision tree and selects the outcome that leads to the highest score at the end of the match. If I'm being vague here, it's because the inner workings of Pip are still mostly a mystery to me.
SimHatingTitForTat
SimHatingTitForTat, as the name implies, plays tit-for-tat and attempts to punish bots that use simulation to exploit it, namely by defecting against bots that deviated from tit-for-tat on any previous round. Its strategy is as follows:
SimpleTFTseerBot
SimpleTFTseer uses a modified version of tit-for-tat that ruthlessly punishes defectors and uses one simulation per round to look at what its opponent will do on the last round.
SwitchBot
SwitchBot also uses a modified tit-for-tat algorithm:
TatForTit
TatForTit follows a complex simulation-based strategy:
TwoFacedTit
TwoFacedTit simulates its opponent playing against mirrorBot; if it takes more than 10 milliseconds to respond, TwoFacedTit plays Cooperate. Otherwise, TwoFacedTit plays tit-for-tat and then Defects on the last round.
VOFB
Like SimpleTFTseerBot, VeryOpportunisticFarseerBot (VOFB) uses a very aggressive defection-punishment strategy: If either player defected in a previous round, Defect. Otherwise, VOFB simulates the next round to determine what the opponent will do given a Cooperate or Defect on the current round. If the opponent does not punish defection or the simulation does not terminate, VOFB Defects. On the final round, VOFB uses additional simulations to detect whether its opponent defects against backstabbers, and if not, plays Defect.
Tournament results
After 1000 round-robin elimination matches, the final standings are:
1st place (tied): CheeseBot and DavidMonRoBot
3rd place: VeryOpportunisticFarseerBot
4th place: TatForTit
The win frequencies for each bot:
If it were a sporting event, this tournament would not be particularly exciting to watch, as most games were nearly identical from start to finish. The graph below shows each bot's frequency of surviving the first round-robin round:
In other words, the same half of the field consistently swept the first round; AnderBot, DefectBot, Pip, and SimpleTFTseer never survived to see a second round. In general, this is because high-scoring bots almost always cooperated with each other (with the occasional backstab at the end of the round), and defected against AnderBot, DefectBot, Pop, and SimpleTFTseerBot, as these bots either did not consistently retaliate when defected against or pre-emptively defected, triggering a chain of mutual defections. Interestingly, the bots that continued on to the next round did not do so by a large margin:
However, the variance in these scores was very low, primarily due to the repeated matchups of mostly-deterministic strategies consistently resulting in the same outcomes.
In addition, all of the matches progressed in one of the following 5 ways (hat tip lackofcheese):
This suggests that this game does have some kind of equilibrium, because these top three bots use very similar strategies: Simulate my opponent and figure out if defections will be punished; if so, Cooperate, and otherwise, defect. This allows bots following this strategy to always cooperate with each other, consistently providing them with a large number of points in every round, ensuring that they outcompete backstabbing or other aggressive strategies. In this tournament, this allowed the top three bots to add a guaranteed 600 points per round, more than enough to consistently keep them from being eliminated.
The tournament was slightly more interesting (and far more varied) on a matchup-by-matchup basis. Last-round and second-to-last round defections after mutual cooperation were common. TatForTit frequently used this technique against VOFB, CheeseBot, DMRB, and vice versa; this tactic allowed it to steal 32 wins in the final round. Other bots, particularly AnderBot and Pip, behaved very differently between matches. Pip, in particular, sometimes cooperated and sometimes defected for long stretches, and AnderBot's randomness also led to erratic behavior. Ultimately, though, this did not net these bots a large number of points, as their opponents generally defected as soon as they stopped cooperating.
For those interested in the gritty details, I've formatted the output of each match to be human-readable, so you can easily read through the play-by-play of each match (and hopefully get some enjoyment out of it, as well). See the github repo for the full logs.
Postscript
In the course of running the tournament, I received a number of suggestions and idea for how things could be improved; some of these ideas include:
If anyone would like me to run this tournament again at some unspecified time in the future, with or without modifications, feel free to let me know in the comments. If you would like to fork the project and run it on your own, you are more than welcome to do so.
Many thanks to everyone who participated!