I wish I had been around when the initial Diplomacy game was organized. Diplomacy is a hobby of mine. And that project would have been much better organized (and recorded) on the website where I play, WebDiplomacy.
Furthermore, if you wanted a more easily analyzed metric, you could play "gunboat" Diplomacy, where countries are not allowed to talk to each other and the only communication is through moves: you can use support holds to indicate a request for an alliance, etc. I particularly enjoy this form of Diplomacy, as it's far more like seven-player chess.
A lot of the people on WebDiplomacy are math/science people; the two communities (if LessWrong could get over the HUGE number of trolls on WebDip) would actually get along really well. There are a number of metrics that they collect, but below is a link to a project that one person did, wherein, among other things, he analyzed the most statistically successful openings for the various countries. Also, below that, I have a link to my personal statistics, if you'd find those interesting.
http://tinyurl.com/DiplomacyReport
http://tinyurl.com/SmileysDipStats
I figure I've promoted LessWrong on WebDip enough, I ought to promote WebDip here too... :-)
There was another game, mostly including the NY members, which took place on WebDiplomacy. The game is here; Zvi Mowshowitz won.
Errrrr.... Points Per Supply Center...
Certainly excusable for LWers, but any serious play on WebDip is WTA. :-P
This is a highly debated topic on WebDiplomacy (I could go on), but Babak said it best when he said (paraphrasing) "Playing for a strong second in PPSC in Diplomacy is like shooting it in your own goal in soccer just because you want to score a goal."
And that project would have been much better organized (and recorded) on the website where I play, WebDiplomacy.
Perhaps it would, but Yvain's moderation and reports after each turn gave it a unique flavour which standard WebDip games lack. Anyway, it may be easy to organise another game for LWers on WebDip.
There will definitely be tons of interest. Since I claimed I was willing to set up another game after winning one of the LW games, I probably ought to go ahead and make this happen.
Moreover, simulations I ran using your rules for the evolutionary tournament show that one strategy quickly dominates and the others go extinct. Defectbot is among the strategies that go extinct fastest (even in the presence of cooperatebot), as it feeds off over-altruistic strategies, which in turn fail to compete with tit-for-tat. So I doubt that the evolutionary tournament, at least, will converge to a Nash equilibrium.
I predict that a strategy that tit-for-tats for 99 turns and defects on the 100th will win the evolutionary tournament, given that tit-for-tat is also in the population.
ETA: I've sent another strategy.
The strategy can be described in any comprehensible language. If I have trouble understanding it, which will probably happen if the strategy is described in Lisp or Basque but can happen even if it is written in English, I will ask.
You're going to be translating these all into computer programs so you can run them, right? You should specify which language it's going to be, so we can save you some work.
I already have functional code written in Wolfram Mathematica which simulates the tournaments.
If the number of strategies is not too big, it is easier for me to code them myself than to write down the instructions readers would need in order to know how I represent the data, not to speak of help for those unfamiliar with Mathematica's syntax. Simple strategies are usually one-liners, e.g. tit-for-tat has 29 characters.
In the meantime I've run my own simulation, studying a group of strategies which behave as tit-for-tat except that at a specific turn they defect, and then use the result of that turn to decide whether to switch to always defecting or to continue tit-for-tatting. Thus they recognize copies of themselves and cooperate with them. Such a strategy can be exploited by switching to permanent defection before it does, or by mimicking its behavior (a second defect check after the first; I didn't analyze this case).
This leads to interesting results in the evolutionary tournament. The second-fairest strategy (the one with the second-longest period of tit-for-tatting) wins. It outperforms less fair strategies by cooperating longest with the fairest strategy, and it outperforms the fairest strategy by exploiting it.
Define strategy S[n] as TfT until turn n and defection from then on. In the limit of an infinite population with a non-zero initial number of S[n] for each n, S[0], i.e. DefectBot, eventually dominates. Starting with equal subpopulations, the initially most successful strategy is S[99], which preys on S[100] and finally drives it to extinction. But then S[98] gains an advantage over S[99], and so on.
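To make the S[n] definition concrete, here is a minimal Python sketch of a single match between two such strategies. It is not the Mathematica code mentioned earlier in the thread. The payoff values (5/3/1/0) and the 100-turn match length are assumptions for illustration, and `s_move` and `play` are hypothetical helper names.

```python
# Assumed payoffs (the classic Axelrod values); the tournament's actual
# payoff matrix is not given in this thread.
T, R, P, S = 5, 3, 1, 0
TURNS = 100  # assumed match length

PAYOFF = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}

def s_move(n, turn, opp_last):
    """Move of S[n] on a 1-based turn: tit-for-tat up to turn n, then defect."""
    if turn > n:
        return "D"
    # Tit-for-tat: cooperate first, afterwards copy the opponent's last move.
    return "C" if opp_last is None else opp_last

def play(m, n, turns=TURNS):
    """Play S[m] against S[n]; return their total scores as a pair."""
    a_last = b_last = None
    score_a = score_b = 0
    for turn in range(1, turns + 1):
        a = s_move(m, turn, b_last)
        b = s_move(n, turn, a_last)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        a_last, b_last = a, b
    return score_a, score_b

print(play(100, 100))  # (300, 300): pure TfT against itself
print(play(99, 100))   # (302, 297): S[99] exploits S[100] on the last turn
print(play(98, 99))    # (300, 295): S[98] does the same to S[99]
print(play(0, 100))    # (104, 99): DefectBot against TfT
```

Under these assumed payoffs, each S[n-1] beats S[n] head to head by defecting one turn earlier, which is the pairwise mechanism behind the cascade described above.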
With a not-so-big population, however, the more defection-prone strategies die out before the environment becomes suitable for them. (I tried it with a population of 2000 strategies, and the lowest surviving after several hundred generations was S[80] or so.)
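The first step of that cascade, S[99] displacing S[100], can be illustrated with a replicator-style update on population shares. This is only a sketch under stated assumptions: the update rule (each share grows in proportion to its average score), the 5/3/1/0 payoffs and the 100-turn match length are all assumed, since the tournament's exact evolutionary rules are not reproduced in this thread. It also works with continuous shares, so it cannot show the finite-population extinction effect mentioned above. `score`, `step` and `evolve` are hypothetical names.

```python
T, R, P, S = 5, 3, 1, 0  # assumed payoffs
TURNS = 100              # assumed match length

def score(m, n, turns=TURNS):
    """Total payoff of S[m] against S[n], in closed form (0 <= m, n <= turns)."""
    if m == n:
        # Mutual cooperation for m turns, mutual defection afterwards.
        return R * m + P * (turns - m)
    if m < n:
        # S[m] defects first and exploits S[n] for exactly one turn.
        return R * m + T + P * (turns - m - 1)
    # S[m] is the one exploited for one turn.
    return R * n + S + P * (turns - n - 1)

def step(shares):
    """One generation: rescale each share by fitness relative to the mean."""
    fitness = {m: sum(x * score(m, n) for n, x in shares.items())
               for m in shares}
    mean = sum(shares[m] * fitness[m] for m in shares)
    return {m: shares[m] * fitness[m] / mean for m in shares}

def evolve(shares, generations):
    """Apply `step` repeatedly and return the final shares."""
    for _ in range(generations):
        shares = step(shares)
    return shares

# Equal shares of S[99] and S[100]: S[99] gains ground every generation.
final = evolve({99: 0.5, 100: 0.5}, 2000)
print(final)
```

Passing a dictionary with more strategies, e.g. `{n: 1 / 101 for n in range(101)}`, runs the same update over the whole S[0]..S[100] family. Which strategy leads at first depends on the assumed payoffs.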
Last year Yvain organised a Diplomacy game between LessWrong users to test how well we perform in practical applications of game theory. At least two games were played, but as far as I know no analysis was made afterwards. One reason is probably that a few games involving complex interactions between players constitute at most anecdotal evidence for whatever hypothesis one may test. The second is the lack of comparison to outside players. Although the games were fun, their value as a game theory experiment remains rather low. Could we test our game-theoretic skills in a statistically more significant way?
Only recently I learned about Robert Axelrod's experiment, in which he ran a competition of different strategies playing the iterated prisoner's dilemma, and got the idea to replicate it. I have already run a similar experiment with five contestants (all of them my friends), and now a second run is being prepared, with at least nine strategies in the pool. I am interested in a third run, this time with strategies nominated by LessWrongers. The contestants of the second run, which has identical rules, are readers of my blog, and probably none of them is familiar with specific LW ideas. Therefore, they would serve as a fairly good control group to test LW's applied rationality skills (or a subset of them). After matching the strategies in both groups separately, I plan to put all of them together and see who wins.
So, if you want to participate in this contest, feel free to send me your strategy. The rules are as follows.
The simulation will probably not be run before at least eight strategies are collected and before the beginning of September. The competition is closed; no new strategies are accepted at this moment. 21 different strategies were accepted, and their implementations are now being tested. Results will probably be posted on Sunday 4th September.
[Edit: Found an inconsistency in using the words "round" and "turn" to denote the same thing. Now "turn" is used everywhere.]