The prisoner's dilemma tournament is over. There were a total of 21 entries. The winner is Margaret Sy, with a total of 39 points. 2nd and 3rd place go to rpglover64 and THE BLACK KNIGHT, with scores of 38 and 36 points respectively. There were some fairly intricate strategies in the tournament, but all three of these top scorers submitted programs that completely ignored the source code of the other player and acted randomly, with the winner having a bias towards defecting.

You can download a chart describing the outcomes here, and the source codes for the entries can be downloaded here.

I represented each submission with a single letter while running the tournament. Here is a directory of the entries, along with their scores. (Some people gave me a term to refer to the player by, while others gave me a term to refer to the program. I went with whatever they gave me, and if they gave me both, I put the player first and then the program.)

A: rpglover64 (38)
B: Watson Ladd (27)
C: THE BLACK KNIGHT (36)
D: skepsci (24)
E: Devin Bayer (30)
F: Billy, Mimic-- (27)
G: itaibn (34)
H: CooperateBot (24)
I: Sean Nolan (28)
J: oaz (26)
K: selbram (34)
L: Alexei (25)
M: LEmma (25)
N: BloodyShrimp (34)
O: caa (32)
P: nshepperd (25)
Q: Margaret Sy (39)
R: So8res, NateBot (33)
S: Quinn (33)
T: HonoreDB (23)
U: SlappedTogetherAtTheLastMinuteBot (20)



Yay, I wasn't last!

Still, I'm not surprised that laziness did not pay off. I wrote a simple bot, then noticed that it cooperated against defectbot and defected against itself. I thought to myself, "This is not a good sign." Then I didn't bother changing it.

Frankly, I find this hilarious.

Can I suggest running a second tournament, including all these bots plus new bots whose authors can examine these bots while writing their own? After a few iterations, we might beat random. Or we might develop all sorts of signaling quirks, but that would also be interesting.

I like this plan. I'd be willing to run it, unless AlexMennen wants to.

When I first announced the tournament, people came up with many suggestions for modifications to the rules, most of which I did not take for the sake of simplicity. Maybe we should try to figure out what rules will make it easier to write more effective bots, even if that would disqualify many of the bots already submitted to this tournament.

9Andreas_Giger11y
Am I the only one who sees a problem in that we're turning a non-zero-sum game into a winner-take-all tournament? Perhaps instead of awarding a limited resource like bitcoins to the "winner", each player should be awarded an unlimited resource such as karma or funny cat pictures according to their strategy's performance.
2wedrifid11y
No, not the only one. The same complaint was well received in the initial thread.
8SilasBarta11y
Maybe we should have a prisoner's dilemma meta-tournament in which we define a tournament-goodness metric and then everyone submits a tournament design that is run using the bot participants in the previous tournament, and then use the rankings of those designs. Wait: does anyone know a good way to design the meta-tournament?
7solipsist11y
I'll list some suggestions.
1. Conduct a statistically significant number of trials. With few trials, the highest ranked agents will be the ones with a high variance in possible score, rather than the agents with the highest average score.
2. Allow no libraries, except for a very small whitelist (e.g. random numbers).
3. No timing, no threads. They make analysis too hard. The payoff might have to be changed to make it never in your best interest to make your opponent diverge, e.g. (-1, -1).
4. The rule "This is about Prisoner's dilemma, not the language you're working in. If you violate the spirit of this rule, your agent will be disqualified."
5. (Possible) Change language to a restricted dialect of Python. You can analyze your opponent's bytecode, if you wish, or run your opponent as a black box. Programs with side effects (such as writing to global variables) and programs that use Pythonic trickiness (such as inspecting the stack) are verboten.
1BloodyShrimp11y
Most of these suggestions are okay, but don't move away from Lisp. A very restricted Scheme (e.g. one way to set variables, one looping construct (I really like foldl, as N shows), etc.) would be good; the lexical scoping and general immutability make it one of the best Lisps for our purposes.
-1solipsist11y
+1 for restricted Scheme. I suggest restricted Python because it is more accessible and would allow more people to participate, but personally I like Scheme a bit more. On the other hand, many many people were turned off by Scheme last time, and few of them will be reading this post. I'd like to give them a shot, if only to get a more diverse crowd. On looping constructs: Would that buy us anything? What if people use recursion for looping? Should we restrict recursion? Question for the audience: would there be a benefit to using a programming language that allows only primitive recursive functions?
1skepsci11y
I think a programming language that only allows primitive recursion is a bad idea. One common pattern (which I think we want to allow) was for bots to simulate their opponents, which entails the ability to simulate arbitrary valid code. This would not be possible in a language which restricts to primitive recursion.
1Decius11y
I think that the bots already submitted should be considered for grandfathering if the rules change to prohibit them. I also have a specific entry in mind, but lack the expertise to go from 'flowchart' to 'code'.
4Andreas_Giger11y
I think there would be more people interested in playing if strategies could be submitted in pseudocode, so that would be great.
3Decius11y
For entries that only eval(opponent(x)), for simple x, that seems reasonable. Strategies that rely on deeper analysis would need to look for specific things about the code that I, as a non-programmer, cannot describe accurately.
2MalcolmOcean11y
I would agree that they should compete even if they're not awarded prizes for winning, because we should at least have a comparison of performance against the old bots. More data is pretty much always better.
0Decius11y
It should also be trivial to calculate the score both in this heat and against all previously submitted bots.
1Kawoomba11y
Bring it on! Just to elaborate, in an iterated 1v1, a defectbot will reliably beat a randombot (4 to 1 if the randombot is unweighted). Or rather, any bot which detects a randombot and then always defects. However, it's not even necessary to beat the few randombots in a 1v1 to win the tournament, it is sufficient to beat a majority of other bots (including some cooperatebots for easy wins) to get ahead of the pack enough so that the results against the randombots don't matter.
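(A quick check of that 4-to-1 figure, assuming the tournament's payoffs were 2 each for mutual cooperation, 3/0 when exactly one side defects, and 1 each for mutual defection; those values aren't restated in this post, so treat them as an assumption here.)

    ; DefectBot's expected score per round against an unweighted RandomBot:
    (/ (+ 3 1) 2.0)   ; half the time DC (3 points), half the time DD (1 point) => 2.0
    ; The RandomBot's expected score against DefectBot:
    (/ (+ 0 1) 2.0)   ; half the time CD (0 points), half the time DD (1 point) => 0.5, i.e. 4 to 1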

I went and ran this another 100 times, so I could see what it would look like without the variance. The mean scores are:

A: 32.03
B: 28.53
C: 32.48
D: 24.94
E: 28.75
F: 29.62
G: 28.42
H: 26.12
I: 26.06
J: 26.10
K: 36.15
L: 27.21
M: 25.14
N: 34.37
O: 31.06
P: 26.55
Q: 34.95
R: 32.93
S: 37.08
T: 26.43
U: 24.24

If you're interested, here's the code for the test (takes a day to run) and the raw output for my run (an inconvenient format, but it shows the stats for the matchups).

And for the lazy, here these scores are in sorted order (with original scores in parentheses):

S: 37.08 - Quinn (33)
K: 36.15 - selbram (34)
Q: 34.95 - Margaret Sy (39)
N: 34.37 - BloodyShrimp (34)
R: 32.93 - So8res, NateBot (33)
C: 32.48 - THE BLACK KNIGHT (36)
A: 32.03 - rpglover64 (38)
O: 31.06 - caa (32)
F: 29.62 - Billy, Mimic-- (27)
E: 28.75 - Devin Bayer (30)
B: 28.53 - Watson Ladd (27)
G: 28.42 - itaibn (34)
L: 27.21 - Alexei (25)
P: 26.55 - nshepperd (25)
T: 26.43 - HonoreDB (23)
H: 26.12 - CooperateBot (24)
J: 26.10 - oaz (26)
I: 26.06 - Sean Nolan (28)
M: 25.14 - LEmma (25)
D: 24.94 - skepsci (24)
U: 24.24 - SlappedTogetherAtTheLastMinuteBot (20)

1Caspar Oesterheld6y
I tried to run this with racket and #lang scheme (as well as #lang racket) but didn't get it to work (though I didn't try for very long), perhaps because of backward compatibility issues. This is a bit unfortunate because it makes it harder for people interested in this topic to profit from the results and submitted programs of this tournament. Maybe you or Alex could write a brief description of how one could get the program tournament to run? Edit: I think after posting this, I did manage to get it to work (though I don't remember how).

Has someone gone through all the programs to figure out what strategy they're supposed to be implementing, then checked that they in fact correctly implement that strategy? Also, given that the bots were allowed to use randomness, it would've been a good idea to average a bunch of rounds together.

A tournament like this would be much more interesting if it involved multiple generations. Here, the results heavily depended upon the pool of submitted strategies, regardless of their actual competitiveness, while a multiple-generations tournament would measure success as performance against other successful strategies.

H, I, and J are identical and received noticeably different scores. Maybe next time, repeat the whole process a few times to even out the RNG?

Also, it would be interesting to see the head-to-head matchups, especially to see if R's complexity actually helped even though it seemed mostly aimed at special cases that never came up.

6AlexMennen11y
You can see the head-to-head matchups.
2Luke_A_Somers11y
Yes. I will be taking a look at that later. Sorry that I made an unnecessary implied request there.
[-][anonymous]11y150

Hmm. Considering the heavy amount of random strategies, it seems like it might be worth iterating the tournament several more times and summing the point totals, particularly since C (one of the randoms) is above the three-way tie at 4th place between G, K and N by only two points, and A and C also have a 2 point difference despite having functionally identical strategies (50% random Cooperate or Defect). I feel like that implies standard deviations may make analysis of the results premature.

Nonetheless, analyzing is fun, so I'll do a little bit anyway on the fourth placers.

K appears to be functionally identical to a Defect bot. It has more complicated code, but that code never cooperated. (Side note: randomly, in this run, Margaret Sy's winning strategy, which is random with an 80% bias towards defection, only cooperated with this bot and defected against all other bots.)

Also, G and N, despite being in 4th place, didn't actually seem to successfully take advantage of any of the 3 pseudosimplistic cooperate bots in this run (H, I, J: simulate something, then cooperate regardless) by defecting against them, which would seem to be one of the key ways a mind reader could try to score points for a win. I ... (read more)

8BloodyShrimp11y
I think that kind of deviation was probably a large part of the motivation for those who submitted random-playing bots, since the contest rules specified one round. I haven't looked too closely at K, nor run any tests, but I have a slight suspicion it sometimes cooperated in simulation, despite always defecting on its turn. As for the cooperatebots, there were multiple reasons I didn't write N to exploit them, but not source complexity--they don't even do anything besides cooperate; the middle term is just the function's argument list.
2hylleddin11y
I checked the behavior of all the bots that cooperated with K, and all but two (T and Q) would have always cooperated with a defectBot. Specifically the defect bot: (lambda (opp) 'D) Sometimes they cooperated for different reasons. For example, U cooperates with K because it has "quine" in the code, while it cooperates with defectBot because it doesn't have "quine", "eval", or "thread" in it. Q, of course, acts randomly. T is the only one that doesn't cooperate with defectBot but was tricked by K into cooperating, though I'm having trouble figuring out why because I'm not sure what T is doing. Anyway, it looks like K is a reasonable proxy for how defectBot would have done here.
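(For anyone repeating this kind of check on the downloadable entries, a sketch of the test in the tournament's own style; the placeholder entry below is illustrative, not one of the actual submissions.)

    (define entry-source '(lambda (opp) 'C))     ; placeholder: substitute an entry read in from its file
    ((eval entry-source) '(lambda (opp) 'D))     ; => 'C means the entry would cooperate with DefectBot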
3HonoreDB11y
T was supposed to do a bit more than it did, but it had some portability bugs so I hastily lobotomized it. All it's supposed to do now is simulate the opponent twice against an obfuscated defectbot, defect if it cooperates both times, otherwise play mimicbot. I didn't have the time to add some of the obvious safeguards. I'm not sure if K is exploiting me or just got lucky, but at a glance, what it might be doing is checking whether the passed-in bot can generate a perfect quine of itself, and cooperating only then. That would be pretty ingenious, since typically a quine chain will go "original -- functional copy -- identical copy -- identical copy", etc.
4selbram11y
You're right. K is a MimicBot with an additional check for proper quining. I primarily intended it to cause defection against CooperateBots, RandomBots, and others that don't simulate their opponents meaningfully. I expected a lot more MimicBot variants and mutual cooperations...
2[anonymous]11y
True. I'm not saying that the random people did a poor job of selecting strategies, but Error was depressed that random behavior dominated. My and Luke_A_Somers's points appear to be that the amount by which randomness dominates, if it is dominant, would be easier to determine with more rounds.

I don't know quine at all and can't easily understand exactly what all these guys are doing but:

This is obviously NOT a stable metagame; in both senses.

If we ran this tournament again the random-bots would get utterly destroyed (since it's very easy/fast to check for them and defect). If we ran this tournament with multiple rounds, where successful strategies survive and unsuccessful die, and there was even one defect-against-randombot code in the mix, it will win if it doesn't die early.

My guess-happy summary of what happened is this: The core problem is very hard, so most people instead did something simple, often very simple (cooperate bot, defect bot, random bot - which won!) with a small number of people trying for real. The people trying for real, however, spent all their energy planning for other people trying for real and trying to fool them, rather than trying to act correctly against all the super-simple programs, because if you put in that much work you think other people will mostly put in at least some work (R has an entire long blog post of metagame and strategic analysis which is COMPLETELY wrong in exactly this way, in real life I do this a lot). That, combined wit... (read more)

1BloodyShrimp11y
Your summary seems pretty accurate. I don't think there were many programming errors outside of P's meltdown, though. Also, as has been touched upon elsewhere in these comments, some of the failures to maximally exploit simple bots were necessary side effects of the attempts to trick complex bots, not just failures to anticipate there being a significant number of simple bots at all. (Sort of a quantitative instead of qualitative prediction mistake--we just thought there'd be more complex bots than simple bots). One clue towards the general simplicity of the field is the generally atrocious formatting (e.g. close-parens on their own lines)--not many people seem to have had too much experience with Lisp, let alone Lisp projects as complex as this game can get. That's a shame, but it's not like any non-Lisp languages are remotely suitable for the kind of source analysis we all wanted to see in this game.
0Zvi11y
If the tournament were run a second time, my expectation would be that such programs would abound (since writing "if they showed signs of life randomly do something but usually defect, else cooperate" is pretty easy), there'd still be a bunch of simple guys running around, there'd be a small number of people trying to play higher-level exploit games that would have better code than last time, and you'd likely see a level-2 bot (one that beats the obvious level-1 bot that wins the first tournament) come out ahead.
0fractalman11y
For reference, here's what I was "planning". I was all too aware I didn't have time to learn a new language to actually implement it in; a pity I didn't think to post this before the tournament results, even if it's just pseudocode...

run(me, opponent) {  // start point
    notethetime()
    if (me == opponent) { return C }
    defectbot[] = a list of primitive defectbots with a junk string value added
    run 5 times and record results: run(opponent, defectbot[i])
    if (opponent cooperated sometimes) {
        if (isopponentclearlymimicbot(opponent)) { cooperate() } else { defect() }
        // MimicBot is assumed to be simple and easy to identify as such, but I'm not exactly sure how I'd test this.
    }
    checktime()  // if the opponent takes way too long to run against a short defectbot with barely any baggage, it's going to have trouble with you. Defect for both your sakes.
    if (opponent cooperates always) {
        if (opponent is clearly cooperate-bot) {
            // do I cooperate with C-bots in the hopes of cooperating with those who cooperate with C-bots, or do I take advantage of the schmuck? A decision I really never made...
        }
    }
    mimicbot = "...."
    result = run(opponent, mimicbot)
    if (opponent defected) { defect() }
    checktime()  // but this time, cooperate if things are taking dangerously long - MAYBE. It's somewhat reasonable for an opponent to take a while against a mimicbot, especially if opponent is mimicbot with small epsilon.
    psychbot1.0 = ("...." + mimicbot + "....")  // psychbot 1.0 sends you against mimicbot, and then does what you do. Only, I get to see the final result.
    record(run(opponent, psychbot1.0))
    checktime()  // possibly in a separate thread. Depends on how quickly this whole thing goes.
    psychbot1.1 = "... + defectbot"
    result = run(opponent, psychbot1.1)  // psychbot 1.1 also checks you against defectbot, like the main program. The purpose, if used, is to avoid cooperating with programs that defect on finding a "D".
    if (result == C) { return C } else { return D }
}

Yeah. Pseudocode. For Java, not Lisp. Edit: clarified a few things in the intro.
0Zvi11y
I find cooperation with cooperation-bot utterly insane in context; I would have predicted less than three, but a good chance of at least one. In the real world problem, or a tournament with a lot of rounds, it's potentially worth saying that since C-bots will die off quickly (although, if enough people cooperate with them, maybe they won't) they're effectively unlikely enough that you can safely use your response to them as signaling for other programs, but if that's true, then your opponent should presumably know that and throw non-trivial cases at you instead.
0fractalman11y
yeah...but I was more concerned about various TrollBots-including but not limited to HashBot, Randombot, DitherBot, cooperate-IFF you are a defect bot or cooperate bot, antimimicbot...and I considered "cooperates with C-Bot" to be extremely useless as a source of information given the unusual demographics of LessWrong while drawing up my plans.

One additional note on a second tournament: I notice bots weren't played against themselves. Should this be done? It seems useful as a sanity check; I feel like a bot that won't even cooperate against itself ought to get fewer points.

2Decius11y
Why? In what circumstance would you ever be offered a choice to cooperate or compete with your concurrent self?

Why? In what circumstance would you ever be offered a choice to cooperate or compete with your concurrent self?

Probably in a respectable fraction of the universes in which other agents have a copy of your source code.

2BloodyShrimp11y
This is the reason my bot attempted to recognize itself. Attention anyone who plays in a tournament that continues to use this bot pool: put the token LightninRoy in your source!
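(A minimal sketch of the kind of token handshake being suggested, not BloodyShrimp's actual entry; it assumes the opponent arrives as a quoted s-expression, and of course anything that merely contains the token gets cooperation, which is both the point and the risk.)

    (lambda (opp)
      (define (contains-token? sexp)
        ;; Recursively search the opponent's quoted source for the handshake symbol.
        (cond ((eq? sexp 'LightninRoy) #t)   ; the token also appears in this bot's own source
              ((pair? sexp) (or (contains-token? (car sexp))
                                (contains-token? (cdr sexp))))
              (else #f)))
      (if (contains-token? opp) 'C 'D))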
1Viliam_Bur11y
In Robin Hanson's em dystopia. :D It seems fair that if I have to play against a copy of myself, then you have to play against two copies of myself (because it means the universe contains two instances of me). But make it millions, and then I play against approximately as many copies of myself as you do.
0Kawoomba11y
Good news: You win! :-) Bad news: You lose! :-(
2Viliam_Bur11y
Best news: You cooperate. Twice! :D

Huh. I'm not entirely surprised that random behavior was dominant, but it's certainly depressing. Nevertheless, I'm glad you ran this.

This may be more a reflection of the fact that programming interesting strategies which run each other's code and fully do what you want them to do is tough. It might be interesting to look at the strategies which did look at the other's code and see if any are responding in an obviously wrong fashion in that particular instance.

I wasn't expecting there to be so much randomness. If I had been, I guess I would have spent the time to figure out how to test for it in some kind of loop and defect if the opponent was random. Looping in lisp is unnecessarily complicated though, so I didn't bother...
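(A minimal sketch of that kind of loop, not any contestant's actual code: simulate the opponent repeatedly against one fixed probe and defect as soon as two runs disagree. A named let keeps the looping manageable; note that a randomizing opponent can still slip through by chance, and a slow opponent would need the timeout tricks discussed below.)

    (lambda (opp)
      (define probe '(lambda (o) 'C))            ; any fixed bot works as the probe
      (define first-run ((eval opp) probe))
      (let loop ((trials-left 9))
        (cond ((zero? trials-left) 'C)           ; looked deterministic; hand off to the real strategy here
              ((equal? ((eval opp) probe) first-run)
               (loop (- trials-left 1)))
              (else 'D))))                       ; output varied: treat it as a RandomBot and defect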

That suggests that a large part of this behavior may have been due to limited amounts of programming time.

3fractalman11y
My own strategy, if I'd felt like joining, would have started by checking the opponent against defectbot; anybody who cooperates with even one of the 5 or so defect-variants is clearly insane and not to be trusted; defect immediately. Or maybe take a detour and check if they're actually a mimicbot that got unlucky. Well... I might have played Hawkbot instead; Hawkbot spends 9.999 seconds dithering before finally defecting, so it does insanely well against any opponent that tries to run it.
3hylleddin11y
Several of the bots using simulation also used multithreading to create timer processes so they could quit and defect against anyone who took too long to simulate. I was also thinking of doing something similar, which was to infinite loop if the opposing program's code was small enough, since that probably meant something more complex was simulating it against something.
2fractalman11y
Ah, so multithreading was allowed...so instead of dithering for 9.999 seconds, it would have called forkbomb(9.9seconds)
0Sniffnoy11y
Quinn appears to have submitted something similar. As far as I can tell, against CooperateBot it cooperates; otherwise, it waits 9 seconds before defecting. (The weird timing conditional in there should never return true if things are working properly, and checking the results, indeed, it defected against everything but the three CooperateBots.)
4Quinn11y
Unless I'm going insane, Luke_A_Somers' comment below is incorrect. If you defect and your opponent times out, you still get 1 point and they get 0, which is marginally better than both getting 1 point in the case of mutual defection. That was the purpose of my (sleep 9). I figured anyone who was going to eval me twice against anything other than CooperateBot is going to figure out that I'm a jerk and defect against me, so I'm only getting one point from them anyway. Might as well lower their score by 1. My assumption might not have been correct though. In the original scoreboard, T (who times out against me) actually does cooperate with K, who defected against everybody! The bizarre-looking always-false conditional was a long shot at detecting simulation. I heard an idea at a LW meetup (maybe from So8res?) that players might remove sleeps from each other's code before calling eval. I figured I might as well fake such players out if there were any (there were not).
0wedrifid11y
Anybody who runs the other program without an interrupt timer and so times out is, as you put it, "clearly insane". Hawkbot gets 1 against any sane opponent that tries to run it.
0fractalman11y
I reread the payoff matrix some time after posting that....and the defect-bot test was only the first in a series of tests I had planned out.
0BloodyShrimp11y
Forget sanity--if your opponent fails to play, the most you can get is one point. Hawkbot is awful.
0Luke_A_Somers11y
That wouldn't work well since timing out causes both players to get 0. You'd do better just defecting.
0fractalman11y
It does? The payoff matrix had entries for... *double checks original contest rules* OH. I see. I misread how the payoffs work when "other" occurs.
-1Kawoomba11y
Collaborateurbot: Times out against everyone except the bot it is secretly helping. If it's not against the letter of the rules, it's fair game from a "rational = winning" perspective.
1[anonymous]11y
I suspect that sort of thing is why the rules forbade multiple entrants.
-3Kawoomba11y
Are you saying my Grandma can't submit an entry? How will we tell her?
1Viliam_Bur11y
It is a nice mathematical model of the valley of bad rationality.

Fantastic work, thank you =)

For anyone else unpacking the zip file, note you'll want to create a new directory to unzip it into, not just unzip it in the middle of your home folder.

Wait, someone actually submitted CooperateBot? That wasn't just a default entry?

Thoughts on a possible second tournament:

  • I like Luke Somers's suggestion that each matchup should be run several times to smooth out randomness (obviously, bots don't get access to earlier runs, this isn't iterated prisoner's dilemma)
  • Should it include default entries? Perhaps a CooperateBot, a DefectBot, and a RandomBot? Or perhaps just a RandomBot?
  • Should we maybe restrict to a subset of Scheme to make things easier for those of us who want to do static analysis? :) (Ye
... (read more)
0Quinn11y
I was a bit irritated that no fewer than three players submitted CooperateBot, as I cooperated with it in hopes of tricking bots like submission B. EDIT: More offense was taken at my use of the word "trolls" than was intended.
8evec11y
Why does submitting CooperateBot to a competition that does not include it make someone a troll? Would submitting DefectBot make one a troll, too? (I believe the competition should have automatically included one CooperateBot and DefectBot each, and stated that this was the case at the beginning. I am sad there were three CooperateBots and no DefectBots.)
5Qiaochu_Yuan11y
I think the idea is that someone submitting CooperateBot is not trying to win. But I find this a poor excuse. (It's why I always give the maximum possible number whenever I play "guess 2/3rds of the average.") I agree that the competition should have been seeded with some reasonable default bots. Submitting DefectBot makes you a CDT agent, not a troll.

As one of the players who submitted a cooperatebot (yes, I see the irony), allow me to explain my reasoning for doing so. I scoped the comments to see what bots were being suggested (mimicbots, prudentbots, etc) and I saw much more focus on trying to enforce mutual cooperation than trying to exploit bots that can be exploited. I metagamed accordingly, hypothesizing that the other bots would cooperate with a cooperatebot but possibly fail to cooperate with each other. My hypothesis was incorrect, but worth testing IMO.

5Quinn11y
Thank you.
-2Eugine_Nier11y
There's a problem with that logic. Did you really expect the people designing exploitation bots to talk about them publicly?
2wubbles11y
How do we draw the line? Tit-for-Tat is very simple, yet does very well. Arguably before knowing how it performs it could be considered a troll.

Considering this was an experimental tournament, learning how certain strategies perform against others seems far more interesting to me than winning, and I can't imagine any strategy I would label as a troll submission. Even strategies solely designed to be obstacles are valid and valuable contributions, and the fact that random strategies skew the results is a fault of the tournament rules and not of the strategies themselves.

-2Quinn11y
Can you elaborate on this? You are right that I used the inflammatory t-word because CooperateBot submitters are probably not trying to win. I certainly expected to see DefectBots (or CliqueBots from optimists), and agree that the competition should have been seeded with both CooperateBots and DefectBots. But I don't understand this at all: To me, this looks like t-wording the people who play 0. Are we thinking of the same game, where the payout is constant?
7Qiaochu_Yuan11y
Yes, that game. My point is that complaining "that's not fair, X player wasn't playing to win" is a failure to think like reality. You know that you're playing against humans, and that humans do lots of things, including playing games in a way that isn't playing to win. You should be taking that into account when modeling the likely distribution of opponents you're going to face. This is especially true if there isn't a strong incentive to play to win.
1Quinn11y
Yes, and I agree with this. I'm familiar with Straw Vulcanism and accept that guessing incorrectly is my mistake, not others'. It seems anger and frustration were read into my comment, when in fact I was merely surprised, so I've edited out the offending t-word.
0hylleddin11y
I would have liked to see a proper DefectBot as well; however, contestant K defected every time and only one of the bots that cooperated with it would have defected against DefectBot, so it makes a fairly close proxy.
4wedrifid11y
Be sure to distinguish the "think you're wrong" and the "find it offensive" components of the responses.
2Quinn11y
Yes, the "here's why Quinn is wrong about CooperateBot being a troll submission" comments were valuable, so I don't regret provoking them. Presumably if my comment had said "players" instead of "trolls" from the outset, it would have been ignored for being inoffensive and content-free. But provoking a few good comments was a happy accident from my perspective. I will avoid casual use of the word "troll" on this site, unless I have a specific purpose in mind.
2Illano11y
As one of the people who submitted a CoopBot. I had three primary motivations. The first came from a simulation of the competition I created in Python, where I created copies of a lot of the strategies mentioned in the thread comments (excluding of course the ones with a nebulous perfectlyPredictOpponent() function). Given that a lot of the strategies discussed involved checking for cooperation with a coopBot, playing an actual coopBot ranked relatively highly in my test competitions, failing primarily against random or defectbots, and it seemed less exploitable by higher level bots, if they still wanted to be able to trick the lower level bots. The second motivation was to use the coopBot as a baseline to see how well the other bots really did perform. The third motivation was simple: I don't really know Scheme, so most of my more complex bots from my Python simulation proved too difficult to implement in the time I allocated to work on this.
1Decius11y
There exist LW readers who believe that Cooperate is the dominant strategy in PD.
1Kawoomba11y
Humans don't only play to win; couple that with a comparatively obscure choice of programming language, and simple bots will be in the majority.
1Decius11y
Plus even a flowchart of the ideal strategy is hard to create without using black boxes or brute-force iterating every possible proof of something. EDIT: Oh, and the humans played to win a game different from the nominal game. That distinction is important, because you can only observe them in the actual game and not the nominal one. (At CFAR workshops, PD for low stakes resulted in universal cooperation when it was done openly; the first time it was done semi-anonymously, there was a defector. There were several other differences as well, but the nominal stakes were similar.)
0wedrifid11y
Can you provide links to comments that are examples of this? If so, I'd like to look at them to confirm and then to make sure I have downvoted and/or corrected them as necessary. Without seeing the examples I assign roughly 0.5 probability to them existing, with most of the remaining probability mass going to "There exist comments that defy CDT in a way that some readers may consider to be equivalent to declaring that Cooperate is dominant but is in fact not a claim about strategic dominance at all".
0Decius11y
I had that discussion in person, so I can't point to comments. There may have been some confusion about the claims made, such that the claim I understood them to be making is different from the claim that they made, and the specific game in question was 2-iteration PD with the nonstandard payoff matrix (50,50; 100,0; 0,100; 25,25), but I think they were using cached thoughts for the discussion, rather than something that applied to the specific case but not the general.
1wedrifid11y
Anyone who submits a strategy that cooperates with CooperateBot has no place shaming others as 'trolls' for cooperating. If you are trying to 'trick' others and guess their responses incorrectly then the joke is entirely on you!

:(

I didn't realise that when you said programs should assume eval would work normally, that meant I needed to make my simulated eval take a namespace as an optional argument (and use the proper namespace containing the standard library as a default). So my program crashed whenever it simulated something that uses (eval x). There goes my clever algorithm, lol

2Roxolan11y
What did you intend your program to do? I tried to work it out from the source code but it scared me off.
7nshepperd11y
Well, now the contest's over I can publish an expanded version of my source code with comments. Basically it was a sort-of variation of PrudentBot designed to get cooperation from So8res's stupid pseudo-mimics. The original variation had two conditions: if ((they cooperate with us if we cooperate) and (they defect against defectbot)) then cooperate; else defect. The second condition allows you to exploit stupid bots like cooperatebot, but since doing that causes all justicebots, etc etc to defect, I ended up removing that. (Though, looks like that was a bad idea in the end, since there were three cooperatebots submitted!) Well, it's not quite a true PrudentBot since PrudentBot tries to prove unconditional cooperation, rather than cooperation-if-I-cooperate.
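(Roughly, in code, the original two-condition check described above; this is a sketch rather than nshepperd's actual source, and my-source is only a stand-in for the proper quine of the bot's own code that a real entry would have to construct.)

    (lambda (opp)
      (define defectbot '(lambda (o) 'D))
      (define my-source '(lambda (o) 'C))          ; placeholder only; a real entry would quine itself here
      (if (and (eq? ((eval opp) my-source) 'C)     ; do they cooperate against (a copy of) us?
               (eq? ((eval opp) defectbot) 'D))    ; and do they defect against DefectBot?
          'C
          'D))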
2AlexMennen11y
Sorry about that. I was trying to warn people ahead of time about bugs so they could fix them, but I didn't run sophisticated enough tests to catch that.

Oh well. I don't think I would have won anyway. The problem seems to be that the REPL behaves very differently to non-REPL contexts, which I certainly wasn't expecting. Actually, it looks like eval in the REPL uses the REPL namespace as a default (so for example you can type (eval '(define a 2)) (write a) and see a 2 printed out) which seems rather unconscionable to me¹, especially since the REPL namespace includes a boatload of miscellaneous modules such as racket/pretty and racket/function and probably others as well.

Possibly next time it would be useful to publish a "test harness" ahead of time which runs contestants against each other in a clean (or at least well-specified) namespace, and make it clear that the actual contest would be run using this code.

¹ Actually, now that I've noticed this, it looks like you could use this to write a bot that corrupts the global (REPL) namespace, and therefore the namespace of anyone trying to simulate it, maybe even forcing the opponent to cooperate by rewriting if or something. Ha!
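(A minimal sketch of the test harness suggested above, assuming #lang racket and that each entry file contains a single readable lambda expression; the function names are illustrative rather than the actual tournament code, and a real harness would still need the time limit and any sandboxing.)

    #lang racket

    (define (run-entry path opp-src)
      ;; Evaluate the entry in a fresh base namespace and apply it to the opponent's
      ;; source, keeping that namespace current so the entry's own calls to eval
      ;; see the same clean environment rather than the REPL's.
      (define ns (make-base-namespace))
      (define bot (eval (with-input-from-file path read) ns))
      (parameterize ((current-namespace ns))
        (bot opp-src)))

    (define (play path-a path-b)
      ;; Each entry is applied to its opponent's source as a quoted datum.
      (list (run-entry path-a (with-input-from-file path-b read))
            (run-entry path-b (with-input-from-file path-a read))))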

0[anonymous]11y
I did look briefly for ways of forcing a bot running my program as a simulation to cooperate, but that sort of thing needs way too much knowledge of the programming environment for me to have a hope of succeeding. An early attempt I had for a bot tried to save state between simulations inside the RNG seed, but that didn't work out either...

If only THE BLACK KNIGHT had won the tournament, I would've revealed my identity then. This way, the identity of THE BLACK KNIGHT shalt forevermore remain a mystery. Giddyup!

In more seriousness though, how could this happen, being worse than random? Just lacking a clause filtering for the string 'random' (that could be exploitable, too), or alternatively, not simulating the opponent, say, 10 times, then going with the majority behavior if there is one, and if there's not (6-5, 5-5, 4-6), treating the opponent as randombot. Won't filter all randombots, of c... (read more)

Given Rice's theorem, there is no source code analysis that is guaranteed to tell you anything interesting about your opponent, unless your opponent has been written in such a way as to facilitate that analysis. This suggests two possible modifications of the competition.

Firstly, the rule could be imposed that source code is unanalysable. A competing program will not be furnished with its opponent's source, but only an opaque blob. The only operation they can perform on this blob is to supply it with another blob and see what answer it gives. Each program ... (read more)

0Richard_Kennaway11y
Thinking about this further, given enough assertions, there's no need to have the programs at all. Let the agents be just the assertions that they make about themselves. Each agent would consist of a set of axioms, perhaps written in Prolog, about who they cooperate or defect against, and running a contest between two agents would just be a matter of taking the union of the two sets of axioms and attempting to deduce (by the standard Prolog proof-search) what choice each one makes.
1skepsci11y
What happens if the two sets of axioms are individually consistent, but together are inconsistent?
0Richard_Kennaway11y
Deem both of the agents to have not terminated?
1skepsci11y
How will you know? The set of consistent axiom systems is undecidable. (Though the set of inconsistent axioms systems is computably enumerable.)
0Richard_Kennaway11y
You just run the Prolog (or whatever logic system implements all this), and it either terminates with a failure or does not terminate within the time allowed by the competition. The time limit renders everything practically decidable.
2skepsci11y
Perhaps it terminates in the time required proving that A defects and B cooperates, even though the axioms were inconsistent, and one could also have proved that A cooperates and B defects.
1Richard_Kennaway11y
The competitors know the deterministic proof-search algorithm (e.g. that of Prolog), and the first answer that produces within the time limit, that is the answer.
0skepsci11y
What's wrong with "I will cooperate with anyone who verifiably asserts that they cooperate with me". A program could verifiably assert that by being, e.g., cooperatebot. A program could be written that cooperates with any opponent that provides it with a proof that it will cooperate.
0Richard_Kennaway11y
The word "me". By Rice's theorem, they can't tell that they're dealing with someone computationally equivalent to me, and there's no other sense of my "identity" that can be referred to. Athough that could be added. Have all competitors choose a unique name and allow them to verifiably claim to be the owner of their name. Then "I will cooperate with anyone who verifiably asserts that they cooperate with me" can work if "me" is understood to mean "the entity with my name". Discovering a blob's name would have to be a second primitive operation allowed on blobs. ETA: I think I missed your point a little. Yes, a cooperate-bot verifiably asserts that it cooperates with everyone, so it is an entity that "verifiably asserts (something that implies) that they will cooperate with me." And there will be other bots of which this is true. But I'm not sure that I can verifiably express that I will cooperate with the class of all such bots, because "the class of all such bots" looks undecidable.
1skepsci11y
Your source code is your name. Having an additional name would be irrelevant. It is certainly possible for bots to prove they cooperate with a given bot, by looking at that particular bot's source. It would, as you say, be much harder for a bot to prove it cooperates with every bot equivalent to a given bot (in the sense of making the same cooperate/defect decisions vs. every opponent). Rice's theorem may not be as much of an obstruction as you seem to indicate. For example, Rice's theorem doesn't prohibit a bot which proves that it defects against all defectbots, and cooperates with all cooperatebots. Indeed, you can construct an example of such a bot. (Rice's theorem would, however, prevent constructing a bot which cooperates with cooperatebots and defects against everyone else.)
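(One such construction, offered as a sketch rather than necessarily what skepsci had in mind: copy whatever the opponent plays against a plain DefectBot. Every bot that always defects gets defection back, every bot that always cooperates gets cooperation, and an opponent that fails to halt against DefectBot simply makes this bot fail to halt too.)

    (lambda (opp)
      ;; Play whatever the opponent would play against DefectBot.
      ((eval opp) '(lambda (o) 'D)))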

Sorry if this is a stupid question, but this tournament looked to me like a thinly disguised version of:

"Construct a robot that can read code and interpret what it means."

which is a Really Hard Problem.

Is that not a fair description? Was there some other way to approach the problem?

The only way I can see to go about constructing a GOOD entrant to this is to write something that can take as its input the code of the opponent and interpret what it will actually DO, that can recognize the equivalence between (say):

return DEFECT

and

if 1: return ... (read more)

4selbram11y
One good way to interpret code is to run the code with "eval", which many submitted bots did. This method has no problems with the examples you gave. One important place it breaks down is with bots that behave randomly. In that case a robot may, by chance, be simulated to cooperate and defect in whatever sequence would make it seem worth cooperating with even if it actually ends up defecting. This, combined with a little luck, made the random bots come out ahead. There are ways to get around this problem, and a few bots did so, but they still didn't do better than the random bots because they had less potential for exploitation in this particular pool of entrants.
1BloodyShrimp11y
I agree, my fellow top-ranking-non-source-ignoring player. Saying "nobody could do any better than randomness in this tournament" is strictly true but a bit misleading; the tiny, defect-happy pool with almost 20% random players (the top 3 and also G; he just obfuscated his somewhat) didn't provide a very favorable structure for more intelligent bots to intelligently navigate, but there was still certainly some navigation. I'm pretty pleased with how my bot performed; it never got deterministically CD'd and most of its nonrandom mutual defections were against bots who had some unusual trigger condition for defecting based on source composition, not performance, or had very confused performance triggers (e.g. O--why would you want to play your opponent's anti-defectbot move when you determine they cooperate with cooperatebot?). Some of its mutual defections were certainly due to my detect-size-changes exploit, but so were its many DCs.
0olalonde11y
Perfect simulation is not only really hard, it has been proven to be impossible. See http://en.wikipedia.org/wiki/Halting_problem

Some notes I've made on the entries:

(Note, this is largely from the perspective of "I want to write a bot that works by doing lots of static analysis to the other bots", since I was working on such a bot. In fact I can see now that the idea I had in mind didn't actually make sense, but, oh well, I have an idea for how to fix it for the next contest...)

  • Mutation: Three of the bots -- K, L, and R -- used mutation. N and P used a function "namespace-set-variable-value!"; not sure what that does so I'm not sure if it's really mutating th
... (read more)
0BloodyShrimp11y
In a pool with a sufficient number of close-to-mimicbots, considering use of language features you didn't anticipate or can't handle to be defections seems like a good way to get too many mutual defections to win. Also, not sure what "mutation" means in context. If you mean "almost quine yourself, but write in a few differences", N did that; I can't think of any other reasonable meanings which N does. I used namespace-set-variable-value! merely so that eval could see my variables; in most lisps I've used in the past I could have omitted it entirely.
1Sniffnoy11y
By mutation I mean set!, or anything else that can change the value of a variable; things ending in '!'. namespace-set-variable-value! does technically mutate things but looking at the way people actually used it I don't think it would have caused a problem?
0BloodyShrimp11y
Ah, mutation in the "mutable state" sense. When I was doing some light experimenting with static analysis in the early days of the contest, I looked for variables storing anything involving eval (or involving another variable storing anything involving eval, etc.), and just treated (sounds like that should be "trought" or something) set! calls as define calls--another chance for a variable to become contaminated with evalness. Could you give an example of a case where set! makes things harder to analyze?
0Sniffnoy11y
OK. Let me explain what the basic idea of my program was. The idea was to analyze the opponent's program in an attempt to answer the following question: What is the probability that I can affect the outcome of their program? Now that I've thought about this some more, this makes less sense than I initially thought; if there's a second round (and I have time to program something as complicated as this), I'm going to try the rather more sensible "How much I can I affect the probability distribution of their program's output?" But for now let's go with the original idea, even though it doesn't make sense in general. So the idea was to compute this probability, p, and then cooperate if p>=1/2. (The number 1/2 is not arbitrary and is based on the particular values for T, R, P, and S used. Actually if the numbers were different, there wouldn't be a single p, it would depend on other things, but I'm not going to into that, because this strategy doesn't actually make sense.) So e.g. the strategy should defect against CooperateBot, DefectBot, and any sort of RandomBot; it would cooperate with (and get defected against by) Watson's entry; it would cooperate with (and get cooperated with) by what So8res called a "rank N MimicBot", so long as N>=2; and it would cooperate with what solipsist called a Mimicbot so long as "cooperate anyway" probability was at most 1/2, otherwise it would defect. OK. So let's say I'm running my program and it comes up against one of these MimicBots. It gets to a part that looks like (if (random 100) 'C ((eval opponent) self)) What does it do? Well, it has to check both branches and see that in the first, which occurs with probability 1/100, it can't affect the outcome, and in the second, which occurs with probability 99/100, it can. (Really it's a bit more complicated than that -- how does it get the probabilities? But A. the strategy doesn't actually quite make sense anyway and B. I'm trying to keep this simple.) Thing is, in order to check
0BloodyShrimp11y
I still don't see where set! is relevantly different from define... e.g. (define result (if (random 100) 'C ((eval opponent) self)))
1Sniffnoy11y
In this case (notionally), the (transformed) conditional would end up returning a data structure representing "probability 1/100 of always C, probability 99/100 of C or D depending on what I do", and then result would be set to that data structure. The (transformed) define itself doesn't return a value, but that's OK, it's not supposed to, it's just supposed to influence the value of something else. In the above case, result would first be set to a data structure representing "probability 1 of C" and then to a data structure representing "probability 1 of C or D depending on what I do". Then the (transformed) conditional itself would return something undefined, since the return value of set! is undefined. (And OK, it wasn't supposed to return a value either, but its influence over other parts of the program has been screwed up.) Basically, my (start of a) program was making the assumption that the return value of a thing was what was important -- or at least that, while side effects might exist, they wouldn't screw things up elsewhere. (With a few exceptions for special side-effecty functions which I intended to handle specially.) But allowing mutation would mean having to write my program to keep track of all of the simulated variables, whereas without it I can just focus on return values and let the Scheme interpreter handle that.

First of all, thanks for running the contest Alex and congrats to the winners.

Random behaviour was obviously a good strategy, but I don't feel it teaches us much about making better bots. If I had suspected so much random behaviour, defecting whenever it's detected would have been a good additional test.

Hmm, three way tie for fourth place.

I thought So8res was trying to set up subtle vulnerabilities for the mimicbot clique to exploit himself! My own exploit (store their size and see if it changes; if so defect, on the assumption that most who change their size will only change it once; play small-n mimicbot so you can induce cooperation on their turn and when the exploit doesn't fire at all) seems to have cut the problem a little cleaner, picking up more nonrandom DCs. I had essentially thought of this exploit in the first few days of the contest, but I ha... (read more)

Sorry if this was answered somewhere else, but do you plan on hosting another tournament? If so, when?

0AlexMennen11y
Undecided. I was kind of disappointed with the results of this one, but I'm not optimistic that another one would end up any better unless it has a better structure. I plan to run another tournament if I come up with a way to change the rules that would make it easier to submit effective programs, which I have not made any progress on so far. Some people have made suggestions in the comments, but they range from probably not helpful to difficult to implement.
0MTGandP11y
Would-be submitters have a lot more information going forward. In particular: if you want to win, you have to be able to beat a random bot.

COUNTIF could have made that xls document work without the third and fourth table.

One more question about rules for a second contest: Do we want to keep the rule as stated that "Each player will submit a file containing a single Scheme lambda-function. The function should take one input.", or should we relax it to "Each player will submit a file containing a single Scheme expression, that must evaluate to a function taking one input"?

Benefits of the latter: Allows quines to be written less artificially (although, if we switch to the two-argument version, this becomes unimportant), is just generally more flexible.

Bene... (read more)


I would have expected a few more metaevaluator bots which clean up the environment to prevent detection. Of course this can be an expensive strategy, and certainly is in programmer time. A metaevaluator bot would have probably broken several recursion detection strategies, or even defining set! out of the language.