ThisSpaceAvailable comments on Announcing the 2014 program equilibrium iterated PD tournament - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (61)
I'm having trouble comprehending what running one of these bots would consist of. Suppose MyBot is put against OpponentBot. So MyBot is going to run a simulation of OpponentBot, and that instance of OpponentBot is going to run a simulation of MyBot, and that instance is going to run a simulation of OpponentBot ... how does this process terminate? My head hurts.
I would guess that in that case, you run out of time and get penalized. Which suggests a strategy: Program your own bot to wait until the last possible millisecond before transmitting its choice.
What do you mean, "in that case"? My issue is that I don't see how two bots can both simulate each other. I don't see what I presented as a special case, but the general case.
Just because your bot has the option of simulating its opponent doesn't mean that you would necessarily program it to do so. Evidently there is a time limit in the game so if your bot went overboard simulating it would end up running out of time and being penalized a lot of the time.
For example one possible strategy is "transparent tit for tat," i.e. program your bot not to simulate anything at all but to simply play tit for tat. With the hope that other bots will simulate yours and figure out that the best strategy is repeated cooperation.
Transparent tit-for-tat is a decent enough strategy, but it is easily beaten 1-on-1 by "cooperate on every turn except the very last one", and against many other simple bots by "TFT but defect on the last turn".
That being said, when the game is 100 turns long, it's pretty hard to do significantly better than Tit-for-Tat - the aforementioned strategies only beat TFT with a score of 302-297.
I think that's probably a good reason to shorten the round length from 100 down to, say, 10 or 20 - otherwise the variance from a single RandomBot would drown out that score difference.
That's fixable by not letting the bot know how many round are there other than, say, more than 20 and less than a hundred.
In that case the game becomes significantly less interesting, because there's far less you can do to improve on TFT and similar strategies, because TFT is a Nash Equilibrium. Granted, you can still include provisions to detect against RandomBots, CooperateBots, and DefectBots, but the finitely repeated PD is significantly more interesting.
"Interesting" is in the eye of the beholder. Not knowing the exact number of rounds adds complexity. I doubt that adding complexity means "there's far less you can do to improve".
(1) Saying "complexity" is not an argument.
(2) The unknown horizon case is actually strictly less complex, because now each round is the same w.r.t. the future, whereas for the finite horizon case each round has a distinctly different future.
(3) You haven't countered the point that TFT is a Nash Equilibrium in the infinite-horizon case - that makes it fairly trivial and obvious.
(4) http://lesswrong.com/lw/to/the_truly_iterated_prisoners_dilemma/
(1) It's a about the same kind of argument as saying "less interesting" :-P
(2) No, each round is not the same because the horizon is not unknown, it's just that instead of a fixed number you have a probability distribution to deal with.
(3) We are not talking about the infinite-horizon case.
(4) Yes, and?
I agree, I was simply giving an example of a strategy which would not lead to the sort of infinite regress envisioned by ThisSpaceAvailable.
I do think it's worth noting that "TFT but defect on the last turn" would be beaten 1-on-1 by "TFT but defect on the last two turns." And both of them would be beaten 1-on-1 by "TFT but defect on the last 50 turns."
In fact, in a 1-on-1 game, it seems to me the best strategy would seem to be to always defect and hope the other fellow is stupid enough to cooperate at least once. Actually it seems pretty much pointless to talk about 1-on-1 prisoner's dilemma, because it turns things back into a zero sum game.
If you think about it as win/lose, then yes, of course. However, it's still instructive to ask the question "what's the best response strategy"? In other words, "what strategy will maximise my utility against this given, fixed opponent strategy?".
In that sense, the best response to TFT is indeed "TFT but defect on the last turn", because "TFT but defect on the last two turns" does strictly worse when playing against TFT.
Fortunately, you can indeed do significantly better than TFT in this game!
But what if you are playing against "TFT but defect on the last turn"? In that case, the best strategy is TFT-2. And if the other fellow is playing TFT-2, the best strategy is TFT-3. And so on. Agreed?
Seems to me it depends on what strategies the other contestants are playing.
Oh, yes, of course; the best response to TFT-1 is clearly TFT-2, and so on.
As for how well strategies do, while it's clear that it will depend on the strategies of other contestants and in that sense there cannot be a "best strategy", I think one can do better - for example, if there's a Nash Equilibrium strategy that isn't simply (Defect, Defect).
At a bare minimum, you can improve upon TFT by also making sure it defects against CooperateBots, and doesn't wait until the second turn to defect against DefectBots. Of course, there may indeed be JusticeBots out there who punish you for defecting against CooperateBots...
That's assuming that the new algorithm can correctly identify its opponents. Also, if other algorithms correctly sense that yours is opportunistic, they might change their strategy to your detriment.
Each bot can run a simulation of the other if the circumstances under which they run their opponents are not the same.
For example, JusticeBot does indeed run a simulation of its opponent, but this is not a problem because JusticeBot simulates what the opponent will do against CooperateBot, not against JusticeBot.
Similarly, an agent could run a simulation of its opponent against itself, but with a different history.
But that different history would have to result in not running a further simulation.
Not immediately, but yes, the recursion would eventually need to hit a base case.
If you run out of time, you defect. If your opponent simulates you without a time limit, and you take long enough to cause ver to run out of time, ve'll defect, which isn't what you want.
But there's a function that lets you run a function for up to a specified number of microseconds, and tells you what it returns or whether it times out.
I'm not sure about that - the original post says this:
So presumably most bots will be set up to make a choice within the time limit.
Right, so the practical effect of the strategy I proposed would be to deny opponents knowledge. Of course, one can envision situations where you want to be transparent.