Okay, so this is what happens with the PD strategy in this comment.
Let's try to derive an optimal counter-strategy (CS) to the probabilistic strategy above (PS). We work backwards. Suppose we've worked out CS's behavior for the last N-1 turns. Then on the Nth turn from the end, in each of the four possible situations, PS's probabilities together with CS's already-determined later behavior give us an expected payout for the remainder of the match if we cooperate and if we defect; we pick the action with the larger expected payout. This is the optimal strategy to use against this opponent if we want to maximize our own score.
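The backward induction above is easy to mechanize. A minimal sketch: since PS's actual probabilities are given in another comment and not here, the `ps` vector below is a hypothetical stand-in, indexed by the previous round's outcome.

```python
# Backward induction for the optimal counter-strategy (CS) against a fixed
# memory-one probabilistic strategy (PS) in an N-turn iterated PD.
R, S, T, P = 3, 0, 5, 1  # standard PD payoffs from the row player's view

# PS's probability of cooperating given the previous round's outcome,
# written as (my move, PS's move): CC, CD, DC, DD.
# Hypothetical values -- substitute the real ones from the parent comment.
ps = {'CC': 0.9, 'CD': 0.5, 'DC': 0.5, 'DD': 0.1}

def payoff(me, opp):
    return {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}[(me, opp)]

def solve(N):
    """Return (policy, V): policy[t][state] is CS's optimal action on turn t
    (0-based; t = N-1 is the last turn), V the expected remaining payout."""
    states = ['CC', 'CD', 'DC', 'DD']
    V = {s: 0.0 for s in states}          # value after the match ends
    policy = []
    for t in reversed(range(N)):
        newV, acts = {}, {}
        for s in states:
            # PS sees the outcome from its own side, so swap the two moves
            q = ps[s[1] + s[0]]           # prob. PS cooperates this turn
            vals = {}
            for a in 'CD':
                vals[a] = (q * (payoff(a, 'C') + V[a + 'C'])
                           + (1 - q) * (payoff(a, 'D') + V[a + 'D']))
            acts[s] = max(vals, key=vals.get)
            newV[s] = vals[acts[s]]
        V = newV
        policy.append(acts)
    policy.reverse()
    return policy, V

policy, V = solve(10)
print("last turn:", policy[-1])  # defecting is always optimal on the last turn
```

Note that whatever probabilities PS uses, `policy[-1]` is all-defect: on the last turn there is no future, and T > R and P > S make defection strictly better in every state.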
Note that since PS is stupid and does the same thing on every turn, CS should just defect on the last turn.
However, after working out the math, it appears that CS is actually a very nice strategy. It defects on the last turn, and also on the next-to-last turn if it finds itself in a "CC" situation; in all other cases, it cooperates.
It's obvious that PS, which has some probability of defecting, will win the match against CS, because it's effectively playing against a cooperative rock. In other words, if you play against this strategy and try to maximize your own score, your opponent will have a higher score.
This isn't as ridiculous as it appears! CS isn't "losing" in any significant sense, because the goal we gave it wasn't to win the match; it was to get as many points as possible. In an infinite Prisoner's Dilemma (which is the situation considered in the paper), this is the only reasonable thing to ask, because there's no match to be won. So the "extortion" of PS is actually that if you try to maximize your points against it, it will get even more points than you will.
Of course, this is the same as in a game of chicken where your opponent precommits to defecting.
Bill "Numerical Recipes" Press and Freeman "Dyson sphere" Dyson have a new paper on iterated prisoner's dilemmas (IPD). Interestingly, they found surprising new results:
They discuss a special class of strategies - zero-determinant (ZD) strategies, of which tit-for-tat (TFT) is a special case:
The evolutionary player adjusts his strategy to maximize score, but doesn't otherwise take his opponent explicitly into account (hence has "no theory of mind" of the opponent). The possible outcomes are:
A)
B)
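To make the ZD class concrete: under the standard payoffs (T, R, P, S) = (5, 3, 1, 0), the memory-one strategy p = (11/13, 1/2, 7/26, 0) satisfies the paper's ZD extortion condition with extortion factor chi = 3 (and normalization phi = 1/26), which pins the long-run scores to s_X - P = 3(s_Y - P) against any memory-one opponent. A quick numerical check of that linear relation, computing the stationary distribution of the induced Markov chain by power iteration (the specific opponent vectors below are arbitrary choices for illustration):

```python
# Check a zero-determinant (ZD) extortion strategy: p enforces
#   s_X - P = 3 * (s_Y - P)
# against ANY memory-one opponent q (standard payoffs T,R,P,S = 5,3,1,0).
R, S, T, P = 3, 0, 5, 1
PAY_X = [R, S, T, P]   # X's payoff in states CC, CD, DC, DD (X's move first)
PAY_Y = [R, T, S, P]   # Y's payoff in the same states

p = [11/13, 1/2, 7/26, 0.0]  # X's prob. of cooperating after CC, CD, DC, DD

def scores(p, q):
    """Long-run average scores (s_X, s_Y) for memory-one strategies p and q.
    Each is indexed by the previous outcome from that player's OWN
    perspective, so q sees X's CD and DC states swapped."""
    qx = [q[0], q[2], q[1], q[3]]        # q re-indexed into X's state order
    # 4x4 transition matrix over states CC, CD, DC, DD
    M = [[p[s] * qx[s], p[s] * (1 - qx[s]),
          (1 - p[s]) * qx[s], (1 - p[s]) * (1 - qx[s])] for s in range(4)]
    v = [0.25] * 4                       # stationary dist. via power iteration
    for _ in range(20000):
        v = [sum(v[s] * M[s][j] for s in range(4)) for j in range(4)]
    sx = sum(v[s] * PAY_X[s] for s in range(4))
    sy = sum(v[s] * PAY_Y[s] for s in range(4))
    return sx, sy

for q in ([0.6, 0.3, 0.8, 0.1], [1, 1, 1, 1], [0.5, 0.5, 0.5, 0.5]):
    sx, sy = scores(p, q)
    print(f"q={q}: s_X - P = {sx - P:.4f}, 3*(s_Y - P) = {3 * (sy - P):.4f}")
```

Against an unconditional cooperator (q = all ones) the extortioner averages 41/11 per round while the cooperator gets 21/11 - the score surpluses over P stay locked in a 3:1 ratio no matter what q does.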
This latter case sounds like a formalization of Hofstadter's superrational agents. The cooperation enforcement via cross-setting the scores is very interesting.
Is this connection real, or am I misinterpreting it? (This is not my field and I've only skimmed the paper so far.) What are the implications for FAI? If we got into an IPD situation with an agent for which we simply cannot put together a theory of mind, would we have to live with extortion? What would it effectively mean to have a useful theory of mind in this case?
The paper ends in a grand style (spoiler alert):