However, after working out the math, it appears that the optimal strategy against it is actually a very nice one.
Of course; it's the same as in a game of chicken where your opponent precommits to defecting.
In infinite IPD:
Point 2 may not be obvious, but follows straight from the payoff matrix.
Well, yes; I'm assuming that I know the strategy my opponent is playing, which assumes a precommitment. I'm just trying to explain the reasoning in the paper, without going into determinants and Markov chains and so on.
Bill "Numerical Recipes" Press and Freeman "Dyson sphere" Dyson have a new paper on iterated prisoner's dilemmas (IPD). Interestingly, they found surprising new results:
They discuss a special class of strategies - zero-determinant (ZD) strategies - of which tit-for-tat (TFT) is a special case:
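The striking property of ZD strategies is that one player can unilaterally enforce a linear relation between the two scores, no matter what the opponent plays. A Monte Carlo sketch of this, using the chi = 3 extortion strategy p = (11/13, 1/2, 7/26, 0) from the paper (the opponents below are arbitrary memory-one strategies I picked for illustration):

```python
# Monte Carlo check that a ZD extortion strategy unilaterally enforces
# s_X - P = 3 * (s_Y - P), whatever memory-one strategy Y plays.
# Payoffs are the standard PD values; p_ext is the chi = 3 example
# from Press & Dyson; the opponents are my own illustrative choices.
import random

R, S, T, P = 3, 0, 5, 1
PAYOFF = {('C', 'C'): (R, R), ('C', 'D'): (S, T),
          ('D', 'C'): (T, S), ('D', 'D'): (P, P)}
STATE = {('C', 'C'): 0, ('C', 'D'): 1, ('D', 'C'): 2, ('D', 'D'): 3}
p_ext = [11/13, 1/2, 7/26, 0.0]    # X's P(cooperate | last round's outcome)

def play(q, rounds=200_000, seed=0):
    """X plays the extortion strategy; Y plays memory-one q (from Y's view)."""
    rng = random.Random(seed)
    x, y = 'C', 'C'                 # opening moves
    total_x = total_y = 0
    for _ in range(rounds):
        px, py = PAYOFF[(x, y)]
        total_x += px
        total_y += py
        s = STATE[(x, y)]
        x_next = 'C' if rng.random() < p_ext[s] else 'D'
        # Y sees the same round with the roles swapped (CD <-> DC)
        y_next = 'C' if rng.random() < q[[0, 2, 1, 3][s]] else 'D'
        x, y = x_next, y_next
    return total_x / rounds, total_y / rounds

for name, q in [("always cooperate", [1, 1, 1, 1]),
                ("random 50/50",     [0.5] * 4),
                ("mostly forgiving", [0.9, 0.5, 0.8, 0.3])]:
    sX, sY = play(q)
    print(f"{name:18s} sX={sX:.3f}  sY={sY:.3f}  "
          f"(sX-P) - 3(sY-P) = {sX - P - 3 * (sY - P):+.3f}")
```

In each case the residual (sX - P) - 3(sY - P) comes out near zero, and against an unconditional cooperator the extortioner takes roughly double the opponent's score (about 3.7 vs 1.9 per round).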
The evolutionary player adjusts his strategy to maximize his score, but doesn't otherwise take his opponent explicitly into account (hence has "no theory of mind" of the opponent). Possible outcomes are:
A)
B)
This latter case sounds like a formalization of Hofstadter's superrational agents. The cooperation enforcement via cross-setting the scores is very interesting.
Is this connection true, or am I misinterpreting it? (This is not my field and I've only skimmed the paper so far.) What are the implications for FAI? If we got into an IPD situation with an agent for which we simply cannot put together a theory of mind, do we have to live with extortion? What would it effectively mean to have a useful theory of mind in this case?
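To make the "evolutionary player vs extortioner" outcome concrete: against the chi = 3 extortion strategy, a player who only tunes its own cooperation level to maximize its own score ends up fully cooperating, which hands the extortioner the larger share. A sketch of this, assuming the extortion strategy above and (my own simplification) an opponent who cooperates with a fixed unconditional probability q, computed exactly via the stationary distribution of the 4-state Markov chain:

```python
# Exact long-run scores in a memory-one IPD, via the stationary
# distribution of the (X's move, Y's move) Markov chain.
# Payoffs and the chi = 3 extortion strategy are from Press & Dyson;
# the unconditional-cooperator opponent is a simplification for
# illustration: its best response is q = 1 (full cooperation),
# at which point the extortioner scores far more.
import numpy as np

R, S, T, P = 3, 0, 5, 1             # standard PD payoffs
sx = np.array([R, S, T, P], float)  # X's payoff in states CC, CD, DC, DD
sy = np.array([R, T, S, P], float)  # Y's payoff (X's move listed first)
p_ext = np.array([11/13, 1/2, 7/26, 0.0])  # extortioner's P(cooperate | state)

def scores(p, q):
    """Long-run per-round scores when X plays memory-one p, Y plays q."""
    M = np.empty((4, 4))
    for s in range(4):
        # Y saw the same last round with roles swapped (CD <-> DC)
        qs = q[[0, 2, 1, 3][s]]
        M[s] = [p[s] * qs, p[s] * (1 - qs),
                (1 - p[s]) * qs, (1 - p[s]) * (1 - qs)]
    # stationary distribution: pi M = pi with sum(pi) = 1
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    b = np.array([0, 0, 0, 0, 1.0])
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    return pi @ sx, pi @ sy

for q in (0.0, 0.5, 1.0):
    sX, sY = scores(p_ext, np.full(4, q))
    print(f"q={q:.1f}  extortioner={sX:.3f}  adapter={sY:.3f}")
```

The adapter's score rises monotonically with q (1.000, then about 1.521, then about 1.909 at q = 1), so myopic score maximization drives it to full cooperation, while the extortioner collects about 3.727 per round. In this toy version, "living with extortion" is exactly the evolutionary player's best available outcome.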
The paper ends in a grand style (spoiler alert):