I'm not really able to evaluate the claims in the paper myself, so thanks for the input. Having said that, do you think the paper specifies the strategies with enough detail for us to code them up and test their mettle in a Less Wrong IPD tournament?
I know that this isn't exactly what you're asking, but: Stewart and Plotkin tested two variants of ZD strategies in a variant of Axelrod's original tournament; one variant (ZDGTFT-2) had the highest total score, beating TFT.
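On the "code them up" question, here is a minimal sketch of how such strategies could be represented and played against each other, assuming they are memory-one strategies: four probabilities of cooperating, one for each outcome of the previous round. The payoff values (5, 3, 1, 0), the opening move, and the names `PAYOFF`, `STATES`, `next_move` and `play` are my own assumptions for illustration, not taken from the paper. The ZDGTFT-2 vector is what I recall from Stewart and Plotkin and should be checked against their text before anyone relies on it.

```python
import random

# Payoff to the focal player, keyed by (my_move, their_move).
# Assumption: the conventional values T=5, R=3, P=1, S=0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Previous-round outcomes as (my_move, their_move), in the order CC, CD, DC, DD.
STATES = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]

# A memory-one strategy is the probability of cooperating after each outcome.
TFT = (1.0, 0.0, 1.0, 0.0)        # copy the opponent's last move
ALLD = (0.0, 0.0, 0.0, 0.0)       # always defect
ZDGTFT2 = (1.0, 0.125, 1.0, 0.25) # recalled from Stewart and Plotkin; verify before use


def next_move(strategy, mine, theirs, rng):
    """Draw the next move given the previous round as seen by this player."""
    p_cooperate = strategy[STATES.index((mine, theirs))]
    return "C" if rng.random() < p_cooperate else "D"


def play(strat_x, strat_y, rounds=200, seed=0):
    """Play a fixed-length match and return the total scores (x, y)."""
    rng = random.Random(seed)
    x, y = "C", "C"  # assumption: both players open with cooperation
    score_x = score_y = 0
    for _ in range(rounds):
        score_x += PAYOFF[(x, y)]
        score_y += PAYOFF[(y, x)]
        # Each player sees the previous round from its own point of view.
        x, y = next_move(strat_x, x, y, rng), next_move(strat_y, y, x, rng)
    return score_x, score_y


if __name__ == "__main__":
    print(play(TFT, ALLD, rounds=10))
    print(play(ZDGTFT2, TFT, rounds=10))
```

A tournament would then just be `play` over all pairs of entries, summing each entry's scores. Fixed match length and a forced cooperative opening are simplifications; a real tournament would probably want a random stopping time and per-strategy opening moves.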
Bill "Numerical Recipes" Press and Freeman "Dyson sphere" Dyson have a new paper on the iterated prisoner's dilemma (IPD). Interestingly, they found some surprising new results:
They discuss a special class of strategies, zero-determinant (ZD) strategies, of which tit-for-tat (TFT) is a special case:
The evolutionary player adjusts his strategy to maximize his own score, but doesn't explicitly take his opponent into account in any other way (hence he has "no theory of mind" of the opponent). Possible outcomes are:
A)
B)
This latter case sounds like a formalization of Hofstadter's superrational agents. The enforcement of cooperation via each player setting the other's score is very interesting.
Is this connection true, or am I misinterpreting it? (This is not my field and I've only skimmed the paper so far.) What are the implications for FAI? If we got into an IPD situation with an agent for which we simply cannot put together a theory of mind, do we have to live with extortion? What would it effectively mean to have a useful theory of mind in this case?
The paper ends in a grand style (spoiler alert):