Okay, so this is what happens with the PD strategy in this comment.
Let's try to derive an optimal counter-strategy (CS) to the probabilistic strategy above (PS). We work backwards. Suppose we've worked out CS's behavior for the last N-1 turns. Then on the Nth turn from the end, in each of the four possible situations, PS's probabilities together with CS's already-determined later behavior give us an expected payout for the remainder of the match if we cooperate and if we defect; we pick the action with the larger expected payout. This is the optimal strategy to use against this opponent if we want to maximize our own score.
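The backward induction above is easy to mechanize. A minimal sketch: since PS's actual probabilities are given in another comment and not here, the `ps` vector below is a hypothetical stand-in, indexed by the previous round's outcome.

```python
# Backward induction for the optimal counter-strategy (CS) against a fixed
# memory-one probabilistic strategy (PS) in an N-turn iterated PD.
R, S, T, P = 3, 0, 5, 1  # standard PD payoffs from the row player's view

# PS's probability of cooperating given the previous round's outcome,
# written as (my move, PS's move): CC, CD, DC, DD.
# Hypothetical values -- substitute the real ones from the parent comment.
ps = {'CC': 0.9, 'CD': 0.5, 'DC': 0.5, 'DD': 0.1}

def payoff(me, opp):
    return {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}[(me, opp)]

def solve(N):
    """Return (policy, V): policy[t][state] is CS's optimal action on turn t
    (0-based; t = N-1 is the last turn), V the expected remaining payout."""
    states = ['CC', 'CD', 'DC', 'DD']
    V = {s: 0.0 for s in states}          # value after the match ends
    policy = []
    for t in reversed(range(N)):
        newV, acts = {}, {}
        for s in states:
            # PS sees the outcome from its own side, so swap the two moves
            q = ps[s[1] + s[0]]           # prob. PS cooperates this turn
            vals = {}
            for a in 'CD':
                vals[a] = (q * (payoff(a, 'C') + V[a + 'C'])
                           + (1 - q) * (payoff(a, 'D') + V[a + 'D']))
            acts[s] = max(vals, key=vals.get)
            newV[s] = vals[acts[s]]
        V = newV
        policy.append(acts)
    policy.reverse()
    return policy, V

policy, V = solve(10)
print("last turn:", policy[-1])  # defecting is always optimal on the last turn
```

Note that whatever probabilities PS uses, `policy[-1]` is all-defect: on the last turn there is no future, and T > R and P > S make defection strictly better in every state.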
Note that since PS is stupid and does the same thing on every turn, CS should just defect on the last turn.
However, after working out the math, it appears that CS is actually a very nice strategy. It defects on the last turn, and also on the next-to-last turn if it finds itself in a "CC" situation; in all other cases, it cooperates.
It's obvious that PS, which has some probability of defecting, will win the match against CS, because it's effectively playing against a cooperative rock. In other words, if you play against this strategy and try to maximize your own score, your opponent will have a higher score.
This isn't as ridiculous as it appears! CS isn't "losing" in any significant sense, because the goal we gave it wasn't to win the match; it was to get as many points as possible. In an infinite Prisoner's Dilemma (which is the situation considered in the paper), this is the only reasonable thing to ask, because there's no match to be won. So the "extortion" of PS is actually that if you try to maximize your points against it, it will get even more points than you will.
Of course, this is the same as in a game of chicken where your opponent precommits to defecting.
Bill "Numerical Recipes" Press and Freeman "Dyson sphere" Dyson have a new paper on iterated prisoner's dilemmas (IPD). Interestingly, they found surprising new results:
They discuss a special class of strategies - zero-determinant (ZD) strategies, of which tit-for-tat (TFT) is a special case:
The evolutionary player adjusts his strategy to maximize score, but doesn't otherwise take his opponent explicitly into account (hence has "no theory of mind" of the opponent). The possible outcomes are:
A)
B)
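To make the ZD class concrete: under the standard payoffs (T, R, P, S) = (5, 3, 1, 0), the memory-one strategy p = (11/13, 1/2, 7/26, 0) satisfies the paper's ZD extortion condition with extortion factor chi = 3 (and normalization phi = 1/26), which pins the long-run scores to s_X - P = 3(s_Y - P) against any memory-one opponent. A quick numerical check of that linear relation, computing the stationary distribution of the induced Markov chain by power iteration (the specific opponent vectors below are arbitrary choices for illustration):

```python
# Check a zero-determinant (ZD) extortion strategy: p enforces
#   s_X - P = 3 * (s_Y - P)
# against ANY memory-one opponent q (standard payoffs T,R,P,S = 5,3,1,0).
R, S, T, P = 3, 0, 5, 1
PAY_X = [R, S, T, P]   # X's payoff in states CC, CD, DC, DD (X's move first)
PAY_Y = [R, T, S, P]   # Y's payoff in the same states

p = [11/13, 1/2, 7/26, 0.0]  # X's prob. of cooperating after CC, CD, DC, DD

def scores(p, q):
    """Long-run average scores (s_X, s_Y) for memory-one strategies p and q.
    Each is indexed by the previous outcome from that player's OWN
    perspective, so q sees X's CD and DC states swapped."""
    qx = [q[0], q[2], q[1], q[3]]        # q re-indexed into X's state order
    # 4x4 transition matrix over states CC, CD, DC, DD
    M = [[p[s] * qx[s], p[s] * (1 - qx[s]),
          (1 - p[s]) * qx[s], (1 - p[s]) * (1 - qx[s])] for s in range(4)]
    v = [0.25] * 4                       # stationary dist. via power iteration
    for _ in range(20000):
        v = [sum(v[s] * M[s][j] for s in range(4)) for j in range(4)]
    sx = sum(v[s] * PAY_X[s] for s in range(4))
    sy = sum(v[s] * PAY_Y[s] for s in range(4))
    return sx, sy

for q in ([0.6, 0.3, 0.8, 0.1], [1, 1, 1, 1], [0.5, 0.5, 0.5, 0.5]):
    sx, sy = scores(p, q)
    print(f"q={q}: s_X - P = {sx - P:.4f}, 3*(s_Y - P) = {3 * (sy - P):.4f}")
```

Against an unconditional cooperator (q = all ones) the extortioner averages 41/11 per round while the cooperator gets 21/11 - the score surpluses over P stay locked in a 3:1 ratio no matter what q does.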
This latter case sounds like a formalization of Hofstadter's superrational agents. The cooperation enforcement via cross-setting the scores is very interesting.
Is this connection real, or am I misinterpreting it? (This is not my field and I've only skimmed the paper so far.) What are the implications for FAI? If we got into an IPD situation with an agent for which we simply cannot put together a theory of mind, would we have to live with extortion? What would it effectively mean to have a useful theory of mind in this case?
The paper ends in a grand style (spoiler alert):