LESSWRONG
is fundraising!
LW

How do we really escape Prisoners' Dilemmas? — LessWrong

3 How do we really escape Prisoners' Dilemmas?

by drnickbone

31st Aug 2012

10 min read

3

Personal Blog

3

New Comment

57 comments, sorted by

top scoring

Click to highlight new comments since: Today at 1:55 AM

[-]buybuydandavis13y140

For instance, a more sophisticated strategy than Tit For Tat (TFT) would determine when it has reached the last iteration, and then defect. Call this TFT-1. But then once TFT-1 has established itself, a strategy which detects and defects the last two iterations (TFT-2) would establish itself, and so on.

Since prisoners' dilemmas are always finite in practice, and always have been (we are mortal, and the Sun will blow up at some point), this raises the question of why we actually co-operate in practice. Why is TFT, or something very like it, still around?

Finite life does not imply that you know up front which iteration will be the last, so there is no cascade of defections walking back from the last iteration.

[-]drnickbone13y20

The usual Nash equilibrium argument doesn't depend on players knowing when the end is... there just has to be some upper bound on the number of iterations. This is sufficient to show that any true equilibrium for the game must involve both players always defecting. It is mathematically rigorous.

My point here is that any strategy which is stable to evolution (usually called an ESS) is a Nash equilibrium, so we need some sort of explanation why we don't see an ESS for real life prisoner's dilemmas. Just observing that the game is iterated isn't enough.

[-]Unnamed13y10

The usual Nash equilibrium argument doesn't depend on players knowing when the end is... there just has to be some upper bound on the number of iterations. This is sufficient to show that any true equilibrium for the game must involve both players always defecting. It is mathematically rigorous.

This seems like it would depend on the assumptions. Suppose that you are playing an iterated PD where, after each round, there is a 10% chance that the game stops and a 90% chance that it continues. Does the proof still apply?

Edit: The more general point is that this proof by backwards induction, which is central to this post, sounds extremely fragile. If I'm reconstructing it correctly in my head, it depends on there being some round where, if you reached it, you would be absolutely certain that it was the last round. It also depends on you being absolutely certain that the other person would defect in that round.* If either probability shifts from zero to epsilon, the proof breaks. And with any agent that was programmed as haphazardly as humans were, you have to deal in probabilities rather than logical certainties.

* It seems that it would also depend on them knowing (with certainty) that you know that they'll defect, and so on up to as many meta-levels as there are rounds. And does it also depend on you and the other person having the same belief about the maximum possible number of rounds? If you think "there's no way that we'll play a trillion rounds" and I think "there's no way that we'll play 2 trillion rounds", would the proof still go through?

[-]drnickbone13y00

If after each round, there is a 90% chance of continuation, then this is an infinite iterated prisoner's dilemma, and you are right, the backwards induction doesn't apply in that case. But my point was that this never applies in the real world... there are always finite upper bounds, and in that case a strategy like TFT is vulnerable to invasion by a variant (or "mutant") strategy which defects on the last round. There is no need to assume a biological mutation by the way... It could be a cultural mutation, learned response which others copy etc.

By the way, the "common knowledge" objection to backwards induction doesn't work, though I've seen it before. This is because the demonstration that an ESS must be a Nash equilibrium doesn't require any strong common knowledge assumptions about the problem statement, the rationality of the other players etc. You just need to consider what happens if a population adopts a strategy which is not a Nash equilibrium, and then mutants following a superior strategy show up...

[-]Unnamed13y50

If there is a 90% chance of continuation and a maximum of 10,000 rounds, then a variant of TFT that always defects on the 10,000th round has less than a 10^-400 chance of behaving differently from TFT. In practice, TFT would be evolutionarily stable against this variant, since our universe hasn't lasted long enough for these two strategies to be distinguishable.

If there is a shortcut that makes it possible to skip this infeasible evolutionary process, I suspect that it would need to involve strong assumptions about common knowledge.

In the real world, I think it is relatively rare to reach a last round and know that it is the last round (and rarer still to know that a certain round is the next-to-last round, or the third-from-last), which limits the advantages of strategies based on backwards induction. Our ancestors lived in smallish gossipy groups, which meant that few of their interactions were isolated prisoners' dilemmas (with no indirect costs of defection).

[-]drnickbone13y00

If there is a shortcut that makes it possible to skip this infeasible evolutionary process, I suspect that it would need to involve strong assumptions about common knowledge.

I've been thinking about this, and here are some possible shortcuts. Consider a strategy which I will call "Grumpy-Old-Man" or GOM-n. This behaves like TFT for the first n rounds, and then defects afterwards.

In your model, GOM-n for n = 150 would be able to drift into a population of TFT (since it has a very small fitness penalty of about 1 in 10 million, which is small enough to allow drift). If it did drift in and stabilize, there would then be selection pressure to slowly reduce the n, to GOM-149, then GOM-148 and so on.

Worse still, consider a mutant that produces GOM-150 but has selective advantages in earlier life, at a cost of crippling the TFT machinery in later life. (Not implausible because mutations often do have several effects). Then it could enter a population with a positive selective advantage, and clear the way for n to slowly reduce.

The particular example I was thinking of was a variant which is very good at detecting sneaky defection, disguised to look like co-operation. But then, as a side-effect in later life, everything starts to look like defection, so it becomes grumpy and stops co-operating itself. I know one or two folks like this...

[-]drnickbone13y00

I see your point here about dependency on assumptions. You describe a finite iterated model with consistently high probability of future interaction, but which then suddenly drops to zero at a high upper bound. (There is no "senescence" effect, with the probability of future interaction declining down to zero). Agreed that in that model, there is minuscule chance of anyone ever reaching the upper bound, so a strategy which isn't strictly an ESS won't be replaced. Thanks for the proof of concept - upvoted.

That model (and ones like it with slow senescence) would match a case described in my original article, and which I mentioned below in my reply to Randaly. Basically, TDT-1 never gets a hold, because it is too unlikely that "this time is the last".

But my skepticism about this as an explanation remains: we do find real occasions where we know that "this time is the last" (or the only) interaction, and while our behaviour changes somewhat, we don't automatically defect in those cases. This suggests to me that a variant strategy of "Detect if this time is the last, and if so defect" would indeed be possibe, but for some (other) reason is not favoured.

[-]GeraldMonroe13y10

But do you actually know that? I mean, in your evolutionary past, there were people alive who committed betrayals that they thought they had gotten away with. They didn't, and those people are less likely to be your ancestors.

So people's brains, presumably with circuitry tuned to follow TFT because their ancestors used it, warns them not to commit acts of betrayal that in reality they could get away with. So they commit less acts of perfect betrayal than they otherwise would.

Like most other instincts, it can be overcome with learned behavior, which is why people exist who do betray others every chance they get and get away with it.

[-]drnickbone13y00

The fact that people will sometimes get it wrong (predict they can get away with betrayal, but can't) is not a problem. It's really a balance of fitness question (in cases where there s high probability of getting away with a defection gain, vs small probability of getting caught. Consider that the waiter you don't tip might just chase you out of the restaurant with a gun. Probably won't though.) Evolution would still favour last-round defections in such cases.

[-]GeraldMonroe13y00

I'm saying that in the past, if you committed a major betrayal against your tribe - they kill you. It wouldn't even what you stole or who you raped, etc, it's the fact that you were willing to do it against the tribe. So, even in last round cases where you might think you got away with it, the times that you fail to get away with it erase your gains.

Look what happens to powerful people in today's society if they get caught with some relatively minor transgression. So what if a Congressman sends naked pictures of himself to potential mates, or coaxes a female intern to give him a BJ? But, in both cases, the politician was betraying implicit promises and social norms for behavior that the voters want to see in a person in elected office.

[-]drnickbone13y00

This might be true... If the punishment for defection is always very severe after getting caught, then even with a very low probability of getting caught, but a low gain from defecting, evolution would favour co-operating on the last round (or single round) rather than defection. But this means that others' commitments to vengeance have transformed the prisoner's dilemma to a non-PD, which is my explanation 2 in the original article. (Or explanation 3 if vengeance is exacted by the whole tribe, including members who weren't directly injured by the original defection.)

However, this just shifts the burden of explanation to accounting for why we (or whole tribes) are vengeful to such an extreme extent. After all, vengeance is enormously costly, and risks injury (the condemned can fight back) or counter-vengeance (whoever kills the original defector risks being killed in turn by the defector's surviving family, and then the whole tribe splits apart in a cyle of killing). And notice that at that point, the original defection has already happened, so can't be deterred any more, and the injury-risking, potentially-tribe-splitting vengeance has negative fitness. The tribe's already in trouble - because of the betrayal - and the vengeance cycle could now destroy it. So why does it happen? What selection pressure maintains such severe punishment when it is fitness destroying?

[-]Unnamed13y00

Short answer: we're adaptation executors, not fitness maximizers.

Isolated prisoners' dilemmas were rare in the ancestral environment - most PD-like interactions took place in a social environment where they would (at least in expectation) have indirect effects either within a particular relationship or on one's broader reputation (and thus shared features with an indefinitely iterated PD). So the advantages of being good at PD-in-a-social-context far outweighed the possible benefits of consistently defecting in the rare truly one-shot PD. That means most of evolution's optimization power went towards building in adaptations that were good at PD-in-a-social-context, even if the adaptation made one less likely to defect in a truly one-shot PD.

For example, people tend to internalize their reputation, and to feel bad when others disapprove of them (either a particular close other or one's broader reputation). Having a model of how others will react to your behavior, which is readily accessible and closely tied to your motivations, is very useful for PD-in-a-social-context, but it will make it harder to defect in a one-shot PD.

Another adaptation is the capability of feeling close to another individual, in such a way that you like & trust them and feel motivated to do things that help them. This adaptation probably involved repurposing the machinery that makes parents love their offspring (it involves the hormone Oxytocin), and it makes it harder to defect on the last turn. For actions towards one's offspring, evolution didn't want us to defect on the last turn. Adding last-turn defection in non-kin relationships seems like a lot of complexity for a low return, with a potentially high cost if the adaptation isn't narrowly targeted and has collateral damage towards kin or earlier turns.

There are also various specific emotions which encourage TFT-like-behavior, like gratitude and vindictiveness. Someone who spends their last words on their deathbed praising the person who helped them, or cursing the person who cheated them, is cooperating in a PD. They are spreading accurate reputational information, and probably also strengthening the social rewards system by increasing people's expectations that good behavior will be socially rewarded or bad behavior will be socially punished. Even if these deathbed acts don't benefit the individual, they arise from emotions that did benefit the individual (making others more likely to help them, or less likely to cheat them). And again, when these emotions were in development by natural selection, deathbed turn-off was probably a relatively low-priority feature to add (although there does seem to be some tendency for vindictiveness to get turned off when someone is dying - I'm not sure if that's related).

[-]drnickbone13y00

Short answer: we're adaptation executors, not fitness maximizers.

I fully get the point, but this doesn't by itself explain why superior adaptations haven't come along. Basically, we need to consider a "constraint on perfection" argument here, and ask what may be the constraints concerned in this case. It is generally possible to test the proposals.

Some obvious (standard) proposals are:

1) Mutations can't arise to turn TFT into TFT-1

This is a bit unlikely for the reasons I already discussed. We do seem to have slightly different behaviour in the one-shot (or last-round) cases, so it is not implausible that some "mutant" would knock out co-operation completely on the last round (or on all rounds after a certain age - see my Grumpy-Old-Man idea above). There is a special concern when we allow for "cultural" mutations (or learned responses which can be imitated) as well as "biological" mutations.

2) Additional costs

The argument here is that TFT-1 has an additional cost penalty, because of the complexity overhead of successfully detecting the last round (or the only round), and the large negative cost of getting it wrong. Again it faces the objection that we do appear to behave slightly differently in last (or only) rounds, whereas if it were truly too difficult to discrimate, we'd have the same behaviour as on regular rounds.

3) Time-lags

This is the argument that we are adapted for an environment which has recently shifted, so cases of single-round (or known last-round) Prisoner's Dilemma are much more common than before and evolution hasn't caught up.

This might be testable by directly comparing behaviours of people living in conditions closer to Paleolithic versus industrialized conditions. Are there any differences in reactions when they are presented with one-shot prisoner's dilemmas? If one-shot PD is a new phenomenon, then we might expect "Paleo-people" to instinctively co-operate, whereas westerners think a bit then defect (indicating that a learned response is overriding an instinctive response). This strikes me as somewhat unlikely (I think it's more likely that the instinct is to defect, because there is a pattern-match to "not a member of my tribe", whereas industrialized westerners have been conditioned to co-operate, at least some of the time). But it's testable.

A variant of this is Randaly's suggestion that true last-rounds are indeed new, because of the effect of retaliation against family (which has only recently been prohibited). This has a nice feature, that in cases where the last round truly was the last (because there was no family left), the mutant wouldn't spread.

4) Side effects

Perhaps the mutations that would turn TFT into TFT-1 have other undesirable side effects? This is the counter-argument to Grumpy-Old-Man mutants invading because they have other positive side effects. Difficult to test this one until we know what range of mutations are possible (and whether we are considering biological or cultural ones).

[-]Randaly13y00

I don't think it was particularly central; while he did give it as an argument, drnickbone also gave examples of people cooperating on one-shot PD's, both in formal experiments and in practice (eg choosing to tip at a foreign restaurant to waiter who will never be seen again.)

[-][anonymous]13y00

My point here is that any strategy which is stable to evolution (usually called an ESS) is a Nash equilibrium,

No, it isn't. Where did you get the idea that it was?

EDIT: More specifically, the Nash Equilibrium in a one-shot or known-length game is not the same as an ESS in a version where the length of iterations is not known. If you think otherwise, put your "always defect" algorithm into an IPD contest and watch it lose miserably.

[This comment is no longer endorsed by its author]Reply

[-]Viliam_Bur13y00

Finite life does not imply that you know up front which iteration will be the last

Yes, but the older you are, the higher is the probability -- could we design an experiment to check for this (old people are more likely to defect in Prisonners' Dilemmas)?

[-]drnickbone13y20

Thanks for this suggestion.

I also suggested to look for such a phenomenon in vampire bats, and other reciprocating species. Do bats stop co-operating after a certain age? (Or do other bats stop co-operating with them?)

In my experience, old people are LESS likely to defect in Prisoner's dilemma, as judged by real-life instances. And other people are less likely to defect when interacting with them. This fact is worthy of some explanation, as it's not what the basic theory of reciprocal altruism would predict.

The best explanation I've heard so far on the thread is that it is because of reputation post-mortem affecting relatives. This requires a social context where the "sins of the father are visited on the son" (to quote Randaly's example).

[-]Rhwawn13y20

One potential confound is that the rewards may not scale right: the older you are, often the wealthier you are. A kindergartner might be thrilled to defect for $1, while an old person can barely be troubled to stoop for a $1 bill.

[-]buybuydandavis13y00

I was responding to the particular one factor argument claiming hat defection was the rational strategy, which wasn't correct even with that factor in isolation.

For your point, as your probability of dying increases, so too does your need for cooperation to avoid it. The closer you are to risk of dying, the more likely you will need help to avoid it, so the more you would want to encourage cooperation. Again, the argument that it is rational to defect does not hold from this factor alone either.

But it isn't that it's necessarily rational to cooperate either - it's just that the trade off from defection versus cooperation is an empirical matter of all the factors in the situation, and arguments from based on one factor alone aren't decisive, even when they are correct.

As for an experiment, it wouldn't show what it is rational to do, only what people in fact do. If you had lived a life of cooperation, encouraging others to cooperate, and denouncing those who don't, the consistency bias would make it less likely that you would change that behavior despite any mistakenly perceived benefit.

There would be a billion and one factors involved, not the least of which would be the particulars of the experiment chosen. Maybe you found in the lab, in your experiment, that age correlated with defection. It's quite a leap to generalize that to a propensity to defect in real life.

[-]Thomas13y50

For instance, we can tell when we are visiting a restaurant we will never visit again (on a trip abroad say), but are still likely to tip.

I remember an article in the SciAm, about 30 years ago. That waitresses on the highway restaurants are in fact tipped LESS.

[-]GeraldMonroe13y20

But less is different from never. A rational choice would be to never tip at a restaurant you never visit again.

The reason people still tip is because of evolved mechanisms that make them feel guilty for betraying the waitress even when rationally they will face no negative consequences for doing it.

And this mechanism, in turn, is hard wired in to encourage you to play fair even when higher areas of your brain determine their is no reason to in this situation.

[-]Vladimir_Nesov13y30

A rational choice would be to never tip at a restaurant you never visit again.

This is debatable. You prefer tips to be made in the (counterfactual) hypothetical where you work at that restaurant, so to the extent there is a priori uncertainty about whether you would be working at a restaurant vs. be a customer who never visits again, there is potentially an opportunity for increasing expected utility by transferring value between these hypotheticals.

[-]wedrifid13y40

This is debatable. You prefer tips to be made in the (counterfactual) hypothetical where you work at that restaurant, so to the extent there is a priori uncertainty about whether you would be working at a restaurant vs. be a customer who never visits again, there is potentially an opportunity for positive sum transfer of expected utility between these hypotheticals.

No, Gerald is correct. Given a known culture with known typical behavior of tipping by the other (human) customers and known consequences (or lack thereof) of not tipping after a single visit it is an error to use updateless considerations as an excuse to give away money. UDT does not cooperate with CooperateBot (or anonymous restaurant staff). If all the human customers and waiters with their cultural indoctrination were discarded and replaced with agents like itself then the question becomes somewhat more open.

[-]Vladimir_Nesov13y00

(I edited the last sentence for clarity since you've quoted it.)

My point was not that the situation is analogous to PD (the waiter doesn't play, it's a one player decision, not a two player game). It's the uncertainty about utility of waiter's profit that UDT considerations apply to. If you are the waiter, then you value waiter's profit, otherwise you don't (for the purposes of the thought experiment). In PD, you don't care about CooperateBot's winnings.

The analogy is with Counterfactual Mugging. The coin toss (a priori uncertainty) is whether you would become a customer or a waiter, the observation is that you are in fact a customer, and a relevant UDT consideration is that you should optimize expected utility across both of the hypotheticals, where in one of them you are a customer and in the other a waiter. By giving the tip, you subtract utility from your hypothetical where you are a customer, and transfer it to the other hypothetical where you are a waiter (that is, in the hypothetical where you are a customer, the utility becomes less if customers give tips; and in the hypothetical where you are a waiter, the utility becomes greater if customers give tips).

I don't know which direction is more valuable: for a waiter to tip the customer or conversely. It might be that the predictable effect is too small to matter. I certainly don't understand this situation enough to settle on a recommendation in either direction. My point is that the situation is more complex than it may seem, so a policy chosen in absence of these considerations shouldn't be seen as definitive.

it is an error to use updateless considerations as an excuse to give away money

It is an error to use excuses in general. It is enlightening to work on better understanding of what given considerations actually imply.

[-]wedrifid13y10

My point was not that the situation is analogous to PD (the waiter doesn't play, it's a one player decision, not a two player game).

Not true (that it is single player game), but this is tangential.

It's the uncertainty about utility of waiter's profit that UDT considerations apply to. If you are the waiter, then you value waiter's profit, otherwise you don't (for the purposes of the thought experiment). In PD, you don't care about CooperateBot's winnings.

My previous response applies. In particular that this consideration only applies after you discard key features of the problem---that is, you make all the other relevant participants in the game rational agents rather than humans with known cultural programming. In the actual problem you have no more reason to (act as if you) believe you are (or could be a priori) the waiter than to believe you are the cow that you are served or the fork you use to eat the slaughtered, barbecued cow.

It is enlightening to work on better understanding of what given considerations actually imply.

These considerations don't apply. This is just another example of the all too common use of "Oooh, Deep Timeless Updateless Reflective. Cooperate, morality, hugs!" when the actual situation would prompt a much more straightforward but less 'nice' solution.

[-]Vladimir_Nesov13y20

My point was not that the situation is analogous to PD (the waiter doesn't play, it's a one player decision, not a two player game).

Not true (that it is single player game), but this is tangential.

Well, it seems obvious to me that this is a one player game, so for me it's not tangential, it's very important for me to correct the error on this. As I see it, the only decision here is whether to tip, and this decision is made by the customer. Where is the other player, what is its action?

make all the other relevant participants in the game rational agents rather than humans with known cultural programming. In the actual problem you have no more reason to (act as if you) believe you are (or could be a priori) the waiter than to believe you are the cow that you are served or the fork you use to eat the slaughtered, barbecued cow.

Rationality of the other participants is only relevant to the choice of their actions, and no actions of the waiter are involved in this thought experiment (as far as I can see or stipulate in my interpretation). So indeed the waiter is analogous to a cow in this respect, as a cow's inability to make good decisions is equally irrelevant. It's value of personal prosperity that the hypotheticals compare. The distinction I'm drawing attention to is how you care about yourself vs. how you could counterfactually care about the waiter if you were the waiter (or a cow if you were the cow), not how you make decisions yourself vs. how the waiter (or a cow) makes decisions.

It is enlightening to work on better understanding of what given considerations actually imply.

These considerations don't apply.

That's exactly the question I'm considering. I'm not sure if they apply or not, or what they suggest if they do, I don't know how to think about this problem so as to see this clearly. You insist that they don't, but that doesn't help me if you don't help me understand how they don't.

One sense of "applying" for an idea is when you can make novel conclusions about a problem by making an analogy with the idea. Since I'm not making novel conclusions (any conclusions!), in this sense the idea indeed doesn't apply. What I am insisting on is that my state of knowledge doesn't justify certainty in the decision in question, and I'm skeptical of certainty in others being justified.

[-]Vladimir_Nesov13y00

This is just another example of the all too common use of "Oooh, Deep Timeless Updateless Reflective. Cooperate, morality, hugs!" when the actual situation would prompt a much more straightforward but less 'nice' solution.

(It may sound unlikely, but I'm almost certain I'm indifferent to conclusions on things like this in the sense that I'm mostly interested in what decision theory itself says, and much less in what I'd do in practice with that information. The trouble is that I don't understand decision theory well enough, and so going against emotional response is no more comforting than going with it.)

[-]Richard_Kennaway13y20

A second objection is that this reputational theory still doesn't cover end-of-life effects: why are we worried at all about our reputation when death is near? (Why do we even worry more about our reputation in such cases?)

Because we -- most people, at any rate -- care about what happens after our death. Given that fact that we do care, "why do we care?" is a question to be answered with an explanation for this fact, not a rhetorical question suggesting that we obviously should not.

[-]drnickbone13y20

The "why are we worried" question here is precisely calling for an explanation of the fact that we are worried about our post-mortem reputation. It is not denying that fact, nor arguing that we shouldn't be.

The best explanation I've seen so far is the one below from Randaly... we have evolved to care about post-mortem reputation, because of the possibility of vengeance against our family.

[-]Randaly13y20

A few, disorganized notes:

The difficulty with this explanation is that humans can (often) recognize when "this time is the last", and the computational cost of doing something different in that case is not great.

Biological altruism is not unique to humans; other animals, whose adaptations for altruism, are presumably much worse at conscious considerations of this kind. In addition, the common nature of one-shot PD-esque encounters, where you'll never see the other player again, is in many ways a unique byproduct of the modern world, meaning that there would be little selection pressure for an adaption that defects on the last round.

And we often change our behaviour radically when we know we are going to die soon, but this change rarely involves antisocial behaviour like stealing, mugging, running up huge debts we'll never have to pay back and so on.

Evolutionarily speaking, what matters is the genes; even after a person dies, their foes can retaliate against their kids.

But there is a major problem with this story: the "emotionally committed" ancestors could be out-competed in turn by bluffers. Anyone who could fake the emotional signals would be able to elicit the benefits of co-operation (they would successfully deter defection), but without having to follow through on the (costly) commitments in case the co-operation failed. Bluffing out-competes commitment.

This is a strictly theoretical response; empirically, it appears that people are willing to punish defectors, even at cost to themselves. (There are a couple of ways around the theoretical objection; one is that anybody who failed to punish would immediately become a target for defection in future rounds or by other people; another is that it's possible that the social group (or tribe or whatever) would join in the punishment, or even punish non-punishers, in order to prevent defection. I have no evidence whatsoever for either of these possible mechanisms, and do not strongly believe in either or them.)

But a more basic objection is "How did we ever get into a social environment where third party reputation matters like this?" Consider for instance a small society involving Anne, Bob, and Charles. Anne and Bob are engaging in an iterated prisoners' dilemma, and regularly co-operating. Bob and Charles meet in a one-shot prisoners' dilemma, and Bob defects. Anne sees this. How does it help Anne in this situation to start defecting against Bob? Generally it doesn't. A reputational system only helps if it identifies and isolates people who won't co-operate at all (the pure defectors). But Bob is not a pure defector, so why does he end up being penalized by Anne?

I feel like this abstracts too far away from reality. Nobody is actually a pure defector or pure cooperator; their decision to defect or not to defect against one person provides evidence to others about how likely/under what circumstances they will defect.

[-]drnickbone13y10

Biological altruism is not unique to humans; other animals, whose adaptations for altruism, are presumably much worse at conscious considerations of this kind. In addition, the common nature of one-shot PD-esque encounters, where you'll never see the other player again, is in many ways a unique byproduct of the modern world, meaning that there would be little selection pressure for an adaption that defects on the last round.

Note that I did discuss an argument like this (such that the variant which defects on the last round is more complicated, and the last round or single-round case happens rarely enough that it can't be selected for). But it strikes me as implausible, particularly for human beings. There's something "off" with the theory of reciprocal altruism if it relies on that defence against last round defection : the prediction would be that the cleverer the species, then the more likely they are to detect last-round cases, and the more likely reciprocal altruism is to collapse over time. So we would see less reciprocal altruism in cleverer species, but the opposite is true.

Incidentally, we could try to test the theory in vampire bats (also known to practice reciprocal altruism). Do bats stop reciprocating after a certain age (because death now too close)? Is there any selection pressure for that age to slowly creep forwards in a given population to younger and younger ages? If so, is there a counter-pressure which stops the creep forward, and what is it?

Evolutionarily speaking, what matters is the genes; even after a person dies, their foes can retaliate against their kids.

Yes, that's theoretically sound i.e. this could stop last-round defection ever entering the population. Again it could be tested in vampire bats and other reciprocating species.

It's kind of a grizzly thought though, isn't it? That reciprocal altruism is maintained by vengeance against a whole family... It suggests that cultures which taboo such vengeance will eventually collapse (from last round defection creeping forwards), but there's no evidence of that as far as I can see. Perhaps the taboo is too recent, and the effect hasn't set in yet.

[-]Randaly13y30

[first quote]

Yes, that's theoretically sound i.e. this could stop last-round defection ever entering the population. Again it could be tested in vampire bats and other reciprocating species.

Well, we know that a fair number of historical human legal systems allowed the sins of the father to be visited on their children:

In Japan, Samurai revenge killings could also kill the perpetrator's family.
In the Bible/Torah, while humans are banned from killing a son for the sins of his father, it's also stated that God will punish the next four generators for the sins of one person. (It's also suggestive that they needed to ban killing the son for the sins of the father.)
In Afghanistan, the cultural practice of using daughters as currency to settle disputes of their relatives is called baad; the girl is then typically abused. (I am assuming that, since they are being treated as property, they can be confiscated after death to settle debts just like any other property.)
In Hinduism the Laws of Manu (4-173) state that "If (the punishment falls) not on (the offender) himself, (it falls) on his sons, if not on the sons, (at least) on his grandson."
The Taoist Treatise on Response and Retribution states that "If at death an unexpiated offence be left, the evil luck will be transferred to children and grandchildren. Moreover, all those who wrongly seize others' property may have to compensate for it, with wives or children or other family members, the expiation to be proportionate up to a punishment by death."

(As an aside, thank you for your fantastically well-written and well-thought out comment.)

[-]orthonormal13y10

Firstly, "always defect" loses hard in evolutionary tournaments with fixed but large iteration numbers. See here for a great example; in general (and if populations could grow back from tiny proportions), what you see is a Paper-Rock-Scissors cycle between TFT, "TFT until the last turn, then defect", "TFT until the second-to-last turn, then defect", and so on (but not all the way back to the beginning; TFT trumps one of the early-defectors after only a few steps, where the number depends on the payoff matrix).

Secondly, as an intuition pump, an iterated tournament in which every round has a 1% chance of being the last is one in which it's hard to beat TFT. And capping it at 1000 rounds only changes the outcome in a minuscule number of cases, so there's no practical difference between TFT, "TFT until turn 1000, then defect", "TFT until round 999, then defect", and so on until we reach strategies that are dominated by TFT again.

Does that make sense?

[-]drnickbone13y00

Thanks for the link. One issue with these tournaments is that the strategies are submitted in advance, and then one "wins". Whereas, in evolutionary terms, you'd have new submissions after the "winner" dominates the population. There is nothing obvious to stop a population of TFT being invaded by TFT-1, which in turn is invaded by TFT-2, which is in turn invaded by TFT-3 and so on.

You argue that TFT-n, for some n is then invaded by TFT, so there is a "rock, scissors, paper" cycle, but how does that work? A solitary TFT will co-operate one more time than the surrounding population of TFT-n, and will meet defection in that final co-operation so it has a strictly lower fitness than TFT-n. So it can't invade.

Possible solutions to this are if a group of TFTs show up in a TFT-n population, and have most of their interactions with each other, or at least enough interactions with each other to outweigh the lower utilty against TFT-n. That is in effect a Group Selection argument (which I discussed in my original article, part 4), and I agree it could work, but I'm a bit concerned about relying on such arguments. The standard treatment of an ESS assumes that the "invader" has almost all of its interactions against the existing strategy.

On the "intuition pump", I noticed this case in my reply to Unnamed and Randaly earlier, and in my original article, part 1:

This is most plausibly the case where there is a very large upper bound on iterations (such as 100 years), but the upper bound is so rarely (if ever) reached in practice, that strategies which do something different in the final phase just don't have a selective advantage compared to the cost of the additional complexity. So the replacement of TFT by TFT-1 never happens.

My concern was that we do, in fact, find cases where we know we are in the final round (or the only round) and our behaviour is, in fact, a bit different in such cases (we co-operate less, or have something like a 50% chance of co-operating vs 50% of defecting). But we don't always defect in that final round. This is an interesting fact that needs explanation.

By the way, other commenters have argued these "known last round" or "known single round" cases are an artefact of current conditions, and wouldn't have occurred in ancestral conditions, which strikes me as an ad hoc response. It's not hard to see such interactions happening in an ancestral context too, such as one-off trades with a nomadic clan, passing through. We probably would trade, rather than just steal from the nomads (and risk them staying and fighting, which is strictly irrational from their point of view, but rather likely to happen). Or consider finding a tribal colleague alone in a desert with a very-lethal-looking wound (missing legs, blood everywhere) crying in pain and asking for some water. Very safe to walk away in that case, since very high chance that no-one would ever know. But we wouldn't do it.

[-]orthonormal13y10

You argue that TFT-n, for some n is then invaded by TFT, so there is a "rock, scissors, paper" cycle, but how does that work? A solitary TFT will co-operate one more time than the surrounding population of TFT-n, and will meet defection in that final co-operation so it has a strictly lower fitness than TFT-n. So it can't invade.

Oops, you're right- there's a minimum "foothold" proportion (depending on the payoffs and on n) that's required. But if foothold-sized cliques of various TFT-n agents are periodically added (i.e. random mutations), then you get that cycle again—and in the right part of the cycle, it is individually beneficial to be TFT rather than TFT-n, since TFT-n never gets cooperation on any of the last (n-1) turns.

On your other point, it's worth noting that organisms are adaptation-executors, not fitness-maximizers; it seems easier to evolve generally altruistic values (combined with a memory of past defectors to avoid getting exploited by them or by similar agents again) than to evolve a full calculator for when defection would be truly without cost.

[-]drnickbone13y00

This "clique" solution has some problems. First, a single mutant can't form a clique. OK, but maybe the mutant is interacting with nearby individuals, some of whom also share the mutation? That works if the nearby partners are relatives, but the difficulty there is that kin selection would already be favouring co-operation with neighbours, so how does TFT get an advantage? You can juggle with the pay-offs and the "shadow of the future" probability to try and get this to work (i.e. find a set of parameters where co-operation with neighbours via kin selection is not favoured, whereas TFT is), but it all looks a bit shaky.

Andreas Griger below suggests that the TFT mutants preferentially interact with each other rather than the TFT-n (or DefectBots) around them. This is another solution, though it adds to the overhead/complexity of a successful invader. However, it does lead to a nice testable prediction: species which practice reciprocation with non-relatives will also practice partner selection.

The point about adaptation executors not being fitness maximizers was also brought up by Unnamed below though see my response. The general issue is that citing the link is not an all-purpose excuse for maladaptation (what Richard Dawkins once referred to as the "evolution has bungled again" explanation). In particular you might want to see a paper by Fehr and Henrich Is Strong Reciprocity a maladaptation? which looks at the maladaptation hypothesis in detail and shows that it just doesn't fit the evidence. Definitely worth a read if you have time.

[-]Unnamed13y00

By the way, other commenters have argued these "known last round" or "known single round" cases are an artefact of current conditions, and wouldn't have occurred in ancestral conditions, which strikes me as an ad hoc response. It's not hard to see such interactions happening in an ancestral context too, such as one-off trades with a nomadic clan, passing through. We probably would trade, rather than just steal from the nomads (and risk them staying and fighting, which is strictly irrational from their point of view, but rather likely to happen). Or consider finding a tribal colleague alone in a desert with a very-lethal-looking wound (missing legs, blood everywhere) crying in pain and asking for some water. Very safe to walk away in that case, since very high chance that no-one would ever know. But we wouldn't do it.

The point isn't that one-shot PDs never arose, or that it was always rational to cooperate. The point is that one-shot PDs were rare, compared to other interactions, and it was often rational to cooperate (because our ancestors evolved adaptations and developed a social structure which turned potential one-shot PDs into something else). And since the mechanisms that influence human behavior (natural selection, emotions, heuristics, reinforcement learning, etc.) don't have perfect fine-grained control which allows them to optimize every single individual decision, we often wind up with cooperation even in the cases that are true one-shot PDs.

[-]GeraldMonroe13y10

The answer to this problem is elementary. First of all, very rarely are true prisoner's dilemma's going to happen in the real world. Apparently, during the time periods when humans evolved, "one-shot" interactions happened less often than interactions where the victim or his or her relatives got a chance to retaliate for the betrayal.

Also, keep in mind that ultimately the human beings are simply physical manifestations of the information in their genome - you and I are ultimately robots that attempt to copy our information to the next generation, and have no other purpose. So the back and forth of the prisoner's dilemma does not take place over a single life span - it should be thought of as a continuous series of interactions that have been occurring since the first primates. The only reason why betrayals ever happen at all can be explained by imperfections in the process of passing on information about past interactions.

Furthermore, it could be plausibly argued that defections are harmful to the species itself, and individuals who betray or cheat in a given scenario would face punishment by other members of the same tribe.

This quite simply and rationally explains why humans have innate psychological needs to get revenge, and can attempt methods of revenge that exceed the original insult. These powerful urges to get revenge could be viewed as genetic code to discourage defectors and thus make defectors less likely to breed in the future.

Obviously, sexual betrayals are among the strongest forms of betrayal, and this would explain why homicides over infidelity take place. If you think about it rationally, killing another member of your tribe for acts of sexual intercourse than have a probability less than 1 of even resulting in a out of pair pregnancy doesn't make sense. A fertile adult is worth many children worth of resources due to mortality to a tribe. But if the murder reduces the alleles of the betray-er in the population, thus reducing the prevalence of the actual genes making betrayals more likely, then the population as a whole benefits.

This also explains why in some cases victims of a betrayal will attempt revenge at the cost of their own life. If they are able to kill the betray-er and thus reduce the frequency of the betray-er's alleles in the population, and the majority of the population has alleles more like the victim, then this trade-off makes rational sense.

[-]drnickbone13y20

This sounds like a Group-selection argument (revenge is for the benefit of the tribe, or even for the benefit of the species; it doesn't actually help the avenger). See my point 4.

That's different from the standard argument, that having a vengeful nature is of benefit to the individual concerned because others can predict he is likely to take vengeance, so don't cross him in the first place.

[-]wuthefwasthat13y10

Well, for one thing, we don't know how many round there are, ahead of time.

[-]asparisi13y00

I think group selection may have something to recommend to it here.

Let's say that your odds of reproduction go up 2% if you Defect and the other person's go down by 1%, and that the other person's odds of reproduction go up 1% if you cooperate. This creates a pretty standard PD scenario: If you both Cooperate, you both get 1% added to your chance to pass on your genes. If you Cooperate and they Defect, they get a 3% bonus and you take a 1% risk. This reverses if you Defect and they Cooperate. If you both Defect, both of your odds of reproduction go up 1%.

Defectbots quickly take over CooperateBots in this case, but don't beat out TFT. But it still doesn't explain why TFT wouldn't just Defect against TFT.

I can think of a few reasons why it might not, in the case of humans.

For one, whatever algorithm is running might not model it as a finite scenario. This isnt accurate, but it might be beneficial anyway (space isn't flat, but we model it as flat anyway because it was more efficient for hunting gazelle) so the inaccuracy might have helped groups implementing a TFT-like algorithm.

For another (as GeraldMonroe points out) most scenarios aren't PD scenarios. And it's unlikely that we have one part of our brain to help just with PD and another for all other interactions. That'd be more expensive, as far as brains go. So we probably just use the one sort of reasoning, even when it isn't appropriate.

Third, sometimes we do act in ways that TFT would predict. Waitresses being tipped less if they work on highways might not be consistent with TFT (since they shouldn't get tipped at all) but it also isn't consistent with saying that people cooperate the same amount. (since then we wouldn't see a difference) If I had to guess, I'd say that we have some alogrithm that makes an estimate on how likely the scenario is to affect us, probably weighing the counterfactual (we would want to get tipped if we were the waitress) vs. TFT. (we don't expect to benefit from tipping)

[-]prase13y00

Emotional commitments change the pay-offs

Emotional pay-offs aren't the ones which evolution cares about. Emotional commitments are an evolved mechanism which makes us cooperate more, but it wouldn't have evolved if more cooperation wasn't advantageous in the first place.

[-]drnickbone13y00

There are two senses in which commitments could be said to change the payoffs.

Suppose Anne has an emotional commitment mechanism (the full range of gratitude, loyalty, anger, vengeance and so on). Then the subjective utility cost to Anne of defecting against Bob (who is co-operating) is high: it really feels bad. This is the sort of payoff that humans care about, but evolution does not.

But the fact that Anne has this commitment mechanism also changes the objective payoffs to Bob, namely the likelihood that he survives by co-operating or defecting, or the expected number of his offspring and other relatives; basically the Darwinian utility function for Bob (inclusive fitness). This is the part that matters for evolution (at least biological evolution), and is the sense in which the game has shifted so it is no longer a true Prisoner's Dilemma for Bob.

[-]prase13y00

Yes, but does mentioning emotional commitment in the latter sense really help to answer the question of why (apparently) non-Nash strategies have evolved? There is no practical difference for Bob whether Anne plays TFT because her emotional commitment or because a pure game-theoretical calculation. In the last turn Bob should defect. Or put another way: how did emotional commitment first arise?

[-]Kindly13y00

Emotional commitment arises because the 100% foolproof (but, unfortunately, difficult) way to win a Prisoner's Dilemma is to credibly pre-commit to a strategy. Anne's (why not Alice's?) emotional reasons to play TFT are in effect a way to pre-commit to playing TFT; the most Bob can do is defect on the last turn.

If, on the other hand, Anne simply plays TFT because she thinks it's the smart thing to do, then the defect-on-the-last-X-turns strategy can escalate and result in everyone defecting. For that matter, Bob could try something like "If you cooperate when I defect, I'll sometimes cooperate... maybe" and test Anne's stubbornness.

[-]Andreas_Giger13y-20

Since prisoners' dilemmas are always finite in practice, and always have been (we are mortal, and the Sun will blow up at some point), this raises the question of why we actually co-operate in practice. Why is TFT, or something very like it, still around?

In general, Nash equlibria in non-zero-sum games don't mean much if the other player isn't a rational game theoretician. If you can for whatever reasons expect the other player to cooperate the first round, defecting is obviously a mistake.

Also, since you're discussing semi-evolutionary adaption of strategies (like TFT-2D replacing TFT-1D), keep in mind that TFT-nD with sufficiently high n will lose against pure TFT with sufficiently high population. If you're interested in how selective IPD works, you might want to skim over this post.

[-]drnickbone13y00

I've read the link, and my response is similar to the one to orthonormal further up the thread. I'm struggling to understand how TFT can invade a population of TFT-nD, because it will not satisfy what the link calls the "survival rule":

Survival Rule: For A in this scenario not to go extinct regardless of initial population, it must score at least equally high against X as X does against itself, and if it doesn't score higher, it must score at least equally high against itself as X does against itself while not losing direct encounters.

In the particular case, A is TFT and X is TFT-nD. But TFT has strictly lower fitness against TFT-nD than TFT-nD has against itself, so it won't pass the survival rule. A rare mutant practising TFT in a population of TFT-nD will go extinct.

I can't see any way round this except some sort of Group selection argument, whereby the TFT mutants cluster and interact mostly with each other until they meet the Threshold rule, at which point they can go on to dominate the whole population:

Threshold rule: If A fulfills the conditions for dominance but not the conditions for survival (i.e. it scores less against X than X does against itself), it will need a certain threshold to avoid extinction and achieve dominance.

Such a Group selection approach could work, but it seems frankly dubious. While it could happen once or twice (e.g. enough to get TFT going in the first place), it looks like a real stretch to claim it hapens repeatedly, so ensuring a rock-scissors-paper cycle is preserved between TFT and various flavours of TFT-nD.

Am I just missing something really obvious here? Hardly anyone else on the thread seems to recognise this as a problem.

[-]Andreas_Giger13y-10

You are correct in saying that TFT cannot strictly evolve (starting with 0 population) from TFT-nD in that case. However, for increasing n and number of rounds the extinction threshold becomes sufficiently small. Maybe more importantly, unlike in the type of tournaments discussed in the link, real people can identify other people outside the game. There's nothing to stop a player from playing different strategies depending on which group of people the other player belongs to. If we assume a prehistoric tribe where people defect a lot, just two players cooperating with each other for a little longer than the others have found a winning strategy already. In addition to that, it also becomes attractive for other players to join their little clique.

Basically, this leads us to a form of meta-TFT—"If you defect first in interpersonal interaction, you are not in our little group of cooperators anymore." So in a tribe of people who know each other and need to interact with certain other people within that tribe, cooperation is the winning strategy for everyone. Inter-tribe competition amplifies this.

The larger the tribe and the easier it is to leave the tribe, the smaller the benefits of cooperation become. But even in modern society most people are part of such tribe-like groups where they are forced or have sufficient motivation to interact with certain other people—family, kindergarten, school, workplace, clubs, whatever. People learn to achieve their goals within these groups from a very young age by forming cliques, so TFT-like strategies are naturally adopted.

[-]drnickbone13y10

Thanks for acknowledging that there is an issue here, and something worth explaining. Upvoted.

Your suggested explanation for how TFT invades TFT-nD requires something more than just TFT here. As well as a cluster entry (at least two initially, not just one mutant), it also requires an ability to select preferred partners (i.e. the two co-operators preferentially select each other), and a reputational system (to help decide which partners to select).

This raises a question which could be tested: do all species that engage in reciprocal altruism have those additional features i.e. preferred partners and reputation? Do vampire bats? (It seems quite an overhead for a bat, doesn't it?) Can TFT plausibly invade with fewer features?

Another concern would be how TFT enters a population of DefectBots in the first place... It would require a major (and implausible) mutation to introduce all these features at once. Even TFT by itself (without the extra features) is significantly more complicated than a DefectBot, which raises an origin question : what series of mutations can put TFT together starting from a DefectBot, and how are the intermediates favoured? Does machinery for kin selection need to evolve first and then get re-purposed? (This leads to another prediction, that reciprocating species also have to practice kin selection, or at least have ancestors which did).

[-]Andreas_Giger13y-20

I disagree with you that defecting is the default action for animals in state/herd/pack/tribe-like communities. Unless you want to discuss how these kind of communities could form in the first place, it seems to me that the question how TFT can prohibit TFT-nD from invading is much more relevant than how TFT can invade TFT-nD. And that is ultimately the point—as I've explained above, for a tribe-forming species TFT (or meta-TFT) is an evolutionary stable strategy.

[-]drnickbone13y00

I disagree with you that defecting is the default action for animals in state/herd/pack/tribe-like communities. Unless you want to discuss how these kind of communities could form in the first place,

Are you claiming here that all herd or pack species are practising TFT, or that it is the default for herd/pack species? That seems empirically dubious: my understanding was that herd or pack species are mainly held together either by kin selection (the pack consists of close relatives) or by simple mutualism (e.g. being in the herd protects against predation, and it would be suicide to leave) rather than by something as sophisticated as TFT. It's a while since I looked at the literature, but species practising reciprocal altruism with non-relatives seem to be fairly rare. But if you can cite studies, that would be helpful.

[-]wedrifid13y-10

If you can for whatever reasons expect the other player to cooperate the first round, defecting is obviously a mistake.

No it isn't. All else being equal knowing that the other player will cooperate on the first round (independently of what you do on the first round) is a reason to defect. It is the expectations of conditional cooperations on later rounds that make cooperation seem wise.

[-]Andreas_Giger13y-20

Of course this is a matter of conditional cooperation, this is fixed-length IPD after all. I don't see your point?

[-]wedrifid13y00

Of course this is a matter of conditional cooperation, this is fixed-length IPD after all. I don't see your point?

Your claim, quoted in the grandparent, is false. You should have instead claimed something that is true. This would seem to be the implied point when correcting errors.

[-]Andreas_Giger13y-20

If you don't have anything useful to post, maybe better not post anything?

It is the expectations of conditional cooperations on later rounds that make cooperation seem wise.

No, it is the possibility of conditional cooperations on later rounds that make cooperation seem wise. You don't "expect" anything the first round. Your claim is false; you should have instead claimed something that is true.

All else being equal knowing that the other player will cooperate on the first round (independently of what you do on the first round) is a reason to defect.

If you have no knowledge about your opponent except for his first move, then him cooperating is no reason for you to defect. If anything, you might defect regardless of your opponent's first move, but defecting because of cooperation is irrational and insane. Your claim is false, etc pp.

Excuse me for calling into question your ability to comprehend IPD, but since you were the one to submit DefectBot to the tournament that seems justified to me at this point.

[-]wedrifid13y00

Your original claim:

If you can for whatever reasons expect the other player to cooperate the first round, defecting is obviously a mistake.

...while clearly an error, was at least a comparatively minor one. I, and probably most readers, assumed that you knew the basics but were just slightly lax in your wording. I expected you to simply revise it to, for example, something along the lines of:

If you can for whatever reasons expect the other player to act similarly to TFT-nD with sufficiently small n, defecting is obviously a mistake.

You instead chose to defend the original, overly generalized and unqualified position using a series of non-sequitur status challenges. That wasn't a good decision.

Excuse me for calling into question your ability to comprehend IPD, but since you were the one to submit DefectBot to the tournament that seems justified to me at this point.

As I said, I originally had the impression that you comprehended the basics but were being careless. I now agree that at least one of us is fundamentally confused.

If you have no knowledge about your opponent except for his first move, then him cooperating is no reason for you to defect. If anything, you might defect regardless of your opponent's first move, but defecting because of cooperation is irrational and insane.

No. Knowledge that the other will cooperate on the first round regardless of what you do does, in fact, eliminate one of the strongest reasons for cooperating. In particular, that you believe they are able to make predictions about you and act accordingly---ie. that you expect them to have the same prediction capability that you have ascribed to yourself.

[-]Andreas_Giger13y00

I've read your post a few times, but it still seems like you're saying that the possibility of your opponent being an omniscient maximiser is your main reason for cooperating. So if you knew you were to play against Omega-Clippy, who can predict all your possible actions right from the start, because of that you would play some kind of TFT? Did I get that right?

Moderation Log

More from drnickbone

Curated and popular this week

57Comments

This discussion article was provoked in part by Yvain's post on Main a few weeks ago, and some of the follow-up comments.

EDIT: I've also just noticed that there was a recent sequence rerun on the point about finite iterations. My bad: I simply didn't see the rerun article, as it had already slipped down a couple of pages when I posted. If you down-voted (or didn't read) out of a feeling of "Didn't we just do this?" then sorry.

In any case, one of my main motivations for running this article was point 5 (Does an environment of commitment and reputation create the background against which TDT - or something like it - can easily evolve?) I didn't get any responses on that point, so might try to run it again in a future article.

END EDIT

It is well-known that in a one-shot prisoner's dilemma, the only stable solution (Nash equilibrium) is for both parties to defect. But, perhaps less well-known, this is true for any finite-shot version of the dilemma, or any version where there is a finite upper bound on the number of iterations. For instance, a more sophisticated strategy than Tit For Tat (TFT) would determine when it has reached the last iteration, and then defect. Call this TFT-1. But then once TFT-1 has established itself, a strategy which detects and defects the last two iterations (TFT-2) would establish itself, and so on.

Since prisoners' dilemmas are always finite in practice, and always have been (we are mortal, and the Sun will blow up at some point), this raises the question of why we actually co-operate in practice. Why is TFT, or something very like it, still around?

Somehow, evolution (biological, cultural or both) has managed to engineer into us a strategy which is not a Nash equilibrium. Because any "evolutionarily stable strategy" (as usually defined) is a Nash equilibrium, somehow we have evolved a strategy which is not strictly evolutionarily stable. How could that have happened?

I can think of a few possibilities, and have a view about which of these are more realistic. I'm also wondering if other Less Wrong contributors have seriously thought through the problem, and have alternative suggestions.

1. Strategies like TFT succeed because they are very simple, and the alternatives are too complicated to replace them.

The argument here is that there are big costs to a strategy in "hardware" or "software" complexity, so that a crude strategy will out-compete a more sophisticated strategy. In particular TFT-1 is more complex than TFT and the additional computational costs outweigh the benefits. This is most plausibly the case where there is a very large upper bound on iterations (such as 100 years), but the upper bound is so rarely (if ever) reached in practice, that strategies which do something different in the final phase just don't have a selective advantage compared to the cost of the additional complexity. So the replacement of TFT by TFT-1 never happens.

The difficulty with this explanation is that humans can (often) recognize when "this time is the last", and the computational cost of doing something different in that case is not great. Yet we either don't change, or we change in ways that TFT-1 would not predict. For instance, we can tell when we are visiting a restaurant we will never visit again (on a trip abroad say), but are still likely to tip. Also, it is striking that people co-operate about 50% of the time in known one-shot prisoners' dilemmas and similar games (see this analysis of Split or Steal?). Why 50%, rather than nearly 0%, or nearly 100%? And we often change our behaviour radically when we know we are going to die soon, but this change rarely involves antisocial behaviour like stealing, mugging, running up huge debts we'll never have to pay back and so on.

So I'm not convinced by this "alternatives are too complicated" explanation.

2. Emotional commitments change the pay-offs

Victims of defection don't take it lying down. They react angrily, and vengefully. Even if there are no obvious opportunities for future co-operation, and even where it involves further cost, victims will go out of their way to attempt to hurt the defector. On the nicer side, emotions of friendliness, indebtedness, duty, loyalty, admiration or love can cause us to go out of our way to reward co-operators, again even if there are no obvious opportunities for future co-operation.

Given these features of human nature as a background, the pay-offs change in a one-shot or finite-bound prisoner's dilemma, and may convert it to a non-dilemma. The pay-off for co-operating becomes greater than the pay-off for defection. This "solves" the problem of why we co-operate in a PD by denying it - effectively there wasn't a true Prisoner's Dilemma in the first place.

There are a number of difficulties with this "solution", one being that even allowing for emotional reactions, there are some true PDs and we can usually recognize them. Scenarios such as the foreign restaurant, where we know we will not be pursued across the world by a vengeful waiter demanding pay-back for a missing tip. So why don't we always defect in such cases? Why is there a voice of conscience telling us not to? Perhaps this objection could be solved by the "too complicated" response. For example, a strategy which could reliably detect when it is safe to defect (no vengeful payback) would in principle work, but it is likely to have a large complexity overhead. And a strategy which almost works (sometimes thinks it can "get away with it" but actually can't) may have a big negative payoff, so there is no smooth evolutionary pathway towards the "perfect" strategy.

A further difficulty is to explain why humans react in this convenient pay-off-shifting fashion anyway. On one level, it is obvious: we are committed to doing so by strong emotions. Even when we suspect that emotions of vengeance and duty are "irrational" (all pain to us from now on, no gain) we can't help ourselves. Yet, it is this emotional commitment that increases the likelihood that others co-operate with us in the first place. So we can tell a plausible-sounding story about how ancestors with emotional commitments induced more co-operation from their fellows than those without, and hence the "irrationally emotional" ancestors out-competed the "coldly rational" non-ancestors.

But there is a major problem with this story: the "emotionally committed" ancestors could be out-competed in turn by bluffers. Anyone who could fake the emotional signals would be able to elicit the benefits of co-operation (they would successfully deter defection), but without having to follow through on the (costly) commitments in case the co-operation failed. Bluffing out-competes commitment.

Ahh, but if the bluff has been called, and the threatened vengeance (or promised loyalty) doesn't materialise, won't this lead to more defection? So won't people who genuinely follow-through on their commitments succeed at the expense of the bluffers? The answer is yes, but again only in the case of iterated interactions, and only in a potentially infinite scenario. The problem of the finite bound returns: it is always better to "bluff" a commitment on the very last interaction. And once bluffing on the last turn has been established, it is better to bluff on the next-to-last. And so on, leading to bluffing on all turns. And then there is no advantage in believing the bluffs, so no deterrent effect, and (in the final equilibrium), no advantage in making the bluffs either. The only true equilibrium has no commitment, no deterrence and no co-operation.

Again, we can try to rescue the "commitment" theory by recourse to the "too complicated" theory. Quite possibly, alternatives to true commitment are very costly in hardware or software: it is just too hard to bluff convincingly and successfully. That might be true, but on the other hand, there are plenty of poker players and con artists who would say differently.

3. Social pressures and reputational effects change the pay-offs

Human decisions to co-operate or defect are very rarely made in isolation, and this could help explain why we co-operate even though we know (or can predict) "this time is the last". We won't benefit from defection if we simultaneously gain reputations as defectors.

As in explanation 2, the effect of this social pressure is to change the pay-off matrix. Although there may appear to be a benefit from one-shot/last-shot defecting, in a social context where our actions are known (and defections by us will lead to defections by third parties against us), then there is a greater pay-off from co-operating rather than defecting.

Once again this "solves" the problem of why we co-operate in PDs by denying it. Once again it faces the objection that there are true PDs (involving secret defection) and we can recognize them, but often don't defect in them. Again, perhaps this objection could be met by the "too complicated" response; it is just too hard to tell when the defection is really secret.

A second objection is that this reputational theory still doesn't cover end-of-life effects: why are we worried at all about our reputation when death is near? (Why do we even worry more about our reputation in such cases?)

But a more basic objection is "How did we ever get into a social environment where third party reputation matters like this?" Consider for instance a small society involving Anne, Bob, and Charles. Anne and Bob are engaging in an iterated prisoners' dilemma, and regularly co-operating. Bob and Charles meet in a one-shot prisoners' dilemma, and Bob defects. Anne sees this. How does it help Anne in this situation to start defecting against Bob? Generally it doesn't. A reputational system only helps if it identifies and isolates people who won't co-operate at all (the pure defectors). But Bob is not a pure defector, so why does he end up being penalized by Anne?

Perhaps the relevant model is where Anne hasn't interacted with Bob yet at all, but there is a new opportunity for iterated co-operation coming up. By seeing Bob defect against Charles, Anne gets evidence that Bob is a defector rather than a co-operator, so she won't even start to co-operate with him. If Anne could discriminate a bit more clearly, she would see that Bob is not a pure defector, but she can't. And this is enough to penalize Bob for defecting against Charles. Possibly that works, but I'm doubtful if these "new opportunity to co-operate," cases occur often enough in practice to really penalize one-shot defection (which is observed at exactly the time needed to spoil the opportunity). Or more to the point, did they occur often enough in human history and pre-history to matter?

But suppose for the moment that we have an explanation for how the reputational system arises and persists. Then the reputational effect will apply to commitments as well: individuals won't benefit if they are identified as bluffers, so truly committed individuals (with strong emotions) benefit over those who are known to fake emotions, or to "coldly" override their emotions. So a reputational explanation for co-operation can strengthen a commitment explanation for co-operation. Or in the other direction, any emotional commitments (to principles of justice, disgust at exploitation etc.) can reinforce the reputational system. So it seems we have two somewhat dubious mechanisms which could nevertheless reinforce each other and build to a strong mechanism. Perhaps.

4. Group selection

There have been different societies / social groups through history. Perhaps some have had reputational systems which successfully converted Prisoners' Dilemmas into non-Prisoners' Dilemmas, while others haven't, and their members were left with lots of true PDs (and lots of defection). The societies which avoided true PDs experienced less defection, and out-competed the others.

This has a ring of plausibility about it, but suffers from many of the same general problems as any Group-selection theory. Human groups aren't isolated from each other like separate organisms, and don't reproduce like organisms: they exchange members too often.

Still, this solution might address one of the main theoretical objections to Group selection, that "co-operating" groups are unstable to defection (either arising from internal changes, or brought in by new members), and the defection will spread through the group faster than the group can out-reproduce rival groups. Groups with the right reputational systems are - by hypothesis - stable against defection. So it might work.

Or perhaps reputational systems aren't quite stable against defection - they eventually collapse because of secret defections, "last time" defections which can't be punished by other members, laziness of other members in enforcing the co-operation, false reputations and so on. This slow erosion eventually kills the group, but not before it has established child groups of some sort. Again perhaps this might work.

5. Prediction and Omegas : from TFT to TDT

One striking feature about both the commitment explanation (2) and the reputational explanation (3) is how they reward successful prediction of human behaviour. This is obvious for commitments: it is the predictable emotional commitment that creates the deterrent against defection (or the lure towards co-operation). And being able to predict who is really vengeful and loyal, and who is just bluffing, gives individuals a further strong advantage.

But this extends to the reputational system too. Suppose Bob defects against Charles, while Charles co-operates. Anne sees this and is disgusted (how could Bob exploit poor Charles like that?). Yet suppose Charles defects as well. Then Anne admires Bob for his prudence (rather than being taken for a ride by that evil Charles). So Bob gets the reputational pay-off precisely when he can successfully predict how Charles will behave, and do the same. If the reputational pay-off is high, then there is a strong pressure towards a "mirror" strategy (try to predict whether the other person will co-operate or defect and then do likewise).

This is rather interesting, since it is starting to sound like Newcomb's problem, where we have a (hypothetical) predictor who can't be outwitted. Why is that a believable story at all? Why don't we just stare in bemusement at the very idea? Well, suppose we model "co-operation" as the human player taking one box, which Omega fills with $1 million, versus "defection" as the human player taking both boxes (and Omega not filling the opaque one). Or suppose we treat a resolution to take one box as a "commitment" and an after-the-fact decision to take two boxes (because it no longer makes a difference) as bluffing on a commitment. And of course the rationale for a human player to "co-operate" or to truly "commit" is Omega's reputation for always predicting correctly!

So, here is a story about how "Timeless Decision Theory" (or something like it) could emerge from "Tit for Tat". A combination of commitment effects (2) and reputational effects (3) leads to an environment where successful prediction of human behaviour is rewarded. Such an environment is - possibly - maintained by group selection (4).

People get rather good at prediction. When meeting a successful predictor who will co-operate if you co-operate, and defect if you defect, it is better to co-operate. When the successful predictor will defect if he suspects you are bluffing on a commitment, it is better to have a true commitment. But it is still not obvious what to do on a one-shot prisoner's dilemma, because you don't know how the other party's prediction will go, and don't know what will enhance your own reputation (so sometimes people co-operate, sometimes defect).

All this favours a style of reasoning rather like TDT. But it can also favour a rather "superstitious" approach to justifying the reasoning, since there is no causal connection between our action and the prediction. Instead we get weird pseudo-causal explanations/justifications like gods who are always watching, ancestral spirits who can be angered, bad karma, what goes around comes around etc. and a general suspicion of those who don't go along with the local superstition (since they can't be predicted to co-operate with those who do).

Does this sound familiar?