You're looking at Less Wrong's discussion board. This includes all posts, including those that haven't been promoted to the front page yet. For more information, see About Less Wrong.

Iterated Gambles and Expected Utility Theory

1 Sable 25 May 2016 09:29PM

The Setup

I'm about a third of the way through Stanovich's Decision Making and Rationality in the Modern World.  Basically, I've gotten through some of the more basic axioms of decision theory (Dominance, Transitivity, etc).

 

As I went through the material, I noted that there were a lot of these:

Decision 5. Which of the following options do you prefer (choose one)?

A. A sure gain of $240

B. 25% chance to gain $1,000 and 75% chance to gain nothing

 

The text goes on to show how most people tend to make irrational choices when confronted with decisions like this; most strikingly was how often irrelevant contexts and framing effected people's decisions.

 

But I understand the decision theory bit; my question is a little more complicated.

 

When I was choosing these options myself, I did what I've been taught by the rationalist community to do in situations where I am given nice, concrete numbers: I shut up and I multiplied, and at each decision choose the option with the highest expected utility.

 

Granted, I equated dollars to utility, which Stanovich does mention that humans don't do well (see Prospect Theory).

 

 

The Problem

In the above decision, option B clearly has the higher expected utility, so I chose it.  But there was still a nagging doubt in my mind, some part of me that thought, if I was really given this option, in real life, I'd choose A.

 

So I asked myself: why would I choose A?  Is this an emotion that isn't well-calibrated?  Am I being risk-averse for gains but risk-taking for losses?

 

What exactly is going on?

 

And then I remembered the Prisoner's Dilemma.

 

 

A Tangent That Led Me to an Idea

Now, I'll assume that anyone reading this has a basic understanding of the concept, so I'll get straight to the point.

 

In classical decision theory, the choice to defect (rat the other guy out) is strictly superior to the choice to cooperate (keep your mouth shut).  No matter what your partner in crime does, you get a better deal if you defect.

 

Now, I haven't studied the higher branches of decision theory yet (I have a feeling that Eliezer, for example, would find a way to cooperate and make his partner in crime cooperate as well; after all, rationalists should win.)

 

Where I've seen the Prisoner's Dilemma resolved is, oddly enough, in Dawkin's The Selfish Gene, which is where I was first introduced to the idea of an Iterated Prisoner's Dilemma.

 

The interesting idea here is that, if you know you'll be in the Prisoner's Dilemma with the same person multiple times, certain kinds of strategies become available that weren't possible in a single instance of the Dilemma.  Partners in crime can be punished for defecting by future defections on your own behalf.

 

The key idea here is that I might have a different response to the gamble if I knew I could take it again.

 

The Math

Let's put on our probability hats and actually crunch the numbers:

Format -  Probability: $Amount of Money | Probability: $Amount of Money

Assuming one picks A over and over again, or B over and over again.

Iteration A--------------------------------------------------------------------------------------------B

1 $240-----------------------------------------------------------------------------------------1/4: $1,000 | 3/4: $0

2 $480----------------------------------------------------------------------1/16: $2,000 | 6/16: $1,000 | 9/16: $0

3 $720---------------------------------------------------1/64: $3,000 | 9/64: $2,000 | 27/64: $1,000 | 27/64: $0

4 $960------------------------1/256: $4,000 | 12/256: $3,000 | 54/256: $2,000 | 108/256: $1,000 | 81/256: $0

5 $1,200----1/1024: $5,000 | 15/1024: $4,000 | 90/256: $3,000 | 270/1024: $2,000 | 405/1024: $1,000 | 243/1024: $0

And so on. (If I've ma de a mistake, please let me know.)

 

The Analysis

It is certainly true that, in terms of expected money, option B outperforms option A no matter how many times one takes the gamble, but instead, let's think in terms of anticipated experience - what we actually expect to happen should we take each bet.

 

The first time we take option B, we note that there is a 75% chance that we walk away disappointed.  That is, if one person chooses option A, and four people choose option B, on average three out of those four people will underperform the person who chose option A.  And it probably won't come as much consolation to the three losers that the winner won significantly bigger than the person who chose A.

 

And since nothing unusual ever happens, we should think that, on average, having taken option B, we'd wind up underperforming option A.

 

Now let's look at further iterations.  In the second iteration, we're more likely than not to have nothing having taken option B twice than we are to have anything.

 

In the third iteration, there's about a 57.8% chance that we'll have outperformed the person who chose option A the whole time, and a 42.2% chance that we'll have nothing.

 

In the fourth iteration, there's a 73.8% chance that we'll have matched or done worse than the person who has chose option A four times (I'm rounding a bit, $1,000 isn't that much better than $960).

 

In the fifth iteration, the above percentage drops to 63.3%.

 

Now, without doing a longer analysis, I can tell that option B will eventually win.  That was obvious from the beginning.

 

But there's still a better than even chance you'll wind up with less, picking option B, than by picking option A.  At least for the first five times you take the gamble.

 

 

Conclusions

If we act to maximize expected utility, we should choose option B, at least so long as I hold that dollars=utility.  And yet it seems that one would have to take option B a fair number of times before it becomes likely that any given person, taking the iterated gamble, will outperform a different person repeatedly taking option A.

 

In other words, of the 1025 people taking the iterated gamble:

we expect 1 to walk away with $1,200 (from taking option A five times),

we expect 376 to walk away with more than $1,200, casting smug glances at the scaredy-cat who took option A the whole time,

and we expect 648 to walk away muttering to themselves about how the whole thing was rigged, casting dirty glances at the other 377 people.

 

After all the calculations, I still think that, if this gamble was really offered to me, I'd take option A, unless I knew for a fact that I could retake the gamble quite a few times.  How do I interpret this in terms of expected utility?

 

Am I not really treating dollars as equal to utility, and discounting the marginal utility of the additional thousands of dollars that the 376 win?

 

What mistakes am I making?

 

Also, a quick trip to google confirms my intuition that there is plenty of work on iterated decisions; does anyone know a good primer on them?

 

I'd like to leave you with this:

 

If you were actually offered this gamble in real life, which option would you take?

Another Iterated Prisoner's Dilemma Tournament?

11 Andreas_Giger 25 May 2012 02:16PM

Last year, there was a lot of interest in the IPD tournament with people asking for regular events of this sort and developing new strategies (like Afterparty) within hours after the results were published and also expressing interest in re-running the tournament with new rules that allowed for submitted strategies to evolve or read their opponent's source code. I noticed that many of the submitted strategies performed poorly because of a lack of understanding of the underlying mechanics, so I wrote a comprehensive article on IPD math that sparked some interesting comments.

And then the whole thing was never spoken of again.

So now I'd like to know: How many LWers would commit to competing in another tournament of this kind, and would someone be interested in hosting it?

Fixed-Length Selective Iterative Prisoner's Dilemma Mechanics

25 Andreas_Giger 13 September 2011 03:24AM

Prerequisites: Basic knowledge of the Prisoner's Dilemma and the Iterated Prisoner's Dilemma.

I recently stumbled upon the selective IPD tournament results, and while I was very interested in the general concept I was also very disappointed by the strategies that were submitted, especially considering this is Less Wrong we are talking about.

This post is designed to help people who are interested in IPD problems to come up with possibly successful strategies; hopefully the same effect another couple of tournaments would have, just in a shorter period of time. All of the following is written with the tournament rules in mind, with scores depicted as deviations of matches with full mutual cooperation since the actual number of turns per match is arbitrary anyway. Also, the hypothetical objective is not to have the highest population after a certain number of generations, but to achieve lasting superiority eventually while treating near-clones that behave exactly the same in the late game as one single strategy. The results of tournaments with a very limited number of generations depend way too much on the pool of submitted strategies to be of general interest, in my opinion.

This post starts out with practical observations and some universal rules and gets increasingly theoretical. Here is a short glossary of terms I presume known:

Feeding on a strategy: Scoring higher against that strategy in a match than against itself.
Outperforming a strategy: Outscoring that strategy over the course of matches against all other strategies in the pool according to populations, including each other and themselves (i.e. improving the population ratio).
Dominance: More than 50% of the total population.
Dormant strategy: A strategy that will achieve dominance at some point but hasn't done so yet.
Extinction: Asymptotical approach to 0% percent of the total population, or actual extinction in case of integer truncation.
TFT-nD: A TFT strategy that defects from the nth last turn on, so TFT-0D is vanilla TFT.

Disclaimer: There is only very basic mathematics and logical reasoning in this post, so if anything seems confusing, it must be me using the wrong words (I'm not a native speaker). Please point out any of these cases so that I can correct them.

 

Survival And Dominance

Let us assume a scenario with only two strategies, one of them dominating which we call X, the other one A.

Survival Rule:
For A in this scenario not to go extinct regardless of initial population, it must score at least equally high against X as X does against itself, and if it doesn't score higher, it must score at least equally high against itself as X does against itself while not losing direct encounters.

Let us assume X is TFT-0D and A is TFT-1D, in which case the numbers are as follows:

TFT-0D vs TFT-0D =   0 :  0
TFT-1D vs TFT-0D = +3 : -4

The survival ratio is +3 : 0. Therefore, in any situation with only TFT and TFT-1D, the latter cannot go extinct. This still doesn't tell us anything about what's needed for achieving dominance, so let's get on with that.

Dominance rule:
For any strategy to achieve dominance in a two-strategy situation where its survival is guaranteed according to the survival rule or where both strategies initially make up for half of the total population, it needs to outscore the other strategy over the course of these four matches:
X vs X
A vs X (2x)
A vs A

Let us again do the numbers with TFT-0D and TFT-1D:

TFT-0D vs TFT-0D =   0 : 0
TFT-1D vs TFT-0D = +3 : -4
TFT-1D vs TFT-0D = +3 : -4
TFT-1D vs TFT-1D =  -3 : -3

On aggregate, the dominance ratio is 0 : -8. Therefore, TFT-1D will achieve dominance.

The conditions for A to exterminate the formerly dominant strategy X follow directly from the conditions for avoiding extinction, and since being the only surviving strategy is not the objective anyway they aren't interesting at this point.

Threshold rule:
If A fulfills the conditions for dominance but not the conditions for survival (i.e. it scores less against X than X does against itself), it will need a certain threshold to avoid extinction and achieve dominance.

Thresholds vary from strategy to strategy, but are obviously always below 50%. The more balanced the survival ratio is, the higher the threshold. In most cases though, the threshold is much too low to be of any relevance in a selective tournament.

You can easily see that any TFT-nD will be dominated by n+1. However, with increasing n the strategy will score lower not only against itself, but also against any TFT-mD with m <= n-2, which makes TFT-nD with large n very unsuccessful in the early generations of the tournament.

 

A Word About Afterparty

The logical solution to this problem are "Afterparty" strategies that are essentially TFT-nD CliqueBots that return to cooperating if their opponent defected first on the same turn as them. I refer to these strategies as Efficient CliqueBots for reasons that will become apparent later on.

The "Afterparty" strategy suggested on Less Wrong first defects on the sixth last turn, I will therefore call it TFT-D5C from here on. In a TFT-nD dominated tournament, however, TFT-D5C is not the optimum, as you can check by doing the math above. The optimum is in fact TFT-D3C, because it is the TFT-DnC with the lowest n that can exterminate any TFT-DmC with m > n (if the other strategy falls below a certain threshold) as well as dominate any TFT-mD with m >= n - 2 (if it reaches a certain threshold). This means that in a tournament similar to the control group tournament over 1000 generations, it would eventually achieve domination since it is safe from extinction due to scoring equally high respectively higher against TFT-2D and TFT-3D than those score against themselves while winning direct encounters and scoring higher against itself anyway (survival rule):

TFT-DnC vs TFT-DnC = -3 : -3

TFT-2D vs TFT-2D = -6 : -6
TFT-D3C vs TFT-2D = -6 : -13

TFT-3Dvs TFT-3D = -9 : -9
TFT-D3C vs TFT-3D = -6 : -13

Also, it outscores TFT-4D (-32 : -38) and TFT-5D (-38 : -48):

TFT-4D vs TFT-4D = -12 : -12
TFT-D3C vs TFT-4D = -13 : -6
TFT-D3C vs TFT-4D = -13 : -6
TFT-D3C vs TFT-D3C = -3 : -3

TFT-5D vs TFT-5D = -15 : -15
TFT-D3C vs TFT-5D = -16 : -9
TFT-D3C vs TFT-5D = -16 : -9
TFT-D3C vs TFT-D3C = -3 : -3

This means that in a situation where TFT-D3C is initially equally well represented as TFT-4D or TFT-5D, it will eventually outperform those (not taking into account TFT-5D feeding on TFT-4D if both are present).

In any situation with only two strategies both being TFT-DnC variants, TFT-D3C will exterminate any other TFT-DnC strategies once that strategy falls below a certain threshold, because no other TFT-DnC strategy can score higher against TFT-D3C or itself than TFT-D3C does against itself. This is trivially true for all TFT-DnC strategies starting from n = 2:

TFT-D2C vs TFT-D2C = -3 : -3
TFT-D3C vs TFT-D2C = -6 : -13

The threshold decreases for increasing n.

TFT-D3C is also certain to gain a better start than any other TFT-DnC strategies with n > 3 due to higher gain from TFT-nD strategies with n <= 3 that dominate the early game. TFT-D1C is essentially the same as TFT-1D and equally outperformed by TFT-2D, while TFT-D2C cannot prevent TFT-D3C from crossing its threshold as TFT-D3C can feed on TFT-2D as well as on TFT-3D / TFT-D2C. This pretty much leaves TFT-D3C as the only viable TFT-DnC strategy to survive in a selective tournament of this kind. However this doesn't take into account true parasites which I will talk more about later.

 

CliqueBots

CliqueBots have faired very poorly in this tournament, however that is mostly because they have been very poorly designed. A CliqueBot that needs five defections to identify a clone is doomed from the start, and any strategy that turns into DefectBot after identifying an alien strategy before the last turns is very doomed anyway. It seems that even though participants anticipated TFT-1D and even TFT-2D, no one considered cooperative CliqueBots a possible solution, which is surprising. So let me introduce a CliqueBot that could technically be considered another optimum of TFT-DnC, although it behaves slightly different (it is not an Efficient CliqueBot). The main idea is to defect on the very first turn and cooperate if the opponent's previous move matches its own previous move, i.e. if facing a clone, and playing TFT-2D or TFT-3D otherwise. Also in case the game begins with defect-coop followed by coop-defect, the CliqueBot will cooperate for a second time, since the opponent can be considered a TFT variant and a high number of mutual cooperations is benefitial. To keep the name short and follow the established naming system, I will from here on refer to CliqueBot strategies that defect once on turn i and otherwise behave pretty much as TFT-nD strategies as i/TFT-nD.

Some comparison with selected strategies as described in the rules for dominance (four matches each):

1/TFT-3D vs TFT-0D = -14 : -22
1/TFT-3D vs TFT-1D = -14 : -28
1/TFT-3D vs TFT-2D = -14 : -34
1/TFT-3D vs TFT-3D = -26 : -38
1/TFT-3D vs TFT-4D = -34 : -38
1/TFT-3D vs TFT-5D = -40 : -50

However, the survival of any 1/TFT-nD is guaranteed only against TFT-mD with m = n-1. In addition, any TFT-DmC with m >= n-1 will outperform 1/TFT-nD, so 1/TFT-nD (or any other i/TFT-nD for that matter) cannot prevail.

 

Parasites, Identification And Efficient CliqueBots

A true parasite is any strategy that pretends to be a clone of its host until the last possible opportunity of gainful defection. TFT is essentially a CliqueBot that identifies its clones by sustained cooperation, and TFT-1D is a parasite of TFT as TFT-2D is of TFT-1D and so on.

Parasite rule:
Since parasites trade points gained from encounters with clones for points gained from encounters with hosts, parasites can never go extinct in a scenario with only them and their hosts, no matter how little their population. Any parasite will also ultimatively achieve dominance in these scenarios, because the points gained from one-sided defection (+7) are more than the points lost from mutual defection (-3).

Of which follows the dormancy rule:
A strategy can only win an open-ended selective IPD tournament if it stays dormant until its own parasite has gone extinct.

This means that theoretically any parasite only needs to survive long enough until it is the only strategy left besides its host to achieve ultimate victory, as was the case with TFT-3D, parasite of TFT-2D in the thousand generations tournament. This can easily be achieved if the host is the dominant strategy, because the parasite is better equipped to feed on the host than any other strategy in the pool, which is a guarantee of survival. However, the situation becomes very difficult for a parasite of a parasite of a dominant host in the early game, which is why TFT-4D is probably the highest TFT-nD able to survive and therefore to achieve dominance in this kind of tournament with integer rounding.

This brings us back to Efficient CliqueBots. All CliqueBots, including vanilla TFT, use identification patterns to identify clones in order to maximise total point gain. Parasites make use of these patterns in order to pretend to be their hosts and maximise their own point gain. Nice CliqueBots like vanilla TFT identify their clones by continuous cooperation, which makes them unable to identify other nice CliqueBots as non-clones. This is why TFT can't ever defect on the last turn when facing CooperateBot and why nice CliqueBots aren't usually considered "real" CliqueBots. Non-nice CliqueBots detect clones by mutual defections at fixed identification turns, and cooperate from there on if facing a clone in order to maximise total point gain and avoid the danger of losing points in the late game. So what's better than a parasite of a dominant host? Well obviously a parasite of a dominant host that's also a CliqueBot - because that's what TFT-DnC strategies are, Efficient CliqueBots.

The decisive feature of Efficient CliqueBots is that their identification turn is late enough so that in case of an alien opponent, they can simply continue defecting until the end, maximizing their efficiency and allowing them to perform better against TFT-nD strategies with higher n. Now the funny thing is that for example TFT-3D is a parasite of not only TFT-2D, but also TFT-D2C, which is what makes TFT-D3C the optimal TFT-DnC because it still outperforms TFT-4D. But is TFT-4D really the optimal parasite of TFT-D3C? Obviously not, because that would be TFT-D2CD - an Efficient CliqueBot parasite of an Efficient CliqueBot parasite (TFT-D3C) of a parasite (TFT-3D) of a parasite (TFT-2D) of a parasite (TFT-1D) of a nice CliqueBot (TFT-0D). According to the parasite rule, in a scenario with only TFT-D3C and TFT-D2CD, the latter will ultimately achieve dominance. Of course it has its own parasite in TFT-DC2D, which in turn has its own parasite TFT-2D2C, which is the host of TFT-2DCD, host of TFT-4D. And TFT-D4C and so on until DefectBot. Obviously DefectBot isn't the solution, it's only a solution to one specific strategy. So what is the solution? Well that mostly depends on the pool of submitted strategies, but the Efficient CliqueBot TFT-D3C still seems a pretty good guess.

 

Composite Strategies: Parasite-Host Tandems

So how can we make sure our parasite strategy stays dormant long enough? Well there's one less-than obvious choice, which is to turn it into a Composite strategy. A Composite is pretty much what the name says: At the beginning of each match, with some probability execute strategy A, else execute strategy B. For example, a parasite could with 99% probability execute its host's strategy (e.g. TFT-0D), and with 1% execute the actual parasite strategy (TFT-1D). This would in most cases result in the death of the parasite's parasite (TFT-2D) while still resulting in absolute dominance, albeit very slowly. However this would obviously allow a singleton TFT-1D to feed on the Composite as well as on singleton TFT-0D, achieving dominance very quickly, while at the same time feeding TFT-2D. So 1% doesn't seem to be the ideal percentage, and TFT-2D would have been able to feed on TFT-0D anyway, so TFT-1D is a lost cause from the beginning. Also, even if there never was a singleton TFT-1D and TFT-2D, a similar tandem strategy could simply increase the percentage a bit.

So let's stop talking about Parasite-Host Tandems and instead focus on other kinds of Composites that seem more promising.

 

Composite Strategies: Independent Tandems

These are Composites of two strategies that don't aim to achieve dormancy by keeping their hosts dominant until any own parasites have gone extinct. Instead the idea is that if two independent strategies achieve common dominance, it is less likely that both of their parasites survive. You could obviously increase the number of strategies, but this will create some problems I'll talk more about later. First of all, let us assume a 50-50 tandem of TFT-2D and UtopiaBot, the latter being an imaginary strategy that has no parasites and scores high against itself. Both strategies are highly efficient and will soon eliminate TFT-0D and TFT-1D, but TFT-3D will survive. However, TFT-3D will be unable to achive dominance, since the situation is basically that it can only feed on TFT-2D which makes up only half of the tandem's population, while it still has to deal with UtopiaBot which it is not designed for.

Of course there is a big issue with Independent Tandems: You need to find two somewhat successful strategies with different parasites who score reasonably high against themselves as well as against each other. An obvious candidate for this would be a tandem of an Efficient CliqueBot like TFT-D3C and a regular CliqueBot that continues to cooperate after a single opponent retaliation and defects on the last turns, like 1/TFT-3D. This CliqueBot would cooperate until the end if the initial defection was not met by retaliation (facing the sibling) and TFT-D3C would cooperate until the end if the opponent defected only on the identification turn which it would not retaliate against.

 

Composite Strategies: Parasite Killer Tandems

Another option would be a tandem of two strategies of which one is the parasite of the other's parasite, e.g. TFT-2D and TFT-D3C, the latter taking care of both TFT-3D and TFT-D2C. Here, the Efficient CliqueBot TFT-D3C would revert to cooperation if the defections on the 3rd and 4th last turns have not been met by retaliation, effectively turning it into TFT-2D2C, and TFT-2D would not retaliate after defections on these turns. The main problem with Parasite Killer Tandems like these is that a singleton TFT-2D will also profit from TFT- 2D2C killing TFT-3D strategies. This is somewhat offset by TFT- 2D2C also slowly killing singleton TFT-2D, but possibly not enough to prohibit TFT-3D from feeding on it. This depends on the strategies in the tandem and the amount of surviving parasites. In addition, the parasite killer's parasite (in this case TFT-D2CD) may prove a problem as well as parasites of both strategies (TFT-4D), although these should be pretty low in numbers at that point, probably unable to achieve dominance.

 

Composite Strategies: Random CliqueBots

A Composite can consist of an arbitrary high number of strategies, combining Parasite Killers with independent strategies. The only limitations are the effectiveness of the individual strategies and the ability of these strategies to score high against each other, although the latter is not so much a problem as coop-defect loses only 1 point.

In fact coop-defect is so much superior to defect-defect that this brings us to another kind of Composite strategies: Random CliqueBots, which are basically i/TFT-nD CliqueBots with randomized i. For example, if i ranges from 1 to 10, this strategy wouldn't retaliate against an opponent's first defection on either of turns 1 to 10, thus reducing point loss from identifying clones/siblings. With increasing ranges of i, point loss approaches 1 which is much lower than the 6 points lost by regular CliqueBots with fixed i.

Random CliqueBot vs Efficient CliqueBot dominance ratio:

TFT-D3C vs TFT-D3C = -3 : -3
1-∞/TFT-4D vs TFT-D3C = -13 : -13
1-∞/TFT-4D vs TFT-D3C = -13 : -13
1-∞/TFT-4D vs 1-∞/TFT-4D = +3 : -4

Which nets -27 : -32. However, as TFT-D3C always scores 1 point higher against TFT-nD and other TFT-DnC strategies than 1-∞/TFT-4D does, the winner of this duel will depend heavily on the pool of submitted strategies.

In any case, this concludes Composite strategies, so let's get on with finding some replacements for boring ol' TFT.

 

CRD vs TFT

In the late game, TFT is successful because it doesn't defect first while its ability to retaliate becomes very unimportant except during the very last turns. On the other hand, TFT is successful in the early game because it doesn't let strategies like DefectBots take too much advantage of it while maximising points gained from other nice strategies. But maybe there are other strategies with these same qualities that don't turn into a bloody mess after one random defection or score higher against RandomBots? Here's a very interesting observation for all of you that no one seems to have noticed, and I wouldn't have noticed it either had I not been specifically looking for it. Anyway, this is the observation: In both round-robin tournaments of the control group, the highest finishing TFT-0D variant is C6. Any strategies that finished higher were either TFT-1D or TFT-2D variants. And what is C6?

It is a strategy that always forgives the opponent's first defection.

Had C6 defected on the last two turns, I suspect it would have dominated the tournament until the emergence of TFT-3D, i.e. won the 100 generations tournament. Had it been a forgiving TFT-D3C, it would have won the 1000 generations tournament. There were other strategies that had a chance of forgiving, but they did that on every defection, allowing DefectBots and RandomBots to trample all over them (like TF2T).

In order to maximise gain from RandomBots and other insane strategies, it might also be worth to switch to DefectBot after a total of three or two subsequent opponent's defections. For the sake of simplicity and three-letter acronyms I call this strategy CRD (Cooperate, Retaliate, Defect).

Should CRD prove to outperform TFT in the early game, any other strategy would get one early defection for free, which is pretty much parasite CliqueBot heaven. On the other hand, CRD strategies would be able to feed on any Random CliqueBots.

 

Final Remarks

This pretty much concludes my thoughts on the theoretical nature of the selective Iterated Prisoner's Dilemma. As I see it, the strategies with the highest chance of success are:

Efficient CliqueBots:
Might work if the correct host is chosen and no true parasites are present. If there is a good chance that other participants will come to the same conclusion regarding the host while it is unlikely for any of the strategy's parasites to be able to survive, it might be beneficial to experiment with TFT variants such as CRD in order to maximise early game growth. This is what gave I the lasting advantage over C4 in the thousand generations tournament.

Regular CliqueBots:
Might work if there is a high number of Efficient CliqueBots hindering each other's process, especially if there are also specialized parasites of those. Regular CliqueBots will also have an advantage if there are lots of forgiving strategies.

Parasite Killer Tandems:
Might work if the correct parasite is chosen as a host and no parasites of the parasite killer survive.

Random CliqueBots:
Will most likely work, except if there are early dominant Efficient CliqueBot strategies on which the Random CliqueBot cannot feed, or if dominant strategies are of the forgiving (CRD) kind.

Regular TFT-nD strategies will be exterminated by Efficient CliqueBots.

There also are a few general guidelines:

  • Any strategy needs to be carefully designed, which for (non-Efficient) CliqueBots includes forgiving after opponent retaliaton as well as updating the identity of the opponent in case of defections before or after the identification turn.
  • CliqueBots and Composites should not waste a single point when identifying clones, siblings and aliens.
  • Any strategy needs to score against any other strategy at least as high as any potential competitors score against that strategy, including most importantly the competitor itself.
  • Any strategy needs to score against itself as high as possible.
  • It is not necessary or important to win direct encounters if the guidelines above are followed.
  • The probability that I have made not a single arithmetic error in all of this post is pretty low, so double-checking the calculations relevant to the strategy in question seems rational.

Please also see this comment for graphical comparison of some of the discussed strategies.

Lastly, a few selected strategies written in pseudocode, with n being the number of turns per match:

 

Efficient CliqueBot: TFT-D3C

Cooperation on first turn.
Continue with TFT.
If opponent defects on any of turns 1 to n-4:
    Continue with TFT.
    Defect on turns n-1 and n.
    If opponent defects on any of turns n-5 to n:
        Defect until end.
Else:
    Defect on turn n-3.
    If opponent has defected on turn n-3:
        Continue with cooperation.
        If opponent defects any turn:
            Defect until end.
    Else:
        Defect until end.

 

Regular CliqueBot: 1/TFT-2D

Defect on first turn.
Cooperate on second turn.
If opponent has defected on first turn and cooperated on second turn:
    Cooperate on third turn.
    Continue with TFT.
    If opponent defects on any of turns 3 to n:
        Defect on turns n-1 (if still possible) and n.
Else:
    If opponent has defected on both turns:
        Defect on third turn.
    Else:
        Cooperate on third turn.
    Continue with TFT until turn n-2.
    Defect on turns n-1 and n.
If opponent defects on any of turns n-5 to n:
    Defect until end.

 

Random CliqueBot: 10-19/TFT-2D

Randomly pick an integer i from 10 to 19.
Cooperate on first turn.
Play TFT until turn 9.
If opponent has defected on any of turns 1 to 9:
    Continue with TFT.
    Defect on turns n-1 and n.
Else:
    Continue with cooperate until turn 19.
    If opponent has defected a total of two times:
        Continue with TFT.
        Defect on turns n-1 and n.
    Else if opponent has defected once before turn 19:
        Cooperate after opponent defection.
        Continue with TFT.
    Else:
        Defect on turn i.
        Continue with cooperate.
        If opponent has defected on turn i:
            If opponent has cooperated on turn i+1:
                Continue with TFT.
            Else:
                Continue with TFT.
                Defect on turns n-1 and n.
        Else:
            If i < 19 and opponent has cooperated on turn i+1:
                Continue with TFT.
                If opponent defects:
                    Defect on turns n-1 and n.
            Else:
                Cooperate.
                Continue with TFT.
                Defect on turns n-1 and n.
If opponent defects on any of turns n-5 to n:
    Defect until end.