Simulating Problems
Apologies for the rather mathematical nature of this post, but it seems to have some implications for topics relevant to LW. Prior to posting I looked for literature on this but was unable to find any; pointers would be appreciated.
In short, my question is: How can we prove that any simulation of a problem really simulates the problem?
I want to demonstrate that this is not as obvious as it may seem by using the example of Newcomb's Problem. The issue here is of course Omega's omniscience. If we construct a simulation with the rules (payoffs) of Newcomb, an Omega that is always right, and an interface for the agent to interact with the simulation, will that be enough?
Let's say we simulate Omega's prediction by a coin toss and repeat the simulation (without payoffs) until the coin toss matches the agent's decision. This seems to adhere to all specifications of Newcomb and is (if the coin toss is hidden) in fact indistinguishable from it from the agent's perspective. However, if the agent knows how the simulation works, a CDT agent will one-box, while it is assumed that the same agent would two-box in 'real' Newcomb. Not telling the agent how the simulation works is never a solution, so this simulation appears to not actually simulate Newcomb.
Pointing out differences is of course far easier than proving that none exist. Assuming there's a problem we have no idea which decisions agents would make, and we want to build a real-world simulation to find out exactly that. How can we prove that this simulation really simulates the problem?
(Edit: Apparently it wasn't apparent that this is about problems in terms of game theory and decision theory. Newcomb, Prisoner's Dilemma, Iterated Prisoner's Dilemma, Monty Hall, Sleeping Beauty, Two Envelopes, that sort of stuff. Should be clear now.)
Can anyone explain to me why CDT two-boxes?
I have read lots of LW posts on this topic, and everyone seems to take this for granted without giving a proper explanation. So if anyone could explain this to me, I would appreciate that.
This is a simple question that is in need of a simple answer. Please don't link to pages and pages of theorycrafting. Thank you.
Edit: Since posting this, I have come to the conclusion that CDT doesn't actually play Newcomb. Here's a disagreement with that statement:
If you write up a CDT algorithm and then put it into a Newcomb's problem simulator, it will do something. It's playing the game; maybe not well, but it's playing.
And here's my response:
The thing is, an actual Newcomb simulator can't possibly exist because Omega doesn't exist. There are tons of workarounds, like using coin tosses as a substitution for Omega and ignoring the results whenever the coin was wrong, but that is something fundamentally different from Newcomb.
You can only simulate Newcomb in theory, and it is perfectly possible to just not play a theoretical game, if you reject the theory it is based on. In theoretical Newcomb, CDT doesn't care about the rule of Omega being right, so CDT does not play Newcomb.
If you're trying to simulate Newcomb in reality by substituting Omega with someone who has only empirically been proven right, you substitute Newcomb with a problem that consists of little more than simple calculation of priors and payoffs, and that's hardly the point here.
Edit 2: Clarification regarding backwards causality, which seems to confuse people:
Newcomb assumes that Omega is omniscient, which more importantly means that the decision you make right now determines whether Omega has put money in the box or not. Obviously this is backwards causality, and therefore not possible in real life, which is why Nozick doesn't spend too much ink on this.
But if you rule out the possibility of backwards causality, Omega can only make his prediction of your decision based on all your actions up to the point where it has to decide whether to put money in the box or not. In that case, if you take two people who have so far always acted (decided) identical, but one will one-box while the other one will two-box, Omega cannot make different predictions for them. And no matter what prediction Omega makes, you don't want to be the one who one-boxes.
Edit 3: Further clarification on the possible problems that could be considered Newcomb:
There's four types of Newcomb problems:
- Omniscient Omega (backwards causality) - CDT rejects this case, which cannot exist in reality.
- Fallible Omega, but still backwards causality - CDT rejects this case, which cannot exist in reality.
- Infallible Omega, no backwards causality - CDT correctly two-boxes. To improve payouts, CDT would have to have decided differently in the past, which is not decision theory anymore.
- Fallible Omega, no backwards causality - CDT correctly two-boxes. To improve payouts, CDT would have to have decided differently in the past, which is not decision theory anymore.
That's all there is to it.
Edit 4: Excerpt from Nozick's "Newcomb's Problem and Two Principles of Choice":
Now, at last, to return to Newcomb's example of the predictor. If one believes, for this case, that there is backwards causality, that your choice causes the money to be there or not, that it causes him to have made the prediction that he made, then there is no problem. One takes only what is in the second box. Or if one believes that the way the predictor works is by looking into the future; he, in some sense, sees what you are doing, and hence is no more likely to be wrong about what you do than someone else who is standing there at the time and watching you, and would normally see you, say, open only one box, then there is no problem. You take only what is in the second box. But suppose we establish or take as given that there is no backwards causality, that what you actually decide to do does not affect what he did in the past, that what you actually decide to do is not part of the explanation of why he made the prediction he made. So let us agree that the predictor works as follows: He observes you sometime before you are faced with the choice, examines you with complicated apparatus, etc., and then uses his theory to predict on the basis of this state you were in, what choice you would make later when faced with the choice. Your deciding to do as you do is not part of the explanation of why he makes the prediction he does, though your being in a certain state earlier, is part of the explanation of why he makes the prediction he does, and why you decide as you do.
I believe that one should take what is in both boxes. I fear that the considerations I have adduced thus far will not convince those proponents of taking only what is in the second box. Furthermore I suspect that an adequate solution to this problem will go much deeper than I have yet gone or shall go in this paper. So I want to pose one question. I assume that it is clear that in the vaccine example, the person should not be convinced by the probability argument, and should choose the dominant action. I assume also that it is clear that in the case of the two brothers, the brother should not be convinced by the probability argument offered. The question I should like to put to proponents of taking only what is in the second box in Newcomb's example (and hence not performing the dominant action) is: what is the difference between Newcomb's example and the other two examples which make the difference between not following the dominance principle, and following it?
Another Iterated Prisoner's Dilemma Tournament?
Last year, there was a lot of interest in the IPD tournament with people asking for regular events of this sort and developing new strategies (like Afterparty) within hours after the results were published and also expressing interest in re-running the tournament with new rules that allowed for submitted strategies to evolve or read their opponent's source code. I noticed that many of the submitted strategies performed poorly because of a lack of understanding of the underlying mechanics, so I wrote a comprehensive article on IPD math that sparked some interesting comments.
And then the whole thing was never spoken of again.
So now I'd like to know: How many LWers would commit to competing in another tournament of this kind, and would someone be interested in hosting it?
Fixed-Length Selective Iterative Prisoner's Dilemma Mechanics
Prerequisites: Basic knowledge of the Prisoner's Dilemma and the Iterated Prisoner's Dilemma.
I recently stumbled upon the selective IPD tournament results, and while I was very interested in the general concept I was also very disappointed by the strategies that were submitted, especially considering this is Less Wrong we are talking about.
This post is designed to help people who are interested in IPD problems to come up with possibly successful strategies; hopefully the same effect another couple of tournaments would have, just in a shorter period of time. All of the following is written with the tournament rules in mind, with scores depicted as deviations of matches with full mutual cooperation since the actual number of turns per match is arbitrary anyway. Also, the hypothetical objective is not to have the highest population after a certain number of generations, but to achieve lasting superiority eventually while treating near-clones that behave exactly the same in the late game as one single strategy. The results of tournaments with a very limited number of generations depend way too much on the pool of submitted strategies to be of general interest, in my opinion.
This post starts out with practical observations and some universal rules and gets increasingly theoretical. Here is a short glossary of terms I presume known:
Feeding on a strategy: Scoring higher against that strategy in a match than against itself.
Outperforming a strategy: Outscoring that strategy over the course of matches against all other strategies in the pool according to populations, including each other and themselves (i.e. improving the population ratio).
Dominance: More than 50% of the total population.
Dormant strategy: A strategy that will achieve dominance at some point but hasn't done so yet.
Extinction: Asymptotical approach to 0% percent of the total population, or actual extinction in case of integer truncation.
TFT-nD: A TFT strategy that defects from the nth last turn on, so TFT-0D is vanilla TFT.
Disclaimer: There is only very basic mathematics and logical reasoning in this post, so if anything seems confusing, it must be me using the wrong words (I'm not a native speaker). Please point out any of these cases so that I can correct them.
Survival And Dominance
Let us assume a scenario with only two strategies, one of them dominating which we call X, the other one A.
Survival Rule:
For A in this scenario not to go extinct regardless of initial population, it must score at least equally high against X as X does against itself, and if it doesn't score higher, it must score at least equally high against itself as X does against itself while not losing direct encounters.
Let us assume X is TFT-0D and A is TFT-1D, in which case the numbers are as follows:
TFT-0D vs TFT-0D = 0 : 0
TFT-1D vs TFT-0D = +3 : -4
The survival ratio is +3 : 0. Therefore, in any situation with only TFT and TFT-1D, the latter cannot go extinct. This still doesn't tell us anything about what's needed for achieving dominance, so let's get on with that.
Dominance rule:
For any strategy to achieve dominance in a two-strategy situation where its survival is guaranteed according to the survival rule or where both strategies initially make up for half of the total population, it needs to outscore the other strategy over the course of these four matches:
X vs X
A vs X (2x)
A vs A
Let us again do the numbers with TFT-0D and TFT-1D:
TFT-0D vs TFT-0D = 0 : 0
TFT-1D vs TFT-0D = +3 : -4
TFT-1D vs TFT-0D = +3 : -4
TFT-1D vs TFT-1D = -3 : -3
On aggregate, the dominance ratio is 0 : -8. Therefore, TFT-1D will achieve dominance.
The conditions for A to exterminate the formerly dominant strategy X follow directly from the conditions for avoiding extinction, and since being the only surviving strategy is not the objective anyway they aren't interesting at this point.
Threshold rule:
If A fulfills the conditions for dominance but not the conditions for survival (i.e. it scores less against X than X does against itself), it will need a certain threshold to avoid extinction and achieve dominance.
Thresholds vary from strategy to strategy, but are obviously always below 50%. The more balanced the survival ratio is, the higher the threshold. In most cases though, the threshold is much too low to be of any relevance in a selective tournament.
You can easily see that any TFT-nD will be dominated by n+1. However, with increasing n the strategy will score lower not only against itself, but also against any TFT-mD with m <= n-2, which makes TFT-nD with large n very unsuccessful in the early generations of the tournament.
A Word About Afterparty
The logical solution to this problem are "Afterparty" strategies that are essentially TFT-nD CliqueBots that return to cooperating if their opponent defected first on the same turn as them. I refer to these strategies as Efficient CliqueBots for reasons that will become apparent later on.
The "Afterparty" strategy suggested on Less Wrong first defects on the sixth last turn, I will therefore call it TFT-D5C from here on. In a TFT-nD dominated tournament, however, TFT-D5C is not the optimum, as you can check by doing the math above. The optimum is in fact TFT-D3C, because it is the TFT-DnC with the lowest n that can exterminate any TFT-DmC with m > n (if the other strategy falls below a certain threshold) as well as dominate any TFT-mD with m >= n - 2 (if it reaches a certain threshold). This means that in a tournament similar to the control group tournament over 1000 generations, it would eventually achieve domination since it is safe from extinction due to scoring equally high respectively higher against TFT-2D and TFT-3D than those score against themselves while winning direct encounters and scoring higher against itself anyway (survival rule):
TFT-DnC vs TFT-DnC = -3 : -3
TFT-2D vs TFT-2D = -6 : -6
TFT-D3C vs TFT-2D = -6 : -13
TFT-3Dvs TFT-3D = -9 : -9
TFT-D3C vs TFT-3D = -6 : -13
Also, it outscores TFT-4D (-32 : -38) and TFT-5D (-38 : -48):
TFT-4D vs TFT-4D = -12 : -12
TFT-D3C vs TFT-4D = -13 : -6
TFT-D3C vs TFT-4D = -13 : -6
TFT-D3C vs TFT-D3C = -3 : -3
TFT-5D vs TFT-5D = -15 : -15
TFT-D3C vs TFT-5D = -16 : -9
TFT-D3C vs TFT-5D = -16 : -9
TFT-D3C vs TFT-D3C = -3 : -3
This means that in a situation where TFT-D3C is initially equally well represented as TFT-4D or TFT-5D, it will eventually outperform those (not taking into account TFT-5D feeding on TFT-4D if both are present).
In any situation with only two strategies both being TFT-DnC variants, TFT-D3C will exterminate any other TFT-DnC strategies once that strategy falls below a certain threshold, because no other TFT-DnC strategy can score higher against TFT-D3C or itself than TFT-D3C does against itself. This is trivially true for all TFT-DnC strategies starting from n = 2:
TFT-D2C vs TFT-D2C = -3 : -3
TFT-D3C vs TFT-D2C = -6 : -13
The threshold decreases for increasing n.
TFT-D3C is also certain to gain a better start than any other TFT-DnC strategies with n > 3 due to higher gain from TFT-nD strategies with n <= 3 that dominate the early game. TFT-D1C is essentially the same as TFT-1D and equally outperformed by TFT-2D, while TFT-D2C cannot prevent TFT-D3C from crossing its threshold as TFT-D3C can feed on TFT-2D as well as on TFT-3D / TFT-D2C. This pretty much leaves TFT-D3C as the only viable TFT-DnC strategy to survive in a selective tournament of this kind. However this doesn't take into account true parasites which I will talk more about later.
CliqueBots
CliqueBots have faired very poorly in this tournament, however that is mostly because they have been very poorly designed. A CliqueBot that needs five defections to identify a clone is doomed from the start, and any strategy that turns into DefectBot after identifying an alien strategy before the last turns is very doomed anyway. It seems that even though participants anticipated TFT-1D and even TFT-2D, no one considered cooperative CliqueBots a possible solution, which is surprising. So let me introduce a CliqueBot that could technically be considered another optimum of TFT-DnC, although it behaves slightly different (it is not an Efficient CliqueBot). The main idea is to defect on the very first turn and cooperate if the opponent's previous move matches its own previous move, i.e. if facing a clone, and playing TFT-2D or TFT-3D otherwise. Also in case the game begins with defect-coop followed by coop-defect, the CliqueBot will cooperate for a second time, since the opponent can be considered a TFT variant and a high number of mutual cooperations is benefitial. To keep the name short and follow the established naming system, I will from here on refer to CliqueBot strategies that defect once on turn i and otherwise behave pretty much as TFT-nD strategies as i/TFT-nD.
Some comparison with selected strategies as described in the rules for dominance (four matches each):
1/TFT-3D vs TFT-0D = -14 : -22
1/TFT-3D vs TFT-1D = -14 : -28
1/TFT-3D vs TFT-2D = -14 : -34
1/TFT-3D vs TFT-3D = -26 : -38
1/TFT-3D vs TFT-4D = -34 : -38
1/TFT-3D vs TFT-5D = -40 : -50
However, the survival of any 1/TFT-nD is guaranteed only against TFT-mD with m = n-1. In addition, any TFT-DmC with m >= n-1 will outperform 1/TFT-nD, so 1/TFT-nD (or any other i/TFT-nD for that matter) cannot prevail.
Parasites, Identification And Efficient CliqueBots
A true parasite is any strategy that pretends to be a clone of its host until the last possible opportunity of gainful defection. TFT is essentially a CliqueBot that identifies its clones by sustained cooperation, and TFT-1D is a parasite of TFT as TFT-2D is of TFT-1D and so on.
Parasite rule:
Since parasites trade points gained from encounters with clones for points gained from encounters with hosts, parasites can never go extinct in a scenario with only them and their hosts, no matter how little their population. Any parasite will also ultimatively achieve dominance in these scenarios, because the points gained from one-sided defection (+7) are more than the points lost from mutual defection (-3).
Of which follows the dormancy rule:
A strategy can only win an open-ended selective IPD tournament if it stays dormant until its own parasite has gone extinct.
This means that theoretically any parasite only needs to survive long enough until it is the only strategy left besides its host to achieve ultimate victory, as was the case with TFT-3D, parasite of TFT-2D in the thousand generations tournament. This can easily be achieved if the host is the dominant strategy, because the parasite is better equipped to feed on the host than any other strategy in the pool, which is a guarantee of survival. However, the situation becomes very difficult for a parasite of a parasite of a dominant host in the early game, which is why TFT-4D is probably the highest TFT-nD able to survive and therefore to achieve dominance in this kind of tournament with integer rounding.
This brings us back to Efficient CliqueBots. All CliqueBots, including vanilla TFT, use identification patterns to identify clones in order to maximise total point gain. Parasites make use of these patterns in order to pretend to be their hosts and maximise their own point gain. Nice CliqueBots like vanilla TFT identify their clones by continuous cooperation, which makes them unable to identify other nice CliqueBots as non-clones. This is why TFT can't ever defect on the last turn when facing CooperateBot and why nice CliqueBots aren't usually considered "real" CliqueBots. Non-nice CliqueBots detect clones by mutual defections at fixed identification turns, and cooperate from there on if facing a clone in order to maximise total point gain and avoid the danger of losing points in the late game. So what's better than a parasite of a dominant host? Well obviously a parasite of a dominant host that's also a CliqueBot - because that's what TFT-DnC strategies are, Efficient CliqueBots.
The decisive feature of Efficient CliqueBots is that their identification turn is late enough so that in case of an alien opponent, they can simply continue defecting until the end, maximizing their efficiency and allowing them to perform better against TFT-nD strategies with higher n. Now the funny thing is that for example TFT-3D is a parasite of not only TFT-2D, but also TFT-D2C, which is what makes TFT-D3C the optimal TFT-DnC because it still outperforms TFT-4D. But is TFT-4D really the optimal parasite of TFT-D3C? Obviously not, because that would be TFT-D2CD - an Efficient CliqueBot parasite of an Efficient CliqueBot parasite (TFT-D3C) of a parasite (TFT-3D) of a parasite (TFT-2D) of a parasite (TFT-1D) of a nice CliqueBot (TFT-0D). According to the parasite rule, in a scenario with only TFT-D3C and TFT-D2CD, the latter will ultimately achieve dominance. Of course it has its own parasite in TFT-DC2D, which in turn has its own parasite TFT-2D2C, which is the host of TFT-2DCD, host of TFT-4D. And TFT-D4C and so on until DefectBot. Obviously DefectBot isn't the solution, it's only a solution to one specific strategy. So what is the solution? Well that mostly depends on the pool of submitted strategies, but the Efficient CliqueBot TFT-D3C still seems a pretty good guess.
Composite Strategies: Parasite-Host Tandems
So how can we make sure our parasite strategy stays dormant long enough? Well there's one less-than obvious choice, which is to turn it into a Composite strategy. A Composite is pretty much what the name says: At the beginning of each match, with some probability execute strategy A, else execute strategy B. For example, a parasite could with 99% probability execute its host's strategy (e.g. TFT-0D), and with 1% execute the actual parasite strategy (TFT-1D). This would in most cases result in the death of the parasite's parasite (TFT-2D) while still resulting in absolute dominance, albeit very slowly. However this would obviously allow a singleton TFT-1D to feed on the Composite as well as on singleton TFT-0D, achieving dominance very quickly, while at the same time feeding TFT-2D. So 1% doesn't seem to be the ideal percentage, and TFT-2D would have been able to feed on TFT-0D anyway, so TFT-1D is a lost cause from the beginning. Also, even if there never was a singleton TFT-1D and TFT-2D, a similar tandem strategy could simply increase the percentage a bit.
So let's stop talking about Parasite-Host Tandems and instead focus on other kinds of Composites that seem more promising.
Composite Strategies: Independent Tandems
These are Composites of two strategies that don't aim to achieve dormancy by keeping their hosts dominant until any own parasites have gone extinct. Instead the idea is that if two independent strategies achieve common dominance, it is less likely that both of their parasites survive. You could obviously increase the number of strategies, but this will create some problems I'll talk more about later. First of all, let us assume a 50-50 tandem of TFT-2D and UtopiaBot, the latter being an imaginary strategy that has no parasites and scores high against itself. Both strategies are highly efficient and will soon eliminate TFT-0D and TFT-1D, but TFT-3D will survive. However, TFT-3D will be unable to achive dominance, since the situation is basically that it can only feed on TFT-2D which makes up only half of the tandem's population, while it still has to deal with UtopiaBot which it is not designed for.
Of course there is a big issue with Independent Tandems: You need to find two somewhat successful strategies with different parasites who score reasonably high against themselves as well as against each other. An obvious candidate for this would be a tandem of an Efficient CliqueBot like TFT-D3C and a regular CliqueBot that continues to cooperate after a single opponent retaliation and defects on the last turns, like 1/TFT-3D. This CliqueBot would cooperate until the end if the initial defection was not met by retaliation (facing the sibling) and TFT-D3C would cooperate until the end if the opponent defected only on the identification turn which it would not retaliate against.
Composite Strategies: Parasite Killer Tandems
Another option would be a tandem of two strategies of which one is the parasite of the other's parasite, e.g. TFT-2D and TFT-D3C, the latter taking care of both TFT-3D and TFT-D2C. Here, the Efficient CliqueBot TFT-D3C would revert to cooperation if the defections on the 3rd and 4th last turns have not been met by retaliation, effectively turning it into TFT-2D2C, and TFT-2D would not retaliate after defections on these turns. The main problem with Parasite Killer Tandems like these is that a singleton TFT-2D will also profit from TFT- 2D2C killing TFT-3D strategies. This is somewhat offset by TFT- 2D2C also slowly killing singleton TFT-2D, but possibly not enough to prohibit TFT-3D from feeding on it. This depends on the strategies in the tandem and the amount of surviving parasites. In addition, the parasite killer's parasite (in this case TFT-D2CD) may prove a problem as well as parasites of both strategies (TFT-4D), although these should be pretty low in numbers at that point, probably unable to achieve dominance.
Composite Strategies: Random CliqueBots
A Composite can consist of an arbitrary high number of strategies, combining Parasite Killers with independent strategies. The only limitations are the effectiveness of the individual strategies and the ability of these strategies to score high against each other, although the latter is not so much a problem as coop-defect loses only 1 point.
In fact coop-defect is so much superior to defect-defect that this brings us to another kind of Composite strategies: Random CliqueBots, which are basically i/TFT-nD CliqueBots with randomized i. For example, if i ranges from 1 to 10, this strategy wouldn't retaliate against an opponent's first defection on either of turns 1 to 10, thus reducing point loss from identifying clones/siblings. With increasing ranges of i, point loss approaches 1 which is much lower than the 6 points lost by regular CliqueBots with fixed i.
Random CliqueBot vs Efficient CliqueBot dominance ratio:
TFT-D3C vs TFT-D3C = -3 : -3
1-∞/TFT-4D vs TFT-D3C = -13 : -13
1-∞/TFT-4D vs TFT-D3C = -13 : -13
1-∞/TFT-4D vs 1-∞/TFT-4D = +3 : -4
Which nets -27 : -32. However, as TFT-D3C always scores 1 point higher against TFT-nD and other TFT-DnC strategies than 1-∞/TFT-4D does, the winner of this duel will depend heavily on the pool of submitted strategies.
In any case, this concludes Composite strategies, so let's get on with finding some replacements for boring ol' TFT.
CRD vs TFT
In the late game, TFT is successful because it doesn't defect first while its ability to retaliate becomes very unimportant except during the very last turns. On the other hand, TFT is successful in the early game because it doesn't let strategies like DefectBots take too much advantage of it while maximising points gained from other nice strategies. But maybe there are other strategies with these same qualities that don't turn into a bloody mess after one random defection or score higher against RandomBots? Here's a very interesting observation for all of you that no one seems to have noticed, and I wouldn't have noticed it either had I not been specifically looking for it. Anyway, this is the observation: In both round-robin tournaments of the control group, the highest finishing TFT-0D variant is C6. Any strategies that finished higher were either TFT-1D or TFT-2D variants. And what is C6?
It is a strategy that always forgives the opponent's first defection.
Had C6 defected on the last two turns, I suspect it would have dominated the tournament until the emergence of TFT-3D, i.e. won the 100 generations tournament. Had it been a forgiving TFT-D3C, it would have won the 1000 generations tournament. There were other strategies that had a chance of forgiving, but they did that on every defection, allowing DefectBots and RandomBots to trample all over them (like TF2T).
In order to maximise gain from RandomBots and other insane strategies, it might also be worth to switch to DefectBot after a total of three or two subsequent opponent's defections. For the sake of simplicity and three-letter acronyms I call this strategy CRD (Cooperate, Retaliate, Defect).
Should CRD prove to outperform TFT in the early game, any other strategy would get one early defection for free, which is pretty much parasite CliqueBot heaven. On the other hand, CRD strategies would be able to feed on any Random CliqueBots.
Final Remarks
This pretty much concludes my thoughts on the theoretical nature of the selective Iterated Prisoner's Dilemma. As I see it, the strategies with the highest chance of success are:
Efficient CliqueBots:
Might work if the correct host is chosen and no true parasites are present. If there is a good chance that other participants will come to the same conclusion regarding the host while it is unlikely for any of the strategy's parasites to be able to survive, it might be beneficial to experiment with TFT variants such as CRD in order to maximise early game growth. This is what gave I the lasting advantage over C4 in the thousand generations tournament.
Regular CliqueBots:
Might work if there is a high number of Efficient CliqueBots hindering each other's process, especially if there are also specialized parasites of those. Regular CliqueBots will also have an advantage if there are lots of forgiving strategies.
Parasite Killer Tandems:
Might work if the correct parasite is chosen as a host and no parasites of the parasite killer survive.
Random CliqueBots:
Will most likely work, except if there are early dominant Efficient CliqueBot strategies on which the Random CliqueBot cannot feed, or if dominant strategies are of the forgiving (CRD) kind.
Regular TFT-nD strategies will be exterminated by Efficient CliqueBots.
There also are a few general guidelines:
- Any strategy needs to be carefully designed, which for (non-Efficient) CliqueBots includes forgiving after opponent retaliaton as well as updating the identity of the opponent in case of defections before or after the identification turn.
- CliqueBots and Composites should not waste a single point when identifying clones, siblings and aliens.
- Any strategy needs to score against any other strategy at least as high as any potential competitors score against that strategy, including most importantly the competitor itself.
- Any strategy needs to score against itself as high as possible.
- It is not necessary or important to win direct encounters if the guidelines above are followed.
- The probability that I have made not a single arithmetic error in all of this post is pretty low, so double-checking the calculations relevant to the strategy in question seems rational.
Please also see this comment for graphical comparison of some of the discussed strategies.
Lastly, a few selected strategies written in pseudocode, with n being the number of turns per match:
Efficient CliqueBot: TFT-D3C
Cooperation on first turn.
Continue with TFT.
If opponent defects on any of turns 1 to n-4:
Continue with TFT.
Defect on turns n-1 and n.
If opponent defects on any of turns n-5 to n:
Defect until end.
Else:
Defect on turn n-3.
If opponent has defected on turn n-3:
Continue with cooperation.
If opponent defects any turn:
Defect until end.
Else:
Defect until end.
Regular CliqueBot: 1/TFT-2D
Defect on first turn.
Cooperate on second turn.
If opponent has defected on first turn and cooperated on second turn:
Cooperate on third turn.
Continue with TFT.
If opponent defects on any of turns 3 to n:
Defect on turns n-1 (if still possible) and n.
Else:
If opponent has defected on both turns:
Defect on third turn.
Else:
Cooperate on third turn.
Continue with TFT until turn n-2.
Defect on turns n-1 and n.
If opponent defects on any of turns n-5 to n:
Defect until end.
Random CliqueBot: 10-19/TFT-2D
Randomly pick an integer i from 10 to 19.
Cooperate on first turn.
Play TFT until turn 9.
If opponent has defected on any of turns 1 to 9:
Continue with TFT.
Defect on turns n-1 and n.
Else:
Continue with cooperate until turn 19.
If opponent has defected a total of two times:
Continue with TFT.
Defect on turns n-1 and n.
Else if opponent has defected once before turn 19:
Cooperate after opponent defection.
Continue with TFT.
Else:
Defect on turn i.
Continue with cooperate.
If opponent has defected on turn i:
If opponent has cooperated on turn i+1:
Continue with TFT.
Else:
Continue with TFT.
Defect on turns n-1 and n.
Else:
If i < 19 and opponent has cooperated on turn i+1:
Continue with TFT.
If opponent defects:
Defect on turns n-1 and n.
Else:
Cooperate.
Continue with TFT.
Defect on turns n-1 and n.
If opponent defects on any of turns n-5 to n:
Defect until end.
Subscribe to RSS Feed
= f037147d6e6c911a85753b9abdedda8d)