Observe the payoff matrix at right (the unit of reward? Cookies.). Each player wants to play 'A', but only so long as the two players play different moves.
Suppose that Red got to move first. There are some games where moving first is terrible - take Rock Paper Scissors for example. But in this game, moving first is great, because you get to narrow down your opponent's options! If Red goes first, Red picks 'A', and then Blue has to pick 'B' to get a cookie.
This is basically kidnapping. Red has taken all three cookies hostage, and nobody gets any cookies unless Blue agrees to Red's demands for two cookies. Whoever gets to move first plays the kidnapper, and the other player has to decide whether to accede to their ransom demand in exchange for a cookie.
What if neither player gets to move before the other, but instead they have their moves revealed at the same time?
Pre-Move Chat:
Red: "I'm going to pick A, you'd better pick B."
Blue: "I don't care what you pick, I'm picking A. You can pick A too if you really want to get 0 cookies."
Red: "Okay I'm really seriously going to pick A. Please pick B."
Blue: "Nah, don't think so. I'll just pick A. You should just pick B."
And so on. They are now playing a game of Chicken. Whoever swerves first is worse off, but if neither of them give in, they crash into each other and die and get no cookies.
So, The Question: is it better to play A, or to play B?
You might want to analyze this game according to standard game theory before trying to reinvent the wheel (and risking to make it square).
In particular, when your proposed strategy involves guessing what the other players do, and this doesn't converge when the other players do the same to you, it is usually a bad sign (sometimes it's not possible to do anything better, but sometime it is).
Let's start with the standard version:
There are two pure strategy Nash equilibria: (A, B) and (B, A). Both are Pareto-optimal, but they benefit the player who played A more than the other, thus they are "unfair", in the sense that they yield asymmetric payoff in a symmetric game.
There is also a mixed strategy Nash equilibrium which is symmetric but sub-optimal. The game is a bargaining problem and a coordination game as well. In particular, it is a variation of the battle of the sexes game.
Unfortunately, there doesn't seem to be any truly satisfactory solution to this game. There is, however, a correlated equilibrium solution. If both players have access to a public random bit generator, they can obtain a Pareto-optimal symmetric payoff equilibrium, assuming that they can communicate to coordinate a strategy (they don't have to make a contract).
Now, let's consider the program-swap variant of the game.
First of all, a remark on the wording:
Players don't have access to each others' source code. Players submit 'bots': programs which have access to each others' source code. The submitted programs are (pure) strategies, not players. I think it is important to always make the distinction clear, otherwise there is a risk of confusion between the decision makers and the decisions themselves, which may lead to flawed proposed solutions.
In the program-swap variant of this game we (players) can achieve something eqiivalent to the correlated equilibrium by using clique-like mixed strategies:
Each player flips a coin and submits either CliqueBot_0 or CliqueBot_1, which are defined as:
def CliqueBot_X(otherBot):
if otherBot is not a CliqueBot_X then output 'A'
Y = extract 'X' from otherBot
ID = player id (either 0 or 1)
if X xor Y xor ID then output 'A'
otherwise output 'B'
This achieves a Pareto-optimal symmetric equilibrium against itself and, being a clique, it also achieves stability: defectors from the clique are immediately punished. It also achieves maximum payoff against B-bots and PredictorBots, though it performs poorly against A-bots.
The main issue with this strategy is, of course, that there are infinitely many cliques, each of them doesn't cooperate with the others. This becomes to a "choosing sides" coordination game, which is easier than the original game: you just need a single way communication channel between the players to achieve coordination. Even if there is no communicaton channel, it might possible to achieve coordination using a prior, without lack of convergence.
Fair enough,
Well... This article is tagged decision_theory, not game_theory. The goal is not merely to explore solutions to this game, the goal is to explore solution-finding strategies to this game, or at least lay the foundations for that.
I think your CliqueBots need the insight behind Emile's hashMe proposal - they each need a unique ID. Otherwise, when CliqueBot_0 plays a copy, "X xor Y xor ID" is 0, so both play B, so they both lose.