I'm sharing a comment from Reddit here because I think this will be a common response.
You lost me at this sentence:
The core issue is the assumption of independence between the players.
That assumption is built into what the prisoner’s dilemma is. If the independence of the players is not there, it is a genuinely different game. This is either because the utilities are shifted in some way (by external forces like a threat by the mob or a contract), or by one’s own ethical sense (violation of one’s ethics can be compiled into a loss of utility in the player’s utility function, as the guilt/shame of behaving contrary to one’s ethics).
One of the important takeaways of the prisoner’s dilemma (and variants) is that, when we find natural occurrences of it in society, it is often in our collective interest to modify the utilities involved to create better overall outcomes.
Fantastic question, this gets straight to the heart of the analysis.
I wholeheartedly agree that whatever Alice does within her cell will not have any causal effect on Bob's decision. They are separate in the sense that Alice cannot ripple the atoms in the universe in any way to affect Bob differently in either case. There are no downstream causal dependencies between them.
However, there is upstream causal dependence. Here's an analogy. Imagine I have two papers with C(ooperate) written on them and two papers with D(efect) written on them. Then I blindly select either two Cs or two Ds. Whichever two papers I choose, I put them in envelopes labeled A and B, and I take those envelopes and shoot them each a light year in opposite directions along with Alice and Bob, respectively. Before Alice opens her envelope up, she has no idea what Bob has––to her, it really is 50% C or D. When she opens up her envelope and sees that she has a C, she hasn't causally affected Bob's envelope, but now she does have information about it. Namely, she knows that Bob also has a C. When Alice sees that she has a C or a D, she gets new information about Bob's envelope because of the upstream causal dependence I incorporated by placing the same letters in the envelopes. This corresponds with the clone case. To get the more subtle cases, suppose I choose Alice's envelope content randomly. Then I can flip a weighted coin to determine whether or not to choose Bob's contents randomly or to put in the same contents that Alice has. This corresponds to the rho-analysis in the second-to-last section.
Hopefully this metaphor helps clarify what exactly I mean when I say that Alice and Bob's decisions share upstream causal dependencies.
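If a concrete simulation helps, here's a minimal Monte Carlo sketch of the envelope setup (the function names and the p_copy parameter are my own labels, not from the post): with probability p_copy, Bob's envelope simply copies Alice's, which stands in for the weighted coin above.

```python
import random

def deal_envelopes(p_copy: float) -> tuple[str, str]:
    """Fill Alice's envelope at random; with probability p_copy, Bob's envelope
    copies Alice's contents, otherwise it is filled independently."""
    alice = random.choice("CD")
    bob = alice if random.random() < p_copy else random.choice("CD")
    return alice, bob

def conditional_freq(p_copy: float, trials: int = 100_000) -> float:
    """Estimate P(Bob has C | Alice has C) under the given copy probability."""
    same = seen = 0
    for _ in range(trials):
        alice, bob = deal_envelopes(p_copy)
        if alice == "C":
            seen += 1
            same += bob == "C"
    return same / seen

for p_copy in (1.0, 0.5, 0.0):
    print(f"p_copy={p_copy}: P(Bob=C | Alice=C) ≈ {conditional_freq(p_copy):.3f}")
# p_copy=1.0 is the clone case (≈1.0); p_copy=0.0 is full independence (≈0.5).
```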
Again, from Reddit:
To your credit, you do appear to have actually read and understood the math of the prisoner's dilemma, which is far more than most people claiming to refute a well-established result in a blog post.
Anyway, there's a lot to unpack in your argument, but superficially it looks like you're basically substituting the iterated variant of the prisoner's dilemma for the normal case, on the basis that assuming you might be in an iterated situation is the rational thing to do in general? I don't know if that's true, but even if it is, I don't know that it's useful to "refute" the prisoner's dilemma, since its main use is as a pedagogical example rather than an important result in its own right.
I appreciate the kind words. I can go back and clarify the language of the post to convey that I'm not actually smuggling in a form of iterated prisoner's dilemma; rather, this is specifically describing a one-shot play of the prisoner's dilemma where Alice and Bob need never have met before and need never meet again.
TL;DR
Descriptions of the Prisoner's Dilemma typically suggest that the optimal policy for each prisoner is to selfishly defect instead of to cooperate. I disagree with the traditional analysis and present a case for cooperation.
The core issue is the assumption of independence between the players. Articulations of the game painstakingly describe how the prisoners are in explicitly separate cells with no possibility of communication. From this, it's assumed that one's action can have no causal effect on the decision of the other player. However, (almost) everything is correlated, and this significantly changes the analysis.
Imagine the case where the prisoners are clones and make exactly the same decision. Then, when they compare the expected payout for each possible action, their payout will be higher in the case where they cooperate because they are certain the other player is having the same thoughts and deterministically will make the same choice. This essay generalizes and formalizes this line of reasoning.
Here's what to expect in what follows. In the first section, we begin by introducing the standard causal decision theory analysis suggesting (Defect, Defect). Then, we introduce the machinery for mixed strategies in the following section. From there we discuss the particular case where both participants are clones, which motivates our new framework. Then, we introduce a bit more formalism around causal modeling and dependence. We proceed to analyze a more general case where both players converge to the same mixed strategy. Then we discuss the most general model, where the players' mixed strategies have some known correlation. Finally, we conclude the analysis. In summary, given some dependency structure due to upstream causal variables, we uncover the cases where the game theory actually suggests cooperation as the optimal policy.
The Traditional Analysis
In Game Theory 101, here's how the Prisoner's Dilemma analysis traditionally goes. Alice and Bob are conspirators in a crime. They're both caught and brought to separate interrogation rooms. They're presented with a Faustian bargain: to snitch or not to snitch. If neither snitches on the other, they both get a 1-year sentence. If one snitches and the other does not, then the snitch goes home free while the non-snitch has to serve 3 years. If they both snitch, they serve 2 years.
Here's the payoff diagram corresponding with this setup:
        BC          BD
AC    (−1, −1)    (−3, 0)
AD    (0, −3)     (−2, −2)
If Alice cooperates, Bob is better off defecting to get 0 years instead of 1 year. If Alice defects, Bob is better off defecting to get 2 years instead of 3 years. So in either case Bob is better off defecting. A strategy which is optimal regardless of the choices of an opponent is called a dominant strategy. Symmetrically, Alice is better off defecting no matter what Bob does. This means that even though they're both happier in the case where they both cooperate, serving just one year each, the "Nash equilibrium" is the case where they both defect, serving two years each. A state is called a Nash equilibrium if no player can benefit by changing their action, holding the actions of all other players constant.
We can represent each player's preferences and optimal choices with an arrow diagram.
Nash equilibria are represented by a state with no arrows pointing away from it, meaning no player would prefer to switch their choice, holding the other players' choices the same. In the example above, (Defect, Defect) is the single Nash equilibrium.
We can generalize the payoff matrix a bit to represent all situations that capture the structure of a "Prisoner's Dilemma"-like scenario.
        BC        BD
AC    (R, R)    (S, T)
AD    (T, S)    (Q, Q)
I use the variables Q,R,S,T to keep things organized. These can be remembered in the following way. Q is the Quarrel payout when they rat each other out. R is the Reward for mutual cooperation. S is the Sucker's payout if the other player snitches and they do not. T is the Temptation payout for snitching while the other does not. The necessary and sufficient condition for a prisoner's dilemma structure is that S<Q<R<T.
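As a quick sanity check of this structure, here's a small sketch (the helper names are mine) that verifies the S<Q<R<T ordering for the toy payoffs above and recovers (Defect, Defect) as the only pure-strategy Nash equilibrium by a brute-force best-response check.

```python
from itertools import product

# Payoffs to (Alice, Bob) for each (Alice action, Bob action); values from the
# toy example above: R=-1, S=-3, T=0, Q=-2.
PAYOFFS = {
    ("C", "C"): (-1, -1),  # (R, R)
    ("C", "D"): (-3, 0),   # (S, T)
    ("D", "C"): (0, -3),   # (T, S)
    ("D", "D"): (-2, -2),  # (Q, Q)
}

def is_prisoners_dilemma(R, S, T, Q):
    """Check the ordering S < Q < R < T described above."""
    return S < Q < R < T

def pure_nash_equilibria(payoffs):
    """Return profiles where neither player gains by unilaterally deviating."""
    equilibria = []
    for a, b in product("CD", repeat=2):
        u_a, u_b = payoffs[(a, b)]
        best_a = all(u_a >= payoffs[(a2, b)][0] for a2 in "CD")
        best_b = all(u_b >= payoffs[(a, b2)][1] for b2 in "CD")
        if best_a and best_b:
            equilibria.append((a, b))
    return equilibria

print(is_prisoners_dilemma(R=-1, S=-3, T=0, Q=-2))  # True
print(pure_nash_equilibria(PAYOFFS))                # [('D', 'D')]
```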
Probabilistic Play
Now, let's make our model a bit more rigorous and extend our binary action space to a probabilistic strategy model.
        BC        BD
AC    (R, R)    (S, T)
AD    (T, S)    (Q, Q)
Instead of discrete actions, let's suppose Alice and Bob each choose mixed strategy vectors [p(Ac),p(Ad)] and [p(Bc),p(Bd)], respectively, which represent their probabilities of cooperation or defection, such that p(Ac)+p(Ad)=1 and p(Bc)+p(Bd)=1.
We formalize the analysis by noting that Alice wants to maximize her expected (VNM) utility. We now compute her optimal cooperation fraction p∗(Ac|B∗) given Bob's optimal policy.
E[UA(G)]=E[UA(G|Ac)]p∗(Ac)+E[UA(G|Ad)]p∗(Ad)
We decompose the expected value of the game to Alice E[UA(G)] into the expected value of the game given that she cooperates E[UA(G|Ac)] times the probability that she cooperates p∗(Ac) plus the expected value of the game given that she defects E[UA(G|Ad)] times the probability she defects p∗(Ad).
We can further decompose Alice's expected utility given her action, E[UA(G|Ac)], into the cases where Bob cooperates and where he does not.
E[UA(G|Ac)]=E[UA(G|Ac,Bc)]p∗(Bc)+E[UA(G|Ac,Bd)]p∗(Bd)
E[UA(G|Ad)]=E[UA(G|Ad,Bc)]p∗(Bc)+E[UA(G|Ad,Bd)]p∗(Bd)
Bringing this back into our equation for E[UA(G)] yields:
E[UA(G)]=E[UA(G|Ac,Bc)]p∗(Ac)p∗(Bc) +E[UA(G|Ac,Bd)]p∗(Ac)p∗(Bd) +E[UA(G|Ad,Bc)]p∗(Ad)p∗(Bc) +E[UA(G|Ad,Bd)]p∗(Ad)p∗(Bd)
We can reduce the mess a bit by substituting in our variables Q,R,S,T.
E[UA(G)]=R p∗(Ac)p∗(Bc) +S p∗(Ac)p∗(Bd) +T p∗(Ad)p∗(Bc) +Q p∗(Ad)p∗(Bd)
Now we apply p(Ac)+p(Ad)=1 and p(Bc)+p(Bd)=1.
E[UA(G)]=R p∗(Ac)p∗(Bc) +S p∗(Ac)(1−p∗(Bc)) +T (1−p∗(Ac))p∗(Bc) +Q (1−p∗(Ac))(1−p∗(Bc))
Normally, to find the p∗(Ac) which maximizes E[UA(G)], we would differentiate E[UA(G)] with respect to p∗(Ac) and set the result to zero. But because the expression is linear in p∗(Ac), its derivative is a constant that doesn't depend on p∗(Ac), so the maximum must sit at a boundary. This means that there won't be an optimal "mixed strategy" which is not a "pure strategy": given the prisoner's dilemma payout structure presented thus far, Alice and Bob are each better off deciding to either 100% cooperate or 100% defect.
Expanding out E[UA(G)] a bit further we see:
E[UA(G)]=R p∗(Ac)p∗(Bc) +S p∗(Ac)−S p∗(Ac) p∗(Bc) +T p∗(Bc)−T p∗(Bc) p∗(Ac) +Q−Q p∗(Ac)−Q p∗(Bc)+Q p∗(Ac) p∗(Bc)
Now we isolate p∗(Ac):
E[UA(G)]=p∗(Ac)[R p∗(Bc) +S−S p∗(Bc) −T p∗(Bc)−Q +Q p∗(Bc)] +[T p∗(Bc)+Q−Q p∗(Bc)]
E[UA(G)]=p∗(Ac)[p∗(Bc)(Q+R−S−T)+(S−Q)] + [p∗(Bc) (T−Q)+Q]
To see if Alice's optimal cooperation percentage p∗(Ac) is 100% or 0%, we just need to see if the term p∗(Bc)(Q+R−S−T)+(S−Q) is greater than or less than zero. If it's greater than zero, then Alice best maximizes E[UA(G)] when p∗(Ac)=1, and correspondingly when it's less than zero then Alice should always defect (p∗(Ac)=0).
R p∗(Bc) +S−S p∗(Bc) −T p∗(Bc)−Q +Q p∗(Bc)?<0
0?<(T p∗(Bc)−R p∗(Bc))+(S p∗(Bc)−S)+(Q−Q p∗(Bc))
0?<p∗(Bc) (T−R)+Q (1−p∗(Bc))−S (1−p∗(Bc))
0?<p∗(Bc) (T−R)+(1−p∗(Bc)) (Q−S)
We now show that the right-hand side is positive: T−R>0 because T>R, Q−S>0 because Q>S, and p∗(Bc) and 1−p∗(Bc) are nonnegative and sum to one, so at least one of them is positive. Therefore 0<p∗(Bc) (T−R)+(1−p∗(Bc)) (Q−S), meaning the coefficient of p∗(Ac) is negative, and Alice's optimal policy is to always defect. The same analysis can be performed for Bob as well.
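Here's a small numeric check of that conclusion using the toy payoffs from the opening example (the function name is mine): for every fixed p∗(Bc), a grid search over p∗(Ac) lands on 0.

```python
def expected_utility_independent(p_ac, p_bc, R=-1.0, S=-3.0, T=0.0, Q=-2.0):
    """Alice's expected utility when her choice and Bob's are independent."""
    return (R * p_ac * p_bc + S * p_ac * (1 - p_bc)
            + T * (1 - p_ac) * p_bc + Q * (1 - p_ac) * (1 - p_bc))

# For any fixed p(Bc), Alice's utility is linear in p(Ac) and maximized at p(Ac)=0.
for p_bc in (0.0, 0.25, 0.5, 0.75, 1.0):
    best = max((expected_utility_independent(p_ac / 100, p_bc), p_ac / 100)
               for p_ac in range(101))
    print(f"p(Bc)={p_bc:.2f} -> best p(Ac)={best[1]:.2f}, E[U_A]={best[0]:.2f}")
```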
This is how the analysis typically goes. The common perspective is that, given the payoff diagram and the constraint that T>R>Q>S, in a two-player simultaneous game where the players cannot communicate with one another, then unfortunately the game theory optimal policy is always to defect.
        BC        BD
AC    (R, R)    (S, T)
AD    (T, S)    (Q, Q)
I disagree with this conclusion and present my own analysis in the subsequent section.
The Clone Case for Cooperation
In the typical analysis of the prisoner's dilemma, we consider the choices of Alice and Bob to be independent of each other. This is because they make their decisions simultaneously and have no way of affecting the decision of the other. But, there is an underlying dependence structure between A and B that we must include in our model.
To illustrate this, imagine the case where Alice is cloned and given exactly the same environment such that both clones will make the same decisions. If we label clone A and clone B, in this case we now have p(BC|AC)=1, implying p(BD|AC)=0, p(BD|AD)=1, and p(BC|AD)=0. This is the case where there's perfect correlation between the decisions of A and B, which we analyze in this section.
Let's now recompute A's expected utility E[UA(G)] which she wants to maximize.
E[UA(G)]=E[UA(G|AC)]p∗(AC)+E[UA(G|AD)]p∗(AD)
Variable definitions:
- p∗(AC), p∗(AD): the probabilities with which Alice cooperates or defects under her (optimal) mixed strategy.
- ^pA(BC|AC), ^pA(BD|AC), ^pA(BC|AD), ^pA(BD|AD): Alice's estimates of the probability that Bob cooperates or defects, conditioned on Alice's own choice.
Given B's potentially mixed strategy, the expected utility of the game to A given each of A's possible choices can be decomposed into the cases for each of B's choices given A's choice.
E[UA(G|AC)]=E[UA(G|AC,BC)]^pA(BC|AC)+E[UA(G|AC,BD)]^pA(BD|AC)
E[UA(G|AD)]=E[UA(G|AD,BC)]^pA(BC|AD)+E[UA(G|AD,BD)]^pA(BD|AD)
We now recombine this into E[UA(G)]:
E[UA(G)]=(E[UA(G|AC,BC)]^pA(BC|AC)+E[UA(G|AC,BD)]^pA(BD|AC))p∗(AC)+(E[UA(G|AD,BC)]^pA(BC|AD)+E[UA(G|AD,BD)]^pA(BD|AD))p∗(AD)
E[UA(G)]=E[UA(G|AC,BC)]^pA(BC|AC)p∗(AC)+E[UA(G|AC,BD)]^pA(BD|AC)p∗(AC)+E[UA(G|AD,BC)]^pA(BC|AD)p∗(AD)+E[UA(G|AD,BD)]^pA(BD|AD)p∗(AD)
Now we substitute in Q,R,S,T and use p∗(AC)+p∗(AD)=1 and ^pA(BC|Ax)+^pA(BD|Ax)=1.
E[UA(G)]=^pA(BC|AC) p∗(AC) R+ (1−^pA(BC|AC)) p∗(AC) S+ ^pA(BC|AD) (1−p∗(AC)) T+ (1−^pA(BC|AD)) (1−p∗(AC)) Q
Notice that, if the choices of the players are truly independent, then ^pA(Bx|Ay)=^pA(Bx), yielding the traditional analysis of the last section.
Let's now explore the case where A and B are clones who will make exactly the same choice, so ^pA(Bx|Ax)=1 and ^pA(Bx|A¬x)=0. Let's now update our E[UA(G)] calculation.
E[UA(G)]=1 p∗(AC) R+ (1−1) p∗(AC) S+ 0 (1−p∗(AC)) T+ (1−0) (1−p∗(AC)) Q
Which we simplify into:
E[UA(G)]= p∗(AC) R+(1−p∗(AC)) Q
E[UA(G)]= p∗(AC) R+Q−Q p∗(AC)
E[UA(G)]=p∗(AC)(R−Q)+Q
Because Reward > Quarrel, E[UA(G)] is maximized when p∗(AC)=1! This means A will cooperate and receive E[UA(G)]=Reward instead of Quarrel, and similarly for B. This seems trivial, but if A and B can make the same decision, they can obtain the globally optimal solution instead of the strictly worse typical Nash equilibrium which is reached when they believe they act independently.
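As a quick sanity check, here's the clone-case expected utility evaluated at the two pure strategies, using the toy payoffs from the opening example (a minimal sketch; the function name is mine):

```python
# Clone case: Bob's choice deterministically mirrors Alice's.
R, Q = -1, -2   # Reward and Quarrel payoffs from the toy example

def expected_utility_clone(p_ac):
    """E[U_A] = p*R + (1-p)*Q when Bob is guaranteed to match Alice."""
    return p_ac * R + (1 - p_ac) * Q

print(expected_utility_clone(1.0))  # -1.0: always cooperate -> Reward
print(expected_utility_clone(0.0))  # -2.0: always defect   -> Quarrel
```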
Everything is Correlated
Alright, if Alice knows that Bob will make exactly the same decision, she can fearlessly cooperate, knowing for certain that he will too, dissolving the dilemma. So now, let's reincorporate uncertainty back into our model.
Let's say Alice and Bob are not clones but are instead siblings with similar family values, experiences, dispositions, genes, neurological structures, etc. We know that many, many, many things in the world are correlated in some way. The dependency structure of the world is extremely interconnected. There's also an evolutionary argument for this that I won't go into in much detail. But when picking the prisoners, we're actually sampling players who have played an iterated version of this game many, many times. If you believe that humans are not purely rational agents and make decisions based on instinct, then there's likely to be downstream similarity in their policies. And if you believe that the agents are purely rational, they also have exactly the same game-theoretic payouts, so their GTO play should be exactly the same. It would be extremely improbable for Alice and Bob's choices here to be perfectly independent. So let's model some dependence structure.
(I will note that there's a semi-compelling argument suggesting that we don't even need to introduce more subtle forms of dependence. Because they have identical payoff structures, the game-theoretic optimal play should be exactly the same. Given that their GTO play should be exactly the same and mixed strategies degenerate to pure strategies, we get correlation 1, and so they should both cooperate 100% of the time. But the real world is messy, so we proceed with slightly more subtlety.)
Causal Modeling 101
To better understand the dependencies and correlations between Alice and Bob's choices, we can frame the problem in the light of causal modeling a la Judea Pearl. In causal modeling, we represent the dependencies between variables using a directed acyclic graph (DAG), where nodes represent variables and directed edges represent causal relationships.
In the traditional analysis of the Prisoner's Dilemma, Alice and Bob are assumed to make their decisions independently. However, in reality, their decisions will definitely be influenced by common upstream factors. For example, Alice and Bob may have similar upbringings, values, experiences, and expectations about loyalty and cooperation that influence their decision-making. And if that's not compelling, they also have access to similar information about the situation, and have exactly the same payout structure, so there's a strong case that they will end up making similar decisions.
We can represent these upstream factors with a variable V. The causal model is a simple DAG with two edges, V → A and V → B.
Here, V is a common cause that affects both Alice's and Bob's decisions. This creates a dependency between A and B.
Given the upstream factor V, the choices of Alice and Bob could be conditionally independent: conditioning on V d-separates A and B in the graph. If we do not observe V, however, A and B are d-connected through their common cause and are therefore dependent.
Mathematically, this can be expressed as:
P(A,B)=∑vP(A∣V=v)P(B∣V=v)P(V=v)
This shows that the joint probability of A and B depends on their conditional probabilities given V and the distribution of V.
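To make the dependence concrete, here's a small sketch with an invented two-valued V (the specific distributions are illustrative assumptions of mine, not numbers from the essay): A and B are conditionally independent given V, yet their joint probability of cooperating is not the product of their marginals.

```python
# A toy common-cause model: V is a shared "disposition" variable.
P_V = {"trusting": 0.5, "cynical": 0.5}
P_A_coop_given_V = {"trusting": 0.9, "cynical": 0.2}
P_B_coop_given_V = {"trusting": 0.9, "cynical": 0.2}

def joint(a_coop: bool, b_coop: bool) -> float:
    """P(A, B) = sum_v P(A|v) P(B|v) P(v)."""
    total = 0.0
    for v, pv in P_V.items():
        pa = P_A_coop_given_V[v] if a_coop else 1 - P_A_coop_given_V[v]
        pb = P_B_coop_given_V[v] if b_coop else 1 - P_B_coop_given_V[v]
        total += pa * pb * pv
    return total

p_a = joint(True, True) + joint(True, False)   # marginal P(A cooperates)
p_b = joint(True, True) + joint(False, True)   # marginal P(B cooperates)
print(joint(True, True))   # 0.425
print(p_a * p_b)           # 0.3025 -> the joint is not the product of the marginals
```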
Independent Case (Optional)
Suppose A and B were truly independent. The causal model we might associate with this hypothesis has two disconnected pieces: V1 → A and V2 → B.
Here V1 and V2 are separate upstream factors that affect Alice's and Bob's decisions, respectively. This creates a scenario where A and B are conditionally independent given V1 and V2. Mathematically, this can be expressed as:
P(A,B)=∑v1∑v2P(A∣V1=v1)P(B∣V2=v2)P(V1=v1)P(V2=v2)
This equation shows that the joint probability of A and B depends on their individual probabilities given V1 and V2, and the distribution of V1 and V2.
To prove the typical independence equation P(A,B)=P(A)P(B) from the more complicated seeming equation above, we first find the marginal probabilities P(A) and P(B) and sum over the respective upstream variables:
P(A)=∑v1P(A∣V1=v1)P(V1=v1)
P(B)=∑v2P(B∣V2=v2)P(V2=v2)
Now we multiply P(A) and P(B) together:
P(A)P(B)=(∑v1P(A∣V1=v1)P(V1=v1))(∑v2P(B∣V2=v2)P(V2=v2))
Then we expand the product of the sums and we notice that this expression is the same as the expression for the joint probability P(A,B) obtained above:
P(A)P(B)=∑v1∑v2P(A∣V1=v1)P(V1=v1)P(B∣V2=v2)P(V2=v2)
Because the two expressions are identical, we've shown that P(A,B)=P(A)P(B) assuming that Alice's and Bob's decisions are influenced by separate, independent upstream variables V1 and V2. With this extra machinery in hand, we proceed into the heart of our analysis.
Incorporating Dependence into Expected Utility
In The Clone Case for Cooperation section, I make the assumption that Alice and Bob deterministically make exactly the same choice. This time, let's relax that assumption and suppose that because they're rational agents presented with identical payoff matrices, they'll conclude the same mixed strategies to be optimal. Though we're not sure what their cooperate/defect probabilities will be yet, we just suppose that their GTO policies will be the same given the same payout structures.
To model this formulation, we go back to the start and split A's expected utility based on her two possible actions.
E[UA(G)]=E[UA(G|AC)]p∗(AC)+E[UA(G|AD)]p∗(AD)
Then, we split each of those cases for each of B's possible choices.
E[UA(G)]=E[UA(G|AC,BC)]^pA(BC|AC)p∗(AC)+E[UA(G|AC,BD)]^pA(BD|AC)p∗(AC)+E[UA(G|AD,BC)]^pA(BC|AD)p∗(AD)+E[UA(G|AD,BD)]^pA(BD|AD)p∗(AD)
Now we substitute in Q,R,S,T and use p∗(AC)+p∗(AD)=1 and ^pA(BC|Ax)+^pA(BD|Ax)=1.
E[UA(G)]=^pA(BC|AC) p∗(AC) R+ (1−^pA(BC|AC)) p∗(AC) S+ ^pA(BC|AD) (1−p∗(AC)) T+ (1−^pA(BC|AD)) (1−p∗(AC)) Q
Now we need thoughtful ways of modeling ^pA(BC|AC) and ^pA(BC|AD). For Alice's approximation of Bob's probability of cooperating, ^pA(BC|AC), we can suppose that this matches Alice's own probability of cooperating p∗(AC) because of our assumption that rational agents will come to the same mixed strategy policies given identical payoff matrices. So, ^pA(BC|AC)=p∗(AC) and ^pA(BC|AD)=1−^pA(BD|AD)=1−p∗(AD)=p∗(AC).
This turns our E[UA(G)] into:
E[UA(G)]=p∗(AC) p∗(AC) R+ (1−p∗(AC)) p∗(AC) S+ p∗(AC) (1−p∗(AC)) T+ (1−p∗(AC)) (1−p∗(AC)) Q
Which simplifies to:
E[UA(G)]=p∗(AC)² R+ p∗(AC) (1−p∗(AC)) S+ p∗(AC) (1−p∗(AC)) T+ (1−p∗(AC))² Q
E[UA(G)]=p∗(AC)² R+ (p∗(AC)−p∗(AC)²) S+ (p∗(AC)−p∗(AC)²) T+ (1−2p∗(AC)+p∗(AC)²) Q
E[UA(G)]=R p∗(AC)²+ S p∗(AC)−S p∗(AC)²+ T p∗(AC)−T p∗(AC)²+ Q−2 Q p∗(AC)+Q p∗(AC)²
E[UA(G)]=(R−S−T+Q) p∗(AC)²+(S+T−2Q) p∗(AC)+Q
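Before optimizing analytically in the next section, we can eyeball the maximizer of this quadratic with a quick grid search (the function names are mine; the second payoff set is an illustrative assumption chosen to produce an interior optimum, while the first is the toy example from earlier):

```python
def expected_utility_same_strategy(p, R, S, T, Q):
    """E[U_A] when Alice expects Bob to play the same mixed strategy p."""
    return (R - S - T + Q) * p**2 + (S + T - 2 * Q) * p + Q

def best_p(R, S, T, Q, steps=1000):
    """Grid-search the cooperation probability that maximizes E[U_A]."""
    grid = (i / steps for i in range(steps + 1))
    return max(grid, key=lambda p: expected_utility_same_strategy(p, R, S, T, Q))

print(best_p(R=-1, S=-3, T=0, Q=-2))  # 1.0  (toy payoffs from earlier)
print(best_p(R=3, S=0, T=8, Q=1))     # 0.75 (assumed payoffs with an interior optimum)
```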
Optimizing (Optional)
All that's left to do from here is to find the p∗(AC) which maximizes E[UA(G)] subject to the constraints that 0≤p∗(AC)≤1 and S<Q<R<T. (The cases get slightly hairy so this section is skippable.)
Unlike the previous case, where E[UA(G)] was a linear function of p∗(AC), this time we actually can optimize via the derivative method suggested earlier. To do so, we compute:
dE[UA(G)]/dp∗(AC)=2(Q+R−S−T) p∗(AC)+(S+T−2Q)=0
Because E[UA(G)] is a second-degree polynomial in p∗(AC), the parabola it sweeps out will have its vertex at p∗(AC)=(2Q−S−T)/(2(Q+R−S−T)). While there's still some tidying up to do to determine which cases apply to which solutions, excitingly, the possible solution set is constrained to {0, (2Q−S−T)/(2(Q+R−S−T)), 1}. This vertex won't always be the maximum, and sometimes it will fall outside the valid probability range [0,1]. In either of those cases, the optimal p∗(AC) will be either 0 or 1.
We now describe the cases where the vertex is a valid maximum. For the vertex to be a maximum, the parabola must be concave, and a parabola y=ax²+bx+c is concave when a<0; so we need Q+R−S−T<0, i.e., Q+R<S+T. For the vertex to be valid, it must also lie within the domain 0≤p∗(AC)≤1. First, for 0≤p∗(AC)=(2Q−S−T)/(2(Q+R−S−T)): because the parabola is concave, the denominator Q+R−S−T is negative, so we need 0≥2Q−S−T, i.e., S+T≥2Q. Second, for (2Q−S−T)/(2(Q+R−S−T))≤1: multiplying both sides by the negative quantity 2(Q+R−S−T) flips the inequality, giving 2Q−S−T≥2(Q+R−S−T), i.e., S+T≥2R. (Since Q<R, the condition S+T≥2R in fact implies both Q+R<S+T and S+T≥2Q.) We've found that, when Q+R<S+T and 2R≤S+T, the optimal cooperation fraction for Alice will be the vertex p∗(AC)=1/2⋅(Q−R)/(Q+R−S−T)+1/2.
We still have to sort out when p∗(AC) will be 1 or 0, and there's a bit of tidying up to do when the denominator 2(R−S−T+Q) is zero (which actually was the case in the first example in the Traditional Analysis section, where R=−1, S=−3, T=0, and Q=−2). To handle this indeterminate case where R−S−T+Q=0, we take the second derivative of E[UA(G)]: d²E[UA(G)]/d(p∗(AC))²=2(Q+R−S−T). Because Q+R−S−T=0 in this case, the second derivative is zero, implying that the expected utility curve is linear.
So, when R−S−T+Q=0, E[UA(G)]=(S+T−2Q) p∗(AC)+Q, where E[UA(G)] is a linear function of p∗(AC). In this edge case, we find that:
- if S+T−2Q>0, E[UA(G)] increases with p∗(AC), so p∗(AC)=1;
- if S+T−2Q<0, it decreases, so p∗(AC)=0;
- if S+T−2Q=0, E[UA(G)] is constant and any p∗(AC)∈[0,1] is optimal.
For the other cases, where we're deciding between 0 and 1, we just need to compare E[UA(G)] at p∗(AC)=0 with its value at p∗(AC)=1. That is, we compare (R−S−T+Q)⋅0²+(S+T−2Q)⋅0+Q=Q with (R−S−T+Q)⋅1²+(S+T−2Q)⋅1+Q=R. So, in these cases, if R>Q then p∗(AC)=1, and if Q>R then p∗(AC)=0.
Putting this all together, we can represent the complete policy as one piecewise expression:
p∗(AC) =
- 1/2 ⋅ (Q−R)/(Q+R−S−T) + 1/2,  if Q+R < S+T and 2R ≤ S+T,
- 1,  if Q+R < S+T and S+T < 2R,
- 0,  if Q+R = S+T and S+T−2Q < 0,
- any value in [0,1],  if Q+R = S+T and S+T−2Q = 0,
- 1,  if Q+R = S+T and S+T−2Q > 0,
- 0,  if Q+R > S+T and R < Q,
- 1/2,  if Q+R > S+T and R = Q,
- 1,  if Q+R > S+T and R > Q.
At last, if we make the assumption that A and B will conclude the same mixed strategy fractions to be optimal, given that they have identical payoffs, we now have a complete policy for determining their choices of optimal p∗(AC) and p∗(BC).
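Here's a minimal sketch of that policy in code (the function names are mine), restricted to payoffs satisfying S<Q<R<T and cross-checked against a brute-force grid search over p∗(AC):

```python
import random

def optimal_coop_fraction(R, S, T, Q):
    """Closed-form p*(A_C) from the piecewise policy above, assuming S < Q < R < T."""
    if S + T >= 2 * R:
        # Concave case with the vertex inside [0, 1]: mix according to the vertex formula.
        return 0.5 * (Q - R) / (Q + R - S - T) + 0.5
    # In every other case under this ordering, E[U_A] is maximized at full cooperation.
    return 1.0

def brute_force(R, S, T, Q, steps=10_000):
    """Grid-search the p in [0, 1] maximizing the quadratic expected utility."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: (R - S - T + Q) * p**2 + (S + T - 2 * Q) * p + Q)

random.seed(0)
for _ in range(5):
    S, Q, R, T = sorted(random.uniform(-5, 5) for _ in range(4))
    assert abs(optimal_coop_fraction(R, S, T, Q) - brute_force(R, S, T, Q)) < 1e-3
print("closed form matches brute force on random payoffs")
```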
Interpretation
Let's interpret this solution a bit to make sure it accords with our intuitions. We start with the most complicated-looking expression, p∗(AC)=1/2⋅(Q−R)/(Q+R−S−T)+1/2, which applies when Q+R<S+T and 2R≤S+T.
This equation indicates that the optimal cooperation probability depends on the difference between the Quarrel and Reward payoffs, normalized by the difference between Alice's total payoff when the players perform the same action (Q+R) and when they perform different actions (S+T). The ratio (Q−R)/(Q+R−S−T) lies between −1 and 1, which bounds p∗(AC) between 0 and 1, as we would expect of a probability.
We continue by examining an important boundary condition: when Q=R, the ratio is zero and p∗(AC)=1/2; as R improves relative to Q, p∗(AC) rises above 1/2 toward 1, and as Q improves relative to R, it falls below 1/2 toward 0.
This result seems reasonable because if the quarrel payoff Q is better than the reward for mutual cooperation R, Alice is more likely to defect. Conversely, if R is better, Alice is more likely to cooperate.
Let's now try to apply this policy to our initial toy example. To see if our solution holds, we test the case when R=−1, S=−3, T=0, Q=−2.
        BC          BD
AC    (−1, −1)    (−3, 0)
AD    (0, −3)     (−2, −2)
First, we calculate R−S−T+Q=−1−(−3)−0+(−2)=0. Then, because R−S−T+Q=0, we use the indeterminate case formula. We calculate S+T−2Q=−3+0−2(−2)=1. Finally, since S+T−2Q>0, the optimal policy is p∗(AC)=1. This aligns with the interpretation that Alice should always cooperate in this particular setup.
We can also confirm this computationally.
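Here's a minimal sketch of such a check (my own reconstruction, not the post's original code), sweeping p∗(AC) over a few values under the shared-strategy assumption:

```python
R, S, T, Q = -1, -3, 0, -2          # toy payoffs from the example above

def expected_utility(p):
    """E[U_A] under the shared-mixed-strategy assumption."""
    return (R - S - T + Q) * p**2 + (S + T - 2 * Q) * p + Q

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, expected_utility(p))
# The utility rises linearly from -2 at p=0 to -1 at p=1, so p*(A_C) = 1.
```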
Overall, the analysis and derived equations seem to correctly capture the dependence and provide a policy for determining the optimal cooperation probability in this correlated Prisoner's Dilemma. From what I can tell based on the tests I've performed, the solution works as intended and aligns with my theoretical expectations from the payoffs.
Incorporating Correlation into Expected Utility
If you're not convinced by the case where we assume that two rational agents make exactly the same decision given identical payoffs or by the extended case where the two rational agents converge on the same optimal mixed strategy policy, I present a third generalization which introduces noise into our model.
To model the dependence structure between the agents' decisions alongside some natural uncertainty, let's introduce a correlation parameter ρ that captures the degree of correlation between Alice's and Bob's choices due to the common upstream factors, V. We can think of ρ as a measure of how likely it is that if Alice cooperates, Bob will also cooperate, and similarly for defection. This parameter will range from -1 to 1, where:
- ρ=1 means perfect positive correlation: Bob's choice always matches Alice's (the clone case);
- ρ=0 means no correlation: Bob cooperates with probability 1/2 regardless of Alice's choice;
- ρ=−1 means perfect negative correlation: Bob always makes the opposite choice.
With this in mind, let's redefine the probabilities ^pA(BC|AC), ^pA(BD|AC), ^pA(BC|AD), and ^pA(BD|AD) to incorporate ρ.
We assume the following relationships for the conditional probabilities based on ρ:
^pA(BC|AC)=(1+ρ)/2
^pA(BD|AC)=(1−ρ)/2
^pA(BC|AD)=(1−ρ)/2
^pA(BD|AD)=(1+ρ)/2
These expressions ensure that the probabilities are consistent with the correlation parameter.
We can now substitute these probabilities into our expected utility equation for Alice from earlier:
E[UA(G)]=(E[UA(G|AC,BC)]^pA(BC|AC)+E[UA(G|AC,BD)]^pA(BD|AC))p∗(AC)+(E[UA(G|AD,BC)]^pA(BC|AD)+E[UA(G|AD,BD)]^pA(BD|AD))p∗(AD)
E[UA(G)]=(E[UA(G|AC,BC)]⋅(1+ρ)/2+E[UA(G|AC,BD)]⋅(1−ρ)/2)p∗(AC)+(E[UA(G|AD,BC)]⋅(1−ρ)/2+E[UA(G|AD,BD)]⋅(1+ρ)/2)p∗(AD)
Now, substitute the payoffs R,S,T,Q and use p∗(AC)+p∗(AD)=1:
E[UA(G)]=(R⋅(1+ρ)/2+S⋅(1−ρ)/2)p∗(AC)+(T⋅(1−ρ)/2+Q⋅(1+ρ)/2)(1−p∗(AC))
Simplify further:
E[UA(G)]=((R(1+ρ)+S(1−ρ))/2)p∗(AC)+((T(1−ρ)+Q(1+ρ))/2)(1−p∗(AC))
E[UA(G)]=((R+S+ρ(R−S))/2)p∗(AC)+((T+Q+ρ(Q−T))/2)(1−p∗(AC))
Combine terms:
E[UA(G)]=((R+S+ρ(R−S))/2)p∗(AC)+((T+Q+ρ(Q−T))/2)−((T+Q+ρ(Q−T))/2)p∗(AC)
E[UA(G)]=p∗(AC)((R+S+ρ(R−S)−T−Q−ρ(Q−T))/2)+((T+Q+ρ(Q−T))/2)
Simplify the coefficient of p∗(AC):
E[UA(G)]=p∗(AC)((R+S−T−Q+ρ(R−S−Q+T))/2)+((T+Q+ρ(Q−T))/2)
Because we're looking to maximize E[UA(G)], the optimal strategy for Alice depends on the sign of the coefficient of p∗(AC):
If (R+S−T−Q+ρ(R−S−Q+T))/2>0, then p∗(AC)=1. This implies that, if ρ>(Q−R−S+T)/(−Q+R−S+T), then p∗(AC)=1.
Similarly, if (R+S−T−Q+ρ(R−S−Q+T))/2<0, then p∗(AC)=0. This implies that, if ρ<(Q−R−S+T)/(−Q+R−S+T), then p∗(AC)=0.
In this section, we've incorporated the correlation parameter ρ into our analysis to allow us to capture the dependency between Alice's and Bob's choices caused by common upstream factors. We've found that this extension suggests that the degree of correlation between the actions of the two players significantly influences their optimal strategies.
p∗(AC) =
- 1, if ρ > (Q−R−S+T)/(−Q+R−S+T),
- 0, if ρ < (Q−R−S+T)/(−Q+R−S+T).
Specifically, there is a correlation threshold for ρ above which cooperation becomes the optimal strategy for both players.
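A tiny helper makes the threshold concrete (the function names are mine; the payoffs are the toy example from earlier, for which the threshold works out to ρ = 0.5):

```python
def rho_threshold(R, S, T, Q):
    """Correlation level above which cooperating maximizes Alice's expected utility."""
    return (Q - R - S + T) / (-Q + R - S + T)

def optimal_action(rho, R, S, T, Q):
    return "cooperate" if rho > rho_threshold(R, S, T, Q) else "defect"

R, S, T, Q = -1, -3, 0, -2                 # toy payoffs from the opening example
print(rho_threshold(R, S, T, Q))           # 0.5
print(optimal_action(0.9, R, S, T, Q))     # cooperate
print(optimal_action(0.2, R, S, T, Q))     # defect
```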
The Prisoners' Do-Calculus
In the final most general case, we formalize the problem in the language of Pearl's do-calculus. Instead of assuming a ρ-model for correlation, we maintain a non-parametric perspective in this final stage.
We start again by defining the expected utility of the game to Alice:
E[UA(G)]=E[UA(G∣do(AC))]p∗(AC)+E[UA(G∣do(AD))]p∗(AD)
This time the expected utility of the game to Alice is the sum of the expected utility of the game to her given that she chooses to cooperate and the expected utility of the game to Alice given that she chooses to defect, weighted by the optimal probability that she does either. We now decompose the new terms E[UA(G∣do(AC))] and E[UA(G∣do(AD))].
E[UA(G∣do(AC))]=E[UA((G∣BC)∣do(AC))] ^pA(BC∣do(AC))+E[UA((G∣BD)∣do(AC))] ^pA(BD∣do(AC))
E[UA(G∣do(AD))]=E[UA((G∣BC)∣do(AD))] ^pA(BC∣do(AD))+E[UA((G∣BD)∣do(AD))] ^pA(BD∣do(AD))
That is,
E[UA(G∣do(AC))]=R⋅^pA(BC∣do(AC))+S⋅^pA(BD∣do(AC))
E[UA(G∣do(AD))]=T⋅^pA(BC∣do(AD))+Q⋅^pA(BD∣do(AD))
And,
E[UA(G)]=R⋅^pA(BC∣do(AC))p∗(AC)+ S⋅^pA(BD∣do(AC))p∗(AC)+ T⋅^pA(BC∣do(AD))p∗(AD)+ Q⋅^pA(BD∣do(AD))p∗(AD)
Given the dependence DAG earlier, we can incorporate our upstream variable V into our refined expected utility calculation using do-calculus.
^pA(BC∣do(AC))=∑vP(BC∣AC,v)P(v)
^pA(BD∣do(AC))=1−∑vP(BC∣AC,v)P(v)
^pA(BC∣do(AD))=∑vP(BC∣AD,v)P(v)
^pA(BD∣do(AD))=1−∑vP(BC∣AD,v)P(v)
Substituting these probabilities back into Alice's expected utility formula, we get:
E[UA(G)]=(R∑vP(BC∣AC,v)P(v)+S(1−∑vP(BC∣AC,v)P(v)))p∗(AC)+(T∑vP(BC∣AD,v)P(v)+Q(1−∑vP(BC∣AD,v)P(v)))p∗(AD)
Simplifying the expression:
E[UA(G)]=(R∑vP(BC∣AC,v)P(v)+S(1−∑vP(BC∣AC,v)P(v)))p∗(AC)+(T∑vP(BC∣AD,v)P(v)+Q(1−∑vP(BC∣AD,v)P(v)))p∗(AD)
E[UA(G)]=(R∑vP(BC∣AC,v)P(v)+S−S∑vP(BC∣AC,v)P(v))p∗(AC)+(T∑vP(BC∣AD,v)P(v)+Q−Q∑vP(BC∣AD,v)P(v))p∗(AD)
E[UA(G)]=((R−S)∑vP(BC∣AC,v)P(v)+S)p∗(AC)+((T−Q)∑vP(BC∣AD,v)P(v)+Q)(1−p∗(AC))
E[UA(G)]=(R−S)∑vP(BC∣AC,v)P(v)p∗(AC)+Sp∗(AC)+(T−Q)∑vP(BC∣AD,v)P(v)−(T−Q)∑vP(BC∣AD,v)P(v)p∗(AC)+Q−Qp∗(AC)
E[UA(G)]=((R−S)∑vP(BC∣AC,v)P(v)−(T−Q)∑vP(BC∣AD,v)P(v)+S−Q)p∗(AC)+(T−Q)∑vP(BC∣AD,v)P(v)+Q
This expression now shows the expected utility of the game to Alice in terms of her probability of cooperating p∗(AC) and incorporates the dependencies through the common factor V. We find that the expected utility of the game E[UA(G)] is maximized when:
(R−S)∑vP(BC∣AC,v)P(v)−(T−Q)∑vP(BC∣AD,v)P(v)+S−Q>0
(R−S)∑vP(BC∣AC,v)P(v)+S>(T−Q)∑vP(BC∣AD,v)P(v)+Q
If the inequality above is satisfied, then Alice should cooperate; otherwise, she should defect.
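Here's a toy instantiation of this criterion (the upstream variable V and its distributions are illustrative assumptions of mine, not values from the essay; the payoffs are the toy example from earlier):

```python
# Toy payoffs from the opening example.
R, S, T, Q = -1.0, -3.0, 0.0, -2.0

# Assumed model of the upstream variable V and of Alice's beliefs about Bob's
# cooperation probability given her own action and V.
P_V = {"similar_disposition": 0.8, "different_disposition": 0.2}
P_BC_given_AC_V = {"similar_disposition": 0.9, "different_disposition": 0.5}
P_BC_given_AD_V = {"similar_disposition": 0.1, "different_disposition": 0.5}

# hat{p}_A(B_C | do(A_C)) and hat{p}_A(B_C | do(A_D)), marginalized over V.
p_bc_do_ac = sum(P_BC_given_AC_V[v] * pv for v, pv in P_V.items())
p_bc_do_ad = sum(P_BC_given_AD_V[v] * pv for v, pv in P_V.items())

utility_cooperate = (R - S) * p_bc_do_ac + S   # left-hand side of the inequality
utility_defect = (T - Q) * p_bc_do_ad + Q      # right-hand side of the inequality
print(utility_cooperate, utility_defect)        # ≈ -1.36 vs ≈ -1.64 -> cooperate
```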
Confirmation
To test this most general form, we map it down to each of the particular cases we explored in the preceding sections.
Traditional Independent Case
In the first solution we assume independence between Alice's and Bob's choices. This means P(BC∣AC,v)=P(BC) and P(BC∣AD,v)=P(BC).
We substitute these probabilities into our general utility function, yielding:
E[UA(G)]=R⋅p∗(BC) p∗(AC)+ S⋅p∗(BD) p∗(AC)+ T⋅p∗(BC) p∗(AD)+ Q⋅p∗(BD) p∗(AD)
This simplifies down to:
E[UA(G)]=p∗(AC)[(R−S−T+Q)P(BC)+S−Q]+(T−Q)P(BC)+Q
This is exactly the expression we found for Alice's expected utility in that section, so pat on the back, onto the next section.
Clone Case for Cooperation
In the second model, Alice and Bob are assumed to make exactly the same decision, implying perfect correlation. Thus, P(BC∣AC,v)=1 and P(BC∣AD,v)=0. This simplifies the general solution as follows:
E[UA(G)]=R⋅1⋅p∗(AC)+ S⋅0⋅p∗(AC)+ T⋅0⋅p∗(AD)+ Q⋅1⋅p∗(AD)
E[UA(G)]=p∗(AC) R+(1−p∗(AC)) Q=p∗(AC)(R−Q)+Q
Because R>Q, E[UA(G)] is maximized when p∗(AC)=1, matching the expression we derived in the clone section. Just as we anticipated, Alice should always cooperate, aligning with our expectations again.
Identical Mixed Strategies
In the next model, we extend and suggest that Alice and Bob converge on the same mixed strategy. That is, ^pA(BC∣AC,v)=p∗(AC) and ^pA(BC∣AD,v)=p∗(AC) (equivalently, ^pA(BD∣AD,v)=p∗(AD)). This gives us:
E[UA(G)]=R⋅p∗(AC) p∗(AC)+ S⋅p∗(AD) p∗(AC)+ T⋅p∗(AC) p∗(AD)+ Q⋅p∗(AD) p∗(AD)
This is the same result we expected as well, confirming our general causal model.
Correlated Mixed Policy
In the final model, we introduced correlation parameter ρ to incorporate possible noise.
We make the following definitions:
^pA(BC∣AC,v)=(1+ρ)/2
^pA(BD∣AC,v)=(1−ρ)/2
^pA(BC∣AD,v)=(1−ρ)/2
^pA(BD∣AD,v)=(1+ρ)/2
From this, the corresponding ρ-model falls out.
E[UA(G)]=R⋅(1+ρ)/2⋅p∗(AC)+S⋅(1−ρ)/2⋅p∗(AC)+T⋅(1−ρ)/2⋅(1−p∗(AC))+Q⋅(1+ρ)/2⋅(1−p∗(AC))
The language of do-calculus is quite appealing for its generality, and, in my eyes, it models the reality of the problem at hand extremely well.
Conclusion
This framework implies that, despite typical Game Theory 101 lectures suggesting 100% selfish play in one-shot games like this, if you believe that you think sufficiently similarly to your counterparty, the game theory optimal policy would be to faithfully cooperate and both walk away free.
A broader conclusion of this analysis is to always consider higher-level dependencies, even when two things seem to be perfectly independent. We should also include all possible information into our decisions, and this includes the decisions themselves. In this Prisoner's Dilemma situation, we can consider this in some sense to be the opposite of "adverse selection". Oftentimes, in adversarial games, conditional on performing an action, you're less happy. There are many possible reasons for this, but a common reason is that you're acting in a market and your ability to perform an action means that no one else in the market was willing to do what you just did, which gives you information that you might be making a mistake. This could be hiring a candidate where your ability to hire them means no other firm wanted them at the price you're willing to pay. Or this could be winning an auction where your winning means that no one else was willing to pay the price that you paid. However, in the case we have at hand, there's a sort of "advantageous selection". This is because, if you choose to cooperate, now you get extra information that someone else in a similar position also likely cooperated as well, which is quite a pleasant type of selection.
For those who remain compelled by the original argument that, "still, if you know your opponent is going to cooperate, you're better off defecting to serve no time instead of 1 year", I might share some of my ideas on why I suspect that the recursive expected utility maximization decision algorithm that I present is superior in a follow-up essay. But for now, I'll just say that while the Causal Decision Theorists are stuck in their (defect, defect) "rational" Nash equilibria, my co-conspirators and I will be faithfully cooperating and walking free into the warm sun of Pareto Optimality.
Acknowledgements: Thank you Alok Singh, Baran Cimen, Keegan McNamara, and Paul Schmidt-Engelbertz for reading through the draft and providing helpful guidance.