Now that we know a bit about derivatives, it's time to use them to find dominant strategies and Nash equilibria. It helps if the reader is familiar with Nash equilibria already.
Prisoner's dilemma
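The payoff matrix of the Prisoner's dilemma can be as follows (each cell lists Prisoner 1's payoff first, then Prisoner 2's, in dollars):

| | Prisoner 2: Cooperate | Prisoner 2: Defect |
|---|---|---|
| Prisoner 1: Cooperate | 20, 20 | 0, 30 |
| Prisoner 1: Defect | 30, 0 | 10, 10 |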
We can see that the payoff for Prisoner 1 depends on her own action (Cooperate/Defect) but also on the action of Prisoner 2. Therefore, the payoff function for Prisoner 1 is a multivariable function: $V_1(a_1, a_2)$, where $a_n$ is the action of Prisoner $n$ (and $n \in \{1, 2\}$). Let's say $a_n = 0$ when the action of Prisoner $n$ is Cooperate, and $a_n = 1$ for Defect, so $a_n \in \{0, 1\}$. Then $V_1(a_1, a_2) = 20 - 20a_2 + 10a_1$, and crucially, the partial derivative of $V_1$ with respect to $a_1$ is $V'_{1,a_1}(a_1, a_2) = 10$. Because $V_1$ is linear in $a_1$, this derivative equals the payoff difference $V_1(1, a_2) - V_1(0, a_2)$. So for Defect ($a_1 = 1$), Prisoner 1's payoff will be 10 higher than for Cooperate ($a_1 = 0$), as can be confirmed in the table. Note that $a_2$ doesn't show up in $V'_{1,a_1}(a_1, a_2)$: Defect gives \$10 more for Prisoner 1 regardless of what Prisoner 2 does, which makes Defect a dominant strategy. Don't get me wrong: Prisoner 1's payoff certainly does depend on what Prisoner 2 does. The point is that no matter what Prisoner 2 does, Prisoner 1's payoff will be \$10 higher when she (Prisoner 1) defects, and that's what's reflected in $V'_{1,a_1}(a_1, a_2) = 10$.
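To make the dominance argument concrete, here is a minimal Python sketch that evaluates the payoff function from the text at every action pair. The names `v1`, `gains` and `defect_is_dominant` are just illustrative choices, not part of the original post.

```python
# Payoff function for Prisoner 1 as given in the text: 0 = Cooperate, 1 = Defect.
def v1(a1, a2):
    return 20 - 20 * a2 + 10 * a1

# Gain from switching Cooperate -> Defect, for each possible action of Prisoner 2.
gains = {a2: v1(1, a2) - v1(0, a2) for a2 in (0, 1)}
print(gains)  # {0: 10, 1: 10}

# Defect is dominant if the gain is positive whatever Prisoner 2 does.
defect_is_dominant = all(g > 0 for g in gains.values())
print(defect_is_dominant)  # True
```

The gain is 10 in both cases, which matches the constant partial derivative.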
Since the payoff matrix is symmetrical, $V_2(a_1, a_2) = 20 - 20a_1 + 10a_2$ and $V'_{2,a_2}(a_1, a_2) = 10$. Prisoner 2 therefore also has a dominant strategy: Defect. The Prisoner's dilemma, then, has a Nash equilibrium: when both prisoners defect. With the partial derivatives, we demonstrated that when both prisoners defect, neither prisoner can do better by changing her action to Cooperate. If, for example, Prisoner 1 were to do this, then $a_1$ would go from 1 to 0, and since $V'_{1,a_1}(a_1, a_2) > 0$, that would lower $V_1$ (regardless of $a_2$). By symmetry, the same is true for Prisoner 2.
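The same conclusion can be checked by brute force. The sketch below, with the hypothetical helper `is_nash`, tests every action pair for profitable unilateral deviations using the two payoff functions from the text; it is a sanity check, not a general equilibrium solver.

```python
from itertools import product

ACTIONS = (0, 1)  # 0 = Cooperate, 1 = Defect

def v1(a1, a2):
    return 20 - 20 * a2 + 10 * a1

def v2(a1, a2):
    return 20 - 20 * a1 + 10 * a2

def is_nash(a1, a2):
    # Neither player can gain by deviating while the other's action stays fixed.
    p1_ok = all(v1(a1, a2) >= v1(d, a2) for d in ACTIONS)
    p2_ok = all(v2(a1, a2) >= v2(a1, d) for d in ACTIONS)
    return p1_ok and p2_ok

equilibria = [(a1, a2) for a1, a2 in product(ACTIONS, repeat=2) if is_nash(a1, a2)]
print(equilibria)  # [(1, 1)]
```

Only (Defect, Defect) survives the check, in line with the derivative argument.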
Nonlinear payoff functions
In the Prisoner's dilemma, the payoffs of both players (prisoners) can be modelled by linear payoff functions. What if the payoffs are nonlinear?
Let's say $V_1(a_1, a_2) = -a_1^2$ and $V_2(a_1, a_2) = -a_2^2 + a_2$, where the actions can now be any real number. Then $V'_{1,a_1}(a_1, a_2) = -2a_1$ and $V'_{2,a_2}(a_1, a_2) = -2a_2 + 1$. A Nash equilibrium is a point where no player can do better by choosing another action given the action of the other player; therefore, $V_1(a_1, a_2)$ should be maximized with respect to $a_1$ while keeping $a_2$ constant, whereas $V_2(a_1, a_2)$ should be maximized with respect to $a_2$ while keeping $a_1$ constant. If $V_1(a_1, a_2)$ has a peak value with respect to $a_1$, $V'_{1,a_1}(a_1, a_2) = -2a_1$ must be 0 at that point. $V'_{1,a_1}(a_1, a_2) = -2a_1 = 0$ gives $a_1 = \frac{0}{-2} = 0$. So $a_1 = 0$ could represent a peak, but also a valley, since $V'_{1,a_1}(a_1, a_2)$ would be 0 in both. The second derivative settles this: since $V'_{1,a_1}(a_1, a_2) = -2a_1$, we get $V''_{1,a_1}(a_1, a_2) = -2 < 0$. So $a_1 = 0$ represents a local maximum of $V_1(a_1, a_2)$ (when $a_2$ is held constant)! Since $V_1(a_1, a_2)$ is quadratic in $a_1$ with a negative leading coefficient, we can be sure this local maximum is the global maximum too (so there are no values for $a_1$ for which $V_1(a_1, a_2)$ is higher when $a_2$ is held constant).
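For readers who like to check derivatives numerically, here is a rough sketch using central finite differences. The helpers `d_da1` and `d2_da1` and the step size `h` are assumptions made for illustration; the results are approximations, so they are compared with a tolerance rather than exactly.

```python
def v1(a1, a2):
    # Payoff from the text; a2 is accepted but does not affect the result.
    return -a1 ** 2

def d_da1(f, a1, a2, h=1e-3):
    # Central-difference estimate of the first partial derivative w.r.t. a1.
    return (f(a1 + h, a2) - f(a1 - h, a2)) / (2 * h)

def d2_da1(f, a1, a2, h=1e-3):
    # Central-difference estimate of the second partial derivative w.r.t. a1.
    return (f(a1 + h, a2) - 2 * f(a1, a2) + f(a1 - h, a2)) / h ** 2

slope = d_da1(v1, 0.0, 3.0)       # a2 = 3.0 is an arbitrary fixed value
curvature = d2_da1(v1, 0.0, 3.0)

print(abs(slope) < 1e-6)          # first-order condition holds at a1 = 0
print(abs(curvature + 2) < 1e-6)  # second derivative is about -2, so a maximum
```

A zero slope together with negative curvature is exactly the peak condition described above.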
$V'_{2,a_2}(a_1, a_2) = -2a_2 + 1 = 0$ gives $2a_2 = 1$ and $a_2 = \frac{1}{2}$. $V''_{2,a_2}(a_1, a_2) = -2 < 0$, so $a_2 = \frac{1}{2}$ again represents a local maximum. $V_2(a_1, a_2)$ is quadratic, so this is a global maximum as well.
So $a_1 = 0$ represents a global maximum for $V_1$ (for a constant $a_2$), and $a_2 = \frac{1}{2}$ represents a global maximum for $V_2$ (for a constant $a_1$). Neither maximizer depends on the other player's action. That means $a_1 = 0$ is a dominant strategy for player 1, $a_2 = \frac{1}{2}$ is a dominant strategy for player 2, and we have a Nash equilibrium in $(a_1 = 0, a_2 = \frac{1}{2})$.
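A coarse grid search gives the same picture. The sketch below assumes, purely for illustration, that actions are restricted to a grid from -2 to 2 in steps of 0.01; `GRID`, `best_response_1` and `best_response_2` are hypothetical names.

```python
def v1(a1, a2):
    return -a1 ** 2

def v2(a1, a2):
    return -a2 ** 2 + a2

# Assumed action grid: -2.00, -1.99, ..., 2.00 (an illustrative restriction).
GRID = [i / 100 for i in range(-200, 201)]

def best_response_1(a2):
    # Player 1's payoff-maximizing action on the grid, with a2 held fixed.
    return max(GRID, key=lambda a1: v1(a1, a2))

def best_response_2(a1):
    # Player 2's payoff-maximizing action on the grid, with a1 held fixed.
    return max(GRID, key=lambda a2: v2(a1, a2))

# Collect the best responses against every opposing action on the grid.
br1 = {best_response_1(a2) for a2 in GRID}
br2 = {best_response_2(a1) for a1 in GRID}
print(br1, br2)  # {0.0} {0.5}
```

Each set contains a single value, so each player's best action is the same whatever the other does, which is what makes these strategies dominant.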
Making things a bit more complicated
Let's now define $V_1(a_1, a_2) = -a_1^2 \cdot a_2$ and $V_2(a_1, a_2) = -(a_2 - 1)^2$. Then for $V'_{1,a_1}(a_1, a_2) = -2a_2 \cdot a_1 = 0$, we have $a_2 = 0 \lor a_1 = 0$. $V''_{1,a_1}(a_1, a_2) = -2a_2$, which is negative when $a_2 > 0$.
For $V'_{2,a_2} = -2a_2 + 2 = 0$ we have $a_2 = 1$. $V''_{2,a_2} = -2 < 0$, so this is a local maximum, and also the global one, since $V_2(a_1, a_2)$ is quadratic. For $a_2 = 1$, $V'_{1,a_1} = -2 \cdot 1 \cdot a_1 = -2a_1$. Setting this to 0 gives $a_1 = 0$ (which we found earlier as well). And since $a_2 = 1 > 0$ and therefore $V''_{1,a_1}(a_1, a_2) < 0$, we now have a local maximum for $V_1(a_1, a_2)$! For a constant $a_2$, $V_1(a_1, a_2)$ is quadratic, so this is the global maximum as well. We found a Nash equilibrium: $(a_1 = 0, a_2 = 1)$.
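As a sanity check on this last equilibrium, the sketch below tests candidate points for profitable unilateral deviations on an assumed grid of actions from -2 to 2. The grid, the tolerance `tol` and the helper `is_nash` are illustrative assumptions, so this is a numerical spot check rather than a proof.

```python
def v1(a1, a2):
    return -a1 ** 2 * a2

def v2(a1, a2):
    return -(a2 - 1) ** 2

# Assumed deviation grid: -2.00, -1.99, ..., 2.00.
GRID = [i / 100 for i in range(-200, 201)]

def is_nash(a1, a2, tol=1e-12):
    # No deviation on the grid improves a player's payoff while the
    # other player's action is held fixed.
    p1_ok = all(v1(d, a2) <= v1(a1, a2) + tol for d in GRID)
    p2_ok = all(v2(a1, d) <= v2(a1, a2) + tol for d in GRID)
    return p1_ok and p2_ok

print(is_nash(0.0, 1.0))  # True
print(is_nash(0.5, 1.0))  # False: player 1 gains by moving to a1 = 0
```

Note that, unlike in the previous example, the sign of the second derivative here depends on $a_2$, so $a_1 = 0$ is a best response to $a_2 = 1$ rather than a strategy that is best against every $a_2$.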