It occurred to me one day that the standard visualization of the Prisoner's Dilemma is fake.
The core of the Prisoner's Dilemma is this symmetric payoff matrix:
| | 1: C | 1: D |
| 2: C | (3, 3) | (5, 0) |
| 2: D | (0, 5) | (2, 2) |
Player 1 and Player 2 can each choose C or D. 1 and 2's utility for the final outcome is given by the first and second number in the pair, respectively. For reasons that will become apparent, "C" stands for "cooperate" and "D" stands for "defect".
Observe that a player in this game (regarding themselves as the first player) has this preference ordering over outcomes: (D, C) > (C, C) > (D, D) > (C, D).
D, it would seem, dominates C: If the other player chooses C, you prefer (D, C) to (C, C); and if the other player chooses D, you prefer (D, D) to (C, D). So you wisely choose D, and as the payoff table is symmetric, the other player likewise chooses D.
If only you'd both been less wise! You both prefer (C, C) to (D, D). That is, you both prefer mutual cooperation to mutual defection.
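The dominance argument can be checked mechanically. What follows is only a minimal sketch: the dictionary layout and the name `strictly_dominates` are my own illustrative choices, and the numbers are copied from the matrix above.

```python
# Payoffs as (player 1 utility, player 2 utility),
# keyed by (player 1's move, player 2's move).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("D", "C"): (5, 0),
    ("C", "D"): (0, 5),
    ("D", "D"): (2, 2),
}


def strictly_dominates(a, b, payoffs=PAYOFFS):
    """True if player 1 does strictly better playing a than b against every reply."""
    return all(
        payoffs[(a, other)][0] > payoffs[(b, other)][0]
        for other in ("C", "D")
    )


print(strictly_dominates("D", "C"))  # True

# Yet both players get more from (C, C) than from (D, D).
both_prefer_cc = all(
    PAYOFFS[("C", "C")][i] > PAYOFFS[("D", "D")][i] for i in (0, 1)
)
print(both_prefer_cc)  # True
```

Both checks pass at once, which is the whole dilemma: D wins in each column taken separately, and (D, D) still loses to (C, C) for both players.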
The Prisoner's Dilemma is one of the great foundational issues in decision theory, and enormous volumes of material have been written about it. Which makes it an audacious assertion of mine, that the usual way of visualizing the Prisoner's Dilemma has a severe flaw, at least if you happen to be human.
The classic visualization of the Prisoner's Dilemma is as follows: you are a criminal, and you and your confederate in crime have both been captured by the authorities.
Independently, without communicating, and without being able to change your mind afterward, you have to decide whether to give testimony against your confederate (D) or remain silent (C).
Both of you, right now, are facing one-year prison sentences; testifying (D) takes one year off your prison sentence, and adds two years to your confederate's sentence.
Or maybe you and some stranger are, only once, and without knowing the other player's history, or finding out who the player was afterward, deciding whether to play C or D, for a payoff in dollars matching the standard chart.
And, oh yes - in the classic visualization you're supposed to pretend that you're entirely selfish, that you don't care about your confederate criminal, or the player in the other room.
It's this last specification that makes the classic visualization, in my view, fake.
You can't avoid hindsight bias by instructing a jury to pretend not to know the real outcome of a set of events. And without a complicated effort backed up by considerable knowledge, a neurologically intact human being cannot pretend to be genuinely, truly selfish.
We're born with a sense of fairness, honor, empathy, sympathy, and even altruism - the result of our ancestors adapting to play the iterated Prisoner's Dilemma. We don't really, truly, absolutely and entirely prefer (D, C) to (C, C), though we may entirely prefer (C, C) to (D, D) and (D, D) to (C, D). The thought of our confederate spending three years in prison, does not entirely fail to move us.
In that locked cell where we play a simple game under the supervision of economic psychologists, we are not entirely and absolutely unsympathetic toward the stranger who might cooperate. We aren't entirely happy to think that we might defect and the stranger cooperate, getting five dollars while the stranger gets nothing.
We fixate instinctively on the (C, C) outcome and search for ways to argue that it should be the mutual decision: "How can we ensure mutual cooperation?" is the instinctive thought. Not "How can I trick the other player into playing C while I play D for the maximum payoff?"
For someone with an impulse toward altruism, or honor, or fairness, the Prisoner's Dilemma doesn't really have the critical payoff matrix - whatever the financial payoff to individuals. (C, C) > (D, C), and the key question is whether the other player sees it the same way.
And no, you can't instruct people being initially introduced to game theory to pretend they're completely selfish - any more than you can instruct human beings being introduced to anthropomorphism to pretend they're expected paperclip maximizers.
To construct the True Prisoner's Dilemma, the situation has to be something like this:
Player 1: Human beings, Friendly AI, or other humane intelligence.
Player 2: UnFriendly AI, or an alien that only cares about sorting pebbles.
Let's suppose that four billion human beings - not the whole human species, but a significant part of it - are currently progressing through a fatal disease that can only be cured by substance S.
However, substance S can only be produced by working with a paperclip maximizer from another dimension - substance S can also be used to produce paperclips. The paperclip maximizer only cares about the number of paperclips in its own universe, not in ours, so we can't offer to produce or threaten to destroy paperclips here. We have never interacted with the paperclip maximizer before, and will never interact with it again.
Both humanity and the paperclip maximizer will get a single chance to seize some additional part of substance S for themselves, just before the dimensional nexus collapses; but the seizure process destroys some of substance S.
The payoff matrix is as follows:
| | 1: C | 1: D |
| 2: C | (+2 billion lives, +2 paperclips) | (+3 billion lives, +0 paperclips) |
| 2: D | (+0 lives, +3 paperclips) | (+1 billion lives, +1 paperclip) |
I've chosen this payoff matrix to produce a sense of indignation at the thought that the paperclip maximizer wants to trade off billions of human lives against a couple of paperclips. Clearly the paperclip maximizer should just let us have all of substance S; but a paperclip maximizer doesn't do what it should, it just maximizes paperclips.
In this case, we really do prefer the outcome (D, C) to the outcome (C, C), leaving aside the actions that produced it. We would vastly rather live in a universe where 3 billion humans were cured of their disease and no paperclips were produced, rather than sacrifice a billion human lives to produce 2 paperclips. It doesn't seem right to cooperate, in a case like this. It doesn't even seem fair - so great a sacrifice by us, for so little gain by the paperclip maximizer? And let us specify that the paperclip-agent experiences no pain or pleasure - it just outputs actions that steer its universe to contain more paperclips. The paperclip-agent will experience no pleasure at gaining paperclips, no hurt from losing paperclips, and no painful sense of betrayal if we betray it.
What do you do then? Do you cooperate when you really, definitely, truly and absolutely do want the highest reward you can get, and you don't care a tiny bit by comparison about what happens to the other player? When it seems right to defect even if the other player cooperates?
That's what the payoff matrix for the true Prisoner's Dilemma looks like - a situation where (D, C) seems righter than (C, C).
But all the rest of the logic - everything about what happens if both agents think that way, and both agents defect - is the same. For the paperclip maximizer cares as little about human deaths, or human pain, or a human sense of betrayal, as we care about paperclips. Yet we both prefer (C, C) to (D, D).
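To see that "all the rest of the logic" really is the same, here is a rough sketch that checks the ordering (D, C) > (C, C) > (D, D) > (C, D) for each player in the matrix above. The function name and data layout are illustrative assumptions. Lives are counted in billions and paperclips in single units; the two are never compared with each other, since each player is only checked against its own column of numbers.

```python
# (billions of human lives saved, paperclips gained),
# keyed by (humanity's move, paperclip maximizer's move).
TRUE_PD = {
    ("C", "C"): (2, 2),
    ("D", "C"): (3, 0),
    ("C", "D"): (0, 3),
    ("D", "D"): (1, 1),
}


def has_dilemma_structure(payoffs):
    """Check that each player ranks (D, C) > (C, C) > (D, D) > (C, D),
    where each outcome is written as (own move, other's move)."""
    for player in (0, 1):
        def u(mine, theirs):
            # Re-key the outcome so that player 0's move always comes first.
            key = (mine, theirs) if player == 0 else (theirs, mine)
            return payoffs[key][player]

        if not (u("D", "C") > u("C", "C") > u("D", "D") > u("C", "D")):
            return False
    return True


print(has_dilemma_structure(TRUE_PD))  # True
```

The check says nothing about whether a billion lives and a paperclip are comparable. It only confirms that each side, measured in what it cares about, faces the same ordering as in the standard matrix.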
So if you've ever prided yourself on cooperating in the Prisoner's Dilemma... or questioned the verdict of classical game theory that the "rational" choice is to defect... then what do you say to the True Prisoner's Dilemma above?
Apparently 3 comments will be needed.
[9:51]
But, before you choose, you are told how the benefactor decided how much money to put in the opaque box- and that brings us to the science fiction part of the scenario. What the benefactor did was take a very detailed local snapshot of the state of the universe a few minutes ago, and then run a faster-than-real-time simulation to predict with high accuracy whether you would take both boxes, or just the opaque box. A million dollars was put in the opaque box if and only if you were predicted to take only the opaque box.
[10:22]
Admittedly the super-predictability here is a bit physically implausible, and goes beyond a mere stipulation of determinism. Still, at least it's not logically impossible- provided that the simulator can avoid having to simulate itself, and thus avoid a potential infinite regress. (The opaque box's opacity is important in that regard: it serves to insulate you from being effectively informed of the outcome of the simulation itself, so the simulation doesn't have to predict its own outcome in order to predict what you are going to have to do.) So, let's indulge the super-predictability assumption, and see what comes from it. Eventually, I'm going to argue that the real world is at least deterministic enough and predictable enough that some of the science-fiction conclusions do carry over to reality.
[11:12]
So, you now face the following choice: if you take the opaque box alone, then you can expect with high reliability that the simulation predicted you would do so, and so you expect to find a million dollars in the opaque box. If, on the other hand, you take both boxes, then you should expect the simulation to have predicted that, and you expect to find nothing in the opaque box. If and only if you expect to take the opaque box alone, you expect to walk away with a million dollars. Of course, your choice does not cause the opaque box's content to be one way or the other; according to the stipulated rules, the box content already is what it is, and will not change from that regardless of what choice you make.
[11:49]
But we can apply the lesson from the handraising example- the lesson that you sometimes have a choice about things your action does not change or cause- because you can reason about what would be the case if, perhaps contrary to fact, you were to take a particular hypothetical action. And, in fact, we can regard Newcomb's Problem as essentially harnessing the same past predicate consequence as in the handraising example- namely, if and only if you take just the opaque box, then the past state of the universe, at the time the predictor took the detailed snapshot, was such that that state leads, by physical laws, to your taking just the opaque box. And, if and only if the past state was thus, the predictor would predict you taking the opaque box alone, and so a million dollars would be in the opaque box, making that the more lucrative choice. And it's certainly the case that people who would make the opaque box choice have a much higher expected gain from such encounters than those who take both boxes.
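The expected-gain comparison in that last sentence can be put in numbers. This is only a sketch under assumptions that are not in the transcript excerpt: the second box is taken to hold 1,000 dollars (the figure in the usual statement of Newcomb's Problem), and "high accuracy" is given a hypothetical value of 0.99. The function name is likewise illustrative.

```python
def expected_gain(one_box, accuracy, big=1_000_000, small=1_000):
    """Average payout in dollars for someone who always makes the given choice.

    accuracy is the (assumed) probability that the predictor's simulation
    matches the actual choice; small is the (assumed) content of the second box.
    """
    if one_box:
        # The million is there whenever the prediction was right.
        return accuracy * big
    # Two-boxers always get the small box, and the million only on a misprediction.
    return small + (1 - accuracy) * big


accuracy = 0.99  # hypothetical stand-in for "high accuracy"
print(expected_gain(True, accuracy))
print(expected_gain(False, accuracy))
```

With these assumed amounts the one-box disposition comes out ahead for any accuracy above about 0.5005, so the conclusion does not depend on the predictor being nearly perfect.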
[12:47]
Still, it's possible to maintain, as many people do, that taking both boxes is the rational choice, and that the situation is essentially rigged to punish you for your predicted rationality- much as if a written exam were perversely graded to give points only for wrong answers. From that perspective, taking both boxes is the rational choice, even if you are then left to lament your unfortunate rationality. But that perspective is, at the very least, highly suspect in a situation where, unlike the hapless exam-taker, you are informed of the rigging and can take it into account when choosing your action, as you can in Newcomb's Problem.
[13:31]
And, by the way, it's possible to consider an even stranger variant of Newcomb's Problem, in which both boxes are transparent. In this version, the predictor runs a simulation that tentatively presumes that you'll see a million dollars in the larger box. You'll be presented with a million dollars in the box for real if and only if the simulation shows that you would then take the million dollar box alone. If, instead, the simulation predicts that you would take both boxes if you see a million dollars in the larger box, then the larger box is left empty when presented for real.
[14:12]
So, let's suppose you're confronted with this scenario, and you do see a million dollars in the box when it's presented for real. Even though the million dollars is already there, and you see it, and it can't change, nonetheless I claim that you should still take the million dollar box alone. Because, if you were to take both boxes instead, contrary to what in fact must be the case in order for you to be in this situation in the first place, then, also contrary to what is in fact the case, the box would not contain a million dollars- even though in fact it does, and even though that can't change! The same two-part reasoning applies as before: if and only if you were to take just the larger box, then the state of the universe at the time the predictor takes a snapshot must have been such that you would take just that box if you were to see a million dollars in that box. If and only if the past state had been thus, the Predictor would have put a million dollars in the box.
[15:07]
Now, the prescription here to take just the larger box is more shockingly counter-intuitive than I can hope to decisively argue for in a brief talk, but, do at least note that a person who agrees that it is rational to take just the one box here does fare better than a person who believes otherwise, who would never be presented with a million dollars in the first place. If we do, at least tentatively, accept some of this analysis, for the sake of argument to see what follows from it, then we can move on now to another toy scenario, which dispenses with the determinism and super-prediction assumptions and arguably has more direct real world applicability.
[15:42]
That scenario is the famous prisoner's dilemma. The prisoner's dilemma is a two player game in which both players make their moves simultaneously and independently, with no communication until both moves have been made. A move consists of writing down either the word “cooperate” or “defect.” The payoff matrix is as shown:
[insert image of Prisoner's Dilemma payoffs]
If both players choose cooperate, they both receive 99 dollars. If both defect, they both get 1 dollar. But if one player cooperates and the other defects, then the one who cooperates gets nothing, and the one who defects gets 100 dollars.
[16:25]
Crucially, we stipulate that each player cares only about maximizing her own expected payoff, and that the payoff in this particular instance of the game is the only goal, with no effect on anything else, including any subsequent rounds of the game, that could further complicate the decision. Let's assume that both players are smart and knowledgeable enough to find the correct solution to this problem and to act accordingly. What I mean by the correct answer is the one that maximizes that player's expected payoff. Let's further assume that each player is aware of the other player's competence, and their knowledge of their own competence, and so on. So then, what is the right answer that they'll both find?
[17:07]
On the face of it, it would be nice if both players were to cooperate, and receive close to the maximum payoff. But if I'm one of the players, I might reason that my opponent's move is causally independent of mine: regardless of what I do, my opponent's move is either to cooperate or not. If my opponent cooperates, I receive a dollar more if I defect than if I cooperate- $100 vs $99. Likewise if my opponent defects: I get a dollar more if I defect than if I cooperate, in this case 1 dollar vs nothing. So, in either case, regardless of what move my opponent makes, my defecting causes me to get one dollar more than my cooperating causes me to get, which seemingly makes defecting the right choice. Defecting is indeed the choice that's endorsed by standard game theory. And of course my opponent can reason similarly.
[18:06]
So, if we're both convinced that we only have a choice about what we can cause, then we're both rationally compelled to defect, leaving us both much poorer than if we both cooperated. So, here again, an exclusively causal view of what we have a choice about leads to us having to lament that our unfortunate rationality keeps a much better outcome out of our reach. But we can arrive at a better outcome if we keep in mind the lesson from Newcomb's problem or even the handraising example that it can make sense to act for the sake of what would be the case if you so acted, even if your action does not cause it to be the case. Even without the help of any super-predictors in this scenario, I can reason that if I, acting by stipulation as a correct solver of this problem, were to choose to cooperate, then that's what correct solvers of this problem do in such situations, and in particular that's what my opponent, as a correct solver of this problem, does too.
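The two lines of reasoning can be laid side by side in a short sketch. The payoffs are the dollar amounts given earlier in the talk; the function names, and the modeling of "both players are correct solvers" as "both make the same move", are my own simplifying assumptions.

```python
# My payoff in dollars, keyed by (my move, opponent's move).
PAYOFF = {
    ("C", "C"): 99,
    ("C", "D"): 0,
    ("D", "C"): 100,
    ("D", "D"): 1,
}


def causal_choice():
    """Hold the opponent's move fixed and pick the move that is better in every case."""
    for mine, alt in (("D", "C"), ("C", "D")):
        if all(PAYOFF[(mine, theirs)] > PAYOFF[(alt, theirs)] for theirs in "CD"):
            return mine
    return None  # no dominant move exists


def mirrored_choice():
    """Assume the opponent, as an equally correct solver, makes the same move I do,
    so only the outcomes (C, C) and (D, D) are on the table."""
    return max("CD", key=lambda move: PAYOFF[(move, move)])


print(causal_choice(), PAYOFF[("D", "D")])
print(mirrored_choice(), PAYOFF[("C", "C")])
```

The first function compares payoffs within each column of the matrix. The second compares only the diagonal, which is what the argument above amounts to once my opponent's move is taken to track mine.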
[19:05]
Similarly, if I were to figure out that defecting is correct, that's what I can expect my opponent to do. This is similar to my ability to predict what your answer to adding a given pair of numbers would be: I can merely add the numbers myself, and, given our mutual competence at addition, solve the problem. The universe is predictable enough that we routinely, and fairly accurately, make such predictions about one another. From this viewpoint, I can reason that, if I were to cooperate or not, then my opponent would make the corresponding choice- if...