Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

The True Prisoner's Dilemma

Post author: Eliezer_Yudkowsky 03 September 2008 09:34PM

It occurred to me one day that the standard visualization of the Prisoner's Dilemma is fake.

The core of the Prisoner's Dilemma is this symmetric payoff matrix:

          1: C      1: D
2: C      (3, 3)    (5, 0)
2: D      (0, 5)    (2, 2)

Player 1 and Player 2 can each choose C or D. The first number in each pair is Player 1's utility for that outcome, and the second is Player 2's. For reasons that will become apparent, "C" stands for "cooperate" and "D" stands for "defect".

Observe that a player in this game (regarding themselves as the first player) has this preference ordering over outcomes:  (D, C) > (C, C) > (D, D) > (C, D).

D, it would seem, dominates C:  If the other player chooses C, you prefer (D, C) to (C, C); and if the other player chooses D, you prefer (D, D) to (C, D).  So you wisely choose D, and as the payoff table is symmetric, the other player likewise chooses D.

If only you'd both been less wise!  You both prefer (C, C) to (D, D).  That is, you both prefer mutual cooperation to mutual defection.
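Both halves of the argument, that D dominates C and that (C, C) still beats (D, D) for both players, can be checked mechanically. Below is a minimal sketch in Python, assuming the matrix entries are utilities; `payoff`, `u1` and `u2` are illustrative names, not anything from the post.

```python
# Standard Prisoner's Dilemma from the matrix above.
# Keys are (player_1_move, player_2_move); values are (u1, u2).
payoff = {
    ("C", "C"): (3, 3),
    ("D", "C"): (5, 0),
    ("C", "D"): (0, 5),
    ("D", "D"): (2, 2),
}

def u1(mine, theirs):
    """Player 1's utility when Player 1 plays `mine` and Player 2 plays `theirs`."""
    return payoff[(mine, theirs)][0]

def u2(mine, theirs):
    """Player 2's utility when Player 2 plays `mine` and Player 1 plays `theirs`."""
    return payoff[(theirs, mine)][1]

# D strictly dominates C for each player: better whatever the other does.
d_dominates = all(u1("D", other) > u1("C", other) for other in "CD")
d_dominates_2 = all(u2("D", other) > u2("C", other) for other in "CD")

# Yet both players strictly prefer mutual cooperation to mutual defection.
cc_beats_dd = all(a > b for a, b in zip(payoff[("C", "C")], payoff[("D", "D")]))

print(d_dominates, d_dominates_2, cc_beats_dd)  # True True True
```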

The Prisoner's Dilemma is one of the great foundational issues in decision theory, and enormous volumes of material have been written about it. That makes it audacious of me to assert that the usual way of visualizing the Prisoner's Dilemma has a severe flaw, at least if you happen to be human.

The classic visualization of the Prisoner's Dilemma is as follows: you are a criminal, and you and your confederate in crime have both been captured by the authorities.

Independently, without communicating, and without being able to change your mind afterward, you have to decide whether to give testimony against your confederate (D) or remain silent (C).

Both of you, right now, are facing one-year prison sentences; testifying (D) takes one year off your prison sentence, and adds two years to your confederate's sentence.
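The sentencing rule above produces exactly the preference ordering (D, C) > (C, C) > (D, D) > (C, D), provided each prisoner cares only about their own years. A small Python sketch of that arithmetic, where `BASE_YEARS` and `sentence` are hypothetical names chosen for illustration:

```python
BASE_YEARS = 1  # both prisoners start out facing one year

def sentence(my_move, their_move):
    """Years I serve under the story's rule for testifying (D)."""
    years = BASE_YEARS
    if my_move == "D":
        years -= 1  # testifying takes a year off my own sentence
    if their_move == "D":
        years += 2  # my confederate's testimony adds two years to mine
    return years

# Fewer years is better, so sort outcomes from best to worst for me.
outcomes = [(m, t) for m in "CD" for t in "CD"]
ranking = sorted(outcomes, key=lambda o: sentence(*o))
print([(o, sentence(*o)) for o in ranking])
# [(('D', 'C'), 0), (('C', 'C'), 1), (('D', 'D'), 2), (('C', 'D'), 3)]
```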

Or perhaps you and a stranger are deciding, just once, whether to play C or D for a payoff in dollars matching the standard chart, without knowing the other player's history and without ever finding out who the other player was.

And, oh yes - in the classic visualization you're supposed to pretend that you're entirely selfish, that you don't care about your confederate criminal, or the player in the other room.

It's this last specification that makes the classic visualization, in my view, fake.

You can't avoid hindsight bias by instructing a jury to pretend not to know the real outcome of a set of events.  And without a complicated effort backed up by considerable knowledge, a neurologically intact human being cannot pretend to be genuinely, truly selfish.

We're born with a sense of fairness, honor, empathy, sympathy, and even altruism - the result of our ancestors adapting to play the iterated Prisoner's Dilemma. We don't really, truly, absolutely and entirely prefer (D, C) to (C, C), though we may entirely prefer (C, C) to (D, D) and (D, D) to (C, D). The thought of our confederate spending three years in prison does not entirely fail to move us.

In that locked cell where we play a simple game under the supervision of economic psychologists, we are not entirely and absolutely unsympathetic toward the stranger who might cooperate. We aren't entirely happy to think that we might defect and the stranger cooperate, getting five dollars while the stranger gets nothing.

We fixate instinctively on the (C, C) outcome and search for ways to argue that it should be the mutual decision:  "How can we ensure mutual cooperation?" is the instinctive thought.  Not "How can I trick the other player into playing C while I play D for the maximum payoff?"

For someone with an impulse toward altruism, or honor, or fairness, the Prisoner's Dilemma doesn't really have the critical payoff matrix - whatever the financial payoff to individuals.  (C, C) > (D, C), and the key question is whether the other player sees it the same way.

And no, you can't instruct people being initially introduced to game theory to pretend they're completely selfish - any more than you can instruct human beings being introduced to anthropomorphism to pretend they're expected paperclip maximizers.

To construct the True Prisoner's Dilemma, the situation has to be something like this:

Player 1:  Human beings, Friendly AI, or other humane intelligence.

Player 2:  UnFriendly AI, or an alien that only cares about sorting pebbles.

Let's suppose that four billion human beings - not the whole human species, but a significant part of it - are currently progressing through a fatal disease that can only be cured by substance S.

However, substance S can only be produced by working with a paperclip maximizer from another dimension - substance S can also be used to produce paperclips.  The paperclip maximizer only cares about the number of paperclips in its own universe, not in ours, so we can't offer to produce or threaten to destroy paperclips here.  We have never interacted with the paperclip maximizer before, and will never interact with it again.

Both humanity and the paperclip maximizer will get a single chance to seize some additional part of substance S for themselves, just before the dimensional nexus collapses; but the seizure process destroys some of substance S.

The payoff matrix is as follows:

          1: C                                                  1: D
2: C      (2 billion human lives saved, 2 paperclips gained)    (+3 billion lives, +0 paperclips)
2: D      (+0 lives, +3 paperclips)                             (+1 billion lives, +1 paperclip)

I've chosen this payoff matrix to produce a sense of indignation at the thought that the paperclip maximizer wants to trade off billions of human lives against a couple of paperclips.  Clearly the paperclip maximizer should just let us have all of substance S; but a paperclip maximizer doesn't do what it should, it just maximizes paperclips.

In this case, we really do prefer the outcome (D, C) to the outcome (C, C), leaving aside the actions that produced it. We would vastly prefer to live in a universe where 3 billion humans were cured of their disease and no paperclips were produced, rather than sacrifice a billion human lives to produce 2 paperclips. It doesn't seem right to cooperate, in a case like this. It doesn't even seem fair - so great a sacrifice by us, for so little gain by the paperclip maximizer? And let us specify that the paperclip-agent experiences no pain or pleasure - it just outputs actions that steer its universe to contain more paperclips. The paperclip-agent will experience no pleasure at gaining paperclips, no hurt from losing paperclips, and no painful sense of betrayal if we betray it.

What do you do then?  Do you cooperate when you really, definitely, truly and absolutely do want the highest reward you can get, and you don't care a tiny bit by comparison about what happens to the other player?  When it seems right to defect even if the other player cooperates?

That's what the payoff matrix for the true Prisoner's Dilemma looks like - a situation where (D, C) seems righter than (C, C).

But all the rest of the logic - everything about what happens if both agents think that way, and both agents defect - is the same.  For the paperclip maximizer cares as little about human deaths, or human pain, or a human sense of betrayal, as we care about paperclips.  Yet we both prefer (C, C) to (D, D).
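The same checks can be applied to the paperclip matrix, assuming each side's utility is simply its own entry (billions of lives for us, paperclips for the maximizer). This Python sketch, with illustrative names `true_pd`, `lives` and `clips`, shows that switching from C to D always gains us a billion lives and costs the maximizer two paperclips, that the maximizer's D dominates too, and that both sides still prefer (C, C) to (D, D):

```python
# Keys are (humanity_move, paperclipper_move).
# Values are (billions of human lives saved, paperclips gained).
true_pd = {
    ("C", "C"): (2, 2),
    ("D", "C"): (3, 0),
    ("C", "D"): (0, 3),
    ("D", "D"): (1, 1),
}
MOVES = ("C", "D")

def lives(h, p):
    """What humanity cares about: billions of lives saved."""
    return true_pd[(h, p)][0]

def clips(h, p):
    """What the paperclip maximizer cares about: paperclips gained."""
    return true_pd[(h, p)][1]

# Effect of humanity switching from C to D, for each paperclipper move:
# (change in billions of lives, change in paperclips).
deltas = {p: (lives("D", p) - lives("C", p), clips("D", p) - clips("C", p))
          for p in MOVES}
print(deltas)  # {'C': (1, -2), 'D': (1, -2)}

# The paperclipper's D likewise dominates its C.
clipper_prefers_d = all(clips(h, "D") > clips(h, "C") for h in MOVES)

# Yet each side gets more of what it values at (C, C) than at (D, D).
both_prefer_cc = lives("C", "C") > lives("D", "D") and clips("C", "C") > clips("D", "D")
print(clipper_prefers_d, both_prefer_cc)  # True True
```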

So if you've ever prided yourself on cooperating in the Prisoner's Dilemma... or questioned the verdict of classical game theory that the "rational" choice is to defect... then what do you say to the True Prisoner's Dilemma above?

Comments (88)

Comment author: pdf23ds 03 September 2008 09:54:22PM 13 points

Those must be pretty big paperclips.

Comment author: stcredzero 29 May 2010 11:29:33PM 69 points

I suspect that the True Prisoner's Dilemma played itself out in the Portuguese and Spanish conquest of Mesoamerica. Some natives were said to ask, "Do they eat gold?" They couldn't comprehend why someone would want a shiny decorative material so badly, they'd kill for it. The Spanish were Shiny Decorative Material maximizers.

Comment author: Omegaile 02 April 2013 03:01:51PM 5 points

That's a really insightful comment!

But I should correct one point: you are only talking about the Spanish conquest, not the Portuguese, since 1) Mesoamerica was not conquered by the Portuguese; 2) the Portuguese possessions in America (AKA Brazil) had very little gold and silver, which was only discovered much later, when the territory was already under Portuguese rule.

Comment author: Allan_Crossman 03 September 2008 10:01:41PM 3 points

I agree: Defect!

> Clearly the paperclip maximizer should just let us have all of substance S; but a paperclip maximizer doesn't do what it should, it just maximizes paperclips.

I sometimes feel that nitpicking is the only contribution I'm competent to make around here, so... here you endorsed Steven's formulation of what "should" means; a formulation which doesn't allow you to apply the word to paperclip maximizers.

Comment author: Paul_Mohr 03 September 2008 10:03:17PM 1 point

Very nice representation of the problem. I can't help but think there is another level that would make this even more clear, though this is good by itself.

Comment author: prunes 03 September 2008 10:05:48PM 5 points


Another assumption made about the Prisoner's Dilemma, which I do not see you allude to, is that the payoffs account not only for a financial reward, time spent in prison, etc., but for every other possible motivating factor in the decision-making process. A person's utility related to the decision of whether to cooperate or defect will be a function not only of years spent in prison or lives saved but ALSO of guilt/empathy. Presenting the numbers within the cells as actual quantities doesn't present the whole picture.

Comment author: PrimIntelekt 04 February 2010 08:17:43PM 1 point

Important point.

Let's assume that your utility function (which is identical to theirs) simply weights and adds your payoff and theirs; that is, if you get X and they get Y, your function is U(X,Y) = aX+bY. In that case, working backwards from the utilities in the table, and subject to the constraint that a+b=1, here are the payoffs:

a/b=2: (you care twice as much about yourself)
(3,3) (-5,10)
(10,-5) (1,1)

a/b=3: (you care three times as much about yourself)
(3,3) (-2.5,7.5)
(7.5,-2.5) (1,1)

a=b: (you care equally about both)
Impossible. With both people being unselfish utilitarians, the utilities can never differ based on the same outcome.

b=0: (selfish)
The table as given in the post

I think the most important result is the case a=b: the dilemma makes no sense at all if the players weight both payoffs equally, because you can never produce asymmetrical utilities.
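The inversion described in this comment can be sketched in Python. Since a + b = 1, X + Y equals U_you + U_them, and X - Y equals (U_you - U_them) / (a - b), which is undefined exactly when a = b. The function name `raw_payoffs` is hypothetical, and the utilities are keyed by (your move, their move), which is the row layout the comment appears to use. With the post's (2, 2) mutual-defection entry, the recovered payoff for that cell comes out as (2, 2); the comment's (1, 1) would correspond to a matrix whose mutual-defection entry is 1.

```python
from fractions import Fraction

# Utilities from the post's matrix, keyed by (your move, their move).
utilities = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (2, 2),
}

def raw_payoffs(u_you, u_them, ratio):
    """Recover raw payoffs (X, Y) from utilities, assuming
    U_you = aX + bY, U_them = bX + aY, a + b = 1 and a/b = ratio."""
    ratio = Fraction(ratio)
    b = 1 / (ratio + 1)
    a = 1 - b
    if a == b:
        raise ValueError("a = b: asymmetric utilities cannot be produced")
    total = u_you + u_them                      # X + Y, because a + b = 1
    diff = Fraction(u_you - u_them) / (a - b)   # X - Y
    return ((total + diff) / 2, (total - diff) / 2)

for ratio in (2, 3):
    table = {k: tuple(float(v) for v in raw_payoffs(*u, ratio))
             for k, u in utilities.items()}
    print(f"a/b={ratio}:", table)
```

Calling `raw_payoffs(3, 3, 1)` raises the `ValueError`, matching the comment's observation that equal weighting cannot yield asymmetric utilities.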

EDIT: My newbishness is showing. How do I format this better? Is it HTML?

Comment author: wnoise 04 February 2010 08:24:35PM 2 points

It's not HTML, but "markdown" which gets turned into HTML.


Comment author: PrimIntelekt 05 February 2010 04:16:53AM 0 points

Thank you!

Comment author: pdf23ds 03 September 2008 10:06:02PM 0 points

Allan, I think you meant to link to this comment.

Comment author: Eliezer_Yudkowsky 03 September 2008 10:06:57PM 20 points

> I agree: Defect!

I didn't say I would defect.

Comment author: orthonormal 17 January 2011 11:00:48PM 12 points

> I agree: Defect!

> I didn't say I would defect.

By the way, this was an extremely clever move: instead of announcing your departure from CDT in the post, you waited for the right prompt in the comments and dropped it as a shocking twist. Well crafted!

Comment author: Allan_Crossman 03 September 2008 10:14:43PM 0 points

Damnit, Eliezer nitpicked my nitpicking. :)

Comment author: Aron 03 September 2008 10:27:51PM 3 points

It's likely deliberate that prisoners were selected in the visualization to imply a relative lack of unselfish motivations.

Comment author: denis_bider 03 September 2008 10:33:44PM 1 point

An excellent way to pose the problem.

Obviously, if you know that the other party cares nothing about your outcome, then you know that they're more likely to defect.

And if you know that the other party knows that you care nothing about their outcome, then it's even more likely that they'll defect.

Since the way you posed the problem precludes an iteration of this dilemma, it follows that we must defect.

Comment author: TGGP4 03 September 2008 11:03:21PM 4 points

How might we and the paperclip-maximizer credibly bind ourselves to cooperation? Seems like it would be difficult dealing with such an alien mind.

Comment author: bluej100 07 January 2013 10:25:24PM 3 points

I think Eliezer's "We have never interacted with the paperclip maximizer before, and will never interact with it again" was intended to preclude credible binding.

Comment author: RobinHanson 03 September 2008 11:12:49PM 13 points

The entries in a payoff matrix are supposed to sum up everything you care about, including whatever you care about the outcomes for the other player. Most every game theory text and lecture I know gets this right, but even when we say the right thing to students over and over, they mostly still hear it the wrong way you initially heard it. This is just part of the facts of life of teaching game theory.

Comment author: Eliezer_Yudkowsky 03 September 2008 11:17:13PM 27 points

Robin, the point I'm complaining about is precisely that the standard illustration of the Prisoner's Dilemma, taught to beginning students of game theory, fails to convey those entries in the payoff matrix - as if the entries were merely money instead of utilons, which is not at all what the Prisoner's Dilemma is about.

The point of the True Prisoner's Dilemma is that it gives you a payoff matrix that is very nearly the standard matrix in utilons, not just years in prison or dollars in an encounter.

I.e., you can tell people all day long that the entries are in utilons, but until you give them a visualization where those really are the utilons, it's around as effective as telling juries to ignore hindsight bias.

Comment author: RobinHanson 03 September 2008 11:22:51PM 3 points

Eliezer, I agree that your example makes more clear the point you are trying to make clear, but in an intro to game theory course I'd still start with the standard prisoner's dilemma example first, and only get to your example if I had time to make the finer point clearer. For intro classes for typical students the first priority is to be understood at all in any way, and that requires examples as simple, clear, and vivid as possible.

Comment author: billswift 03 September 2008 11:29:48PM 10 points

I don't think Eliezer misunderstood. I think you are missing his point, that economists are defining away empathy in the way they present the problem, including the utilities presented.

Comment author: ChrisHibbert 04 September 2008 12:20:38AM 0 points

In the universe I live in, there are both cooperators and defectors, but cooperators seem to predominate in random encounters. (If you leave yourself open to encounters in which others can choose to interact with you, defectors may find you an easy mark.)

In order to decide how to act with the paperclip maximizer, I have to figure out what kind of universe it is likely to inhabit. It's possible that a random super intelligence from a random universe will have few opportunities to cooperate, but I think it's more likely that there are far more SIs and universes in which cooperation is common.

But even though this is the direct answer to the question EY poses, I think it's more important to point out that his is a better (though not simpler to explain as RH says) depiction of the intended dilemma. It takes much more thought to figure out what about the context would make cooperation reasonable. Viscerally, it's nearly untenable.

Comment author: Allan_Crossman 04 September 2008 12:29:50AM 11 points

Prase, Chris, I don't understand. Eliezer's example is set up in such a way that, regardless of what the paperclip maximizer does, defecting gains one billion lives and loses two paperclips.

Basically, we're being asked to choose between a billion lives and two paperclips (paperclips in another universe, no less, so we can't even put them to good use).

The only argument for cooperating would be if we had reason to believe that the paperclip maximizer will somehow do whatever we do. But I can't imagine how that could be true. Being a paperclip maximizer, it's bound to defect, unless it had reason to believe that we would somehow do whatever it does. I can't imagine how that could be true either.

Or am I missing something?

Comment author: Sebastian_Hagen2 04 September 2008 12:34:52AM 6 points

Definitely defect. Cooperation only makes sense in the iterated version of the PD. This isn't the iterated case, and there's no prior communication, hence no chance to negotiate for mutual cooperation (though even if there was, meaningful negotiation may well be impossible depending on specific details of the situation). Superrationality be damned, humanity's choice doesn't have any causal influence on the paperclip maximizer's choice. Defection is the right move.

Comment author: Robin3 04 September 2008 01:36:29AM 6 points

It's clear that in the "true" prisoner's dilemma it is better to defect. The frustrating thing about the other prisoner's dilemma is that some people use it to imply that it is better to defect in real life. The problem is that the prisoner's dilemma is a drastic oversimplification of reality. To make it more realistic you'd have to make it iterated amongst a person's social network, add a memory and a perception of the other player's actions, change the payoff matrix depending on the relationship between the players, etc.

This version shows cases in which defection has a higher expected value for both players, but it's more contrived and less likely to come into existence than the other prisoner's dilemma.

Comment author: Allan_Crossman 04 September 2008 02:00:02AM 9 points

Michael: This is not a prisoner's dilemma. The nash equilibrium (C,C) is not dominated by a pareto optimal point in this game.

I don't believe this is correct. Isn't the Nash equilibrium here (D,D)? That's the point at which neither player can gain by unilaterally changing strategy.

Comment author: conchis 04 September 2008 02:03:29AM 7 points

michael webster,

You seem to have inverted the notation; not Eli.

(D,D) is the Nash equilibrium, not (C,C); and (D,D) is indeed Pareto dominated by (C,C), so this does seem to be a standard Prisoners' Dilemma.
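Both halves of this correction can be verified directly on the post's standard matrix, where columns are Player 1's moves and the first number in each pair is Player 1's utility. The Python sketch below enumerates pure-strategy profiles only; `is_nash` and `pareto_dominates` are illustrative helper names, not from any library.

```python
from itertools import product

# Keys are (player_1_move, player_2_move); values are (u1, u2).
payoff = {
    ("C", "C"): (3, 3),
    ("D", "C"): (5, 0),
    ("C", "D"): (0, 5),
    ("D", "D"): (2, 2),
}
MOVES = ("C", "D")

def is_nash(m1, m2):
    """True if neither player can gain by unilaterally changing their move."""
    u1, u2 = payoff[(m1, m2)]
    best1 = all(payoff[(alt, m2)][0] <= u1 for alt in MOVES)
    best2 = all(payoff[(m1, alt)][1] <= u2 for alt in MOVES)
    return best1 and best2

def pareto_dominates(x, y):
    """True if outcome x is at least as good for both players and strictly better for one."""
    ux, uy = payoff[x], payoff[y]
    return all(a >= b for a, b in zip(ux, uy)) and any(a > b for a, b in zip(ux, uy))

equilibria = [p for p in product(MOVES, repeat=2) if is_nash(*p)]
print(equilibria)  # [('D', 'D')]
print(pareto_dominates(("C", "C"), ("D", "D")))  # True
```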

Comment author: [deleted] 07 August 2012 06:14:59AM 0 points

You're correct, Conchis, but the notation confused me for a moment too, so I thought I'd explain it in case anyone else ever has the same problem. At first glance I saw (C,C) as the Nash equilibrium. It's not:

I naturally want to read the payoff matrix as being in the form (x, y) where the first number determines the outcome for the player on the horizontal, and the second on the vertical. That's how all the previous examples I've seen are laid out. (Disclaimer: I'm not any kind of expert on game theory, just an interested layperson with a bit of prior knowledge)

Now, this particular payoff matrix does have the players labelled 1 and 2, just not in the order I've come to expect, and indeed if one actually reads and interprets the co-operate/defect numbers, they don't make any sense to someone who has made the mistake I described above, which is what clued me in that I'd made it.

Comment author: eric_falkenstein2 04 September 2008 02:18:07AM 0 points

To the extent that one can induce the players to empathize with each other, cooperating is optimal. The repeated game does this by having them play again and again, and thus be able to realize gains from trade. You assert there's something hard-wired. I suppose there are experiments that could distinguish between the two models, i.e., rational self-interest in repeated games versus an intrinsic empathy function.

Comment author: Nominull3 04 September 2008 02:53:29AM 10 points

I would certainly *hope* you would defect, Eliezer. Can I really trust you with the future of the human race?

Comment author: Eliezer_Yudkowsky 04 September 2008 02:56:11AM 31 points

> I would certainly *hope* you would defect, Eliezer. Can I really trust you with the future of the human race?

Ha, I was waiting for someone to accuse me of antisocial behavior for hinting that I might cooperate in the Prisoner's Dilemma.

But wait for tomorrow's post before you accuse me of disloyalty to humanity.

Comment author: linkhyrule5 10 July 2013 03:07:17AM 2 points

On the off chance anyone actually sees this - I don't actually see a "next post" follow-up to this. Can anyone provide me with a link, and instructions as to how you got it?

Comment author: Eliezer_Yudkowsky 10 July 2013 03:24:40AM 8 points

Article Navigation / By Author / right-arrow

Comment author: wedrifid 10 July 2013 06:48:45AM 3 points

> Ha, I was waiting for someone to accuse me of antisocial behavior for hinting that I might cooperate in the Prisoner's Dilemma.

It is fascinating looking at the conversation on this subject back in 2008, back before TDT and UDT had become part of the culture. The objections (and even the mistakes) all feel so fresh!

Comment author: Eliezer_Yudkowsky 10 July 2013 04:49:30PM 6 points

At this point Yudkowsky sub 2008 has already (awfully) written his TDT manuscript (in 2004) and is silently reasoning from within that theory, which the margins of his post are too small to contain.

Comment author: Psy-Kosh 04 September 2008 04:56:59AM 1 point

Hrm... not sure what the obvious answer is here. With two humans, well, the argument for not defecting (when the scores represent utilities) basically involves some notion of similarity. I.e., you can say something to the effect of: "That person there is sufficiently similar to me that whatever reasoning I use, there is at least some reasonable chance they are going to use the same type of reasoning. That is, a chance greater than, well, chance. So even though I don't know exactly what they're going to choose, I can expect some correlation between their choice and mine. So, in the extreme case, where our reasoning is similar enough that it's more or less ensured that what I choose and what the other chooses will be the same, clearly both cooperating is better than both defecting, and those two are (by the extreme-case assumption) the only options."

It really isn't obvious to me whether a line of reasoning like that could validly be applied with a human vs a paperclip AI or Pebblesorter.

Now, if, by assumption, we're both equally rational, then maybe that's sufficient for the "whatever reasoning I use, they'll be using analogous reasoning, so we'll either both defect or both cooperate, so..." but I'm not sure on this, and still need to think on it more.

Personally, I find Newcomb's "paradox" to be much simpler than this since in that it's given to us explicitly that the predictor is perfect (or highly highly accurate) so is basically "mirroring" us.

Here, I have to admit to being a bit confused about how well this sort of reasoning applies when two minds are genuinely alien to each other, were produced by different origins, etc. Part of me wants to say, "Still, rationality is rationality, so to the extent that the other entity manages to work/exist successfully, it'll have rationality similar to mine" (given the assumption that I'm reasonably rational. Though, of course, I provably can't trust myself :))

Comment author: Mixitup 04 September 2008 05:08:06AM 0 points

Shouldn't you be on vacation?

just curious

Comment author: Dagon 04 September 2008 05:31:14AM 1 point

I like this illustration, as it addresses TWO common misunderstandings. Recognizing that the payoff is in incomparable utilities is good. Even better is reinforcing that there can never be further iterations. None of the standard visualizations prevent people from extending to multiple interactions.

And it makes it clear that (D,D) is the only rational (i.e. WINNING) outcome.

Fortunately, most of our dilemmas are repeated ones, in which (C,C) is possible.

Comment author: CarlJ 04 September 2008 06:34:18AM 1 point

I want to defect, but so does the clip-maximizer. Since we both know that, and assuming that it is equal in intelligence to me, which will let it see through any offer I might make that would enable me to defect, I would try to find a way to give us both incentives to cooperate. That is, I don't believe we will be able to reach the solution (D,C), so let's try for the next best thing, which is (C,C).

How about placing a bomb on each of two piles of substance S, and giving the remote for the human pile to the clipmaximizer and the remote for its pile to the humans? In this scenario, if the clipmaximizer tries to take the humans' share of S, the humans destroy its share, leaving it with at most two units of S, which is what it already has. Thus it doesn't want to try to defect, and the same holds for the humans.

Comment author: simpleton2 04 September 2008 08:17:45AM 5 points

I apologize if this is covered by basic decision theory, but if we additionally assume:

- the choice in our universe is made by a perfectly rational optimization process instead of a human

- the paperclip maximizer is also a perfect rationalist, albeit with a very different utility function

- each optimization process can verify the rationality of the other

then won't each side choose to cooperate, after correctly concluding that it will defect iff the other does?

Each side's choice necessarily reveals the other's; they're the outputs of equivalent computations.
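The contrast between classical dominance reasoning and the "equivalent computations" assumption above can be made concrete. The Python sketch below uses the post's standard matrix from one player's side; `dominance_choice` and `mirror_choice` are hypothetical names. Under the mirror assumption only (C, C) and (D, D) are reachable, so the better diagonal outcome wins. The sketch does not address whether that assumption holds, which is exactly what other commenters dispute.

```python
# Post's standard matrix from my side, keyed by (my_move, their_move).
my_utility = {("C", "C"): 3, ("D", "C"): 5, ("C", "D"): 0, ("D", "D"): 2}
MOVES = ("C", "D")

def dominance_choice():
    """Classical reasoning: pick the move that is best against every fixed opponent move."""
    for move in MOVES:
        if all(my_utility[(move, t)] >= my_utility[(alt, t)]
               for alt in MOVES for t in MOVES):
            return move
    return None

def mirror_choice():
    """Assumed mirror reasoning: the other side runs an equivalent computation,
    so only (C, C) and (D, D) can occur; pick the better of the two."""
    return max(MOVES, key=lambda m: my_utility[(m, m)])

print(dominance_choice(), mirror_choice())  # D C
```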

Comment author: Paul_Crowley2 04 September 2008 08:27:02AM 4 points

Interesting. There's a paradox involving a game in which players successively take a single coin from a large pile of coins. At any time a player may choose instead to take two coins, at which point the game ends and all further coins are lost. You can prove by induction that if both players are perfectly selfish, they will take two coins on their first move, no matter how large the pile is. People find this paradox impossible to swallow because they model perfect selfishness on the most selfish person they can imagine, not on a mathematically perfect selfishness machine. It's nice to have an "intuition pump" that illustrates what *genuine* selfishness looks like.
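The induction argument can be run as backward induction in Python. This is a sketch under stated assumptions: taking two coins requires at least two coins in the pile, a player facing a single coin must take it, and both players maximize only their own coin count. `backward_induction` is a hypothetical name.

```python
def backward_induction(pile):
    """Solve the coin game by backward induction for two purely selfish players.

    Returns (mover_total, other_total, first_action) for the player to move
    when `pile` coins remain, assuming subgame-perfect play from then on.
    """
    mover, other, action = 0, 0, None  # value with 0 coins left
    for coins in range(1, pile + 1):
        if coins == 1:
            # Only one coin left: the mover can only take it.
            mover, other, action = 1, 0, "take one"
            continue
        # Taking two ends the game immediately.
        take_two = (2, 0)
        # Taking one hands the move to the opponent with coins - 1 left,
        # so the roles in the previous (smaller) solution swap.
        take_one = (1 + other, mover)
        if take_two[0] > take_one[0]:
            mover, other, action = take_two[0], take_two[1], "take two"
        else:
            mover, other, action = take_one[0], take_one[1], "take one"
    return mover, other, action

print(backward_induction(1000))  # (2, 0, 'take two')
```

However large the pile, the first mover grabs two coins and the game ends, compared with roughly half the pile each if both simply kept taking one coin per turn.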

Comment author: ata 17 January 2011 10:16:19PM 4 points

Hmm. We could also put that one in terms of a human or FAI competing against a paperclip maximizer, right? The two players would successively save one human life or create one paperclip (respectively), up to some finite limit on the sum of both quantities.

If both were TDT agents (and each knows that the other is a TDT agent), then would they successfully cooperate for the most part?

In the original version of this game, is it turn-based or are both players considered to be acting simultaneously in each round? If it is simultaneous, then it seems to me that the paperclip-maximizing TDT and the human[e] TDT would just create one paperclip at a time and save one life at a time until the "pile" is exhausted. Not quite sure about what would happen if the game is turn-based, but if the pile is even, I'd expect about the same thing to happen, and if the pile is odd, they'd probably be able to successfully coordinate (without necessarily communicating), maybe by flipping a coin when two pile-units remain and then acting in such a way to ensure that the expected distribution is equal.

Comment author: Vladimir_Nesov 04 September 2008 09:03:14AM 1 point

Cooperate (unless paperclip decides that Earth is dominated by traditional game theorists...)

The standard argument looks like this (let's forget about the Nash equilibrium endpoint for a moment): (1) Arbiter: let's (C,C)! (2) Player1: I'd rather (D,C). (3) Player2: I'd rather (D,D). (4) Arbiter: sold!

The error is that this incremental process reacts to different hypothetical outcomes, not to actual outcomes. This line of reasoning leads to the outcome (D,D), and yet it progresses as if (C,C) and (D,C) were real options for the final outcome. It's similar to the Unexpected hanging paradox: you can only give one answer, not build a long line of reasoning where each step assumes a different answer.

It's preferable to choose (C,C) and similar non-Nash equilibrium options in other one-off games if we assume that the other player also bets on cooperation. And he will do that only if he assumes that the first player does the same, and so on. This is a situation of common knowledge. How can Player1 come to the same conclusion as Player2? They search for the best joint policy that is stable under common knowledge.

Let's extract the decision procedures selected by both sides to handle this problem as self-contained policies, P1 and P2. Each of these policies may decide differently depending on what policy the other player is assumed to use. The stable set of policies is where there is no thrashing, when P1=P1(P2) and P2=P2(P1). Players select policies rather than outcomes; a single policy need not directly reflect its player's preferences, but the joint policy (P1,P2) that the players select is a stable policy that each player prefers to the other stable policies. In our case, both policies for (C,C) are something like "decide self.C; if other.D, decide self.D". This works like the iterated prisoner's dilemma, but without actual iteration: the iteration happens in the model, when the policy needs to be mutually accepted.

(I know it's somewhat inconclusive, couldn't find time to pinpoint it better given a time limit, but I hope one can construct a better argument from the corpse of this one.)

Comment author: Mikko 04 September 2008 09:54:10AM 0 points

It is well known that answers to questions on morality sometimes depend on how the questions are framed.

I think Eliezer's biggest contribution is the idea that the classical presentation of Prisoner's Dilemma may be an intuition pump.

Comment author: Grant 04 September 2008 09:56:56AM 0 points

I'm hoping we'd all defect on this one. Defecting isn't always a bad thing anyways; many parts of our society depend on defected prisoner's dilemmas (such as competition between firms).

When I first studied game theory and prisoner's dilemmas (on my own, not in a classroom) I had no problem imagining the payoffs in completely subjective "utils". I never thought of a paperclip maximizer, though.

I know this is quite a bit off-topic, but in response to:

> We're born with a sense of fairness, honor, empathy, sympathy, and even altruism - the result of our ancestors adapting to play the iterated Prisoner's Dilemma.

Most of us are, but there is a small minority of the population (1-3%) that is born without a conscience (or much of one). We call them sociopaths or psychopaths. This is seemingly advantageous because it allows those people to prey on the rest of us (i.e., defect where possible), provided they can avoid detection.

While I'm sure Eliezer knows this (and likely knows more about the subject than I), its omission in his post IMO highlights a widespread and costly bias: pretending these people don't exist, or pretending they can be "cured".

Comment author: Arnt_Richard_Johansen 04 September 2008 09:58:19AM 0 points

This is off-topic, but Vladimir Nesov's referring to the paperclip-maximizing super-intelligence as just "paperclip" made me chuckle, because it conjured up images in my head of Clippy bent on destroying the Earth.

Comment author: RichardKennaway 04 September 2008 11:17:20AM 0 points

In laboratory experiments of PD, the experimenter has the absolute power to decree the available choices and their "outcomes". (I use the scare quotes in reference to the fact that these outcomes are not to be measured in money or time in jail, but in "utilons" that already include the value to each party of the other's "outcome" -- a concept I think problematic but not what I want to talk about here. The outcomes are also imaginary, although (un)reality TV shows have scope to create such games with real and substantial payoffs.)

In the real world, a general class of moves that laboratory experiments deliberately strive to eliminate is moves that change the game. It is well-known that those who lead lives of crime, being faced with the PD every time the police pull them in on suspicion, exact large penalties on defectors. (To which the authorities respond with witness protection programmes, which the criminals try to penetrate, and so on.) In other words, the solution observed in practice is to destroy the PD.

1: C 1:  D
2: C (3, 3) (-20, 0)
2: D (0,-20) (-20,-20)

While the PD, one-off or iterated, is an entertaining philosophical study, an analysis that ignores game-changing moves surely limits its practical interest.
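To make "destroying the PD" concrete, here is a minimal sketch that checks for a dominant move in the original matrix and in the punished one above. It assumes the post's layout (columns are Player 1's move, the first number is Player 1's utility) and reads off Player 1's payoffs only; the names `STANDARD`, `PUNISHED` and `dominant_move` are hypothetical.

```python
# Player 1's payoff for (my_move, other_move), read off the two matrices.
STANDARD = {("C", "C"): 3, ("D", "C"): 5, ("C", "D"): 0, ("D", "D"): 2}
PUNISHED = {("C", "C"): 3, ("D", "C"): -20, ("C", "D"): 0, ("D", "D"): -20}

MOVES = ("C", "D")


def dominant_move(payoff):
    """Return the move that weakly dominates the other (never worse against any
    opponent move, strictly better against at least one), or None if neither does."""
    for mine in MOVES:
        alt = "D" if mine == "C" else "C"
        never_worse = all(payoff[(mine, o)] >= payoff[(alt, o)] for o in MOVES)
        sometimes_better = any(payoff[(mine, o)] > payoff[(alt, o)] for o in MOVES)
        if never_worse and sometimes_better:
            return mine
    return None


print(dominant_move(STANDARD))  # D
print(dominant_move(PUNISHED))  # C
```

Once defectors are penalized, cooperation becomes the dominant move, so there is no longer a dilemma to resolve.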

Comment author: Allan_Crossman 04 September 2008 12:05:38PM 3 points [-]

simpleton: won't each side choose to cooperate, after correctly concluding that it will defect iff the other does?

Only if they believe that their decision somehow causes the other to make the same decision.

CarlJ: How about placing a bomb on two piles of substance S and giving the remote for the human pile to the clipmaximizer and the remote for its pile to the humans?

It's kind of standard in philosophy that you aren't allowed solutions like this. The reason is that Eliezer can restate his example to disallow this and force you to confront the real dilemma.

Vladimir: It's preferable to choose (C,C) [...] if we assume that other player also bets on cooperation.

No, it's preferable to choose (D,C) if we assume that the other player bets on cooperation.

decide self.C; if other.D, decide self.D

We're assuming, I think, that you don't get to know what the other guy does until after you've both committed (otherwise it's not the proper Prisoner's Dilemma). So you can't use if-then reasoning.

Comment author: Vladimir_Nesov 04 September 2008 12:16:52PM -1 points [-]

Allan: No, it's preferable to choose (D,C) if we assume that the other player bets on cooperation.

Which will happen only if the other player assumes that the first player bets on cooperation, which with your policy is incorrect. You can't bet on an unstable model.

decide self.C; if other.D, decide self.D

We're assuming, I think, that you don't get to know what the other guy does until after you've both committed (otherwise it's not the proper Prisoner's Dilemma). So you can't use if-then reasoning.

I can use reasoning, but not actual reactions to the facts, which are inaccessible. I debug my model of the decision-making policies of both myself and the other player by requiring the outcome to be stable even if I assume that we both know which policy the other player uses (within a single model). Then I select the best stable model.

Comment author: Psy-Kosh 04 September 2008 12:26:44PM 2 points [-]

Allan: They don't have to believe they have such causal powers over each other. Simply that they are in certain ways similar to each other.

i.e., A simply has to believe of B: "The process in B is sufficiently similar to me that it's going to end up producing the same results I am. I am not causing this; it's simply that both computations are going to compute the same thing here."

Comment author: Allan_Crossman 04 September 2008 12:32:24PM 0 points [-]

[D,C] will happen only if the other player assumes that the first player bets on cooperation

No, it won't happen in any case. If the paperclip maximizer assumes I'll cooperate, it'll defect. If it assumes I'll defect, it'll defect.

I debug my model of decision-making policies [...] by requiring the outcome to be stable even if I assume that we both know which policy is used by another player

I don't see that "stability" is relevant here: this is a one-off interaction.

Anyway, let's say you cooperate. What exactly is preventing the paperclip maximizer from defecting?

Comment author: Allan_Crossman 04 September 2008 12:51:05PM 1 point [-]

Psy-Kosh: They don't have to believe they have such causal powers over each other. Simply that they are in certain ways similar to each other.

I agree that this is definitely related to Newcomb's Problem.

Simpleton: I earlier dismissed your idea, but you might be on to something. My apologies. If they were genuinely perfectly rational, or both irrational in precisely the same way, and could verify that fact in each other...

Then they might be able to know that they will both do the same thing. Hmm.

Anyway, my 3 comments are up. Nothing more from me for a while.

Comment author: Stuart_Armstrong 04 September 2008 01:06:55PM -2 points [-]

Despite the disguise, I think this is the same as the standard PD. In there (assuming full utilities, etc...), the obvious ideal for an impartial observer is to pick (C,C) as the best option, and for the prisoner to pick (D,C).

Here, (D,C) is "righter" than (C,C), but that's simply because we are no longer impartial observers; humans shouldn't remain impartial when billions of lives are at stake. We are all in the role of "prisoners" in this situation, even as observers.

An "impartial observer" would simply be one that valued one billion human lives the same as one paper clip. They would see us as a simple prisoner, in the same situation as the standard PD, with the same overall solution - (C,C).
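One way to read Stuart's "impartial observer" is as someone who weights both players' utilities equally and maximizes their sum. The sketch below assumes exactly that (an unweighted sum is my modelling choice, not something stated in the comment) and applies it to the post's standard matrix; `OUTCOMES` and `impartial_choice` are hypothetical names.

```python
# Joint outcomes of the post's standard matrix:
# (player 1 move, player 2 move) -> (u1, u2).
OUTCOMES = {
    ("C", "C"): (3, 3),
    ("D", "C"): (5, 0),
    ("C", "D"): (0, 5),
    ("D", "D"): (2, 2),
}


def impartial_choice(outcomes):
    # An observer weighting both players' utilities equally maximizes their sum.
    return max(outcomes, key=lambda profile: sum(outcomes[profile]))


print(impartial_choice(OUTCOMES))  # ('C', 'C')
```

The sums are 6, 5, 5 and 4, so (C,C) wins under this reading; RobbBB's reply below questions whether such an observer deserves to be called impartial.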

Comment author: RobbBB 03 February 2014 12:16:33PM *  0 points [-]

This is an old post and probably very out of date, but: I think if you try to define an impartial observer's preferences as whatever selects (C,C) in two other agents' PD, you get inconsistencies very rapidly once you have one of those agents stuck in two Prisoner's Dilemmas at once.

I also don't think we should use euphemisms like 'impartial' for an incredibly partial Cooperation Fetishist that's willing to give up everything else of value (e.g., billions of human lives) to go through the motions of satisfying non-sentient processes like sea slugs or paperclip maximizers.

Comment author: Stuart_Armstrong 03 February 2014 12:41:28PM 0 points [-]

you get inconsistencies very rapidly once you have one of those agents stuck in two Prisoner's Dilemmas at once.

Multi-player interactions are tricky and we don't have a good solution for them yet.

that's willing to give up everything else of value (e.g., billions of human lives)

It's not that it's willing to give up everything of value; it's that it doesn't have our values. Without sharing our values, there's no reason for it to prefer our opinions over sea slugs.

Comment author: prase 04 September 2008 02:33:13PM 1 point [-]

A.Crossman: Prase, Chris, I don't understand. Eliezer's example is set up in such a way that, regardless of what the paperclip maximizer does, defecting gains one billion lives and loses two paperclips.

This is the standard defense of defecting in a prisoner's dilemma, but if it were valid, the dilemma wouldn't really be a dilemma.

If we can assume that the maximizer uses the same decision algorithm as we do, we can also assume that it will come to the same conclusion. Given this, it is better to cooperate, since that gains a billion lives (and a paperclip). But we don't know whether the paperclipper uses the same algorithm.

Comment author: Sean_C. 04 September 2008 02:57:59PM 7 points [-]

I heard a funny story once (online somewhere, but this was years ago and I can't find it now). Anyway I think it was the psychology department at Stanford. They were having an open house, and they had set up a PD game with M&M's as the reward. People could sit at either end of a table with a cardboard screen before them, and choose 'D' or 'C', and then have the outcome revealed and get their candy.

So this mother and daughter show up, and the grad student explained the game. Mom says to the daughter "Okay, just push 'C', and I'll do the same, and we'll get the most M&M's. You can have some of mine after."

So the daughter pushes 'C', Mom pushes 'D', swallows all 5 M&M's, and with a full mouth says "Let that be a lesson! You can't trust anybody!"

Comment author: wedrifid 10 July 2013 06:33:50AM *  10 points [-]

So the daughter pushes 'C', Mom pushes 'D', swallows all 5 M&M's, and with a full mouth says "Let that be a lesson! You can't trust anybody!"

I have seen various variations of this story, some told firsthand. In every case I have concluded that they are just bad parents. They aren't clever. They aren't deep. They are incompetent and banal. Even if parents try as hard as they can to be fair, just and reliable, they still fall short of that standard often enough for children to be aware that they can't be completely trusted. Moreover, children are exposed to other children and other adults and so are able to learn to distinguish people they trust from people that they don't. Adding the parent to the untrusted list achieves little benefit.

I'd like to hear the follow up to this 'funny' story. Where the daughter updates on the untrustworthiness of the parent and the meaninglessness of her word. She then proceeds to completely ignore the mother's commands, preferences and even her threats. The mother destroyed a valuable resource (the ability to communicate via 'cheap' verbal signals) for the gain of a brief period of feeling smug superiority. The daughter (potentially) realises just how much additional freedom and power she has in practice when she feels no internal motivation to comply with her mother's verbal utterances.

(Bonus follow up has the daughter steal the mother's credit card and order 10kg of M&Ms online. Reply when she objects "Let that be a lesson! You can't trust anybody!")

I suppose the biggest lesson for the daughter to learn is just how significant the social and practical consequences of reckless defection in social relationships can be.

Comment author: RichardKennaway 10 July 2013 08:48:37AM 1 point [-]

The mother destroyed a valuable resource (the ability to communicate via 'cheap' verbal signals) for the gain of a brief period of feeling smug superiority.

And in addition, the supposed gain is trash anyway.

Comment author: Jef_Allbright 04 September 2008 04:00:47PM -1 points [-]

I see this discussion over the last several months bouncing around, teasingly close to a coherent resolution of the ostensible subjective/objective dichotomy applied to ethical decision-making. As a perhaps pertinent meta-observation, my initial sentence may promulgate the confusion with its expeditious wording of "applied to ethical decision-making" rather than a more accurate phrasing such as "applied to decision-making assessed as increasingly ethical over increasing context."

Those who in the current thread refer to the essential element of empathy or similarity (of self models) come close. It's important to realize that any agent always only expresses its nature within its environment -- assessments of "rightness" arise only in the larger context (of additional agents, additional experiences of the one agent, etc.)

Our language and our culture reinforce an assumption of an ontological "rightness" that pervades our thinking on these matters. An even greater (perceived) difficulty is that to relinquish ontological "rightness" entails ultimately relinquishing an ontological "self". But to relinquish such ultimately unfounded beliefs is to gain clarity and coherence while giving up nothing actual at all.

"Superrationality" is an effective wrapper around these apparent dilemmas, but even proponents such as Hofstadter confused description with prescription in this regard. Paradox is always only a matter of insufficient context. In the bigger picture all the pieces must fit. [Or as Eliezer has taken to saying recently: "It all adds up to normalcy."]

Apologies if my brief pokings and proddings on this topic appear vague or even mystical. I can only assert within this limited space and bandwidth that my background in science, engineering and business is far from that of one who could harbor vagueness, relativism, mysticism, or postmodernist patterns of thought. I appreciate the depth and breadth of Eliezer's written explorations of this issue whereas I lack the time to do so myself.

Comment author: simpleton2 04 September 2008 04:12:03PM 4 points [-]

Allan Crossman: Only if they believe that their decision somehow causes the other to make the same decision.

No line of causality from one to the other is required.

If a computer finds that (2^3021377)-1 is prime, it can also conclude that an identical computer a light year away will do the same. This doesn't mean one computation caused the other.

The decisions of perfectly rational optimization processes are just as deterministic.
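A minimal sketch of simpleton's point, assuming both players literally run the same deterministic decision function and know it (the very assumption Allan doubts for Eliezer's paperclipper). An agent reasoning this way only compares the diagonal outcomes (C,C) and (D,D), because whatever it returns, its copy returns too, with no causal link between the two runs. The payoffs are Player 1's from the post's standard matrix; `mirror_decide` and `dominance_decide` are hypothetical names.

```python
# Player 1's payoffs from the post's standard matrix: (my_move, other_move) -> utility.
PAYOFF = {("C", "C"): 3, ("D", "C"): 5, ("C", "D"): 0, ("D", "D"): 2}


def mirror_decide(payoff):
    """Decision procedure for an agent that knows its opponent runs this exact
    same deterministic function: only (C, C) and (D, D) are reachable."""
    return max(("C", "D"), key=lambda move: payoff[(move, move)])


def dominance_decide(payoff):
    """Contrast: treat the opponent's move as fixed and independent of ours."""
    return "D" if all(payoff[("D", o)] > payoff[("C", o)] for o in "CD") else "C"


# Two identical, deterministic copies always agree, like the two prime-checking computers.
print(mirror_decide(PAYOFF), mirror_decide(PAYOFF))        # C C
print(dominance_decide(PAYOFF), dominance_decide(PAYOFF))  # D D
```

Both copies of either procedure agree with each other; the disagreement in the thread is over which procedure a rational agent should be running.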

Comment author: ChrisHibbert 04 September 2008 05:16:09PM 0 points [-]

@Allan Crossman,

Eliezer's example is set up in such a way that, regardless of what the paperclip maximizer does, defecting gains one billion lives and loses two paperclips.

This same claim can be made about the standard prisoner's dilemma. In the standard version, I still cooperate because, even if this challenge won't be repeated, it's embedded in a social context for me in which many interactions are solo, but part of the social fabric. (Tipping, giving directions to strangers, and returning items left behind in a cafe are examples. I cooperate even though I expect not to see the same person again.) What is it about the social context that makes this so?

I don't fall back on an assumption that the other reasons the same as me. It could as easily be a psychopath, according to the standards of the universe it comes from. Making the assumption leaves you open to exploitation. But if there are reasons for the other to have habits that are formed by similar forces, then concluding that cooperation is the more likely behavior to be trained by its environment is a valuable result.

The question, for me, is what kind of social context does the other inhabit. The paperclip maximizer might be the only (or the most powerful) inhabitant of its universe, but that seems less likely than that it is embedded in some social context, and has to make trade-offs in interactions with others in order to get what it wants. It's hard for me to imagine a universe that would produce one powerful agent above all others. (Even though I've heard the argument in just the kind of discussion of SIs that raises the questions of friendliness and paperclip maximizers.)

[Sorry Allan, that you won't be able to reply. But you did raise the question before bowing out...]

Comment author: Tom_Crispin 04 September 2008 06:02:00PM 1 point [-]

A problem in moving from game-theoretic models to the "real world" is that in the latter we don't always know the other decision maker's payoff matrix, we only know - at best! - his possible strategies. We can only guess at the other's payoffs; albeit fairly well in social context. We are more likely to make a mistake because we have the wrong model for the opponent's payoffs than because we make poor strategic decisions.

Suppose we change this game so that the payoff matrix for the paperclips is chosen from a suitably defined random distribution. How will that change your decision whether to "cooperate" or to "defect"?

Comment author: Silas 04 September 2008 06:42:56PM 11 points [-]

By the way:

Human: "What do you care about 3 paperclips? Haven't you made trillions already? That's like a rounding error!"
Paperclip Maximizer: "How can you talk about paperclips like that?"

PM: "What do you care about a billion human algorithm continuities? You've got virtually the same one in billions of others! And you'll even be able to embed the algorithm in machines one day!"
H: "How can you talk about human lives that way?"

Comment author: RichardKennaway 04 September 2008 07:11:31PM 0 points [-]

Tom Crispin: The utility-theoretic answer would be that all of the randomness can be wrapped up into a single number, taking account not merely of the expected value in money units but such things as the player's attitude to risk, which depends on the scatter of the distribution. It can also wrap up a player's ignorance (modelled as prior probabilities) about the other player's utility function.

For that to be useful, though, you have to be a utility-theoretic decision-maker in possession of a prior distribution over other people's decision-making processes (including processes such as this one). If you are, then you can collapse the payoff matrix by determining a probability distribution for your opponent's choices and arriving at a single number for each of your choices. No more Prisoners' Dilemma.

I suspect (but do not have a proof) that adequately formalising the self-referential arguments involved will lead to a contradiction.
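Collapsing the matrix to "a single number for each of your choices" can be sketched as follows. This assumes the probability `p_other_cooperates` does not depend on your own choice (the assumption the superrationality argument above rejects) and uses Player 1's payoffs from the post's standard matrix; the function name is hypothetical.

```python
# Player 1's payoffs from the post's standard matrix: (my_move, other_move) -> utility.
PAYOFF = {("C", "C"): 3, ("D", "C"): 5, ("C", "D"): 0, ("D", "D"): 2}


def expected_utility(move, p_other_cooperates, payoff=PAYOFF):
    """Collapse the matrix to one number per move, given a prior over the opponent."""
    p = p_other_cooperates
    return p * payoff[(move, "C")] + (1 - p) * payoff[(move, "D")]


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    eu_c = expected_utility("C", p)
    eu_d = expected_utility("D", p)
    print(f"p={p:.2f}  EU(C)={eu_c:.2f}  EU(D)={eu_d:.2f}")
```

With these payoffs EU(D) exceeds EU(C) by exactly 2 for every p, so under an independent prior the collapsed problem always says defect.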

Comment author: Allan_Crossman 04 September 2008 07:47:00PM 0 points [-]

Chris: Sorry Allan, that you won't be able to reply. But you did raise the question before bowing out...

I didn't bow out, I just had a lot of comments made recently. :)

I don't like the idea that we should cooperate if it cooperates. No, we should defect if it cooperates. There are benefits and no costs to defecting.

But if there are reasons for the other to have habits that are formed by similar forces

In light of what I just wrote, I don't see that it matters; but anyway, I wouldn't expect a paperclip maximizer to have habits so ingrained that it can't ever drop them. Even if it routinely has to make real trade-offs, it's presumably smart enough to see that - in a one-off interaction - there are no drawbacks to defecting.

Simpleton: No line of causality from one to the other is required.

Yeah, I get your argument now. I think you're probably right, in that extreme case.

Comment author: Vladimir_Nesov 04 September 2008 07:58:00PM 4 points [-]

Allan: There are benefits and no costs to defecting.

This is the same error as in the Newcomb's problem: there is in fact a cost. In case of prisoner's dilemma, you are penalized by ending up with (D,D) instead of better (C,C) for deciding to defect, and in the case of Newcomb's problem you are penalized by having only $1000 instead of $1,000,000 for deciding to take both boxes.

Comment author: Allan_Crossman 04 September 2008 08:34:00PM 0 points [-]

Vladimir: In case of prisoner's dilemma, you are penalized by ending up with (D,D) instead of better (C,C) for deciding to defect

Only if you have reason to believe that the other player will do whatever you do. While that's the case in Simpleton's example, it's not the case in Eliezer's.

Comment author: michael_e_sullivan 05 September 2008 04:08:00PM -1 points [-]

Interesting. There's a paradox involving a game in which players successively take a single coin from a large pile of coins. At any time a player may choose instead to take two coins, at which point the game ends and all further coins are lost. You can prove by induction that if both players are perfectly selfish, they will take two coins on their first move, no matter how large the pile is.

I'm pretty sure this proof only works if the coins are denominated in utilons.
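The induction can be run mechanically. A minimal backward-induction sketch of the coin game as described, assuming each player values only their own coin count linearly (coins as utilons, which is exactly the assumption questioned above) and that a player facing a single coin can only take it. The name `solve_coin_game` is hypothetical; the game is a close relative of what is usually called the centipede game.

```python
def solve_coin_game(pile):
    """Backward induction for the coin game, assuming purely selfish players.
    Returns (first move, mover's total, opponent's total) for a game that
    starts with `pile` coins."""
    # value[n] = (move, mover's gain, other's gain) when n coins remain.
    value = {0: (None, 0, 0), 1: ("one", 1, 0)}  # one coin left: can only take it
    for n in range(2, pile + 1):
        _, next_mover, next_other = value[n - 1]
        # Taking one coin hands the turn over: the opponent becomes the next mover.
        take_one = ("one", 1 + next_other, next_mover)
        take_two = ("two", 2, 0)  # game ends, remaining coins are lost
        value[n] = max(take_one, take_two, key=lambda option: option[1])
    return value[pile]


print(solve_coin_game(1000))  # ('two', 2, 0)
```

From two coins upward, the next mover would grab two, so taking one is worth only 1 and taking two wins at every pile size.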

Comment author: potato 06 June 2012 06:11:47PM *  -1 points [-]

It's really about the iteration. I would continually cooperate with the paper clip maximizer if I had good reason to believe it would not defect. For instance, if I knew that Eliezer Yudkowsky without morals and with a great urge for paperclip creation was the paperclip maximizer, I would cooperate. Assuming that you know that playing with the defect button can make you lose 1 billion paperclips from here on, and I know the same for human lives, cooperating seems right. It has the highest expected payoff, if we're using each other's known intentions and plays as evidence about our future plays.

If there is only one trial, and I can't talk to the paper clip maximizer, I will defect.
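To see why iteration matters, here is a minimal iterated-PD sketch using the post's standard payoffs rather than the lives-vs-paperclips ones. Tit-for-tat (cooperate first, then copy the opponent's last move) is a well-known iterated strategy; the 10-round horizon and the function names are illustrative assumptions.

```python
# Payoffs from the post's standard matrix: (my_move, other_move) -> my utility.
PAYOFF = {("C", "C"): 3, ("D", "C"): 5, ("C", "D"): 0, ("D", "D"): 2}


def tit_for_tat(opponent_history):
    # Cooperate first, then copy whatever the opponent did last round.
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Iterated PD: each strategy sees only the opponent's past moves."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat, 10))      # (30, 30)
print(play(always_defect, tit_for_tat, 10))    # (23, 18)
print(play(always_defect, always_defect, 10))  # (20, 20)
```

A defector exploits tit-for-tat only once, and mutual tit-for-tat outscores mutual defection; with a single round, defection dominates again.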

Comment author: Multiheaded 07 August 2012 02:38:54AM *  -1 points [-]

[Public service announcement]

To any future readers, especially newcomers to LW: yes, Eliezer (with some others) has indeed formulated a solution of sorts for the True One-Shot Prisoner's Dilemma - for some rather specific cases of it, actually, but it was nonetheless very awesome of him. It is a fairly original solution for the field of decision theory (he says), yet it (very roughly) mirrors some religious thought from ages past.

In case you're unfamiliar with idiosyncratic local ideas, it's called "Timeless Decision Theory" - look it up.


Comment author: arundelo 07 August 2012 05:40:43AM 1 point [-]

See also

(My understanding is that TDT and UDT can both be seen as "implementations" of superrationality.)

Comment author: wedrifid 07 August 2012 06:07:28AM *  1 point [-]

p.s.: if you thought this was a useless/misleading comment, you should have bloody told me so instead of casting your silent and unhelpful -1.

Your comment is neither useless nor misleading (taking into account the significant use of qualifiers) but if I had happened to view your comment negatively I would not accept this obligation to 'bloody' explain myself. The main problem in this comment seems to be the swearing at downvoters. A query or even (in this case) an outright assertion that the judgement is flawed would come across better.

Comment author: fubarobfusco 07 August 2012 06:51:21AM 3 points [-]

[While we're addressing hypothetical future readers:]

See also Gary Drescher's Good and Real, one chapter of which defends cooperating in the one-shot Prisoner's Dilemma on the grounds of "subjunctive reciprocity" or "acausal self-interest": if defecting is the right choice for you, then it is the right choice for the other party; whereas cooperating is a means toward the end of the other party's cooperation towards you; you cannot cause the other's cooperation, but your own actions can entail it.

Drescher points out a connection between acausal self-interest and Kant's categorical imperative; and provides an intuitive (which is to say, familiar) distinction between acausal and causal self-interest by contrasting the ideas, "How would I like it if others treated me that way?" versus "What's in it for me?"

Comment author: Multiheaded 07 August 2012 07:32:47AM 3 points [-]

Added both Hofstadter and Drescher to my "LW canon that I should at least acquire a summary of" category. I mean, yeah, I do not doubt that the Sequences contain a good distillation already, and normally I wouldn't be bothered to trawl through mostly redundant plain text - but it's so much more prestigious to actually know where Eliezer got which part from.

Comment author: gwern 07 August 2012 03:06:31PM 7 points [-]

A while ago I took the time to type up a full copy of the relevant Hofstadter essays: http://www.gwern.net/docs/1985-hofstadter So now you have no excuse!

Comment author: Multiheaded 07 August 2012 03:27:40PM 6 points [-]

Great! Have a paperclip!

Comment author: Randaly 07 August 2012 04:17:57PM 4 points [-]

A decent summary of Drescher's ideas is his presentation at the 2009 Singularity Summit, here. For some reason I seem to have a transcript of most of it already made, copy + pasted below. (LW tells me that it is too long to go in one comment, so I'll put it in two.)

My talk this afternoon is about choice machines: machines such as ourselves that make choices in some reasonable sense of the word. The very notion of mechanical choice strikes many people as a contradiction in terms, and exploring that contradiction and its resolution is central to this talk. As a point of departure, I'll argue that even in a deterministic universe, there's room for choices to occur: we don't need to invoke some sort of free will that makes an exception to the determinism, nor do we even need randomness, although a little randomness doesn't hurt. I'm going to argue that regardless of whether our universe is fully deterministic, it's at least deterministic enough that the compatibility of choice and full determinism has some important ramifications that do apply to our universe. I'll argue that if we carry the compatibility of choice and determinism to its logical conclusions, we obtain some progressively weird corollaries: namely, that it sometimes makes sense to act for the sake of things that our actions cannot change and cannot cause, and that that might even suggest a way to derive an essentially ethical prescription: an explanation for why we sometimes help others even if doing so causes net harm to our own interests.


An important caveat in all this, just to manage expectations a bit, is that the arguments I'll be presenting will be merely intuitive (or counter-intuitive, as the case may be) and not grounded in a precise and formal theory. Instead, I'm going to run some intuition pumps, as Daniel Dennett calls them, to try to persuade you what answers a successful theory would plausibly provide in a few key test cases.


Perhaps the clearest way to illustrate the compatibility of choice and determinism is to construct or at least imagine a virtual world, which superficially resembles our own environment and which embodies intelligent or somewhat intelligent agents. As a computer program, this virtual world is quintessentially determinist: the program specifies the virtual world's initial conditions, and specifies how to calculate everything that happens next. So given the program itself, there are no degrees of freedom about what will happen in the virtual world. Things do change in the world from moment to moment, of course, but no event ever changes from what was determined at the outset. In effect, all events just sit, statically, in spacetime. Still, it makes sense for agents in the world to contemplate what would be the case were they to take some action or another, and it makes sense for them to select an action accordingly.


[image of virtual world]

For instance, an agent in the illustrated situation here might reason that, were it to move to its right, which is our left, then the agent would obtain some tasty fruit. But, instead, if it moves to its left, it falls off a cliff. Accordingly, if its preference scheme assigns positive utility to the fruit, and negative utility to falling off the cliff, that means the agent moves to its right and not to its left. And that process, I would submit, is what we more or less do ourselves when we engage in what we think of as making choices for the sake of our goals.


The process, the computational process of selecting an action according to the desirability of what would be the case were the action taken, turns out to be what our choice process consists of. So, from this perspective, choice is a particular kind of computation. The objection that choice isn't really occurring because the outcome was already determined is just as much a non-sequitur as suggesting that any other computation, for example, adding up a list of numbers, isn't really occurring just because the outcome was predetermined.
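The choice process described in the talk can be sketched as an ordinary deterministic computation: evaluate what would be the case under each contemplated action, then pick the most desirable. The world layout, utilities and function names below are hypothetical stand-ins for the talk's fruit-and-cliff picture, not anything Drescher specifies.

```python
# A toy deterministic world: what lies one step to the agent's right (+1) or left (-1).
# These outcomes and utilities are hypothetical stand-ins for the talk's example.
WORLD = {+1: "fruit", -1: "cliff"}
UTILITY = {"fruit": 10, "cliff": -100}


def would_be_if(action):
    """Counterfactual model: what would be the case were `action` taken."""
    return WORLD[action]


def choose(actions):
    # Choice as a computation: evaluate each contemplated action's outcome,
    # then select the action whose outcome is most desirable.
    return max(actions, key=lambda a: UTILITY[would_be_if(a)])


print(choose([+1, -1]))  # 1 (move right, toward the fruit)
```

Running it twice always gives the same answer, yet the evaluation of the unchosen "cliff" branch is precisely what keeps that branch from happening.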


So, the choice process takes place, and we consider that the agent has a choice about the action that the choice selects and has a choice about the associated outcomes, meaning that those outcomes occur as a consequence of the choice process. So, clearly an agent that executes a choice process and that correctly anticipates what would be the case if various contemplated actions were taken will better achieve its goals than one that, say, just acts at random or one that takes a fatalist stance, that there's no point in doing anything in particular since nothing can change from what it's already determined to be. So, if we were designing intelligent agents and wanted them to achieve their goals, we would design them to engage in a choice process. Or, if the virtual world were immense enough to support natural selection and the evolution of sufficiently intelligent creatures, then those evolved creatures could be expected to execute a choice process because of the benefits conferred.


So the inalterability of everything that will ever happen does not imply the futility of acting for the sake of what is desired. The key to the choice relation is the “would be-if” relation, also known as the subjunctive or counterfactual relation. Counterfactual because it entertains a hypothetical antecedent about taking a certain action that is possibly contrary to fact, as in the case of moving to the agent's left in this example. Even though the moving-left action does not in fact occur, the agent does usefully reason about what would be the case if that action were taken, and indeed it's that very reasoning that ensures that the action does not in fact occur.


There are various technical proposals for how to formally specify a “would be-if” relation (David Lewis has a classic formulation; Judea Pearl has a more recent one), but they're not necessarily the appropriate version of “would be-if” to use for purposes of making choices, for purposes of selecting an action based on the desirability of what would then be the case. And, although I won't be presenting a formal theory, the essence of this talk is to investigate some properties of “would be-if,” the counterfactual relation that's appropriate to use for making choices.


In particular, I want to address next the possibility that, in a sufficiently deterministic universe, you have a choice about some things that your action cannot cause. Here's an example: assume or imagine that the universe is deterministic, with only one possible history following from any given state of the universe at a given moment. And let me define a predicate P that gets applied to the total state of the universe at some moment. The predicate P is defined to be true of a universe state just in case the laws of physics applied to that total state specify that a billion years after that state, my right hand is raised. Otherwise, the predicate P is false of that state.

[image of predicate P]


Now, suppose I decide, just on a whim, that I would like that state of the universe a billion years ago to have been such that the predicate P was true of that past state. I need only raise my right hand now, and, lo and behold, it was so. If, instead, I want the predicate to have been false, then I lower my hand and the predicate was false. Of course, I haven't changed what the past state of the universe is or was; the past is what it is, and can never be changed. There is merely a particular abstract relation, a “would be-if” relation, between my action and the particular past state that is the subject of my whimsical goal. I cannot reasonably take the action and not expect that the past state will be in correspondence.


So, I can't change the past, nor does my action have any causal influence over the past, at least not in the way we normally and usefully conceive of causality, where causes are temporally prior to effects, and where we can think of causal relations as essentially specifying how the universe computes its subsequent states from its previous states. Nonetheless, I have exactly as much choice about the past value of the predicate I have defined, despite its inalterability, as I have about whether to raise my hand now, despite the inalterability of that too, in a deterministic universe. And if I were to believe otherwise, and were to refrain from raising my hand merely because I can't change the past even though I do have a whimsical preference about the past value of the specified predicate, then, as always with fatalist resignation, I'd be needlessly forfeiting an opportunity to have my goals fulfilled.


If we accept the conclusion that we sometimes have a choice about what we cannot change or even cause, or at least tentatively accept it in order to explore its ramifications, then we can go on now to examine a well-known science fiction scenario called Newcomb's Problem. In Newcomb's Problem, a mischievous benefactor presents you with two boxes: there is a small, transparent box, containing a thousand dollars, which you can see; and there is a larger, opaque box, which you are truthfully told contains either a million dollars or nothing at all. You can't see which; the box is opaque, and you are not allowed to examine it. But you are truthfully assured that the box has been sealed, and that its contents will not change from whatever they already are.


You are now offered a very odd choice: you can take either the opaque box alone, or take both boxes, and you get to keep the contents of whatever you take. That sure sounds like a no-brainer: if we assume that maximizing your expected payoff in this particular encounter is the sole relevant goal, then regardless of what's in the opaque box, there's no benefit to forgoing the additional thousand dollars.

Comment author: Randaly 07 August 2012 04:19:11PM 3 points

Apparently 3 comments will be needed.


But, before you choose, you are told how the benefactor decided how much money to put in the opaque box, and that brings us to the science fiction part of the scenario. What the benefactor did was take a very detailed local snapshot of the state of the universe a few minutes ago, and then run a faster-than-real-time simulation to predict with high accuracy whether you would take both boxes, or just the opaque box. A million dollars was put in the opaque box if and only if you were predicted to take only the opaque box.


Admittedly the super-predictability here is a bit physically implausible, and goes beyond a mere stipulation of determinism. Still, at least it's not logically impossible, provided that the simulator can avoid having to simulate itself, and thus avoid a potential infinite regress. (The opaque box's opacity is important in that regard: it serves to insulate you from being effectively informed of the outcome of the simulation itself, so the simulation doesn't have to predict its own outcome in order to predict what you are going to do.) So, let's indulge the super-predictability assumption, and see what comes from it. Eventually, I'm going to argue that the real world is at least deterministic enough and predictable enough that some of the science-fiction conclusions do carry over to reality.


So, you now face the following choice: if you take the opaque box alone, then you can expect with high reliability that the simulation predicted you would do so, and so you expect to find a million dollars in the opaque box. If, on the other hand, you take both boxes, then you should expect the simulation to have predicted that, and you expect to find nothing in the opaque box. If and only if you expect to take the opaque box alone, you expect to walk away with a million dollars. Of course, your choice does not cause the opaque box's content to be one way or the other; according to the stipulated rules, the box content already is what it is, and will not change from that regardless of what choice you make.


But we can apply the lesson from the handraising example (the lesson that you sometimes have a choice about things your action does not change or cause), because you can reason about what would be the case if, perhaps contrary to fact, you were to take a particular hypothetical action. And, in fact, we can regard Newcomb's Problem as essentially harnessing the same past-predicate consequence as in the handraising example: namely, if and only if you take just the opaque box, then the past state of the universe, at the time the predictor took the detailed snapshot, was such that that state leads, by physical laws, to your taking just the opaque box. And, if and only if the past state was thus, the predictor would predict you taking the opaque box alone, and so a million dollars would be in the opaque box, making that the more lucrative choice. And it's certainly the case that people who would make the opaque box choice have a much higher expected gain from such encounters than those who take both boxes.
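The expected-gain comparison at the end of that paragraph can be checked with a few lines of Python. This is a minimal sketch that goes beyond the talk in one labeled assumption: it treats the predictor as correct with some fixed probability `accuracy`, whatever you choose, rather than as a near-perfect simulator. The function names are hypothetical. Setting accuracy × 1,000,000 = 1,000 + (1 − accuracy) × 1,000,000 gives a break-even accuracy of 1,001,000 / 2,000,000 = 0.5005, so on this model one-boxing has the higher expected payoff for any predictor even slightly better than a coin flip.

```python
from fractions import Fraction

SMALL = 1_000        # visible contents of the transparent box
BIG = 1_000_000      # opaque box contents if one-boxing was predicted

# Assumption: the predictor is right with probability `accuracy`, independent of your choice.
def expected_one_box(accuracy: Fraction) -> Fraction:
    # The opaque box is full exactly when the predictor correctly foresaw one-boxing.
    return accuracy * BIG

def expected_two_box(accuracy: Fraction) -> Fraction:
    # You always get SMALL; the opaque box is full only if the predictor erred.
    return SMALL + (1 - accuracy) * BIG

def break_even_accuracy() -> Fraction:
    # Solve accuracy * BIG == SMALL + (1 - accuracy) * BIG for accuracy.
    return Fraction(BIG + SMALL, 2 * BIG)

for acc in (Fraction(1, 2), Fraction(9, 10), Fraction(99, 100)):
    one, both = expected_one_box(acc), expected_two_box(acc)
    print(f"accuracy {float(acc):.2f}: one box ${float(one):,.0f}, both boxes ${float(both):,.0f}")
print("break-even accuracy:", float(break_even_accuracy()))
```

Exact fractions are used so that the comparison is not muddied by floating-point rounding.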


Still, it's possible to maintain, as many people do, that taking both boxes is the rational choice, and that the situation is essentially rigged to punish you for your predicted rationality- much as if a written exam were perversely graded to give points only for wrong answers. From that perspective, taking both boxes is the rational choice, even if you are then left to lament your unfortunate rationality. But that perspective is, at the very least, highly suspect in a situation where, unlike the hapless exam-taker, you are informed of the rigging and can take it into account when choosing your action, as you can in Newcomb's Problem.


And, by the way, it's possible to consider an even stranger variant of Newcomb's Problem, in which both boxes are transparent. In this version, the predictor runs a simulation that tentatively presumes that you'll see a million dollars in the larger box. You'll be presented with a million dollars in the box for real if and only if the simulation shows that you would then take the million dollar box alone. If, instead, the simulation predicts that you would take both boxes if you see a million dollars in the larger box, then the larger box is left empty when presented for real.


So, let's suppose you're confronted with this scenario, and you do see a million dollars in the box when it's presented for real. Even though the million dollars is already there, and you see it, and it can't change, nonetheless I claim that you should still take the million-dollar box alone, because if you were to take both boxes instead, contrary to what in fact must be the case in order for you to be in this situation in the first place, then, also contrary to what is in fact the case, the box would not contain a million dollars, even though in fact it does, and even though that can't change! The same two-part reasoning applies as before: if and only if you were to take just the larger box, then the state of the universe at the time the predictor took its snapshot must have been such that you would take just that box if you were to see a million dollars in that box. If and only if the past state had been thus, the predictor would have put a million dollars in the box.
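The transparent variant can also be written as a tiny simulation. The sketch below assumes, as a simplification of the talk's setup, that the agent is a deterministic function from "do I see a million dollars in the large box?" to a choice, so the predictor's simulation of the agent is perfectly faithful; `transparent_newcomb`, `one_boxer`, and `two_boxer` are hypothetical names. The predictor runs the policy under the tentative assumption that the million is visible, fills the box for real only if that run takes the large box alone, and then the real agent acts on what it actually sees.

```python
from typing import Callable

MILLION = 1_000_000
THOUSAND = 1_000

# A policy maps "is a million visible in the large box?" to "one" (large box only) or "both".
Policy = Callable[[bool], str]

def transparent_newcomb(policy: Policy) -> int:
    # Predictor's simulation: tentatively show the agent a full large box.
    filled = policy(True) == "one"
    # The real agent then sees the actual contents and chooses.
    choice = policy(filled)
    payoff = MILLION if filled else 0
    if choice == "both":
        payoff += THOUSAND  # the small box's thousand dollars
    return payoff

def one_boxer(sees_million: bool) -> str:
    return "one"

def two_boxer(sees_million: bool) -> str:
    return "both"

print(transparent_newcomb(one_boxer))  # 1000000
print(transparent_newcomb(two_boxer))  # 1000
```

The two-boxer never gets to see a full box: the situation in which it would grab both boxes next to a visible million never arises, which is the sense in which the person who accepts the one-box prescription fares better.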


Now, the prescription here to take just the larger box is more shockingly counter-intuitive than I can hope to decisively argue for in a brief talk; but do at least note that a person who agrees that it is rational to take just the one box here does fare better than a person who believes otherwise, who would never be presented with a million dollars in the first place. If we do, at least tentatively, accept some of this analysis, for the sake of argument to see what follows from it, then we can move on now to another toy scenario, which dispenses with the determinism and super-prediction assumptions and arguably has more direct real world applicability.


That scenario is the famous prisoner's dilemma. The prisoner's dilemma is a two player game in which both players make their moves simultaneously and independently, with no communication until both moves have been made. A move consists of writing down either the word “cooperate” or “defect.” The payoff matrix is as shown:

Payoffs in dollars (Player 1's payoff listed first):

1: C 1: D
2: C (99, 99) (100, 0)
2: D (0, 100) (1, 1)

If both players choose cooperate, they both receive 99 dollars. If both defect, they both get 1 dollar. But if one player cooperates and the other defects, then the one who cooperates gets nothing, and the one who defects gets 100 dollars.


Crucially, we stipulate that each player cares only about maximizing her own expected payoff, and that the payoff in this particular instance of the game is the only goal, with no effect on anything else, including any subsequent rounds of the game, that could further complicate the decision. Let's assume that both players are smart and knowledgeable enough to find the correct solution to this problem and to act accordingly. What I mean by the correct answer is the one that maximizes that player's expected payoff. Let's further assume that each player is aware of the other player's competence, and their knowledge of their own competence, and so on. So then, what is the right answer that they'll both find?


On the face of it, it would be nice if both players were to cooperate, and receive close to the maximum payoff. But if I'm one of the players, I might reason that my opponent's move is causally independent of mine: regardless of what I do, my opponent's move is either to cooperate or not. If my opponent cooperates, I receive a dollar more if I defect than if I cooperate: $100 vs. $99. Likewise if my opponent defects: I get a dollar more if I defect than if I cooperate, in this case $1 vs. nothing. So, in either case, regardless of what move my opponent makes, my defecting causes me to get one dollar more than my cooperating causes me to get, which seemingly makes defecting the right choice. Defecting is indeed the choice that's endorsed by standard game theory. And of course my opponent can reason similarly.
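The dominance argument in that paragraph amounts to holding the opponent's move fixed and comparing my two options. A minimal Python sketch using the dollar payoffs stated in the talk; the `PAYOFF` table and the function name are illustrative, not from the talk:

```python
# My payoff in dollars, indexed by (my_move, opponent_move).
PAYOFF = {
    ("C", "C"): 99,
    ("C", "D"): 0,
    ("D", "C"): 100,
    ("D", "D"): 1,
}

def advantage_of_defecting(opponent_move: str) -> int:
    """Extra dollars I get by defecting rather than cooperating, with the opponent's move held fixed."""
    return PAYOFF[("D", opponent_move)] - PAYOFF[("C", opponent_move)]

for opp in ("C", "D"):
    print(f"opponent plays {opp}: defecting gains ${advantage_of_defecting(opp)}")
```

Both lines report a one-dollar gain, which is exactly the sense in which D "dominates" C once the opponent's move is treated as causally independent of mine.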


So, if we're both convinced that we only have a choice about what we can cause, then we're both rationally compelled to defect, leaving us both much poorer than if we both cooperated. So, here again, an exclusively causal view of what we have a choice about leads to us having to lament that our unfortunate rationality keeps a much better outcome out of our reach. But we can arrive at a better outcome if we keep in mind the lesson from Newcomb's Problem, or even from the handraising example: that it can make sense to act for the sake of what would be the case if you so acted, even if your action does not cause it to be the case. Even without the help of any super-predictors in this scenario, I can reason that if I, acting by stipulation as a correct solver of this problem, were to choose to cooperate, then that's what correct solvers of this problem do in such situations, and in particular that's what my opponent, as a correct solver of this problem, does too.

Comment author: Randaly 07 August 2012 04:19:35PM 3 points


Similarly, if I were to figure out that defecting is correct, that's what I can expect my opponent to do. This is similar to my ability to predict what your answer to adding a given pair of numbers would be: I can merely add the numbers myself, and, given our mutual competence at addition, solve the problem. The universe is predictable enough that we routinely, and fairly accurately, make such predictions about one another. From this viewpoint, I can reason that, if I were to cooperate or not, then my opponent would make the corresponding choice- if indeed we are both correctly solving the same problem, my opponent maximizing his expected payoff just as I maximize mine. I therefore act for the sake of what my opponent's action would then be, even though I cannot causally influence my opponent to take one action or the other, since there is no communication between us. Accordingly, I cooperate, and so does my opponent, using similar reasoning, and we both do fairly well.
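The contrast between the two lines of reasoning can be sketched in Python, again with the talk's dollar payoffs. The assumption doing the work in `mirrored_choice`, taken from the talk's stipulation that both players correctly solve the same problem, is that whatever move I settle on is also my opponent's move, so only the diagonal outcomes (C, C) and (D, D) need comparing. Both function names are hypothetical.

```python
# My payoff in dollars, indexed by (my_move, opponent_move).
PAYOFF = {("C", "C"): 99, ("C", "D"): 0, ("D", "C"): 100, ("D", "D"): 1}
MOVES = ("C", "D")

def dominance_choice() -> str:
    # Causal view: hold the opponent's move fixed; pick D if it is better against every move.
    d_dominates = all(PAYOFF[("D", o)] > PAYOFF[("C", o)] for o in MOVES)
    return "D" if d_dominates else "C"

def mirrored_choice() -> str:
    # Symmetric view: my opponent, solving the same problem correctly, makes the same move,
    # so each candidate move leads to the outcome (move, move).
    return max(MOVES, key=lambda move: PAYOFF[(move, move)])

for name, chooser in (("dominance", dominance_choice), ("mirrored", mirrored_choice)):
    move = chooser()
    print(f"{name}: both play {move}, each gets ${PAYOFF[(move, move)]}")
```

The dominance view ends with both players getting $1; the mirrored view ends with both getting $99.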


One problem with the Prisoner's Dilemma is that the idealized degree of symmetry that's postulated between the two players may seldom occur in real life. But there are some important generalizations that may apply much more broadly. In particular, in many situations, the beneficiary of your cooperation may not be the same as the person whose cooperation benefits you. Instead, your decision whether to cooperate with one person may be symmetric to a different person's decision to cooperate with you. Again, the same reasoning can favor cooperating even in the absence of any causal influence upon your potential benefactors, even if they will never learn of your cooperation with others, and even, moreover, if you already know of their cooperation with you before you make your own choice. That last case is analogous to the transparent version of Newcomb's Problem: there too, you act for the sake of something that you already know has already obtained.


Anyway, as many authors have noted with regard to the Prisoner's Dilemma, this is beginning to sound a little like the Golden Rule or the Categorical Imperative: act towards others as you would like others to act towards you, in similar situations. The analysis in terms of counterfactual reasoning provides a rationale, under some circumstances, for taking an action that causes net harm to your own interests and net benefit to others' interests, although the choice is still ultimately grounded in your own goals, because of what would be the case, given others' isomorphic behavior, if you yourself were to cooperate or not. Having a derivable rationale for ethical or moral behavior would be desirable for all sorts of reasons, not least of which is to help us make the momentous decisions as to how, or even whether, to engineer the Singularity.

There's about 2 more minutes of his presentation before he finished, but it looks like he just made some comparisons with TDT, so I'm too lazy to copy it over.

Comment author: Pablo_Stafforini 07 August 2012 05:16:33PM 3 points

Maybe you should post the transcript as an article. Other users have posted talk transcripts before, and they were generally well received.

Comment author: Randaly 07 August 2012 08:23:49PM 1 point

Great idea, thanks!

Comment author: [deleted] 20 December 2012 09:31:38PM 0 points

If there were a way I could communicate with it (e.g. it speaks English), I'd cooperate with it, not because I feel it deserves my cooperation, but because this is the only way I could obtain its cooperation. Otherwise I'd defect, as I'm pretty sure no amount of TDT would correlate its behavior with mine. Also, why are 4 billion humans infected if only 3 billion at most can be saved in the entire matrix? Eliezer, what are you planning...?

Comment author: Indon 10 June 2013 05:25:47PM 0 points

That's a good way to clearly demonstrate a nonempathic actor in the Prisoner's Dilemma: a "Hawk", who views their own payoffs, and only their own payoffs, as having value and places no value on the payoffs of others.

But I don't think it's necessary. I would say that humans can visualize a nonempathic human - a bad guy - more easily than they can visualize an empathic human with slightly different motives. We've undoubtedly had to, collectively, deal with a lot of them throughout history.

A while back I was writing a paper and came across a fascinating article about types of economic actors. That article concluded that there are probably three different general tendencies in human behavior, and thus three general groups of human actors who have those tendencies: one that tends to play 'tit-for-tat' (who they call 'conditional cooperators'), one that tends to play 'hawk' (who they call 'rational egoists'), and one that tends to play 'grim' (who they call 'willing punishers').
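The three tendencies named in that comment correspond to well-known strategies for the iterated Prisoner's Dilemma, and a short Python sketch can play them against each other using the payoff matrix at the top of the post. Since the comment does not identify the article, these are standard textbook definitions rather than that article's model; "hawk" is taken here to mean always-defect, matching the comment's description of an actor who values only its own payoffs, and the five-round length is an arbitrary choice.

```python
# Payoff to me for (my_move, their_move), from the matrix at the top of the post.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 2}

def tit_for_tat(mine, theirs):
    # "Conditional cooperator": cooperate first, then copy the opponent's previous move.
    return theirs[-1] if theirs else "C"

def hawk(mine, theirs):
    # "Rational egoist" (assumed here to mean always-defect).
    return "D"

def grim(mine, theirs):
    # "Willing punisher": cooperate until the opponent defects once, then defect forever.
    return "D" if "D" in theirs else "C"

def play(strategy_a, strategy_b, rounds=5):
    """Play an iterated game and return the total payoffs (a, b)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

for a, b in [(tit_for_tat, hawk), (grim, hawk), (tit_for_tat, grim), (hawk, hawk)]:
    print(f"{a.__name__} vs {b.__name__}: {play(a, b)}")
```

Against each other, the two retaliating strategies cooperate throughout; against the hawk they lose only the first round and then settle into mutual defection.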

So there are paperclip maximizers among humans. Only the paperclips are their own welfare, with no empathic consideration whatsoever.

Comment author: Aiyen 07 January 2014 09:44:12PM 0 points

Long time lurker, first post.

Isn't the rational choice on a True Prisoner's Dilemma to defect if possible, and to seek a method to bind the opponent to cooperate even if that binding forces one to cooperate as well? An analogous situation is law enforcement: one may well desire to unilaterally break the law, yet favor the existence of police that force all parties concerned to obey it. Of course, police that would never interfere with one's own behavior would be even better, but this is usually impractical. Timeless Decision Theory adds that one should cooperate against a sufficiently similar agent, as such similar agents will presumably make the same decision, and (C, C) is obviously preferable to (D, D); but against a dissimilar opponent, I would think this would be the optimal strategy.

If you can't bind the paperclip maximizer, defect. If you can, do so, and still defect if possible. If the binding affects you as well, you are now forced to cooperate. And of course, if the clipper is also using TDT, cooperate.
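The rule proposed in that comment is a small decision procedure, so it can be written down directly. This is a hypothetical encoding of the comment's wording, not an established algorithm; the three boolean inputs are assumptions about what the player knows: whether the opponent also uses TDT, whether a binding mechanism is available, and whether that binding would constrain the player as well.

```python
from itertools import product

def aiyen_move(opponent_uses_tdt: bool, can_bind: bool, binding_binds_me: bool) -> str:
    """My move in a one-shot True Prisoner's Dilemma under the comment's rule (hypothetical encoding)."""
    if opponent_uses_tdt:
        # A sufficiently similar agent is expected to mirror my decision,
        # and (C, C) is preferable to (D, D).
        return "C"
    if can_bind and binding_binds_me:
        # I bind the opponent, but the same mechanism forces me to cooperate too.
        return "C"
    # Either no binding is available, or it binds only the opponent: defect.
    return "D"

for tdt, bind, binds_me in product((False, True), repeat=3):
    if binds_me and not bind:
        continue  # a binding that does not exist cannot constrain me
    print(f"TDT={tdt!s:5} bind={bind!s:5} binds_me={binds_me!s:5} -> {aiyen_move(tdt, bind, binds_me)}")
```

The function returns only the player's own move; the comment's further instruction to use a binding whenever one is available is left implicit in the `can_bind` input.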