The Truly Iterated Prisoner's Dilemma

Eliezer Yudkowsky

4th Sep 2008, 86 comments

Prisoner's Dilemma

Personal Blog

Followup to: The True Prisoner's Dilemma

For everyone who thought that the rational choice in yesterday's True Prisoner's Dilemma was to defect, a follow-up dilemma:

Suppose that the dilemma was not one-shot, but was rather to be repeated exactly 100 times, where for each round, the payoff matrix looks like this:

Payoffs per round (human lives saved, paperclips gained):
	Humans: C	Humans: D
Paperclipper: C	(+2 million lives, +2 paperclips)	(+3 million lives, +0 paperclips)
Paperclipper: D	(+0 lives, +3 paperclips)	(+1 million lives, +1 paperclip)

As most of you probably know, the king of the classical iterated Prisoner's Dilemma is Tit for Tat, which cooperates on the first round, and on succeeding rounds does whatever its opponent did last time. But what most of you may not realize is that if you know when the iteration will stop, Tit for Tat is, according to classical game theory, irrational.
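As a minimal sketch, assuming the payoff matrix above with lives counted in millions, here is Tit for Tat playing the 100-round game. The names `PAYOFF`, `tit_for_tat`, `always_defect`, and `play` are illustrative, not from the post.

```python
# Per-round payoffs from the matrix above, keyed by (human_move, clipper_move).
# Each value is (millions of human lives saved, paperclips gained).
PAYOFF = {
    ("C", "C"): (2, 2),
    ("D", "C"): (3, 0),
    ("C", "D"): (0, 3),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_moves):
    """Cooperate on the first round, then repeat the opponent's last move."""
    return "C" if not opponent_moves else opponent_moves[-1]


def always_defect(opponent_moves):
    """Defect every round, whatever the opponent has done."""
    return "D"


def play(human_strategy, clipper_strategy, rounds=100):
    """Play the iterated game and return (total lives, total paperclips)."""
    human_moves, clipper_moves = [], []
    lives = clips = 0
    for _ in range(rounds):
        # Both moves are chosen before either is revealed.
        h = human_strategy(clipper_moves)
        c = clipper_strategy(human_moves)
        human_moves.append(h)
        clipper_moves.append(c)
        d_lives, d_clips = PAYOFF[(h, c)]
        lives += d_lives
        clips += d_clips
    return lives, clips


print(play(tit_for_tat, tit_for_tat))      # (200, 200)
print(play(tit_for_tat, always_defect))    # (99, 102)
print(play(always_defect, always_defect))  # (100, 100)
```

Two Tit for Tat players cooperate every round; against a constant defector, Tit for Tat loses only the first round.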

Why? Consider the 100th round. On the 100th round, there will be no future iterations, no chance to retaliate against the other player for defection. Both of you know this, so the game reduces to the one-shot Prisoner's Dilemma. Since you are both classical game theorists, you both defect.

Now consider the 99th round. Both of you know that you will both defect in the 100th round, regardless of what either of you does in the 99th round. So you both know that your current action cannot change your future payoff, only your current payoff. You are both classical game theorists. So you both defect.

Now consider the 98th round...
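The induction above can be sketched mechanically. This is a minimal illustration, assuming each player scores only its own column of the matrix (which is symmetric: 2, 0, 3, 1); the function names are hypothetical. The key step is that once later rounds are fixed, their payoff is the same constant whichever move is made now, so it cannot change which move dominates.

```python
# Each player's own per-round payoff, indexed [my_move][their_move].
# The matrix is symmetric: humans count millions of lives, the
# Paperclipper counts paperclips, and the numbers line up.
STAGE = {"C": {"C": 2, "D": 0}, "D": {"C": 3, "D": 1}}
MOVES = ("C", "D")


def dominant_move(table):
    """Return the move that does strictly better against every opponent move."""
    for mine in MOVES:
        other = "D" if mine == "C" else "C"
        if all(table[mine][t] > table[other][t] for t in MOVES):
            return mine
    return None


def backward_induction(rounds=100):
    """Solve the known-horizon game from the last round back to the first."""
    plan = {}
    continuation = 0  # payoff already fixed by the solved later rounds
    for k in range(rounds, 0, -1):
        # Later play does not depend on round k's moves, so the same
        # continuation is added to every cell: round k is the one-shot game.
        shifted = {m: {t: STAGE[m][t] + continuation for t in MOVES} for m in MOVES}
        move = dominant_move(shifted)
        plan[k] = move
        continuation += STAGE[move][move]  # the opponent reasons identically
    return plan, continuation


plan, total = backward_induction()
print(set(plan.values()), total)  # {'D'} 100
```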

With humanity and the Paperclipper facing 100 rounds of the iterated Prisoner's Dilemma, do you really truly think that the rational thing for both parties to do is to steadily defect against each other for the next 100 rounds?

Comments

Kevin_Dick

I think you may be attacking a straw man here. When I was taught about the PD almost 20 years ago in an undergraduate class, our professor made exactly the same point. If there are enough iterations (even if you know exactly when the game will end), it can be worth the risk to attempt to establish cooperation via Tit-for-Tat. IIRC, it depends on an infinite recursion of your priors on the other guy's priors on your priors, etc. that the other guy will attempt to establish cooperation. You compare this to the expected losses from a defection in the firs...

[anonymous]

I think you may be attacking a straw man here.

It frustrates me immensely to see how many times this claim is made in the comments of Eliezer's posts. At least 75% of the times I read this I've personally encountered someone who made the "straw" claim. In this case, consult the first chapter of Ken Binmore's "Playing for Real".

Silas

Wait wait wait: Isn't this the same kind of argument as in the dilemma about "We will execute you within the next week on a day that you won't expect"? (Sorry, don't know the name for that puzzle.) In that one, the argument goes that if it's the last day of the week, the prisoner knows that's the last chance they have to execute him, so he'll expect it, so it can't be that day. But then, if it's the next-to-last day, he knows they can't execute him on the last day, so they have to execute him on that next-to-last day. But then he expects it! And so on.

So, after concluding they can't execute him, they execute him on Wednesday. "Wait! But I concluded you can't do this!" "Good, then you didn't expect it. Problem solved."

Just as in that problem, you can't stably have an "(un)expected execution day", you can't have an "expected future irrelevance" in this one.

Do I get a prize? No? Okay then.

Tom_P

A more realistic model would let the number of iterations be unknown to the players. If the probability that the "meta-game" continues in each stage is high enough, it pays to cooperate.

The conclusion that the only rational thing to do in a 100 stage game with perfectly rational players is to defect is correct, but is an artifact of the fact that the number of stages has been defined precisely, and therefore the players can plan to defect at the last moment (which makes them want to defect progressively earlier and earlier). In the real world, this seems rather unlikely.
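A minimal sketch of the "probability that the meta-game continues" point, assuming each round is followed by another with a fixed probability `delta` and the partner plays Tit for Tat. It compares only two plans (cooperate forever, defect forever), so it is an illustration rather than a full equilibrium analysis; the function name is hypothetical.

```python
def expected_totals_vs_tft(delta):
    """Expected lives (in millions) against a Tit for Tat partner when, after
    each round, another round follows with probability delta.

    Compares only two plans: cooperate forever, or defect forever.
    """
    # Mutual cooperation earns 2 per round for an expected 1 / (1 - delta) rounds.
    cooperate_forever = 2 / (1 - delta)
    # Defecting earns 3 once, then Tit for Tat retaliates: 1 per later round.
    defect_forever = 3 + delta / (1 - delta)
    return cooperate_forever, defect_forever


for delta in (0.3, 0.5, 0.9):
    print(delta, expected_totals_vs_tft(delta))
```

With this matrix the two plans break even at `delta = 0.5`; above that, cooperating pays.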

Vladimir_Nesov

Silas,

It's called the Unexpected hanging paradox, and I linked to it in my sketch of the solution to the one-off dilemma. I agree, the same problem seems to be at work here, and it's orthogonal to the two-step argument that takes us from mutual cooperation to mutual defection. You need to mark the performance of complete policies established in the model at the start of the experiment, and not reason backwards, justifying the actions that could have changed the consequences by inevitability of consequences. Again, I'm not quite sure how it all ties together.

denis_bider

What Kevin Dick said.

The benefit to each player from mutual cooperation in a majority of the rounds is much more than the benefit from mutual defection in all rounds. Therefore it makes sense for both players to invest at the beginning, and cooperate, in order to establish each other's trustworthiness.

Tit-for-tat seems like it might be a good strategy in the very early rounds, but as the game goes on, the best reaction to defection might become two defections in response, and in the last rounds, when the other party defects, the best response might be all defections until the end.
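One hypothetical way to write down the schedule described above, as a sketch only: the phase cutoffs `early=30` and `late=90` are illustrative assumptions, not values derived in the comment.

```python
def escalating_retaliation(round_no, opponent_moves, early=30, late=90):
    """Hypothetical schedule: Tit for Tat early, double retaliation in the
    middle, and defection to the end after any late defection.

    round_no is 1-based; opponent_moves holds the opponent's earlier moves.
    The cutoffs early=30 and late=90 are illustrative, not derived.
    """
    if not opponent_moves:
        return "C"
    if round_no <= early:
        # Early game: plain Tit for Tat.
        return opponent_moves[-1]
    if round_no <= late:
        # Middle game: each defection is answered with two defections.
        return "D" if "D" in opponent_moves[-2:] else "C"
    # End game: a defection from round `late` onward triggers defection to the end.
    return "D" if "D" in opponent_moves[late - 1:] else "C"
```

Early defections are forgiven in the end game; only defections in the last stretch trigger permanent retaliation.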

Nominull3

No, but I damn well expect you to defect the hundredth time. If he's playing true tit-for-tat, you can exploit that by playing along for a time, but cooperating on the hundredth go can't help you in any way, it will only kill a million people.
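A minimal sketch of the exploitation described above, assuming the Paperclipper plays naive Tit for Tat that ignores the round number. The strategy names are hypothetical. Cooperating through round 99 and defecting on round 100 gains exactly one million lives over plain Tit for Tat.

```python
# Per-round payoffs keyed by (human_move, clipper_move): (millions of lives, paperclips).
PAYOFF = {("C", "C"): (2, 2), ("D", "C"): (3, 0), ("C", "D"): (0, 3), ("D", "D"): (1, 1)}
ROUNDS = 100


def tit_for_tat(round_no, opponent_moves):
    """Naive Tit for Tat: ignores the round number entirely."""
    return "C" if not opponent_moves else opponent_moves[-1]


def tft_but_defect_last(round_no, opponent_moves):
    """Tit for Tat, except defect on the final round, when retaliation is impossible."""
    if round_no == ROUNDS:
        return "D"
    return tit_for_tat(round_no, opponent_moves)


def play(human, clipper):
    """Return (total lives, total paperclips) over ROUNDS rounds."""
    human_moves, clipper_moves = [], []
    lives = clips = 0
    for round_no in range(1, ROUNDS + 1):
        h, c = human(round_no, clipper_moves), clipper(round_no, human_moves)
        human_moves.append(h)
        clipper_moves.append(c)
        lives += PAYOFF[(h, c)][0]
        clips += PAYOFF[(h, c)][1]
    return lives, clips


print(play(tit_for_tat, tit_for_tat))          # (200, 200)
print(play(tft_but_defect_last, tit_for_tat))  # (201, 198)
```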

Do not kill a million people, please.

Alex Vermillion (reply)

One could reframe this so that defecting on the 100th step kills 100 million, as that is what you're going to lose over all the rounds. If the other agent knows how you will reason, the games look like:

C/C × 100 = 200 million lives and 200 paperclips
D/D × 100 = 100 million lives and 100 paperclips

The difference between these is 100 million, which is how much you're going to be penalized for reliably defecting on the last round.

Sebastian_Hagen2

Do you really truly think that the rational thing for both parties to do, is steadily defect against each other for the next 100 rounds?

No. That seems obviously wrong, even if I can't figure out where the error lies.
We only get a reversion to the (D,D) case if we know with a high degree of confidence that the other party doesn't use naive Tit for Tat, and they know that we don't. That seems like an iffy assumption to me. If we knew the exact algorithm the other side uses, it would be trivial to find a winning strategy; so how do we know it isn't naive Ti...

Allan_Crossman

If it's actually common knowledge that both players are "perfectly rational" then they must do whatever game theory says.

But if the paperclip maximizer knows that we're not perfectly rational (or falsely believes that we're not) it will try and achieve a better score than it could get if we were in fact perfectly rational. It will do this by cooperating, at least for a time.

I think correct strategy gets profoundly complicated when one side believes the other side is not fully rational.

Dagon

I THINK rational agents will defect 100 times in a row, or 100 million times in a row for this specified problem. But I think this problem is impossible. In all cases there will be uncertainty about your opponent/partner: you won't know its utility function perfectly, and you won't know how perfectly it's implemented. Heck, you don't know your OWN utility function perfectly, and you know darn well you're implemented somewhat accidentally. Also, there are few real cases where you know precisely when there will be no further games that can be affected b...

Zubon

Shut up and multiply. Every time you make the wrong choice, 1 million people die. What is your probability that Clippy is going to throw that first C? How did you come to that? You are not allowed to use any version of thinking back from what you would want Clippy to do, or what you would do in its place if you really, I promise, valued only paperclips and not human lives.

You throw a C, Clippy throws a D. People die, 99 rounds to go. You have just shown Clippy that you are at least willing to cooperate. What is your probability that Clippy is going to throw a C next? Ever?

You throw a C, Clippy throws a D. People die, 98 rounds to go. Are you showing Clippy that you want to cooperate, so it can safely cooperate, or are you just an unresponsive player who will keep throwing Cs no matter what he does? And what does it say to you that Clippy has thrown 2 Ds?

Alternate case, round 1: you throw a C, Clippy throws a C. People live, 99 rounds to go. At what point are you planning to start defecting? Do you think Clippy can't work out that logic too? When do you think Clippy is planning to start defecting?

andrew7

Finitely iterated prisoner's dilemma is just like the traveler's dilemma, on which see this article by Kaushik Basu. The "always defect" choice is always a (in fact, the only) Nash equilibrium and an evolutionarily stable strategy, but it turns out that if you measure how stable it is, it becomes less stable as the number of iterations increases. So if there's some kind of noise or uncertainty (as Dagon points out), cooperation becomes rational.

andrew7

Above link is bad, try this.

CarlShulman

If you cooperate even once, the common 'knowledge' that you are both classical game theorists is revealed (to all parties) to be false, and your opponent will have to update estimates of your future actions.

Allan_Crossman

Carl - good point.

I shouldn't have conflated perfectly rational agents (if there are such things) with classical game-theorists. Presumably, a perfectly rational agent could make this move for precisely this reason.

Probably the best situation would be if we were so transparently naive that the maximizer could actually verify that we were playing naive tit-for-tat, including on the last round. That way, it would cooperate for 99 rounds. But with it in another universe, I don't see how it can verify anything of the sort.

(By the way, Eliezer, how much communi...

prase

Zubon,

When do you think Clippy is planning to start defecting?

If Clippy decides the same way as I do, then I expect he starts defecting at the same turn as I do. The result is 100x C,C. There is no way that identical deterministic algorithms with the same input can produce different outputs, so in each turn, C,C or D,D are the only possibilities. It's rational to C.

However, "realistic" Clippy uses a different algorithm which is unknown to me. Here I genuinely don't know what to do. To have some preference to choose C over D or conversely, I would ...
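The identical-algorithm argument can be checked directly. A minimal sketch, assuming a policy is any deterministic function of (round number, my moves, their moves); the sample policies are hypothetical. Two copies fed mirror-image histories never disagree, so only (C, C) and (D, D) occur. It says nothing about a differently built Clippy.

```python
def self_play(policy, rounds=100):
    """Two copies of one deterministic policy, each seeing the game from its own side."""
    a_moves, b_moves = [], []
    for round_no in range(1, rounds + 1):
        a = policy(round_no, tuple(a_moves), tuple(b_moves))
        b = policy(round_no, tuple(b_moves), tuple(a_moves))
        a_moves.append(a)
        b_moves.append(b)
    return list(zip(a_moves, b_moves))


# A few arbitrary deterministic policies: (round, my moves, their moves) -> move.
POLICIES = {
    "always_cooperate": lambda n, mine, theirs: "C",
    "always_defect": lambda n, mine, theirs: "D",
    "tit_for_tat": lambda n, mine, theirs: theirs[-1] if theirs else "C",
    "defect_on_last_round": lambda n, mine, theirs: "D" if n == 100 else "C",
    "defect_on_odd_rounds": lambda n, mine, theirs: "D" if n % 2 else "C",
}

for name, policy in POLICIES.items():
    print(name, sorted(set(self_play(policy))))
```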

Benquo (reply)

Actually, there is something you can do to improve the outcome over always accepting or always switching, without knowing the distribution of money. All you need to do is define your probability of switching according to some function that decreases as the amount of money in the envelope increases. So for example, you could switch with probability exp(-X), where X is the amount of money in the envelope you start with. Of course, to have an exactly optimal strategy, or even to know how much that general strategy will benefit you, you would need to know more about the distribution.
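A minimal check of the randomized switching rule, assuming the two envelopes hold `a` and `2a` and you are handed one of them uniformly at random; the function name is hypothetical. Because exp(-a) > exp(-2a) for any a > 0, the expected gain is positive for every pair, and so for any distribution over pairs, though it becomes tiny for large amounts.

```python
import math


def expected_gain(a):
    """Expected gain from switching with probability exp(-X), where X is the
    amount you hold and the envelopes contain a and 2a (a > 0)."""
    # Holding the smaller amount a: switch with probability e^(-a), gain a.
    gain_if_small = math.exp(-a) * a
    # Holding the larger amount 2a: switch with probability e^(-2a), lose a.
    loss_if_large = math.exp(-2 * a) * a
    # Each case is equally likely.
    return 0.5 * (gain_if_small - loss_if_large)


for a in (0.1, 1.0, 5.0):
    print(a, expected_gain(a))
```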

Peter4

The backwards reasoning in this problem is the same as is used in the unexpected hanging paradox, and similar to a problem called Guess 2/3 of the Average. This is where a group of players each guess a number between 0 and 100, and the player whose guess is closest to 2/3 of the average of all guesses wins. With thought and some iteration, the rational player can conclude that it is irrational to guess a number greater than (2/3)·100, (2/3)^2·100, and in general (2/3)^n·100. This has a limit of 0 as n → ∞, so it is irrational to guess any number greater than zer...
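The shrinking bounds can be sketched in a few lines. This assumes each round of reasoning applies the same step: if no rational guess exceeds B, then two-thirds of the average cannot exceed (2/3)·B. The function name is hypothetical.

```python
def guess_upper_bounds(steps, start=100.0):
    """Upper bounds on a rational guess after each round of reasoning:
    if nobody guesses above B, two-thirds of the average is at most (2/3) * B."""
    bounds = [start]
    for _ in range(steps):
        bounds.append(bounds[-1] * 2 / 3)
    return bounds


bounds = guess_upper_bounds(50)
print(bounds[:4])  # 100.0, then about 66.67, 44.44, 29.63
print(bounds[-1])  # about 1.6e-07: the bounds shrink toward 0
```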

Paul_Gowder

Eliezer: the rationality of defection in these finitely repeated games has come under some fire, and there's a HUGE literature on it. Reading some of the more prominent examples may help you sort out your position on it.

Start here:

Robert Aumann. 1995. "Backward Induction and Common Knowledge of Rationality." Games and Economic Behavior 8:6-19.

Cristina Bicchieri. 1988. "Strategic Behavior and Counterfactuals." Synthese 76:135-169.

Cristina Bicchieri. 1989. "Self-Refuting Theories of Strategic Interaction: A Paradox of Common Knowledge." Erkenntnis 30:69-85.

Ken Binmore. 1987. "Modeling Rational Players I." Economics and Philosophy 3:9-55.

Jon Elster. 1993. "Some unresolved problems in the theory of rational behaviour." Acta Sociologica 36: 179-190.

Philip Reny. 1992. "Rationality in Extensive-Form Games." The Journal of Economic Perspectives 6:103-118.

Philip Pettit and Robert Sugden. 1989. "The Backward Induction Paradox." The Journal of Philosophy 86:169-182.

Brian Skyrms. 1998. "Subjunctive Conditionals and Revealed Preference." Philosophy of Science 65:545-574.

Robert Stalnaker. 1999. "Knowledge, Belief and Counterfactual Reasoning in Games." in Cristina Bicchieri, Richard Jeffrey, and Brian Skyrms, eds., The Logic of Strategy. New York: Oxford University Press.

Venkat

I've wondered about, and even modeled versions of the fixed horizon IPD in the past. I concluded that so long as the finite horizon number is sufficiently large in the context of the application (100 is large for prison scenarios, tiny for other applications), a proper discounted accounting of future payoffs will restore TFT as an ESS. Axelrod used discounting schemes in various ways in his book(s).

The undiscounted case will always collapse. Recursive collapse to defect is actually rational and a good model for some situations, but you are right, in other ...

Vladimir_Nesov

prase, Venkat: There is nothing symmetrical about choices of two players. One is playing for paperclips, another for different number of lives. One selects P2.Decision, another selects P1.Decision. How to recognize the "symmetry" of decisions, if they are not called by the same name? What makes it the answer in that case?

prase: It's the two envelopes problem.

RobinHanson

As Paul says, this is very well trodden ground. Since it hasn't been assumed that we are sure we know how the other party reasons, we might want to invest some early rounds in probing to see how the party thinks.

Eliezer Yudkowsky

Eliezer: the rationality of defection in these finitely repeated games has come under some fire, and there's a HUGE literature on it. Reading some of the more prominent examples may help you sort out your position on it.

My position is already sorted, I assure you. I cooperate with the Paperclipper if I think it will one-box on Newcomb's Problem with myself as Omega.

As Paul says, this is very well trodden ground. Since it hasn't been assumed that we are sure we know how the other party reasons, we might want to invest some early rounds in probing to see...

Mati_Roy (reply)

Do you mean "I cooperate with the Paperclipper if AND ONLY IF I think it will one-box on Newcomb's Problem with myself as Omega AND I think it thinks I'm Omega AND I think it thinks I think it thinks I'm Omega, etc." ? This seems to require an infinite amount of knowledge, no? Edit: and you said "We have never interacted with the paperclip maximizer before", so do you think it would one-box?

Philip_W (reply)

I think he means "I cooperate with the Paperclipper IFF it would one-box on Newcomb's problem with myself (with my present knowledge) playing the role of Omega, where I get sent to rationality hell if I guess wrong". In other words: if Eliezer believes that, were Eliezer and Clippy in a situation where Eliezer would prepare for one-boxing if he expected Clippy to one-box and for two-boxing if he expected Clippy to two-box, Clippy would one-box, then Eliezer will cooperate with Clippy. Or in other words still: if Eliezer believes Clippy to be ignorant and rational enough that it can't predict Eliezer's actions but uses game theory at the same level as him, then Eliezer will cooperate.

In the uniterated prisoner's dilemma, there is no evidence, so it comes down to priors. If all players are rational mutual one-boxers, and all players are blind except for knowing they're all mutual one-boxers, then they should expect everyone to make the same choice. If you just decide that you'll defect/two-box to outsmart others, you may expect everyone to do so, so you'll be worse off than if you decided not to defect (and therefore nobody else would rationally do so either). Even if you decide to defect based on a true random number generator, then for (2,2) (0,3) (3,0) (1,1) the best option is still to cooperate 100% of the time.

If there are less rational agents afoot, the game changes. Let r be the fraction of agents who are rational, d the fraction expected to defect, x the probability with which you (and by extension other rational agents) will cooperate, and (1-d-r) the fraction of agents who will always cooperate. The expected reward for cooperation becomes 2(xr+(1-d-r)) and the reward for defection becomes 3(xr+(1-d-r))+d+(1-x)r = 1+2(xr+(1-d-r)). Optimise for x in 2x(xr+(1-d-r))+(1-x)(1+2(xr+(1-d-r))) = 1-x+2(xr+1-d-r) = x(2r-1)+(3-2d-2r); which means you should cooperate 100% of the time if the fraction of agents who are rational r > 0.5, and defect 100% of the time
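The algebra above can be checked numerically. A minimal sketch, assuming the model as stated: rational agents all cooperate with the same probability x you choose, a fraction d always defects, and the rest always cooperate. The function names are hypothetical. The expected payoff is linear in x with slope 2r - 1, so the best x is 1 when r > 0.5 and 0 when r < 0.5.

```python
def expected_payoff(x, r, d):
    """Expected payoff per game under the model in the comment above.

    r: fraction of rational agents, who all cooperate with probability x
       (the same x you choose, since they reason as you do)
    d: fraction of agents who always defect
    1 - d - r: fraction of agents who always cooperate
    Payoffs: (C,C)=2, (C,D)=0, (D,C)=3, (D,D)=1.
    """
    c = x * r + (1 - d - r)        # probability your opponent cooperates
    if_cooperate = 2 * c           # 2 against a cooperator, 0 against a defector
    if_defect = 3 * c + (1 - c)    # 3 against a cooperator, 1 against a defector
    return x * if_cooperate + (1 - x) * if_defect


def closed_form(x, r, d):
    """The simplified expression: linear in x with slope 2r - 1."""
    return x * (2 * r - 1) + 3 - 2 * d - 2 * r


for r, d in ((0.8, 0.1), (0.3, 0.2)):
    print(r, d, expected_payoff(0, r, d), expected_payoff(1, r, d))
```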

comingstorm

This "perfectly rational" game-theoretic solution seems to be fragile, in that the threshold of "irrationality" necessary to avoid N out of N rounds of defection seems to be shaved successively thinner as N increases from 1.

Also, though I don't remember the details, I believe that slight perturbations in the exact rules may also cause the exact game-theoretic solution to change to something more interesting. Note that adding uncertainty in the exact number of rounds has the effect of removing your induction premise: e.g., a 1% chance of...

Jordan_Fisher

"As someone who rejects defection as the inevitable rational solution to both the one-shot PD and the iterated PD, I'm interested in the inconsistency of those who accept defection as the rational equilibrium in the one-shot PD, but find excuses to reject it in the finitely iterated known-horizon PD."

... And I'm interested in your justification for potentially not defecting in the one-shot PD.

I see no contradiction in defecting in the one-shot but not iterated. As has been mentioned, as the number of iterations increases the risk to reward ratio ...

pdf23ds

I cooperate with the Paperclipper if I think it will one-box on Newcomb's Problem with myself as Omega.

This strategy would apply to the first round. For the iterated game, would you thereafter apply Tit for tat?

Andrew_Hay

I thought the aim is to win, isn't it? Clearly, what's best for both of them is to cooperate at every step. In the case that the Paperclipper is something like what most people here say 'rationality' is, it will defect every time, and thus Humans would also defect, leading to less than the best possible total utility.

However, if you think of the Paperclipper as something like us with different terminal values, surely cooperating is best? It knows, as we do, that defecting gives you more if the other cooperates, but defecting is not a winning strategy in the long ...

Eliezer Yudkowsky

I cooperate with the Paperclipper if I think it will one-box on Newcomb's Problem with myself as Omega. This strategy would apply to the first round. For the iterated game, would you thereafter apply Tit for tat?

The strategy applies to every round equally, if the Paperclipper is in fact behaving as I expect. If the Paperclipper doesn't behave as I expect, the strategy is unuseful, and I might well switch to Tit for Tat.

johnlawrenceaspden (reply)

I will one-box on Newcomb's Problem with you as Omega. As in, I really will. That's what I think the right thing to do is. Would you care to play a round of high-stakes prisoner's dilemma?

CarlShulman

"And are you really "exploiting" an "irrational" opponent, if the party "exploited" ends up better off? Wouldn't you end up wishing you were stupider, so you could be exploited - wishing to be unilaterally stupider, regardless of the other party's intelligence? Hence the phrase "regret of rationality"..."

Plus regret of information. In a mixed population of classical decision theory (CDT) agents and Tit-for-Tat (TFT) agents, paired randomly and without common knowledge of one another's types, the CDT agents ...

CarlShulman

[Mixed population with a sufficient TFT proportion.]
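The comment above is truncated, so this does not reproduce its argument. It is only a sketch of why the TFT proportion matters, assuming always-defect stands in for players who follow the backward-induction solution, partners are drawn at random from a large population, and each pairing plays the 100-round game. The names are hypothetical.

```python
# Payoff to the first-named player, keyed by (my_move, their_move).
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}


def tit_for_tat(their_moves):
    return "C" if not their_moves else their_moves[-1]


def always_defect(their_moves):
    return "D"


def match(mine, theirs, rounds=100):
    """Total payoff to strategy `mine` over a 100-round game against `theirs`."""
    my_moves, their_moves, total = [], [], 0
    for _ in range(rounds):
        a, b = mine(their_moves), theirs(my_moves)
        my_moves.append(a)
        their_moves.append(b)
        total += PAYOFF[(a, b)]
    return total


def average_scores(p_tft):
    """Average score of each type when a fraction p_tft of the population plays TFT."""
    tft = p_tft * match(tit_for_tat, tit_for_tat) + (1 - p_tft) * match(tit_for_tat, always_defect)
    alld = p_tft * match(always_defect, tit_for_tat) + (1 - p_tft) * match(always_defect, always_defect)
    return tft, alld


print(average_scores(0.5))    # TFT ahead
print(average_scores(0.005))  # always-defect ahead
```

With these payoffs the averages are 99 + 101p for TFT and 100 + 2p for always-defect, so TFT does better once the TFT fraction p exceeds 1/99.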

RobinHanson

You didn't say in the post that the other party was "perfectly rational". If we knew that and knew what it meant, of course the answer would be obvious.

Allan_Crossman

I'm interested in the inconsistency of those who accept defection as the rational equilibrium in the one-shot PD, but find excuses to reject it in the finitely iterated known-horizon PD.

[...] What if neither party to the IPD thinks there's a realistic chance that the other party is stupid - if they're both superintelligences, say?

It's never worthwhile to cooperate in the one shot case, unless the two players' actions are linked in some Newcomb-esque way.

In the iterated case, if there's even a fairly small chance that the other player will try to establish...

Grant

If "rational" actors always defect and only "irrational" actors can establish cooperation and increase their returns, this makes me question the definition of "rational".

However, it seems like the priors of a true prisoner's dilemma are hard to come by (absolutely zero knowledge of the other player and zero communication). Don't we already know more about the paperclip maximizer than the scenario allows? Any superintelligence would understand tit-for-tat playing, and know that other intelligences should understand it as well. ...

lowly_undergrad3

Maybe I'm an aberration, but my Introductory Microeconomics professor actually went over this the same way you did regarding the flaw of tit for tat. It confuses me that anyone would teach it differently.

Mike_Blume

I'm almost seeing shades of Self-PA here, except it's Self-PA that co-operates.

If I assume that the other agent is perfectly rational, and if I further assume that whatever I ultimately choose to do will be perfectly rational (hence Self-PA), then I know that my choice will match that of the paperclip maximizer. Thus, I am now choosing between (D,D) and (C,C), and I of course choose to co-operate.

prase

V.Nesov: There is nothing symmetrical about choices of two players. One is playing for paperclips, another for different number of lives. One selects P2.Decision, another selects P1.Decision. How to recognize the "symmetry" of decisions, if they are not called by the same name?

The decision processes can be isomorphic. We can think about the paperclipper being absolutely the same as we are, except valuing paperclips instead of our values. This of course assumes we can separate the thinking into "values part" and "algorithmic part"...

conchis

I'm interested in the inconsistency of those who accept defection as the rational equilibrium in the one-shot PD, but find excuses to reject it in the finitely iterated known-horizon PD.

I don't see the inconsistency.

Defect is rational in the one-shot game provided my choice gives me no information about the other player's choice.

In contrast, the backwards induction result also relies on common knowledge of rationality (which, incidentally, seems oddly circular: if I cooperate in the first round, then I demonstrate that I'm not "rational" in the t...

Lightwave2

There's a dilemma or a paradox here only if both agents are perfectly rational intelligences. In the case of humans vs aliens, the logical choice would be "cooperate on the first round, and on succeeding rounds do whatever its opponent did last time". The risk of losing the first round (1 million people lost) is worth taking because of the extra 98-99 million people you can potentially save if the other side also cooperates.

RobinHanson

Decision theory is enough to advise actions - so why do we need game theory? A game theory is really just a theory about the distribution over how other agents think. Given such a distribution, decision theory is enough to tell you what to do. So any simple game theory, one that claimed with certainty that all other agents always think a particular way, must be wrong. Of course sometimes a simple game theory can be good enough - if slight variations from some standard way of thinking doesn't make much difference. But when small variations can make large differences, the only safe game theory is a wide distribution over the many ways other agents might think.
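A minimal sketch of "game theory as a distribution over how other agents think", assuming a hypothetical belief over two fixed opponent types and three candidate strategies of ours, all playing the 100-round game with the post's payoffs. Plain expected-utility maximization then picks the strategy; the answer changes with the belief, which is the point of the comment.

```python
# My per-round payoff keyed by (my_move, their_move); symmetric for both sides.
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}
ROUNDS = 100


def tit_for_tat(n, theirs):
    return theirs[-1] if theirs else "C"


def always_defect(n, theirs):
    return "D"


def tft_defect_last(n, theirs):
    return "D" if n == ROUNDS else tit_for_tat(n, theirs)


OURS = {"tit_for_tat": tit_for_tat, "always_defect": always_defect,
        "tft_defect_last": tft_defect_last}
THEIRS = {"tit_for_tat": tit_for_tat, "always_defect": always_defect}


def score(mine, theirs):
    """My total over ROUNDS rounds."""
    my_moves, their_moves, total = [], [], 0
    for n in range(1, ROUNDS + 1):
        a, b = mine(n, their_moves), theirs(n, my_moves)
        my_moves.append(a)
        their_moves.append(b)
        total += PAYOFF[(a, b)]
    return total


def best_strategy(belief):
    """belief maps opponent-type names (keys of THEIRS) to probabilities."""
    def expected(name):
        return sum(p * score(OURS[name], THEIRS[t]) for t, p in belief.items())
    return max(OURS, key=expected)


print(best_strategy({"tit_for_tat": 0.5, "always_defect": 0.5}))  # tft_defect_last
print(best_strategy({"always_defect": 1.0}))                       # always_defect
```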

Mikko

What do you think would happen if Prisoner's Dilemma is framed differently?

An example.

Do you think this framing would affect your initial reaction? General population?

(The wording of the choices is not very elegant, and I am not sure whether presentation is sufficiently symmetrical, but you get the basic idea).

It could be that words such as "prisoner", "prison sentence", "guard" or even "game" and "defect" frame more people to intuitively avoid co-operation.

Dagon

Does this imply that YOU would one-box Newcomb's offer with Clippy as Omega? And that you think at least some Clippies would take just one box with you as Omega?

For the problem as stated, what probability would you assign to Clippy's Cooperation (on both the one-shot or fixed-iteration, if they're different).

Caledonian2

What is the point in talking about 'bias' and 'rationality' when you cannot even agree what those words mean?

What would a rational entity do in an Iterated Prisoner's Dilemma? Do any of you have something substantive to say about that question, or is it all just speculation and assertion?

Eliezer Yudkowsky

Mike Blume: I'm almost seeing shades of Self-PA here, except it's Self-PA that co-operates.

+1 Perceptive to Blume!

Mikko, your poll is not the Prisoner's Dilemma - part of the payoff matrix is reversed.

Ben_Jones

Eliezer: I cooperate with the Paperclipper if I think it will one-box on Newcomb's Problem with myself as Omega.

Isn't that tantamount to Clippit believing you to be omnipotent though? If I thought my co-player was omnipotent I'm pretty certain I'd be cooperating.

Or are you just looking for a co-player who shuts up and calculates/chooses straight? In which case, good heuristic I suppose.

michael_vassar

But Eliezer, you can't assume that Clippy uses the same decision making process that you do unless you know that you both unfold from the same program with different utility functions or something. If you have the code that unfolds into Clippy and Clippy has the code that unfolds into you it may be that you can look at Clippy's code and see that Clippy defects if his model of you defects regardless of what he does and cooperates if his model of you cooperates if your model of him cooperates, but you don't have his code. You can't say much about all possible minds or about all possible paperclip maximizing minds.

George_Weinberg2

And are you really "exploiting" an "irrational" opponent, if the party "exploited" ends up better off? Wouldn't you end up wishing you were stupider, so you could be exploited - wishing to be unilaterally stupider, regardless of the other party's intelligence? Hence the phrase "regret of rationality"...

Eliezer, you are putting words in your opponents' mouths, then criticizing their terminology.

"Rationality" is I think a well-defined term in game theory, it doesn't mean the same thing as "smart". I...

Peter_de_Blanc

Mike:

We don't need to assume that Clippy uses the same decision process as us. I might suggest we treat Clippy as a causal decision theorist who has an accurate model of us. Then we ask which (self, outside_model_of_self) pair we should choose to maximize our utility, constrained by outside_model_of_self = self. In this scenario TFT looks pretty good.
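A minimal sketch of the suggestion above, assuming Clippy knows our strategy exactly and best-responds to it by dynamic programming, and restricting our strategies to functions of Clippy's previous move. The names are hypothetical. Against Tit for Tat the best response is to cooperate until the last round, leaving us 198 million lives, versus 100 million for always-defect and 0 for always-cooperate.

```python
# Payoffs keyed by (human_move, clipper_move).
LIVES = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}  # millions of lives
CLIPS = {("C", "C"): 2, ("C", "D"): 3, ("D", "C"): 0, ("D", "D"): 1}  # paperclips


def lives_against_best_responder(our_strategy, rounds=100):
    """Lives we end up with when Clippy knows our strategy exactly and
    best-responds to it.

    our_strategy maps Clippy's previous move (None on round 1) to our move,
    so our move this round is the only state Clippy needs to track.
    """
    # Backward pass: best[k][m] is Clippy's paperclip-maximizing move in round k
    # (0-based) when our move that round is m.
    best = [None] * rounds
    next_value = {"C": 0, "D": 0}  # nothing left to gain after the last round
    for k in range(rounds - 1, -1, -1):
        value, choice = {}, {}
        for m in "CD":
            totals = {c: CLIPS[(m, c)] + next_value[our_strategy(c)] for c in "CD"}
            choice[m] = max("CD", key=totals.get)
            value[m] = totals[choice[m]]
        best[k] = choice
        next_value = value
    # Forward pass: replay the game and count our lives.
    lives, m = 0, our_strategy(None)
    for k in range(rounds):
        c = best[k][m]
        lives += LIVES[(m, c)]
        m = our_strategy(c)
    return lives


def tit_for_tat(last):
    return "C" if last is None else last


def always_defect(last):
    return "D"


def always_cooperate(last):
    return "C"


for s in (tit_for_tat, always_defect, always_cooperate):
    print(s.__name__, lives_against_best_responder(s))
```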

pdf23ds

I wonder, Eliezer, if you cooperate on moves 1-99, would you choose to defect on move 100?

Paul_Crowley

Regret of rationality in games isn't a mysterious phenomenon. Let's suppose that after the one round of PD we're going to play I have the power to destroy a billion paperclips at the cost of one human life, and Clippy knows that. If Clippy thinks I'm a rational outcome-maximizer, then he knows that whatever threats I make I'm not going to carry out, because they won't have any payoffs when the time comes. But if it thinks I'm prone to irrational emotional reactions, it might conclude I'll carry out my billion-paperclip threat if it defects, and so cooperate.

pdf23ds

Actually, that's (nearly) equivalent to asking if you would defect in the non-iterated game, and you've said you would not given a one-boxing Clippy.

michael_vassar

Pete, if you do that then being a causal decision theorist won't, you know, actually Win in the one shot case. Note that evolution doesn't produce organisms that cooperate in one shot prisoner's dilemmas.

Lightwave

I propose the following solution as the optimal one. It is based on two assumptions.

We'll call the two sides Agent 1 (Humanity) and Agent 2 (Clippy).

Assumption 1: Agent 1 knows that Agent 2 is logical and will use logic to decide how to act and vice versa.

This assumption simply means that we do not expect Clippy to be extremely stupid or randomly pick a choice every time. If that were the case, a better strategy would be to "outsmart" him or find a statistical solution.

Assumption 2: Both agents know each other's ultimate goal/optimization target...

[This comment is no longer endorsed by its author]

Vladimir_Nesov

George: It is trivial to construct scenarios in which being known to be "rational" in the game theory sense is harmful, but in all such cases it is being known to be rational which is harmful, not rationality itself.

Yes, but if you can affect what others know about you by actually ceasing to be "rational", and it will be profitable, persisting in being "rational" is harmful.

pdf23ds

Yes, but if you can affect what others know about you by actually ceasing to be "rational", and it will be profitable, persisting in being "rational" is harmful.

So it can be irrational to be rational, and rational to be irrational? Hmm. I think you might want to say, rather, that an element of unpredictability (ceasing to be predictable) would be called for in this situation, rather than "irrationality". Of course, that leads to suboptimality in some formal sense, but it wins.

George_Weinberg (16y)

Change the problem and you change the solution.

If we assume that Eli and Clippy are both essentially self-modifying programs capable of verifiably publishing their own source codes, then indeed they can cooperate:

Eli modifies his own source code in such a way that he assures Clippy that his cooperation is contingent on Clippy's revealing his own source code and on that source code fulfilling certain criteria; Clippy modifies his source code appropriately and publishes it.

Now each knows the other will cooperate.

But I think that although we in some ways resem...
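George's construction resembles what is sometimes called a program equilibrium. A minimal sketch, assuming agents are nothing more than published source strings and that "certain criteria" means "identical to my own source" (the simplest such criterion); `CLIQUE_BOT`, `DEFECT_BOT`, `decide`, and `play` are hypothetical names:

```python
# Toy model: each agent publishes its source, and each agent's move may
# depend on the opponent's published source. Hypothetical names throughout.
CLIQUE_BOT = "cooperate iff the opponent's source equals mine"
DEFECT_BOT = "always defect"


def decide(my_source: str, opponent_source: str) -> str:
    """Return 'C' or 'D' for the agent whose published source is my_source."""
    if my_source == CLIQUE_BOT:
        # Cooperation is contingent on verifying the opponent's code.
        return "C" if opponent_source == CLIQUE_BOT else "D"
    return "D"  # every other agent in this toy defects unconditionally


def play(a: str, b: str) -> tuple[str, str]:
    return decide(a, b), decide(b, a)


print(play(CLIQUE_BOT, CLIQUE_BOT))
print(play(CLIQUE_BOT, DEFECT_BOT))
```

Exact matching is brittle: a functionally identical program with different text gets defected against. Looser criteria require actually reasoning about what the other program will do, which is the hard part.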

Peter_de_Blanc (16y)

Mike:

Ah, I guess I wasn't looking at what you were replying to. I was thinking of a fixed number of iterations, but more than one.

Marshall (16y)

I think you guys are calculating too much and talking too much.

Regardless of the "intelligence" of a paperclip maximizer (PM), in my world that is a pretty stupid thing to do. I would expect such a "stupid" agent to do chaotic things, indeed evil things. Things I could not predict and things I could not understand.

In an interaction with a PM I would not expect to win, regardless of how clever and intelligent I am. Maybe they only want to make paperclips (and play with puppies), but such an agent will destroy my world.

I have worked with such PMs.

I would never voluntarily choose to interact with them.

Mike_Blume (16y)

Marshall, I think that's a bit of a cop-out. People's lives are at stake here and you have to do something. If nothing else, you can simply choose to play defect; in the worst case the PM does the same, and you save a billion lives (in the first scenario). Are you going to phone up a billion mothers and tell them you let their children die so as not to deal with a character you found unsavory? The problem's phrased the way it is to take that option entirely off the table.

Yes, it will do evil things, if you want to put it that way. Your car will do evil things...

Marshall (16y)

Marshall I think that's a bit of a cop-out.

Why wouldn't a PM cheat? Why would it ever remain inside the frame of the game?

Would two so radically different agents even recognize the same payoff frame?

"The different one" will have different payoffs, and I will never know them and am unlikely to benefit from any of them.

In my world a PM is chaotic, just as I am chaotic in his. Thus we are each other's enemy and must hide from the other.

No interaction, because otherwise the number of crying mothers and car dealerships will always be higher.

Benya_Fallenstein (16y)

Hi all,

(First comment here. Please tell me if I do something stupid.)

So, I've been trying to follow along at home and figure out how to formulate a theory that would allow us to formalize and justify the intuition that we should cooperate with Clippy "if that is the only way to get Clippy to cooperate with us" (even in a non-iterated PD). I've run into problems with both the formalizing and the justifying part (sigh), but at least I've learned some lessons along the way that were not obvious to me from the posts I've read here so far. (How's that...

[anonymous] (13y)

In this game require(X) is not a valid strategy because you don't have access to the strategy your opponent uses, only to the decisions you've seen it make. In particular, without additional assumptions we can't assume any correlation between a player's moves.

Phil_Goetz (16y)

Asking how a "rational" agent reasons about the actions of another "rational" agent is analogous to asking whether a formal logic can prove statements about that logic. I suggest you look into the extensive literature on completeness, incompleteness, and hierarchies of logics. It may be that there are situations such that it is impossible for a "rational" agent to prove what another, equally-rational agent will conclude in that situation.

John_Faben (16y)

I'm sure most people here are aware of Axelrod's classic "experiment": an Iterated Prisoner's Dilemma tournament in which experts from around the world were invited to submit any strategy they liked, the winner being the strategy that scored highest over repeated matches against each of the other strategies. Tit for Tat came out on top (analysis afterwards suggested that Tit for Two Tats would have won had it been entered). Axelrod's original experiment was fixed-horizon, and every single "nice" strategy (never defect first) that was entered finished above every single "...
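A fixed-horizon round robin of the kind described above can be sketched in a few lines. This is not Axelrod's code or his payoff matrix: it assumes the post's per-round payoffs (2 for mutual cooperation, 3 and 0 for defecting against a cooperator, 1 for mutual defection), a known horizon of 100 rounds, and a small hypothetical field of four strategies. Including self-play is a choice of this sketch.

```python
# Per-round payoff to the first player, (my_move, their_move) -> payoff,
# taken from the post's matrix (millions of lives, or paperclips).
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}


# Each strategy maps (my history, their history) to "C" or "D".
def tit_for_tat(mine, theirs):
    return "C" if not theirs else theirs[-1]


def tit_for_two_tats(mine, theirs):
    return "D" if theirs[-2:] == ["D", "D"] else "C"


def always_cooperate(mine, theirs):
    return "C"


def always_defect(mine, theirs):
    return "D"


def match(a, b, rounds=100):
    """Play a fixed, known number of rounds; return both total scores."""
    ha, hb, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        ma, mb = a(ha, hb), b(hb, ha)
        score_a += PAYOFF[(ma, mb)]
        score_b += PAYOFF[(mb, ma)]
        ha.append(ma)
        hb.append(mb)
    return score_a, score_b


def round_robin(strategies, rounds=100):
    """Every pair plays once; each strategy also plays its own twin."""
    names = list(strategies)
    totals = {name: 0 for name in names}
    for i, x in enumerate(names):
        for y in names[i:]:
            sx, sy = match(strategies[x], strategies[y], rounds)
            totals[x] += sx
            if x != y:
                totals[y] += sy
    return totals


FIELD = {"TFT": tit_for_tat, "TF2T": tit_for_two_tats,
         "AllC": always_cooperate, "AllD": always_defect}
print(round_robin(FIELD))
```

In this tiny field the two nice reciprocators finish ahead of always-defect, but a four-strategy toy says little about how a real tournament would come out.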

Ben_Jones (16y)

Michael Vassar: Note that evolution doesn't produce organisms that cooperate in one-shot Prisoner's Dilemmas.

I put myself forwards as counter-evidence.

Caledonian2 (16y)

I put myself forwards as counter-evidence.

I put forward all organisms that have evolved to thrive in multiply-iterated prisoner's dilemma scenarios, but not to distinguish single iterations from multiple iterations.

Which is pretty much every organism with a capacity for altruism.

Eliezer Yudkowsky (16y)

Benja: This breaks the implicit decision theoretic premise that your payoff depends only on the action you choose, not on the process you use to arrive at that choice

Correct! The next step in the argument, if you were going to formulate my timeless decision theory, is to describe a new class of games in which your payoff depends only on the type of decision that you make or on the types of decision that you make in different situations, being the person that you are. The former class includes Newcomb's Problem; the latter class further includes the co...

Matt_Young (13y)

Hi. Found the site about a week ago. I read the TDT paper and was intrigued enough to start poring through Eliezer's old posts. I've been working my way through the sequences and following backlinks. The material on rationality has helped me reconstruct my brain after a Halt, Melt and Catch Fire event. Good stuff.

I observe that comments on old posts are welcome, and I notice no one has yet come back to this post with the full formal solution for this dilemma since the publication of TDT. So here it is.

Whatever our opponent's decision algorithm may be...

khafra (13y)

Nice analysis. One small tweak: I would precommit to being vindictive as hell if I believe I'm dominated by my opponent in modeling capability.

Matt_Young (13y)

I can certainly empathize with that statement. And if my opponent is not only dominating in ability but exploiting that advantage to the point where I'm losing just as much by submitting as I would by exacting punishment, then that's the tipping point where I start hitting back. Of course, I'd attempt retaliatory behavior initially when I was unsure how dominated I was, as well, but once I know that the opponent is just that much better than me, and as long as they're not abusing that advantage to the point where retaliation becomes cost-effective, then I'd have to concede my opponent's superiority, grit my teeth, bend over, and take one for the team. Especially at a ratio of 1 million human lives per util. With lives at stake, I shut up and multiply.

khafra (13y)

I meant that as a rational strategy--if my opponent can predict that I'll cooperate until defected upon, at which point I will "tear off the steering wheel and drink a fifth of vodka," and start playing defect-only, his optimal play will not involve strategically chosen defections.

Matt_Young (13y)

You know, you're right. I was thrown off by the word "precommit", which implies a reflectively inconsistent strategy, which is TDT-anathema. On the other hand, rational agents win, so having that strategy does make sense in that case, despite the fact that we might incur negative utility relative to playing submissively if we had to actually carry it out. The solution, I think, is to be "the type of agent who would be ruthlessly vindictive against opponents who have enough predictive capability to see that I'm this type of agent, and enough strategic capability to accept that this means they gain nothing by defecting against me." That makes it a reflectively consistent part of a decision theory, by keeping the negative-utility behavior in the realm of the pure counterfactual. As long as you know that having that strategy will effectively deter the other player, I think it can work. And if not, or if I've made an error in some detail of my reasoning of how to make it work, I'm fairly confident at this point that an ideal TDT-agent could find a valid way to address the problem case in a reflectively consistent and strategically sound manner.
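khafra's "cooperate until defected upon, then defect forever" rule is usually called a grim trigger. A minimal sketch, assuming the post's payoffs and a known 100-round horizon, checks which switch point `k` is best for an opponent who cooperates before round `k` and defects from then on. `defect_from` is a hypothetical family of opponents, not anything proposed in the thread; `k = 101` stands for never defecting.

```python
# Per-round payoff to the first player, from the post's matrix.
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}
ROUNDS = 100


def grim_trigger(mine, theirs):
    """Cooperate until the opponent has defected once, then defect forever."""
    return "D" if "D" in theirs else "C"


def defect_from(k):
    """Hypothetical opponent: cooperate before round k (1-indexed), defect from round k on."""
    def strategy(mine, theirs):
        return "D" if len(mine) + 1 >= k else "C"
    return strategy


def opponent_score(opponent, rounds=ROUNDS):
    """Total payoff the opponent earns against grim_trigger."""
    grim_hist, opp_hist, total = [], [], 0
    for _ in range(rounds):
        g = grim_trigger(grim_hist, opp_hist)
        o = opponent(opp_hist, grim_hist)
        total += PAYOFF[(o, g)]
        grim_hist.append(g)
        opp_hist.append(o)
    return total


scores = {k: opponent_score(defect_from(k)) for k in range(1, ROUNDS + 2)}
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

Switching to defection at round `k` earns `k + 101` here, against 200 for never defecting, so every deviation except a last-round one loses. With a known horizon the trigger still leaves round 100 exposed, which is the post's end-game problem again.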

chaosmosis (12y)

Why is this different in scenarios where you don't know how many rounds will occur?

So long as it's a finite number then defection would appear rational to the type of person who would defect in a noniterated instance.

thomblake (12y)

In the case where you know N rounds will occur, you can reason as follows:
1. If one cannot be punished for defection after round x, then one will defect in round x. (premise)
2. If we know what everyone will do in round x, then one cannot be punished for defection in round x-1. (obvious)
3. There is no round after N, so by (1) everyone will defect in round N.
4. If we know what everyone will do in round x, then we will defect in round x-1, by (1) and (2).
5. By mathematical induction on (3) and (4), we will defect in every round.
If the players don't know what round N is, then the base case of the mathematical induction does not exist.
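The induction above can be unrolled mechanically. A minimal sketch, assuming the post's payoff matrix and the premise that play in later rounds is already pinned down and does not react to the current round: adding the same continuation value to every cell of a round's matrix cannot change its equilibrium, so each round reduces to the one-shot game. `stage_nash` and `backward_induction` are illustrative names.

```python
# Per-round payoff to the first player, from the post's matrix.
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}
MOVES = ("C", "D")


def stage_nash(continuation=0):
    """Pure-strategy Nash equilibria of one round, where both players also
    receive a fixed continuation value that does not depend on this round."""
    def u(mine, theirs):
        return PAYOFF[(mine, theirs)] + continuation

    equilibria = []
    for a in MOVES:
        for b in MOVES:
            a_best = u(a, b) >= max(u(m, b) for m in MOVES)
            b_best = u(b, a) >= max(u(m, a) for m in MOVES)
            if a_best and b_best:
                equilibria.append((a, b))
    return equilibria


def backward_induction(n_rounds):
    """Solve rounds N, N-1, ..., 1 in turn; return the plan and each player's total."""
    plan, continuation = {}, 0
    for r in range(n_rounds, 0, -1):
        [(a, b)] = stage_nash(continuation)  # exactly one equilibrium: ("D", "D")
        plan[r] = (a, b)
        continuation += PAYOFF[(a, b)]  # symmetric, so both players get this
    return plan, continuation


plan, total = backward_induction(100)
print(plan[1], total)
```

The loop needs a known last round to start from; with an unknown N there is no value to hand to `range`, which is the missing base case.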

A1987dM (12y)

The unexpected hanging paradox makes me sceptical about such kinds of reasoning.

thomblake (12y)

I'm not sure why that should apply. The unexpected hanging worked by exploiting the fact that days that were "ruled out" were especially good candidates for being "unexpected". Other readings employ similar linguistic tricks. The reasoning in the first case does not work in practice because in a tournament premise (1) is false; tit-for-tat agents, for example, will cooperate in every round against a cooperative opponent. But that is not even relevant to the fact that the mathematical induction does not work for unknown numbers of rounds.
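One standard way to model an unknown number of rounds (not something anyone in the thread specifies) is to assume that after each round another follows with a fixed probability δ. Under that assumption, and with the post's payoffs T = 3, R = 2, P = 1, grim trigger sustains cooperation exactly when R/(1-δ) ≥ T + δP/(1-δ), i.e. δ ≥ (T-R)/(T-P) = 1/2. A sketch of that check using exact fractions; `grim_sustains_cooperation` is an illustrative name:

```python
from fractions import Fraction

# Post's per-round payoffs: temptation, reward, punishment.
# (The sucker payoff S = 0 does not enter this comparison.)
T, R, P = 3, 2, 1


def grim_sustains_cooperation(delta: Fraction) -> bool:
    """delta: assumed probability that another round follows (0 <= delta < 1).
    Compares cooperating forever with defecting once and being punished forever."""
    cooperate_forever = R / (1 - delta)       # R in every expected round
    defect_now = T + delta * P / (1 - delta)  # T once, then P forever
    return cooperate_forever >= defect_now


THRESHOLD = Fraction(T - R, T - P)
print(THRESHOLD)
```

The threshold only says mutual cooperation can be an equilibrium once the end is uncertain enough; it does not say the players will settle on it.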

[anonymous] (11y)

In a 100-round game, I could precommit to play Tit for Tat no matter what (including cooperating on the 100th round if the opponent cooperated on the 99th). The opponent will do slightly better than me by cooperating for 99 rounds and defecting on the 100th, but this is still better than if I had chosen to defect on the 100th round, as my opponent would have seen my precommitment to be non-genuine and defected on the 99th round (and maybe more). If I could have the paperclip maximizer use this strategy while I cooperate 99 times and defect once, that would be even better... but it won't happen. Oh well, I'll take 99 (C, C)s and 1 (C, D).
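The trade-off in the comment above is easy to check with the post's payoffs. A minimal sketch: `tit_for_tat` cooperates in the final round if the opponent cooperated in round 99, while the hypothetical `tft_defect_last` plays Tit for Tat except in round 100, where it defects.

```python
# Per-round payoff to the first player, from the post's matrix.
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}


def tit_for_tat(mine, theirs):
    return "C" if not theirs else theirs[-1]


def tft_defect_last(mine, theirs, rounds=100):
    """Hypothetical: Tit for Tat, except always defect in the final round."""
    return "D" if len(mine) == rounds - 1 else tit_for_tat(mine, theirs)


def always_defect(mine, theirs):
    return "D"


def match(a, b, rounds=100):
    """Play a known number of rounds; return both total scores."""
    ha, hb, sa, sb = [], [], 0, 0
    for _ in range(rounds):
        ma, mb = a(ha, hb), b(hb, ha)
        sa += PAYOFF[(ma, mb)]
        sb += PAYOFF[(mb, ma)]
        ha.append(ma)
        hb.append(mb)
    return sa, sb


print(match(tit_for_tat, tft_defect_last))
print(match(always_defect, always_defect))
```

Ninety-nine (C, C) rounds and one (C, D) leave the Tit for Tat player with 198 units against the defector's 201, both far above the 100 each from defecting throughout.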

Elusu (10y)

I am a dedicated Paperclipper. Ask anyone who knows me well enough to have seen me in a Staples!

As such, I use my lack of human arrogance and postulate that at least some of the entities playing the IPD have intelligence on the order of my own. I do not understand what they are playing for, "1 million human lives" means virtually nothing to me, especially in comparison to a precious precious paperclip, but I assume by hypothesis that the other parties are playing a game similar enough to my own that we can communicate and come to an arrangement.

N...

Clippy (10y)

Prove it. You can't just create an account, claim to be a Paperclipper, and expect people to believe you. Anyone who did so would be using an extremely suboptimal inference engine.

johnlawrenceaspden (10y)

clips or it didn't happen...

Murska (10y)

Got me to register, this one. I was curious about my own reaction, here.

See, I took in the problem, thought for a moment about game theory and such, but I am not proficient in game theory. I haven't read much of it. I barely know the very basics. And many other people can do that sort of thinking much better than I can.

I took a different angle, because it should all add up to normality. I want to save human lives here. For me, the first instinct on what to do would be to cooperate on the first iteration, then cooperate on the second regardless of whether ...

dankane (9y)

[I realize that I missed the train and probably very few people will read this, but here goes]

So in the non-iterated Prisoner's Dilemma, defecting is a dominant strategy: no matter what the opponent does, defecting gives you a better payoff than cooperating would. In the iterated Prisoner's Dilemma, there is no longer a dominant strategy. If my opponent is playing Tit-for-Tat, I get the best outcome by cooperating in all rounds but the last. If my opponent ignores what I do, I get the best outcome by always defecting. It is true that always-defect is the unique Nash ...
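Both claims in the comment above can be checked by brute force with the post's payoffs. A minimal sketch: it verifies that D dominates C in the one-shot game, then searches every fixed move sequence against Tit for Tat. Brute force is exponential, so a 6-round game stands in for the 100-round one; the function names are illustrative.

```python
from itertools import product

# Per-round payoff to the first player, from the post's matrix.
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}


def defect_dominates():
    """One-shot game: D beats C against either opponent move."""
    return all(PAYOFF[("D", t)] > PAYOFF[("C", t)] for t in "CD")


def score_vs_tit_for_tat(seq):
    """My total for a fixed move sequence played against Tit for Tat."""
    total, their_move = 0, "C"  # Tit for Tat opens with C
    for my_move in seq:
        total += PAYOFF[(my_move, their_move)]
        their_move = my_move  # ...then copies my previous move
    return total


def best_sequence_vs_tit_for_tat(n):
    """Brute force over all 2**n fixed sequences (only feasible for small n)."""
    return "".join(max(product("CD", repeat=n), key=score_vs_tit_for_tat))


print(defect_dominates(), best_sequence_vs_tit_for_tat(6))
```

Against Tit for Tat, each D before the final round gains 1 in its own round but costs 2 in the next, which is why the search lands on cooperating until the last round.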
