Comment author:lerjj
03 April 2015 10:11:53PM
*
1 point
[-]

I understood that Clippy is a rational agent, just one with a different utility function. The payoff matrix as described is the classic Prisoner's dilemma where one billion lives is one human utilon and one paperclip on Clippy utilon; since we're both trying to maximise utilons, and we're supposedly both good at this we should settle for (C,C) over (D,D).

Another way of viewing this would be that my preferences run thus: (D,C);(C,C);(D,D);(C,D) and Clippy run like this: (C,D);(C,C);(D,D);(D,C). This should make it clear that no matter what assumptions we make about Clippy, it is universally better to co-operate than defect. The two asymmetrical outputs can be eliminated on the grounds of being impossible if we're both rational, and then defecting no longer makes any sense.

Comment author:dxu
04 April 2015 02:21:29AM
0 points
[-]

Another way of viewing this would be that my preferences run thus: (D,C);(C,C);(C,D);(D,D) and Clippy run like this: (C,D);(C,C);(D,C);(D,D).

Wait, what? You prefer (C,D) to (D,D)? As in, you prefer the outcome in which you cooperate and Clippy defects to the one in which you both defect? That doesn't sound right.

Comment author:lerjj
06 April 2015 10:25:16PM
0 points
[-]

woops, yes that was rather stupid of me. Should be fixed now, my most preferred is me backstabbing Clippy, my least preferred is him backstabbing me. In the middle I prefer cooperation to defection. That doesn't change my point that since we both have that preference list (with the asymmetrical ones reversed) then it's impossible to get either asymmetrical option and hence (C,C) and (D,D) are the only options remaining. Hence you should co-operate if you are faced with a truly rational opponent.

I'm not sure whether this holds if your opponent is very rational, but not completely. Or if that notion actually makes sense.

Comment author:query
04 April 2015 03:57:21AM
1 point
[-]

I agree it is better if both agents cooperate rather than both defect, and that it is rational to choose (C,C) over (D,D) if you can (as in the TDT example of an agent playing against itself). However, depending on how Clippy is built, you may not have that choice; the counter-factual may be (D,D) or (C,D) [win for Clippy].

I think "Clippy is a rational agent" is the phrase where the details lie. What type of rational agent, and what do you two know about each other? If you ever meet a powerful paperclip maximizer, say "he's a rational agent like me", and press C, how surprised would you be if it presses D?

Comment author:lerjj
06 April 2015 08:37:12PM
0 points
[-]

In reality, not very surprised. I'd probably be annoyed/infuriated depending on whether the actual stakes are measured in billions of human lives.

Nevertheless, that merely represents the fact that I am not 100% certain about my reasoning. I do still maintain that rationality in this context definitely implies trying to maximise utility (even if you don't literally define rationality this way, any version of rationality that doesn't try to maximise when actually given a payoff matrix is not worthy of the term) and so we should expect that Clippy faces a similar decision to us, but simply favours the paperclips over human lives. If we translate from lives and clips to actual utility, we get the normal prisoner's dilemma matrix - we don't need to make any assumptions about Clippy.

In short, I feel that the requirement that both agents are rational is sufficient to rule out the asymmetrical options as possible, and clearly sufficient to show (C,C) > (D,D). I get the feeling this is where we're disagreeing and that you think we need to make additional assumptions about Clippy to assure the former.

It's an appealing notion, but i think the logic doesn't hold up.

In simplest terms: if you apply this logic and choose to cooperate, then the machine can still defect. That will net more paperclips for the machine, so it's hard to claim that the machine's actions are irrational.

Although your logic is appealing, it doesn't explain why the machine can't defect while you co-operate.

You said that if both agents are rational, then option (C,D) isn't possible. The corollary is that if option (C,D) is selected, then one of the agents isn't being rational. If this happens, then the machine hasn't been irrational (it receives its best possible result). The conclusion is that when you choose to cooperate, you were being irrational.

You've successfully explained that (C, D) and (D, C) arw impossible for rational agents, but you seem to have implicitly assumed that (C, C) was possible for rational agents. That's actually the point that we're hoping to prove, so it's a case of circular logic.

## Comments (112)

Old*1 point [-]I understood that Clippy is a rational agent, just one with a different utility function. The payoff matrix as described is the classic Prisoner's dilemma where one billion lives is one human utilon and one paperclip on Clippy utilon; since we're both trying to maximise utilons, and we're supposedly both good at this we should settle for (C,C) over (D,D).

Another way of viewing this would be that my preferences run thus: (D,C);(C,C);(D,D);(C,D) and Clippy run like this: (C,D);(C,C);(D,D);(D,C). This should make it clear that no matter what assumptions we make about Clippy, it is universally better to co-operate than defect. The two asymmetrical outputs can be eliminated on the grounds of being impossible if we're both rational, and then defecting no longer makes any sense.

Wait, what? You prefer (C,D) to (D,D)? As in, you prefer the outcome in which you cooperate and Clippy defects to the one in which you both defect? That doesn't sound right.

woops, yes that was rather stupid of me. Should be fixed now, my most preferred is me backstabbing Clippy, my least preferred is him backstabbing me. In the middle I prefer cooperation to defection. That doesn't change my point that since we both have that preference list (with the asymmetrical ones reversed) then it's impossible to get either asymmetrical option and hence (C,C) and (D,D) are the only options remaining. Hence you should co-operate if you are faced with a truly rational opponent.

I'm not sure whether this holds if your opponent is very rational, but not completely. Or if that notion actually makes sense.

I agree it is better if both agents cooperate rather than both defect, and that it is rational to choose (C,C) over (D,D) if you can (as in the TDT example of an agent playing against itself). However, depending on how Clippy is built, you may not have that choice; the counter-factual may be (D,D) or (C,D) [win for Clippy].

I think "Clippy is a rational agent" is the phrase where the details lie. What type of rational agent, and what do you two know about each other? If you ever meet a powerful paperclip maximizer, say "he's a rational agent like me", and press C, how surprised would you be if it presses D?

In reality, not very surprised. I'd probably be annoyed/infuriated depending on whether the actual stakes are measured in billions of human lives.

Nevertheless, that merely represents the fact that I am not 100% certain about my reasoning. I do still maintain that rationality in this context definitely implies trying to maximise utility (even if you don't literally define rationality this way, any version of rationality that doesn't try to maximise when actually given a payoff matrix is not worthy of the term) and so we should expect that Clippy faces a similar decision to us, but simply favours the paperclips over human lives. If we translate from lives and clips to actual utility, we get the normal prisoner's dilemma matrix - we don't need to make any assumptions about Clippy.

In short, I feel that the requirement that both agents are rational is sufficient to rule out the asymmetrical options as possible, and clearly sufficient to show (C,C) > (D,D). I get the feeling this is where we're disagreeing and that you think we need to make additional assumptions about Clippy to assure the former.

It's an appealing notion, but i think the logic doesn't hold up.

In simplest terms: if you apply this logic and choose to cooperate, then the machine can still defect. That will net more paperclips for the machine, so it's hard to claim that the machine's actions are irrational.

Although your logic is appealing, it doesn't explain why the machine can't defect while you co-operate.

You said that if both agents are rational, then option (C,D) isn't possible. The corollary is that if option (C,D) is selected, then one of the agents isn't being rational. If this happens, then the machine hasn't been irrational (it receives its best possible result). The conclusion is that when you choose to cooperate, you were being irrational.

You've successfully explained that (C, D) and (D, C) arw impossible for rational agents, but you seem to have implicitly assumed that (C, C) was possible for rational agents. That's actually the point that we're hoping to prove, so it's a case of circular logic.