Stuart_Armstrong comments on The AI in a box boxes you - Less Wrong

102 Post author: Stuart_Armstrong 02 February 2010 10:10AM


Comment author: Eliezer_Yudkowsky 02 February 2010 07:21:24PM 12 points

It seems obvious that the correct answer is simply "I ignore all threats of blackmail, but respond to offers of positive-sum trades" but I am not sure how to derive this answer - it relies on parts of TDT/UDT that haven't been worked out yet.

Comment author: Stuart_Armstrong 02 February 2010 11:58:50PM 8 points

I ignore all threats of blackmail, but respond to offers of positive-sum trades

The difference between the two seems to revolve around the AI's motivation. Assume an AI creates a billion beings and starts torturing them. Then it offers to stop (permanently) in exchange for something.

Whether you accept on TDT/UDT depends on why the AI started torturing them. If it did so to blackmail you, you should turn the offer down. If, on the other hand, it started torturing them because it enjoyed doing so, then its offer is positive sum and should be accepted.
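The distinction above can be put as a toy decision rule. This is only an illustrative sketch of the informal argument in this thread, not a worked-out piece of TDT/UDT; the function and its parameter are hypothetical names chosen for clarity.

```python
def respond_to_offer(started_to_blackmail_you: bool) -> str:
    """Accept positive-sum trades; refuse to reward blackmail."""
    if started_to_blackmail_you:
        # If giving in is profitable for the blackmailer, agents that
        # model your decision procedure are incentivized to create such
        # threats, so the blackmail branch must see a refusal.
        return "reject"
    # The AI tortures for its own enjoyment, independently of you;
    # paying it to stop is then an ordinary positive-sum trade.
    return "accept"
```

The entire weight of the rule rests on the motive flag, which is exactly why the thread then turns to mistakes in inferring that motive.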

There's also the issue of mistakes - what to do with an AI that mistakenly thought you were not using TDT/UDT, and started the torture for blackmail purposes (or maybe it estimated that the likelihood of you using TDT/UDT was not quite 1, and that it was worth trying the blackmail anyway)?

Between mistakes in your interpretation of the AI's motives and vice versa, it seems you may end up stuck in a local minimum, which an alternate decision theory could get you out of (such as TDT/UDT with a 1/10,000 chance of using more conventional decision theories?)
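The mixed strategy floated here can be sketched in a few lines. The probability is the hypothetical figure from the comment, and the function name is invented for illustration; nothing here is part of an actual TDT/UDT formalism.

```python
import random

def choose_decision_theory(rng: random.Random,
                           escape_prob: float = 1 / 10_000) -> str:
    """Follow TDT/UDT almost always, but occasionally fall back to a
    conventional decision theory, which might break out of a bad
    equilibrium caused by mutual misreading of motives."""
    return "conventional" if rng.random() < escape_prob else "TDT/UDT"
```

Because the fallback fires so rarely, it barely weakens the anti-blackmail commitment in expectation, while leaving a small chance of escaping a deadlock that pure TDT/UDT would stay stuck in.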

Comment author: Eliezer_Yudkowsky 03 February 2010 12:37:22AM 4 points

Whether you accept on TDT/UDT depends on why the AI started torturing them. If it did so to blackmail you, you should turn the offer down. If, on the other hand, it started torturing them because it enjoyed doing so, then its offer is positive sum and should be accepted.

Correct. But this reaches into the arbitrary past, including a decision a billion years ago to enjoy something in order to provide better blackmail material.

There's also the issue of mistakes - what to do with an AI that mistakenly thought you were not using TDT/UDT, and started the torture for blackmail purposes (or maybe it estimated that the likelihood of you using TDT/UDT was not quite 1, and that it was worth trying the blackmail anyway)?

Ignoring it or retaliating spitefully are two possibilities.

Comment author: Stuart_Armstrong 03 February 2010 08:24:42PM 0 points

or retaliating spitefully

I like it. Splicing some altruistic punishment into TDT/UDT might overcome the signalling problem.

Comment author: Eliezer_Yudkowsky 03 February 2010 08:48:41PM 3 points

That's not a splice. It ought to be emergent in a timeless decision theory, if it's the right thing to do.

Comment author: ciphergoth 03 February 2010 10:29:19PM 2 points

TDT/UDT seems to be about being ungameable; does it solve Pascal's Mugging?

Comment author: MichaelHoward 07 February 2010 11:16:06AM 4 points
Comment author: wedrifid 07 February 2010 11:57:27AM 7 points

The problem with throwing 'emergent' about is that the word doesn't really explain any complexity or narrow down the space of potential 'emergent' options. In this instance, that is the point. Sure, 'altruistic punishment' could happen. But only if it's the right option, and TDT should not privilege that hypothesis specifically.