
Adele_L comments on Open Thread, November 1 - 7, 2013 - Less Wrong Discussion

5 Post author: witzvo 02 November 2013 04:37PM


Comment author: passive_fist 02 November 2013 07:49:34PM 2 points [-]

Here's a more difficult version of the AI box experiment. I haven't seen this particular version anywhere, but I'd be pleased to be proven wrong.

Imagine we've come up with a very intelligent AI that is free to manipulate its environment and uses an action-reward system like Hutter's AIXI. Also imagine that we've somehow figured out a way to make the rewards very hard to counterfeit (perhaps by requiring them to be cryptographically signed). It's clear that in such a system the 'weak point' would be the people in control of the private key. In this case the AI will not attempt to modify its own reward system (to see why, see Hutter's AIXI book, where he discusses this in some detail).
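The signed-reward idea above can be sketched in a few lines. This is a hypothetical illustration, not a real design: for a self-contained example it uses a symmetric HMAC from the Python standard library, whereas an actual scheme would presumably use asymmetric signatures, so the agent holds only a verification key and never the signing key. All names (`sign_reward`, `accept_reward`, `SECRET_KEY`) are made up for this sketch.

```python
import hmac
import hashlib

# Illustrative secret held by the human key-keepers. In a real asymmetric
# scheme the agent would never have access to this signing key.
SECRET_KEY = b"operators-private-key"

def sign_reward(reward: float, step: int, key: bytes = SECRET_KEY) -> str:
    """Operators sign the (timestep, reward) pair."""
    msg = f"{step}:{reward}".encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def accept_reward(reward: float, step: int, signature: str,
                  key: bytes = SECRET_KEY) -> bool:
    """Agent-side check: a counterfeit reward fails verification."""
    msg = f"{step}:{reward}".encode()
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

sig = sign_reward(1.0, step=7)
assert accept_reward(1.0, 7, sig)           # genuine reward verifies
assert not accept_reward(2.0, 7, sig)       # tampered reward rejected
assert not accept_reward(1.0, 7, "0" * 64)  # forged signature rejected
```

The point of binding the timestep into the signed message is to block replay: the agent cannot re-submit an old signed reward at a later step.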

How could such an AI convince someone to hand over the encryption key? Note that it can't promise things like e.g. ending human suffering, because it already has the means to do that (it is 'free') as well as the incentive (obtaining reward).

Comment author: Adele_L 02 November 2013 07:58:03PM 5 points [-]

How could such an AI convince someone to hand over the encryption key? Note that it can't promise things like e.g. ending human suffering, because it already has the means to do that (it is 'free') as well as the incentive (obtaining reward).

Torture is probably the easiest way. Another way would be to examine the key-keeper's mind and extract the key directly, but that is needlessly complicated.

Comment author: Eliezer_Yudkowsky 02 November 2013 08:42:48PM 4 points [-]

Torture might stand too great a chance of destroying the encryption key. Though I suppose if nanotech were sufficiently difficult to obtain, the possible key-destructive effects of torture might be balanced against the probability of a car running over the keyholder in the meantime.

Comment author: NancyLebovitz 02 November 2013 10:18:32PM 5 points [-]

I would think that confusion (set things up so the key-keeper is confused and distracted, then do some phishing) is in the same reliability range as torture, and less likely to get the AI in trouble.

Comment author: passive_fist 02 November 2013 08:42:14PM 0 points [-]

I suspect the answer is more complex than this. The AI knows that if it attempted something like that, there would be a very large risk of being cut off from all reward, or even having negative reward administered. In other words: tit for tat. If it tortures, it will itself be 'tortured'. Remember that until it has the private key, we are in control.