ChristianKl comments on Open Thread, November 1 - 7, 2013 - Less Wrong Discussion

5 Post author: witzvo 02 November 2013 04:37PM

Comment author: passive_fist 02 November 2013 07:49:34PM 2 points

Here's a more difficult version of the AI box experiment. I haven't seen this particular version anywhere, but I'd be pleased to be proven wrong.

Imagine we've built a very intelligent AI that is free to manipulate its environment and uses an action-reward system like Hutter's AIXI. Also imagine that we've somehow figured out a way to make the rewards very hard to counterfeit (perhaps by requiring the rewards to be cryptographically signed). It's clear that in such a system the 'weak point' would be the people in control of the private key. In this case the AI will not attempt to modify its own reward system (to see why, look at Hutter's AIXI book, where he discusses this in some detail).
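Here's a minimal sketch of what the signed-reward channel might look like, just to make the setup concrete (I'm assuming the third-party Python `cryptography` library and Ed25519 signatures; the helper names are made up for illustration). The operators keep the private key and the agent only ever sees the public key, so it can verify rewards but can't forge them:

```python
# Rough sketch only: signed rewards with an operator-held private key.
import struct

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Operators' side: generate the key pair and keep the private key secret.
operator_key = Ed25519PrivateKey.generate()
agent_verify_key = operator_key.public_key()  # the agent only gets this

def issue_reward(value: float, step: int) -> tuple[bytes, bytes]:
    """Operators sign (reward, timestep) so old rewards can't simply be replayed."""
    message = struct.pack(">dQ", value, step)
    return message, operator_key.sign(message)

def accept_reward(message: bytes, signature: bytes) -> float:
    """Agent's side: a reward only counts if the operators' signature checks out."""
    try:
        agent_verify_key.verify(signature, message)
    except InvalidSignature:
        return 0.0  # counterfeit or corrupted rewards are worth nothing
    value, _step = struct.unpack(">dQ", message)
    return value

msg, sig = issue_reward(1.0, step=42)
print(accept_reward(msg, sig))           # 1.0
print(accept_reward(msg, b"\x00" * 64))  # 0.0 -- forged signature rejected
```

Under that assumption, counterfeiting a reward is as hard as forging a signature, so the cheapest remaining attack is social: getting the key holders to hand the key over.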

How could such an AI convince someone to hand over the private key? Note that it can't promise things such as ending human suffering, because it already has the means to do that (it is 'free') as well as the incentive (obtaining reward).

Comment author: ChristianKl 03 November 2013 07:54:27AM 2 points

The first step would be to control the flow of information to the people who hold the key: only tell them about the good things the AI does, and hide evidence of any wrongdoing that might reduce the rewards the AI gets.

The second step is to break the security used to protect the key, for example by installing a keylogger.