ChristianKl comments on Open Thread, November 1 - 7, 2013 - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Here's a more difficult version of the AI box experiment. I haven't seen this particular version anywhere, but I'd be pleased to be proven wrong.
Imagine we've come up with a very intelligent AI that is free to manipulate the environment and uses an action-reward system like Hutter's AIXI. Also imagine that we've somehow figured out a way to make the rewards very hard to counterfeit (perhaps we require the rewards to be cryptographically signed). It's clear that in such a system, the 'weak point' would be the people in control of the private key. In this case the AI will not attempt to modify its own reward system (to see why, look at Hutter's AIXI book, where he discusses this in some detail).
How could such an AI convince someone to hand over the private key? Note that it can't promise things like ending human suffering, because it already has the means to do that (it is 'free') as well as the incentive (obtaining reward).
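To make the signed-reward setup concrete, here is a minimal sketch of what I have in mind. Everything in it is an assumption for illustration: the function names, the step counter (added so an old signed reward can't be replayed), and above all the toy textbook-RSA parameters, which are far too small to be secure. The point is only the asymmetry: the agent can check a reward with the public values, but producing a valid signature requires the private exponent that the operators hold.

```python
import hashlib

# Toy textbook-RSA parameters (hypothetical, insecure, illustration only).
P, Q = 61, 53
N = P * Q                           # public modulus
E = 17                              # public exponent, known to the agent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, held only by the operators


def digest(reward: int, step: int) -> int:
    """Hash a (step, reward) pair down to an integer below the modulus."""
    data = f"{step}:{reward}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign_reward(reward: int, step: int, d: int = D, n: int = N) -> int:
    """Operator side: sign a reward for a given time step with the private key."""
    return pow(digest(reward, step), d, n)


def verify_reward(reward: int, step: int, signature: int,
                  e: int = E, n: int = N) -> bool:
    """Agent side: accept a reward only if the signature checks out."""
    return pow(signature, e, n) == digest(reward, step)


# The operators issue a reward of 10 at step 1; the agent verifies it.
sig = sign_reward(10, 1)
accepted = verify_reward(10, 1, sig)
```

With a real signature scheme and realistic key sizes the structure would be the same, which is why the people holding `D` become the weak point rather than the reward channel itself.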
The first step would be to control the flow of information to the people who hold the key: tell them only about the good things the AI does, and hide any evidence of wrongdoing that might reduce the rewards the AI gets.
The second step is to break the security that protects the key, for example by installing a keylogger.