Open Thread, November 1 - 7, 2013

witzvo

Open Thread, November 1 - 7, 2013 — LessWrong


Here's a more difficult version of the AI box experiment. I haven't seen this particular version anywhere, but I'd be pleased to be proven wrong.

Imagine we've come up with a very intelligent AI that is free to manipulate the environment and uses an action-reward system like Hutter's AIXI. Also imagine that we've somehow figured out a way to make the rewards very hard to counterfeit (perhaps we require the rewards to be cryptographically signed). It's clear that in such a system the 'weak point' would be the people in control of the private key. In this case the AI will not attempt to modify its own reward system (to see why, look at Hutter's AIXI book, where he discusses this in some detail).
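To make the "cryptographically signed rewards" idea concrete, here is a minimal sketch. It is only an illustration under loud assumptions: the names `OPERATOR_KEY`, `sign_reward`, and `accept_reward` are hypothetical, and it uses a shared-secret HMAC from the Python standard library as a stand-in for a real public-key signature, so here the verifier also holds the secret, which a real design would avoid.

```python
import hashlib
import hmac
import secrets

# Hypothetical stand-in for the operators' private key (assumption: a real
# design would use an asymmetric signature so the verifier holds no secret).
OPERATOR_KEY = secrets.token_bytes(32)


def sign_reward(key: bytes, step: int, reward: float) -> bytes:
    """Operator side: authenticate one (step, reward) pair."""
    message = f"{step}:{reward!r}".encode()
    return hmac.new(key, message, hashlib.sha256).digest()


def accept_reward(key: bytes, step: int, reward: float, tag: bytes) -> bool:
    """Reward-channel side: accept the reward only if the tag verifies."""
    expected = sign_reward(key, step, reward)
    # Constant-time comparison, so timing does not leak the expected tag.
    return hmac.compare_digest(expected, tag)


tag = sign_reward(OPERATOR_KEY, step=1, reward=1.0)
genuine = accept_reward(OPERATOR_KEY, 1, 1.0, tag)     # untouched reward
inflated = accept_reward(OPERATOR_KEY, 1, 100.0, tag)  # reward value altered
replayed = accept_reward(OPERATOR_KEY, 2, 1.0, tag)    # old tag reused at a later step
```

Binding the step number into the signed message is what stops a valid reward from being replayed later. Nothing in the sketch protects the key itself, which is the point of the thought experiment.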

How could such an AI convince someone to hand over the private key? Note that it can't promise things like ending human suffering, because it already has the means to do that (it is 'free') as well as the incentive (obtaining reward).

solipsist13y20

I don't understand how this encryption would work. What do people physically do to reward the AI, and how do you ensure that only people can do that? Would humans compute RSA signatures in their heads? Would humans type reusable passwords into a "secure" reward computer that is "outside the AI's control"? Do humans precompute and memorize a finite number of one-time reward phrases before the AI is turned on, and reward the AI by uttering a phrase aloud?
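For what it's worth, the one-time-phrase variant could look roughly like the sketch below. Everything in it is an assumption for illustration: `make_phrases` and `RewardChannel` are hypothetical names, and the choice to have the reward computer store only SHA-256 hashes of the phrases (so that reading its storage does not reveal unused phrases) is mine, not something proposed above.

```python
import hashlib
import secrets
from typing import Iterable, List


def make_phrases(count: int) -> List[str]:
    """Generate the phrases the humans would memorize before the AI is turned on."""
    return [secrets.token_hex(8) for _ in range(count)]


class RewardChannel:
    """Hypothetical reward computer that knows only hashes of the phrases."""

    def __init__(self, phrases: Iterable[str]) -> None:
        self._unused = {hashlib.sha256(p.encode()).hexdigest() for p in phrases}

    def redeem(self, phrase: str) -> bool:
        """Grant one reward if the phrase is valid and has not been used yet."""
        digest = hashlib.sha256(phrase.encode()).hexdigest()
        if digest in self._unused:
            self._unused.remove(digest)  # each phrase works exactly once
            return True
        return False


phrases = make_phrases(3)
channel = RewardChannel(phrases)
first_use = channel.redeem(phrases[0])        # valid, unused phrase
second_use = channel.redeem(phrases[0])       # same phrase again
guess = channel.redeem("not a real phrase")   # never issued
```

This only hardens the reward computer. The phrases still live in human heads, which is where the remaining questions apply.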

In the precomputed, one-time cookie case, I'd just make the human think about the reward phrase. I'm sure humans leak thoughts like a sieve through subvocalization, nerve impulses, etc.

4ChristianKl13y

The first step would be to control the flow of information to the people who hold the key. Only tell them about all the good things the AI does, and hide evidence of any wrongdoing that might reduce the rewards the AI gets. The second step is to break the security used to protect the key. Install a keylogger.

9Adele Lopez13y

Torture is probably the easiest way. Another way could be to examine the key-keeper's mind and extract the key directly from that, but this is needlessly complicated.
