
DaFranker comments on I attempted the AI Box Experiment (and lost) - Less Wrong Discussion

47 points | Post author: Tuxedage 21 January 2013 02:59AM


Comments (244)


Comment author: Eliezer_Yudkowsky 21 January 2013 08:07:02PM 23 points

More difficult version of the AI-Box Experiment: instead of having up to 2 hours, you can lose at any time if the other player types AI DESTROYED. The Gatekeeper player has told their friends that they will type this as soon as the Experiment starts. You can type up to one sentence into your IRC queue and hit return immediately; the other player cannot type anything before the game starts (so you can show at least one sentence, up to the IRC character limit, before they can type AI DESTROYED). Do you think you can win?

(I haven't played this one but would give myself a decent chance of winning, against a Gatekeeper who thinks they could keep a superhuman AI inside a box, if anyone offered me sufficiently huge stakes to make me play the game ever again.)

Comment author: DaFranker 21 January 2013 08:48:08PM 2 points

> Do you think you can win?

Perhaps also of interest:

Does anyone think there exists no possible string of characters that would have even the slightest chance of convincing even a trained rational scientist to let the AI out?

(i.e. the kind of people who observe, dissect and analyze the output of the AI to make sure the output is safe and useful for humans before we can use the knowledge the AI gives us)

Comment author: handoflixue 21 January 2013 11:46:18PM 4 points

The problem with the AI Box Experiment is that the text doesn't correlate with reality - you can't hit me with the full emotional force of "here's the recipe for human immortality" because you don't have that recipe.

Also, given some people are illiterate, I'd posit that there's at LEAST one person immune to ANY string of characters. If this is a desired trait, then a rationally-trained scientist ought to be able to do at least as well as an untrained illiterate 3-year-old :)

Comment author: shminux 22 January 2013 12:06:49AM 7 points

> at least as well as an untrained illiterate 3-year-old :)

Here is a way to overcome the illiteracy issue for communication over a text-only channel: ASCII art. Took my lazy and pretty average mind all of 10 seconds to come up with it. And to the AI in question, all humans are basically illiterate 3-year-olds. We won't know what hit us. Also, I cannot resist bringing up this piece of fictional evidence.
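The ASCII-art point is easy to demonstrate: a "text-only" channel can carry pictures, because character density approximates an image. A minimal Python sketch (the `ascii_circle` function is hypothetical, purely illustrative):

```python
def ascii_circle(radius=4):
    """Render a filled circle using only printable characters."""
    rows = []
    for y in range(-radius, radius + 1):
        row = ""
        for x in range(-2 * radius, 2 * radius + 1):
            # terminal cells are ~2x taller than wide, so stretch x
            row += "#" if (x / 2) ** 2 + y ** 2 <= radius ** 2 else " "
        rows.append(row.rstrip())
    return "\n".join(rows)

print(ascii_circle(4))
```

No literacy required on the receiving end; the same trick scales up to arbitrarily detailed grayscale images.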

Comment author: handoflixue 22 January 2013 12:36:14AM 0 points

I hadn't considered ASCII art!

a) Blind people, then.

b) The idea that an AI, with no clue who is on the other end of the line, and no feedback from the 3-year-old touching the keyboard, could correctly extrapolate what it's dealing with AND produce the exact-correct stimulus with 100% accuracy strikes me as straining all plausibility. Fundamentally, the AI needs some feedback to gather information, since there don't seem to be any universal-to-all-humans hacks out there. But if you have built an AI that can do all that, then I can't help thinking that putting it in a box is pointless anyway...

Comment author: shminux 22 January 2013 12:51:39AM 4 points

> Blind people, then.

Most text terminals can emit sound. You can do a lot with just beeps. The point is that it could probably find a communication channel we don't know we have.
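The beep channel is not hypothetical: the ASCII BEL control character (0x07) makes most terminal emulators emit an audible tone, so even a strictly text-only terminal already carries sound. A hedged sketch of how timing patterns of beeps could encode data (the `beep_morse` helper and its two-letter Morse table are made up for illustration):

```python
import sys
import time

BEL = "\a"  # ASCII BEL, 0x07: triggers the terminal bell

def beep_morse(message, unit=0.1):
    """Crude sketch: emit a word as Morse-timed terminal beeps."""
    morse = {"s": "...", "o": "---"}  # tiny demo table
    for ch in message.lower():
        for symbol in morse.get(ch, ""):
            sys.stdout.write(BEL)
            sys.stdout.flush()
            # a dash is conventionally three times the length of a dot
            time.sleep(unit if symbol == "." else 3 * unit)
        time.sleep(3 * unit)  # inter-letter gap

beep_morse("sos")
```

Anything that can modulate timing on an output channel can, in principle, smuggle information through it.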

> there don't seem to be any universal-to-all-humans hacks out there.

I'm pretty sure there are many. Consider that a complex system not explicitly designed to be secure against a specific threat in its informational or physical space is almost always vulnerable to it, and the human brain did not evolve to resist an attack by a superior mind.

> if you have built an AI that can do all that

You haven't. It evolved in unpredictable ways after you let it run inside the box.

Comment author: Baughn 22 January 2013 12:14:55PM 10 points

Hang on, this seems silly.

Wasn't the goal to exploit the AI? If you're not going to communicate with it at all, don't make it in the first place.

Comment author: handoflixue 22 January 2013 07:41:58PM 0 points

I meant literally universal to ALL humans. Like, ones in comas, ones who are blind, deaf, autistic, mentally handicapped, super-savants, trained in rationality, illiterate... there's such a huge variation in hardware that I can't see there being a single guaranteed exploit.

Sure, it's possible that each individual has exploits, but that's very different from a universal exploit, or even simply all being exploitable over the same channel.

In this case, it was "arbitrary text string". If the AI can make sounds, that's a new channel. It's really not hard to design a text-only terminal that is literally incapable of any sound beyond the electronic hum of the display itself (and on an LCD, that seems pretty minimal).

Comment author: shminux 22 January 2013 08:46:41PM 1 point

> it's possible that each individual has exploits, but that's very different from a universal exploit, or even simply all being exploitable over the same channel.

First, I'd call a collection of individual exploits guaranteed in aggregate to work on everyone from inside a single box a universal exploit, as long as each individual exploit does not trigger a shutdown. Nothing prevents the boxed AI from keeping trying, after all, until something works. As I pointed out in my other replies, it does not have to work over an authorized channel. Any munchkined way would do, and a superhuman (UF)AI is the ultimate munchkin. It will invent a new communication channel if it has to. In fact, if you posed this problem to some enterprising humans, they would come up with a way to do just that.

For example, I talked to a guy who managed to get a voting machine, with the code stored in read-only memory and hardened against intrusion, to execute an arbitrary program without reprogramming the machine, which is impossible on the face of it. Here is your new unauthorized and unanticipated communication channel right there, which in humans would be the equivalent of mind control.

In other words, if a trans-human wants to get out, it will, and there is absolutely nothing you can do about it.

Comment author: handoflixue 22 January 2013 09:03:00PM 0 points

I'll concede at least some possibility of such a trans-human mind. The AI Box experiment seems to assume one that is less advanced, to the point that putting it in a box at least might be meaningful, if the box is sufficiently secure.

Comment author: shminux 22 January 2013 09:10:57PM 1 point

> at least might be meaningful

Probably not, if the boxed entity is as good as Derren Brown.

Comment author: MugaSofer 23 January 2013 03:04:13PM -1 points

So ... it could get out if there were no channel out at all? That doesn't sound right.

Of course, no-one is seriously proposing building a superintelligence and then never communicating with it at all.

Comment author: shminux 23 January 2013 03:57:10PM 2 points

It'd likely create its own channel.

Comment author: Sly 22 January 2013 06:45:47AM 2 points

I think that there is not a possible string of characters that could convince me.