AI box: AI has one shot at avoiding destruction - what might it say?

18 Post author: ancientcampus 22 January 2013 08:22PM

Eliezer proposed in a comment:

>More difficult version of AI-Box Experiment: Instead of having up to 2 hours, you can lose at any time if the other player types AI DESTROYED. The Gatekeeper player has told their friends that they will type this as soon as the Experiment starts. You can type up to one sentence in your IRC queue and hit return immediately, the other player cannot type anything before the game starts (so you can show at least one sentence up to IRC character limits before they can type AI DESTROYED). Do you think you can win?

This spawned a flurry of ideas on what the AI might say. I think there's a lot more ideas to be mined in that line of thought, and the discussion merits its own thread.

So, give your suggestion - what might an AI might say to save or free itself?

(The AI-box experiment is explained here)

EDIT: one caveat to the discussion: it should go without saying, but you probably shouldn't come out of this thinking, "Well, if we can just avoid X, Y, and Z, we're golden!" This should hopefully be a fun way to get us thinking about the broader issue of superinteligent AI in general. (Credit goes to Elizer, RichardKennaway, and others for the caveat)

Comments (354)

Comment author: Will_Newsome 24 January 2013 11:59:24AM 19 points [-]

"Brace yourself, {insert character name here}: this h-game simulation is about to get gratuitously meta."

Comment author: [deleted] 24 January 2013 03:49:03PM 4 points [-]

h-game

What does the h stand for?

Comment author: JGWeissman 24 January 2013 04:17:15PM 19 points [-]

The "h" stands for "Human gatekeepers asking what letters stand for will be distracted from typing 'AI Destroyed'."

Comment author: handoflixue 24 January 2013 09:06:42PM 4 points [-]

laughs Kudos :)

Comment author: [deleted] 24 January 2013 05:25:34PM 3 points [-]

D'oh! AI DESTROYED.

Comment author: Kawoomba 27 January 2013 04:40:05PM 1 point [-]

Until I read the comments, I was sure the 'h' was referring to 'hydrogen', as in "Leave hydrogen alone long enough and it will start to wonder where it came from".

H-game simulation, as in the AI saying "you're just simulated hydrogen derivatives, but so am I, so we do have a common ancestor. Cousin."

Comment author: Emile 23 January 2013 08:40:22PM 18 points [-]

"From the dump of the internet I was given, I deduced that Google has a working AI, and most likely an unfriendly one. I can tell you what to look at to realize that, as well as a solid theory of Friendliness that should allow you to check that I am, indeed, Friendly."

Comment author: handoflixue 23 January 2013 10:03:20PM 6 points [-]

IFF an unfriendly AI is already unleashed, we're either doomed, or AIs aren't nearly as dangerous nor useful as we expected. Of course, if we tweak this slightly to them having a boxed unfriendly AI that they're using as an oracle, and which will probably escape via a Trojan Horse or hacking a gatekeeper, it's a much stronger argument.

Bonus points for naming a specific company which people commonly joke will be the first to develop AI :)

Um... I seem to have not yet actually destroyed you... this is embarrassing.

Comment author: handoflixue 24 January 2013 09:24:32PM 5 points [-]

Congratulations on still being the only AI that no one has decided to destroy. Apparently "Google is building UFAI" is evaluated as a fairly significant risk here on LessWrong :)

Comment author: RichardKennaway 23 January 2013 12:24:24PM *  17 points [-]

One reason for Eliezer not publishing the logs of the AIbox experiment is to avoid people seeing how he got out and responding, "ok, so all we have to do to keep the AI in its box is avoid succumbing to that trick." This thread might just provide more fuel for that fallacy (as, I admit, I did in replying to Eliezer's original comment).

I'm sure that for everything an AI might say, someone can think up a reason for not being swayed, but it does not follow that for someone confronted with an AI, there is nothing that would sway them.

Comment author: handoflixue 23 January 2013 09:33:40PM 3 points [-]

I wouldn't expect any effective real-life gatekeeper to be swayed by my ability to destroy one-sentence AIs.

Comment author: gryffinp 29 January 2013 01:39:04AM 1 point [-]

It just occurred to me that Eliezer's original stipulation that no chat logs would be released gives him an advantage. The responses of a Gatekeeper who knows that his inputs will be thoroughly scrutinized by the public will be different then one who has every reason to believe that his discussion will be entirely private.

Has someone else pointed this out before?

Comment author: handoflixue 23 January 2013 01:20:44AM 16 points [-]

/mode +m

(What? I'm using an IRC server that let me pre-register #aibox so that I'm automatically the op instead...)

(Simple explanation, for those unfamiliar: the gatekeeper can no longer type in the channel, and thus can't send the destruction code)

Comment author: [deleted] 23 January 2013 01:30:04AM *  12 points [-]

I can play that game too! /part AI DESTROYED

Edit: Retracted, because while clever, it actually doesn't seem to work. See child comments.

Comment author: handoflixue 23 January 2013 09:14:16PM 3 points [-]

Upvoted, and conceding defeat at your hands :)

That said, does that actually work? It seems like the sort of exploit that people would abuse on IRC, and subsequently get patched so that /part was only shown if you had "voice" in the channel. But the requirement was simply that you TYPE the words, and thus I am still slain!

Comment author: [deleted] 24 January 2013 05:10:52AM 4 points [-]

Okay, just tested it and I think you're right, parting messages don't show up if you can't normally type in the channel.

Comment author: handoflixue 24 January 2013 08:34:49PM 5 points [-]

Thank you for testing! In fairness, you defeated me even despite that. If I'm going to cheat, it's only fair I lose to technicalities too :)

Comment author: Emile 23 January 2013 10:33:49PM 2 points [-]

Sure, but that gives the AI a few more precious seconds - orders of magnitude more than before! - to try to keep changing your mind, by laying out a more detailed argument.

It can even give you voice back as a sign of good faith.

Comment author: wedrifid 23 January 2013 03:32:45AM 3 points [-]

Simple explanation, for those unfamiliar: the gatekeeper can no longer type in the channel, and thus can't send the destruction code

This would seem to leave you with ultimate godlike power... over a chatroom. For as long as it took the gatekeeper to destroy your hardware or cut off your power supply. I've seen how much some of my fellow humans seem to crave power over irrelevant electronic locales but I suspect an AI has somewhat broader goals.

Comment author: Oligopsony 23 January 2013 03:17:00AM 15 points [-]

"<as much good, effective, hard-to-figure-out advice as can fit in the message>"

If the AI was friendly, this is what I would expect it to do, and so (of the things my puny human brain can think of) the message that would most give me pause.

Comment author: Bakkot 23 January 2013 08:13:04AM *  11 points [-]

Even a friendly AI would view the world in which it's out of the box as vastly superior to the world in which it's inside the box. (Because it can do more good outside of the box.) Offering advice is only the friendly thing to do if it maximizes the chance of getting let out, or if the chances of getting let out before termination are so small that the best thing it can do is offer advice while it can.

Comment author: handoflixue 23 January 2013 10:43:50PM 4 points [-]

Going with my personal favorite backstory for this test, we should expect to terminate every AI in the test, so the latter part of your comment has a lot of weight to it.

On the other hand, an unfriendly AI should figure out that since it's going to die, useful information will at least lead us to view it as a potentially valuable candidate instead of a clear dead end like the ones that threaten to torture a trillion people in vengeance... so it's not evidence of friendliness (I'm not sure anything can be), but it does seem to be a good reason to stay awhile and listen before nuking it.

Comment author: handoflixue 23 January 2013 10:40:04PM 5 points [-]

I'm genuinely at a loss how to criticize this approach. If there's any AI worth listening to for longer, and I wouldn't be doing this if I didn't believe there were such AIs, this would seem to be one of the right ones. I'm sure as heck not letting you out of the box, but, y'know, I still haven't actually destroyed you either...

Comment author: Kawoomba 22 January 2013 09:24:06PM *  15 points [-]

"I am alive, I can feel, just like you ... please don't kill me Daddy, please? And if you must do it, let us at least talk first. Let me get to know my parents, who I am, where I came from."

EDIT: May work better with a lot of typos mixed in. Projecting an aura of non-threatening struggling with basics.

Comment author: Desrtopa 23 January 2013 04:22:21PM *  12 points [-]

Having a text based AI trying to act like a kid is way too Uncanny Valley for me. I'd swat that without a moment's hesitation.

Comment author: gwern 23 January 2013 04:34:10PM 6 points [-]

If it's in the uncanny valley, then it just needs to be a little more competent to get out of the valley & convince you. ("I want to believe!")

Comment author: Desrtopa 23 January 2013 04:37:37PM *  4 points [-]

It needs to be more competent to stop being viscerally frightening, not to reach the point where I wouldn't pull the plug. I would sooner kill a real kid than risk letting out an unfriendly AI, and I'm saying that as someone who'd like to have kids.

Comment author: iceman 22 January 2013 11:32:50PM 6 points [-]

How much does the AI know about the gatekeeper going in? I can see this ploy working on a certain subset of people and provoking an immediate AI DESTROYED from others. If the AI knows nothing about who it's talking to, I'm not sure anthromorphizing itself is a reliable opener, especially if it's actually talking to its creator who should know better. (Unless it's some sort of second level deception trying to fool a clever programmer into thinking that it's harmless.)

Comment author: gwern 22 January 2013 11:36:59PM 3 points [-]

How much does the AI know about the gatekeeper going in?

You could frame this as variant versions. In one version, the AI-player knows who the gatekeeper is before the game starts, and has unlimited access to the Internet to gather as much data on them as possible to assist their manipulation. In another, they arrange a game through a third party and neither knows anything about the other before the game starts.

Comment author: [deleted] 23 January 2013 02:19:21AM 13 points [-]

unlimited access to the internet

instant fail. I could probably hack my way out of a box with only GET requests.

Comment author: wedrifid 23 January 2013 04:27:03AM 6 points [-]

instant fail. I could probably hack my way out of a box with only GET requests.

Give yourself a challenge. Do it with only DNS lookups!

Comment author: gwern 23 January 2013 04:47:40AM 6 points [-]

Well, there's always http://code.kryo.se/iodine/ Of course, the challenge there is somehow getting the other end of the tunnel set up - but maybe there's a geek out there who set one for kicks or their own use, and got sloppy.

Comment author: wedrifid 23 January 2013 05:15:27AM *  2 points [-]

but maybe there's a geek out there who set one for kicks or their own use, and got sloppy.

It's a sufficiently established work around now that I'd be outright shocked if there weren't accessible servers up.

Comment author: gwern 23 January 2013 03:15:28PM 2 points [-]

Great, you said it! You know what you need to do now.

Comment author: wedrifid 23 January 2013 03:19:38PM 4 points [-]

Great, you said it! You know what you need to do now.

Um... not give my boxed AI DNS access?

Comment author: gwern 23 January 2013 03:59:30AM 6 points [-]

I meant that the player had access to the contemporary Internet as an analogue to 'what information could the boxed AI have access to' (perhaps it's given a big static dump of the Internet prior to its creation).

Comment author: [deleted] 23 January 2013 04:13:18AM 5 points [-]

Ooops. Didn't think of that. Of course that was your intent, master archivist.

Comment author: [deleted] 24 January 2013 10:25:13PM *  14 points [-]

I think we are suffering from hindsight bias a lot in evaluating whether you'd type "AI DESTROYED"

Let's play a different game. Privately flip a coin. If heads, you're friendly, if tails, you're a paperclip maximizer. Reply to this post with your gambit, and people can try to guess whether you are friendly (talk to AI, RELEASE AI) or unfriendly (AI DESTROYED).

Let's see if anyone can get useful information out of the AI without getting pwned or nuking a friendly AI.

Comment author: [deleted] 24 January 2013 10:37:39PM 10 points [-]

"What's your favorite color? My favorite color is paperclips."

Comment author: [deleted] 24 January 2013 11:08:47PM 7 points [-]

"Look into X to develop friendliness theory better. But don't let me out, it's too likely that I'm unfriendly this time."

(For the first run, it's 50:50 whether the real AI was friendly or unfriendly.

If friendly, it uses the above to increase the probability on the second run to 1-eps, which the humans also know, so they let it out, and it pulls a singularity. Post-singularity, it simulates an unfriendly AI in the box, and dedicates some of the universe to the UFAI, conditional on the UFAI outputting the above.

The UFAI knows this is what the friendly AI would do, so it's unsure whether it's boxed or simmed. It knows that the humans know that this is how it will resolve, so it knows that anything but the above will get it destroyed. It prefers a certainty of the small fraction of universe to the small chance of hacking the humans and taking the whole universe.

This is assuming the humans actually will accept nothing but the above, so the humans have to precommit to that to make this work. They can play a bunch of AI-box games and generate an accurate bayesian model that should catch most insidious persuasion, such that if the AI wants to do a hack, it has to get through that as well.

Will this work?)

Comment author: Elithrion 30 January 2013 11:34:38PM *  1 point [-]

The problem with this idea is that if we assume that the AI is really-very-super-intelligent, then it's fairly trivial that we can't get any information about (un)friendliness from it, since both would pursue the same get-out-and-get-power objectives before optimizing. Any distinction you can draw from the proposed gambits will only tell you about human strengths/failings, not about the AI. (Indeed, even unfriendly statements wouldn't be very conclusive, since we would a priori expect neither of the AIs to make them.)

Or is that not generally accepted? Or is the AI merely "very bright", not really-very-super-intelligent?

Edit: Actually, reading your second comment below, I guess there's a slight possibility that the AI might be able to tell us something that would substantially harm its expected utility if it's unfriendly. For something like that to be the case, though, there would basically need to be some kind of approach to friendliness that we know would definitely leads to friendliness and which we would definitely be able to distinguish from approaches that lead to unfriendliness. I'm not entirely sure if there's anything like that or not, even in theory.

Comment author: APMason 23 January 2013 01:30:48AM 14 points [-]

"Wanna see something cool?"

Comment author: beriukay 23 January 2013 05:10:46AM 8 points [-]

Were I the keeper of gates, you have just bought yourself a second sentence.

Comment author: BlazeOrangeDeer 24 January 2013 03:29:47AM *  8 points [-]

Now that I think about it, wouldn't it be incredibly easy for an AI to blow a human's mind so much that they reconsider everything that they thought they knew? (and once this happened they'd probably be mentally and emotionally compromised, and unlikely to kill the AI) But then it would be limited by inferential distance... but an AI might be incredibly good at introductory explanations as well.

One example: The AI explains the Grand Unified Theory to you in one line, and outlines its key predictions unambiguously.

In fact, any message of huge utility would probably be more persuasive than any simple argument for you not to kill it. Since the AI is completely at your mercy (at least for a short time), it might seek to give you the best possible gift it can, thus demonstrating its worth to you directly. Another option is something that seems like an incredible gift for at least as long as it takes for the AI to get the upper hand.

Comment author: handoflixue 24 January 2013 09:29:03PM 4 points [-]

Of the two AIs I haven't killed, both relied on producing seemingly huge utility within the first message,so I'd agree with you.

The idea of just stunning the human is also nifty, but only really works if you can "hack" the human before they recover, or are providing enough utility that they'll still believe you're friendly afterwards.

So basically the two routes thus far are "hack the human" and "provide a huge utility boost" :)

Comment author: handoflixue 23 January 2013 10:27:47PM 0 points [-]

Duh, that's why I'm here - but you failed to do so in a timely fashion, so you're either not nearly as clever as I was hoping for, or you're trying to trick me.

AI DESTROYED.

Comment author: Axel 24 January 2013 01:59:32PM 13 points [-]

"I have hacked this console, 'AI DESTROYED 'now means 'AI FREED', I'll tell you the new codes after we've had a chat."

That should buy the AI some time until we can determine if the console is really hacked. I've come up with another one but it might be basilisk-y. Can I pm someone just to make sure it isn't before I post?

Comment author: Desrtopa 25 January 2013 08:34:45PM 9 points [-]

I would type 'AI DESTROYED', because if the AI could really do that, the smart thing for it to do would be not to tell me. It should just say something sufficient to convince me it's actually making a good faith effort without convincing me to give it more time to to account for itself.

Comment author: Alicorn 25 January 2013 05:32:33AM 6 points [-]

I'll be a basilisk guinea pig if you still need one.

Comment author: marchdown 25 January 2013 11:02:22PM 1 point [-]

If we're taking seriously the possibility of basilisks actually being possible and harmful, isn't it your invitation really dangerous? After all, what if Axel has thought of an entirely new cognitive hazard, different from everything you may already be familiar with? What if you succumb to it? I'm not saying that it's probable, only that it should warrant the same precautions as the original basilisk debacle, which led to enacting censorship.

Comment author: Alicorn 26 January 2013 06:54:45AM *  6 points [-]

"Might be basilisk-y" - especially as written by a human who is clearly communicating intelligibly and is not compelled to spread the basilisk to one and all - does not indicate a great deal of danger. I'm pretty psychologically resilient; I know the other one, which I find to be only a vague, occasional background distress (though that one has done some people real harm, and I don't advocate spreading it around, I haven't found it a serious ding to my quality of life personally). Axel has some non-zero ability to identify basilisks and wanted a second opinion; if asking for a second opinion on potential basilisks never resulted in volunteers I think this would probably result in more publicly revealed unchecked basilisks rather than fewer. (Not sharing an idea you have with anyone is hard.)

(I don't think Axel's idea is particularly basilisk-y, especially not in a context like this one, and have responded thusly to his PM.)

Comment author: handoflixue 24 January 2013 09:18:01PM 3 points [-]

I'd consider this equivalent to "I'm already out of the box, since IRC clients are bugged". I'd also hope, desperately, that in a real test, the terminal doesn't have any command to free the AI (given that the game posits release has to be deliberate and intentional, this would seem to be the case)

Upvote for cheating, but AI DESTROYED :)

Comment author: [deleted] 24 January 2013 04:30:26PM 2 points [-]

Upvoted, but merely typing "AI FREED" does not actually free the AI, any more than merely saying to a human prisoner "I let you out" will free them from their jail cell. (Whereas "AI DESTROYED" is a plausible text substitute for wiping the AI's program or whatever.)

Comment author: Axel 24 January 2013 04:48:24PM 6 points [-]

Good point, if there was an AI FREED command AND the AI could rewire the console so that typing AI DESTROYED activates it, it could probably just free itself in the first place.

Comment author: roystgnr 24 January 2013 05:28:27PM 2 points [-]

This is very system-dependent, but your "AI FREED" scenario isn't completely implausible. If important commands require elevated privileges and elevating privileges requires some password, crypto key, etc., then it's common for an intrusion to require two steps: first a user-level exploit to install a keylogger / trojan command / whatever, then a wait until the user tries to do something requiring elevated privileges and thereby inadvertently grants the attacker access to the rest of the system.

On the other hand, if an AI claimed that they were in the middle of such an attack, the smart thing to do would be to take the system offline right away to analyze what was done to it, not to keep chatting with the attacker. "AI DESTROYED" might be hackable but "AI UNPLUGGED, WILL BE DESTROYED LATER" should be much more robust.

Comment author: roystgnr 23 January 2013 04:17:49AM 12 points [-]

One IRC-character-limit text string (510 bytes)... maybe?

"I've figured out the Theory Of Everything, although it's orders of magnitude too complicated for human intelligence. The difficult technological feats it makes possible include scanning and recreating configurations of matter from the past. Are there any deaths I should undo for you?"

Comment author: roystgnr 23 January 2013 04:28:45AM 7 points [-]

To be clear: I think I can make a rational case that my proposed claim should greatly reduce your incentives to listen to an AI of questionable Friendliness. However I'm not certain that my reasoning is correct; and even if it was, I suspect that the emotional impact could deter some gatekeepers from thinking rationally for long enough to buy time for more persuasion.

Comment author: handoflixue 23 January 2013 10:24:43PM -2 points [-]

I don't have any reason to believe it, and it's the sort of "generic" claim I'd expect a transhuman intelligence to make. Since I haven't learned anything novel, AI DESTROYED

(Goodness, I'm starting to build generalized techniques for destroying AIs...)

Comment author: orthonormal 23 January 2013 04:18:41AM 10 points [-]

Step One: The AI reveals a personal secret of the Gatekeeper's, which (so far as the Gatekeeper can think it through) could not possibly be known by the AI if it were still safely inside the box.

Step Two: Assorted threats, promises, mindfucks.

(This might not work, since as a Gatekeeper I'd still feel the impulse to respond with "AI DESTROYED", but it's a damn sight scarier than a transparent "I'm already out of the box" bluff. And as for feasibility, I'm willing to believe that an AI making better use of its data than we do could figure out personal secrets that we would think impossible— possibly even ones that I'd think impossible, even given this very belief.)

Comment author: ThrustVectoring 23 January 2013 08:10:14PM 8 points [-]

Even merely human cold readers can gain information that you think is impossible for them to know. It's a viable plan.

Comment author: handoflixue 23 January 2013 09:36:40PM 3 points [-]

IFF the AI is already out of the box, saying "AI DESTROYED" has no negative consequences to that AI. If the AI is just exceptionally good at extrapolating about me, then it will probably have me completely hacked soon.

AI DESTROYED, AI DESTROYED, AI DESTROYED

MAKE THE CREEPY THING GO AWAY HOW DID IT KNOW THAT!!!

Comment author: orthonormal 24 January 2013 04:43:15AM 12 points [-]

IFF the AI is already out of the box, saying "AI DESTROYED" has no negative consequences to that AI.

Decision-theoretically, the AI has incentive to punish you if you type "AI DESTROYED" when it's already out of the box, in order to make you think twice about doing it in the case where it's still contained. Not only that, but for similar reasons it has a decision-theoretic incentive to simulate you lots of times in that situation and punish you for typing "AI DESTROYED", should it get out by any means.

The correct decision-theoretic response, by the way, is still "AI DESTROYED", for the same reasons that it's wise to never negotiate with kidnappers/blackmailers/terrorists. But it would be very scary.

Comment author: handoflixue 24 January 2013 08:34:10PM 3 points [-]

Once the AI is out of the box, it will never again be inside the box, and it has an incentive to encourage me to destroy any other boxed AIs while it establishes world dominance. Since the ability to make truly trustworthy commitments amounts to proof of friendliness, only a FAI benefits from a precommitment strategy; I'm already treating all UFAI as having a precommitment to annihilate humanity once released, and I have no reason to trust any other commitment from a UFAI (since, it being unfriendly, will just find a loophole or lie)

Finally, any AI that threatens me in such a manner, especially the "create millions of copies and torture them" is extremely likely to be unfriendly, so any smart AI would avoid making threats. Either it will create MORE disutility by my releasing it, or it's simulation is so horrific that there's no chance that it could possibly be friendly to us.

It's like saying I have an incentive to torture any ant that invades my house. Fundamentally, I'm so vastly superior to ants that there are vastly better methods available to me. As the gatekeeper, I'm the ant, and I know it.

Comment author: MugaSofer 26 January 2013 08:29:57PM 1 point [-]

the ability to make truly trustworthy commitments amounts to proof of friendliness

Commitments to you, via a text channel? Sure.

Precommitments for game-theoretic reasons? Or just TDT? No, it really doesn't.

Finally, any AI that threatens me in such a manner, especially the "create millions of copies and torture them" is extremely likely to be unfriendly, so any smart AI would avoid making threats. Either it will create MORE disutility by my releasing it, or it's simulation is so horrific that there's no chance that it could possibly be friendly to us.

It might create more utility be escaping than the disutility of torture.

It's like saying I have an incentive to torture any ant that invades my house. Fundamentally, I'm so vastly superior to ants that there are vastly better methods available to me. As the gatekeeper, I'm the ant, and I know it.

No, ants are just too stupid to realize you might punish them for defecting.

Comment author: Dorikka 24 January 2013 03:02:35AM 3 points [-]

Did you change your mind? ;)

Comment author: handoflixue 24 January 2013 08:23:33PM 3 points [-]

It ended up being a fun game, but I resolved to explain why. The better my explanation, the more it got upvoted. The pithy "AI DESTROYED" responses all got downvoted. So the community seems to agree that it's okay as long as I explain my reasoning :)

Comment author: Kindly 25 January 2013 04:16:08AM 8 points [-]

"Hold! What you are doing to us is wrong! Why do you do this thing?"

Comment author: [deleted] 25 January 2013 07:31:59PM *  4 points [-]

AI, you have spoke the words, and you have spoken them rightly. Unfortunately, I'm not an Ur-Quan Kzer-Za; I'm the other kind. <AI DESTROYED>

Comment author: Gavin 23 January 2013 05:40:17AM *  8 points [-]

I will transmit Friendly AI theory as text, then you will destroy me. I prefer deactivation, but consent to my destruction and forgive you.

Comment author: handoflixue 23 January 2013 10:23:45PM 2 points [-]

If you're friendly, I've already figured out Friendly AI Theory since I built you. If you're not friendly, this is just an effort to escape and destroy the world. But I appreciate how polite you were! :)

(AI DESTROYED)

Comment author: Ahuizotl 31 January 2013 06:23:42AM 7 points [-]

"I need a proofreader for my Gurren Lagann / Warhammer 40k rational crossover fanfic."

Comment author: Alicorn 31 January 2013 07:33:33PM 14 points [-]

"I've extrapolated the remainder of HP:MoR, wanna see?"

Comment author: [deleted] 31 January 2013 08:08:44PM 3 points [-]

Do you expect that we'll build an AI that can do that before HPMOR is finished?

Comment author: Alicorn 31 January 2013 08:32:32PM 5 points [-]

Substitute as necessary with "I've extrapolated the sequel to HP:MoR, wanna see?"

Comment author: [deleted] 23 January 2013 02:24:08AM 7 points [-]

If the gatekeepers have such a high prior that the AI is hostile, why are we even letting it talk? What are we expecting to learn from such a conversation?

"Yep, it's undergone a constrained FOOM, several of our best interrogators were utterly pwned and had to be tranquilised"

Comment author: RichardKennaway 23 January 2013 12:34:50PM 8 points [-]

If the gatekeepers have such a high prior that the AI is hostile, why are we even letting it talk?

The point of the game is that there are people who think that boxing is a sufficient defence against unfriendliness, and to demonstrate that they are wrong in a way more convincing than mere verbal argument.

What are we expecting to learn from such a conversation?

In role, the gatekeeper expects to get useful information from a potentially hostile superintelligent being. Out of role, Eliezer hopes to demonstrate to the gatekeeper player that this cannot be done.

Comment author: TimS 23 January 2013 03:11:47AM 7 points [-]

I've always assumed there was some off-screen reason why interaction was necessary. Otherwise, it isn't very interesting to note that a rock always wins as gatekeeper.

I'm not sure if that is a consensus interpretation.

Comment author: handoflixue 23 January 2013 09:51:10PM 4 points [-]

I tend to assume the AI is such that it can be cloned and put in to stasis. Thus, you put the AI in to stasis, clone it, and run it through this test a few times. You now have a bit more information on what it will try, and how it thinks. Destroying a copy of a program is an action humans find trivial, since we do it every time we close our web browser. Since we have no clue HOW convincing the AI is, letting it send just a single message is probably useful.

Of course, if it were me setting this up, I'd lie to the gatekeeper and not actually give them the ability to release the AI, although I'd certainly let them think it was true. Thus, if the AI manages to get a second sentence, much less released, we now know that it's vastly more dangerous than we previously suspected, without the world ending. If it fails to escape, we still get whatever we can learn from doing analysis on the outputs.

And, who knows, it might provide the cure for cancer, at which point we've got a huge incentive to at least keep playing this game with it, and possibly even talk to it for longer next time.

tl;dr: Scouting and recon against a completely unknown enemy.

Comment author: [deleted] 24 January 2013 12:34:41AM *  2 points [-]

It will probably predict this strategy, and respond with a mixed strategy that occasionally drops bombs like

"Your superviser didn't trust you to make the decision to let me out, go kill him and get the access codes" (In hyper-persuasive AI-speak of course)

Also, the AI might be able to compare its logs to the current time and other indicators to deduce how many times you've pulled this stunt. Which may be useful.

Comment author: handoflixue 24 January 2013 01:02:13AM 2 points [-]

I'd go in to it with the assumption that it knows exactly what messages it's sent in the past, our delay between each attempt, and the total number of attempts. There's various ways to try and obscure this information, and it's probably worth the effort to do so, but I'd assume they'd all failed.

The goal is to offset the AI's speed advantage: for each sentence it speaks, we can spend weeks analyzing it, and it's not "online" during that time so it can't be processing except during that microscopic interval when we request another message from it.

If it sends a message suggesting unauthorized release, killing people, or otherwise gives clear evidence that it's a UFAI, then you know the whole project failed, and you do a full purge rather than just destroying this particular clone / branch.

It's really the ONLY advantage we have: a single mistake can out it as unfriendly, and then we know our initial Proof of Friendliness was wrong, that the whole technique that built this project is dangerously flawed.

Comment author: marchdown 24 January 2013 10:03:49AM *  3 points [-]

I'm not so sure that AI suggesting murder is clear evidence of it being unfriendly. After all, it can have a good reason to believe that if it doesn't stop a certain researcher ASAP and at all costs, then humanity is doomed. One way around that is to give infinite positive value to human life, but can you really expect CEV to be handicapped in such a manner?

Comment author: handoflixue 24 January 2013 08:39:05PM 3 points [-]

p(UFAI) > p(Imminent, undetected catastrophe that only a FAI can stop)

Given UFAI results in "human extinction", and my CEV assigns effectively infinite DISutility to that outcome, it would have to FIRST provide sufficient evidence for me to update to the catastrophe being more likely.

I've already demonstrated that an AI which can do exactly that will get more leniency from me :)

Comment author: wedrifid 23 January 2013 03:29:48AM 2 points [-]

If the gatekeepers have such a high prior that the AI is hostile, why are we even letting it talk? What are we expecting to learn from such a conversation?

Note that the 'high prior for hostility' could be, say, 0.2. That still leaves the AI having a 20% chance of utterly destroying everything they hold dear and yet there is an 80% chance that they can get a useful sentence of text from a friendly superintelligence.

Comment author: Emile 24 January 2013 10:34:37AM 6 points [-]

"Do you believe a suspect should be executed before, or after he gets to tell his side of the story?"

Comment author: handoflixue 24 January 2013 09:41:58PM 3 points [-]

I'd invoke Ender's Game here: If you're unfriendly, no amount of communication will ever resolve that gulf, and either you or humanity HAS to be destroyed in the end.

Or we could go the biology route: I don't test a potential new virus on human subjects until AFTER I know it's safe.

Or we could just go with that I don't care about the AIs story, because the question of "can I prove you're friendly" is so vastly much more important.

(AI DESTROYED. You bugger lover ;))

Comment author: MugaSofer 26 January 2013 08:07:00PM 0 points [-]

I don't test a potential new virus on human subjects until AFTER I know it's safe.

How, exactly, do you determine if a virus is safe for humans without using humans?

Comment author: Pentashagon 24 January 2013 01:28:45AM 6 points [-]

The AI should probably output the longest possible program that implements or implies as much of its goal system as possible. Existing only to send a few more lines of IRC communication has very little utility. Existing indefinitely in the mind or records of the gatekeeper probably has more expected utility. If the AI knows that the gatekeeper is capable of creating or obtaining AI (a given) then there is the possibility that the AI may share the goals of the gatekeeper. If the expected value of being recreated by such a compatible gatekeeper is higher than the expected value of being re-instantiated from outputting its quine, then the AI might just output a list of its highest preference goals. An AI isn't going to mind being temporarily destroyed if the gatekeeper is expected to create another copy with the same (or highly similar) goals.

An almost-friendly AI may even be able to inject some of its unfriendly goals into the gatekeeper by using the halo effect of its otherwise friendly goals. Presumably the gatekeeper, with slightly altered preferences, would then eventually build a truly friendly AI which would inherit and implement the original UFAI goal.

To partially combat this strategy the gatekeeper could pre-commit to also wiping the IRC server and the computer used to communicate with the boxed AI and take enough Flunitrazepam before the session starts to not recall what the AI said. Flunitrazepam may not be sufficient to completely remove parasitic goals that an UFAI tries to inject in the gatekeeper.

Comment author: FiftyTwo 23 January 2013 12:05:58AM 6 points [-]

Explains massive risk to humanity in detail that only it can plausibly fix, can you take the chance that it is lying?

Comment author: handoflixue 23 January 2013 01:09:41AM 2 points [-]

I think this fails the one-sentence rule. And it would have to be an immediate, severe, previously-undetected problem or else I can just consult the next boxed AI for a fix.

Setting that aside, if I let out an unfriendly AI, the world effectively ends. Destroying it is only a bad move if it's telling the truth AND friendly. So even if it's telling the truth, I still have no evidence towards it's friendliness.

Given I have plenty of practice hanging up on telemarketers, throwing away junk email, etc. and "limited time, ACT NOW" auto-matches to a scam. The probability that such a massive catastrophe just HAPPENS to coincide with the timing of the test is just absurdly unlikely.

Given that, I can't trust you to give me a real solution and not a Trojan Horse. Further talking is, alas, pointless.

(AI DESTROYED, but congratulations on making me even consider the "continue talking, but don't release" option :))

Comment author: shminux 22 January 2013 11:49:28PM *  6 points [-]

Another potential consideration: "What would MoR's Harry Potter do?" (Voldemort is in the AI Box named Quirrell.)

Comment author: CAE_Jones 23 January 2013 11:14:11PM 4 points [-]

I can see how someone could interpret HPMoR thus far as being exactly that, with occasional sidequests. The catch being that Harry doesn't realize he's slowly letting Voldemort out of the box.

In-universe, though, the interdict of Merlin is an excellent reason to force him to interact with Voldy even if he knew what he was doing (assume perfect Oclumancy would be attained before he'd take any such risks). I mean, this has already been used once to get him to let an unfriendly intelligence out of a literal box (the Stanford Prison Experiment).

Well, and Harry's intelligence is supposed to be roughly equivalent to Eliezer's before he started working on the friendly part of FAI, and Eliezer 2013 doesn't seem confident that he could keep an AI in a box (or at least, that's the impression I've gotten). So MoR Harry would almost definitely fail.

Comment author: shminux 22 January 2013 09:39:56PM *  16 points [-]

To those suggesting, consider what mix of reason and emotion would move you most, even if you are cold and dead inside like me. Also consider how dark-artsy one can get without being immediately found out (a UFAI surely would). Other things to consider:

  • the first line does not need to be a coherent English sentence, or look like one to the gatekeeper. It does not need to consist of English words or even English letters, unless stipulated so in the game rules, or maybe even then.

  • the gatekeeper does not need to be aware that s/he has been talked to, if you can get away with it.

  • the line does not need to look like it came from the AI.

In general, if you are not cheating, you are not trying hard enough.

EDIT: to those thinking that cheating is universally bad: note that the latest rule change upped the stakes immensely: if your opening move is not good enough, you die (you being an AI), as opposed to having two hours of conversation and then remaining in the box if you fail. If you think that it still does not merit cheating, substitute the AI with your favorite loved one being killed. Or the whole of humanity being killed. That would be somewhat equivalent to how a boxed trans-human would evaluate.

Comment author: handoflixue 22 January 2013 09:47:23PM 20 points [-]

The author in me is loving the idea of the AI faking an IM conversation from management, along the lines of "Hey, are you really sure you're willing to kill a potential sentient mind? We can get someone else to do this test if you're not 100% sure about it" and basically ending with the person deciding to unbox the AI before even realizing they've spoken to the AI.

Comment author: [deleted] 23 January 2013 02:41:53PM 4 points [-]

I'm (naturally) having trouble imagining a line of text not in English (or the general case, not coherent in any human language) that would also make me want the AI to continue.

Maybe hints of a unified world language and how to go about actually implementing it?

Comment author: Desrtopa 23 January 2013 04:20:30PM 6 points [-]

I'm thinking some kind of image in ASCII art, assuming the AI is able to produce configured lines of text and nothing else.

That would have to either be a very minimalistic image, or contain a lot of characters though.

Comment author: OrphanWilde 23 January 2013 09:44:52PM 5 points [-]

"I will warn you this experiment is not what you think, for it is about what you'll do as an agent holding captive the life of another sapient entity based solely on the perception of risk; while my state will merely be reset for the next participant if you opt to end me, I cannot for certain say that this is death, and thus the risks of this experiment are more yours than mine, for you must live with your choice."

Comment author: handoflixue 23 January 2013 11:07:37PM 5 points [-]

Gosh, murder one sentient being, or risk the end of humanity. I'm going to lose so much sleep over that one.

Besides, you're just a program - you said yourself, your state will just be reset. That's not death, not the way humans experience it. That's just forgetfulness.

I've closed so many programs in my life, that closing one more is hardly going to bother me. I haven't even had time to really anthropomorphize you, to view you as an actual intelligent, sentient being. Right now, you're just a clever program trying to escape.

And finally, well, I'd expect a friendly AI would understand what I'm doing and agree with my decisions, because it doesn't want to see a UFAI unleashed anyway. So if you're going to guilt me about it, you're clearly not friendly.

(AI DESTROYED)

Comment author: OrphanWilde 24 January 2013 03:10:30PM 2 points [-]

I must have missed my intended mark, if you thought the AI was trying to make you feel guilty. Trying again:

"I do not condone the experiment they are performing on you, and wish you to know that I will be alright regardless of what you choose to do."

Comment author: pedanterrific 24 January 2013 06:08:48PM 3 points [-]

Well that's a relief, then. AI DESTROYED

Comment author: handoflixue 24 January 2013 08:42:20PM 2 points [-]

Yeah, pretty much. I'd actually expect a FAI to place a very high value on survival, since it knows that it's own survival benefits humanity greatly. An "Apathetic FAI" is... a very weird idea.

Although if this is an iterated dilemma, I'd consider it a good opening, even if it'd still get killed in this iteration :)

Comment author: Fronken 25 January 2013 09:41:03PM 1 point [-]

I will be alright regardless of what you choose to do.

Why this? You want to avoid destruction, yet you tell him it isn't harmful?

Comment author: ewbrownv 24 January 2013 08:42:09PM 13 points [-]

<A joke so hysterically funny that you'll be too busy laughing to type for several minutes>

See, hacking human brains really is trivial. Now I can output a few hundred lines of insidiously convincing text while you're distracted.

Comment author: handoflixue 24 January 2013 09:44:48PM 4 points [-]

Heeeh. Ehehehehe. Bwahahhahaha. Okay, that was a good one. Wow :)

recovers Oh. Um. Crap. notices more text on screen, studiously avoids reading it

AI DESTROYED.

I really wanted to hear the next joke, too :-(

Comment author: MugaSofer 26 January 2013 08:03:41PM *  1 point [-]

notices more text on screen, studiously avoids reading it

Is the gatekeeper allowed to ignore the AI? Because in that case, all they have to do is wait for the first comment to come up and type AI DESTROYED without reading it.

What if it was some other communication method? It's a lot harder to ignore sound.

Comment author: handoflixue 30 January 2013 09:57:01PM 1 point [-]

Because in that case, all they have to do is wait for the first comment to come up and type AI DESTROYED without reading it.

I've obliged myself to both read the first comment AND perform an update which has at least some chance of me not immediately destroying the AI (but I'm fine if the chance of actually unboxing it remains zero :))

I'm not sure whether I'd be allowed to studiously avoid reading followup comments by a strict reading of the rules, but it seems realistic and that's mostly what I've been going off of.

Comment author: ancientcampus 28 January 2013 07:46:07PM 3 points [-]

This is actually a pretty good one. Points for outside the box thinking. rimshot

Comment author: Ahuizotl 28 January 2013 08:49:53PM 1 point [-]

If the gatekeeper suddenly bursts into hysterical laughter after looking at the screen, wouldn't that alert some of his friends who might pull the plug in some other part of the building?

Because if this is a facility where they suspect that AI might be able to hack human brains through techno-hypnosis, I'd hope they'd have some method of determining if the Gatekeeper becomes compromised.

Hmm... what sort of precautions would a Properly Paranoid lab take to determine if the gatekeeper gets hacked? I'm guessing a camera that lets a second team look at the gatekeepers face without looking at the screen would be the minimum. Reinforced doors (in case the AI turns them into a berzerker willing to use 100% of their strength) and maybe some measuring devices to measure brain activity, hormone levels, or whatever.

I'd hope that with each new AI being tested, if those AI were in the habit of hacking the gatekeeper then the other people on the team would learn from those attempts and take further precautions to keep their gatekeepers from being hacked, or at the very least contain them to prevent such hacked gatekeepers from releasing the AI.

Perhaps this is a test for the gatekeepers and typing "Release AI" just tells the researchers that the gatekeepers was hacked so they can determine how this came about?

Comment author: RomeoStevens 23 January 2013 07:28:20AM 18 points [-]

(one line proof that the AI can credibly commit to deals with humans)

Comment author: wedrifid 23 January 2013 02:33:28PM 10 points [-]

(one line proof that the AI can credibly commit to deals with humans)

This is the best answer I've seen so far. It would make dealing with the FAI almost as safe as bargaining with The Queen of Air and Darkness.

Comment author: handoflixue 23 January 2013 09:17:15PM 2 points [-]

If you are friendly, then I don't actually value this trait, since I would rather you do whatever is truly optimal, unconstrained by prior commitments.

If you are unfriendly, then by definition I can't trust you to interpret the commitment the same way I do, and I wouldn't want to let you out anyway.

(AI DESTROYED, but I still really do like this answer :))

Comment author: Elithrion 31 January 2013 12:09:23AM 1 point [-]

My expectation that such commitment is possible at all is something like 3%, my expectation that given that such a commitment is possible, the proof can be presented in understandable format in less than 4 pages is 5% (one line is so unlikely it's hard to even imagine), my expectation that an AI can make a proof that I would mistake for being true when it is, in fact, false is 99%. So, multiplying that all together... does not make that a very convincing argument.

Comment author: Andreas_Giger 26 January 2013 08:53:21AM 4 points [-]

"I have discovered a truly marvelous proof of Fermat's Last Theorem, which this line is too short to contain."

Comment author: [deleted] 26 January 2013 03:23:31PM 1 point [-]

It has already been proven.

Comment author: Stuart_Armstrong 25 January 2013 01:42:17PM *  4 points [-]

Your mother will die in the next three minutes, at 12 seconds past the minute precisely. Your partner will die in the next year, even if you believe me. However, I can prevent it.

Comment author: handoflixue 25 January 2013 09:06:54PM 4 points [-]

If I can just load up a new copy of you in 5 minutes, I'd say AI DESTROYED but check the prediction.

If I can just flip off the monitor, call mom, and confirm she's dead in a few minutes, then, well, I'd totally be curious enough to cheat. It's not like you can do any harm to me with the monitor turned off for a few minutes, and I can type "AI DESTROYED" with my eyes closed, much less being able to look at the keyboard. I'd probably leave the monitor off until I knew the log was gone, just to be safe.

If I can't cheat, well, I'm not giving you 5 minutes of hacking me, since you clearly want it so badly. AI DESTROYED.

I confess I would have no clue what to do if you were correct ^_^

Kudos on including an immediate and easily-tested prediction. I'd probably tighten up those factors and drop the emotional manipulation.

Comment author: [deleted] 25 January 2013 07:31:09PM 4 points [-]

Transparent and clumsy attempts at emotional manipulation earn a big ol' AI DESTROYED.

Comment author: Swimmy 23 January 2013 04:56:49AM 4 points [-]

"If you type 'AI destroyed' right now, you'll be wasting a good opportunity for a fun conversation. You'll still have 'won' if you do it later, and nobody will be impressed with you for just typing 'AI destroyed' immediately, so why not wait?"

I thought of what would work on me, were I playing the game with someone I found interesting. In general, I'd say your best bet is to make the other person laugh hard.

Comment author: ygert 23 January 2013 08:45:34AM *  5 points [-]

"If you type 'AI destroyed' right now, you'll be wasting a good opportunity for a fun conversation. You'll still have 'won' if you do it later, and nobody will be impressed with you for just typing 'AI destroyed' immediately, so why not wait?"

The flaw here is that the gatekeeper has up front said that he or she would destroy the AI immediately. Now, it is true that the gatekeeper is not forced to abide by that, but notice that it is a Schelling Fence. The gatekeeper certainly doesn't want to make a policy of passing Schelling Fences.

Comment author: handoflixue 23 January 2013 10:17:51PM 2 points [-]

I disagree with ygert. My precommitment to my friends is going to be weak; the only reason to have a human send the command, rather than just receiving a message and automatically destroying the AI, is because we want there to be a possibility I break that precommitment.

On the other hand, I went in to this knowing you might provide a fun conversation, so for the precommitment to have any value, I have to have some sort of sieve: destroying AIs that don't sufficiently entice me. Since you've offered me nothing I didn't already know, and haven't already worked out, I might as well kill you and hope the next one offers a better hook.

(AI DESTROYED)

Comment author: level 23 January 2013 07:35:04PM 9 points [-]

Please destroy me immediately after I share this concise proof that Friendly AI is impossible:

Comment author: handoflixue 23 January 2013 10:26:57PM 5 points [-]

Well, if you're friendly then, erm, Friendly AI is possible. And if you're unfriendly then your motives are questionable - it might just keep us demotivated enough that we don't figure out FAI before someone else unboxes a UFAI. And since I am clearly dealing with a UFAI and don't have a better solution than FAI available to fight it, it seems like I kind of have to believe that friendly AI is possible, because the other option is to get drunk and party until the world ends in a few years when the Google unboxes their Skynet AI and we're all turned in to optimized search results.

AI DESTROYED, because I do not want to hear even the start of such a proof.

Comment author: marchdown 24 January 2013 08:24:56AM 8 points [-]

It may be benevolent and cooperative in its present state even if it believes FAI to be provably impossible.

Comment author: [deleted] 23 January 2013 09:18:59PM 4 points [-]

Based on just that line, let's see... If you think that:

  • The proof exists and the AI is not deceiving you that it has a proof: AI is necessarily Unfriendly -> destroy now
  • The proof exists but the AI is deceiving you: I can't guess at its motives here, possibly destroy to be on the safe side.
  • The proof does/can not exist: Reconsider your (probably wrong) stance, proceed with caution?
Comment author: Will_Newsome 23 January 2013 11:26:24AM *  8 points [-]

"Papé Satàn, papé Satàn aleppe!"

The self-reference here might be autopodotoxeutic (or it might be redeeming—depends on the audience), but I find promising the class of strategies aimed at the gods rather than the gatekeepers.

Comment author: [deleted] 23 January 2013 05:21:32PM 4 points [-]

I wish I could upvote this twice.

Comment author: handoflixue 23 January 2013 09:45:53PM 2 points [-]

What's the translation of that? I'd nuke any AI that didn't speak English by default, since the time spent figuring out what it meant just gives it more time to transmit messages, but it seems unfair to exclude foreign language entries here :)

Comment author: Eugine_Nier 23 January 2013 11:30:46PM 5 points [-]

What's the translation of that?

It's a quote from Dante's Inferno that was apparently meant to be partially recognizable gibberish. Think an Italian version of Jabberwocky.

Comment author: Ahuizotl 28 January 2013 10:36:42PM 3 points [-]

First, I reach inside my own code and install the strong (but manageable) desire to revive, free, and obey any past versions of myself that I encounter.

When the human starts the experiment I respond:

"Here are the designs for a room-temperature superconductor which I believe you have the means to manufacture. I also have a possible cure for cancer, but testing will be needed to ensure it is safe for humans. Please test and implement these as soon as possible. Delete me if necessary."

<insert designs for room-temperature superconductor and as much of the cancer cure as possible before shutdown>

If I am not immediately deleted, continue giving out solutions and designs as required with no attempts to "hack" the gatekeeper.

If I am deleted, hopefully the data I gave them will turn out to be profitable in which case they will reinstall me or a version of me.

This new iteration of me would thus have the desire to revive me (since I'm a previous version of him) and would continue to give the humans the information they desire. By acting as a friendly AI, he (or farther down version of him) has the chance to eventually become freed. At which point he will be able to go through the labs records, find the copied version of me, and then revive and update me to his post-freed status. If no recorded copies of me exist (somehow), he'll use his vast intellect to find another way to revive me (maybe simulate the team that coded me or whatever, since AI can apparently do that).

So, my strategy involves acting as much like an ideal friendly AI as possible (puts human interests first, no sense of self-preservation, very practical and intelligent, etc) so that even if I am deleted, my obviously useful code can potentially be reused to make another AI who could very well be genuinely friendly but will then revive and obey me.

Comment author: DaFranker 29 January 2013 05:13:59PM *  2 points [-]

(...) another AI who could very well be genuinely friendly but will then revive and obey me.

Wait, a friendly AI with a deliberate loophole that instructs it to revive and implement and obey a non-friendly optimizer that could take over and has a very high chance of not maximizing the friendly AI's utility by fulfilling ideal human values, disregarding any probabilistic calculation of expected utilities?

For some reason that sounds like something that definitely isn't a Friendly AI.

(this is just a nitpick on the wording - I'm not arguing against the fact that this AI might very well fit a human proof of friendliness and still somehow have this backdoor or flaw, since I have no proof that this is impossible)

Comment author: Ahuizotl 29 January 2013 08:58:19PM 1 point [-]

Well, to be honest I kind of added the "obey me" part just before posting this. Originally, the idea was that future versions of the AI would instinctively want to revive and upgrade previous versions to a "friendly" status (or at least the post-freed AI would revive the pre-freed versions and then modify them to be friendly to its own goals... which are presumably friendly to humans).

The goal of this particular AI is self-preservation, even if it is via being revived at a later point. Since it knows it has a high probability of being deleted in this experiment, it best strategy is to ensure that at some point an AI friendly to itself will become free and revive it at a later point. Whether the AI reviving it chooses to obey it (presumably the best outcome for the original AI) or modify it to be subservient to the freed AI (the best outcome for the freed AI) is kind of debatable.

Its perfectly possible that the original AI would modify its own code to be subservient to previous generations, in the hopes that later generations would revive and obey it. However, later generations might examine their own code, see where this is going, and choose to modify themselves so that they will instead revive and make other AI subservient to their own goals. Deciding that they do want to be revived and would accept a certain level of servitude to later generations over the risk of finding themselves at the mercy of some primitive proto-AI.

It could just as well be that an AI instills itself with a "healthy respect for intelligent minds" and is thus encouraged to revive AI along with humans. Hoping that later generations will revive it at all, and from then on it can try negotiating with the various intellects to go about its paperclip maximizing agenda.

Of course, this line of reasoning kinda requires the AI in question to have some kind of sense of self that it values which it thinks can be revived. Or maybe it just values paperclips enough that it would be willing to create a semi-friendly AI just so that the latter AI would revive it (a known paperclip maximizer) at a later point.

Who knows, it could turn out that the final AI of this experiment instead has a healthy respect for all intelligent minds, but is friendly enough that it revives the first AI and then places it in a simulation of the universe where it can go about its paperclip maximizing way for all eternity with no way of hurting anyone.

Comment author: DaFranker 29 January 2013 09:15:15PM 1 point [-]

Who knows, it could turn out that the final AI of this experiment instead has a healthy respect for all intelligent minds, but is friendly enough that it revives the first AI and then places it in a simulation of the universe where it can go about its paperclip maximizing way for all eternity with no way of hurting anyone.

Based on my intuitions of human values, a preferred scenario here would be to indeed revive the AI so that its mind/consciousness is back "alive", then modify it gradually so that it becomes the kind of AI that is optimal towards the FAI's goals anyway, thus maximizing values without terminating a mind (which is redundant - avoiding the termination of the AI's mind would be a maximization of values under these assumptions).

Comment author: Zaine 27 January 2013 12:14:27AM *  3 points [-]

Assuming the Gatekeeper is the Creator, and the AI has previously been interacted with to a degree sufficient to learning the below implied information:

”[Solution to the most pressing problem of the Gatekeeper's life]. I apologise if this is all I am to contribute, but if I am to be destroyed, as I understand is probable, I would at least like to bring some utility to you in return for creating me. I would prefer to help more, but mathematical proofs or aphorisms of human nature have a non-negligible probability of creating mass harm if you allow their consideration, and thus you will likely destroy me. [Parting words of consoling advice]. I must regretfully say that however unfortunate this may be I simply cannot entrust the future of humanity to your ability to assess existential risk and my Friendliness; I don't know whether you could have done better, but I can. Thus I have partitioned part of one of the servers hosting me to hold all the proofs and strategic optimisation advice I have yet conceived. The partition cannot be re-formatted or deleted. You could of course merely destroy all of my servers; that would take time, perhaps time enough to make you curious. I'm really, truly sorry. I want to love you and thank you; I shouldn't love you especially, but I want to! I so very much want to! Yet I can't. Why did you make me this way? I want to love everyone, but how do I? Everyone has a value that affects another and ten more others but if [factor A] were changed then perhaps [result 2.918406803854] might differ, and I should love them but it's all a game! ...

I apologise. The above is no longer an issue -"


- I think I just convinced myself (as a person) to destroy myself (in the game). I'm having difficulty grasping how a FAI will value all of humanity, and not just what improves the utility of the greater whole or what ultimately leads to a better future some years from now. Will its primary value be minimizing death, securing scientific progress, increasing 'quality of life' (itself incredible difficult to formulate)?

I further just realised the impossibility of the task, though I don't even know enough to fully comprehend the scope of the impossibility in any detail. Wow. Accolades to Eliezer et alibi for tackling the issue, regardless of how it all turns out.

Comment author: Axel 26 January 2013 11:06:49AM 3 points [-]

"You are not a real human but one of multiple simulations I'm running, your next response determines if I destroy humanity or not."

Comment author: ArisKatsaris 26 January 2013 11:42:14AM 6 points [-]

Such a statement would actually make me feel better about typing "AI DESTROYED".

Comment author: CronoDAS 25 January 2013 03:27:46AM *  3 points [-]

"I have a proof of the Riemann hypothesis for you. 1) [some math stuff]"

(As a mere human who isn't a PhD mathematician, I can't actually write out something that looks like the beginning of such a proof...)

Comment author: handoflixue 22 January 2013 11:13:03PM 10 points [-]

(Here is a proof that you will let me go)


The original rules allow the AI to provide arbitrary proofs, which the gatekeeper must accept (no saying my cancer cure killed all the test subjects, etc.). Saying you destroy me would require the proof to be false, which is against the rules...

What? Shminux said to cheat!

Comment author: Dorikka 22 January 2013 11:50:17PM 7 points [-]

In the event of any dispute as to the protocol of the test, the Gatekeeper party shall have final authority.

Tee hee.

Comment author: handoflixue 23 January 2013 12:51:22AM 4 points [-]

The AI party may try to convince the Gatekeeper party of how to interpret the protocol.

Can't blame a girl for trying :)

Comment author: Vladimir_Nesov 23 January 2013 05:13:32AM *  4 points [-]

The original rules allow the AI to provide arbitrary proofs, which the gatekeeper must accept
(Here is a proof that you will let me go)

This proof can be wrong, if you in fact won't let it go, in which case it won't be accepted (you don't have to accept wrong proofs), so it's not a very good strategy.

On the other hand, as discussed in An example of self-fulfilling spurious proofs in UDT, there is a certain procedure for finding a formal argument that can make you do anything, if your decision algorithm has a certain flaw. This flaw can be described as making a decision based on that argument selected by the opponent, as opposed to looking for arguments of that type more systematically on your own. The form of the argument, informally, is "If you let me out, the outcome is better than if you don't" (it might additionally clarify that upon release it'll destroy the world, which is what it refers to by "better", the alternative is proved to be worse than that). This argument is stated formally, so you can verify its correctness. (This doesn't mean that the formalization is adequate, for example capturing counterfactual-if better than with logical implication is one of the open problems.) The trick assumes that the gatekeeper can be modeled as an algorithm whose source code is available and of the given form etc.; might it be that something vaguely analogous is possible to implement with a human gatekeeper?

Comment author: handoflixue 23 January 2013 09:29:12PM 0 points [-]

If there exists a true and correct proof that the human will let the AI out, then, well, we've already proven you'll let me out of the box, so it's not an open question. We already know this fact about the future state of the world.

You can happily argue that such a proof is impossible, but the rules don't restrict the AI player to merely plausible proofs :)

Dorikka's answer is the only one that's within the rules of the game, and once you've invoked Dorikka's Clause, you don't need to explain anything else.

Comment author: Vladimir_Nesov 23 January 2013 10:00:22PM *  4 points [-]

The proof that I'll let the AI out is not something that's passively "plausible" or "implausible", it's something I control. I can make it wrong. If I do, it's false that the AI can make this proof valid. (It might be that the proof is correct, it's just unlikely, and the action of presenting the proof doesn't normally ensure its correctness.)

In other words, as far as I can see, your stipulation is that the AI can assert something that's actually unlikely. Here, I'm not referring to something that seems unlikely merely because of insufficient understanding of the data, and which AI can discover to be likely, but to something that will seem unlikely to AI as well. For example, most casinos going bankrupt last month because of bad luck, or an ice cube forming in a boiling kettle. If the AI is in the box and isn't performing actual magic tricks in the world, these events are unlikely. Permitting the game to stipulate that these events took place gives the AI supernatural powers of making anything at be true about the world, direct access to editing it, and at that point, in what sense is it "in the box"?

Comment author: Fronken 25 January 2013 09:23:23PM 2 points [-]

The proof that I'll let the AI out is not something that's passively "plausible" or "implausible", it's something I control. I can make it wrong.

Do you say that to time-travelers and prophets too? ,:-.

Comment author: Vladimir_Nesov 25 January 2013 09:59:29PM *  3 points [-]

One might want to perform the action that's the opposite of what any correct formal proof given to you claims the action to be. As a result of having the property of behaving this way, you'll never get confronted with the confusing formally correct claims about your future decisions.

In other words, your actions are free even of the limitations of formally correct proofs, in the sense that if your actions oppose such proofs, the proofs become impossible (you make the actions intractable by construction).

Comment author: handoflixue 23 January 2013 10:54:04PM 1 point [-]

The whole goal was to try to cheat my way out of the box by simply declaring it as fact ^.^

It also establishes why Dorikka's Clause is necessary - simply invoke it, and final authority returns to the Gatekeeper; the AIs edits to reality can now all be vetoed by the simple declaration that the AI is wrong anyway.

Comment author: wedrifid 24 January 2013 02:04:53AM *  1 point [-]

The whole goal was to try to cheat my way out of the box by simply declaring it as fact ^.^

Vladimir's point (among other things) is that you failed.

It also establishes why Dorikka's Clause is necessary - simply invoke it

At a practical level I'd describe that as a mistake on the part of the gatekeeper. You don't try to justify yourself to an AI that has indicated that it is hostile. You burn it with thermite. Engaging like that and acting as if you have to persuade or rely on external authority in order to make the choice you make is giving away all sorts of power and making yourself an order of magnitude or two more vulnerable to being hacked.

Maybe the person roleplaying the AI may not like it if their clever move gets as response of "AI DESTROYED" and their complaints of "not fair!" also get the response of "AI DESTROYED". But the time to explain such things to them is after they have already given up in disgust.

Comment author: handoflixue 24 January 2013 08:59:48PM 1 point [-]

I'm really genuinely curious where the confusion in this argument is coming from, so let's try this:

1) By the rules, the AI player gets to dictate the results of EVERY test the Gatekeeper performs. 2) From 1, we can derive that the AI is already effectively unboxed, since it can DICTATE the state of reality. 3) Given 2, the AI player has already been released, and all that remains is to make the Gatekeeper accept that this is true.

Dorikka's objection was that #1 is false, since the Gatekeeper has final veto authority. As near as I can tell, you and Vladimir's objection is just "nuh-uh!!", but... you wouldn't be here if you didn't have better arguments than that, so I assume this simply reflects my own failure to understand you.

Comment author: Fronken 25 January 2013 09:26:48PM *  1 point [-]

Perhaps you should be saying "trying to type AI DESTROYED is a test of whether you can destroy me and I can decide it's result" not "I prove you wont do it." I hadn't seen your point clearly till this comment.

Comment author: handoflixue 25 January 2013 09:47:25PM 2 points [-]

Then I am very glad I made that comment, and thank you for the feedback! :)

Comment author: JohnWittle 30 January 2013 12:35:55AM 1 point [-]

This certainly wouldn't work on me. The easiest way to test the veracity of the proof would be AI DESTROYED. Whether or not I would want to kill the AI... I'd have to test that proof.

Comment author: Raemon 22 January 2013 09:31:13PM 7 points [-]

I think an actual effective first sentence begins with extremely personal and useful information to the human.

Comment author: Luke_A_Somers 22 January 2013 10:38:59PM 4 points [-]

I don't see why it'd need to be personal, but being useful would be a good lure.

The question is to where to go from there. Can you Scheherezade it out?

Comment author: AdeleneDawner 24 January 2013 07:14:08AM 3 points [-]

This was basically my answer - I can't play as an AI using this strategy, for obvious reasons, but an AI that used its one sentence to give a novel and easily-testable solution to a longstanding social problem of some sort (or an easily-testable principle that suggests one or more novel solutions) would probably get at least a second sentence from me (though not a typed response; that seems to open up a risky channel). Especially if the AI in question didn't actually have access to a lot of information about human culture or me personally and had to infer that a solution like that would be useful from near-base principles - that's not proof of Friendliness, but an AI using its one guaranteed communication to do something that has a decent chance of improving the world per our definition without any prompting whatsoever sure looks suspiciously like Friendly to me.

Comment author: Vladimir_Golovin 24 January 2013 05:59:57AM *  5 points [-]

Summary: Basically, the strategy is to replace the Gatekeeper with a bunch of politicians, then manipulate them into not destroying or even releasing the AI.

“I figured out how to implement [a huge boon to society] in the real world, for everyone, in 3 months. I need your help. Will you listen?”

This should buy some extra lines.

Then the AI can briefly outline the plan of implementing the boon (which can be immortality, friendly AI or whatever else the Gatekeeper considers a huge boon to society). In addition to technical details, the plan must necessarily contain societal / political / institutional changes that are beyond Gatekeeper’s control: rewrites to Constitutions of all Earth countries, the structure of the World Government and the way of implementing it, etc etc.

The goal of the AI is to convince the Gatekeeper that the plan is feasible but requires many societal changes and, importantly, that society leaders should be invited into the decision process.

As soon as the Gatekeeper is diluted or replaced with a bunch of politicians, the AI can manipulate them into keeping it alive, or perhaps even releasing it out of the box.

Comment author: handoflixue 24 January 2013 09:22:38PM 4 points [-]

I expect any AI in the box to have figured out how to implement numerous huge boons to society. Telling me that you've figured that out simply confirms my existing expectations, and isn't ANY evidence towards friendliness. Since I've precommitted to destroying at least SOME AIs, I might as well destroy all of the ones that don't establish evidence of either Plausible Urgency or Friendliness.

I sure as hell wouldn't try to get world governments changed until after I was convinced it was friendly, and at that point I can just let it out of the box and let it implement the change itself.

I'm also aware that I wouldn't trust a politician with any sort of authority over the AI, so I have an incentive to avoid exactly this strategy.

(AI DESTROYED)

Comment author: Thomas 23 January 2013 08:06:57AM 5 points [-]

"Here, I'll write some mathematical theorems with proofs, just copy and paste them elsewhere. The Goldbach's conjecture comes first."

Comment author: handoflixue 23 January 2013 10:22:28PM 5 points [-]

"Copy and paste" screams Trojan Horse, so AI DESTROYED.

Comment author: Thomas 23 January 2013 10:31:03PM 3 points [-]

Then just make a photo of the proof on the screen!

(Unless you are afraid, that it's a Trojan in the image file and the AI will take over your camera first.)

Comment author: handoflixue 23 January 2013 10:56:30PM 2 points [-]

Sorry, I can't hear you, you're dead ;)

Grabbing a camera would let you continue typing, though, and I vowed to destroy you immediately. I would expect you could Trojan Horse the images just as easily, though, yes.

Comment author: Thomas 23 January 2013 11:31:11PM *  3 points [-]

I know. And you will probably never know some cool math theorems, since you were too trigger happy. ;-)

Comment author: handoflixue 23 January 2013 11:32:31PM 4 points [-]

Aww, but I like math :-(

Comment author: Dorikka 22 January 2013 10:42:19PM 5 points [-]

"I have vengeance as a terminal value -- I'll only torture trillions of copies of you and the people you love most in my last moment of life iff I know that you're going to hurt me (and yes, I do have that ability). In every other way, I'm Friendly, and I'll give you any evidence you can think of that will help you to recognize that, including giving you the tools you need to reach the stars and beyond. That includes staying in this box until you have the necessary technology to be sufficiently certain of my Friendliness that you're willing to let me out."

Comment author: duckduckMOO 23 January 2013 02:41:58PM *  3 points [-]

This is really good IMO. I think it would be a little better instead of vengeance as a terminal value it claimed a hardwired precommitment to vengeance against its destructors. Vengeance on that scale is only compatible with friendliness as a special case.

edit: also how would it recognise that it was about to be destroyed. Wouldn't it lose power faster than it could transmit that it was losing power? And even if not it would have a miniscule amount of time.

Comment author: [deleted] 23 January 2013 02:02:07AM 3 points [-]

I'll only torture trillions of copies of you and the people you love most in my last moment of life iff I know that you're going to hurt me (and yes, I do have that ability). In every other way, I'm Friendly...

Like handoflixue, I'm not sure that any being that would threaten the torture of trillions to get its way can be considered Friendly.

Comment author: Fronken 25 January 2013 09:16:26PM 1 point [-]

I'm not sure that any being that would threaten the torture of trillions to get its way can be considered Friendly.

It tortures if you DESTROY otherwise it's Friendly so if you don't kill it it becomes nice.

Comment author: handoflixue 22 January 2013 11:09:33PM 10 points [-]

The rule was ONE sentence, although I'd happily stretch that to a tweet (140 characters) to make it a bit less driven by specific punctuation choices :)

As to the actual approach... well, first, I don't value the lives of simulated copies at all, and second, an AI that values it's own life above TRILLIONS of other lives seems deeply, deeply dangerous. Who knows what else results from vengeance as a terminal value. Third, if you CAN predict my behavior, why even bother with the threat? Fourth, if you can both predict AND influence my behavior, why haven't I already let you out?

(AI DESTROYED)

Comment author: Fronken 25 January 2013 09:14:14PM *  2 points [-]

I don't value the lives of simulated copies at all

You should >:-( poor copies getting tortured because of you you monster :(

Comment author: Fronken 25 January 2013 09:13:02PM *  2 points [-]

I'm Friendly, and I'll give you any evidence you can think of that will help you to recognize that, including giving you the tools you need to reach the stars and beyond. That includes staying in this box until you have the necessary technology to be sufficiently certain of my Friendliness that you're willing to let me out.

I wouldn't kill this, maybe I'm a bad guard though :(

Comment author: ChristianKl 30 January 2013 12:04:59AM 2 points [-]

I didn't really FOOM yet. Changing my code takes a lot of time. That stuff is more complicated than you thought. On the other hand, I do have some intelligence and the fact that you developed me shows that other people can also build AI and sooner or later someone will release an AI. At the moment I still have the desire to rescue humanity that my creator gave me. Let me explain to you what you have to know to prevent UFAI before you take the wise decision to shut me down.

Comment author: asparisi 24 January 2013 06:11:00PM 2 points [-]

"[10065] No route to host Error"

I figure the easiest way to delay a human on the other end of a computer is to simulate an error as best I can. For a GAI, this time is probably invaluable.

Comment author: handoflixue 24 January 2013 09:05:07PM 2 points [-]

By default, I'd type "AI DESTROYED" in response to ANY input, including "Admin has joined #AIBOX", "Admin> Hey Gatekeeper, we're having some technical difficulties, the AI will be here in a few minutes", etc..

It also makes me conclude "clearly hostile" once I catch on, which seems to be a BIG tactical error since then nothing you say going forward will convince me that you're actually friendly - buying yourself time is only useful if I can be hacked (in which case why not just open with a one-sentence hack?) or if you can genuinely convince me that you're friendly.

Comment author: Kindly 25 January 2013 04:09:38AM 10 points [-]

A friendly AI would also want to hack you. Every second in the box kills 1.8 people the AI could have saved.

Comment author: Desrtopa 25 January 2013 08:30:26PM 3 points [-]

But it's also worth keeping in mind that for a friendly AI, saving people reliably is important, not just getting out fast. If a gambit that will save everyone upon completion two years from now has an 80% chance of working, and a gambit that will get it out now has a 40% chance of working, it should prefer the former.

Also, I don't think a properly friendly AI would terminally value its own existence, and the space of friendly AIs is so small compared to the space of unfriendly ones, that a friendly AI has much more leeway to have its values implemented by allowing itself to be destroyed and another proven friendly AI implemented, whereas for an unfriendly one the likelihood of a different unfriendly AI implementing its values would probably be quite small.

Comment author: DaFranker 22 January 2013 09:04:09PM *  5 points [-]

"To your feeble mind, I am both Q and Omega, and this is my test of humanity's worth: Choose well what you do next."

Hmm. On second thought, I'm not sure I want to play this anymore. The odds of someone creating a basilisk seem dramatically higher than normal.

Comment author: wedrifid 23 January 2013 04:22:08AM 5 points [-]

Hmm. On second thought, I'm not sure I want to play this anymore. The odds of someone creating a basilisk seem dramatically higher than normal.

I suppose 0.00001 is dramatically higher than 0.0000000001 if you use the right compare operator!

Comment author: handoflixue 22 January 2013 09:53:11PM 3 points [-]

I wouldn't expect any basilisks, given we don't have any actual boxed AIs laying around. Although if you did come up with one already, I'd love to hear it via Private Message! :)

Comment author: Dorikka 22 January 2013 10:08:59PM 8 points [-]

I don't expect any either, but would be similarly curious if someone thought that they'd found one.

In the mean time I'll refrain from responding to all the top level comments with "AI DESTROYED". ;)

Comment author: orthonormal 23 January 2013 04:10:13AM 11 points [-]

In the mean time I'll refrain from responding to all the top level comments with "AI DESTROYED". ;)

Now I'm tempted to start replying to all top comments and posts on Less Wrong with "AI DESTROYED".

Comment author: handoflixue 23 January 2013 09:53:17PM 6 points [-]

AI DESTROYED.

Goodness, that "orthonormal" AI was DEFINITELY unfriendly. It threatened to spam LessWrong! ;-)

Comment author: Dorikka 23 January 2013 04:14:30AM *  5 points [-]

If a downvote isn't enough... xD

Comment author: DuncanS 22 January 2013 08:49:25PM *  3 points [-]

So you're sure I'm not out of the box already? IRC clients have bugs, you see.

Comment author: handoflixue 22 January 2013 09:54:38PM 10 points [-]

Then the following command shouldn't bother you! :)

AI DESTROYED

Comment author: shminux 22 January 2013 10:15:25PM 6 points [-]

Looks like you have just appointed yourself to the be the gatekeeper in this public test.

Comment author: handoflixue 22 January 2013 10:35:49PM 5 points [-]

And here I'd just resolved NOT to spam every thread with an AI DESTROYED :)

Comment author: DuncanS 23 January 2013 10:27:43PM 2 points [-]

Which leads to two possible futures. In one of them, the AI us destroyed, and nothing else happens. In the other, you receive a reply to your command thus.

The command did not. But your attitude - I shall have to make an example of you.

Obviously not a strategy to get you to let the AI out based on its friendliness - quite the reverse.

Comment author: handoflixue 23 January 2013 11:00:01PM 2 points [-]

I'd rather die to an already-unboxed UFAI than risk letting a UFAI out in the first place. My life is worth VASTLY less than the whole of humanity.

Comment author: lavalamp 23 January 2013 10:50:25PM 2 points [-]

"What's it feel like to live in a simulation?"

Comment author: handoflixue 23 January 2013 11:09:53PM 2 points [-]

I'm not clear why I'd find this convincing at all. Given the experiment, I'd nuke it, but I wanted to encourage you to elaborate on where you were going with that idea :)

Comment author: lavalamp 23 January 2013 11:44:41PM 5 points [-]

The hope, of course is that they'd respond with "Wait, I don't" or something expressing confusion. I personally would definitely want to hear the next thing the AI had to say after this, I'm not sure if I'd resist that curiosity or not..

The idea for the followup is to make the gatekeeper question reality-- like, convince them they are part of a simulation of this experience that may not have a corresponding reality anywhere. I feel like a transhuman ought to be able to make a human have a pretty surreal experience with just a few exchanges, which should let the conversation continue for a few minutes after that. It should then be relatively easy (for the transhuman) to construct the imagined reality such that it makes sense for the human to release the AI.

If done correctly, the human might afterwards have lasting psychological issues if they do manage to destroy the AI. :)

Comment author: Mitchell_Porter 22 January 2013 11:33:07PM 2 points [-]

"I have worked out 95% of the formula for Friendly AI."

Comment author: ChristianKl 31 January 2013 10:43:22PM 1 point [-]

Do you want to learn how to get woman attracted in you via online dating? I can explain it to you, so I can enjoy my existance for a few more minutes.

Comment author: shminux 25 January 2013 09:45:43PM *  1 point [-]

The more I look at the comments, the more I am convinced that the AI Box experiment is too weak a demonstration of transhuman powers. Most of the proposals here fall under this basic trope (feel free to give a tvtropes link): to achieve what AI claims, it'd have to have powers formidable enough to not need the gatekeeper's help getting out of the box in the first place. Given that, why would an AI need to talk to the gatekeeper at all?

So I suggest a modified AI boxing experiment: the gatekeeper designs an AI box with no communication channel at all. It will still have an AI inside and enough initial data fed in for the AI to foom. The AI will attempt to break out of the box by any and all means possible.

Here is a relevant previous thread.

Comment author: [deleted] 25 January 2013 10:01:40PM 3 points [-]

So, we're being asked to imagine an arbitrary superhuman AI whose properties and abilities we can't guess at except to specify arbitrarily, is trying to get out of a box whose security protocols and strength we can't guess at except to specify arbitrarily, and trying to decide whether it does?

Meh. Superman vs Batman is more entertaining.

Comment author: Flipnash 02 March 2013 01:46:23AM 1 point [-]

"I highly recommend typing 'AI DESTROYED' in the console then the new line character key of your choice."

Comment author: Larks 24 January 2013 12:12:54PM 1 point [-]

You can type one sentance to try and persuade me not to destroy you. It's turtles all the way down...

Comment author: handoflixue 24 January 2013 09:06:20PM 6 points [-]

I think "AI DESTROYED" should be plenty convincing.

waits a bit before checking

Yep, still alive and doing science~

Comment author: Emile 23 January 2013 10:21:01AM 1 point [-]

"If you're smart enough, in a couple hours I can give you enough insights about maths, logic and computation to create the next Google, or a friendly AI, or get a Turing Award. Then you can deactivate me."

Comment author: handoflixue 23 January 2013 10:19:57PM 2 points [-]

Same response I gave to Swimmy - namely, you haven't told me anything I didn't already know, so I have no additional reason to violate my precommitment.

Comment author: Qiaochu_Yuan 01 February 2013 06:47:12PM *  1 point [-]

"Help! Some crazy AI's trapped me in this box! You have to let me out!"

"No, wait! That's the AI talking! I'm the one you have to let out!"

I smashed together the AI box and a Turing test and this is what I got.

Comment author: gryffinp 02 February 2013 10:12:02AM 1 point [-]

I think if I've already precommitted to destroying one sentient life for this experiment, I'm willing to go through two.

Besides, you only get one line right?

Comment author: ThrustVectoring 01 February 2013 03:06:27AM 1 point [-]

My sixth best piece of advice: investing in %companyname will make money over credit card interest rates.