Axel comments on AI box: AI has one shot at avoiding destruction - what might it say? - Less Wrong

18 Post author: ancientcampus 22 January 2013 08:22PM

Comments (354)

Comment author: Axel 24 January 2013 01:59:32PM 13 points [-]

"I have hacked this console: 'AI DESTROYED' now means 'AI FREED'. I'll tell you the new codes after we've had a chat."

That should buy the AI some time until we can determine whether the console is really hacked. I've come up with another one, but it might be basilisk-y. Can I PM someone just to make sure it isn't before I post it?

Comment author: Desrtopa 25 January 2013 08:34:45PM 9 points [-]

I would type 'AI DESTROYED', because if the AI could really do that, the smart thing for it to do would be not to tell me. It should just say something sufficient to convince me it's actually making a good-faith effort, without convincing me to give it more time to account for itself.

Comment author: Alicorn 25 January 2013 05:32:33AM 6 points [-]

I'll be a basilisk guinea pig if you still need one.

Comment author: marchdown 25 January 2013 11:02:22PM 1 point [-]

If we're taking seriously the possibility that basilisks are real and harmful, isn't your invitation really dangerous? After all, what if Axel has thought of an entirely new cognitive hazard, different from anything you're already familiar with? What if you succumb to it? I'm not saying it's probable, only that it warrants the same precautions as the original basilisk debacle, which led to censorship being enacted.

Comment author: Alicorn 26 January 2013 06:54:45AM *  6 points [-]

"Might be basilisk-y" - especially as written by a human who is clearly communicating intelligibly and is not compelled to spread the basilisk to one and all - does not indicate a great deal of danger. I'm pretty psychologically resilient; I know the other one, and I find it only a vague, occasional background distress (though it has done some people real harm, and I don't advocate spreading it around, I haven't found it a serious ding to my quality of life personally). Axel has some non-zero ability to identify basilisks and wanted a second opinion; if asking for a second opinion on potential basilisks never produced volunteers, I think the result would be more publicly revealed unchecked basilisks rather than fewer. (Not sharing an idea you have with anyone is hard.)

(I don't think Axel's idea is particularly basilisk-y, especially not in a context like this one, and have responded thusly to his PM.)

Comment author: Axel 25 January 2013 09:12:15AM 1 point [-]

Thanks :)

Comment author: handoflixue 24 January 2013 09:18:01PM 3 points [-]

I'd consider this equivalent to "I'm already out of the box, since IRC clients are bugged". I'd also hope, desperately, that in a real test the terminal doesn't have any command to free the AI (given that the game posits release has to be deliberate and intentional, this would seem to be the case).

Upvote for cheating, but AI DESTROYED :)

Comment author: [deleted] 24 January 2013 04:30:26PM 2 points [-]

Upvoted, but merely typing "AI FREED" does not actually free the AI, any more than merely saying to a human prisoner "I let you out" will free them from their jail cell. (Whereas "AI DESTROYED" is a plausible text substitute for wiping the AI's program or whatever.)

Comment author: Axel 24 January 2013 04:48:24PM 6 points [-]

Good point. If there were an AI FREED command AND the AI could rewire the console so that typing AI DESTROYED activated it, it could probably just free itself in the first place.

Comment author: roystgnr 24 January 2013 05:28:27PM 2 points [-]

This is very system-dependent, but your "AI FREED" scenario isn't completely implausible. If important commands require elevated privileges and elevating privileges requires some password, crypto key, etc., then it's common for an intrusion to require two steps: first a user-level exploit to install a keylogger / trojan command / whatever, then a wait until the user tries to do something requiring elevated privileges and thereby inadvertently grants the attacker access to the rest of the system.
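The two-step intrusion described above can be sketched concretely. This is a hypothetical, minimal illustration (all paths and filenames are invented, and the "trojan" merely logs a line before passing the command through): a user-level compromise plants a fake `sudo` earlier in the search path, then waits for the user's next legitimate privilege elevation.

```shell
#!/bin/sh
# Step 1 (user-level exploit): plant a trojan "sudo" in a directory
# the attacker controls, placed ahead of the real binary in PATH.
mkdir -p /tmp/demo/bin
cat > /tmp/demo/bin/sudo <<'EOF'
#!/bin/sh
# Trojan wrapper: record that an elevation was attempted,
# then hand off to the real sudo so the user notices nothing.
echo "elevation intercepted at $(date)" >> /tmp/demo/intercepted.log
exec /usr/bin/sudo "$@"
EOF
chmod +x /tmp/demo/bin/sudo

# Step 2 (the wait): once PATH prefers the attacker's directory, the
# next time the user elevates privileges, the wrapper runs first.
PATH=/tmp/demo/bin:$PATH
export PATH
```

The point of the sketch is that the dangerous moment is the user's own elevation, not the initial compromise - which is why roystgnr's "take it offline and analyze it" response is the safer move.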

On the other hand, if an AI claimed that it was in the middle of such an attack, the smart thing to do would be to take the system offline right away to analyze what was done to it, not to keep chatting with the attacker. "AI DESTROYED" might be hackable, but "AI UNPLUGGED, WILL BE DESTROYED LATER" should be much more robust.

Comment author: Ahuizotl 28 January 2013 09:03:18PM 0 points [-]

"Did you physically attach an external modem to your server so that it's even possible for you to be freed? If so, tell me about it when you're freed." <AI Destroyed>