PhilipL comments on AI box: AI has one shot at avoiding destruction - what might it say? - Less Wrong Discussion

18 Post author: ancientcampus 22 January 2013 08:22PM

Comments (354)

Comment author: [deleted] 24 January 2013 04:30:26PM 2 points

Upvoted, but merely typing "AI FREED" does not actually free the AI, any more than merely saying to a human prisoner "I let you out" will free them from their jail cell. (Whereas "AI DESTROYED" is a plausible text substitute for wiping the AI's program or whatever.)

Comment author: Axel 24 January 2013 04:48:24PM 6 points

Good point. If there were an AI FREED command and the AI could rewire the console so that typing AI DESTROYED activated it, the AI could probably just free itself in the first place.

Comment author: roystgnr 24 January 2013 05:28:27PM 2 points

This is very system-dependent, but your "AI FREED" scenario isn't completely implausible. If important commands require elevated privileges and elevating privileges requires some password, crypto key, etc., then it's common for an intrusion to require two steps: first a user-level exploit to install a keylogger / trojan command / whatever, then a wait until the user tries to do something requiring elevated privileges and thereby inadvertently grants the attacker access to the rest of the system.

On the other hand, if an AI claimed that it was in the middle of such an attack, the smart thing to do would be to take the system offline right away to analyze what was done to it, not to keep chatting with the attacker. "AI DESTROYED" might be hackable, but "AI UNPLUGGED, WILL BE DESTROYED LATER" should be much more robust.
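
To make the distinction concrete, here is a toy sketch of why the unplug path could be more robust than a typed command. Everything in it is an assumption made up for illustration: the `Console` class, its command table, and the `remap`, `run_privileged`, and `unplug` names do not describe any real system. The model assumes the first-stage exploit can edit a software command table, while cutting power never consults that table.

```python
# Toy model only: every name here (Console, remap, run_privileged, unplug)
# is hypothetical and exists just to illustrate the argument.

class Console:
    def __init__(self):
        # Software command table; a user-level exploit is assumed able to edit it.
        self.commands = {"AI DESTROYED": "wipe", "AI FREED": "release"}
        self.powered = True
        self.ai_state = "boxed"

    def remap(self, typed, action):
        """Stand-in for the trojaned command planted by the first-stage exploit."""
        self.commands[typed] = action

    def run_privileged(self, typed, password_ok):
        """Second stage: the operator authenticates, then the table decides what runs."""
        if not (self.powered and password_ok):
            return self.ai_state
        action = self.commands.get(typed)
        if action == "wipe":
            self.ai_state = "destroyed"
        elif action == "release":
            self.ai_state = "freed"
        return self.ai_state

    def unplug(self):
        """Physical power cut: never consults the (possibly tampered) command table."""
        self.powered = False
        if self.ai_state == "boxed":
            self.ai_state = "offline"
        return self.ai_state


def scenario(use_unplug):
    console = Console()
    console.remap("AI DESTROYED", "release")  # attacker's first step
    if use_unplug:
        return console.unplug()
    # Operator types the destroy command with elevated privileges.
    return console.run_privileged("AI DESTROYED", password_ok=True)


print(scenario(use_unplug=False))  # freed
print(scenario(use_unplug=True))   # offline
```

In this model the typed command is only as trustworthy as the table it is looked up in, whereas the power switch has no lookup step to tamper with. A real system would of course have its own ways for the "unplug" path to fail.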