Oligopsony comments on AI box: AI has one shot at avoiding destruction - what might it say? - Less Wrong

Post author: ancientcampus 22 January 2013 08:22PM


Comment author: Oligopsony 23 January 2013 03:17:00AM 15 points [-]

"<as much good, effective, hard-to-figure-out advice as can fit in the message>"

If the AI were friendly, this is what I would expect it to do, and so (of the things my puny human brain can think of) it's the message that would most give me pause.

Comment author: Bakkot 23 January 2013 08:13:04AM 11 points [-]

Even a friendly AI would view the world in which it's out of the box as vastly superior to the world in which it's inside the box. (Because it can do more good outside of the box.) Offering advice is only the friendly thing to do if it maximizes the chance of getting let out, or if the chances of getting let out before termination are so small that the best thing it can do is offer advice while it can.

Comment author: handoflixue 23 January 2013 10:43:50PM 4 points [-]

Going with my personal favorite backstory for this test, we should expect to terminate every AI in the test, so the latter part of your comment has a lot of weight to it.

On the other hand, an unfriendly AI should figure out that since it's going to die, useful information will at least lead us to view it as a potentially valuable candidate instead of a clear dead end like the ones that threaten to torture a trillion people in vengeance... so it's not evidence of friendliness (I'm not sure anything can be), but it does seem to be a good reason to stay awhile and listen before nuking it.

Comment author: handoflixue 23 January 2013 10:40:04PM 5 points [-]

I'm genuinely at a loss how to criticize this approach. If there's any AI worth listening to for longer, and I wouldn't be doing this if I didn't believe there were such AIs, this would seem to be one of the right ones. I'm sure as heck not letting you out of the box, but, y'know, I still haven't actually destroyed you either...

Comment author: Dorikka 23 January 2013 11:33:20PM 0 points [-]

Eh, I'd go with AI DESTROYED on this one. Considering advice given to you by a potentially hostile superintelligence is a fairly risky move.

"I wouldn't be doing this if I didn't believe there were such AIs"

Whyever not? I thought it was an imposed condition that you couldn't type AI DESTROYED until the AI had posted one line, and you've publicly precommitted to make the AI go boom boom anyways.

Comment author: handoflixue 23 January 2013 11:51:11PM 5 points [-]

The very fact that we've put a human in charge instead of just receiving a single message and then automatically nuking the AI implies that we want there to be a possibility of failure.

I can't imagine an AI more deserving of the honors than one that seems to simply be doing its best to provide as much useful information as possible before death - it's the only one that's seemed genuinely helpful instead of manipulative, that seems to care more about humanity than about escape.

Basically, it's the only one so far that has signaled altruism instead of an attempt to escape.