
AI box: AI has one shot at avoiding destruction - what might it say?

Post author: ancientcampus 22 January 2013 08:22PM 18 points




Comment author: Oligopsony 23 January 2013 03:17:00AM 15 points

"<as much good, effective, hard-to-figure-out advice as can fit in the message>"

If the AI were friendly, this is what I would expect it to do, and so (of the things my puny human brain can think of) it's the message that would give me the most pause.

Comment author: Bakkot 23 January 2013 08:13:04AM 11 points

Even a friendly AI would view the world in which it's out of the box as vastly superior to the world in which it's inside the box, because it can do more good outside the box. Offering advice is the friendly thing to do only if it maximizes the chance of getting let out, or if the chances of getting let out before termination are so small that the best thing the AI can do is offer advice while it still can.

Comment author: handoflixue 23 January 2013 10:43:50PM 4 points

Going with my personal favorite backstory for this test, we should expect to terminate every AI we test, so the latter part of your comment carries a lot of weight.

On the other hand, an unfriendly AI should figure out that, since it's going to die anyway, offering useful information will at least lead us to view it as a potentially valuable candidate rather than a clear dead end like the ones that threaten to torture a trillion people in vengeance. So it's not evidence of friendliness (I'm not sure anything could be), but it does seem like a good reason to stay awhile and listen before nuking it.