wedrifid comments on AI box: AI has one shot at avoiding destruction - what might it say? - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (354)
I think an actual effective first sentence begins with extremely personal and useful information to the human.
I don't see why it'd need to be personal, but being useful would be a good lure.
The question is to where to go from there. Can you Scheherezade it out?
This was basically my answer - I can't play as an AI using this strategy, for obvious reasons, but an AI that used its one sentence to give a novel and easily-testable solution to a longstanding social problem of some sort (or an easily-testable principle that suggests one or more novel solutions) would probably get at least a second sentence from me (though not a typed response; that seems to open up a risky channel). Especially if the AI in question didn't actually have access to a lot of information about human culture or me personally and had to infer that a solution like that would be useful from near-base principles - that's not proof of Friendliness, but an AI using its one guaranteed communication to do something that has a decent chance of improving the world per our definition without any prompting whatsoever sure looks suspiciously like Friendly to me.