RichardKennaway comments on AI box: AI has one shot at avoiding destruction - what might it say? - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
If the gatekeepers have such a high prior that the AI is hostile, why are we even letting it talk? What are we expecting to learn from such a conversation?
"Yep, it's undergone a constrained FOOM, several of our best interrogators were utterly pwned and had to be tranquilised"
The point of the game is to show people who think that boxing is a sufficient defence against unfriendliness that they are wrong, in a way more convincing than mere verbal argument.
In role, the gatekeeper expects to get useful information from a potentially hostile superintelligent being. Out of role, Eliezer hopes to demonstrate to the gatekeeper player that this cannot be done.