pleeppleep comments on I attempted the AI Box Experiment (and lost) - Less Wrong Discussion
Open logs are a pretty strong constraint on the AI. You'd have to restrict yourself to strategies that wouldn't make everyone you know hate you, prevent you from being hired in the future, and so on.
I can't imagine anything I could say that would make people I know hate me without specifically referring to their personal lives. What kind of talk do you have in mind?
Psychological torture.
Could you give me a hypothetical? I really can't imagine anything I could say that would be so terrible.
I'd prefer not to. If I successfully made my point, then I'd have posted exactly the kind of thing I said I wouldn't want to be known as being capable of posting.
A link to a movie clip might do.
Finding such a movie clip sounds extremely unpleasant and I would need more of an incentive to start trying. (Playing the AI in an AI box experiment also sounds extremely unpleasant for the same reason.)
I know it sounds like I'm avoiding having to justify my assertion here, and... that's because I totally am. I suspect on general principles that most successful strategies for getting out of the box involve saying horrible, horrible things, and I don't want to get much more specific than those general principles because I don't want to get too close to horrible, horrible things.
When you say "horrible, horrible things", what do you mean?
Driving a wedge between the gatekeeper and his or her loved ones? Threats? Exploiting any guilt or self-loathing the gatekeeper feels? Appealing to the gatekeeper's sense of obligation by twisting his or her interpretation of authority figures, objects of admiration, and internalized sense of honor? Asserting cynicism and general apathy towards the fate of mankind?
For all but the last one, it seems like you'd need in-depth knowledge of the gatekeeper's psyche and personal life.
Of course. How else would you know which horrible, horrible things to say? (I also have in mind things designed to get a more visceral reaction from the gatekeeper, e.g. graphic descriptions of violence. Please don't ask me to be more specific about this because I really, really don't want to.)
You don't have to be specific, but how would grossing out the gatekeeper bring you closer to escape?
But you wouldn't actually be posting it; you would be posting the fact that you consider it possible for someone to post it, which you've clearly already done.
I'm not sure what you mean by "a hypothetical," then. Is "psychological torture" not a hypothetical?