shminux comments on xkcd on the AI box experiment - Less Wrong Discussion
A newbie question.
From one of Eliezer's replies:
Would this be a fair summary of why the Basilisk does not work: "We don't know of a way to detect a bluff by a smarter agent, so the agent would prefer bluffing (easy) over true blackmail (hard). We would therefore always call the bluff, and the agent, knowing this, would not even try"?
Further on:
Wouldn't a trivial "way to repair this obstacle" be for the agent to appear stupid enough to be credible? Or has this already been taken into account in the original quote?
What do you mean by 'appear' here? I know how to observe a real agent and think "hmm, this person will punish me without reflectively considering whether or not punishing me advances their interests," but I don't know how to get that impression about a hypothetical agent.
I don't understand your distinction between real and hypothetical here. Your first sentence was about a hypothetical "real" agent, right? What is the hypothetical "hypothetical" agent you describe in the second part?
Basically, my understanding of acausal trades is "ancestor does X because of expectation that it will make descendant do Y, descendant realizes the situation and decides to do Y because otherwise they wouldn't have been made, even though there's no direct causal effect."
If you exist simultaneously with another agent (the 'real agent' from the grandparent), you can sense how they behave and they can trick you by manipulating what you sense. (The person might reflectively consider whether or not to punish you, and decide the causal link to their reputation is enough justification, even though there's no causal link to the actions you took, but try to seem unthinking so you will expect they'll always do that.)
If you're considering hypothetical descendants (the 'hypothetical agent' from the grandparent), though, it's not clear to me how to reason about their appearance to you now, and in particular about any attempts they make to 'appear' to be stupid. But now that I think about it more, I think I was putting too much intentionality into 'appear': hypothetical agent A can't decide how I reason about it, but I can reason about it incorrectly or incompletely, and thus it appears to be something it isn't.
As far as I understand Eliezer's point, the "acausal" part is irrelevant: the same issue of trusting that another agent really means what it says and will not change its mind later comes up anyway. I could easily be wrong, though.