
shminux comments on xkcd on the AI box experiment - Less Wrong Discussion

15 Post author: FiftyTwo 21 November 2014 08:26AM



Comment author: shminux 21 November 2014 09:22:38PM 3 points

A newbie question.

From one of Eliezer's replies:

As I presently understand the situation, there is literally nobody on Earth, including me, who has the knowledge needed to set themselves up to be blackmailed if they were deliberately trying to make that happen. Any potentially blackmailing AI would much prefer to have you believe that it is blackmailing you, without actually expending resources on following through with the blackmail, insofar as they think they can exert any control on you at all via an exotic decision theory. Just like in the oneshot Prisoner's Dilemma the "ideal" outcome is for the other player to believe you are modeling them and will cooperate if and only if they cooperate, and so they cooperate, but then actually you just defect anyway. For the other player to be confident this will not happen in the Prisoner's Dilemma, for them to expect you not to sneakily defect anyway, they must have some very strong knowledge about you.

Would this be a fair summary of why the Basilisk does not work: "We don't know of a way to detect a bluff by a smarter agent; therefore the agent would prefer bluffing (easy) over true blackmail (hard); knowing this, we would always call the bluff, and therefore the agent would not even try"?
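(As an aside, the one-shot Prisoner's Dilemma logic Eliezer appeals to can be sketched numerically. The payoff values below are the standard illustrative ones, not taken from the quote; the point is just that, absent strong knowledge of the other player, defection dominates.)

```python
# One-shot Prisoner's Dilemma, row player's payoffs (standard illustrative values).
# Defecting scores strictly higher no matter what the opponent does, which is why
# the other player needs very strong knowledge about you to trust a promise to cooperate.
PAYOFF = {
    ("cooperate", "cooperate"): 3,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 5,
    ("defect", "defect"): 1,
}

def best_response(opponent_move):
    """Return the move maximizing our payoff against a fixed opponent move."""
    return max(["cooperate", "defect"], key=lambda m: PAYOFF[(m, opponent_move)])

# Defection is the best response to either opponent move:
assert best_response("cooperate") == "defect"
assert best_response("defect") == "defect"
```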

Further on:

I have written the above with some reluctance, because even if I don't yet see a way to repair this obstacle myself, somebody else might see how to repair it now that I've said what it is.

Wouldn't a trivial "way to repair this obstacle" be for the agent to appear stupid enough to be credible? Or has this already been taken into account in the original quote?

Comment author: Vaniver 21 November 2014 10:33:04PM 0 points

Wouldn't a trivial "way to repair this obstacle" be for the agent to appear stupid enough to be credible?

What do you mean by 'appear' here? I know how to observe a real agent and think "hmm, this person will punish me without reflectively considering whether or not punishing me advances their interests," but I don't know how to get that impression about a hypothetical agent.

Comment author: shminux 21 November 2014 10:39:30PM 0 points

I don't understand your distinction between real and hypothetical here. Your first sentence was about a hypothetical "real" agent, right? What is the hypothetical "hypothetical" agent you describe in the second part?

Comment author: Vaniver 22 November 2014 01:02:00AM 0 points

I don't understand your distinction between real and hypothetical here.

Basically, my understanding of acausal trades is "ancestor does X because of expectation that it will make descendant do Y, descendant realizes the situation and decides to do Y because otherwise they wouldn't have been made, even though there's no direct causal effect."

If you exist simultaneously with another agent (the 'real agent' from the grandparent), you can sense how they behave, and they can trick you by manipulating what you sense. (The person might reflectively consider whether or not to punish you and decide that the causal link to their reputation is justification enough, even though there's no causal link to the actions you took, while trying to seem unthinking so that you will expect them to always follow through.)

If you're considering hypothetical descendants (the 'hypothetical agent' from the grandparent), though, it's not clear to me how to reason about their appearance to you now, and in particular any attempts they make to 'appear' stupid. But now that I think about it more, I think I was putting too much intentionality into 'appear': hypothetical agent A can't decide how I reason about it, but I can reason about it incorrectly or incompletely, and thus it appears to be something it isn't.

Comment author: shminux 22 November 2014 02:05:47AM 0 points

As far as I understand Eliezer's point, the "acausal" part is irrelevant; the same issue of trusting that another agent really means what it says and will not change its mind later comes up anyway. I could easily be wrong, though.