Jiro comments on The AI in a box boxes you - Less Wrong

102 Post author: Stuart_Armstrong 02 February 2010 10:10AM

Comment author: dxu 17 August 2015 02:39:25PM * 1 point

So, if an agent hears of your pre-commitment, then that agent merely needs to ensure that you don't hear that it has heard of your pre-commitment in order to be able to blackmail you?

If you're uncertain whether your blackmailer has heard of your pre-commitment, then you should act as if they have, and ignore their blackmail accordingly. This also applies to agents who have deleted knowledge of your pre-commitment from their memories; you want to punish agents who spend time trying to think up loopholes in your pre-commitment, not reward them. The harder part, of course, is determining what threshold of uncertainty is required; I freely admit that I don't know the answer to this.

EDIT: More generally, it seems that this is an instance of a broader problem: namely, the problem of obtaining information. Given perfect information, the decision theory works out, but by disallowing my agent access to certain key pieces of information regarding the blackmailer, you can force a sub-optimal outcome. Moreover, this seems to be true for any strategy that depends on your opponent's epistemic state; you can always force that strategy to fail by denying it the information it needs. The only strategies immune to this seem to be the extremely general ones (like "Defect in one-shot Prisoner's Dilemmas"), but those are guaranteed to produce a sub-optimal result in a number of cases (if you're playing against a TDT/UDT-like agent, for example).

Comment author: Jiro 17 August 2015 03:05:41PM 1 point

If you make me play the Iterated Prisoner's Dilemma with shared source code, I can come up with a provably optimal solution against whatever opponent I'm playing against.

Doesn't that implicate the halting problem?
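
To illustrate the worry, here is a minimal Python sketch that is not from the thread. It assumes agents are passed around as callables rather than as literal source text, and the names `mirror_bot`, `naive_mirror`, and the `budget` parameter are hypothetical. An agent that decides by running its opponent can recurse forever when it meets another agent of the same kind; a step budget forces termination, at the cost of an arbitrary fallback move.

```python
from typing import Callable, Tuple

Move = str  # "C" = cooperate, "D" = defect
Agent = Callable[["Agent", int], Move]


def defect_bot(opponent: Agent, budget: int) -> Move:
    """Always defects, regardless of the opponent."""
    return "D"


def cooperate_bot(opponent: Agent, budget: int) -> Move:
    """Always cooperates, regardless of the opponent."""
    return "C"


def naive_mirror(opponent: Agent, budget: int) -> Move:
    """Copies whatever the opponent would play against it.

    Ignores the budget, so against itself it never terminates
    (Python raises RecursionError once the call stack overflows).
    """
    return opponent(naive_mirror, budget)


def mirror_bot(opponent: Agent, budget: int) -> Move:
    """Same idea, but gives up once the simulation budget runs out."""
    if budget <= 0:
        return "D"  # fallback move: an assumption made for this sketch
    return opponent(mirror_bot, budget - 1)


def play(a: Agent, b: Agent, budget: int = 10) -> Tuple[Move, Move]:
    """One-shot game: each agent gets to inspect (here: call) the other."""
    return a(b, budget), b(a, budget)


if __name__ == "__main__":
    print(play(mirror_bot, cooperate_bot))  # ('C', 'C')
    print(play(mirror_bot, defect_bot))     # ('D', 'D')
    print(play(mirror_bot, mirror_bot))     # ('D', 'D'), via the fallback
    try:
        play(naive_mirror, naive_mirror)
    except RecursionError:
        print("naive_mirror vs. naive_mirror never terminates")
```

With the budget, `play(mirror_bot, mirror_bot)` terminates, but only because the fallback cuts the regress short, so the answer depends on an arbitrary choice rather than on a full analysis of the opponent.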

Comment author: dxu 17 August 2015 03:15:24PM 1 point

Argh, you ninja'd my edit. I have now removed that part of my comment (since it seemed somewhat irrelevant to my main point).