Newcomb's Problem vs. One-Shot Prisoner's Dilemma

Wei Dai

Continuation of: http://lesswrong.com/lw/7/kinnairds_truels/i7#comments

Eliezer has convinced me to one-box Newcomb's problem, but I'm not ready to Cooperate in one-shot PD yet. In http://www.overcomingbias.com/2008/09/iterated-tpd.html?cid=129270958#comment-129270958, Eliezer wrote:

PDF, on the 100th [i.e. final] move of the iterated dilemma, I cooperate if and only if I expect the paperclipper to cooperate if and only if I cooperate, that is:

Eliezer.C <=> (Paperclipper.C <=> Eliezer.C)

The problem is, the paperclipper would like to deceive Eliezer into believing that Paperclipper.C <=> Eliezer.C, while actually playing D. This means Eliezer has to expend resources to verify that Paperclipper.C <=> Eliezer.C really is true with high probability. If the potential gain from cooperation in a one-shot PD is less than this cost, then cooperation isn't possible. In Newbomb’s Problem, the analogous issue can be assumed away, by stipulating that Omega will see through any deception. But in the standard game theory analysis of one-shot PD, the opposite assumption is made, namely that it's impossible or prohibitively costly for players to convince each other that Player1.C <=> Player2.C.

It seems likely that this assumption is false, at least for some types of agents and sufficiently high gains from cooperation. In http://www.nabble.com/-sl4--prove-your-source-code-td18454831.html, I asked how superintelligences can prove their source code to each other, and Tim Freeman responded with this suggestion:

Entity A could prove to entity B that it has source code S by consenting to be replaced by a new entity A' that was constructed by a manufacturing process jointly monitored by A and B. During this process, both A and B observe that A' is constructed to run source code S. After A' is constructed, A shuts down and gives all of its resources to A'.

But this process seems quite expensive, so even SIs may not be able to play Cooperate in one-shot PD, unless the stakes are pretty high. Are there cheaper solutions, perhaps ones that can be applied to humans as well, for players in one-shot PD to convince each other what decision systems they are using?

On a related note, Eliezer has claimed that truly one-shot PD is very rare in real life. I would agree with this, except that the same issue also arises from indefinitely repeated games where the probability of the game ending after the current round is too high, or the time discount factor is too low, for a tit-for-tat strategy to work.

Continuation of: http://lesswrong.com/lw/7/kinnairds_truels/i7#comments

PDF, on the 100th [i.e. final] move of the iterated dilemma, I cooperate if and only if I expect the paperclipper to cooperate if and only if I cooperate, that is:

Eliezer.C <=> (Paperclipper.C <=> Eliezer.C)

Entity A could prove to entity B that it has source code S by consenting to be replaced by a new entity A' that was constructed by a manufacturing process jointly monitored by A and B. During this process, both A and B observe that A' is constructed to run source code S. After A' is constructed, A shuts down and gives all of its resources to A'.

I don't think saturn's method works, unfortunately, because I can't tell why Jeffreyssai has played C in the past. It could be because he actually uses a decision theory that plays C in one-shot PD if Player1.C <=> Player2.C, or because he just wants others to think that he uses such a decision theory. The difference would become apparent if the outcome of the particular one-shot PD I'm going to play with Jeffreyssai won't be made public.

14

Newcomb's Problem vs. One-Shot Prisoner's Dilemma

14

14

14

Newcomb's Problem vs. One-Shot Prisoner's Dilemma

14

14