Continuation of: http://lesswrong.com/lw/7/kinnairds_truels/i7#comments
Eliezer has convinced me to one-box in Newcomb's problem, but I'm not ready to Cooperate in the one-shot Prisoner's Dilemma (PD) yet. In http://www.overcomingbias.com/2008/09/iterated-tpd.html?cid=129270958#comment-129270958, Eliezer wrote:
PDF, on the 100th [i.e. final] move of the iterated dilemma, I cooperate if and only if I expect the paperclipper to cooperate if and only if I cooperate, that is:
Eliezer.C <=> (Paperclipper.C <=> Eliezer.C)
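Read as a plain material biconditional, this condition can be checked by brute-force enumeration; a small sketch (the `iff` helper and variable names are mine):

```python
from itertools import product

def iff(a, b):
    """Material biconditional: True exactly when a and b agree."""
    return a == b

# Enumerate all truth assignments for Eliezer.C and Paperclipper.C and
# report which ones satisfy Eliezer.C <=> (Paperclipper.C <=> Eliezer.C).
for e_c, p_c in product([True, False], repeat=2):
    holds = iff(e_c, iff(p_c, e_c))
    print(f"Eliezer.C={e_c!s:5} Paperclipper.C={p_c!s:5} condition={holds}")
```

Note that under the material reading the condition holds exactly when Paperclipper.C is true, regardless of Eliezer's move, so the intended reading here is presumably counterfactual ("would cooperate if") rather than material.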
The problem is that the paperclipper would like to deceive Eliezer into believing that Paperclipper.C <=> Eliezer.C while actually playing D. This means Eliezer has to expend resources to verify, with high probability, that Paperclipper.C <=> Eliezer.C really holds. If the potential gain from cooperation in a one-shot PD is less than this verification cost, then cooperation isn't possible. In Newcomb's Problem, the analogous issue can be assumed away by stipulating that Omega will see through any deception. But the standard game-theoretic analysis of the one-shot PD makes the opposite assumption: that it's impossible, or prohibitively costly, for players to convince each other that Player1.C <=> Player2.C.
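The cost comparison here can be made concrete: verified cooperation pays only if the gain of mutual cooperation over mutual defection exceeds the verification cost. A minimal sketch, with illustrative payoff numbers that are not from the post:

```python
def cooperation_worthwhile(reward, punishment, verification_cost):
    """True if mutual cooperation (payoff `reward`), net of the cost of
    verifying that the other player's decision is linked to yours,
    still beats mutual defection (payoff `punishment`)."""
    return reward - verification_cost > punishment

# Example with standard PD payoffs: R=3 for mutual C, P=1 for mutual D.
print(cooperation_worthwhile(reward=3, punishment=1, verification_cost=1))    # True
print(cooperation_worthwhile(reward=3, punishment=1, verification_cost=2.5))  # False
```

So even when both sides could in principle verify each other, the stakes have to be large relative to the verification cost before Cooperate becomes the rational play.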
It seems likely that this assumption is false, at least for some types of agents and sufficiently high gains from cooperation. In http://www.nabble.com/-sl4--prove-your-source-code-td18454831.html, I asked how superintelligences can prove their source code to each other, and Tim Freeman responded with this suggestion:
Entity A could prove to entity B that it has source code S by consenting to be replaced by a new entity A' that was constructed by a manufacturing process jointly monitored by A and B. During this process, both A and B observe that A' is constructed to run source code S. After A' is constructed, A shuts down and gives all of its resources to A'.
But this process seems quite expensive, so even superintelligences may not be able to play Cooperate in a one-shot PD unless the stakes are pretty high. Are there cheaper solutions, perhaps ones that can be applied to humans as well, for players in a one-shot PD to convince each other what decision systems they are using?
On a related note, Eliezer has claimed that truly one-shot PD is very rare in real life. I would agree with this, except that the same issue also arises from indefinitely repeated games where the probability of the game ending after the current round is too high, or the time discount factor is too low, for a tit-for-tat strategy to work.
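The condition gestured at here is the standard folk-theorem calculation: with continuation probability delta per round, trigger-strategy cooperation stops paying once delta falls below a threshold set by the payoffs. A sketch under the conventional payoff names (T > R > P, with T the temptation, R the reward, P the punishment):

```python
def cooperation_sustainable(T, R, P, delta):
    """Grim-trigger check: cooperating forever (R each round, discounted by
    delta) beats a one-time defection (T now, then P forever) iff
    R/(1-delta) >= T + delta*P/(1-delta), i.e. delta >= (T-R)/(T-P)."""
    return delta >= (T - R) / (T - P)

# Standard payoffs T=5, R=3, P=1 give a threshold of (5-3)/(5-1) = 0.5.
print(cooperation_sustainable(T=5, R=3, P=1, delta=0.6))  # True
print(cooperation_sustainable(T=5, R=3, P=1, delta=0.3))  # False
```

A repeated game with delta below this threshold is, for practical purposes, the one-shot case the post is worried about.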
Let's say I'm about to engage in a one-shot PD with Jeffreyssai. I know that Jeffreyssai has participated in one-shot PDs with a variety of other agents, including humans, cats and dogs, paperclip maximizers from other universes, and Babyeaters, and that the outcomes have been:
In this scenario it seems obvious that I should cooperate with Jeffreyssai. Real-life situations aren't likely to be this clear-cut, but similar reasoning can still apply.
Edit: To make my point more explicit, all we need is common knowledge that we are both consistent one-boxers, with enough confidence to shift the expected utility of cooperation above that of defection. We don't need to exchange source code; we can use any credible signal that we one-box, such as pointing to evidence of past one-boxing.
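The required confidence can be computed in a minimal model: suppose a consistent one-boxer mirrors my move, while anyone else simply defects. The payoff names and numbers below are illustrative assumptions, not from the post:

```python
def should_cooperate(p_oneboxer, R=3, S=0, P=1):
    """Cooperate iff EU(C) > EU(D), in a minimal model where a consistent
    one-boxer mirrors my move and anyone else defects no matter what.
    EU(C) = p*R + (1-p)*S; EU(D) = P (the opponent plays D either way)."""
    return p_oneboxer * R + (1 - p_oneboxer) * S > P

# With R=3, S=0, P=1 the threshold is p > (P-S)/(R-S) = 1/3.
print(should_cooperate(0.5))  # True
print(should_cooperate(0.2))  # False
```

So the historical evidence doesn't need to make me certain Jeffreyssai one-boxes; it only needs to push my credence past a threshold fixed by the payoffs.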
Assuming agents participate in a sufficient number of public one-shot PDs is essentially the same as playing iterated PD. If true one-shot PDs are rare, it's doubtful there'd be enough historical evidence about your opponent to be certain of anything.