Basically: EDT/UDT has simple arguments in its favor and seems to perform well. There don't seem to be any serious arguments in favor of CDT, and the human intuition in its favor seems quite debunkable. So it seems like the burden of proof is on CDT to justify why it isn't crazy. It may be that CDT has met that burden, but I'm not aware of it.
A. The dominance arguments in favor of two-boxing seem quite weak. They tend to apply verbatim to playing a prisoner's dilemma against a mirror (if the mirror cooperates you'd prefer to defect; if the mirror defects you'd prefer to defect; so regardless of the state of nature you'd prefer to defect). So why do you reject the dominance argument against a mirror, but accept it in Newcomb-like problems? To distinguish the cases, it seems you need to build an assumption of no causal connection, or a special role for time, into your argument.
This begs the question terribly: why is a causal connection privileged? Why is the role of time privileged? As far as I can tell these two things are pretty arbitrary and unimportant. I'm not aware of any strong philosophical arguments for CDT besides "it seems intuitively sensible to a human", and see below for the debunking of those intuitions. (Again, maybe there are better arguments here, but I've never encountered one. Basically, I'm looking for any statement of a dominance principle over states of nature that doesn't look completely arbitrary and is at all plausible.)
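To make the mirror case concrete, here's a minimal sketch in Python (the payoff numbers are just conventional prisoner's-dilemma values, and the function names are mine, chosen for illustration) of how the dominance argument and the restriction to consistent outcomes come apart:

```python
# Row player's payoffs in a standard (hypothetical) prisoner's dilemma.
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # you cooperate, opponent defects
    ("D", "C"): 5,  # you defect, opponent cooperates
    ("D", "D"): 1,  # mutual defection
}

def dominance_choice():
    """CDT-style dominance: for each fixed opponent action, defecting
    pays more, so defect 'regardless of the state of nature'."""
    for opp in ("C", "D"):
        assert PAYOFF[("D", opp)] > PAYOFF[("C", opp)]
    return "D"

def mirror_choice():
    """Against a mirror, only (C, C) and (D, D) are consistent with
    what we know: the opponent's action equals ours. Choosing among
    the *possible* outcomes favors cooperation."""
    return max(("C", "D"), key=lambda a: PAYOFF[(a, a)])

print(dominance_choice())  # "D", which leads to the payoff-1 outcome
print(mirror_choice())     # "C", the payoff-3 outcome
```

The dominance step quantifies over opponent actions that the mirror setup rules out; that is exactly the move the Newcomb two-boxer makes over the predictor's possible fillings of the box.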
B. A sophisticated interpretation of EDT (called UDT around here) seems to perform well in all cases we've considered, in the sense that an agent making good decisions will achieve good outcomes. I think this is strong evidence in favor of a theory which purports to say which actions are good, since good decisions ought to lead to good outcomes; I agree it's not a knock-down argument, but again I know of no serious counterarguments.
C. It seems that EDT is supported by the simplest philosophical arguments. We need to choose between the outcome in which we make decision A and the outcome in which we make decision B, and it makes sense to choose only among outcomes we consider possible. CDT doesn't do this: it considers outcomes which are inconsistent with our knowledge of the situation. This isn't enough to pin down EDT uniquely (though further arguments can), but it does seem like a strong point in favor of EDT over CDT.
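To illustrate point C with toy numbers (the predictor accuracy p = 0.99 and the $1M / $1k prizes are just conventional Newcomb values, chosen for illustration), here is a minimal sketch of the two calculations:

```python
# Toy Newcomb calculation with a predictor of accuracy p and the usual
# (hypothetical) prizes: $1M in the opaque box, $1k in the clear one.
BIG, SMALL = 1_000_000, 1_000

def edt_value(action, p=0.99):
    """EDT conditions on the action, treating it as evidence about
    what the predictor foresaw."""
    if action == "one-box":
        return p * BIG            # predictor likely foresaw one-boxing
    return SMALL + (1 - p) * BIG  # two-boxing is evidence the box is empty

def cdt_value(action, prior_full=0.5):
    """CDT holds the box contents fixed at some prior probability,
    since the action cannot causally affect them."""
    expected_box = prior_full * BIG
    return expected_box + (SMALL if action == "two-box" else 0)

for a in ("one-box", "two-box"):
    print(a, edt_value(a), cdt_value(a))
# EDT ranks one-boxing higher (990000 vs 11000). CDT ranks two-boxing
# higher for any prior, because it weighs outcomes (e.g. "two-box while
# the box is full") that the predictor's accuracy rules out.
```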
D. An agent living in an environment like ours will do fine using CDT, because the only effects of its decisions are causal. CDT is also much simpler to run than EDT, because it doesn't rely on a strong self-model (doing EDT without a good self-model yields worse decisions than CDT in reasonable situations; this is basically what the claims that EDT performs badly in such-and-such a situation amount to, at least the ones I have seen). So it seems like we can pretty easily explain why humans have an intuition in favor of CDT, and that intuition seems like extremely weak evidence against EDT/UDT.
The two-boxer is trying to maximise money (utility). They are interested in the additional question of which bits of that money (utility) can be attributed to which things (decisions/agent types). "Caused gain" is a view about how we should attribute the gaining of money (utility) to different things.
So they agree that the problem is about maximising money (utility) and not "caused gain". But they are interested not just in which agents end up with the most money (utility) but also in which aspects of those agents are responsible for them receiving the money. Specifically, they are interested in whether the decisions the agent makes are responsible for the money it receives. This does not mean they are trying to maximise something other than money (utility). It means they are interested in maximising money, and then also in how money can be maximised via different mechanisms.
An additional point (discussed at intelligence.org/files/TDT.pdf) is that CDT seems to recommend self-modifying into a non-CDT decision theory. (For instance, imagine that the CDTer contemplates for a moment the mere possibility of encountering Newcomb problems (NPs) and can cheaply self-modify.) After modification, the interest in whether decisions are causally responsible for utility will have been eliminated. So this interest seems extremely brittle: agents able to modify and informed of the NP scenario will immediately lose it. (If the NP seems implausible, consider the ubiquity of some kind of logical correlation between agents in almost any multi-agent decision problem, like the PD or stag hunt.)
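A minimal sketch of that calculation, reusing the hypothetical Newcomb numbers from above: even evaluated purely causally, installing a one-boxing disposition before the predictor looks at you is the better intervention.

```python
# Before being scanned by the predictor (accuracy p), even a CDT agent
# computes that installing a one-boxing disposition causally leads to
# a better outcome. Prize amounts are the same toy values as above.
BIG, SMALL = 1_000_000, 1_000

def value_of_disposition(disposition, p=0.99):
    """Expected payoff once the predictor reads whatever disposition
    the agent will actually have at decision time."""
    if disposition == "one-boxer":
        return p * BIG
    return SMALL + (1 - p) * BIG

print(value_of_disposition("one-boxer"))  # 990000.0
print(value_of_disposition("two-boxer"))  # 11000.0
# Self-modifying is a causal intervention on the predictor's input, so
# CDT itself endorses it; the modified agent then no longer acts for
# local causal gains.
```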
Now you may have in mind a two-boxer notion distinct from that of a CDTer. It might be fundamental to this agent not to forgo local causal gains, so a proposed self-modification that would preclude acting for local causal gains would always be rejected. This seems like a shift out of decision theory into value theory. (I think it's very plausible that, absent typical mechanisms for maintaining commitments, many humans would find it extremely hard to resist taking a large 'free' cash prize from the transparent box. Even prior schooling in one-boxing philosophy might be hard to stick to when face to face with the prize. Another factor that clashes with human intuitions is the predictor's infallibility. Generally, I think grasping verbal arguments doesn't "modify" humans in the relevant sense, and that we have strong intuitions that may (at least in the right presentation of the NP) push us in the direction of local causal efficacy.)
EDIT: fixed some typos.