A CDT agent will do this if it can be proven that the modification cannot lead it to make worse decisions than it would have made unmodified. I actually tried to find literature on this a while back but couldn't find any, so I assigned a very low probability to the possibility that this could be proven. Since you seem to be familiar with the topic, do you know of any?
I am somewhat familiar with the topic, but note that I am most familiar with work that has already moved past CDT (i.e., work that considers CDT irrational and inferior to a reflective decision theory along the lines of TDT or UDT). As far as I'm aware, nobody has yet got around to formally writing up a "What CDT self-modifies to" paper (I wish they would!). It would be interesting to see what someone starting from the assumption that CDT is sane could come up with. Again, I'm unfamiliar with any such attempts, but here my unfamiliarity is far weaker evidence that none exist.
I wasn't asking for a concrete alternative to CDT. If anything, I'm interested in a proof that such a decision theory can possibly exist, because trying to find an alternative before this has been proven seems like a task with a very low chance of success.
I have read lots of LW posts on this topic, and everyone seems to take this for granted without giving a proper explanation. So if anyone could explain this to me, I would appreciate it.
This is a simple question that is in need of a simple answer. Please don't link to pages and pages of theorycrafting. Thank you.
Edit: Since posting this, I have come to the conclusion that CDT doesn't actually play Newcomb. Here's a disagreement with that statement:
And here's my response:
Edit 2: Clarification regarding backwards causality, which seems to confuse people:
Edit 3: Further clarification on the possible problems that could be considered Newcomb:
Edit 4: Excerpt from Nozick's "Newcomb's Problem and Two Principles of Choice":