That seems silly. I'd think the time difference thoroughly breaks the symmetry that the TDT answer to the prisoner's dilemma (the one I've seen) relies on. It's like letting one of the prisoners hear the other prisoner's answer before deciding.
But I suppose your answer does increase the usefulness of this problem as something to think about.
TDT agents do what they would want to precommit to doing. Thus, in a PD where one prisoner sees the other's decision, TDT agents still cooperate: if the second player wouldn't cooperate, it wouldn't benefit from the first player's cooperation, but it wants that benefit, hence it does cooperate.
In the interest of making decision theory problems more relevant, I thought I'd propose a real-life version of counterfactual mugging. This is discussed in Drescher's Good and Real, and in many places before. I will call it the Hazing Problem, by comparison to this practice (possibly NSFW – this is hazing, folks, not Disneyland).
The problem involves a timewise sequence of agents, each of whom decides whether to "haze" (abuse) the next agent. (They cannot impose any penalty on the previous agent.) Every agent n has this preference ranking:
1) not be hazed by n-1
2) be hazed by n-1, and haze n+1
3) be hazed by n-1, and NOT haze n+1
or, less formally:
1) not be hazed
2) haze and be hazed
3) be hazed, but stop the practice
The problem is: you have been hazed by n-1. Should you haze n+1?
As in counterfactual mugging, the average agent ends up with lower utility when agents condition their choice on having been hazed, no matter how big the utility difference between 2) and 3) is. It likewise requires you to make a choice from within a "losing" part of the "branching", a choice that has implications for the other branches.
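The average-utility claim can be checked with a toy simulation. The utility numbers below are made up for illustration; all that matters is that they respect the preference ranking 1) > 2) > 3). With the "haze iff hazed" policy every agent lands in outcome 2), while with unconditional not-haze a single agent eats outcome 3) and everyone downstream gets outcome 1):

```python
# Toy check of the average-utility claim. Utility values are arbitrary
# illustrative numbers respecting NOT_HAZED > HAZED_AND_HAZE > HAZED_AND_STOP.
N = 100              # number of agents in the chain
NOT_HAZED = 10       # outcome 1): best
HAZED_AND_HAZE = 5   # outcome 2): middle
HAZED_AND_STOP = 0   # outcome 3): worst

def average_utility(policy):
    """policy(was_hazed) -> whether this agent hazes the next one.
    Agent 0 starts out hazed, so the practice can only stop from inside."""
    total = 0
    was_hazed = True
    for _ in range(N):
        hazes = policy(was_hazed)
        if not was_hazed:
            total += NOT_HAZED
        elif hazes:
            total += HAZED_AND_HAZE
        else:
            total += HAZED_AND_STOP
        was_hazed = hazes
    return total / N

# CDT-style: haze iff you were hazed -> everyone stays hazed.
print(average_utility(lambda was_hazed: was_hazed))  # 5.0
# UDT-style: never haze -> one agent takes the worst outcome,
# everyone after is left alone.
print(average_utility(lambda was_hazed: False))      # 9.9
```

However small the gap between outcomes 2) and 3), the unconditional policy wins on average once the chain is long enough, since almost every agent gets the best outcome.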
You might object that the choice of whether to haze is not random, as Omega's coinflip is in CM; however, there are deterministic phrasings of CM, and your own epistemic limits blur the distinction anyway.
UDT regards returning not-haze unconditionally as optimal. CDT reasons that its having been hazed is fixed, and so hazes. I *think* EDT would choose to haze, because it would prefer to learn that, having been hazed, it hazed n+1; but I'm not sure about that.
I also think that TDT chooses not-haze, although this hinges on my claim that the problem is isomorphic to CM. I would expect TDT to reason: "If agents in my position regarded it as optimal not to haze despite having been hazed, then I would not be in a position of having been hazed, so I zero out the disutility of choosing not-haze."
Thoughts on the similarity and usefulness of the comparison?