I feel like MIRI perhaps mispositioned FDT (their variant of UDT) as a clear advance in decision theory, when they might have attracted more attention/interest from academic philosophy if the framing had instead been that the UDT line of thinking shows decision theory to be more deeply puzzling than anyone had previously realized. Instead of one major open problem (Newcomb's problem, or EDT vs. CDT), we now have a whole bunch more. I'm really not sure at this point whether UDT is even on the right track, but it does seem clear that there are some thorny issues in decision theory that not many people were previously thinking about:
- Indexical values are not reflectively consistent. UDT "solves" this problem by implicitly assuming (via the type signature of its utility function) that the agent doesn't have indexical values. But humans seemingly do have indexical values, so what to do about that?
- The commitment races problem extends into logical time, and it's not clear how to make the most obvious idea of logical updatelessness work.
- UDT says that what we normally think of as different approaches to anthropic reasoning are really different preferences, which seems to sidestep the problem. But is that actually right, and if so where are these preferences supposed to come from?
- 2TDT-1CDT - If there's a population of mostly TDT/UDT agents and a few CDT agents (and nobody knows who the CDT agents are), and they're randomly paired up to play one-shot PD, then the CDT agents do better. What does this imply? (See the simulation sketch below the list.)
- Game theory under the UDT line of thinking is generally more confusing than anything CDT agents have to deal with.
- UDT assumes that the agent has access to its own source code and inputs as symbol strings, so it can potentially treat logical correlations between its own decisions and other agents' decisions as well-defined mathematical problems (a toy illustration follows this list). But humans don't have this kind of access, so how are humans supposed to reason about such correlations?
- Logical conditionals vs counterfactuals, how should these be defined and do the definitions actually lead to reasonable decisions when plugged into logical decision theory?
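To illustrate the kind of access the source-code assumption gives, here is a toy sketch; the function name and the exact-match rule are my own illustration, not a claim about how UDT actually identifies correlations:

```python
def clique_bot(own_source: str, opponent_source: str) -> str:
    """Toy agent that receives both programs as symbol strings, as UDT assumes.
    It uses the crudest possible correlation argument: if the opponent runs exactly
    my code, its decision is provably identical to mine, so outputting "C" here
    guarantees the opponent outputs "C" too. Anything short of syntactic identity
    defaults to "D"; real logical-correlation reasoning would have to do far better
    (e.g. recognize semantically equivalent but syntactically different programs)."""
    return "C" if opponent_source == own_source else "D"
```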
These are just the major problems that I was trying to solve (or hoping for others to solve) before I mostly stopped working on decision theory and switched my attention to metaphilosophy. (It's been a while, so I'm not certain the list is complete.) As far as I know, nobody has found definitive solutions to any of these problems yet, and most are wide open.
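To make the 2TDT-1CDT item concrete, here is a minimal simulation sketch. It hard-codes the standard assumptions rather than deriving them: every TDT/UDT agent runs the same known decision procedure (so they all play one correlated policy against an anonymous opponent), the CDT agents simply defect, and the population fraction and payoff numbers are illustrative choices of mine rather than part of the original problem.

```python
import random

# One-shot Prisoner's Dilemma payoffs for the row player (illustrative values, T > R > P > S).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def udt_policy(udt_fraction):
    """A TDT/UDT agent can't tell which kind of opponent it faces, so it picks one action
    for all pairings, knowing that every other TDT/UDT agent's action is the same as its own."""
    ev_cooperate = udt_fraction * PAYOFF[("C", "C")] + (1 - udt_fraction) * PAYOFF[("C", "D")]
    ev_defect = PAYOFF[("D", "D")]  # if it defects, every opponent (TDT/UDT or CDT) defects too
    return "C" if ev_cooperate > ev_defect else "D"

def run(n_agents=1000, udt_fraction=0.9, seed=0):
    rng = random.Random(seed)
    n_udt = int(n_agents * udt_fraction)
    agents = ["UDT"] * n_udt + ["CDT"] * (n_agents - n_udt)
    rng.shuffle(agents)  # random pairing: adjacent agents after the shuffle play each other
    udt_move = udt_policy(udt_fraction)
    totals = {"UDT": 0.0, "CDT": 0.0}
    counts = {"UDT": 0, "CDT": 0}
    for i in range(0, n_agents - 1, 2):
        a, b = agents[i], agents[i + 1]
        move_a = udt_move if a == "UDT" else "D"
        move_b = udt_move if b == "UDT" else "D"
        totals[a] += PAYOFF[(move_a, move_b)]
        totals[b] += PAYOFF[(move_b, move_a)]
        counts[a] += 1
        counts[b] += 1
    return {kind: totals[kind] / counts[kind] for kind in totals}

print(run())  # roughly {'UDT': 2.7, 'CDT': 4.6}: the CDT minority comes out ahead
```

With a large majority of TDT/UDT agents, cooperating is the better single policy for them, and the anonymous CDT agents free-ride on that cooperation.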
I don't think that's the case unless you have really weird assumptions. If the other party can't tell what the TDT/UDT agent will pick, they'll defect, won't they? It seems strange that the other party would be able to tell what the TDT/UDT agent will pick but not whether they're TDT/UDT or CDT.
Edit: OK, I see the idea is that the TDT/UDT agents have known, fixed code, which can, e.g., randomly mutate into CDT. They can't voluntarily change their code. Being able to trick the other party about your code is an advantage - I don't see that as a TDT/UDT problem.
I mean, that's a thing you might hope to be true. I'm not sure if it actually is true.
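For what it's worth, under the fixed-code assumption being discussed the arithmetic is straightforward (the payoff letters $T > R > P > S$ and the TDT/UDT population fraction $p$ are my notation, not part of the original problem): the TDT/UDT agents' single policy yields $pR + (1-p)S$ if they cooperate and $P$ if they defect, so they cooperate whenever $p > \frac{P-S}{R-S}$. In that case each CDT agent gets $pT + (1-p)P$, which is strictly more than $pR + (1-p)S$ since $T > R$ and $P > S$; below the threshold everyone defects and the CDT advantage disappears.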