AlexMennen comments on Stupid Questions Open Thread Round 4 - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (179)
While we're on the subject of decision theory... what is the difference between TDT and UDT?
Maybe the easiest way to understand UDT and TDT is:
Comparing UDT and TDT directly, the main differences seem to be that UDT does not do Bayesian updating on sensory inputs and does not make use of causality. There seems to be general agreement that Bayesian updating on sensory inputs is wrong in a number of situations, but disagreement and/or confusion about whether we need causality. Gary Drescher put it this way:
(Eliezer didn't give an answer. ETA: He did answer a related question here.)
I can see what updating on sensory inputs does to TDT (causing it to fail counterfactual mugging). But what does it mean to say that TDT makes use of causality and UDT doesn't? Are there any situations where this causes them to give different answers?
(I added a link at the end of the grandparent comment where Eliezer does give some of his thoughts on this issue.)
Eliezer seems to think that causality can help deal with Gary Drescher's "5-and-10" problem:
But it seems possible to build versions of UDT that are free from such problems (such as the proof-based ones that cousin_it and Nesov have explored), although there are still some remaining issues with "spurious proofs" which may be related. In any case, it's unclear how to get help from the notion of causality, and as far as I know, nobody has explored in that direction and reported back any results.
I'm not an expert but I think this is how it works:
Both decision theories (TDT and UDT) work by imagining the problem from the point of view of themselves before the problem started. They then think "From this point of view, which sequence of decisions would be the best one?", and then they follow that sequence of decisions. The difference is in how they react to randomness in the environment. When the algorithm is run, the agent is already midway through the problem, and so might have some knowledge that it didn't have at the start of the problem (e.g. whether a coinflip came up heads or tails). When visualising themselves at the start of the problem, TDT assumes they have this knowledge, while UDT assumes they don't.
An example is Counterfactual Mugging:
TDT visualises itself before the problem started, knowing that the coin will come up tails. From this point of view the kind of agent that does well is the kind that refuses to give $100, and so that's what TDT does.
UDT visualises itself before the problem started, and pretends it doesn't know what the coin does. From this point of view the kind of agent that does well is the kind that gives $100 in the case of tails, so that's what UDT does.
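To make the arithmetic behind the two viewpoints concrete, here is a rough sketch in Python. It is only an illustration of the comparison described above, not anyone's formal definition of TDT or UDT. The $100 request comes from the comment; the fair coin and the $10,000 reward on heads are assumptions taken from the usual statement of the problem, and all function names are made up for this example.

```python
from fractions import Fraction

ASK = 100                  # amount requested on tails (from the comment above)
REWARD = 10_000            # ASSUMED payout on heads; not stated in this thread
P_HEADS = Fraction(1, 2)   # ASSUMED fair coin


def payoff(pays_on_tails, coin):
    """Payoff to an agent whose policy is `pays_on_tails`, given the coin result."""
    if coin == "tails":
        return -ASK if pays_on_tails else 0
    # Heads: the predictor rewards only agents that would have paid on tails.
    return REWARD if pays_on_tails else 0


def value_before_flip(pays_on_tails):
    """Expected payoff of a policy when the coin result is not known."""
    return (P_HEADS * payoff(pays_on_tails, "heads")
            + (1 - P_HEADS) * payoff(pays_on_tails, "tails"))


def value_given_tails(pays_on_tails):
    """Payoff of a policy once the agent takes 'tails' as already known."""
    return payoff(pays_on_tails, "tails")


def best_policy(value):
    """Return True (pay) or False (refuse), whichever scores higher under `value`."""
    return max([True, False], key=value)
```

Under these assumed numbers, `best_policy(value_before_flip)` picks paying (expected value 4950 versus 0), while `best_policy(value_given_tails)` picks refusing (0 versus -100), which is the split between the two viewpoints described above.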
Why do we still reference TDT so much if UDT is better?
Many people think of UDT as being a member of the "TDT branch of decision theories." And in fact, much of what is now discussed as "UDT" (e.g. in A model of UDT with a halting oracle) is not Wei Dai's first or second variant of UDT but instead a new variant of UDT sometimes called Ambient Decision Theory or ADT.
Follow-up: Is it in how they compute conditional probabilities in the decision algorithm? As I understand it, that's how CDT and EDT and TDT differ.
I don't think that is how CDT and EDT differ, actually. Instead, it's that EDT cares about conditional probabilities and CDT doesn't. For instance, in Newcomb's problem, a CDT agent could agree that his expected utility is higher conditional on him one-boxing than it is conditional on him two-boxing. But he two-boxes anyway because the correlation isn't causal. I guess TDT/UDT does compute conditional probabilities differently in the sense that they don't pretend that their decisions are independent of the outputs of similar algorithms.
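A small Python sketch of the Newcomb's problem point, for anyone who wants to see the numbers. The payoffs ($1,000,000 and $1,000) and the 99% predictor accuracy are assumptions from the usual statement of the problem, not from this thread, and the function names are hypothetical. It only shows the two calculations side by side and is not meant as a full model of either theory.

```python
from fractions import Fraction

BIG = 1_000_000               # ASSUMED: opaque box contents if one-boxing was predicted
SMALL = 1_000                 # ASSUMED: transparent box contents
ACCURACY = Fraction(99, 100)  # ASSUMED: predictor accuracy


def conditional_value(action):
    """Expected utility conditional on the action (the quantity EDT maximizes)."""
    # Conditioning on the action changes how likely the opaque box is to be full.
    p_full = ACCURACY if action == "one-box" else 1 - ACCURACY
    bonus = SMALL if action == "two-box" else 0
    return p_full * BIG + bonus


def causal_value(action, p_full):
    """Expected utility when the box contents are held fixed (the CDT view).

    `p_full` is the agent's credence that the opaque box is already full;
    the action is treated as having no influence on it.
    """
    bonus = SMALL if action == "two-box" else 0
    return p_full * BIG + bonus
```

With these numbers `conditional_value("one-box")` is 990000 and `conditional_value("two-box")` is 11000, so the agent can agree that one-boxing looks better conditionally. But `causal_value("two-box", p) - causal_value("one-box", p)` is 1000 for every `p`, which is why the CDT agent two-boxes anyway.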