This post is inspired by the recent discussion I had with IlyaShpitser and Vaniver on EDT.
A random variable only ever has one value
In probability theory, statistics, and so on, we often use the notion of a random variable (RV). If you look at the definition, you will see that an RV is a function on the sample space. What that means is that an RV assigns a value to each possible outcome of a system. In reality, where there are no closed systems, this means that an RV assigns a value to each possible universe.
For example, a random variable X representing the outcome of a die roll is a function of type "Universe → {1..6}". The value of X in a particular universe u is then X(u). Uncertainty about X corresponds to uncertainty about which universe we are in. Since X is a pure mathematical function, its value is fixed for each input. That means that in a fixed universe, say our universe, such a random variable only ever takes on one value.
So, before the die roll, the value of X is undefined[1], and after the roll X is forever fixed. X is the outcome of one particular roll. If I roll the same die again, that doesn't change the value of X. If you want to talk about multiple rolls, you have to use different variables. The usual solution is to use indices: $X_1$, $X_2$, etc.
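To make this concrete, here is a minimal sketch (my own illustration, not from the original post) that treats random variables as ordinary functions of a universe, with a separate function for each roll:

```python
# A universe is modeled here as a record of everything that happened in it;
# this toy version only tracks two die rolls.
universe = {"roll_1": 3, "roll_2": 5}

# Each roll gets its own random variable: X1 and X2 are distinct functions
# of the universe, not two draws from one variable.
def X1(u):
    return u["roll_1"]

def X2(u):
    return u["roll_2"]

print(X1(universe))  # 3 -- fixed once the universe is fixed
print(X2(universe))  # 5 -- a second roll needs a second variable
```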
This also means that the nodes in a causal model are not random variables. For example, in the causal model "Smoking → Cancer", there is no single RV for smoking. Rather, the model is implicitly generalized to mean "Smoking_i → Cancer_i" for all persons i.
What this means for EDT
It is sometimes claimed that Evidential Decision Theory (EDT) cannot deal with causal structure. But I would disagree. To avoid confusion, I will refer to my interpretation as Estimated Evidential Decision Theory (EEDT).
Decision theories such as (E)EDT rely on the following formula to make decisions:

$$V(a) = \sum_j P(O = o_j \mid a) \, U(o_j)$$

where $o_j$ are the possible outcomes, $U(o_j)$ is the utility of an outcome, $O$ is a random variable that represents the actual outcome, and $a$ is an action. The (E)EDT policy is to take the action that maximizes $V(a)$, the value of that action.
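As a hedged sketch (the names and numbers below are my own, purely illustrative), the decision rule is easy to write down once the conditional probabilities are somehow available:

```python
def value(action, outcomes, prob_given, utility):
    """V(a) = sum_j P(O = o_j | a) * U(o_j)."""
    return sum(prob_given(o, action) * utility(o) for o in outcomes)

def best_action(actions, outcomes, prob_given, utility):
    """The (E)EDT policy: pick the action that maximizes V(a)."""
    return max(actions, key=lambda a: value(a, outcomes, prob_given, utility))

# Toy example with made-up numbers: deciding whether to take an umbrella.
outcomes = ["dry", "wet"]
utilities = {"dry": 1.0, "wet": -1.0}
probs = {  # P(outcome | action), purely hypothetical
    ("dry", "take umbrella"): 0.95, ("wet", "take umbrella"): 0.05,
    ("dry", "no umbrella"): 0.60,  ("wet", "no umbrella"): 0.40,
}

print(best_action(["take umbrella", "no umbrella"], outcomes,
                  lambda o, a: probs[(o, a)], utilities.get))
# -> 'take umbrella'
```

All of the work is hidden in `prob_given`; the rest of the post is about where those conditional probabilities come from.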
How would you evaluate this formula in practice? To do that, you need to know $P(O = o_j \mid a)$, that is, the probability of a certain outcome given that you take a certain action. But keep in mind the previous section! There is only one random variable $O$, which is the outcome of this action. Without assuming some prior knowledge, $O$ is unrelated to the outcome of other similar actions in similar situations.
At the time an agent has to decide what action $a$ to take, the action has not happened yet, and the outcome is not yet known to him. This means that the agent has no observations of $O$. The agent therefore has to estimate $P(O = o_j \mid a)$ by using only his prior knowledge. How this estimation is done exactly is not specified by EEDT. If the agent wants to use a causal model, he is perfectly free to do so!
You might argue that by not specifying how the conditional probabilities $P(O = o_j \mid a)$ are calculated, I have taken out the interesting part of the decision theory. With the right choice of estimation procedure, EEDT can describe CDT, normal/naive EDT, and even UDT[2]. But EEDT is not so general as to be completely useless. What it does give you is a way to reduce the problem of making decisions to that of estimating conditional probabilities.
Footnotes
1. Technically, 'undefined' is not one of the values X can take. What I mean is that X is a partial function on universes, i.e. a function defined only on universes in which the die has been rolled.
2. To get CDT, assume there is a causal model for A → O, and use it to estimate $P(O = o_j \mid do(A = a))$. To get naive EDT, estimate the probabilities from data without taking causality or confounders into account. To get UDT, model A as being the choice of all sufficiently similar agents, not just yourself. A small sketch of the first two options follows below.
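Here is a hedged sketch of that footnote's first two options, on a made-up smoking-lesion-style distribution (the numbers and variable names are my own assumptions): naive EDT conditions on the observed action, while CDT adjusts for the confounder.

```python
# Hypothetical joint distribution over C (hidden common cause), A (action),
# O (outcome). In this toy model O depends only on C, so intervening on A
# should not change the distribution of O.
p_c = {0: 0.5, 1: 0.5}                                    # P(C = c)
p_a_given_c = {(0, 0): 0.8, (0, 1): 0.2,                  # P(A = a | C = c), keyed (c, a)
               (1, 0): 0.2, (1, 1): 0.8}
p_o1_given_c = {0: 0.1, 1: 0.8}                           # P(O = 1 | C = c)

def p_joint(c, a, o):
    po = p_o1_given_c[c] if o == 1 else 1 - p_o1_given_c[c]
    return p_c[c] * p_a_given_c[(c, a)] * po

def naive_edt(o, a):
    """P(O = o | A = a): condition on the observed action."""
    num = sum(p_joint(c, a, o) for c in (0, 1))
    den = sum(p_joint(c, a, o2) for c in (0, 1) for o2 in (0, 1))
    return num / den

def cdt(o, a):
    """P(O = o | do(A = a)): back-door adjustment, sum_c P(c) P(o | a, c)."""
    def p_o_given_ac(c):
        return p_joint(c, a, o) / (p_c[c] * p_a_given_c[(c, a)])
    return sum(p_c[c] * p_o_given_ac(c) for c in (0, 1))

print(naive_edt(1, 1), naive_edt(1, 0))  # 0.66 vs 0.24: A looks informative
print(cdt(1, 1), cdt(1, 0))              # 0.45 vs 0.45: intervening on A changes nothing
```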
Generally, no. Newcomb's problem is weird, and so examples using it will be weird.
It may be clearer to imagine a scenario where some node has a default value, which may depend on other variables in the system, and where you could intervene to adjust it from the default to some other value you prefer.
For example, suppose you had a button that toggles whether a fire alarm is ringing. Suppose the fire alarm is not perfectly reliable, so that it sometimes rings when there isn't a fire, and sometimes doesn't ring when there is a fire. It's very different for you to observe that the alarm is off and then switch it on yourself, versus simply observing that the alarm is on.
If an EDT system has only two nodes, "fire" (which is unobserved) and "alarm" (which is observed), then it doesn't have a way to distinguish between the alarm turning on by itself (when we should update our estimate of fire) and the alarm turning on because we pressed the button (when we shouldn't update our estimate of fire). We could fix that by adding a "button" node, or by switching to a causal network where fire points to alarm but alarm doesn't point to fire. In general, the second approach is better because it lacks degrees of freedom which it should not have (and because many graph-based techniques scale in complexity with the number of nodes, whereas making the edges directed generally reduces the complexity, I think). It's also agnostic to how we intervene, which allows us to use one graph to contemplate many interventions, rather than having a clear-cut delineation between decision nodes and nature nodes.
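As a small, hedged illustration of that point (the causal-network version, with numbers I made up): observing the alarm is evidence about fire, while forcing the alarm on is not, because fire is not a descendant of alarm.

```python
# Two-node model: fire -> alarm, with an imperfect alarm.
p_fire = 0.01
p_alarm_on_given_fire = {True: 0.9, False: 0.05}

def p_fire_given_observed_alarm_on():
    """Bayes: P(fire | alarm = on). Observing the alarm is evidence about fire."""
    p_alarm_on = (p_fire * p_alarm_on_given_fire[True]
                  + (1 - p_fire) * p_alarm_on_given_fire[False])
    return p_fire * p_alarm_on_given_fire[True] / p_alarm_on

def p_fire_given_do_alarm_on():
    """P(fire | do(alarm = on)): intervening on alarm doesn't touch fire,
    because fire is not a descendant of alarm in the graph."""
    return p_fire

print(p_fire_given_observed_alarm_on())  # ~0.154: a ringing alarm is evidence of fire
print(p_fire_given_do_alarm_on())        # 0.01: pressing the button tells us nothing
```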
Right; I meant to convey that in the "Omega sees the future" case, not even Professor X can surprise Omega.
Hopefully, you can tell the difference between an alarm you triggered and an alarm that you did not.