paper-machine comments on Timelessness as a Conservative Extension of Causal Decision Theory - Less Wrong
I think Spohn's theory also qualifies as an extension of CDT. It's been remarked before that Spohn's "intention nodes" are very similar to EY's "logical nodes", and by transitivity also to CDT+E.
Disagreed. By CDT I mean calculating utilities using

U(A) = Σ_j P(O_j | do(A)) · D(O_j)

(The only modification from the Wikipedia article is that I'm using Pearl's clearer do() notation in place of its P(A > O_j).)
The naive CDT setup for Newcomb's problem has a causal graph which looks like B->M<-P, where B is your boxing decision, P is Omega's prediction, and M is the monetary reward you receive. This causal graph disagrees with the problem statement, as it necessarily implies that B and P are unconditionally independent, which we know is not the case from the assumption that Omega is a perfect predictor. The causal graph that agrees with the problem statement is B->P->M and B->M, in which case one-boxing is trivially the right action.
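To make the two graphs concrete, here is a hedged numerical sketch. The payoffs are the standard $1,000,000 / $1,000 Newcomb amounts; the prior q over Omega's prediction in the naive graph is an illustrative assumption of mine:

```python
# Hedged sketch: CDT expected utilities for Newcomb's problem under the
# two causal graphs discussed above. Payoffs are the standard amounts;
# the prior q is illustrative.

M, K = 1_000_000, 1_000  # opaque-box and transparent-box payoffs

def payoff(box, pred):
    """Monetary reward M as a function of your choice B and Omega's prediction P."""
    return (M if pred == "one" else 0) + (K if box == "two" else 0)

def eu_naive(box, q=0.5):
    # Graph B -> M <- P: do(B) leaves P's marginal untouched,
    # so P(P = "one" | do(B)) is just the prior q.
    return q * payoff(box, "one") + (1 - q) * payoff(box, "two")

def eu_perfect(box):
    # Graph B -> P -> M (plus B -> M): intervening on B still flows
    # downstream into P, which copies B (perfect predictor).
    return payoff(box, box)

assert eu_naive("two") == eu_naive("one") + K   # two-boxing dominates by $1,000
assert eu_perfect("one") > eu_perfect("two")    # one-boxing wins: $1M vs $1K
```

Under the naive graph the $1,000 gap in favor of two-boxing holds for any q, which is exactly the dominance argument; under the graph that respects the problem statement, one-boxing trivially wins.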
The bulk of Spohn's paper is all about how to get over the fear of backwards causation in hypothetical scenarios which explicitly allow backwards causation. You can call that an extension if you want, but it seems to me that's all in the counterfactual reasoning module, not in the decision-making module. (That is, CDT does not describe how you come up with P(Oj|do(A)), only what you do with it once you have it.)
Uh, doesn't the naive CDT setup for Newcomb's problem normally include a "my innards" node with arrows going to both B and P? That node is what introduces the unconditional dependence between B and P. Obviously B -> M <- P by itself can't even express the problem, because it can't represent Omega making any prediction at all.
If you decide what your innards are, and not what your action is, then this matches the problem description. If you can somehow have dishonest innards (Omega thinks I'm a one-boxer, but I can still two-box), then this again violates the perfect-prediction assumption.
I believe, as an empirical matter, that the first explicitly CDT accounts of Newcomb's problem did not use graphs, but if you convert their arguments into a graph, they implicitly assume B -> M <- P.
Isn't the whole point of CDT that you cut any arrows from ancestor nodes with do(A), where A is your "intervention"? Obviously you can't have your innards imply your action if you explicitly sever that connection by describing your decision as an intervention.
Here is how I understood typical CDT accounts of Newcomb's problem: You have a graph given by B <- Innards -> P and B -> M <- P. Innards starts with some arbitrary prior probability, since you don't know your decision beforehand. You perturb the graph by deleting Innards -> B in order to calculate p(M | do(B)), and in doing so you end up with a graph "looking like" B -> M <- P. Then the usual "dominance" arguments determine the decision regardless of the prior probability on Innards.

Of course, after doing this analysis and coming up with a decision, you now know (unconditionally) the value of B and therefore Innards, so arguably the probabilities for those should be set to 1 or 0 as appropriate in the original graph. This is generally interpreted by CDTists as a proof that this agent always two-boxes, and always gets the smaller reward.

Yes. My point is that when you have a supernatural Omega, then putting any of Omega's actions in ancestor nodes of your decision, instead of descendant nodes of your decision, is a mistake that violates the problem description.
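The graph-surgery step can be sketched numerically. This is a hedged toy calculation (payoffs are the standard Newcomb amounts; everything else is illustrative): after deleting Innards -> B, the dominance gap between two-boxing and one-boxing is the same $1,000 regardless of the prior on Innards.

```python
# Hedged sketch of the perturbed-graph dominance argument.
M, K = 1_000_000, 1_000

def payoff(box, pred):
    # Monetary node M as a function of your choice B and Omega's prediction P.
    return (M if pred == "one" else 0) + (K if box == "two" else 0)

def eu_do(box, p_innards_onebox):
    # After deleting Innards -> B, the prediction P still tracks Innards,
    # but B is exogenous: P(P = "one" | do(B)) is just the prior on Innards.
    p = p_innards_onebox
    return p * payoff(box, "one") + (1 - p) * payoff(box, "two")

# Dominance: the $1,000 gap is independent of the prior on Innards.
for prior in (0.0, 0.25, 0.5, 0.75, 1.0):
    assert abs(eu_do("two", prior) - eu_do("one", prior) - K) < 1e-6
```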
But if you don't delete the incoming arcs on your decision nodes, then it isn't CDT anymore; it's just EDT.
Which raises the question of why we should bother with CDT in the first place.
Some people claim that EDT fails on "smoking lesion"-type problems, but I think that is due to incorrect modelling or underspecification of the problem. If you use the correct model, EDT produces the "right" answer.
It seems to me that EDT is superior to CDT.
(Ilya Shpitser will disagree, but I never understood his arguments)
People have known how to deal with smoking lesion (under a different name) since the 18th century (hint: the solution is not the EDT solution):
http://www.e-publications.org/ims/submission/STS/user/submissionFile/12809?confirm=bbb928f0
The trick is to construct a system that deals with things 20 times more complicated than smoking lesion. That system is recent, and you will have to read e.g. my thesis, or Jin Tian's thesis, or elsewhere to see what it is.
I have yet to see anyone advocating EDT actually handle a complicated example correctly. Or even a simple tricky example, e.g. the front door case.
You still delete incoming arcs when you make a decision. The argument is that if Omega perfectly predicts your decision, then causally his prediction must be a descendant of your decision rather than an ancestor, because if it were an ancestor, do() would sever a connection that the problem statement requires to stay intact (and thus violate the problem description).
This is a shame, because he's right. Here's my brief attempt at an explanation of the difference between the two:
EDT uses the joint probability distribution. If you want to express a joint probability distribution as a graphical Bayesian network, then the direction of the arrows doesn't matter (modulo some consistency concerns). If you utilize your human intelligence, you might be able to figure out "okay, for this particular action, we condition on X but not on Y," but you do this for intuitive reasons that may be hard to formalize and which you might get wrong. When you use the joint probability distribution, you inherently assume that all correlation is causation, unless you've specifically added a node or data to block causation for any particular correlation.
CDT uses the causal network, where the direction of the arrows is informative. You can tell the difference between altering and observing something, in that observations condition things both up and down the causal graph, whereas alterations only condition things down the causal graph. You only need to use your human intelligence to build the right graph, and then the math can take over from there. For example, consider price controls: there's a difference between observing that the price of an ounce of gold is $100 and altering the price of an ounce of gold to be $100. And causal networks allow you to answer questions like "given that the price of gold is observed to be $100, what will happen when we force the price of gold to be $120?"
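A hedged toy version of the price example (the network and the numbers are mine, purely for illustration): a supply shock S causes both a high price and a shortage. Observing a high price is evidence about S and hence about the shortage, but forcing the price high severs S -> Price and leaves the shortage probability at its prior.

```python
# Hedged toy network: S -> Price, S -> Shortage, with made-up numbers.
# Shows that conditioning on an observation propagates UP the graph,
# while do() on the same node does not.

p_s = 0.3                               # P(S = supply shock)
p_price_hi = {True: 0.9, False: 0.2}    # P(Price = high | S)
p_short = {True: 0.8, False: 0.1}       # P(Shortage | S)

# Prior probability of a shortage, marginalising over S.
prior = p_s * p_short[True] + (1 - p_s) * p_short[False]

# Observation: condition on Price = high, updating S by Bayes' rule
# (information flows up the arrow S -> Price, then down to Shortage).
p_hi = p_s * p_price_hi[True] + (1 - p_s) * p_price_hi[False]
p_s_given_hi = p_s * p_price_hi[True] / p_hi
observed = p_s_given_hi * p_short[True] + (1 - p_s_given_hi) * p_short[False]

# Intervention: do(Price = high) severs S -> Price, so S keeps its prior
# and P(Shortage | do(Price = high)) equals the prior probability.
intervened = prior

assert observed > intervened  # seeing a high price is evidence; setting it isn't
```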
Now, if you look at the math, you can see a way to embed a causal network in a network without causation. So we could use more complicated networks and let conditioning on nodes do the graph severing for us. I think this is a terrible idea, both philosophically and computationally, because it entails more work and less clarity, both of which are changes in the wrong direction.
If I understand correctly, in causal networks the orientation of the arcs must respect "physical causality", which I roughly understand to mean consistency with the thermodynamic arrow of time.
There is no way for your action to cause Omega's prediction in this sense, unless time travel is involved.
Yes, different Bayesian networks can represent the same probability distribution. And why would that be a problem? The probability distribution and your utility function are all that matters.
"Correlation vs causation" is an epistemic error. If you are making it then you are using the wrong probability distribution, not a "wrong" factorization of the correct probability distribution.
In the real world, this is correct, but it is not mathematically necessary. (To go up a meta level, this is about how you build causal networks in the first place, not about how you reason once you have a causal network; even if philosophers were right about CDT as the method to go from causal networks to decisions, they seem to have been confused about the method by which one goes from English problem statements to causal networks when it comes to Newcomb's problem.)
It is. How else can Omega be a perfect predictor? (I may be stretching the language, but I count Laplace's Demon as a time traveler, since it can 'see' the world at any time, even though it can only affect the world at the time that it's at.)
The problem is that you can't put any meaning into the direction of the arrows because they're arbitrary.
If you give me a causal diagram and the embedded probabilities for the environment, and ask me to predict what would happen if you did action A (i.e. counterfactual reasoning), you've already given me all I need to calculate the probabilities of any of the other nodes you might be interested in, for any action included in the environment description.
If you give me a joint probability distribution for the environment and ask me to predict what would happen if you did action A, I don't have enough information to calculate the probabilities of the other nodes. You need to give me a different joint probability distribution for every possible action you could take. This requires a painful amount of communication, but possibly worse is that there's no obvious type difference between the joint probability distribution for the environment and the one for the environment given a particular action; if I calculate the consequences of an action from the whole environment's distribution, I can get it wrong.
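That underdetermination can be shown with a hedged two-variable example (numbers mine): the graphs X -> Y and Y -> X are consistent with the same joint distribution, yet give different answers for P(Y=1 | do(X=1)).

```python
# Hedged illustration: one joint distribution over two binary variables,
# two Markov-equivalent graphs, two different interventional predictions.

# Joint P(X, Y) -- X and Y are correlated.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

p_x1 = joint[(1, 0)] + joint[(1, 1)]   # P(X=1) = 0.5
p_y1 = joint[(0, 1)] + joint[(1, 1)]   # P(Y=1) = 0.5
p_y1_given_x1 = joint[(1, 1)] / p_x1   # P(Y=1 | X=1) = 0.8

# Graph X -> Y: intervening on X is the same as observing it.
p_y1_do_x1_graph_a = p_y1_given_x1

# Graph Y -> X: do(X=1) cuts the arc Y -> X, so Y keeps its marginal.
p_y1_do_x1_graph_b = p_y1

# Same joint, different interventional predictions.
assert p_y1_do_x1_graph_a != p_y1_do_x1_graph_b
```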
The problem is that this can lead to inconsistency when you have two omegas trying to predict each other.
This is one of the arguments against the possibility of Laplace's Demon, and I agree that a world with two Omegas is probably going to be inconsistent.
You say "disagreed" but then end up saying what I meant in the last paragraph.
Consider that I may have read Spohn before.
I think that we're arguing about whether the label CDT refers to just the utility calculation or the combination of the utility calculation and the counterfactual module, not about any of the math. I can go into the reasons why I like to separate those two out, but I think I've already covered the basics.
I generally aim to include the audience when I write comments, which sometimes has the side effect of being insultingly basic to the person I'm responding to. Normally I'm more careful about including disclaimers to that effect, and I apologize for missing that this time.