Eliezer_Yudkowsky comments on Philosophy Needs to Trust Your Rationality Even Though It Shouldn't - Less Wrong

Post author: lukeprog 29 November 2012 09:00PM


Comment author: IlyaShpitser 01 December 2012 06:10:06AM 12 points

By (a) I mean that you can sometimes recover the true graph exactly even without observing the confounders. Something like this was already known (see the FCI algorithm, or even the IC* algorithm in Pearl's book), but we can do a lot better than that. For example, if we have the true graph:

a -> b -> c -> d, with a <- u1 -> c, and a <- u2 -> d, where we do not observe u1,u2, and u1,u2 are very complicated, then we can figure out the true graph exactly by independence type techniques without having to observe u1 and u2. Note: the marginal distribution p(a,b,c,d) that came from this graph has no conditional independences at all (checkable by d-separation on a,b,c,d), so typical techniques fail.
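The "no conditional independences" claim can be checked mechanically. Below is a minimal sketch in Python (my illustration, not the FCI/IC* machinery): a self-contained d-separation test using the standard moralization criterion, applied to every pair among a, b, c, d and every conditioning subset of the remaining two.

```python
from itertools import chain, combinations

def ancestors(dag, nodes):
    """All ancestors of `nodes` in the DAG, including `nodes` themselves.
    `dag` maps each node to the set of its parents."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in dag[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(dag, xs, ys, zs):
    """Moralization criterion: restrict to ancestors of xs|ys|zs, marry
    co-parents, drop edge directions, delete zs; then xs is d-separated
    from ys given zs iff xs and ys are disconnected in what remains."""
    keep = ancestors(dag, set(xs) | set(ys) | set(zs))
    adj = {n: set() for n in keep}
    for n in keep:
        parents = dag[n] & keep
        for p in parents:                 # undirected version of each edge
            adj[n].add(p)
            adj[p].add(n)
        for p, q in combinations(sorted(parents), 2):   # marry co-parents
            adj[p].add(q)
            adj[q].add(p)
    blocked = set(zs)
    frontier = [x for x in xs if x not in blocked]
    reached = set(frontier)
    while frontier:
        for m in adj[frontier.pop()]:
            if m not in blocked and m not in reached:
                reached.add(m)
                frontier.append(m)
    return not (set(ys) & reached)

# The graph above: a -> b -> c -> d, with a <- u1 -> c and a <- u2 -> d.
dag = {"a": {"u1", "u2"}, "b": {"a"}, "c": {"b", "u1"},
       "d": {"c", "u2"}, "u1": set(), "u2": set()}

# Among the observed variables a,b,c,d, every pair is d-connected given
# every subset of the other two: no conditional independences at all.
obs = ["a", "b", "c", "d"]
for x, y in combinations(obs, 2):
    rest = [v for v in obs if v not in (x, y)]
    for z in chain.from_iterable(combinations(rest, r) for r in range(3)):
        assert not d_separated(dag, {x}, {y}, set(z)), (x, y, z)
print("no conditional independences among a, b, c, d")
```

Every assertion passes: no pair of observed variables is d-separated given any subset of the other two, so constraint-based methods that only look for conditional independences among a, b, c, d get no traction here.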


(b) is, I guess, "a subtle issue" -- but my point is about careful use of language, and about keeping causal and statistical issues clear and separate.

A "Bayesian network" (or "belief network" -- I don't like the word "Bayesian" here because it confuses the issue: you can use frequentist techniques with belief networks if you want, and in fact a lot of folks do) is a joint distribution that factorizes according to a DAG. That's it. Nothing about causality. If there is a joint density representing a causal process where a is a direct cause of b, which is a direct cause of c, then this joint density will factorize with respect to both

a -> b -> c

and

a <- b <- c

but only the former graph is causal; the latter is not. Both graphs form a "Bayesian network" with this joint density (since the density factorizes with respect to both graphs), but only one of them is the causal graph. If you want to talk about causal models, then in addition to asserting a Markov factorization you need to say something else -- something that makes parents into direct causes. Usually people say something like:

for every x, p(x | pa(x)) = p(x | do(pa(x))), or mention the g-formula, or the truncated factorization of do(.), or "the causal Markov condition."

But this is something that (a) you need to say explicitly, (b) involves language beyond standard probability theory, because there is a do(.) in it, and (c) is controversial to some people. What is do(.)? It refers to a hypothetical experiment/intervention.
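To make the distinction concrete, here is a tiny numeric sketch (binary variables, made-up CPDs; a hypothetical example, not from the comment): the same joint density factorizes along both a -> b -> c and a <- b <- c, yet the truncated factorization for do(b=1) answers differently than conditioning on b=1.

```python
# Hypothetical CPDs for the causal chain a -> b -> c (numbers made up).
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # p_b_given_a[a][b]
p_c_given_b = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}

# Joint via the causal factorization p(a,b,c) = p(a) p(b|a) p(c|b).
joint = {(a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}

def marg(keep):
    """Marginal distribution over the named coordinates."""
    out = {}
    for (a, b, c), p in joint.items():
        key = tuple({"a": a, "b": b, "c": c}[v] for v in keep)
        out[key] = out.get(key, 0.0) + p
    return out

p_c = marg(["c"]); p_b = marg(["b"])
p_bc = marg(["b", "c"]); p_ab = marg(["a", "b"])

# 1) The same joint also factorizes along the anti-causal graph a <- b <- c:
#    p(a,b,c) = p(c) p(b|c) p(a|b).  Both DAGs are "Bayesian networks" for it.
for (a, b, c), p in joint.items():
    backward = p_c[(c,)] * (p_bc[(b, c)] / p_c[(c,)]) * (p_ab[(a, b)] / p_b[(b,)])
    assert abs(p - backward) < 1e-12

# 2) Conditioning is not intervening. The truncated factorization for do(b=1)
#    deletes the factor p(b|a), cutting the edge a -> b, so p(a | do(b=1)) = p(a).
p_a_do_b1 = p_a[1]                      # interventional: unchanged marginal
p_a_cond_b1 = p_ab[(1, 1)] / p_b[(1,)]  # observational: ordinary conditioning
print(f"p(a=1)           = {p_a[1]:.3f}")
print(f"p(a=1 | do(b=1)) = {p_a_do_b1:.3f}")
print(f"p(a=1 | b=1)     = {p_a_cond_b1:.3f}")
```

With these numbers, p(a=1 | b=1) is about 0.774 while p(a=1 | do(b=1)) = p(a=1) = 0.3: observing b is evidence about its cause a, but intervening on b cuts the incoming edge and leaves p(a) untouched.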


If all you are learning is a graph that gives you a Markov factorization, you have no business making claims about interventions -- interventions are a separate magisterium. You can assume that the unknown graph from which the data came is causal -- but you need to say this explicitly; the assumption will be controversial to some people; and by making it you are, I think, committing yourself to interventionist/potential-outcome language (just to describe what it means for a data-generating graph to be causal).

I have no problem with you doing Bayesian updating and getting posteriors over causal models -- I just wanted more precision on what a causal model is. A causal model is not a density factorizing with respect to a DAG -- that is a statistical model. A causal model makes assertions relating hypothetical experiments, like p(x | do(pa(x))), to observed data, like p(x | pa(x)). So your Bayesian updating operates in a world that contains more than just probability theory (which is a theory of ordinary joint densities, with no mention of do(.) or hypothetical experiments). You can in fact augment probability theory with a logical description of interventions; see for example this paper:

http://www.jair.org/papers/paper648.html


If your notion of causal model does not relate do(.) to observed data, then I don't know what you mean by a causal model. It's certainly not what I mean by it.

Comment author: Eliezer_Yudkowsky 01 December 2012 09:06:00PM 1 point

a -> b -> c -> d, with a <- u1 -> c, and a <- u2 -> d, where we do not observe u1,u2, and u1,u2 are very complicated, then we can figure out the true graph exactly by independence type techniques without having to observe u1 and u2. Note: the marginal distribution p(a,b,c,d) that came from this graph has no conditional independences at all (checkable by d-separation on a,b,c,d), so typical techniques fail.

Irrelevant question: Isn't (b || d) | a, c?

Comment author: IlyaShpitser 01 December 2012 10:42:26PM 9 points

No, because b -> c <-> a <-> d (where x <-> y abbreviates x <- u -> y for an unobserved u) is an open path if you condition on c and a: both c and a are colliders on this path, and conditioning on a collider opens it.

Comment author: Eliezer_Yudkowsky 02 December 2012 01:07:13AM 2 points

Ah, right.