
Caspian comments on Michael Nielsen explains Judea Pearl's causality - Less Wrong Discussion

Post author: gwern, 24 January 2012 07:35PM (18 points)




Comment author: Caspian 28 January 2012 02:18:41AM 2 points

"Y_{j,\cdot} is a collection of random variables"

That is not the same as there being Y-nodes. Nodes would be part of the graph structure, and so would be more visible when you look at the graph.

The only difference is whether the Y-values require their own nodes.

Comment author: Tyrrell_McAllister 28 January 2012 03:46:19PM 2 points

I see. Thanks. I was thrown off because he'd already said that he would "overload" the notation for random variables, using it also to represent nodes or sets of nodes. But what you say makes sense.

I'm not sure what the real difference is, though. The graph is just a way to depict dependencies among random variables. If you're already working with a collection of random variables with given dependencies, the graph is just, well, a graphical way to represent what you're already dealing with. Am I right, then, in thinking that the only difference between the two "approaches" is whether you bother to create this auxiliary graphical structure to represent what you're doing, instead of just working directly with the random variables X_i, Y_ij, and their dependency functions f_i?

It's easier for humans to think in terms of pictures. But if you were programming a computer to reason causally this way, wouldn't you implement the two "approaches" in essentially the same way?
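To make the "same implementation" point concrete, here is a minimal Python sketch. The names `Model`, `graph_edges`, and `sample` are hypothetical, not from Nielsen's post or Pearl. Each variable X_i is stored as a function f_i of its parents plus an independent noise value; simplifying Nielsen's collection Y_{i,\cdot} to a single standard Gaussian draw per variable is an assumption made for illustration. The graph is never stored separately; it is read off the parent lists, so both "approaches" share one data structure.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: variable name -> (parent names, f_i),
# where f_i(parent_values, noise) returns the variable's value.
Model = Dict[str, Tuple[List[str], Callable[[Dict[str, float], float], float]]]

def graph_edges(model: Model) -> List[Tuple[str, str]]:
    """Derive the causal graph: one edge parent -> child per declared dependency."""
    return [(p, child) for child, (parents, _) in model.items() for p in parents]

def sample(model: Model, rng: random.Random) -> Dict[str, float]:
    """Evaluate each X_i = f_i(parents, noise_i), computing parents before children."""
    values: Dict[str, float] = {}
    remaining = dict(model)
    while remaining:
        progressed = False
        for name, (parents, f) in list(remaining.items()):
            if all(p in values for p in parents):
                noise = rng.gauss(0.0, 1.0)  # assumed: one independent N(0, 1) draw per variable
                values[name] = f({p: values[p] for p in parents}, noise)
                del remaining[name]
                progressed = True
        if not progressed:
            raise ValueError("cycle detected; a causal model must be acyclic")
    return values

# Example model. X3 ignores its noise argument, so it is a deterministic
# function of X1 and X2.
model: Model = {
    "X1": ([], lambda pa, y: y),
    "X2": (["X1"], lambda pa, y: 2 * pa["X1"] + y),
    "X3": (["X1", "X2"], lambda pa, y: pa["X1"] - pa["X2"]),
}

print(graph_edges(model))  # [('X1', 'X2'), ('X1', 'X3'), ('X2', 'X3')]
print(sample(model, random.Random(0)))
```

In every sample X3 equals X1 - X2 exactly; the picture of the graph is just `graph_edges` applied to the same dictionary the sampler uses.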

Comment author: Caspian 29 January 2012 12:10:03PM 1 point

If you have a specified causal system you could represent it either way, yes.

Speculating on another reason he may have made the distinction: he often posed problems with specified causal graphs but unspecified functions. So he may have meant that in problems like these, with one approach you can easily specify some node values as being deterministic functions of other node values, whereas with the other approach you can't (since a specified graph rules out further random influences in one approach but not the other).
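A rough sketch of that distinction, assuming a hypothetical naming convention in which explicit noise nodes have names starting with "Y". With explicit noise nodes, a variable that has no noise parent in the graph is forced to be a deterministic function of its parents; with implicit per-node noise, the graph alone never forces this. The function name `deterministic_given_graph` is made up for illustration.

```python
from typing import Dict, List

# Hypothetical convention: in the explicit-noise style, noise variables
# are ordinary graph nodes whose names start with "Y".
def is_noise(name: str) -> bool:
    return name.startswith("Y")

def deterministic_given_graph(
    graph: Dict[str, List[str]], node: str, explicit_noise: bool
) -> bool:
    """Return True if the graph alone forces `node` to be a deterministic
    function of its parents."""
    if explicit_noise and is_noise(node):
        raise ValueError("noise nodes are random by definition")
    if not explicit_noise:
        # Implicit style: every variable carries its own unseen noise term,
        # so the graph never rules out further random influence.
        return False
    # Explicit style: randomness enters only through noise nodes, so a
    # variable with no noise parent is a deterministic function of its parents.
    return not any(is_noise(p) for p in graph[node])

explicit = {"Y1": [], "X1": ["Y1"], "X2": ["X1"]}  # child -> list of parents
implicit = {"X1": [], "X2": ["X1"]}
print(deterministic_given_graph(explicit, "X2", explicit_noise=True))   # True
print(deterministic_given_graph(implicit, "X2", explicit_noise=False))  # False
```

The same edge X1 -> X2 pins X2 down in the first style but not in the second, which is the sense in which a specified graph says more when the noise variables get their own nodes.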