Comment author: IlyaShpitser 25 June 2014 06:19:28PM * 4 points

Hi, thanks for this. I agree that this choice was not arbitrary at all!

There are a few related reasons why it was made.

(a) Pearl wisely noted that it is independences that we exploit for things like propagating beliefs around a sparse graph in polynomial time. Back when he was arguing for the use of probability in AI, the field was not yet fully on board, because people thought that reasoning probabilistically about n binary variables requires a 2^n table for the joint distribution, which is a non-starter. (Statisticians, of course, had been on board with probability for hundreds of years, even without computers -- their solution was to use clever parametric models. In some sense Bayesian networks are just another kind of clever parametric model, one that finally penetrated AI culture in the late 80s.)
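
To make the parameter-counting point concrete, here is a minimal Python sketch (my own toy illustration with made-up numbers, assuming the simplest sparse graph, a chain): the marginal of the last variable falls out of O(n) message passing, while the unfactored joint would need a 2^n table.

    import random

    random.seed(0)
    n = 30

    # Chain X1 -> X2 -> ... -> Xn, all variables binary.
    # Parameters: p(X1 = 1), plus for each edge the pair
    # (p(Xi = 1 | X_{i-1} = 0), p(Xi = 1 | X_{i-1} = 1)):
    # 1 + 2*(n - 1) numbers, versus 2**n entries for the full joint table.
    p_x1 = 0.3
    cpt = [(random.random(), random.random()) for _ in range(n - 1)]

    # Marginal p(Xn) by summing out one variable at a time -- O(n) work,
    # exploiting the independences the chain encodes.  Brute force would
    # sum the joint over 2**(n - 1) assignments.
    belief = [1.0 - p_x1, p_x1]   # marginal over the current Xi as [p(0), p(1)]
    for p1_g0, p1_g1 in cpt:
        p1 = belief[0] * p1_g0 + belief[1] * p1_g1
        belief = [1.0 - p1, p1]

    print("p(Xn = 1) =", belief[1])
    print("parameters stored:", 1 + 2 * (n - 1), "; joint table size:", 2 ** n)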

(b) We can define statistical (causal) models by either independences or dependences, but there is an asymmetry between the two that the symmetry of "presence or absence of edges in a graph" masks. An independence is a statement about a small part of the parameter space: a model defined by an independence will generally correspond to a lower-dimensional manifold sitting inside the space corresponding to the saturated model (no constraints). A model defined by dependences is just that same space with a "small part" removed, which does not lower the dimension at all. Lowering the dimension of a model is really nice in stats for a number of reasons.
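
Concretely, in the smallest possible example (standard dimension counting, written out in LaTeX):

    % Two binary variables X, Y.
    % Saturated model: all joint distributions p(x, y) -- the 3-simplex,
    % with 3 free parameters:
    \[
      \Delta = \{\, p \in \mathbb{R}^{4} : p_{xy} \ge 0,\ \textstyle\sum_{x,y} p_{xy} = 1 \,\},
      \qquad \dim \Delta = 3.
    \]
    % Independence model X \perp Y: the surface p(x, y) = p(x)\,p(y),
    % parameterized by (p(X{=}1), p(Y{=}1)) -- dimension 2.
    % "Dependence" model: \Delta with that surface removed -- still
    % dimension 3, so defining a model by a dependence buys nothing.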

(c) While conceivably we might be more interested in the presence of a causal effect than in its absence, you are absolutely right that the assumptions that allow us to equate a causal effect with some functional of the observed data generally take the form of equality constraints (e.g. "independences in something"). So it is much more useful to represent those, even if what we care about at the end of the day is the presence of an effect. We can just see how far from null the final effect number is -- we don't need a graphical representation for that. However, a graphical representation of the assumptions we are exploiting to get the effect as a functional of observed data is very handy -- this is what eventually led Jin Tian to his awesome identification algorithm on graphs.
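
As a minimal sketch of how an equality-constraint assumption turns an effect into a functional of observed data, here is back-door adjustment on the graph Z -> X, Z -> Y, X -> Y (toy numbers of my own; this is the simplest special case, not Tian's general algorithm):

    # The assumption "Z blocks all back-door paths from X to Y" is an
    # equality constraint that equates the causal effect with a functional
    # of the observed joint:  p(y | do(x)) = sum_z p(z) p(y | x, z).

    p_z = {0: 0.6, 1: 0.4}                      # p(Z = z)
    p_x_given_z = {0: 0.2, 1: 0.8}              # p(X = 1 | Z = z)
    p_y_given_xz = {(0, 0): 0.1, (0, 1): 0.5,   # p(Y = 1 | X = x, Z = z)
                    (1, 0): 0.4, (1, 1): 0.9}

    def p_y1_do_x(x):
        """Adjustment formula: sum_z p(z) p(Y = 1 | x, z)."""
        return sum(p_z[z] * p_y_given_xz[(x, z)] for z in (0, 1))

    def p_y1_given_x(x):
        """Naive conditional p(Y = 1 | X = x): confounded by Z."""
        px_z = {z: p_x_given_z[z] if x == 1 else 1 - p_x_given_z[z] for z in (0, 1)}
        norm = sum(p_z[z] * px_z[z] for z in (0, 1))
        return sum(p_z[z] * px_z[z] * p_y_given_xz[(x, z)] for z in (0, 1)) / norm

    print("adjusted effect:", p_y1_do_x(1) - p_y1_do_x(0))        # how far from null
    print("naive contrast: ", p_y1_given_x(1) - p_y1_given_x(0))  # confounded

The two printed numbers disagree, which is the whole point: the equality constraints are what license reading the first one off observational data.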

(d) There is an interesting logical structure to conditional independence, e.g. Phil Dawid's axioms for conditional independence, now known as the graphoid axioms. There is something like that for dependences (Armstrong's axioms for functional dependence in database theory?), but the structure isn't as rich.
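
For reference, the axioms as usually stated, writing X ⊥ Y | Z for "X is independent of Y given Z", with juxtaposition YW abbreviating the joint variable (Y, W); see the edits below for the compact two-axiom form:

    % Semi-graphoid axioms (needs \usepackage{amsmath}):
    \begin{align*}
      \text{(symmetry)}      &\quad X \perp Y  \mid Z \;\Rightarrow\; Y \perp X \mid Z \\
      \text{(decomposition)} &\quad X \perp YW \mid Z \;\Rightarrow\; X \perp Y \mid Z \\
      \text{(weak union)}    &\quad X \perp YW \mid Z \;\Rightarrow\; X \perp Y \mid ZW \\
      \text{(contraction)}   &\quad X \perp Y  \mid Z \,\wedge\, X \perp W \mid ZY
                              \;\Rightarrow\; X \perp YW \mid Z \\
      % Graphoids add one more, valid e.g. for strictly positive densities:
      \text{(intersection)}  &\quad X \perp Y \mid ZW \,\wedge\, X \perp W \mid ZY
                              \;\Rightarrow\; X \perp YW \mid Z
    \end{align*}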

edit: there are actually only two semi-graphoid axioms: one for symmetry and one for the chain rule.
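
That is, decomposition, weak union, and contraction above collapse into a single chain-rule biconditional:

    \begin{align*}
      \text{(symmetry)}   &\quad X \perp Y  \mid Z \;\Rightarrow\; Y \perp X \mid Z \\
      \text{(chain rule)} &\quad X \perp YW \mid Z \;\Leftrightarrow\;
                                 X \perp Y \mid Z \,\wedge\, X \perp W \mid YZ
    \end{align*}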

edit^2: the graphoid axioms are not complete for conditional independence -- in fact, by a result of Studeny, no finite axiomatization is, because conditional independence is actually kind of a nasty relation. But at least it's a ternary relation. There are far worse dragons in the cave of "equality constraints."