By (a) I mean that you can sometimes get the true graph exactly even without having to observe confounders. Actually this was sort of known already (see the FCI algorithm, or even the IC* algorithm in Pearl's book), but we can do a lot better than that. For example, if we have the true graph:
a -> b -> c -> d, with a <- u1 -> c, and a <- u2 -> d, where we do not observe u1,u2, and u1,u2 are very complicated, then we can figure out the true graph exactly by independence type techniques without having to observe u1 and u2. Note: the marginal distribution p(a,b,c,d) that came from this graph has no conditional independences at all (checkable by d-separation on a,b,c,d), so typical techniques fail.
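For anyone who wants to check the d-separation claim mechanically, here is a minimal sketch in Python. It is not taken from any particular library: the function names (`parents_of`, `d_separated`, `observed_independences`) are my own, and the test it uses is the standard "moralized ancestral graph" criterion for d-separation rather than a path-by-path search. It enumerates every pair of observed variables and every conditioning set drawn from the remaining observed variables.

```python
from itertools import combinations

# Edges (parent, child) of the example DAG; u1 and u2 are the unobserved confounders.
EDGES = [("a", "b"), ("b", "c"), ("c", "d"),
         ("u1", "a"), ("u1", "c"), ("u2", "a"), ("u2", "d")]
OBSERVED = ["a", "b", "c", "d"]


def parents_of(edges):
    """Map every node to the set of its parents."""
    pa = {}
    for parent, child in edges:
        pa.setdefault(parent, set())
        pa.setdefault(child, set()).add(parent)
    return pa


def d_separated(edges, x, y, given):
    """True if x and y are d-separated by `given` in the DAG described by `edges`."""
    pa = parents_of(edges)

    # Step 1: restrict to x, y, the conditioning set, and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(pa[v])

    # Step 2: moralize (link co-parents of each node, then drop edge directions).
    nbrs = {v: set() for v in keep}
    for v in keep:
        for p in pa[v]:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for p, q in combinations(pa[v], 2):
            nbrs[p].add(q)
            nbrs[q].add(p)

    # Step 3: delete the conditioning set and ask whether x can still reach y.
    blocked = set(given)
    seen, stack = set(), [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        if v in seen or v in blocked:
            continue
        seen.add(v)
        stack.extend(nbrs[v] - blocked)
    return True


def observed_independences(edges, observed):
    """List every (x, y, Z) with x d-separated from y given Z, using observed variables only."""
    found = []
    for x, y in combinations(observed, 2):
        rest = [v for v in observed if v not in (x, y)]
        for k in range(len(rest) + 1):
            for z in combinations(rest, k):
                if d_separated(edges, x, y, z):
                    found.append((x, y, z))
    return found
```

On the graph above, `observed_independences(EDGES, OBSERVED)` returns an empty list, which matches the claim that p(a,b,c,d) has no conditional independences. In particular `d_separated(EDGES, "b", "d", ("a", "c"))` is false, because conditioning on a and c opens the path b -> c <- u1 -> a <- u2 -> d through the two colliders.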
(b) is I guess "a subtle issue" -- but my point is about careful language use and keeping causal and statistical issues clear and separate.
A "Bayesian network" (or "belief network" -- I don't like the word Bayesian here because it confuses the issue: you can use frequentist techniques with belief networks if you want, and in fact a lot of folks do) is a joint distribution that factorizes with respect to a DAG. That's it. Nothing about causality. If there is a joint density representing a causal process where a is a direct cause of b, which is in turn a direct cause of c, then this joint density will factorize with respect to both
a -> b -> c
and
a <- b <- c
but only the former graph is causal, the latter is not. Both graphs form a "Bayesian network" with the joint density (since the density factorizes with respect to both graphs), but only one graph is a causal graph. If you want to talk about causal models, in addition to saying that there is a Markov factorization you also need to say something else -- something that makes parents into direct causes. Usually people say something like:
for every x, p(x | pa(x)) = p(x | do(pa(x))), or mention the g-formula, or the truncated factorization of do(.), or "the causal Markov condition."
But this is something that (a) needs to be said explicitly, (b) involves language beyond standard probability theory because there is a do(.), and (c) is controversial to some people. What is do(.)? It refers to a hypothetical experiment/intervention.
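To make the distinction concrete, here is a small numerical sketch in Python. The conditional probability tables are made up purely for illustration, and all of the names (`JOINT`, `reverse_factorization`, `p_a_do_b_forward`, `p_a_do_b_reverse`) are my own. It builds one joint density over binary a, b, c, checks that it factorizes with respect to both a -> b -> c and a <- b <- c, and then applies the truncated factorization for do(b) under each graph read as causal.

```python
from itertools import product

# Hypothetical CPTs for binary a -> b -> c (the numbers are illustrative only).
P_A = {0: 0.7, 1: 0.3}
P_B_GIVEN_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # P_B_GIVEN_A[a][b]
P_C_GIVEN_B = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}  # P_C_GIVEN_B[b][c]

# The observed joint p(a, b, c), built from the forward factorization.
JOINT = {(a, b, c): P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]
         for a, b, c in product((0, 1), repeat=3)}


def marginal(joint, keep):
    """Sum the joint down to the variable positions listed in `keep`."""
    out = {}
    for values, p in joint.items():
        key = tuple(values[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


P_B = marginal(JOINT, [1])
P_C = marginal(JOINT, [2])
P_AB = marginal(JOINT, [0, 1])
P_BC = marginal(JOINT, [1, 2])


def p_a_given_b(a, b):
    """Ordinary conditional p(a | b), computed from the joint."""
    return P_AB[(a, b)] / P_B[(b,)]


def reverse_factorization(a, b, c):
    """p(c) p(b | c) p(a | b): the factorization that goes with a <- b <- c."""
    p_b_given_c = P_BC[(b, c)] / P_C[(c,)]
    return P_C[(c,)] * p_b_given_c * p_a_given_b(a, b)


def p_a_do_b_forward(a, b):
    """Read a -> b -> c as causal: do(b) drops the factor p(b | a), leaving p(a)."""
    return P_A[a]


def p_a_do_b_reverse(a, b):
    """Read a <- b <- c as causal: do(b) drops the factor p(b | c), leaving p(a | b)."""
    return p_a_given_b(a, b)
```

With these numbers both factorizations reproduce the same joint, so the data cannot tell the two graphs apart as Bayesian networks. The interventional claims differ, though: p(a=1 | do(b=1)) is 0.3 if the forward graph is causal and about 0.77 if the reversed graph is, which is exactly the extra content a causal model carries.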
If all you are learning is a graph that gives you a Markov factorization you have no business making claims about interventions -- interventions are a separate magisterium. You can assume that the unknown graph from which the data came is causal -- but you need to say this explicitly, this assumption will be controversial to some people, and by making that assumption you are, I think, committing yourself to the use of interventionist/potential outcome language (just to describe what it means for a data-generating graph to be causal).
I have no problems with you doing Bayesian updating and getting posteriors over causal models -- I just wanted to get more precision on what a causal model is. A causal model is not a density factorizing with respect to a DAG -- that's a statistical model. A causal model makes assertions that relate hypothetical experiments like p(x | do(pa(x))) with observed data like p(x | pa(x)). So your Bayesian updating is operating in a world that contains more than just probability theory (which is a theory of standard joint densities, without the mention of do(.) or hypothetical experiments). You can in fact augment probability theory with a logical description of interventions, see for example this paper:
http://www.jair.org/papers/paper648.html
If your notion of causal model does not relate do(.) to observed data, then I don't know what you mean by a causal model. It's certainly not what I mean by it.
a -> b -> c -> d, with a <- u1 -> c, and a <- u2 -> d, where we do not observe u1,u2, and u1,u2 are very complicated, then we can figure out the true graph exactly by independence type techniques without having to observe u1 and u2. Note: the marginal distribution p(a,b,c,d) that came from this graph has no conditional independences at all (checkable by d-separation on a,b,c,d), so typical techniques fail.
Irrelevant question: Isn't (b || d) | a, c?
Part of the sequence: Rationality and Philosophy
Thomas Kelly
Jason Brennan
After millennia of debate, philosophers remain heavily divided on many core issues. According to the largest-ever survey of philosophers, they're split 25-24-18 on deontology / consequentialism / virtue ethics, 35-27 on empiricism vs. rationalism, and 57-27 on physicalism vs. non-physicalism.
Sometimes, they are even divided on psychological questions that psychologists have already answered: Philosophers are split evenly on the question of whether it's possible to make a moral judgment without being motivated to abide by that judgment, even though we already know that this is possible for some people with damage to their brain's reward system, for example many Parkinson's patients, and patients with damage to the ventromedial frontal cortex (Schroeder et al. 2012).1
Why are physicists, biologists, and psychologists more prone to reach consensus than philosophers?2 One standard story is that "the method of science is to amass such an enormous mountain of evidence that... scientists cannot ignore it." Hence, religionists might still argue that Earth is flat or that evolutionary theory and the Big Bang theory are "lies from the pit of hell," and philosophers might still be divided about whether somebody can make a moral judgment they aren't themselves motivated by, but scientists have reached consensus about such things.
In its dependence on masses of evidence and definitive experiments, science doesn't trust your rationality:
Sometimes, you can answer philosophical questions with mountains of evidence, as with the example of moral motivation given above. But for many philosophical problems, overwhelming evidence simply isn't available. Or maybe you can't afford to wait a decade for definitive experiments to be done. Thus, "if you would rather not waste ten years trying to prove the wrong theory," or if you'd like to get the right answer without overwhelming evidence, "you'll need to [tackle] the vastly more difficult problem: listening to evidence that doesn't shout in your ear."
This is why philosophers need rationality training even more desperately than scientists do. Philosophy asks you to get the right answer without evidence that shouts in your ear. The less evidence you have, or the harder it is to interpret, the more rationality you need to get the right answer. (As likelihood ratios get smaller, your priors need to be better and your updates more accurate.)
Because it tackles so many questions that can't be answered by masses of evidence or definitive experiments, philosophy needs to trust your rationality even though it shouldn't: we generally are as "stupid and self-deceiving" as science assumes we are. We're "predictably irrational" and all that.
But hey! Maybe philosophers are prepared for this. Since philosophy is so much more demanding of one's rationality, perhaps the field has built top-notch rationality training into the standard philosophy curriculum?
Alas, it doesn't seem so. I don't see much Kahneman & Tversky in philosophy syllabi, just lightweight "critical thinking" classes and lists of informal fallacies. But even classes in human bias might not improve things much, due to the sophistication effect: someone with a sophisticated knowledge of fallacies and biases might just have more ammunition with which to attack views they don't like. So what's really needed is regular training in habits such as genuine curiosity, mitigating motivated cognition, and so on.
(Imagine a world in which Frank Jackson's famous reversal on the knowledge argument wasn't news — because established philosophers changed their minds all the time. Imagine a world in which philosophers were fine-tuned enough to reach consensus on 10 bits of evidence rather than 1,000.)
We might also ask: How well do philosophers perform on standard tests of rationality, for example the Cognitive Reflection Test (CRT) of Frederick (2005)? Livengood et al. (2010) found, via an internet survey, that subjects with graduate-level philosophy training had a mean CRT score of 1.32. (The best possible score is 3.)
A score of 1.32 isn't radically different from the mean CRT scores found for psychology undergraduates (1.5), financial planners (1.76), Florida Circuit Court judges (1.23), Princeton undergraduates (1.63), and people who happened to be sitting along the Charles River during a July 4th fireworks display (1.53). It is also noticeably lower than the mean CRT scores found for MIT students (2.18) and for attendees of a LessWrong.com meetup group (2.69).
Moreover, several studies show that philosophers are just as prone to particular biases as laypeople (Schulz et al. 2011; Tobia et al. 2012), for example order effects in moral judgment (Schwitzgebel & Cushman 2012).
People are typically excited about the Center for Applied Rationality because it teaches thinking skills that can improve one's happiness and effectiveness. That excites me, too. But I hope that in the long run CFAR will also help produce better philosophers, because it looks to me like we need top-notch philosophical work to secure a desirable future for humanity.3
Next post: Train Philosophers with Pearl and Kahneman, not Plato and Kant
Previous post: Intuitions Aren't Shared That Way
Notes
1 Clearly, many philosophers have advanced versions of motivational internalism that are directly contradicted by these results from psychology. However, we don't know exactly which version of motivational internalism is defended by each survey participant who said they "accept" or "lean toward" motivational internalism. Perhaps many of them defend weakened versions of motivational internalism, such as those discussed in section 3.1 of May (forthcoming).
2 Mathematicians reach even stronger consensus than physicists, but they don't appeal to what is usually thought of as "mountains of evidence." What's going on there? Mathematicians and philosophers almost always agree about whether a proof or an argument is valid, given a particular formal system. The difference is that a mathematician's premises consist of axioms and of theorems already strongly proven, whereas a philosopher's premises consist of substantive claims about the world for which the evidence given is often very weak (e.g. that philosopher's intuitions).
3 Bostrom (2000); Yudkowsky (2008); Muehlhauser (2011).