Previously submitted: http://lesswrong.com/lw/9jw/michael_nielsen_explains_judea_pearls_causality/
Looks promising, but requiring the graph to be acyclic makes it difficult to model processes where feedback is involved. A workaround would be to treat each time step of a process as a distinct event. Have A(0)->B(1), where event A at time 0 affects event B at time 1, B(0)->A(1), A(0)->A(1), B(0)->B(1), A(t)->B(t+1), etc. But this gets unwieldy very quickly.
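The unrolling described above is mechanical enough to sketch in a few lines. This is purely illustrative (the two-variable loop and the number of time steps are made up), but it shows why the result is guaranteed to be acyclic:

```python
# Sketch: unroll a two-variable feedback loop A <-> B into a DAG by
# indexing each variable with a time step (names here are illustrative).
T = 4  # number of time steps to unroll

edges = []
for t in range(T - 1):
    edges.append((("A", t), ("B", t + 1)))  # A affects B one step later
    edges.append((("B", t), ("A", t + 1)))  # B affects A one step later
    edges.append((("A", t), ("A", t + 1)))  # A's own persistence
    edges.append((("B", t), ("B", t + 1)))  # B's own persistence

# The result is acyclic: every edge points strictly forward in time,
# so standard DAG machinery (d-separation, adjustment) applies.
def is_forward_in_time(edge):
    (_, t0), (_, t1) = edge
    return t0 < t1

print(all(is_forward_in_time(e) for e in edges))  # True
```

The edge list grows linearly in the number of time steps and variables, which is the "unwieldy" part; but nothing about the construction itself can introduce a cycle.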
Your workaround is correct, and not as unwieldy as it may appear at first glance. A lot of people have been using causal diagrams with this structure very successfully in situations where the data generating mechanism has loops. As a starting point, see the literature on inverse probability weighting and marginal structural models.
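To make the pointer to inverse probability weighting concrete, here is a toy simulation, not any published model: a confounder L influences both treatment A and outcome Y, so the naive contrast between treated and untreated is biased, while weighting each unit by 1/P(A=a|L) recovers the effect built into the simulation. All numbers and variable names are invented for the sketch:

```python
import random

# Toy illustration of inverse probability weighting (IPW).
random.seed(1)
n = 50_000
true_effect = 2.0  # effect of treatment A on outcome Y, by construction

data = []
for _ in range(n):
    L = random.random() < 0.5   # confounder
    p = 0.8 if L else 0.2       # P(A = 1 | L): treatment depends on L
    A = random.random() < p
    Y = true_effect * A + 3.0 * L + random.gauss(0, 1)
    data.append((L, A, Y, p))

# Naive contrast E[Y | A=1] - E[Y | A=0] is biased by the confounder L.
y1 = [y for _, a, y, _ in data if a]
y0 = [y for _, a, y, _ in data if not a]
naive = sum(y1) / len(y1) - sum(y0) / len(y0)

# IPW (Horvitz-Thompson) estimate: weight each unit by 1 / P(A = a | L).
ipw = (sum(y / p for _, a, y, p in data if a) / n
       - sum(y / (1 - p) for _, a, y, p in data if not a) / n)

print(f"naive: {naive:.2f}, IPW: {ipw:.2f}")  # naive is biased; IPW is near 2
```

In real applications P(A=1|L) is not known and has to be estimated, which is where the marginal structural model literature comes in.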
Processes with feedback loops are, in fact, a primary motivation for using causal directed acyclic graphs. If there are no feedback loops, reasoning about causality is relatively simple even without graphs; whereas if there are loops, even very smart people will get it wrong unless they are able to analyze the situation in terms of the graphical concept of 'collider stratification bias'.
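Collider stratification bias is easy to demonstrate in a tiny simulation (illustrative only; the variables are made up). Two independent causes X and Y of a common effect C are uncorrelated marginally, but become correlated once we stratify on C:

```python
import random

def corr(xs, ys):
    # Pearson correlation, stdlib-only
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(0)
# X -> C <- Y: X and Y are independent causes of the collider C
x = [random.gauss(0, 1) for _ in range(20000)]
y = [random.gauss(0, 1) for _ in range(20000)]
c = [a + b for a, b in zip(x, y)]

marginal = corr(x, y)  # near 0: no association in the full population
# Stratify on the collider: keep only samples with large C
sel = [(a, b) for a, b, s in zip(x, y, c) if s > 1]
sx, sy = zip(*sel)
stratified = corr(list(sx), list(sy))  # clearly negative: bias induced
print(round(marginal, 2), round(stratified, 2))
```

Intuitively, among samples where X + Y is large, a small X forces a large Y and vice versa, so the two independent causes look negatively correlated inside the stratum.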
The correlation/causation conundrum is a particularly frustrating one in the social sciences due to the complex interaction of variables related to human experience.
In my behavioral research, I've found that looking at time order and thinking of variables as events is a helpful way to simplify experimental designs that seek to get at causal mechanisms.
Take the smoking example:
I would consider measuring changes in strength of correlation at various points in an ongoing experiment.
Once a baseline measurement is obtained from subjects/participants who already smoke, we measure the correlation between the average number of cigarettes smoked per week and lung capacity. This way one doesn't have to randomize or control, which would mean unethically asking people to smoke if they don't already. We already have a hypothesis, based on the prior that the volume of cigarettes smoked has a strong positive correlation with lung damage, that reducing the number of cigarettes smoked would improve lung function in smokers.
But here we assume that the lifestyles of the smokers studied are relatively stable across the span of the experiment.
The researcher must take into account confounding factors that could affect lung function apart from smoking - e.g., intermittent exercise and lifestyle improvements.
In any case, following the same group of people over time is a lot easier than matching comparison groups by race/age/gender/education, or any of the other million human variables.
Once a baseline measurement is obtained from subjects/participants who already smoke, we measure the correlation between the average number of cigarettes smoked per week and lung capacity. This way one doesn't have to randomize or control, which would mean unethically asking people to smoke if they don't already. We already have a hypothesis, based on the prior that the volume of cigarettes smoked has a strong positive correlation with lung damage, that reducing the number of cigarettes smoked would improve lung function in smokers.
It was not clear from this description what exactly your design was. Is it the case that you find some smokers, and then track the relationship between lung capacity and how much they smoke per week (which varies due to [reasons])? Or do you artificially reduce the nicotine intake in smokers (which is an ethical intervention)? Or what?
Seems like a much longer (and harder to read) version of Eliezer's Causal Model post. What can I expect to get out of this one that I wouldn't find in Eliezer's version?
Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'.
-XKCD
http://www.gatsby.ucl.ac.uk/~zoubin/course05/BayesBall.pdf
An amusing name, and a linear-time algorithm. Also, amusingly, I happen to have a direct line of sight to the author while writing this post :).
In some sense, we know a priori that d-separation has to be linear time because it is a slightly fancy graph traversal. If you don't like Bayes Ball, you can use the moralization algorithm due to Lauritzen (described here:
http://www.stats.ox.ac.uk/~steffen/teaching/grad/graphicalmodels.pdf
see slide titled "alternative equivalent separation"), which is slightly harder to follow for an unaided human, but which has a very simple implementation (which reduces to a simple DFS traversal of an undirected graph you construct).
edit: fixed links, hopefully.
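For concreteness, here is a sketch of the moralization algorithm in stdlib-only Python, following the three steps on the linked slides (take the ancestral subgraph, moralize, delete the conditioning set, then check connectivity). The function name and graph representation are my own choices for the sketch:

```python
from collections import deque

def d_separated(parents, xs, ys, zs):
    """Check whether node sets xs and ys are d-separated given zs, via
    Lauritzen's moralization algorithm:
      1. restrict to the ancestral subgraph of xs | ys | zs,
      2. moralize (marry co-parents, drop edge directions),
      3. delete zs; xs and ys are d-separated iff no path connects them.
    `parents` maps each node to the set of its parents in the DAG;
    xs, ys, zs are assumed disjoint."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    # 1. ancestral subgraph: the query nodes and all their ancestors
    relevant = set()
    stack = list(xs | ys | zs)
    while stack:
        n = stack.pop()
        if n not in relevant:
            relevant.add(n)
            stack.extend(parents.get(n, ()))
    # 2. moralize: undirected child-parent and parent-parent edges
    adj = {n: set() for n in relevant}
    for n in relevant:
        ps = list(parents.get(n, ()))
        for p in ps:
            adj[n].add(p); adj[p].add(n)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                adj[p].add(q); adj[q].add(p)
    # 3. remove zs from play, then search for a connecting path (BFS)
    frontier = deque(xs)
    seen = set(xs)
    while frontier:
        n = frontier.popleft()
        if n in ys:
            return False  # found a path: not d-separated
        for m in adj[n]:
            if m not in zs and m not in seen:
                seen.add(m)
                frontier.append(m)
    return True
```

For example, in the collider A -> C <- B, `d_separated(parents, {'A'}, {'B'}, set())` is True, but conditioning on C marries A and B during moralization, so `d_separated(parents, {'A'}, {'B'}, {'C'})` is False. Everything here is a DFS/BFS over a graph linear in the size of the input, which matches the a-priori linear-time argument above.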
A post about how, for some causal models, causal relationships can be inferred without doing experiments that control one of the random variables.
If correlation doesn’t imply causation, then what does?