I just noticed your edit:
edit to express what I meant better: "Do you have any examples where lack of statistical dependence coexists with causality, and this happens without path cancellations?"
The capacitor example is one: there is one causal arrow, so no multiple paths that could cancel, and no loops. The arrow could run in either direction, depending on whether the power supply is set up to generate a voltage or a current.
Of course, I is by definition proportional to dV/dt, and this is discoverable by looking at the short-term transient behaviour. But sampled on a long timescale you just get a sequence of i.i.d. independent pairs.
For cyclic graphs, I'm not sure how "path cancellation" is defined, if it is at all. The generic causal graph of the archetypal control system has arrows D --> P --> O and R --> O --> P, there being a cycle between P and O. The four variables are the Disturbance, the Perception, the Output, and the Reference.
If P = O+D, O is proportional to the integral of R-P, R = zero, and D is a signal varying generally on a time scale slower than the settling time of the loop, then O has a correlation with D close to -1, and O and D have correlations with P close to zero.
There are only two parameters, the settling time of the loop and the timescale of variations in D. So long as the former is substantially less than the latter, these correlations are unchanged.
Would you consider this an example of path cancellation? If so, what are the paths, and what excludes this system from the scope of theorems about faithfulness violations having measure zero? Not being a DAG is one reason, of course, but have any such theorems been extended to at least some class of cyclic graphs?
Addendum:
When D is a source with a long-term Gaussian distribution, the statistics of the system are multivariate Gaussian, so correlation coefficients capture the entire statistical dependence. Following your suggestion about non-parametric dependence tests I've run simulations in which D instead makes random transitions between +/- 1, and calculated statistics such as Kendall's tau, but the general pattern is much the same. The controller takes time to respond to the sudden transitions, which allows the zero correlations to turn into weak ones, but that only happens because the controller is failing to control at those moments. The better the controller works, the smaller the correlation of P with O or D.
I've also realised that "non-parametric statistics" is a subject like the biology of non-elephants, or the physics of non-linear systems. Shannon mutual information sounds in theory like the best possible measure, but for continuous quantities I can get anything from zero to perfect prediction of one variable from the other just by choosing a suitable bin size for the data. No statistical conclusions without statistical assumptions.
Dear Richard,
I have not forgotten about your paper, I am just extremely busy until early March. Three quick comments though:
(a) People have viewed cyclic models as defining a stable distribution in an appropriate Markov chain. There are some complications, and it seems with cyclic models (unlike the DAG case) the graph which predicts what happens after an intervention, and the graph which represents the independence structure of the equilibrium distribution are not the same graph (this is another reason to treat the statistical and causal graphical mod...
Another monthly installment of the rationality quotes thread. The usual rules apply: