To reductively explain causality, it has to be explained in non-causal terms, most likely in terms of total probability distributions. Pearl explains causality in terms of causal graphs, which are created by conditionalizing the probability distribution not on X, but on do(X). What does this mean? It's easy enough to explain in causal terms: you make it so that X occurs without changing any of its causal antecedents. But of course that fails to explain causality. How could it be explained without that?
Well, first off, Pearl would remind you that reduction doesn't have to mean probability distributions. If Markov models are simple explanations of our observations, then what's the problem with using them?
The surface-level answer to your question would be to talk about how to interconvert between causal graphs and probabilities, thereby identifying any function on causal graphs (like setting the value of a node without updating its parents) with an operator on probability distributions (given the graphical model). Note that in common syntax, "conditioning" on do()-ing something means applying the operator to the probability distribution. But you can google this or find it in Pearl's book Causality.
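To make the "operator on probability distributions" idea concrete, here is a minimal sketch. It assumes a made-up three-node graph (Z -> X, Z -> Y, X -> Y) with invented binary probability tables; none of the numbers or function names come from Pearl. Given the graph, do(X = x) amounts to deleting the factor P(x | z) from the joint and clamping X, which is the usual truncated factorization.

```python
# Hypothetical model (an assumption for illustration): Z -> X, Z -> Y, X -> Y, all binary.
p_z = {0: 0.5, 1: 0.5}                                      # P(Z = z)
p_x_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}    # p_x_given_z[z][x] = P(X = x | Z = z)
p_y1_given_xz = {(0, 0): 0.1, (1, 0): 0.3,
                 (0, 1): 0.6, (1, 1): 0.8}                  # P(Y = 1 | X = x, Z = z), keyed by (x, z)


def p_y_given(x, z, y):
    """P(Y = y | X = x, Z = z)."""
    p1 = p_y1_given_xz[(x, z)]
    return p1 if y == 1 else 1.0 - p1


def joint(z, x, y):
    """Observational joint: P(z) * P(x | z) * P(y | x, z)."""
    return p_z[z] * p_x_given_z[z][x] * p_y_given(x, z, y)


def joint_do_x(z, x, y, x_set):
    """Joint under do(X = x_set): the factor P(x | z) is dropped and X is clamped."""
    if x != x_set:
        return 0.0
    return p_z[z] * p_y_given(x, z, y)


def observational(x_val):
    """P(Y = 1 | X = x_val): ordinary conditioning on the observational joint."""
    num = sum(joint(z, x_val, 1) for z in (0, 1))
    den = sum(joint(z, x_val, y) for z in (0, 1) for y in (0, 1))
    return num / den


def interventional(x_val):
    """P(Y = 1 | do(X = x_val)): marginalize the truncated joint."""
    return sum(joint_do_x(z, x, 1, x_val) for z in (0, 1) for x in (0, 1))


if __name__ == "__main__":
    print("P(Y=1 | X=1)     =", observational(1))
    print("P(Y=1 | do(X=1)) =", interventional(1))
```

With these made-up tables the two quantities differ, because ordinary conditioning on X = 1 also shifts belief about Z, while the do-operator leaves P(Z) untouched. Note that the operator is only defined relative to the assumed graph: the same joint distribution paired with a different graph would give a different result.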
I'd just like you to think more about what you want from an "explanation." What is it you want to know that would make things feel explained?
I see no problem assuming that you start out with a prior over causal models; we do the same for probabilistic models, after all. The question is how the updating works, and whether, assuming the world has a causal structure, this way of updating can identify it.