I have been digging into causality recently, and I cannot understand what the "do-operator" does. I understand how it relates to the notion of intervention ("what happens to our system when this happens"), but I don't understand how it differs from a simple "conditioned on/knowing" operator.
I feel like the semantic difference is that the "knowing" operator always relies on the probability that X happens, whereas do(X) is more deterministic.
Is my intuition right, or at least partly right? Or did I get something wrong?
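AFAIK the distinction is that conditioning on X asks what we can infer from observing X in the original, unmodified causal diagram, while do(X) asks what happens when X is forced to a value by something outside the diagram.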
For example, suppose hypothetically that I cook dinner every evening. And this process consists of these steps in order:
Some days I have lots of ingredients, and I prepare elaborate dinners. Other days I don't, and I make simple and easy dinners.
Now, suppose that on one particular evening, I am making instant ramen. We're given no other info about this evening; we know only this.
What can we conclude from this? A lot, it turns out:
This is what happens when we condition on the fact that I am making instant ramen.
If instead we apply the do-operator to that same event (making instant ramen), then:
Concretely, this models a situation where I first survey my ingredients like usual, and am then forced to make instant ramen by some force outside the universe (i.e. outside our W/X/Y/Z causal diagram).
And this is a useful concept, because we often want to know what would happen if we performed just such an intervention!
That is, we want to know whether it's a good idea to add a new cause to the diagram, forcing some variable to have values we think lead to good outcomes.
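To make the contrast concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the two variables `stocked` and `ramen`, the probabilities, and the helper names `joint` and `p_stocked_given_ramen`. It is a two-variable toy, not the full W/X/Y/Z diagram. Conditioning keeps the arrow from the shelves to the dinner choice, while the do-operator cuts that arrow and forces the choice.

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration.
P_STOCKED = 0.5                          # P(shelves are well stocked)
P_RAMEN_GIVEN = {True: 0.1, False: 0.9}  # P(I make ramen | stocked?)

def joint(do_ramen=None):
    """Return {(stocked, ramen): probability}.

    do_ramen=None: the dinner choice depends on the shelves (original diagram).
    do_ramen=True/False: the arrow shelves -> dinner is cut and dinner is forced.
    """
    table = {}
    for stocked, ramen in product([True, False], repeat=2):
        p_s = P_STOCKED if stocked else 1 - P_STOCKED
        if do_ramen is None:
            p_r = P_RAMEN_GIVEN[stocked] if ramen else 1 - P_RAMEN_GIVEN[stocked]
        else:
            p_r = 1.0 if ramen == do_ramen else 0.0
        table[(stocked, ramen)] = p_s * p_r
    return table

def p_stocked_given_ramen(table):
    """P(stocked | ramen) computed from a joint table."""
    p_ramen = table[(True, True)] + table[(False, True)]
    return table[(True, True)] / p_ramen

observed = p_stocked_given_ramen(joint())                  # conditioning
intervened = p_stocked_given_ramen(joint(do_ramen=True))   # do-operator

print(f"P(stocked | ramen)     = {observed:.2f}")
print(f"P(stocked | do(ramen)) = {intervened:.2f}")
```

With these made-up numbers, seeing me make ramen drops the probability of well-stocked shelves from 0.50 to 0.10, while forcing me to make ramen leaves it at 0.50: an intervention tells you nothing about its own upstream causes.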
To understand what would happen in such an intervention, it's wrong to condition on the outcome using the original, unmodified diagram – if we did that, we'd draw conclusions like "forcing me to make instant ramen would cause me to see relatively few ingredients on the shelves later, after dinner."
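That mistaken conclusion can be reproduced with a rough sketch. The shelf sizes, probabilities, and the names `SHELVES`, `ITEMS_USED`, `leftover`, and `expected_leftover` below are all hypothetical, picked only to make the arithmetic easy to follow. The sketch compares the expected number of items left after dinner when we condition on ramen against when ramen is forced.

```python
# Hypothetical model, for illustration only.
# Each row: (probability, items on shelf before dinner, P(ramen | that shelf)).
SHELVES = [
    (0.5, 10, 0.1),  # well stocked: I rarely choose ramen
    (0.5, 2, 1.0),   # nearly empty: I always choose ramen
]
ITEMS_USED = {"ramen": 1, "elaborate": 5}

def leftover(before, dinner):
    """Items remaining after cooking; never below zero."""
    return max(before - ITEMS_USED[dinner], 0)

def expected_leftover(rows):
    """Probability-weighted mean over (weight, leftover) rows."""
    total = sum(w for w, _ in rows)
    return sum(w * left for w, left in rows) / total

# Conditioning: keep only the ramen evenings of the original, unmodified model.
conditioned = expected_leftover(
    [(p * p_ramen, leftover(before, "ramen")) for p, before, p_ramen in SHELVES]
)

# Intervening: ramen is forced every evening, so the shelf distribution is untouched.
intervened = expected_leftover(
    [(p, leftover(before, "ramen")) for p, before, _ in SHELVES]
)

# Baseline: an ordinary evening, with no conditioning and no intervention.
baseline = expected_leftover(
    [(p * p_ramen, leftover(before, "ramen")) for p, before, p_ramen in SHELVES]
    + [(p * (1 - p_ramen), leftover(before, "elaborate")) for p, before, p_ramen in SHELVES]
)

print(f"E[leftover | ramen]     = {conditioned:.2f}")
print(f"E[leftover | do(ramen)] = {intervened:.2f}")
print(f"E[leftover]             = {baseline:.2f}")
```

Under these assumptions the conditioned value falls below the baseline, because ramen evenings are mostly empty-shelf evenings, while the intervened value rises above it, because forced ramen uses fewer ingredients than an elaborate dinner would. Reading the conditioned number as a causal effect gets the sign of the effect wrong.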