The smoking node has a causal influence on the tar node, but there's also a random factor.
I don't see how this is true of either approach.
Let X_smokes and X_tar be the random variables associated with your nodes. Under the first approach, if there are no other "exogenous" Y-nodes, then there is a function f_tar such that X_tar = f_tar(X_smokes). Doesn't that mean that whether you have tar is entirely a function of whether you smoke?
Maybe I'm mistaken about what it means for one random variable to be a function of another. We can understand X_smokes and X_tar formally as functions from the sample space Ω of people* to the state space {0,1} of Boolean values, right? Usually, to say that one function f is a function of another function g is to say that, for some function F, f(x) = F(g(x)) for each element x of the domain. That is, the value of f at x is entirely determined by the value of g at x.
If this convention applies when the functions are random variables, then to say that X_tar = f_tar(X_smokes) is to say that, for each person 𝜔, X_tar(𝜔) = f_tar(X_smokes(𝜔)). Thus, for every smoker 𝜔, X_tar(𝜔) has the same value, namely f_tar(1). That is, the answer to whether a smoker has tar in their lungs is always the same. Similarly, among all nonsmokers, the answer f_tar(0) to whether they have tar in their lungs is always the same. Therefore, whether or not you smoke determines whether or not you have tar in your lungs.
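To make the worry concrete, here is a minimal sketch in Python of the reading above. The five-person sample space, the values of X_smokes, and the particular choice of f_tar are all made up for illustration; nothing here is taken from Nielsen's essay.

```python
# Hypothetical sample space Omega: five people (names are made up).
omega = ["ann", "bob", "cat", "dan", "eve"]

# X_smokes as an ordinary function from people to {0, 1} (values are assumptions).
X_smokes = {"ann": 1, "bob": 1, "cat": 0, "dan": 0, "eve": 1}


def f_tar(smokes):
    """An arbitrary deterministic map {0, 1} -> {0, 1}; this choice is an assumption."""
    return smokes


# X_tar = f_tar(X_smokes), read pointwise: X_tar(w) = f_tar(X_smokes(w)).
X_tar = {w: f_tar(X_smokes[w]) for w in omega}


def tar_values_within(smokes_value):
    """Set of tar values observed among people with the given smoking status."""
    return {X_tar[w] for w in omega if X_smokes[w] == smokes_value}


smoker_tar_values = tar_values_within(1)
nonsmoker_tar_values = tar_values_within(0)
```

Whatever f_tar is chosen, each of the two sets has exactly one element, which is the point of the argument: under this reading there is no variation in tar among smokers, or among nonsmokers.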
Do people mean something different when they say that one random variable is a function of another? If so, what do they mean? If not, where is there room for a "random factor" when there are no exogenous Y-variables, even under the first approach described by Nielsen?
* ETA: I originally had the sample space Ω being the set of all possible worlds, which seems wrong on reflection.
Mostly the essay is careful not to flatly say that a node value X_1 is a function of a node value X_2. Sometimes it is a random function of X_2 (note the qualifier "random"); sometimes it is a function of X_2 and a random value Y_1, where Y_1 does not have its own node (and so does not increase the size of the graph). And of course there is an exception when proposing the alternate approach, where the nodes are divided into the random ones and those which are a deterministic function of other node values.
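As a rough illustration of the "function of X_2 and a random value Y_1" reading, here is a small Python sketch. The sample size, the probabilities 0.3, 0.9 and 0.1, the name Y_tar, and the form of f_tar are all assumptions made up for the example, not anything stated in the essay.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical sample space: 1000 people, labelled by index.
omega = range(1000)

# X_smokes: about 30% smokers (the 0.3 is an assumption).
X_smokes = {w: int(rng.random() < 0.3) for w in omega}

# Y_tar: a per-person random value with no node of its own in the graph.
Y_tar = {w: rng.random() for w in omega}


def f_tar(smokes, y):
    """Deterministic in (smokes, y); the thresholds 0.9 and 0.1 are assumptions."""
    threshold = 0.9 if smokes else 0.1
    return int(y < threshold)


# X_tar is a function of the pair (X_smokes, Y_tar), not of X_smokes alone.
X_tar = {w: f_tar(X_smokes[w], Y_tar[w]) for w in omega}

smoker_tar_values = {X_tar[w] for w in omega if X_smokes[w] == 1}
nonsmoker_tar_values = {X_tar[w] for w in omega if X_smokes[w] == 0}
```

Here X_tar is still fully determined once both arguments are given, but among smokers (and among nonsmokers) both tar values should show up, because Y_tar varies from person to person. That is where the "random factor" lives in this sketch.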
In my example, I would not say the tar node v...
Michael Nielsen has posted a long essay explaining his understanding of the Pearlian causal DAG model. I understood no more than half of it, but that's much more than I got out of a few other papers. Strongly recommended for anyone interested in the topic.