Causal networks do not make an iid assumption.
Yeah, I guess that's way too strong; there are also plenty of alternative assumptions that justify using them.
What is a sample? How do we know two numbers (or other strings) came from the same sample?
I think we just have to assume this problem is solved. Whenever we use causal networks in practice, we know what a sample is. You can try to weaken this and see if you still get anything useful, but this is very different than 'conditioning on time' as you present in the post.
Since the association contains information separate from the values themselves, how can we incorporate that information into the framework explicitly?
Bayes' theorem? If we have a strong enough prior and enough information to reverse-engineer the association reasonably well, then we might be able to learn something. If you're running a clinical trial and you recorded which drugs were given out, but not to which patients, then you need other information, such as a prior about which side effects each drug causes and measurements of side effects that are associated with specific patients. Otherwise you just don't have the data necessary to construct the model.
Exactly! We want to incorporate the association information using Bayes' theorem. If you have zero information about the mapping, then your knowledge is invariant under permutations of the data sets (e.g., swapping T0 with T1). That implies that your prior over the associations is uniform over the possible permutations (note that a permutation uniquely specifies an association and vice versa). So, when calculating the correlation, you have to average over all permutations, and the correlation turns out to be identically zero for all possible data. No associ...
In a recent comment, I suggested that correlations between seemingly unrelated periodic time series share a common cause: time. However, the math disagrees... and suggests a surprising alternative.
Imagine that we took measurements from a thermometer on my window and a ridiculously large tuning fork over several years. The first set of data is temperature T over time t, so it looks like a list of data points [(t0, T0), (t1, T1), ...]. The second set of data is mechanical strain e in the tuning fork over time, so it looks like a list of data points [(t0, e0), (t1, e1), ...]. We line up the temperature and strain data according to time, yielding [(T0, e0), (T1, e1), ...] and find a significant correlation between the two, since they happen to have similar periodicity.
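To make the setup concrete, here is a minimal sketch of the line-up-by-time step. Everything numeric in it is made up for illustration: the one-year period, the amplitudes, the phase offset of 0.5, and the names `temperature`, `strain`, and `pearson` are all assumptions of mine, not real measurements.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: three years of daily samples, time measured in years.
times = [day / 365.0 for day in range(3 * 365)]

# Two made-up signals that merely share a one-year period.
temperature = [(t, 10.0 + 15.0 * math.sin(2 * math.pi * t)) for t in times]
strain = [(t, 1e-4 * math.sin(2 * math.pi * t + 0.5)) for t in times]

# Line the two series up by time: [(T0, e0), (T1, e1), ...]
strain_by_time = dict(strain)
paired = [(T, strain_by_time[t]) for t, T in temperature]

r_aligned = pearson([T for T, _ in paired], [e for _, e in paired])
print(f"correlation after aligning by time: {r_aligned:.3f}")
```

Under these assumptions the two sinusoids differ only by a phase shift, so the aligned correlation comes out large even though nothing in the code links temperature to strain except the shared time axis.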
Recalling Judea Pearl, we suggest that there is almost certainly some causal relationship between the temperature outside the window and the strain in the ridiculously large tuning fork. Common sense suggests that neither causes the other, so perhaps they have some common cause? The only other variable in the problem is time, so perhaps time is the common cause. This sort of makes sense, since changes in time intuitively seem to cause the changes in temperature and strain.
Let's check that intuition with some math. First, imagine that we ignore the time data. Now we just have a bunch of temperature data points [T0, T1, ...] and strain data points [e0, e1, ...]. In fact, in order to truly ignore time data, we cannot even order the points according to time! But that means that we no longer have any way to line up the points T0 with e0, T1 with e1, etc. Without any way to match up temperature points to corresponding strain points, the temperature and strain data are randomly ordered, and the correlation disappears!
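The "correlation disappears" claim can be checked by brute force on a small data set. The sketch below uses six made-up temperature and strain values (the numbers and the helper `pearson` are my own illustrative assumptions) and averages the correlation over every possible way of matching strain points to temperature points.

```python
import itertools
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical measurements, listed here in their true time order.
temps = [12.0, 18.5, 24.0, 19.0, 11.5, 6.0]
strains = [0.8, 1.9, 2.7, 2.1, 0.9, 0.2]

# With the time ordering known, the pairing is fixed.
r_aligned = pearson(temps, strains)

# With no ordering information, every pairing is equally plausible,
# so average the correlation over all 6! = 720 permutations.
perm_rs = [pearson(temps, list(p)) for p in itertools.permutations(strains)]
r_averaged = sum(perm_rs) / len(perm_rs)

print(f"aligned by time:         {r_aligned:.3f}")
print(f"averaged over pairings:  {r_averaged:.3e}")
```

The averaged value is zero up to floating-point error, and not just for these numbers: averaged over all permutations, each temperature point is paired with the mean strain, so the covariance cancels for any data.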
We have just performed a d-separation. When time t was known (i.e., controlled for), the variables T and e were correlated. But when t was unknown, the variables were uncorrelated. Now, let's wave our hands a little and equate correlation with dependence. If time were a common cause of temperature and strain, then we should see that T and e are correlated without knowledge of time, but the correlation disappears when controlling for time. However, we see exactly the opposite structure: controlling for t induces the correlation. This pattern is called a "collider", and it implies that time is a common effect of temperature and strain. Rather than time causing the oscillations in our time series, the oscillations in our time series cause time.
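For readers who haven't seen the collider pattern before, here is a small simulation of the generic version, as a sketch only. The variables `x`, `y`, and `z` are hypothetical stand-ins (two independent causes and their common effect), not the temperature, strain, and time data; the sample size, seed, and stratum width are arbitrary choices.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

# Two independent causes and their common effect (the collider).
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
zs = [x + y for x, y in zip(xs, ys)]

# Without knowledge of z, x and y are (nearly) uncorrelated.
r_marginal = pearson(xs, ys)

# Controlling for z, approximated by keeping a narrow slice of z values,
# induces a strong correlation between x and y.
stratum = [(x, y) for x, y, z in zip(xs, ys, zs) if abs(z) < 0.1]
r_conditional = pearson([x for x, _ in stratum], [y for _, y in stratum])

print(f"without conditioning on z: {r_marginal:+.3f}")
print(f"within a narrow z slice:   {r_conditional:+.3f}")
```

This is the signature used in the argument above: no correlation when the third variable is unknown, and a correlation that appears once it is held fixed.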
Whoa. Now that the math has given us the answer, let's step back and try to make sense of it. Imagine that everything in the universe stopped moving for some time, and then went back to moving exactly as before. How could we measure how much time passed while the universe was stopped? We couldn't. For all practical purposes, if nothing changes, then time has stopped. Time, then, is an effect of motion, not vice versa. This is an old idea from philosophy/physics (I think I originally read it in one of Stephen Hawking's books). We've just rederived it.
But we may still wonder: what caused the correlation between temperature and strain? A common effect cannot cause a correlation, so where did it come from? The answer is that there was never any correlation between temperature and strain to begin with. Given just the temperature and strain data, with no information about time (e.g. no ordering or correspondence between points), there was no correlation. The correlation was induced by controlling for time. So the correlation is only logical; there is no physical cause relating the two, at least within our model.