I'm a little ashamed to admit I only read "Why Correlation Usually ≠ Causation" yesterday. It's very, very good, and you should read it too.
My essential takeaway from it is this: you can find nonzero correlations between almost anything you care to measure. However, it seems unlikely that the number of causal relationships in the universe scales anywhere near proportionally with the number of correlational ones.
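To make the first half of that concrete, here's a minimal sketch (my own illustration in Python, not something from the linked post): generate a pile of completely independent random walks and count how many pairs nevertheless show a "strong" sample correlation.

```python
# Independent random walks: no series causes any other, yet many pairs
# end up with large sample correlations anyway.
import numpy as np

rng = np.random.default_rng(0)
n_series, n_steps = 50, 200

# Each row is an independent random walk.
walks = rng.normal(size=(n_series, n_steps)).cumsum(axis=1)

corr = np.corrcoef(walks)             # pairwise sample correlations
iu = np.triu_indices(n_series, k=1)   # count each pair once
spurious = np.abs(corr[iu]) > 0.5     # arbitrary "strong" threshold

print(f"{spurious.mean():.0%} of {len(iu[0])} causally unrelated pairs "
      f"have |r| > 0.5")
```

Random walks make the effect especially dramatic because they drift, but the broader point stands: with enough measured variables, strong sample correlations are cheap.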
Somehow, this feels like the wrong question for me to be asking. It feels ontology-flavored, in a way that doesn't make it a great match for how I normally think about statistics, and I would appreciate some book recommendations on the subject in the comments. But first, let me try to explain my thinking on this.
Start with the "base" layer of reality: the movement of atoms, or electrons, or strings, or what-have-you. If we are watching the actions and reactions of that layer from afar, then it seems to me that we have the best possible environment for running a few experiments to demonstrate correlation, and then a few more to demonstrate causation. While we can never be 100% sure, we can asymptotically approach certainty in that world. So far, so good; there's a reason experimental physics can get so precise with its predictions.
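By "asymptotically approach certainty" I mean something like the following toy Bayesian picture (my own sketch, with made-up numbers): each clean experiment nudges the posterior toward the true hypothesis, and the posterior gets arbitrarily close to 1 without ever reaching it.

```python
# Bayesian updating on repeated clean experiments: posterior mass on the
# true hypothesis tends to 1, but never equals 1 after finitely many trials.
import numpy as np

rng = np.random.default_rng(1)

# Two hypotheses about an experiment's success probability; H1 is true.
p_true, p_h0, p_h1 = 0.6, 0.5, 0.6
posterior_h1 = 0.5                      # even prior odds

for n in range(1, 1001):
    outcome = rng.random() < p_true     # one experimental trial
    like_h0 = p_h0 if outcome else 1 - p_h0
    like_h1 = p_h1 if outcome else 1 - p_h1
    posterior_h1 = (like_h1 * posterior_h1) / (
        like_h1 * posterior_h1 + like_h0 * (1 - posterior_h1))
    if n in (10, 100, 1000):
        print(f"after {n:4d} trials: P(H1) = {posterior_h1:.4f}")
```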
When you go one layer of abstraction up -- to molecules, if our base layer was "atoms" -- it seems to me that the difficulty of ascertaining causation should suddenly skyrocket. There are many more confounding variables and possibilities, which make designing an adequate experiment much harder. In addition, it is harder to define "molecule" precisely than it was to define "atom". How far apart do we move the constituent atoms before a molecule becomes a non-molecule, for example? That seems like a question you sometimes have to answer differently depending on the scenario.
The experiments you run for correlation between molecules, on the other hand, might be harder, but I don't get the feeling they face the same kind of... combinatorial-superexplosion-y additional difficulty that an experiment designed for causation has to handle.
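One way to put a number on that feeling (my own back-of-the-envelope, not anything from the post): with n variables there are only n(n-1)/2 pairwise correlations to check, but the number of candidate causal structures -- directed acyclic graphs over those variables -- grows super-exponentially.

```python
# Pairwise correlations grow quadratically; possible causal DAGs grow
# super-exponentially (Robinson's recurrence, OEIS A003024).
from math import comb

def num_dags(n: int) -> int:
    """Number of labeled directed acyclic graphs on n nodes."""
    a = [1]  # a[0] = 1 by convention
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            total += (-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
        a.append(total)
    return a[n]

for n in (2, 4, 6, 8, 10):
    print(f"n={n:2d}: {comb(n, 2):3d} correlation pairs vs "
          f"{num_dags(n):,} possible causal DAGs")
```

At n=10 that's 45 correlations against roughly 4.2 x 10^18 candidate DAGs, which matches the feeling that causal questions get qualitatively harder, not just linearly harder.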
You should probably try to account for things like thermal noise and trace impurities if you have to, but past a certain point it's sort of okay to let go of the reins and just say "we can do more correlation tests later". The claim underneath that claim is that the things which muck up the data are mostly due to random chance, so if we do the experiment again under different conditions, we will get a different set of random circumstances wrapping around it.
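That underlying claim is essentially the law of large numbers plus an independence assumption, and it's easy to sketch where it holds and where it fails (my own toy numbers): fresh random noise averages away over repeated experiments, but a systematic error shared by every run does not.

```python
# Fresh noise washes out with repetition; a shared systematic bias does not.
import numpy as np

rng = np.random.default_rng(2)
true_value = 1.0

for n_runs in (10, 1000, 100_000):
    noise = rng.normal(0, 0.5, n_runs)   # fresh randomness in every run
    bias = 0.3                           # same systematic error in every run
    noisy_only = (true_value + noise).mean()
    biased = (true_value + noise + bias).mean()
    print(f"{n_runs:6d} runs: noise-only estimate {noisy_only:.3f}, "
          f"with systematic bias {biased:.3f}")
```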
This problem feels like it recurs every time you go up a level, which is why it concerns me so much. By the time you get to the level of dealing with human beings in medicine, it feels to me that the difficulty of determining causation must be so vast as to be almost not worth the effort; and yet, at the same time, that intuition feels clearly wrong, because there was a lot of low-hanging fruit in the world of medicine -- vaccines being the example par excellence. But on the other hand, vaccines operate on a relatively simple causal mechanism! Maybe it shouldn't be surprising that such low-hanging fruit exists; what would be truly impressive would be finding an easy cure to some disease founded on principles that only show themselves at the level of reasoning about humans themselves, the same way we usually reason with molecules-as-primitives instead of atoms-as-primitives when we start to do biochemistry.
I apologize if this isn't a terribly clear explanation of what I'm getting at. If anything in here strikes you as similar to a problem you have thought about yourself and have read up on, let me know. At the least, I should be able to come back within a few months and properly pose my question.
As for a book recommendation: The Book of Why (review here) gives a well-explained intro to some modern (or maybe the cool kids have already moved on to something else) reasoning about differentiating causation from correlation.
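The book's central move is distinguishing observing X from intervening on X, which is easy to see in a toy simulation (my own illustration, not an example from the book): a confounder makes X and Y correlate, yet setting X by fiat does nothing to Y.

```python
# A confounder Z drives both X and Y, so X and Y correlate observationally,
# but forcing X (do(X=x)) leaves Y untouched because Y never depended on X.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Observational world: Z causes both X and Y; X does not cause Y.
z = rng.normal(size=n)
x = z + rng.normal(scale=0.5, size=n)
y = z + rng.normal(scale=0.5, size=n)
print(f"observational corr(X, Y) = {np.corrcoef(x, y)[0, 1]:.2f}")

# Interventional world: set X by fiat (do(X=2)), cutting the Z -> X arrow.
# Y's structural equation never mentioned X, so it is unchanged.
y_do = z + rng.normal(scale=0.5, size=n)
print(f"E[Y] among units observed with X > 2: {y[x > 2].mean():.2f}")
print(f"E[Y | do(X=2)]:                       {y_do.mean():.2f}")
```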