There is correlation without causation, but there is also causation without correlation. Why, and when, does the latter happen? Is there one reason or several, and if several, how can they be structured, and by what? If one of the observables does not change because a controlling observer (prediction + feedback) holds it constant, there is no way to establish a correlation. I am displeased by Bayesian probability combined with graphs (DAGs); it so obviously lacks a nonlinear activation function. If two random binary streams feed into an XOR gate, the output is uncorrelated with either stream, even though there is plenty of change to observe and the causality is perfect.
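A quick simulation of the XOR case, assuming two independent fair-coin bit streams (the stream length and seed are arbitrary choices for illustration). Each input shows near-zero Pearson correlation with the output, yet conditioning on one input reveals that the other one fully determines the output.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 100_000
a = [rng.randint(0, 1) for _ in range(n)]  # first random bit stream
b = [rng.randint(0, 1) for _ in range(n)]  # second, independent bit stream
out = [x ^ y for x, y in zip(a, b)]        # XOR gate output

# Unconditionally, each input looks unrelated to the output.
r_a = pearson(a, out)
r_b = pearson(b, out)

# Conditioning on one input exposes the causal link from the other:
# with a == 0 the output copies b, with a == 1 it inverts b.
idx0 = [i for i in range(n) if a[i] == 0]
idx1 = [i for i in range(n) if a[i] == 1]
r_b_given_a0 = pearson([b[i] for i in idx0], [out[i] for i in idx0])
r_b_given_a1 = pearson([b[i] for i in idx1], [out[i] for i in idx1])

print(f"corr(a, out)          = {r_a:+.4f}")
print(f"corr(b, out)          = {r_b:+.4f}")
print(f"corr(b, out | a == 0) = {r_b_given_a0:+.4f}")
print(f"corr(b, out | a == 1) = {r_b_given_a1:+.4f}")
```

The two conditional correlations are +1 and -1; pooled over both values of `a` they cancel, which is why the unconditional correlation sits near zero.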
Correlation (in the usual Pearson sense) only measures linear association. Many causal functions are non-linear.
Think of medicine. X is the dosage, Y is the improvement in health. If the dose is too low, you will get no response. If the dose is within a good range, health improves. If the dose is too high, you will begin to get even sicker. If data were gathered all along this inverted parabola, the correlation might be zero. But there is still a causal relationship between dosage and health.
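A minimal numerical sketch of the dosage example, assuming a hypothetical response `health = -(dose - 5)**2` with doses sampled evenly and symmetrically around the optimum of 5 (the function, units and numbers are made up for illustration). Health is an exact, noise-free function of dose, yet the Pearson correlation comes out at essentially zero.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def health_response(dose):
    """Hypothetical inverted-parabola response: health peaks at dose 5."""
    return -(dose - 5.0) ** 2


doses = [i / 10 for i in range(101)]          # 0.0, 0.1, ..., 10.0, symmetric about 5
health = [health_response(d) for d in doses]  # exact causal function of dose

r = pearson(doses, health)
print(f"corr(dose, health) = {r:+.6f}")  # essentially zero, up to floating-point rounding
```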
Thus you can have causation without correlation.
You can probably think of many such functions with diminishing or negative returns as the input (the "dosage") increases, e.g. years of education vs. lifetime earnings.
Whether you see a positive, negative, or null correlation can depend on which part of the response function you sample. In "real world" data, observations might be bunched up in certain regions of the response function. So for the "average person/instance" you can determine whether there is a correlation or not, and then say that this is basically the causal effect (for the average person/instance).
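The same hypothetical parabola, sampled from different dose ranges, shows how the sign of the correlation depends on where the data sit. The three ranges below are arbitrary choices standing in for populations bunched up in different regions.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def health_response(dose):
    """Hypothetical inverted-parabola response: health peaks at dose 5."""
    return -(dose - 5.0) ** 2


def corr_over(lo, hi, steps=100):
    """Correlation between dose and health when doses are sampled evenly on [lo, hi]."""
    doses = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return pearson(doses, [health_response(d) for d in doses])


for label, lo, hi in [("under-dosed", 0.0, 5.0),
                      ("over-dosed", 5.0, 10.0),
                      ("full range", 0.0, 10.0)]:
    print(f"{label:<12} doses {lo:>4}-{hi:<4}  r = {corr_over(lo, hi):+.3f}")
```

Same causal function in every row; only the sampled region changes, and with it the correlation goes from strongly positive to strongly negative to roughly zero.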
But if you want accuracy and precision over concision, you will use a more complex model.
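One crude reading of "a more complex model", sketched on noisy data from the same hypothetical parabola: instead of correlating health with dose, correlate it with the squared distance from a candidate optimum `c`, and pick the `c` that gives the strongest (most negative) correlation. The noise level, seed, and grid of candidates are assumptions; the grid search only stands in for a proper quadratic regression.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(2)  # fixed seed so the run is reproducible
doses = [i / 10 for i in range(101)]
# Hypothetical noisy response whose optimum (5) the analyst does not know in advance.
health = [-(d - 5.0) ** 2 + rng.gauss(0.0, 1.0) for d in doses]

# Linear view: correlate health with dose directly.
r_linear = pearson(doses, health)


def r_quadratic(c):
    """Correlation of health with the squared distance from a candidate optimum c."""
    return pearson([(d - c) ** 2 for d in doses], health)


candidates = [k / 2 for k in range(21)]     # 0.0, 0.5, ..., 10.0
best_c = min(candidates, key=r_quadratic)   # most negative correlation wins
r_best = r_quadratic(best_c)

print(f"linear view:    corr(dose, health) = {r_linear:+.3f}")
print(f"quadratic view: optimum ~ {best_c}, corr((dose - c)^2, health) = {r_best:+.3f}")
```

The extra parameter `c` is the price of the non-linear model: it is less concise, but it recovers a strong relationship that the linear correlation misses.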
Concise models are better memes than complex models, however, and so we are flooded with linear models or binary models.
Yes, but I am posing the WHY question. In this case it is just an averaging effect, not a feedback controller.
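The feedback-controller case from the original question can be simulated too. This is a toy model with every number assumed: a random disturbance pushes on a temperature, and an idealised controller that predicts the disturbance applies exactly the opposite heating, so only small sensor noise remains in the temperature. The heater has a direct causal effect on the temperature, yet the two are essentially uncorrelated, while the heater correlates perfectly (negatively) with the disturbance it cancels.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(1)  # fixed seed so the run is reproducible
n = 50_000
disturbance = [rng.gauss(0.0, 1.0) for _ in range(n)]
heater = [-d for d in disturbance]                 # idealised controller: predicts and cancels the disturbance
sensor_noise = [rng.gauss(0.0, 0.1) for _ in range(n)]

# Causal model (assumed): each unit of heating raises the temperature by one unit.
temperature = [d + h + e for d, h, e in zip(disturbance, heater, sensor_noise)]

r_heater_temp = pearson(heater, temperature)
r_heater_dist = pearson(heater, disturbance)

print(f"corr(heater, temperature) = {r_heater_temp:+.4f}")
print(f"corr(heater, disturbance) = {r_heater_dist:+.4f}")
```

With the sensor noise set to zero the temperature never changes at all; its variance is then zero, the correlation is undefined, and `pearson` above would raise `ZeroDivisionError`. That is the situation described in the question: the controlled observable carries no variation from which a correlation could be estimated.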