FrameBenignly comments on Open thread, Dec. 21 - Dec. 27, 2015 - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (230)
You're using correlation in what I would consider a weird way. Randomization is intended to control for selection effects to reduce confounds, but when somebody says correlational study I get in my head that they mean an observational study in which no attempt was made to determine predictive causation. When an effect shows up in a nonrandomized study, it's not that you can't determine whether the effect was causative; it's that it's more difficult to determine whether the causation was due to the independent variable or an extraneous variable unrelated to the independent variable. It's not a question of whether the effect is due to correlation or causation, but whether the relationship between the independent and dependent variable even exists at all.
(1) Observational studies are almost always attempts to determine causation. Sometimes the investigators try to pretend that they aren't, but they aren't fooling anyone, least of all the general public. I know they are attempting to determine causation because nobody would be interested in the results of the study unless they were interested in causation. Moreover, I know they are attempting to determine causation because they do things like "control for confounding". This procedure is undefined unless the goal is to estimate a causal effect
(2) What do you mean by the sentence "the study was causative"? Of course nobody is suggesting that the study itself had an effect on the dependent variable?
(3) Assuming that the statistics were done correctly and that the investigators have accounted for sampling variability, the relationship between the independent and dependent variable definitely exists. The correlation is real, even if it is due to confounding. It just doesn't represent a causal effect
You are assuming a couple of things which are almost always true in your (medical) field, but are not necessarily true in general. For example,
Nope. Another very common reason is to create a predictive model without caring about actual causation. If you can't do interventions but would like to forecast the future, that's all you need.
That further assumes your underlying process is stable and is not subject to drift, regime changes, etc. Sometimes you can make that assumption, sometimes you cannot.
You'd also like a guarantee that others can't do interventions, or else your measure could be gamed. (But if there's an actual causal relationship, then 'gaming' isn't really possible.)
(1) I just think calling a nonrandomized study a correlational study is weird.
(2) I meant to say effect; not study; fixed
(3) If something is caused by a confounding variable, then the independent variable may have no relationship with the dependent variable. You seem to be using correlation to mean the result of an analysis, but I'm thinking of it as the actual real relationship which is distinct from causation. So y=x does not mean y causes x or that x causes y.
I don't understand what you mean by "real relationship". I suggest tabooing the terms "real relationship" and "no relationship".
I am using the word "correlation" to discuss whether the observed variable X predicts the observed variable Y in the (hypothetical?) superpopulation from which the sample was drawn. Such a correlation can exist even if neither variable causes the other.
If X predicts Y in the superpopulation (regardless of causality), the correlation will indeed be real. The only possible definition I can think of for a "false" correlation is one that does not exist in the superpopulation, but which appears in your sample due to sampling variability. Statistical methodology is in general more than adequate to discuss whether the appearance of correlation in your sample is due to real correlation in the superpopulation. You do not need causal inference to reason about this question. Moreover, confounding is not relevant.
Confounding and causal inference are only relevant if you want to know whether the correlation in the superpopulation is due to the causal effect of X on Y. You can certainly define the causal effect as the "actual real relationship", but then I don't understand how it is distinct from causation.
I just realized the randomized-nonrandomized study was just an example and not what you were talking about.