
FrameBenignly comments on Open thread, Dec. 21 - Dec. 27, 2015 - Less Wrong Discussion

2 Post author: MrMind 21 December 2015 07:56AM




Comment author: gwern 22 December 2015 08:50:04PM 7 points [-]

Correlation != causation: returning to my old theme (latest example: is exercise/mortality entirely confounded by genetics?), what is the right way to model various comparisons?

By which I mean, consider a paper like "Evaluating non-randomised intervention studies", Deeks et al 2003 which does this:

In the systematic reviews, 8 studies compared results of randomised and non-randomised studies across multiple interventions using metaepidemiological techniques. A total of 194 tools were identified that could be or had been used to assess non-randomised studies. 60 tools covered at least 5 of 6 pre-specified internal validity domains. 14 tools covered 3 of 4 core items of particular importance for non-randomised studies. 6 tools were thought suitable for use in systematic reviews. Of 511 systematic reviews that included nonrandomised studies, only 169 (33%) assessed study quality. 69 reviews investigated the impact of quality on study results in a quantitative manner. The new empirical studies estimated the bias associated with non-random allocation and found that the bias could lead to consistent over- or underestimations of treatment effects, also the bias increased variation in results for both historical and concurrent controls, owing to haphazard differences in case-mix between groups. The biases were large enough to lead studies falsely to conclude significant findings of benefit or harm. ...Conclusions: Results of non-randomised studies sometimes, but not always, differ from results of randomised studies of the same intervention. Nonrandomised studies may still give seriously misleading results when treated and control groups appear similar in key prognostic factors. Standard methods of case-mix adjustment do not guarantee removal of bias. Residual confounding may be high even when good prognostic data are available, and in some situations adjusted results may appear more biased than unadjusted results.

So we get pairs of studies, more or less testing the same thing except one is randomized and the other is correlational. Presumably this sort of study-pair dataset is exactly the kind of dataset we would like to have if we wanted to learn how much we can infer causality from correlational data.

But how, exactly, do we interpret these pairs? If one study finds a CI of 0 to 0.5 and the counterpart finds 0.45 to 1.0, is that confirmation or rejection? If one study finds -0.5 to 0.1 and the other 0 to 0.5, is that confirmation or rejection? What if they are very well powered and the pair looks like 0.2 to 0.3 and 0.4 to 0.5? A criterion of overlapping confidence intervals is not what we want.
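A toy sketch in Python (using the hypothetical intervals above) makes the failure mode concrete: the overlap criterion calls the first two pairs "confirmation" and the third "rejection", which is backwards for the well-powered, same-signed third pair.

```python
# A minimal sketch of why "do the confidence intervals overlap?" is the
# wrong criterion for comparing a randomized study with its
# correlational counterpart. Intervals are (low, high) tuples.

def overlap(ci_a, ci_b):
    """True if two (low, high) confidence intervals overlap."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Barely-overlapping intervals read as "confirmation"...
print(overlap((0.0, 0.5), (0.45, 1.0)))   # True
# ...as does a pair with one interval straddling zero...
print(overlap((-0.5, 0.1), (0.0, 0.5)))   # True
# ...while two tight, same-signed, similar estimates read as "rejection".
print(overlap((0.2, 0.3), (0.4, 0.5)))    # False
```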

We could try to get around this by making a very strict criterion: 'what fraction of pairs have confidence intervals excluding zero for both studies, with the two studies opposite-signed?' This seems good: if one study 'proves' that X is helpful and the other study 'proves' that X is harmful, then that's as clear-cut a case of correlation != causation as one could hope for. With a pair of studies like -0.5 to -0.1 and +0.1 to +0.5, there is certainly a big problem.
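The strict criterion is mechanical to compute; here is a sketch in Python using made-up (low, high) interval pairs, not real studies:

```python
# Strict criterion: flag a study-pair only when both intervals exclude
# zero AND the two effects have opposite signs.

def opposite_signed_significant(ci_corr, ci_rand):
    """True if both CIs exclude zero and point in opposite directions."""
    both_exclude_zero = (ci_corr[0] * ci_corr[1] > 0) and (ci_rand[0] * ci_rand[1] > 0)
    opposite_sign = (ci_corr[0] > 0) != (ci_rand[0] > 0)
    return both_exclude_zero and opposite_sign

pairs = [  # (correlational CI, randomized CI) -- illustrative numbers
    ((-0.5, -0.1), (0.1, 0.5)),   # clear-cut sign flip: flagged
    ((0.0, 0.5),   (0.45, 1.0)),  # one CI touches zero: not flagged
    ((0.2, 0.3),   (0.4, 0.5)),   # same sign: not flagged
]
flagged = sum(opposite_signed_significant(c, r) for c, r in pairs)
print(f"{flagged}/{len(pairs)} pairs flagged as correlation != causation")
```

Note how only the sign-flip pair gets flagged; the criterion is silent about the third pair even though the correlational estimate there is badly off.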

The problem is that this criterion is so strict that we would hardly ever conclude a particular case was correlation != causation (few of the known examples are so well-powered and clear-cut), leading to systematic overoptimism. It also inherits the typical problems of NHST, like ignoring costs: if exercise reduces mortality by 50% in correlational studies and 5% in randomized studies, then to some extent correlation = causation, but the massive overestimate could easily tip exercise from being worthwhile to not being worthwhile.

We also can't simply do a two-group comparison and derive a correction like 'correlational studies double the effect on average, so halve the effect and then see if it is still statistically significant', which is the sort of thing you can do for, say, blinding or publication bias. It turns out not to be that conveniently simple: this is not an issue of researchers predictably biasing ratings toward the desired higher outcome, or publishing only the results/studies which show the desired results. The randomized experiments seem to turn in larger, smaller, or opposite-signed results at, well, random.

This is a problem similar to the one with the Reproducibility Project: we would like the replications of the original psychology studies to tell us, in some sense, how 'trustworthy' we can consider psychology studies in general. But most of the methods seem to diagnose lack of power as much as anything (the replications were generally powered at 80%+, IIRC, which still means that many will not be statistically significant even if the effect is real). Using Bayes factors helps get us away from p-values but is still not the answer.

It might help to think about what is going on in a generative sense. What do I think creates these results? I would have to say that the results are generally being driven by a complex causal network of genes, biochemistry, ethnicity, SES, varying treatment methods, etc., which throws up an even more complex & enormous set of multivariate correlations (which can be either positive or negative), while effective interventions are few & rare (and likewise can be either positive or negative) but drive the occasional correlation as well. When a correlation is presented by a researcher as an effective intervention, it might have been drawn from the large set of pure correlations or it might have come from the set of causals; it is unlabeled, and we are ignorant of which group it came from. There is no oracle which will tell us whether a particular correlation is causal (that would make life too easy), but (in this case) we can test it and get a (usually small) amount of data about what it does in a randomized setting. How do we analyze this?

I would say that what we have here is something quite specific: a mixture model. Each intervention has been drawn from a mixture of two distributions, all-correlation (with a wide distribution allowing for many large negative & positive values) and causal effects (narrow distribution around zero with a few large values), but it's unknown which of the two it was drawn from and we are also unsure what the probability of drawing from one or the other is. (The problem is similar to my earlier noisy polls: modeling potentially falsified poll data.)

So when we run a study-pair through this: if the two results are not very discrepant, the posterior estimate shifts towards that pair having been drawn from the causal group, and also slightly increases the overall estimate of the probability of drawing from the causal group; and vice-versa if they are heavily discrepant, in which case it becomes much more probable that the pair was drawn from the correlational group, and slightly more probable that draws from the correlational group are common in general. At the end of doing this for all the study-pairs, we get a causal/correlational posterior probability for each particular study-pair (which automatically adjusts for power etc. and can be fed into decision theory, e.g. 'does this reduce the expected value of the specific treatment of exercise to <= $0?'), but we also get an overall estimate of the mixing probability, which tells us in general how often we can expect tested correlations like these to be causal.
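A minimal Python sketch of this mixture idea, with everything numeric invented for illustration (the noise and confounding spreads, the discrepancies) and the model deliberately simplified: each study-pair is reduced to the discrepancy between its correlational and randomized estimates, modeled as either pure sampling noise (a causal draw) or noise plus a wide confounding term (a correlational draw), with a flat grid prior over the unknown mixing probability.

```python
import math

def norm_pdf(x, sd):
    """Normal density with mean 0 and standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

NOISE_SD = 0.10  # assumed sampling noise on a discrepancy (invented)
CONF_SD = 0.50   # assumed extra spread added by confounding (invented)

def pair_posterior(d, p):
    """P(this pair's correlation was causal | discrepancy d, mixing prob p)."""
    causal = p * norm_pdf(d, NOISE_SD)
    confounded = (1 - p) * norm_pdf(d, math.hypot(NOISE_SD, CONF_SD))
    return causal / (causal + confounded)

# made-up discrepancies (correlational estimate - randomized estimate)
discrepancies = [0.02, -0.05, 0.80, 0.01, -0.60]

# grid posterior over the unknown mixing probability p, flat prior
grid = [i / 100 for i in range(1, 100)]
post = [1.0] * len(grid)
for d in discrepancies:
    like_causal = norm_pdf(d, NOISE_SD)
    like_conf = norm_pdf(d, math.hypot(NOISE_SD, CONF_SD))
    post = [w * (p * like_causal + (1 - p) * like_conf)
            for w, p in zip(post, grid)]
p_hat = sum(p * w for p, w in zip(grid, post)) / sum(post)

print(f"posterior mean P(a draw is causal) ~= {p_hat:.2f}")
for d in discrepancies:
    print(f"d = {d:+.2f}: P(causal) = {pair_posterior(d, p_hat):.2f}")
```

Small discrepancies push the per-pair posterior towards causal and nudge the mixing probability up; the two large discrepancies do the reverse, exactly the two-way updating described above. A real analysis would model the two study estimates and their standard errors directly rather than collapsing each pair to one discrepancy.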

I think this gives us everything we want: working with distributions avoids the power issues; for any specific treatment we can estimate the probability that it is causal; and we get an overall estimate as a clear, unambiguous probability.

Comment author: FrameBenignly 22 December 2015 11:34:09PM *  0 points [-]

You're using 'correlation' in what I would consider a weird way. Randomization is intended to control for selection effects to reduce confounds, but when somebody says 'correlational study' I take them to mean an observational study in which no attempt was made to determine predictive causation. When an effect shows up in a nonrandomized study, it's not that you can't determine whether the effect was causative; it's that it's more difficult to determine whether the causation was due to the independent variable or to an extraneous variable unrelated to the independent variable. It's not a question of whether the effect is due to correlation or causation, but of whether the relationship between the independent and dependent variable even exists at all.

Comment author: Anders_H 23 December 2015 12:41:05AM *  2 points [-]

(1) Observational studies are almost always attempts to determine causation. Sometimes the investigators try to pretend that they aren't, but they aren't fooling anyone, least of all the general public. I know they are attempting to determine causation because nobody would be interested in the results of the study unless they were interested in causation. Moreover, I know they are attempting to determine causation because they do things like "control for confounding". This procedure is undefined unless the goal is to estimate a causal effect.

(2) What do you mean by the sentence "the study was causative"? Of course nobody is suggesting that the study itself had an effect on the dependent variable?

(3) Assuming that the statistics were done correctly and that the investigators have accounted for sampling variability, the relationship between the independent and dependent variable definitely exists. The correlation is real, even if it is due to confounding. It just doesn't represent a causal effect.

Comment author: Lumifer 23 December 2015 04:40:18PM *  2 points [-]

You are assuming a couple of things which are almost always true in your (medical) field, but are not necessarily true in general. For example,

Observational studies are almost always attempts to determine causation

Nope. Another very common reason is to create a predictive model without caring about actual causation. If you can't do interventions but would like to forecast the future, that's all you need.

Assuming that the statistics were done correctly and that the investigators have accounted for sampling variability, the relationship between the independent and dependent variable definitely exists.

That further assumes your underlying process is stable and is not subject to drift, regime changes, etc. Sometimes you can make that assumption, sometimes you cannot.

Comment author: Vaniver 23 December 2015 08:45:34PM *  1 point [-]

Another very common reason is to create a predictive model without caring about actual causation. If you can't do interventions but would like to forecast the future, that's all you need.

You'd also like a guarantee that others can't do interventions, or else your measure could be gamed. (But if there's an actual causal relationship, then 'gaming' isn't really possible.)

Comment author: FrameBenignly 23 December 2015 01:03:11AM 0 points [-]

(1) I just think calling a nonrandomized study a correlational study is weird.

(2) I meant to say effect; not study; fixed

(3) If something is caused by a confounding variable, then the independent variable may have no relationship with the dependent variable. You seem to be using correlation to mean the result of an analysis, but I'm thinking of it as the actual real relationship which is distinct from causation. So y=x does not mean y causes x or that x causes y.

Comment author: Anders_H 23 December 2015 01:18:54AM 0 points [-]

I don't understand what you mean by "real relationship". I suggest tabooing the terms "real relationship" and "no relationship".

I am using the word "correlation" to discuss whether the observed variable X predicts the observed variable Y in the (hypothetical?) superpopulation from which the sample was drawn. Such a correlation can exist even if neither variable causes the other.

If X predicts Y in the superpopulation (regardless of causality), the correlation will indeed be real. The only possible definition I can think of for a "false" correlation is one that does not exist in the superpopulation, but which appears in your sample due to sampling variability. Statistical methodology is in general more than adequate to discuss whether the appearance of correlation in your sample is due to real correlation in the superpopulation. You do not need causal inference to reason about this question. Moreover, confounding is not relevant.

Confounding and causal inference are only relevant if you want to know whether the correlation in the superpopulation is due to the causal effect of X on Y. You can certainly define the causal effect as the "actual real relationship", but then I don't understand how it is distinct from causation.

Comment author: FrameBenignly 23 December 2015 04:01:22AM 0 points [-]

I just realized the randomized-nonrandomized study was just an example and not what you were talking about.