Open thread, Dec. 21 - Dec. 27, 2015

MrMind

I would advise thinking about these problems separately, that is, start by trying to solve combining two RCTs.

I think when you break it into two separate problems like that, you miss the point. Combining two RCTs is reasonably well-solved by multilevel random effects models. I'm also not trying to solve the problem of inferring from a correlational dataset to specific causal models, which seems well in hand with Pearlian approaches. I'm trying to bridge between the two: assume a specific generative model for correlation vs causation and then infer the distribution.

How do we combine them into a single conclusion (let's say the "average causal effect": difference in outcome means under treatment vs placebo)?

But this is exactly the problem! Apparently, there is no meaningful 'average causal effect' between correlational and causational studies. In one study, it was much larger; in the next, it was a little smaller; in the next, it was much smaller; in the one after that, the sign reversed... If you create a regular multilevel meta-analysis of a bunch of randomized and correlational studies, say, and you toss in a fixed-effect covariate and regress 'Y ~ Randomized', you get an estimate of ~0. The actual effect in each case may be quite large, but the average over all the studies is a wash.

This is different from other methodological problems. With placebos, there is a predictable systematic bias which inflates effects substantially. Likewise, publication bias skews effects up. Likewise, non-blinding of raters. And so on and so forth. You can easily estimate these with an additive fixed-effect / linear model and correct for particular biases. But with randomized vs correlational, it seems that there's no particular direction the effects head in; you just know that whatever they are, they'll be different from your correlational results. So you need to do something more imaginative in modeling.

But I think a more helpful way to go is to ignore sampling variability entirely, and just start with two joint distributions P1 and P2 that represent variables in your two studies (in other words you assume infinite sample size, so you get the distributions exactly).

OK, let's imagine all our studies are infinite sized. I collect 5 study-pairs, correlational vs randomized, d effect size:

0.5 vs 0.1 (difference: 0.4)
-0.22 vs -0.22 (difference: 0)
-0.8 vs 0.2 (difference: -1.0)
0.3 vs 0.3 (difference: 0)
0.5 vs -0.1 (difference: 0.6)

I apply my mixture model strategy.

We see that in studies #2 and #4, the correlational and causal effects are identical, 100% confidence, and thus in both the correlational effect was drawn from the causal distribution. With two datapoints, -0.22 and 0.3, we begin to infer that the distribution of causal effects is probably fairly narrow around 0, and we update our normal distribution appropriately to be skeptical about any claims of large causal effects.

We see in studies #1, #3, and #5 that the correlational and causal effects differ, 100% confidence, and thus we know that the correlational effect for each of those treatments was drawn from the general correlational distribution. The correlational effects are 0.5, -0.8, and 0.5, all quite large, and so we infer that correlational effects tend to be quite large and that their distribution has a large standard deviation (or whatever).

We then note that in 2/5 of the pairs, the correlational effect was the causal effect, and so we estimate that the probability of a correlational effect having been drawn from the causal distribution rather than the correlational distribution is P=2/5. Or in other words, correlation=causality 40% of the time. However, if we had tried to calculate an additive variable like in a meta-regression, we would find that the Randomized covariate was estimated at exactly 0 (mean(c(0.4, 0, -1.0, 0, 0.6)) ~> [1] 0) and certainly is not statistically significant.
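To make the bookkeeping concrete, here is a minimal Python sketch of the infinite-sample case. It takes the third pair as -0.8 vs 0.2, which is what matches the stated difference of -1.0 and the correlational value of -.8 quoted above. The variable names are mine, purely for illustration.

```python
# Five (correlational, randomized) effect-size pairs, treated as exact
# because the studies are assumed to be infinitely large.
pairs = [(0.5, 0.1), (-0.22, -0.22), (-0.8, 0.2), (0.3, 0.3), (0.5, -0.1)]

# Additive view: the average correlational-minus-randomized difference,
# which is what a 'Randomized' covariate would try to capture.
differences = [corr - rand for corr, rand in pairs]
mean_difference = sum(differences) / len(differences)

# Mixture view: with exact effects, a pair was drawn from the causal
# distribution if and only if the two effects are identical.
is_same = [corr == rand for corr, rand in pairs]
p_same = sum(is_same) / len(pairs)

# Effects assigned to each mixture component.
causal_draws = [corr for (corr, _), same in zip(pairs, is_same) if same]
correlational_draws = [corr for (corr, _), same in zip(pairs, is_same) if not same]

print("mean difference:", mean_difference)
print("P(correlation = causation):", p_same)
print("causal-component effects:", causal_draws)
print("correlational-component effects:", correlational_draws)
```

The additive summary averages out to roughly zero, while the mixture summary keeps the information that 2 of the 5 pairs agree exactly.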

Now when someone comes to us with an infinite-sized correlational trial showing that purified Egyptian mummy reduces allergy symptoms by d=0.5, we feed it into our mixture model and get a useful posterior distribution with a bimodal pattern. It is heavily peaked at 0 (reflecting the more-likely-than-not scenario that mummy is mummery) but also peaked at d=0.4 or so (reflecting the shrunken estimate under the scenario that mummy is munificent). This will predict better than if we naively tried to just shift the d=0.5 posterior distribution up or down by some units.
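A rough Python sketch of that bimodal posterior, under assumptions I am making up for illustration: both mixture components are zero-mean normals, the causal SD is 0.25, the correlational SD is 0.6, P=0.4, and the new study has a small but nonzero standard error of 0.125. The nonzero standard error is needed because a truly exact estimate would not shrink at all. None of these numbers are estimated from real data.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def causal_posterior(d_corr, se, p_same=0.4, sd_causal=0.25, sd_corr=0.6):
    """Two-component posterior for the causal effect given one correlational estimate.

    Returns {"same": (weight, mean, sd), "different": (weight, mean, sd)}.
    All default hyperparameters are illustrative assumptions.
    """
    var_c = sd_causal ** 2
    # "same": the correlational effect IS the causal effect, so the estimate
    # is a noisy reading of a draw from the narrow causal prior and is shrunk.
    w_same = p_same * normal_pdf(d_corr, 0.0, var_c + se ** 2)
    shrink = var_c / (var_c + se ** 2)
    same_mean, same_sd = shrink * d_corr, math.sqrt(shrink * se ** 2)
    # "different": the estimate says nothing about the causal effect,
    # so the posterior for the causal effect is just the causal prior.
    w_diff = (1 - p_same) * normal_pdf(d_corr, 0.0, sd_corr ** 2 + se ** 2)
    total = w_same + w_diff
    return {
        "same": (w_same / total, same_mean, same_sd),
        "different": (w_diff / total, 0.0, sd_causal),
    }

posterior = causal_posterior(d_corr=0.5, se=0.125)
for label, (weight, mean, sd) in posterior.items():
    print(f"{label}: weight={weight:.2f}, mean={mean:.2f}, sd={sd:.2f}")
```

With these made-up settings the component centered on 0 carries most of the weight and the other component sits near d=0.4. Different hyperparameters would move both.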

The problem with real studies is that they are not infinitely sized, so when the point-estimates disagree and we get 0.45 vs 0.5, obviously we cannot strongly conclude which distribution in the mixture it was drawn from, and so we need to propagate that uncertainty through the whole model and all its parameters.
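For the finite-sample case, here is a hedged sketch of the per-pair calculation: the posterior probability that a correlational and a randomized estimate share one underlying effect. It assumes the same made-up zero-mean normal components as before (causal SD 0.25, correlational SD 0.6, P=0.4) and treats them as known. A full model would estimate those jointly and propagate their uncertainty, which this sketch does not do.

```python
import math

def normal_pdf(x, var):
    """Density of a zero-mean normal with variance `var` at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2 * math.pi * var)

def bivariate_normal_pdf(x, y, var_x, var_y, cov):
    """Density of a zero-mean bivariate normal at (x, y)."""
    det = var_x * var_y - cov * cov
    quad = (var_y * x * x - 2 * cov * x * y + var_x * y * y) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

def prob_same_effect(d_corr, se_corr, d_rand, se_rand,
                     p_same=0.4, sd_causal=0.25, sd_corr=0.6):
    """Posterior probability that both estimates measure one shared effect.

    Hyperparameter defaults are illustrative assumptions, not fitted values.
    """
    var_c = sd_causal ** 2
    # Same effect: both estimates are noisy readings of one causal draw,
    # so they are correlated through that shared draw.
    like_same = bivariate_normal_pdf(
        d_corr, d_rand, var_c + se_corr ** 2, var_c + se_rand ** 2, var_c)
    # Different effects: independent draws from the two components.
    like_diff = (normal_pdf(d_corr, sd_corr ** 2 + se_corr ** 2)
                 * normal_pdf(d_rand, var_c + se_rand ** 2))
    numerator = p_same * like_same
    return numerator / (numerator + (1 - p_same) * like_diff)

# 0.45 vs 0.5 with noisy estimates: the pair is not classified with certainty.
print(prob_same_effect(0.45, 0.1, 0.5, 0.1))
# The same point estimates with much tighter standard errors.
print(prob_same_effect(0.45, 0.01, 0.5, 0.01))
```

The design choice is to compare marginal likelihoods for the pair under each component rather than test whether the two point estimates differ, so the answer depends on the standard errors as well as the gap.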