
RichardKennaway comments on Open thread, Dec. 21 - Dec. 27, 2015 - Less Wrong Discussion

2 Post author: MrMind 21 December 2015 07:56AM




Comment author: IlyaShpitser 29 December 2015 08:00:22PM, 2 points

I think when you break it into two separate problems like that, you miss the point.

I am pretty sure I am not, but let's see. What you are basically saying is "analysis => synthesis doesn't work."

Combining two RCTs is reasonably well-solved by multilevel random effects models.

Hierarchical models are a particular parametric modeling approach for data drawn from multiple sources. People use this type of stuff to good effect, but saying it "solves the problem" here is sort of like saying linear regression "solves" RCTs. What if the modeling assumptions are wrong? What if you are not sure what the model should be?

I'm also not trying to solve the problem of inferring from a correlational dataset to specific causal models, which seems well in hand by Pearlean approaches.

Let's call them "interventionist approaches." Pearl is just the guy people here read. People have been doing causal analysis from observational data since at least the 70s, probably earlier in certain special cases.

I'm trying to bridge between the two: assume a specific generative model for correlation vs causation and then infer the distribution.

Ok.

But this is exactly the problem! Apparently, there is no meaningful 'average causal effect' between correlational and causational studies.

This is what we should talk about.

If there is one RCT, we have a treatment A (with two levels, a and a') and outcome Y. Of interest is the outcome under hypothetical assignment of A to a value, which we write Y(a) or Y(a'). The "average causal effect" is E[Y(a)] - E[Y(a')]. So far so good.
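As a minimal sketch of the difference-in-means estimator this describes (all data simulated; the probabilities below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical RCT: treatment A randomized, binary outcome Y.
n = 10_000
A = rng.integers(0, 2, size=n)          # randomized assignment: a' = 0, a = 1
Y = rng.binomial(1, 0.3 + 0.2 * A)      # true E[Y(a)] - E[Y(a')] = 0.2

# Under randomization, E[Y(a)] is identified by E[Y | A = a],
# so the average causal effect is just the difference in sample means.
ace = Y[A == 1].mean() - Y[A == 0].mean()
```

Randomization is doing all the work here: it is what licenses replacing the counterfactual mean E[Y(a)] with the observed conditional mean E[Y | A = a].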

If there is one observational study, say A is assigned based on C, and C affects Y, what is of interest is still Y(a) or Y(a'). Interventionist methods would give you a formula for E[Y(a)] - E[Y(a')] in terms of p(A,C,Y). You can then construct an estimator for that formula, and life is good. So far so good.
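A sketch of the adjustment step, assuming the simple structure above (A assigned based on a binary C, and C affects Y; all numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical observational data: C confounds both A and Y.
n = 100_000
C = rng.integers(0, 2, size=n)
A = rng.binomial(1, 0.2 + 0.6 * C)             # treatment depends on C
Y = rng.binomial(1, 0.1 + 0.2 * A + 0.3 * C)   # outcome depends on A and C; true ACE = 0.2

# The naive contrast E[Y | A=1] - E[Y | A=0] is confounded by C.
naive = Y[A == 1].mean() - Y[A == 0].mean()

# Adjustment formula: E[Y(a)] = sum_c E[Y | A=a, C=c] p(C=c),
# a formula for the causal effect in terms of p(A, C, Y).
def adjusted_mean(a):
    return sum(Y[(A == a) & (C == c)].mean() * (C == c).mean() for c in (0, 1))

ace = adjusted_mean(1) - adjusted_mean(0)      # recovers the true effect
```

Here `naive` overstates the effect badly, while the adjusted contrast recovers it; the formula itself is the output of the identification step, and the sample plug-in is one (simple) choice of estimator for it.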

Note that so far I have made no modeling assumptions on the relationship of A and Y at all. It's completely unrestricted by choice of statistical model. I can use a crazy non-parametric random forest to model the relationship of A and Y if I want. I can use linear regression. I can do whatever. This is important -- people often smuggle in modeling assumptions "too soon." When we are talking about prediction problems, as in machine learning, that's ok: we don't care too much about the model, we just want good predictive performance. When we care about effects, the model is important, because if the effect is not strong and your model is garbage, it can mislead you.
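To make the warning concrete, here is a simulated sketch (hypothetical data, invented numbers) where the true effect of A is exactly zero: a linear regression that adjusts for C with the wrong functional form reports a sizeable "effect," while fully nonparametric stratification on C does not.

```python
import numpy as np

rng = np.random.default_rng(2)

# A has NO effect on Y, but both depend on C nonlinearly. True ACE = 0.
n = 100_000
C = rng.integers(0, 3, size=n)
A = rng.binomial(1, np.array([0.1, 0.2, 0.9])[C])    # nonlinear in C
Y = (C == 2).astype(float) + rng.normal(0, 0.5, n)   # nonlinear in C, ignores A

# Misspecified model: linear regression Y ~ 1 + A + C.
X = np.column_stack([np.ones(n), A, C])
beta = np.linalg.lstsq(X, Y, rcond=None)[0]
linear_ace = beta[1]        # biased well away from zero

# Model-free adjustment: stratify on C exactly, then average.
strat_ace = sum(
    (Y[(A == 1) & (C == c)].mean() - Y[(A == 0) & (C == c)].mean()) * (C == c).mean()
    for c in (0, 1, 2)
)                           # close to the true effect, zero
```

The effect is null, so a garbage model manufactures one out of residual confounding; with a strong effect the same misspecification would merely distort it.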


If there are two RCTs, we have two sets of outcomes: Y1(a), Y1(a') and Y2(a), Y2(a'). Even here, there is no one causal effect so far. We need to make some sort of assumption on how to combine these. For example, we may try to generalize regression models, and say that a lot of the way A affects Y is the same regression across the two studies, but some of the regression terms are allowed to differ to model population heterogeneity. This is what hierarchical models do.
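One standard instance of such a combination rule is DerSimonian-Laird random-effects pooling, which operates on per-study effect estimates and their variances. A sketch, with made-up study summaries:

```python
import numpy as np

# Hypothetical summaries from two RCTs: estimated effects and their
# sampling variances (numbers invented for illustration).
effects = np.array([0.18, 0.30])
variances = np.array([0.002, 0.004])

# Fixed-effect (inverse-variance) pooling first.
w = 1 / variances
pooled_fixed = np.sum(w * effects) / np.sum(w)

# DerSimonian-Laird estimate of between-study heterogeneity tau^2,
# via Cochran's Q statistic.
Q = np.sum(w * (effects - pooled_fixed) ** 2)
df = len(effects) - 1
tau2 = max(0.0, (Q - df) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Random-effects pooling: heterogeneity widens every study's variance.
w_star = 1 / (variances + tau2)
pooled = np.sum(w_star * effects) / np.sum(w_star)
```

Note this is itself a modeling choice (additive study-level heterogeneity), exactly the kind of assumption the hierarchical-model framing builds in.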

In general we have E[f(Y1(a), Y2(a))] - E[f(Y1(a'), Y2(a'))], for some f(.,.) that we should justify. At this level, things are completely non-parametric. We can model the relationship of A and Y1,Y2 however we want. We can model f however we want.


If we have one RCT and one observational study, we still have Y1(a), Y1(a') for the RCT, and Y2(a), Y2(a') for the observational study. To determine the latter we use "interventionist approaches" to express them in terms of observational data. We then combine things using f(.,.) as before. As before we should justify all the modeling we are doing.
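Putting the pieces together, a sketch (all data simulated and hypothetical): estimate the RCT effect directly, estimate the observational effect via the adjustment formula, then apply one simple choice of f, here just a plain average of the two study-level effects.

```python
import numpy as np

rng = np.random.default_rng(3)

# Study 1: hypothetical RCT, true effect 0.2.
n1 = 5_000
A1 = rng.integers(0, 2, n1)
Y1 = rng.binomial(1, 0.3 + 0.2 * A1)
est1 = Y1[A1 == 1].mean() - Y1[A1 == 0].mean()

# Study 2: hypothetical observational study with confounder C; the
# adjustment formula plays the role of the "interventionist" step.
n2 = 20_000
C = rng.integers(0, 2, n2)
A2 = rng.binomial(1, 0.2 + 0.6 * C)
Y2 = rng.binomial(1, 0.1 + 0.2 * A2 + 0.3 * C)
est2 = sum(
    (Y2[(A2 == 1) & (C == c)].mean() - Y2[(A2 == 0) & (C == c)].mean()) * (C == c).mean()
    for c in (0, 1)
)

# One simple f: an unweighted average of the two study-level effects.
combined = (est1 + est2) / 2
```

The unweighted average is the crudest possible f; justifying anything better (inverse-variance weights, a hierarchical model, distrust-discounting the observational estimate) is precisely the combination problem under discussion.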


I am pretty sure Bareinboim has thought about this stuff (but he doesn't do statistical inference, just the general setup).

Comment author: RichardKennaway 30 December 2015 09:28:53AM, 1 point

Pearl is just the guy people here read.

Is there anyone you would recommend studying in addition?

Comment author: IlyaShpitser 31 December 2015 08:47:52PM, 1 point

Depends on what you want. It doesn't matter "who has priority" when it comes to learning the subject. Pearl's book is good, but one big disadvantage of reading just Pearl is Pearl does not deal with the statistical inference end of causal inference very much (by choice). Actually, I heard Pearl has a new book in the works, more suitable for teaching.

But ultimately we must draw causal conclusions from actual data, so statistical inference is important. Some big names that combine causal and statistical inference: Jamie Robins, Miguel Hernan, Eric Tchetgen Tchetgen, Tyler VanderWeele (Harvard causal group), Mark van der Laan (Berkeley), Donald Rubin et al (Harvard), Frangakis, Rosenblum, Scharfstein, etc. (Johns Hopkins causal group), Andrea Rotnitzky (Harvard), Susan Murphy (Michigan), Thomas Richardson (UW), Philip Dawid (Cambridge, but retired; incidentally the inventor of conditional independence notation). Lots of others.

I believe Stephen Cole posts here, and he does this stuff also (http://sph.unc.edu/adv_profile/stephen-r-cole-phd/).


Miguel Hernan and Jamie Robins are working on a new causal inference book that is more statistical, might be worth a look. Drafts available online:

http://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/