cousin_it comments on Thermodynamics of Intelligence and Cognitive Enhancement - Less Wrong
Does it make sense to control for education and socioeconomic status when measuring the effect of intelligence?
Not in this case. This is a good example of how you can go wrong by overcontrolling (or maybe we should chalk it up as another example of correlation != causation?).
Suppose the causal model Genes->Intelligence->Education->Less-Reproduction is true (and there are no other relationships). Then if we regress Less-Reproduction on both Intelligence and Education as predictors, we discover that after controlling for Education, Intelligence adds no predictive value, explains no variance, and is uncorrelated with Less-Reproduction. Sure, of course: all Intelligence is good for is predicting Education, but we already know each individual's Education. This is an interesting and valid result worth further research in our hypothetical world.
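A minimal simulation makes this concrete. The variable names and unit coefficients below are illustrative assumptions (any linear chain with independent noise behaves the same way):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical linear chain: Genes -> Intelligence -> Education -> Less-Reproduction,
# each arrow adding independent unit-variance noise.
genes = rng.normal(size=n)
intelligence = genes + rng.normal(size=n)
education = intelligence + rng.normal(size=n)
less_repro = education + rng.normal(size=n)

def ols(y, *xs):
    """Least-squares coefficients (after the intercept) of y on the given regressors."""
    X = np.column_stack([np.ones_like(y), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

# Intelligence alone predicts Less-Reproduction...
(b_int,) = ols(less_repro, intelligence)

# ...but after controlling for Education, its coefficient collapses to ~0.
b_int_adj, b_edu_adj = ols(less_repro, intelligence, education)

print(f"marginal:   {b_int:.3f}")      # ~1.0
print(f"controlled: {b_int_adj:.3f}")  # ~0.0
```

The marginal coefficient is ~1 (Intelligence carries the whole chain's signal), while the controlled coefficient is ~0, exactly as described: once Education is in the regression, Intelligence has nothing left to explain.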
Does this mean dysgenics will be false, since the coefficient of Intelligence is estimated at ~0 by our little regression formula? Nope! We can get dysgenics easily: people with high levels of Genes will cause high levels of Intelligence, which will cause high levels of Education, which will cause high levels of Less-Reproduction, which means that their genes will be selected against and the next generation will start with lower Genes. Even though it's all Education's fault for causing Less-Reproduction, the selection is still going to hammer the Genes.
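Extending the same hypothetical chain, we can check that selection hits Genes even though its regression coefficient was ~0. The fertility function below is a made-up illustration (offspring counts proportional to exp(-0.5 * Less-Reproduction)):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Same illustrative chain as above: each arrow adds unit-variance noise.
genes = rng.normal(size=n)
intelligence = genes + rng.normal(size=n)
education = intelligence + rng.normal(size=n)
less_repro = education + rng.normal(size=n)

# Hypothetical fertility: higher Less-Reproduction -> fewer offspring.
fertility = np.exp(-0.5 * less_repro)

# Mean Genes among offspring = fertility-weighted mean among parents.
next_gen_genes = np.average(genes, weights=fertility)

print(f"parent mean Genes:    {genes.mean():+.3f}")   # ~0.0
print(f"offspring mean Genes: {next_gen_genes:+.3f}") # ~-0.5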
I don't know if this sort of problem has a widely-known name (IlyaShpitser might know one); I've seen it described in some papers but without a specific term attached; see, for example, "Let's Put Garbage-Can Regressions and Garbage-Can Probits Where They Belong".
Epidemiology seems to call this "overadjustment"; see, for example, "Overadjustment Bias and Unnecessary Adjustment in Epidemiologic Studies".
Hi, sorry I missed this post earlier. Yes, this is sometimes called overadjustment. Their definition of overadjustment is incomplete, though: they are missing the case where a variable is associated with both exposure and outcome, is not an intermediate variable, but adjusting for it increases bias anyway. This case has a different name, M-bias, and occurs for instance in this graph:
A -> Y <- H1 -> M <- H2 -> A
Say we do not observe H1 or H2, and A is our exposure (treatment) while Y is our outcome. The right thing to do here is to not adjust for M. It's called "M-bias" because the part of the graph involving the H variables looks like an M, if you draw it using the standard convention of putting unobserved confounders on top.
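A small linear simulation of this graph shows the effect. The coefficients are invented for illustration; the true causal effect of A on Y is set to 2.0:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Linear version of A -> Y <- H1 -> M <- H2 -> A (coefficients are made up).
h1 = rng.normal(size=n)              # unobserved
h2 = rng.normal(size=n)              # unobserved
a = h2 + rng.normal(size=n)          # exposure
m = h1 + h2 + rng.normal(size=n)     # the collider "M"
y = 2.0 * a + h1 + rng.normal(size=n)

def ols(y, *xs):
    """Least-squares coefficients (after the intercept) of y on the given regressors."""
    X = np.column_stack([np.ones_like(y), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

(b_unadj,) = ols(y, a)    # no adjustment: the path A <- H2 -> M <- H1 -> Y is blocked at M
b_adj, _ = ols(y, a, m)   # adjusting for M opens the collider path

print(f"unadjusted:     {b_unadj:.3f}")  # ~2.0 (unbiased)
print(f"adjusted for M: {b_adj:.3f}")    # ~1.8 (biased)
```

Without adjustment, the backdoor-looking path through H2, M, and H1 is already blocked at the collider M, so the simple regression recovers the true effect; conditioning on M unblocks that path and biases the estimate.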
But there is a wider problem here than this, because sometimes what you are doing is 'adjusting for confounders' when in reality you shouldn't be using the formula that adjusting for confounders gives you at all, but a different formula. This happens, for example, with longitudinal studies (with a non-genetic treatment that is vulnerable to confounders over time). In such studies you want to use something called the g-computation algorithm instead of adjusting for confounders.
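To sketch what that different formula looks like, here is the g-formula for the simplest longitudinal case: two treatments \(A_0, A_1\), a covariate \(L_1\) measured after \(A_0\), and an outcome \(Y\) (the notation is mine, not from the comment):

```latex
% Two-time-point g-formula (illustrative notation):
\mathbb{E}[Y(a_0, a_1)]
  = \sum_{l_1} \mathbb{E}[Y \mid A_0 = a_0,\, L_1 = l_1,\, A_1 = a_1]\;
    P(L_1 = l_1 \mid A_0 = a_0)
```

Ordinary covariate adjustment would instead report a coefficient that averages the strata of \(L_1\) by their marginal distribution; the g-formula standardizes by \(P(L_1 = l_1 \mid A_0 = a_0)\), which matters precisely when \(L_1\) is itself affected by the earlier treatment, i.e., when it is both a confounder for \(A_1\) and an intermediate for \(A_0\).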
I guess if I were to name the resulting bias, it would be "causal model misspecification bias." That is, you are adjusting for confounders in a particular way because you think the true causal model is a certain way, but you are wrong about that -- the model is actually different and the causal effect requires a different approach from what you are using.
I have a paper with Tyler Vanderweele and Jamie Robins that characterizes exactly what has to be true on the graph for adjustment to be valid for causal effects. So you will get bias from adjustment (for a particular adjustment set) if and only if the condition in the paper does not hold for your model.