gwern comments on Open thread, September 8-14, 2014 - Less Wrong
This paper validates the approach (something a lot of people, for a lot of different reasons, were skeptical of), and even on its own merits we still get some predictive power out of it: the 3 top hits cover a range of ~1.5 points, and the 69 variants with 90% confidence predict even more. (I'm not sure how much since they don't bother to use all their data, but if we assume the 69 are evenly distributed between 0-0.5 points, then the mean is 0.25 and the total predictive power is more than a few points.)
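To make the back-of-the-envelope figure above concrete, here is a minimal sketch of the arithmetic. It assumes, as the comment does, that the 69 variants have effects spread evenly between 0 and 0.5 points, and it further assumes the effects are purely additive, so the total is the spread between someone carrying none of the favourable variants and someone carrying all of them. The variable names are illustrative only and do not come from the paper.

```python
# Assumption from the comment: 69 variants, effects spread evenly over 0 to 0.5 points.
n_variants = 69
low, high = 0.0, 0.5

# Mean of an even (uniform) spread is the midpoint of the range.
mean_effect = (low + high) / 2

# Assumption: effects are purely additive, so the maximum spread is the simple sum.
total_points = n_variants * mean_effect

print(f"mean effect per variant: {mean_effect} points")
print(f"summed effect of all variants: {total_points} points")
```

Under those assumptions the sum comes to 17.25 points, which is the sense in which the total is "more than a few points". It is a ceiling on the spread between extremes, not an estimate of variance explained.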
What use is this result? Well, what use is a new-born baby? As the cryptographers say, 'attacks only get better'.
And, uh, why would you think that? There's no secret sauce here. Just take a lot of samples and run a regression. I don't think they even used anything particularly complex like a lasso or elastic net.
Pretend for a second it's a nutrition study and apply your usual scepticism :-) You know quite well that "just run a regression" is, um... rarely that simple.
To give one obvious example, interaction effects are an issue, including interaction between genes and the environment.
No, that's the great thing about genetic associations! First, genes don't change over a lifetime, so every association is in effect a longitudinal study where the arrow of time immediately rules out A<-B, or reverse causation, in which IQ somehow causes particular variants to be overrepresented; that takes out one of the three causal pathways. Then you're left with confounding - but there's almost no way for a third variable to pick out people with particular alleles and grant them higher intelligence, no greenbeard effect, and population differences are dealt with by using relatively homogeneous samples & controlling for principal components - so you don't have to worry much about A<-C->B. So all you're left with is A->B.
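The "controlling for principal components" step can be illustrated with a toy simulation. This is only a sketch on made-up data, not the paper's pipeline: the two subpopulations, the allele frequencies, the effect sizes, and the helper `snp_effect` are all hypothetical. It shows a population difference acting as the confounder C in A<-C->B, and how adding genotype principal components as covariates removes most of the resulting bias.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2000, 500  # individuals, SNPs (hypothetical sizes)

# Two hypothetical subpopulations whose allele frequencies differ.
pop = rng.integers(0, 2, size=n)
freq_a = rng.uniform(0.1, 0.9, size=m)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, size=m), 0.05, 0.95)
freq_a[0], freq_b[0] = 0.3, 0.6  # the SNP of interest differs between groups
freqs = np.where(pop[:, None] == 0, freq_a, freq_b)
geno = rng.binomial(2, freqs).astype(float)  # allele counts 0/1/2

# Phenotype: SNP 0 has a true additive effect; group membership shifts the
# mean for unrelated reasons, so it confounds the association.
true_beta = 0.3
pheno = true_beta * geno[:, 0] + 2.0 * pop + rng.normal(0, 1, size=n)

def snp_effect(snp, pheno, covariates=None):
    """OLS slope of phenotype on one SNP, optionally adjusting for covariates."""
    cols = [np.ones_like(snp), snp]
    if covariates is not None:
        cols.extend(covariates.T)
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, pheno, rcond=None)
    return coef[1]

# Principal components of the other SNPs, used as a proxy for ancestry.
others = geno[:, 1:] - geno[:, 1:].mean(axis=0)
u, s, _ = np.linalg.svd(others, full_matrices=False)
pcs = u[:, :2] * s[:2]

naive = snp_effect(geno[:, 0], pheno)
adjusted = snp_effect(geno[:, 0], pheno, covariates=pcs)
print(f"true effect {true_beta}, naive {naive:.2f}, PC-adjusted {adjusted:.2f}")
```

In this setup the unadjusted slope is inflated by the group difference, while the PC-adjusted slope lands much closer to the simulated true effect. Real analyses typically use more components and dedicated tooling; the point here is only the logic of the adjustment.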
But they're not. They're not a large part of what's going on. And they don't affect the associations you find through a straight analysis looking for additive effects.
But their expression does.
How do you know?
An expression in circumstances dictated by what genes one started with.
Because if they were a large part of what was going on, the estimates would not break down cleanly and the methods work so well.