gwern comments on Open thread, September 8-14, 2014 - Less Wrong

5 Post author: polymathwannabe 08 September 2014 12:31PM

Comment author: gwern 11 September 2014 04:45:27PM * 4 points

> My baseline is much more narrow and technical. It is "we look at the genome of a baby and have no idea what will be its IQ when it grows up". That is still largely the case and the paper's ability to forecast does not look impressive to me.

This paper validates the approach (something a lot of people, for a lot of different reasons, were skeptical of), and even on its own merits we still get some predictive power out of it: the 3 top hits cover a range of ~1.5 points, and the 69 variants with 90% confidence predict even more. (I'm not sure how much since they don't bother to use all their data, but if we assume the 69 are evenly distributed between 0-0.5 points, then the mean is 0.25 and the total predictive power is more than a few points.)
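
The back-of-the-envelope arithmetic in the parenthetical can be sketched as below. This only illustrates the stated assumption (69 variants with effects spread evenly from 0 to 0.5 points); the variable names are hypothetical, and the sum is simply the combined size of the assumed effects, not a variance-explained figure taken from the paper.

```python
# Assumption from the comment: 69 variants, effects spread evenly over 0 to 0.5 IQ points.
N_VARIANTS = 69
MAX_EFFECT = 0.5

# Evenly spaced hypothetical effect sizes: 0, 0.5/68, ..., 0.5
effects = [MAX_EFFECT * i / (N_VARIANTS - 1) for i in range(N_VARIANTS)]

mean_effect = sum(effects) / N_VARIANTS
total_effect = sum(effects)

print(f"mean effect per variant: {mean_effect:.2f} points")
print(f"sum over all {N_VARIANTS} variants: {total_effect:.2f} points")
```

Under that assumption the mean is 0.25 points and the effects sum to roughly 17 points, which is the sense in which the total is "more than a few points".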

What use is this result? Well, what use is a new-born baby? As the cryptographers say, 'attacks only get better'.

> I think getting there will take a bit more than just engineering.

And, uh, why would you think that? There's no secret sauce here. Just take a lot of samples and run a regression. I don't think they even used anything particularly complex like a lasso or elastic net.
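
A minimal sketch of what "take a lot of samples and run a regression" could look like, on simulated data only. Everything here is an assumption for illustration: the effect sizes, allele frequency, sample size, and function names are hypothetical, and the sketch omits things a real association study would need (principal-component controls, multiple-testing correction). It is not the paper's pipeline.

```python
import random

random.seed(0)

N_PEOPLE = 4000
TRUE_EFFECTS = [0.5, 0.0, -0.3]  # hypothetical per-allele effects, arbitrary units
ALLELE_FREQ = 0.5                # hypothetical allele frequency for every variant


def simulate_person():
    """Return (genotype, phenotype): allele counts 0/1/2 per variant and an additive trait."""
    genotype = [
        sum(random.random() < ALLELE_FREQ for _ in range(2))
        for _ in TRUE_EFFECTS
    ]
    # Purely additive trait plus unit-variance noise.
    phenotype = sum(b * g for b, g in zip(TRUE_EFFECTS, genotype)) + random.gauss(0, 1)
    return genotype, phenotype


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


people = [simulate_person() for _ in range(N_PEOPLE)]
phenotypes = [p for _, p in people]

# One regression per variant, as in a basic association scan.
estimates = [
    ols_slope([g[j] for g, _ in people], phenotypes)
    for j in range(len(TRUE_EFFECTS))
]

for true, est in zip(TRUE_EFFECTS, estimates):
    print(f"true effect {true:+.2f}, estimated {est:+.2f}")
```

With a few thousand simulated people the per-variant slopes land close to the effects that generated the data; smaller effects need correspondingly larger samples, which is the "take a lot of samples" part.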

Comment author: Lumifer 11 September 2014 05:11:32PM 3 points

> There's no secret sauce here. Just take a lot of samples and run a regression.

Pretend for a second it's a nutrition study and apply your usual scepticism :-) You know quite well that "just run a regression" is, um... rarely that simple.

To give one obvious example, interaction effects are an issue, including interaction between genes and the environment.

Comment author: gwern 11 September 2014 11:10:44PM * 8 points

> Pretend for a second it's a nutrition study and apply your usual scepticism :-) You know quite well that "just run a regression" is, um... rarely that simple.

No, that's the great thing about genetic associations! First, genes don't change over a lifetime, so every association is in effect a longitudinal study where the arrow of time immediately rules out A<-B (reverse causation, in which IQ somehow causes particular variants to be overrepresented); that takes out one of the three causal pathways. Then you're left with confounding - but there's almost no way for a third variable to pick out people with particular alleles and grant them higher intelligence, no greenbeard effect, and population differences are dealt with by using relatively homogeneous samples & controlling for principal components - so you don't have to worry much about A<-C->B. So all you're left with is A->B.

> To give one obvious example, interaction effects are an issue, including interaction between genes and the environment.

But they're not. They're not a large part of what's going on. And they don't affect the associations you find through a straight analysis looking for additive effects.

Comment author: Lumifer 12 September 2014 12:46:32AM 3 points

> genes don't change over a lifetime

But their expression does.

> They're not a large part of what's going on.

How do you know?

Comment author: gwern 14 September 2014 09:37:12PM 1 point

> But their expression does.

An expression in circumstances dictated by what genes one started with.

> How do you know?

Because if they were a large part of what was going on, the estimates would not break down so cleanly and the methods would not work so well.

Comment author: Azathoth123 13 September 2014 02:07:45AM 4 points

Keep in mind that the outside view of biological complexity is that:

> The known unknowns have tended to end up lower in complexity than we've predicted. But unknown unknowns continue to blindside us, unabated, adding to the total complexity of the human body.

Or to phrase this another way:

> people accurately estimate the total complexity and then apportion it among the known unknowns, thus creating an overestimate.

Comment author: gwern 14 September 2014 09:35:19PM 2 points

I don't think the outside view is relevant here. We have coming up on a century of twin studies and behavioral genetics and very motivated people coming up with possibilities for problems, and so far the traditional estimates are looking pretty good: for example, when people go and look at genetics directly, the estimates for simple additive heritability look very similar to the traditional estimates. Just the other day there was an example of a SNP study confirming the estimates from twin studies: "Substantial SNP-based heritability estimates for working memory performance", Vogler et al 2014. If all these complexities were real and serious problems and the Outside View advises us to be skeptical, why do we keep finding that the SNP/GCTA estimates look exactly like we would have predicted?

Comment author: Azathoth123 15 September 2014 02:46:58AM 3 points

Ok, I confess I have no idea what SNP and GCTA are. As for the study Lumifer linked to, Razib Khan's analysis of it is that it suggests intelligence is a complex polygenic trait. This should not be surprising as it is certainly an extremely complex trait in terms of phenotype.