Clarity comments on Genosets - Less Wrong
Another genomics PhD here. It's a complex topic. We know from population genetics studies in model organisms that combinatorial effects (epistasis, in genetics lingo) matter. This is despite the fact that simple linear models perform well in the human population: provided they are fit against some reasonably constant genetic background, low allele frequencies mean that the combinatorial effects are well captured by linear ones.
The problem is that even if you only care about pairwise combinations, there are far too many of them, given a uniform prior. Even if we sequenced everyone on Earth, we wouldn't have anywhere near enough info. Sequencing additional individuals has diminishing returns, because there's only so much genetic variation in the human population (and ~23000^2 possible pairwise combinations of genes).
What we need are good priors over combinations of mutations. To do that we'll need detailed info about which genes function together to produce which phenotypes. Such models exist already and are seeing moderate success, but we need new ideas and more data than any one startup could provide. Which is exactly what molecular biologists are working on.
OP here. Having learned more statistics since I last posted, I reckon it could be as simple as exploring various interactions (effect modifications) in the data with respect to additional SNPs. The issue is that interactions require larger sample sizes to avoid spurious results, and most genetics research has woefully low sample sizes. That would only get harder to overcome when inching towards more personalised medicine based on individual genomes.
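To put a rough number on the sample-size point, here is a minimal sketch under assumptions that are mine, not from any real study: two independent binary variants (carrier / non-carrier) with carrier frequencies `p` and `q`, equal noise variance in every group, and the interaction estimated as a plain difference of differences between the four carrier groups. The function names are made up for illustration.

```python
from itertools import product


def interaction_variance_factor(p: float, q: float) -> float:
    """Return n * Var(interaction estimate) / sigma^2.

    Assumes n samples split into four cells by carrier status, with cell
    fractions P(a) * P(b) (independent variants). The difference-of-differences
    estimate has variance sigma^2 * sum(1 / n_cell) over the four cells.
    """
    total = 0.0
    for a, b in product((0, 1), repeat=2):
        cell_fraction = (p if a else 1 - p) * (q if b else 1 - q)
        total += 1.0 / cell_fraction
    return total


def main_effect_variance_factor(p: float) -> float:
    """Return n * Var(main effect estimate) / sigma^2.

    The main effect is the difference in means between carriers (fraction p)
    and non-carriers (fraction 1 - p), ignoring the second variant.
    """
    return 1.0 / p + 1.0 / (1.0 - p)


def sample_size_multiplier(p: float, q: float) -> float:
    """How many times more samples the interaction needs than the main
    effect of the first variant to reach the same standard error."""
    return interaction_variance_factor(p, q) / main_effect_variance_factor(p)


if __name__ == "__main__":
    for freq in (0.5, 0.1, 0.01):
        print(freq, round(sample_size_multiplier(freq, freq), 1))
```

Under these assumptions the multiplier works out to 1 / (q * (1 - q)): about 4x for two common variants at 50% frequency, and it grows quickly as the second variant gets rarer. That is for a single pre-specified pair, before any correction for testing many pairs.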
Yes, that's the case. To get enough data we probably need lots of in vitro experiments. Remember that data is not equal to information - even really big sample sizes wouldn't be enough to resolve the combinatorial explosion. What I meant in that comment up there (I posted it before it was finished, I think) is that there are ~23k genes in the genome, so even under the absurdly simple assumption that there's only one mutation possible per gene, you have about half a billion possible pairwise combinations of gene breakages, which you will never be able to get enough of a sample size to look at blindly.
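For anyone who wants to check the arithmetic, here is a quick back-of-the-envelope sketch. The gene count is the ~23k figure from the thread; the world population figure is my own rough assumption, and the "genomes per combination" ratio is only a crude observations-per-hypothesis comparison, not a power calculation.

```python
from math import comb

N_GENES = 23_000                    # approximate gene count quoted in the thread
WORLD_POPULATION = 8_000_000_000    # rough figure, an assumption for illustration

# The ~23000^2 figure: ordered pairs, counting (A, B) and (B, A) separately.
ordered_pairs = N_GENES ** 2

# Distinct unordered gene pairs, roughly half of the figure above.
unordered_pairs = comb(N_GENES, 2)

# Going one step further to three-way combinations.
unordered_triples = comb(N_GENES, 3)


def genomes_per_combination(n_combinations: int,
                            n_genomes: int = WORLD_POPULATION) -> float:
    """Crude ratio of available genomes to candidate combinations."""
    return n_genomes / n_combinations


if __name__ == "__main__":
    print(f"ordered pairs:     {ordered_pairs:,}")
    print(f"unordered pairs:   {unordered_pairs:,}")
    print(f"unordered triples: {unordered_triples:,}")
    print(f"genomes per ordered pair: "
          f"{genomes_per_combination(ordered_pairs):.1f}")
```

Even with every living human sequenced, that leaves on the order of ten to a few dozen genomes per candidate pair, and real carriers are nowhere near evenly spread across pairs. Three-way combinations already outnumber people by a wide margin.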