VincentYu comments on Open Thread, Jun. 1 - Jun. 7, 2015 - Less Wrong Discussion
Interesting. Thanks for posting this!
I received exactly the same number of SNPs from BGI, so it looks like our data were processed under the same pipeline. I've found three people who have publicly posted their BGI data: two at the Personal Genome Project (hu2FEC01 and hu41F03B, each with 5,095,048 SNPs), and one on a personal website (with 18,217,058 SNPs).
The double dashes are no-calls. 23andMe reports on a fixed list of SNPs, and instead of omitting an SNP when they can't confidently determine the genotype, they mark it with a double dash.
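If you want to see how many no-calls you have, here is a minimal sketch. It assumes the usual 23andMe raw-data layout (comment lines starting with `#`, then tab-separated rsid, chromosome, position, genotype). The function names and the sample rsIDs are made up for illustration.

```python
import io

def parse_23andme(handle):
    """Return {rsid: genotype} from a 23andMe-style raw data file."""
    genotypes = {}
    for line in handle:
        line = line.strip()
        # Skip blank lines and the '#' comment header.
        if not line or line.startswith("#"):
            continue
        rsid, _chrom, _pos, genotype = line.split("\t")
        genotypes[rsid] = genotype
    return genotypes

def count_no_calls(genotypes):
    """Count SNPs whose genotype is the no-call marker '--'."""
    return sum(1 for g in genotypes.values() if g == "--")

# Hypothetical three-line sample; the rsIDs and positions are invented.
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\t1\t2000\t--\n"
    "rs0000003\t1\t3000\tCT\n"
)
calls = parse_23andme(sample)
print(count_no_calls(calls), "of", len(calls), "SNPs are no-calls")
```

For a real file, replace the `io.StringIO` sample with `open(path)` on your own download.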
This seems normal given the error rates that others have been reporting for 23andMe (example). I don't know about BGI's error rates.
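For comparing the two call sets, a rough sketch of a mismatch count might look like the following. It assumes both data sets are already loaded as `{rsid: genotype}` dicts, that both report alleles on the same strand (no strand-flip handling), and that allele order within a genotype does not matter. The names `normalize` and `mismatch_rate` and the toy data are hypothetical.

```python
def normalize(genotype):
    """Treat 'AG' and 'GA' as the same genotype by sorting the alleles."""
    return "".join(sorted(genotype))

def mismatch_rate(a, b):
    """Return (mismatched, compared) over SNPs called in both data sets."""
    compared = mismatched = 0
    for rsid in a.keys() & b.keys():
        ga, gb = a[rsid], b[rsid]
        # A no-call in either set is not evidence of a mismatch; skip it.
        if "--" in (ga, gb):
            continue
        compared += 1
        if normalize(ga) != normalize(gb):
            mismatched += 1
    return mismatched, compared

# Toy data with invented rsIDs, standing in for the two providers' calls.
set_a = {"rs1": "AG", "rs2": "CC", "rs3": "--", "rs4": "TT"}
set_b = {"rs1": "GA", "rs2": "CT", "rs3": "AA", "rs4": "TT", "rs5": "GG"}

mismatched, compared = mismatch_rate(set_a, set_b)
print(f"{mismatched} mismatches out of {compared} SNPs called in both sets")
```

Skipping no-calls keeps them from inflating the apparent error rate, since a no-call is a refusal to guess rather than a wrong answer.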
I think it might be possible to accurately guess the actual genotypes for some of the mismatches by imputing them with something like IMPUTE2 (for each mismatched SNP, leave it out and impute it using the nearby SNPs). This will take many hours of work, though, and you might as well phase and impute across the whole genome if you have the time, interest, and processing power to do so (I've been meaning to try this out to learn more about how these things work).
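The leave-one-out step could be set up roughly like this. This is only a sketch of the masking, assuming genotypes are held in a `{rsid: genotype}` dict; the imputation itself would be done by an external tool and is not shown. The function name `leave_one_out` and the toy data are hypothetical.

```python
def leave_one_out(genotypes, mismatched_rsids):
    """Yield (rsid, masked_copy) pairs, hiding one mismatched SNP at a time.

    Each masked copy has the target SNP set to the no-call marker '--',
    so an imputation tool would have to infer it from the other SNPs.
    """
    for rsid in mismatched_rsids:
        masked = dict(genotypes)  # shallow copy; the original is untouched
        masked[rsid] = "--"
        yield rsid, masked

# Toy data with invented rsIDs.
genotypes = {"rs1": "AG", "rs2": "CC", "rs3": "TT"}
mismatches = ["rs2", "rs3"]

for rsid, masked in leave_one_out(genotypes, mismatches):
    print(rsid, masked)
```

Copying the whole dict per SNP is wasteful for millions of SNPs; in practice you would probably restrict each copy to a window around the masked SNP, since only nearby SNPs inform the imputation.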