VincentYu comments on Get genotyped for free ( If your IQ is high enough) - Less Wrong
Are you sure you've downloaded your entire genome file? My uncompressed file is about 500 MB, and I got about 26000 annotations on Promethease. It seems like your file might have gotten truncated during the download.
Short step-by-step guide for those who want to get their genome annotated by Promethease:
genome.txt.# rsid; Promethease chokes if you don't) and save. This is required to get Promethease to recognize the file.
* I advise against downloading the genome.txt.gz file directly because, for some reason, SpiderOak has Content-encoding: gzip in their HTTP response header, which means that browsers will transparently uncompress that file. This makes me uneasy because there is no checksum provided for the (somewhat large) plain text file, so we have little protection against corruption and truncation. In contrast, by using 'Download All Files' to download everything in a zip, the data's integrity will be automatically verified against CRC-32 checksums when we unzip and gunzip locally.
Thanks for the explanation and tips! I used your procedure and ended up with the same 131 MB file. Interestingly, I did not need to remove the "--" entries. I have been exchanging email with BGI, and they indicated that files could have significantly different numbers of entries (but I am surprised at >3x!). Is there any chance your sequencing had greater than 4x coverage? My VCF file is queued up and should be available in a few months, which should help clarify what I am seeing.
I don't know. How do I find out?
I think the VCF would tell you if you had it. Another possibility would be using a lower quality threshold for calling SNPs, but that seems unlikely.
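For anyone who wants to make the integrity check discussed above explicit rather than relying on the unzip and gunzip tools, here is a minimal Python sketch. It assumes you downloaded the zip via 'Download All Files' and have already extracted the .gz file from it. The function names and the file names in the usage note are placeholders of mine, not anything provided by BGI or SpiderOak.

```python
import gzip
import zipfile


def verify_zip(zip_path):
    """Return the name of the first member that fails its CRC-32 check, or None."""
    with zipfile.ZipFile(zip_path) as zf:
        # testzip() reads every member and compares it against the stored CRC-32.
        return zf.testzip()


def gunzip_checked(gz_path, out_path, chunk_size=1 << 20):
    """Decompress gz_path to out_path and return the number of bytes written.

    The gzip module checks the CRC-32 and length in the gzip trailer once it
    reaches the end of the stream. A mismatch raises gzip.BadGzipFile, and a
    truncated stream raises EOFError.
    """
    written = 0
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
    return written
```

Usage would look something like `verify_zip("download.zip")` (expecting `None`), followed by `gunzip_checked("genome.txt.gz", "genome.txt")`. If both finish without complaint, the plain text file was not truncated or corrupted in transit, so a difference in size between two people's files would have to come from the data itself.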