I think the main issue is retail vs wholesale. (BGI's $600 probably means volume pricing.) Also, I think NHGRI publishes averages over deployed machines, including old ones, which overestimates the cost of buying a new machine to create new capacity.
Dante offers a retail commercial product at $700 for a whole genome. Dante and Veritas had previously run sales at $200 and $300 (which were probably measuring the demand curve and don't say much about cost). Two days after I wrote this comment, Veritas cut its price in half to $600, below Dante.
Nice post! The plot from NHGRI looks strangely off in my browser. The plateau at the bottom right should be just above $1000 (but looks as if it is closer to $100). The graph in the actual NHGRI page ( https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost ) is correct, and so is Wikipedia.
What does it mean to sequence a genome?
A simplistic view of the genome is just a coil of DNA-tape in every cell, that can be read out, base-by-base. But in truth, the genome has lots of edge cases. First, the genomes of humans, and eukaryotes in general, are packed into chromosomes. There are
From A reference standard for genome biology (2018):
So basically, sequencing a genome from scratch is very expensive, so they make one very good "reference genome" for each species, then for any individual, it's only necessary to sequence bits and pieces, then compare with the reference. It's like playing a jigsaw puzzle: if you know the whole picture, it's much easier to piece together the pieces.
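The jigsaw analogy can be made concrete with a toy sketch. The sequences below are made up, and the brute-force matching is only an illustration of the idea of placing short reads on a known reference; real aligners use indexed data structures and handle insertions, deletions, and sequencing errors.

```python
# Toy reference-guided mapping: place short reads on a known reference,
# then report where the individual differs from it.
# All sequences and function names here are hypothetical, for illustration only.

def best_offset(reference: str, read: str) -> int:
    """Return the offset where `read` fits `reference` with the fewest mismatches."""
    def mismatches(offset: int) -> int:
        window = reference[offset:offset + len(read)]
        return sum(a != b for a, b in zip(window, read))
    candidates = range(len(reference) - len(read) + 1)
    return min(candidates, key=mismatches)

def call_variants(reference: str, reads: list[str]) -> dict[int, str]:
    """Map each read onto the reference and collect the positions that differ."""
    variants = {}
    for read in reads:
        offset = best_offset(reference, read)
        for i, base in enumerate(read):
            if reference[offset + i] != base:
                variants[offset + i] = base
    return variants

reference = "ACGTTGCAAGGCTTAC"             # the "picture on the box"
reads = ["ACGTTG", "TGCATG", "GGCTTAC"]    # the individual's "puzzle pieces"
variants = call_variants(reference, reads)
print(variants)  # {8: 'T'}
```

Only the middle read disagrees with the reference, at position 8, which is the whole point: once the reference exists, an individual genome reduces to a short list of differences.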
Wouldn't it be funny if, in the future, nanotechnology became good enough to manipulate DNA directly, as fluidly as electronics manipulate the voltage levels of semiconductor chips, so that computation could be run directly on DNA, obviating the laborious process of converting the nucleotide bases into silicon "bases"? ... Nah, they'll never be able to run quicksort on the chromosome without converting it to silicon bits first.
That's just the genome. There are also the epigenomes, the genomes of mitochondria, the genomes of cancer cells...
The quest for completion continues. There is still much to do.
Human whole genome sequencing
This was plotted in 2014, according to Technology: The $1,000 genome (2014), Nature News. Back then, it cost $5000.
In 2015 it dropped suddenly to $1000 and stagnated, according to NHGRI Genome Sequencing Program (June 7, 2019).
BGI claims to sequence a genome for as little as $600, though details are lacking. I'd take that as the lower bound on sequencing cost.
I find this curve interesting, for it is definitely not exponential. NIH explained the abrupt drop at 2008 thus:
The drop in the middle of 2015 is noted but not clearly explained:
Instead, a morass of technical information is given, with no mention of any abrupt technical changes:
There is also a warning about comparing price quotes from academic and commercial institutions:
The $1000 genome
From Wikipedia:
The $100 genome
Clearly, if there's a $1000 genome, there has to be a $100 genome (and so on inexorably).
The $100 genome is still not here yet. The earliest mention I could find is from a 2008 report from MIT Tech Review that quotes two predictions:
The first prediction was wrong. Extrapolating from the graph as of 2008, 2011 seemed reasonable, but the cost turned out to plateau significantly. The second prediction was right, but not very sharp.
It's a confusing mess to figure out which of these commercial products does what, but it seems that a (mostly complete) personal genome sequence currently costs as little as $600.
Massive Whole-Genome Sequencing (WGS) projects
Humans
There seem to be on the order of 1 million human genomes sequenced so far. Many come from national projects.
Some are sub-national, though.
Nonhumans
The 1001 Genomes project started in 2008, aiming to sequence the genomes of strains of Arabidopsis thaliana (the model plant in biology). It concluded with 1135 genomes published in 2016.
The B10K project started in 2014, aiming to sequence all 10,560 species of Aves before 2020. So far (July 5, 2017) it has acquired 2500 samples and sequenced just 300. It will surely fail to deliver. Another project, OpenWings, started in April 2018 and aims for the same in 4 years. As of June 2019, the project is alive, but no firm progress has been reported.
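A rough constant-rate extrapolation shows why the B10K deadline looks out of reach. This is a minimal sketch: the start day within 2014 is my assumption (the text gives only the year), and a constant rate is crude, since sequencing throughput usually accelerates.

```python
from datetime import date

# Figures from the text above.
total_species = 10_560
sequenced = 300
status_date = date(2017, 7, 5)

# Assumption: only the year 2014 is given, so the exact start day is a guess.
start_date = date(2014, 1, 1)

elapsed_years = (status_date - start_date).days / 365.25
rate_per_year = sequenced / elapsed_years
years_remaining = (total_species - sequenced) / rate_per_year

print(f"{rate_per_year:.0f} genomes/year, "
      f"{years_remaining:.0f} more years at that pace")
```

Under these assumptions the pace is under a hundred genomes per year, which leaves on the order of a century of work rather than the two and a half years that remained before 2020.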
Bat 1K began in 2018, and aims to sequence all bats, defined as the 1288 species of Chiroptera. The May 2019 newsletter reports completion of "deep sequencing" (sequencing multiple times to reduce error) of 5 bats and near-completion of a sixth. They have also received funds to sequence one species from each of the 21 bat families.
Going bigger, the Genome 10K project aims to sequence the genome of at least one individual from each vertebrate genus, approximately 10,000 genomes.
It is a main step of the Vertebrate Genomes Project, which aims to generate reference genomes for all 66,000 extant vertebrate species. It made some good progress in September 2018, publishing 15 reference genomes from 14 species. The zebra finch (Taeniopygia guttata), the most commonly studied vocal learner, was sequenced twice (a male and a female), presumably because it is particularly interesting for studying the genetics of language.
The logical conclusion is the Earth BioGenome Project, started in November 2018, which aims to sequence the genomes of all known eukaryotic species on Earth in 10 years:
Note: 1 exabyte isn't that much in terms of REALLY big science. The voracious LHC produces 1 petabyte/sec, too much to record, and so it's filtered before storage. Even after filtering, it had still archived 200 petabytes as of June 29, 2017.
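To put the two scales side by side, here is a quick back-of-the-envelope calculation using only the figures in the note, and assuming decimal (SI) units.

```python
PB_PER_EB = 1000             # decimal (SI) units: 1 exabyte = 1000 petabytes

project_storage_eb = 1       # the 1 exabyte figure from the note above
lhc_raw_pb_per_sec = 1       # unfiltered LHC output rate, from the note
lhc_archived_pb = 200        # LHC archive as of June 29, 2017, from the note

# How long the unfiltered LHC stream would take to produce 1 exabyte.
seconds_to_fill = project_storage_eb * PB_PER_EB / lhc_raw_pb_per_sec

# How the filtered LHC archive compares with 1 exabyte.
archive_fraction = lhc_archived_pb / (project_storage_eb * PB_PER_EB)

print(f"{seconds_to_fill / 60:.1f} minutes of raw LHC output")
print(f"LHC archive is {archive_fraction:.0%} of 1 exabyte")
```

So the raw detector stream would fill an exabyte in well under half an hour, while the archive that was actually kept is already a fifth of that size.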
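Considering that one human genome currently costs about $1k, and taking roughly 1.5 million species to sequence (the count implied by this figure), this gives an estimate of $1.5 billion. The projected cost is $4.7 billion, which is within an order of magnitude and so passes the Fermi estimate sanity check.
The project is made up of many subprojects. For example: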
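The sanity check can be spelled out as arithmetic. This sketch uses only the three dollar figures from the text; the species count is not stated there, so it is back-derived from the estimate and should be read as an implied assumption.

```python
cost_per_genome_usd = 1_000     # current human whole-genome price, from the text
fermi_estimate_usd = 1.5e9      # estimate stated in the text
projected_cost_usd = 4.7e9      # the project's own projection, from the text

# Species count implied by the estimate (an inference, not a quoted figure).
implied_species = fermi_estimate_usd / cost_per_genome_usd

# How far apart the two numbers are, and what the projection means per species.
ratio = projected_cost_usd / fermi_estimate_usd
projected_cost_per_species = projected_cost_usd / implied_species

print(f"implied species: {implied_species:,.0f}")
print(f"projection / estimate: {ratio:.1f}x")
print(f"projected cost per species: ${projected_cost_per_species:,.0f}")
```

A factor of about three is comfortably within an order of magnitude, which is all a Fermi estimate is meant to establish. The extra cost per species is plausible given that a first reference genome is harder to produce than a resequenced human genome.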
Also microbes
The Earth Microbiome Project studies the microbes of earth:
It started in 2010. At the end of 2017, it reported 28000 species sequenced. I could not find a price or completion date estimate. Hopefully it will finish in less than 100 years (assuming constant speed)!