Followup to: An Alien God, The Wonder of Evolution, Evolutions Are Stupid
Yesterday, I wrote:
Humans can do things that evolutions probably can't do, period, over the expected lifetime of the universe. As the eminent biologist Cynthia Kenyon once put it at a dinner I had the honor of attending, "One grad student can do things in an hour that evolution could not do in a billion years." According to biologists' best current knowledge, evolutions have invented a fully rotating wheel on a grand total of three occasions.
But then, natural selection has not been running for a mere million years. It's been running for 3.85 billion years. That's enough to do something natural selection "could not do in a billion years" three times. Surely the cumulative power of natural selection is beyond human intelligence?
Not necessarily. There's a limit on how much complexity an evolution can support against the degenerative pressure of copying errors.
(Warning: A simulation I wrote to verify the following arguments did not return the expected results. See addendum and comments.)
(Addendum 2: This discussion has now been summarized in the Less Wrong Wiki. I recommend reading that instead.)
The vast majority of mutations are either neutral or detrimental; here we are focusing on detrimental mutations. At equilibrium, the rate at which detrimental mutations are introduced by copying errors will equal the rate at which they are eliminated by selection.
A copying error introduces a single instantiation of the mutated gene. A death eliminates a single instantiation of the mutated gene. (We'll ignore the possibility that it's a homozygote, etc; a failure to mate also works, etc.) If the mutation is severely detrimental, it will be eliminated very quickly - the embryo might just fail to develop. But if the mutation only leads to a 0.01% probability of dying, it might spread to 10,000 people before one of them died. On average, one detrimental mutation leads to one death; the weaker the selection pressure against it, the more likely it is to spread. Again, at equilibrium, copying errors will introduce mutations at the same rate that selection eliminates them. One mutation, one death.
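The 10,000-carrier figure is just the reciprocal of the per-carrier death probability; as a one-line check:

```python
# "One mutation, one death": a mutation with a 0.01% chance of killing its
# carrier spreads, in expectation, to 1/0.0001 carriers before one death.
death_prob = 0.0001
expected_carriers_before_one_death = 1 / death_prob
print(expected_carriers_before_one_death)
```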
This means that you need the same amount of selection pressure to keep a gene intact, whether it's a relatively important gene or a relatively unimportant one. The more genes are around, the more selection pressure required. Under too much selection pressure - too many children eliminated in each generation - a species will die out.
We can quantify selection pressure as follows: Suppose that 2 parents give birth to an average of 16 children. On average all but 2 children must either die or fail to reproduce. Otherwise the species population very quickly goes to zero or infinity. From 16 possibilities, all but 2 are eliminated - we can call this 3 bits of selection pressure. Not bits like bytes on a hard drive, but mathematician's bits, information-theoretical bits; one bit is the ability to eliminate half the possibilities. This is the speed limit on evolution.
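The 3-bit figure can be checked mechanically; here is a small sketch (the function name is mine):

```python
import math

def selection_bits(offspring_per_pair, survivors_per_pair=2):
    """Information-theoretic selection pressure per generation:
    one bit is the ability to eliminate half the possibilities."""
    return math.log2(offspring_per_pair / survivors_per_pair)

print(selection_bits(16))  # 16 children, 2 survive -> 3.0 bits
print(selection_bits(4))   # 4 children, 2 survive -> 1.0 bit
```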
Among mammals, it's safe to say that the selection pressure per generation is on the rough order of 1 bit. Yes, many mammals give birth to more than 4 children, but selection does not perfectly eliminate all but the fittest organisms either. The speed limit on evolution is an upper bound, not an average.
This 1 bit per generation has to be divided up among all the genetic variants being selected on, for the whole population. It's not 1 bit per organism per generation, it's 1 bit per gene pool per generation. Suppose there's some amazingly beneficial mutation making the rounds, so that organisms with the mutation have 50% more offspring. And suppose there's another less beneficial mutation, that only contributes 1% to fitness. Very often, an organism that lacks the 1% mutation, but has the 50% mutation, will outreproduce another who has the 1% mutation but not the 50% mutation.
There are limiting forces on variance; going from 10 to 20 children is harder than going from 1 to 2 children. There's only so much selection to go around, and beneficial mutations compete to be promoted by it (metaphorically speaking). There's an upper bound, a speed limit to evolution: If Nature kills off a grand total of half the children, then the gene pool of the next generation can acquire a grand total of 1 bit of information.
I am informed that this speed limit holds even with semi-isolated breeding subpopulations, sexual reproduction, chromosomal linkages, and other complications.
Let's repeat that. It's worth repeating. A mammalian gene pool can acquire at most 1 bit of information per generation.
Among mammals, the rate of DNA copying errors is roughly 10^-8 per base per generation. Copy a hundred million DNA bases, and on average, one will copy incorrectly. One mutation, one death; each non-junk base of DNA soaks up the same amount of selection pressure to counter the degenerative pressure of copying errors. It's a truism among biologists that most selection pressure goes toward maintaining existing genetic information, rather than promoting new mutations.
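As a back-of-the-envelope rendering of the bound these two numbers imply (a sketch of the post's argument, not a rigorous population-genetics derivation):

```python
mutation_rate = 1e-8           # copying errors per base per generation (mammals)
selection_per_generation = 1   # ~1 bit of selection pressure per generation

# Under "one mutation, one death", each meaningful base soaks up selection
# pressure in proportion to its error rate, so the supportable number of
# meaningful bases is roughly (selection per generation) / (error rate):
supportable_bases = selection_per_generation / mutation_rate
print(f"{supportable_bases:.0e} meaningful bases")
```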
Natural selection probably hit its complexity bound no more than a hundred million generations after multicellular organisms got started. Since then, over the last 600 million years, evolutions have substituted new complexity for lost complexity, rather than accumulating adaptations. Anyone who doubts this should read George Williams's classic "Adaptation and Natural Selection", which treats the point at much greater length.
In material terms, a Homo sapiens genome contains roughly 3 billion bases. We can see, however, that mammalian selection pressures aren't going to support 3 billion bases of useful information. This was realized on purely mathematical grounds before "junk DNA" was discovered, before the Genome Project announced that humans probably had only 20-25,000 protein-coding genes. Yes, there's genetic information that doesn't code for proteins - all sorts of regulatory regions and such. But it is an excellent bet that nearly all the DNA which appears to be junk, really is junk. Because, roughly speaking, an evolution isn't going to support more than 10^8 meaningful bases with 1 bit of selection pressure and a 10^-8 error rate.
Each base is 2 bits. A byte is 8 bits. So the meaningful DNA specifying a human must fit into at most 25 megabytes.
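The 25-megabyte figure follows by straight unit conversion:

```python
meaningful_bases = 10**8  # the bound from the mutation-rate argument above
bits_per_base = 2         # 4 possible bases -> log2(4) = 2 bits each

bytes_total = meaningful_bases * bits_per_base // 8  # 8 bits per byte
print(bytes_total // 10**6, "megabytes")  # 25 megabytes
```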
(Pause.)
Yes. Really.
And the Human Genome Project gave the final confirmation. 25,000 genes plus regulatory regions will fit in 100,000,000 bases with lots of room to spare.
Addendum: genetics.py, a simple Python program that simulates mutation and selection in a sexually reproducing population, is failing to match the result described above. Sexual recombination is random, each pair of parents has 4 children, and the top half of the population is selected each time. Wei Dai rewrote the program in C++ and reports that the supportable amount of genetic information increases as the inverse square of the mutation rate (?!), which, if generally true, would make it possible for the entire human genome to be meaningful.
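The original genetics.py is not reproduced here, but a minimal sketch of the described setup (random recombination, 4 children per pair, top half selected; all parameters and names below are my own assumptions) might look like:

```python
import random

GENOME_LEN = 100   # loci per genome (my assumption, not the original parameter)
MUT_RATE = 0.001   # per-locus chance of a deleterious mutation (assumption)
POP_SIZE = 100     # kept even so the population pairs off cleanly

def make_child(mom, dad):
    # Random recombination: each locus is drawn from either parent,
    # then exposed to a small chance of breaking (1 -> 0).
    return [0 if random.random() < MUT_RATE else random.choice(pair)
            for pair in zip(mom, dad)]

def next_generation(pop):
    random.shuffle(pop)
    children = []
    for mom, dad in zip(pop[::2], pop[1::2]):
        children.extend(make_child(mom, dad) for _ in range(4))
    # Fitness = number of intact loci; the top half of the children
    # survive, holding the population size constant.
    children.sort(key=sum, reverse=True)
    return children[:POP_SIZE]

pop = [[1] * GENOME_LEN for _ in range(POP_SIZE)]
for _ in range(50):
    pop = next_generation(pop)
mean_intact = sum(map(sum, pop)) / POP_SIZE
print(f"mean intact loci after 50 generations: {mean_intact:.1f}")
```

Tracking how `mean_intact` equilibrates as `MUT_RATE` varies is one way to probe the inverse-square claim.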
In the above post, George Williams's arguments date back to 1966, and the result that the human genome contains <25,000 protein-coding regions comes from the Genome Project. The argument that 2 parents having 16 children with 2 surviving implies a speed limit of 3 bits per generation was found here, and I understand that it dates back to Kimura's work in the 1950s. However, the attempt to calculate a specific bound of 25 megabytes was my own.
It's possible that the simulation contains a bug, or that I used unrealistic assumptions. If the entire human genome of 3 billion DNA bases could be meaningful, it's not clear why it would contain <25,000 genes. Empirically, an average of O(1) bits of genetic information per generation seems to square well with observed evolutionary times; we don't actually see species gaining thousands of bits per generation. There is also no reason to believe that a dog has greater morphological or biochemical complexity than a dinosaur. In short, only the math I tried to calculate myself should be regarded as having failed, not the beliefs that have wider currency in evolutionary biology. But until I understand what's going on, I would suggest citing only George Williams's arguments and the Genome Project result, not the specific mathematical calculation shown above.
I've been enjoying your evolution posts and wanted to toss in my own thoughts and see what I can learn.
"Our first lemma is a rule sometimes paraphrased as 'one mutation, one death'."
Imagine that having a working copy of gene "E" is essential. Now suppose a mutation creates a broken gene "Ex". Animals that are heterozygous with "E" and "Ex" are fine and pass on their genes. Only homozygous "Ex" "Ex" animals result in a "death", which removes 2 mutations.
Now imagine that a duplication event gives four copies of "E". In this example an animal would only need one working gene out of the four possible copies. When the rare "Ex" "Ex" "Ex" "Ex" combination arises, the resulting "death" removes four mutations.
In fruit fly knock-out experiments, breaking one developmental gene often had no visible effect. Backup genes worked well enough. The backup gene could have multiple roles: first, it has a special function that improves the animal's fitness; second, it works as a backup when the primary gene is disabled. The resulting system is robust, since the animal can thrive with many broken copies, and evolution is efficient, since a single "death" can remove four harmful mutations.
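A toy calculation of this redundancy effect (the function and probabilities are my own illustration, not from the comment):

```python
def fatal_frequency(p, copies):
    """Chance that all redundant copies are broken at once, assuming each
    copy is independently broken with probability p (toy model)."""
    return p ** copies

# With one copy, "deaths" are common and each removes one broken allele.
# With four redundant copies, the all-broken combination is far rarer,
# and the single resulting "death" removes four broken alleles at once.
for copies in (1, 4):
    print(copies, fatal_frequency(0.1, copies))
```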
I've focused on protein-coding genes, but this concept also applies to short DNA segments that code for elements such as miRNAs. Imagine that the DNA segment is duplicated. Being short, it is rarely deactivated by a mutation. Over time a genome may acquire many working copies that code for that miRNA. Rarely, an animal would inherit no working copies, and so a "death" would remove multiple chromosomes that "lacked" that DNA segment. On the other hand, too many copies might also be fatal. Chromosomes with too few or too many active copies would suffer a fitness penalty.
On a different note, imagine two stags. The first stag has lucked out and inherited many alleles that improve its fitness. The second stag wasn't so lucky and inherited many bad alleles. The first stag successfully mates and the second doesn't. One "death" removed many inferior alleles.
Animals may have evolved sexual attraction based on traits that depend on the proper combined functioning of many genes. An unattractive mate might have many slightly harmful mutations. Thus one "death" based on sexual selection might remove many harmful mutations.
Evolution might be a little better than the "one mutation, one death" lemma implies. (I agree that evolution is an inefficient process.)
"This 1 bit per generation has to be divided up among all the genetic variants being selected on, for the whole population. It's not 1 bit per organism per generation, it's 1 bit per gene pool per generation."
Suppose new allele "A" has fitness advantage 1.03 compared to the wild allele "a", and that another allele "B" on the same type of chromosome has fitness advantage 1.02. Eventually the "A" and "B" alleles will be sufficiently common that a crossover creating a new chromosome "AB" with both alleles is likely (this crossover probability depends on the population sizes of the "Ab" and "aB" chromosomes and the distance between the alleles). The new chromosome "AB" should have a fitness of about 1.05 compared to the chromosome "ab". Both "A" and "B" should then see an accelerated spread until the "ab" chromosomes are largely displaced. The rate would then diminish as "AB" displaced the "Ab" and "aB" chromosomes. Thus multiple beneficial mutations on the same type of chromosome should spread faster than the "single mutation" formula would indicate.
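The 1.05 above is the additive approximation; under the common multiplicative fitness model the recombinant works out almost identically (a quick check, my framing):

```python
# Fitness advantages of the two alleles from the example above.
fit_A, fit_B = 1.03, 1.02

# Multiplicative model: the recombinant "AB" chromosome combines both
# advantages -- roughly the 1.05 cited, a shade higher.
fit_AB = fit_A * fit_B
print(round(fit_AB, 4))  # 1.0506
```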
Due to crossover, good "bits" tend to accumulate on good chromosomes, increasing the fitness of the entire chromosome as described above. The highly fit chromosome thus displaces chromosomes with many bad "bits". The good "bits" are no longer inherited independently, and each "death" can now select on multiple information "bits".
We seem to view evolution from a similar perspective.
Information requires selection in order to be preserved. The DNA information in an animal genome could be ranked by "fitness" value, and the resulting graph would likely follow a power law: some DNA information is extremely important and likely to be preserved, while most of the DNA is relatively free to drift. In a species with many offspring, such as fruit flies, selection can drive the species high up a local fitness peak, and much of the genome will be optimized. In a species with few offspring, such as humans, there is much less selection pressure, and the species' gene pool wanders further from local peaks; more of the human genome drifts. (E.g., human regulatory elements are less conserved than rodent regulatory elements.)