What is the purpose of declaring some organism the "winner" of evolution? This is like looking at a vast river delta and declaring one of its many streams to be the "most successful" at finding the sea. Any such judgement is epiphenomenal to the thing itself, which does not care about the stories anyone makes up about it.
Some people say that e.g. inner alignment failed for evolution in creating humans. For that claim of historical alignment difficulty to cash out, it feels like humans need to be "winners" of evolution in some sense; otherwise, species that achieve less agency than humans seem like a plausibly more relevant comparison to look at. This is kind of a partial post, playing with the idea but not really deciding anything definitive.
Here’s a sensible claim:
CLAIM A: “IF there’s a learning algorithm whose reward function is X, THEN the trained models that it creates will not necessarily explicitly desire X.”
This is obviously true, and every animal including humans serves as an example. For most animals, it’s trivially true, because most animals don’t even know what inclusive genetic fitness is, so obviously they don’t explicitly desire it.
So here’s a stronger claim:
CLAIM B: “CLAIM A is true even if the trained model is sophisticated enough to fully understand what X is, and to fully understand that it was itself created by this learning algorithm.”
This one is true too, and I think humans are the only example we have. I mean, the claim is really obvious if you know how algorithms work etc., but of course some people question it anyway, so it can be nice to have a concrete illustration.
(More discussion here.)
Neither of those claims has anything to do with humans being the “winners” of evolution. I don’t think there’s any real alignment-related claim that does. Although, people say all kinds of things, I suppose. So anyway, if there’s really something substantive that this post is responding to, I suggest you try to dig it out.
The evolution analogy, in which evolution failed to produce intelligent minds that value what evolution "valued", is an intuition pump that does get used in explaining outer/inner alignment failures, and it is part of why some corners treat outer/inner alignment as very hard by default.
It's also used in the sharp left turn argument, where the capabilities of an optimization process like humans outstripped their alignment to evolutionary objectives; the worry is that an AI could do the same to us.
Both Eliezer Yudkowsky and Nate Soares use arguments that rely on evolution failing to get a selection target inside us, thus misaligning us to evolution:
The OP talks about the fact that evolution produced lots of organisms on Earth, of which humans are just one example, and that if we view the set of all life, arguably more of it consists of bacteria or trees than humans. Then this comment thread has been about the question: so what? Why bring that up? Who cares?
Like, here’s where I think we’re at in the discussion:
Nate or Eliezer: “Evolution made humans, and humans don’t care about inclusive genetic fitness.”
tailcalled: “Ah, but did you know that evolution also made bacteria and trees?”
Nate or Eliezer: “…Huh? What does that have to do with anything?”
If you think that the existence on Earth of lots of bacteria and trees is a point that specifically undermines something that Nate or Eliezer said, then can you explain the details?
I wouldn't go this far yet. E.g. I've been playing with the idea that the weighting where humans "win" evolution is something like adversarial robustness. This just wasn't really a convincing enough weighting to be included in the OP. But if something like that turns out correct then one could imagine that e.g. humans get outcompeted by something that's even more adversarially robust. Which is basically the standard alignment problem.
Like I did not in fact interject in response to Nate or Eliezer. Someone asked me what triggered my line of thought, and I explained that it came from their argument, but I also said that my point was currently too incomplete.
Meta-level comment: I don't think it's good to dismiss original arguments immediately and completely.
Object-level comment:
Neither of those claims has anything to do with humans being the “winners” of evolution.
I think it might be more complicated than that:
Here are some things we can try:
I think the latter is the strongest counter-argument to "humans are not the winners".
Right, I think there are variants of it that might work out, but there's also the aspect where some people argue that AGI will turn out to essentially be a bag-of-heuristics or similar, where inner alignment becomes less necessary because the heuristics achieve the outer goal even if they don't do it as flexibly as they could.
Richard Kennaway asked why I would think along those lines, but the point of the OP isn't to make an argument about AI alignment; it's merely to think along those lines. Conclusions can come later, once I'm finished exploring it.
Wish I could upvote and disagree. Evolution is a mechanism without a target. It's the result of selection processes, not the cause of those choices.
I like to treat the environment/ecology as the cause. So that e.g. trees are caused by the sun.
I kind of feel like pelagibacter communis could maybe be seen as "evolutionary heat" or "ecological heat", like in the sense that the ecology has "space for" some microscopic activity so whatever major ecological causes pop up, some minimal species will evolve to fill up the minimal-species-niche.
I think a reasonable-seeming metric on which humans are doubtless the winners is "energy controlled".
Total up all the human metabolic energy, plus the output of the world's power grids, the energy of all that petrol/gas burning in cars/boilers. If you are feeling generous you could give humans a percentage of all the metabolic energy going through farm animals.
It's a bit weird, because on the one hand it's obvious that collectively humans control the planet in a way no other organism does. But you are looking for a metric where plants and single-celled organisms are allowed to participate, and they can't properly be said to control anything, even themselves.
I think there's something to this. Also since making the OP, I've been thinking that human control of fire seems important. If trees have the majority of the biomass, but humans can burn the trees for energy or just to make space, then that also makes humans special (and overlaps a lot with what you say about energy controlled).
This also neatly connects human society to the evolutionary ecology since human dominance hierarchies determine who is able to control what energy (or set fire to what trees).
Insofar as you're thinking of evolution as analogous to gradient flow, it only makes sense if it's local and individual-level I think -- it is a category error to say that a species that has more members is a winner. The first shark that started eating its siblings in utero improved its genetic fitness (defined as the expected number of offspring in the specific environment it existed in) but might have harmed the survivability of the species as a whole.
In the case of gradient flow, we expect almost-all starting conditions to end up in a similar functional relationship when restricting attention to their on-distribution behavior. This allows us to pick a canonical winner.
Evolution is somewhat different from this in that we're not working with a random distribution but instead a historical distribution, but that should just increase the convergence even more.
The noteworthy part is that despite this convergence, there are still multiple winners, because it depends on your weighting (and I guess because the species aren't independent, too).
Yeah, this makes sense.
You could also imagine more toy-model games with mixed ecological equilibria.
E.g. suppose there's some game where you can reproduce by getting resources, and you get resources by playing certain strategies, and it turns out there's an equilibrium where there's 90% strategy A in the ecosystem (by some arbitrary accounting) and 10% strategy B. It's kind of silly to ask whether it's A or B that's winning based on this.
Although now that I've put things like that, it does seem fair to say that A is 'winning' if we're not at equilibrium, and A's total resources (by some accounting...) are increasing over time.
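To make the toy game concrete, here is a minimal sketch using discrete replicator dynamics. The payoff numbers and the baseline fitness are assumptions, picked only so that the mixed equilibrium sits at 90% A; nothing about them comes from a real ecosystem, and the function names are hypothetical.

```python
# Toy two-strategy game, assumed payoffs (not from any real system):
#   A earns 9 when it meets B and 0 when it meets A;
#   B earns 1 when it meets A and 0 when it meets B.
# Each strategy does better when it is rare, so the shares settle at a mix.
BASELINE = 1.0  # assumed background fitness, keeps every fitness positive


def fitness(share_a):
    """Return (fitness of A, fitness of B) when A holds `share_a` of the ecosystem."""
    fit_a = BASELINE + 9.0 * (1.0 - share_a)
    fit_b = BASELINE + 1.0 * share_a
    return fit_a, fit_b


def step(share_a):
    """One generation of replicator dynamics: shares grow in proportion to fitness."""
    fit_a, fit_b = fitness(share_a)
    mean_fit = share_a * fit_a + (1.0 - share_a) * fit_b
    return share_a * fit_a / mean_fit


def simulate(share_a, generations):
    """Return the list of A's share over time, starting from `share_a`."""
    trajectory = [share_a]
    for _ in range(generations):
        share_a = step(share_a)
        trajectory.append(share_a)
    return trajectory


if __name__ == "__main__":
    for start in (0.2, 0.99):
        final = simulate(start, 200)[-1]
        print(f"start={start:.2f} -> share of A after 200 generations: {final:.4f}")
```

Starting below the equilibrium, A's share rises every generation, which matches the "A is winning while we're off equilibrium" reading; starting above it, B is the one gaining. At the 90/10 point both strategies have equal fitness, so neither is winning in that sense.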
Now to complicate things again, what if A is increasing in resource usage but simultaneously mutating to be played by fewer actual individuals (the trees versus pelagibacter, perhaps)? Well, in the toy model setting it's pretty tempting to say the question is wrong, because if the strategy is changing it's not A anymore at all, and A has been totally wiped out by the new strategy A'.
Actually I guess I endorse this response in the real world too, where if a species is materially changing to exploit a new niche, it seems wrong to say "oh, that old species that's totally dead now sure were winners." If the old species had particular genes with a satisfying story for making it more adaptable than its competitors, perhaps better to take a gene's-eye view and say those genes won. If not, just call it all a wash.
Anyhow, on humans: I think we're 'winners' just in the sense that the human strategy seems better than our population 200ky ago would have reflected, leading to a population and resource use boom. As you say, we don't need to be comparing ourselves to phytoplankton, the game is nonzero-sum.
E.g. suppose there's some game where you can reproduce by getting resources, and you get resources by playing certain strategies, and it turns out there's an equilibrium where there's 90% strategy A in the ecosystem (by some arbitrary accounting) and 10% strategy B. It's kind of silly to ask whether it's A or B that's winning based on this.
But this is an abstraction that would never occur in reality. The real systems that inspire this sort of thing have lots of pelagibacter communis and the strategies A and B are constantly diverging off into various experimental organisms that fit neither strategy and then die out.
When you choose to model this as a mixture of A and B, you're already implicitly picking out both A and B as especially worth paying attention to - that is, as "winners" in some sense.
Actually I guess I endorse this response in the real world too, where if a species is materially changing to exploit a new niche, it seems wrong to say "oh, that old species that's totally dead now sure were winners." If the old species had particular genes with a satisfying story for making it more adaptable than its competitors, perhaps better to take a gene's-eye view and say those genes won. If not, just call it all a wash.
But in this case you could just say A' is winning over A. Like if you were training a neural network, you wouldn't say that your random initialization won the loss function; you'd say the optimized network achieves a lower loss than the random initialization.
Perhaps I should have said that it's silly to ask whether "being like A" or "being like B" is the goal of the game.
I've previously argued that genetic fitness is a measure of selection strength, not the selection target. What evolution selects for are traits that happen to be useful in the organism's current environment. The extent to which a trait is useful in the organism's current environment can be quantified as fitness, but fitness is specific to a particular environment and the same trait might have a very different fitness in some other environment.
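As a rough illustration of that point, the sketch below runs the same selection rule in two environments. The traits, environments, and fitness values are all invented for the example; the only thing it is meant to show is that "select for fitness" picks out a different trait depending on the environment.

```python
# Hypothetical relative fitness of two traits in two environments.
# All names and numbers are invented for illustration.
FITNESS = {
    "cold": {"thick_fur": 1.2, "thin_fur": 0.8},
    "hot": {"thick_fur": 0.8, "thin_fur": 1.2},
}


def select(environment, generations=50):
    """Apply fitness-proportional selection and return the final trait frequencies."""
    freqs = {"thick_fur": 0.5, "thin_fur": 0.5}
    for _ in range(generations):
        # Weight each trait by its fitness in this environment, then renormalise.
        weighted = {trait: f * FITNESS[environment][trait] for trait, f in freqs.items()}
        total = sum(weighted.values())
        freqs = {trait: w / total for trait, w in weighted.items()}
    return freqs


if __name__ == "__main__":
    for env in FITNESS:
        print(env, select(env))
```

The selection rule is identical in both runs; what gets selected differs only because the fitness table differs by environment.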
I guess I don't really understand what you're asking. I meant my comment as an answer to this bit in the OP:
I think it's common on LessWrong to think of evolution's selection target as inclusive genetic fitness - that evolution tries to create organisms which make as many organisms with similar DNA to themselves as possible. But what exactly does this select for?
In that evolution selecting for "inclusive genetic fitness" doesn't really mean selecting for anything in particular; what exactly that ends up selecting for is completely dependent on the environment (where "the environment" also includes the species itself, which is relevant for things like sexual selection or frequency-dependent selection).
If you fix the environment, assuming for the sake of argument that it's possible to do that, then the exact thing it selects for are just the traits that are useful in that environment.
Do humans have high inclusive genetic fitness?
I think it's a bit of a category mistake to ask about the inclusive fitness of a species. You could calculate the average fitness of an individual within the species, but at least to my knowledge (caveat: I'm not a biologist) that's not very useful. Usually it's individual genotypes or phenotypes within the species that are assigned a fitness.
The OP is more of a statement that you get different results depending on whether you focus on organism count or biomass or energy flow. I motivate this line of inquiry by a question about what evolution selects for, but that's secondary to the main point.
I think it's common on LessWrong to think of evolution's selection target as inclusive genetic fitness - that evolution tries to create organisms which make as many organisms with similar DNA to themselves as possible. But what exactly does this select for? Do humans have high inclusive genetic fitness?
One way to think of it is that all organisms alive today are "winners"/selected-for by that competition, but that seems unreasonable to me, since some individual organisms clearly have genetic disorders or similar which make them unfit according to this criterion.
There's some sort of consensus that we can assign individual organisms to "species", and then we could count it by the number of members of that species. Supposedly, the most numerous species is Pelagibacter communis, with 10^28 individuals, vastly outnumbering humanity. Maybe we could say that this is the selection target of evolution?
Of course, as would be expected, pelagibacter is a very minimalist species, being single-celled and having very few genes. This minimalism also makes it hard to notice, to the point where according to Wikipedia, it was first discovered in 1990. (I wonder if there's another species that's smaller, more common, and even harder to notice...) This raises the question of whether pure numerosity is the correct way of thinking of it.
If we instead weight by biomass, most life is in the form of plants, and I think more specifically trees. This makes perfect sense to me - trees evolve from a direct competition for height, which is one of the traits most directly related to mass. And in a way, biomass is more sensible to weight by than numerosity, since it is less dependent on the way you slice a species into individual organisms.
But trees are pretty static. Maybe the problem is that since mass has inertia, this weighting implicitly discourages more dynamic species, like humans? An alternative is to weight by energy flow, but in that case, algae and grasses end up accounting for most of it. Sensible, because if you go up the trophic levels, you rapidly lose energy. That said, energy flow does have the dissatisfying (to me) element that it is "shared" between organisms that predate upon each other. I wonder if one could use something like entropy production to get a conceptually similar metric that's more attributable to a single organism.
I don't know of any weightings or metrics where humans are the winners, but it seems likely to me that there is one.
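The dependence on weighting described above can be sketched in a few lines. Only the Pelagibacter count of 10^28 is taken from the post; every other number is a made-up placeholder in arbitrary units, chosen purely to reproduce the orderings claimed above, and should not be read as a measurement.

```python
# Placeholder table. Only the Pelagibacter count (1e28) comes from the post;
# all other values are invented, in arbitrary units, to mirror the claimed orderings.
GROUPS = {
    "pelagibacter": {"count": 1e28, "biomass": 1.0, "energy_flow": 1.0},
    "trees": {"count": 1e12, "biomass": 100.0, "energy_flow": 5.0},
    "algae_and_grasses": {"count": 1e20, "biomass": 10.0, "energy_flow": 50.0},
    "humans": {"count": 1e10, "biomass": 0.1, "energy_flow": 0.5},
}


def winner(groups, weighting):
    """Return the group that scores highest under the chosen weighting."""
    return max(groups, key=lambda name: groups[name][weighting])


if __name__ == "__main__":
    for weighting in ("count", "biomass", "energy_flow"):
        print(f"{weighting}: {winner(GROUPS, weighting)}")
```

With these placeholders each weighting names a different "winner", and humans top none of them. Finding a weighting where humans win amounts to adding another column, such as the "energy controlled" idea from the comments, and filling it with real numbers.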