Evolution's selection target depends on your weighting

[-]Richard_Kennaway1y*175

What is the purpose of declaring some organism the "winner" of evolution? This is like looking at a vast river delta and declaring one of its many streams to be the "most successful" at finding the sea. Any such judgement is epiphenomenal to the thing itself, which does not care about the stories anyone makes up about it.

[-]tailcalled1y10-3

Some people say that e.g. inner alignment failed for evolution in creating humans. In order for that claim of historical alignment difficulty to cash out, it feels like humans need to be "winners" of evolution in some sense, as otherwise species that don't achieve as full agency as humans do seem like a plausibly more relevant comparison to look at. This is kind of a partial post, playing with the idea but not really deciding anything definitive.

[-]Steven Byrnes1y64

Here’s a sensible claim:

CLAIM A: “IF there’s a learning algorithm whose reward function is X, THEN the trained models that it creates will not necessarily explicitly desire X.”

This is obviously true, and every animal including humans serves as an example. For most animals, it’s trivially true, because most animals don’t even know what inclusive genetic fitness is, so obviously they don’t explicitly desire it.

So here’s a stronger claim:

CLAIM B: “CLAIM A is true even if the trained model is sophisticated enough to fully understand what X is, and to fully understand that it was itself created by this learning algorithm.”

This one is true too, and I think humans are the only example we have. I mean, the claim is really obvious if you know how algorithms work etc., but of course some people question it anyway, so it can be nice to have a concrete illustration.

(More discussion here.)

Neither of those claims has anything to do with humans being the “winners” of evolution. I don’t think there’s any real alignment-related claim that does. Although, people say all kinds of things, I suppose. So anyway, if there’s really something substantive that this post is responding to, I suggest you try to dig it out.

[-]Noosphere891y111

Neither of those claims has anything to do with humans being the “winners” of evolution. I don’t think there’s any real alignment-related claim that does. Although, people say all kinds of things, I suppose. So anyway, if there’s really something substantive that this post is responding to, I suggest you try to dig it out.

The analogy in which evolution failed to produce intelligent minds that value what evolution "valued" is an intuition pump that does get used in explaining outer/inner alignment failures, and it is part of why, in some corners, there's a general background assumption that outer/inner alignment is very hard.

It's also used in the sharp left turn, where the capabilities of an optimization process like humans outstripped their alignment to evolutionary objectives, and the worry is that an AI could do the same to us, and evolutionary analogies do get used here.

Both Eliezer Yudkowsky and Nate Soares use arguments that rely on evolution failing to get a selection target inside us, thus misaligning us to evolution:

https://www.lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky#Yudkowsky_argues_against_AIs_being_steerable_by_gradient_descent_

https://www.lesswrong.com/posts/GNhMPAWcfBCASy8e6/a-central-ai-alignment-problem-capabilities-generalization

[-]Steven Byrnes1y50

The OP talks about the fact that evolution produced lots of organisms on Earth, of which humans are just one example, and that if we view the set of all life, arguably more of it consists of bacteria or trees than humans. Then this comment thread has been about the question: so what? Why bring that up? Who cares?

Like, here’s where I think we’re at in the discussion:

Nate or Eliezer: “Evolution made humans, and humans don’t care about inclusive genetic fitness.”

tailcalled: “Ah, but did you know that evolution also made bacteria and trees?”

Nate or Eliezer: “…Huh? What does that have to do with anything?”

If you think that the existence on Earth of lots of bacteria and trees is a point that specifically undermines something that Nate or Eliezer said, then can you explain the details?

[-]Noosphere891y42

Oh, I was responding to something different, my apologies.

[-]tailcalled1y20

I wouldn't go this far yet. E.g. I've been playing with the idea that the weighting where humans "win" evolution is something like adversarial robustness. This just wasn't really a convincing enough weighting to be included in the OP. But if something like that turns out correct then one could imagine that e.g. humans get outcompeted by something that's even more adversarially robust. Which is basically the standard alignment problem.

Like I did not in fact interject in response to Nate or Eliezer. Someone asked me what triggered my line of thought, and I explained that it came from their argument, but I also said that my point was currently too incomplete.

[-]Q Home1y31

Meta-level comment: I don't think it's good to dismiss original arguments immediately and completely.

Object-level comment:

Neither of those claims has anything to do with humans being the “winners” of evolution.

I think it might be more complicated than that:

1. We need to define what "a model produced by a reward function" means, otherwise the claims are meaningless. Like, if you made just a single update to the model (based on the reward function), calling it "a model produced by the reward function" is meaningless ('cause no real optimization pressure was applied). So we do need to define some goal of optimization (which determines who's a winner and who's a loser).
2. We need to argue that the goal is sensible. I.e. somewhat similar to a goal we might use while training our AIs.

Here are some things we can try:

1. We can try defining all currently living species as winners. But is it sensible? Is it similar to a goal we would use while training our AIs? "Let's optimize our models for N timesteps and then use all surviving models regardless of any other metrics" <- I think that's not sensible, especially if you use an algorithm which can introduce random mutations into the model.
2. We can try defining species which avoided substantial changes for the longest time as winners. This seems somewhat sensible, because those species experienced the longest optimization pressure. But then humans are not the winners.
3. We can define any species which gained general intelligence as winners. Then humans are the only winners. This is sensible because of two reasons. First, with general intelligence deceptive alignment is possible: if humans knew that Simulation Gods optimize organisms for some goal, humans could focus on that goal or kill all competing organisms. Second, many humans (in our reality) value creating AGI more than solving any particular problem.

I think the last of these is the strongest counter-argument to "humans are not the winners".

[-]tailcalled1y20

Right, I think there are variants of it that might work out, but there's also the aspect where some people argue that AGI will turn out to essentially be a bag-of-heuristics or similar, where inner alignment becomes less necessary because the heuristics achieve the outer goal even if they don't do it as flexibly as they could.

Richard Kennaway asked why I would think along those lines, but the point of the OP isn't to make an argument about AI alignment, it's merely to think along those lines. Conclusions can come later once I'm finished exploring it.

[-]Dagon1y126

Wish I could upvote and disagree. Evolution is a mechanism without a target. It's the result of selection processes, not the cause of those choices.

[-]tailcalled1y30

I like to treat the environment/ecology as the cause. So that e.g. trees are caused by the sun.

I kind of feel like pelagibacter communis could maybe be seen as "evolutionary heat" or "ecological heat", like in the sense that the ecology has "space for" some microscopic activity so whatever major ecological causes pop up, some minimal species will evolve to fill up the minimal-species-niche.

[-]Ben1y50

I think a reasonable-seeming metric on which humans are doubtless the winners is "energy controlled".

Total up all the human metabolic energy, plus the output of the world's power grids, the energy of all that petrol/gas burning in cars/boilers. If you are feeling generous you could give humans a percentage of all the metabolic energy going through farm animals.

It's a bit weird, because on the one hand it's obvious that collectively humans control the planet in a way no other organism does. But you are looking for a metric where plants and single-celled organisms are allowed to participate, and they can't properly be said to control anything, even themselves.

[-]tailcalled1y20

I think there's something to this. Also since making the OP, I've been thinking that human control of fire seems important. If trees have the majority of the biomass, but humans can burn the trees for energy or just to make space, then that also makes humans special (and overlaps a lot with what you say about energy controlled).

This also neatly connects human society to the evolutionary ecology since human dominance hierarchies determine who is able to control what energy (or set fire to what trees).

[-]Dmitry Vaintrob1y32

Insofar as you're thinking of evolution as analogous to gradient flow, it only makes sense if it's local and individual-level I think -- it is a category error to say that a species that has more members is a winner. The first shark that started eating its siblings in utero improved its genetic fitness (defined as the expected number of offspring in the specific environment it existed in) but might have harmed the survivability of the species as a whole.

[-]tailcalled1y20

In the case of gradient flow, we expect almost-all starting conditions to end up in a similar functional relationship when restricting attention to their on-distribution behavior. This allows us to pick a canonical winner.
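To make the convergence claim concrete, here is a minimal sketch in the easiest possible setting: a convex least-squares problem, where plain gradient descent (a discretization of gradient flow) provably reaches the same solution from any start. The data, learning rate, and step count are illustrative assumptions; this shows the shape of the argument and is not evidence about neural networks.

```python
import random

# Toy data generated from y = 2x + 1 (an assumption made for illustration).
XS = [0.0, 1.0, 2.0, 3.0]
YS = [2.0 * x + 1.0 for x in XS]


def gradient_descent(w, b, lr=0.05, steps=5000):
    """Minimize mean squared error of y ~ w*x + b by plain gradient descent."""
    n = len(XS)
    for _ in range(steps):
        errs = [w * x + b - y for x, y in zip(XS, YS)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errs, XS))
        grad_b = 2.0 / n * sum(errs)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Several random starting points; each run ends at (roughly) the same (w, b).
rng = random.Random(0)
finals = [
    gradient_descent(rng.uniform(-10, 10), rng.uniform(-10, 10))
    for _ in range(5)
]
for w, b in finals:
    print(round(w, 4), round(b, 4))
```

Because the loss here has a single minimum, there is one canonical "winner" regardless of initialization. Whether anything similar holds for non-convex training or for evolution is the open part of the claim.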

Evolution is somewhat different from this in that we're not working with a random distribution but instead a historical distribution, but that should just increase the convergence even more.

The noteworthy part is that despite this convergence, there are still multiple winners because it depends on your weighting (and I guess because the species aren't independent, too).

[-]Charlie Steiner1y20

Yeah, this makes sense.

You could also imagine more toy-model games with mixed ecological equilibria.

E.g. suppose there's some game where you can reproduce by getting resources, and you get resources by playing certain strategies, and it turns out there's an equilibrium where there's 90% strategy A in the ecosystem (by some arbitrary accounting) and 10% strategy B. It's kind of silly to ask whether it's A or B that's winning based on this.

Although now that I've put things like that, it does seem fair to say that A is 'winning' if we're not at equilibrium, and A's total resources (by some accounting...) is increasing over time.
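A minimal sketch of such a toy game, assuming two-strategy replicator dynamics with a hypothetical payoff matrix chosen so that the stable mix is 90% A and 10% B. The payoffs, step size, and starting shares are all illustrative assumptions, not anything specified above.

```python
def replicator_share(x0, payoff, dt=0.1, steps=2000):
    """Trajectory of strategy A's population share under Euler-stepped
    two-strategy replicator dynamics.

    payoff = ((a, b), (c, d)): a is A's payoff against A, b is A against B,
    c is B against A, d is B against B.
    """
    (a, b), (c, d) = payoff
    x = x0
    trajectory = [x]
    for _ in range(steps):
        fit_a = a * x + b * (1 - x)  # average payoff to an A player
        fit_b = c * x + d * (1 - x)  # average payoff to a B player
        x += dt * x * (1 - x) * (fit_a - fit_b)
        trajectory.append(x)
    return trajectory


# Hypothetical payoffs: fitnesses are equal when 9 * (1 - x) == x, i.e. x = 0.9.
PAYOFF = ((0.0, 9.0), (1.0, 0.0))

from_low = replicator_share(0.2, PAYOFF)    # A starts rare and gains share
from_high = replicator_share(0.99, PAYOFF)  # A starts dominant and loses share
print(round(from_low[-1], 3), round(from_high[-1], 3))
```

Both runs settle at the same mix, so at equilibrium neither strategy is "winning". Away from equilibrium, A's share is rising in the first run and falling in the second, which is the only sense in which the sketch lets you call A a winner or loser.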

Now to complicate things again, what if A is increasing in resource usage but simultaneously mutating to be played by fewer actual individuals (the trees versus pelagibacter, perhaps)? Well, in the toy model setting it's pretty tempting to say the question is wrong, because if the strategy is changing it's not A anymore at all, and A has been totally wiped out by the new strategy A'.

Actually I guess I endorse this response in the real world too, where if a species is materially changing to exploit a new niche, it seems wrong to say "oh, that old species that's totally dead now sure were winners." If the old species had particular genes with a satisfying story for making it more adaptable than its competitors, perhaps better to take a gene's-eye view and say those genes won. If not, just call it all a wash.

Anyhow, on humans: I think we're 'winners' just in the sense that the human strategy seems better than our population 200ky ago would have reflected, leading to a population and resource use boom. As you say, we don't need to be comparing ourselves to phytoplankton, the game is nonzero-sum.

[-]tailcalled1y2-1

E.g. suppose there's some game where you can reproduce by getting resources, and you get resources by playing certain strategies, and it turns out there's an equilibrium where there's 90% strategy A in the ecosystem (by some arbitrary accounting) and 10% strategy B. It's kind of silly to ask whether it's A or B that's winning based on this.

But this is an abstraction that would never occur in reality. The real systems that inspire this sort of thing have lots of pelagibacter communis and the strategies A and B are constantly diverging off into various experimental organisms that fit neither strategy and then die out.

When you choose to model this as a mixture of A and B, you're already implicitly picking out both A and B as especially worth paying attention to - that is, as "winners" in some sense.

Actually I guess I endorse this response in the real world too, where if a species is materially changing to exploit a new niche, it seems wrong to say "oh, that old species that's totally dead now sure were winners." If the old species had particular genes with a satisfying story for making it more adaptable than its competitors, perhaps better to take a gene's-eye view and say those genes won. If not, just call it all a wash.

But in this case you could just say A' is winning over A. Like if you were training a neural network, you wouldn't say that your random initialization won; you'd say the optimized network achieves a lower loss than the random initialization did.

[-]Charlie Steiner1y20

Perhaps I should have said that it's silly to ask whether "being like A" or "being like B" is the goal of the game.

[-]Kaj_Sotala1y20

I've previously argued that genetic fitness is a measure of selection strength, not the selection target. What evolution selects for are traits that happen to be useful in the organism's current environment. The extent to which a trait is useful in the organism's current environment can be quantified as fitness, but fitness is specific to a particular environment and the same trait might have a very different fitness in some other environment.

[-]tailcalled1y20

But the problem I mention seems to still apply even if you hold the environment fixed.

[-]Kaj_Sotala1y20

I guess I don't really understand what you're asking. I meant my comment as an answer to this bit in the OP:

I think it's common on LessWrong to think of evolution's selection target as inclusive genetic fitness - that evolution tries to create organisms which make as many organisms with similar DNA to themselves as possible. But what exactly does this select for?

In that evolution selecting for "inclusive genetic fitness" doesn't really mean selecting for anything in particular; what exactly that ends up selecting for is completely dependent on the environment (where "the environment" also includes the species itself, which is relevant for things like sexual selection or frequency-dependent selection).

If you fix the environment, assuming for the sake of argument that it's possible to do that, then what it selects for is just the traits that are useful in that environment.

Do humans have high inclusive genetic fitness?

I think it's a bit of a category mistake to ask about the inclusive fitness of a species. You could calculate the average fitness of an individual within the species, but at least to my knowledge (caveat: I'm not a biologist) that's not very useful. Usually it's individual genotypes or phenotypes within the species that are assigned a fitness.

[-]tailcalled1y10

The OP is more of a statement that you get different results depending on whether you focus on organism count or biomass or energy flow. I motivate this line of inquiry by a question about what evolution selects for, but that's secondary to the main point.
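The dependence on weighting can be sketched in a few lines. Every number below is a made-up placeholder in arbitrary units, chosen only to reproduce the qualitative orderings discussed in this thread (bacteria most numerous, trees most biomass, humans most energy controlled); none of them is a measurement, and the `winner` helper is hypothetical.

```python
# Placeholder magnitudes in arbitrary units. NOT real data.
GROUPS = {
    "bacteria": {"organism_count": 1e6, "biomass": 1e2, "energy_controlled": 1e1},
    "trees": {"organism_count": 1e2, "biomass": 1e4, "energy_controlled": 1e2},
    "humans": {"organism_count": 1e0, "biomass": 1e0, "energy_controlled": 1e3},
}


def winner(groups, weights):
    """Return the group with the highest weighted score.

    weights maps a metric name to how much that metric counts;
    metrics missing from weights count for nothing.
    """
    def score(metrics):
        return sum(weights.get(name, 0.0) * value for name, value in metrics.items())

    return max(groups, key=lambda name: score(groups[name]))


# Three weightings, each caring about exactly one metric.
WEIGHTINGS = {
    "by count": {"organism_count": 1.0},
    "by biomass": {"biomass": 1.0},
    "by energy": {"energy_controlled": 1.0},
}

winners = {label: winner(GROUPS, w) for label, w in WEIGHTINGS.items()}
for label, name in winners.items():
    print(label, "->", name)
```

The same table of quantities yields a different "winner" under each weighting, and mixed weightings interpolate between them. The choice of weights is doing all the work.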
