I saw that Katja Grace has said something similar here; I'm just putting my own spin on the idea.
The relevance of the evolutionary analogy for inner alignment has long been discussed in this community, but one observation that seems not to have been mentioned is that humans are still... pretty good at inclusive genetic fitness (IGF)? Even in way-out-of-distribution environments like modern society, we still have strong desires to eat food, stay alive, find mates and reproduce (although the last one has weakened somewhat recently; IGF hasn't totally generalized). We don't monomaniacally optimize for IGF, but we (and probably future NN-based AIs) don't monomaniacally optimize for anything, and our motivational circuits still do a pretty good job at keeping our species alive. So... why should we expect future AIs to catastrophically fail (i.e. be completely non-inner-aligned with what we wanted them to do) at doing the actions we rewarded in RL training, which should be a much stronger outer optimizer than evolution?
Some possible objections:
I think one argument is that optimizing for IGF basically gives humans two jobs: survive, and have kids.
Animal skulls are evidence that the "survive" part can be difficult. We've nailed that one, though. Very few humans in developed countries die before reaching an age suitable for having kids. I doubt that there are any other animal species that come close to us in that metric. Almost all of us have "don't die" ingrained pretty deeply.
It looks like we are heading toward failing pretty heavily at the second job, "have kids", though, and you would think that would be the easier one.
So if one of the two jobs fails to carry over, that's a 50% failure rate on preserving the outer optimizer's values within the inner optimizer, which is actually pretty terrible.
We (or at least a majority of humans) do still have inner desires to have kids, though; they just get balanced out by other considerations, mostly creature comforts/not wanting to deal with the hassle of kids. But yeah, evolution did not foresee birth control, so that's a substantial misgeneralization.
We are still a very successful species overall according to IGF, but birth rates continue to decline, which is why I made my last point about inner alignment possibly drifting farther and farther away the stronger the inner optimizer (e.g. human culture) becomes.