[partly copied from here]
If future humans switch from DNA to XNA, or upload themselves into simulations, or imprint their values on AI successors, or whatever, then the future would be high-reward according to some of those RL algorithms and the future would be zero-reward according to others of those RL algorithms.
In other words, one “experiment” is simultaneously providing evidence about what the results look like for infinitely many different RL algorithms. Lucky us.
(Related to: “goal misgeneralization”.)
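The point that one "experiment" simultaneously tests infinitely many RL algorithms can be made concrete with a toy sketch (my own illustration, not from the original comment; all the names and numbers are invented): one stylized future outcome is scored under several hypothetical reward functions, each of which could be called "what natural selection was optimizing for", and they disagree.

```python
# Illustrative sketch: one outcome of history, evaluated under several
# different candidate reward functions. The outcome and the reward
# functions are hypothetical stand-ins, chosen only to show that the
# same "experiment" yields high reward for some and zero for others.

# A stylized future in which humans switch away from DNA but persist
# and spread (cf. "switch from DNA to XNA, or upload themselves...").
future = {
    "dna_copies": 0,           # no literal DNA propagation anymore
    "human_like_minds": 1e12,  # descendants/successors still exist
    "persists": True,
}

# Different operationalizations of "the" objective being selected for.
reward_functions = {
    "maximize DNA copies": lambda f: f["dna_copies"],
    "maximize descendants (any substrate)": lambda f: f["human_like_minds"],
    "persist in roughly the same shape": lambda f: 1.0 if f["persists"] else 0.0,
}

for name, r in reward_functions.items():
    print(f"{name}: reward = {r(future)}")
```

The same single data point is a success story for some of these reward functions and a total failure for others, which is exactly why arguing about "the" lesson from evolution requires picking which reward function you mean.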
I don’t think it’s productive to just stare at the list of bullet points and try to find the one that corresponds to the “broadest, truest” essence of natural selection. What does that even mean? Why is it relevant to this discussion?
I do think it is potentially productive to argue that the evidence from some of these bullet-point “experiments” is more relevant to AI alignment than the evidence from others of these bullet-point “experiments”. But to make that argument, one needs to talk more specifically about what AI alignment will look like, and argue on that basis that some of the above bullet point RL algorithms are more disanalogous to AI alignment than others. This kind of argument wouldn’t be talking about which bullet point is “reasonable” or “the true essence of natural selection”, but rather about which bullet point is the tightest analogy to the situation where future programmers are developing powerful AI.
(And FWIW my answer to the latter is: none of the above—I think all of those bullet points are sufficiently disanalogous to AI alignment that we don’t really learn anything from them, except that they serve as an existence proof illustration of the extremely weak claim that inner misalignment in RL is not completely impossible. Further details here.)
Isn't your list an "any_of"?
I think your error is that you are treating the "RL algorithm" as the encoded policy network on a specific creature. For example, a human wants to have children; a bacterium wants to find food and replicate itself the moment certain thresholds are reached. There is a physical mechanism that causes these policies to be enacted.
This is not the RL algorithm. The RL algorithm of evolution doesn't "exist" anywhere physical; it just happens to prefer outcomes where creatures cause other creatures to exist, and those creatures do not have to share remotely the same code. For a concrete example, evolution ranks: build an AI successor >>>>>>>> father 1000 children >> father 1 child, and it prefers them in that order.
Or another example: you think individual creatures would prefer their own genes to be propagated*. This is a policy. If, hypothetically, you could go to a biotech clinic and have your genetic code upgraded (junk cleaned out, AI-designed genes replacing all of your genes with superior versions, or with the best versions found in the human gene pool), your policy network as a human being may not prefer that outcome, but evolution DOES.
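The policy-versus-selection-process distinction above can be sketched in a few lines (my own toy illustration, not the commenter's code; the outcomes and scores are invented stand-ins): a creature's policy is something physically encoded in it, while the "selection process" is nothing but a ranking over outcomes by how much similar stuff exists afterwards.

```python
# Toy sketch of the distinction: outcomes described only by how much
# "stuff like this" exists afterwards. The selection process is just a
# preference for more existence; it lives nowhere in particular and
# need not agree with any individual creature's encoded policy.

outcomes = {
    "father 1 child": 1,
    "father 1000 children": 1_000,
    "build an AI successor": 10**9,  # stand-in for "vastly more persistence"
}

# "Evolution's ranking": prefer outcomes with more existence, nothing else.
ranked = sorted(outcomes, key=outcomes.get, reverse=True)
print(ranked)
```

A human's own policy might rank these three outcomes very differently; the selection process doesn't consult it.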
Natural selection is often charged with having goals for humanity, and humanity is often charged with falling down on them.
Whoever's claiming this is really misunderstanding (or, often, misrepresenting) natural selection. It has goals in exactly the same way that Gravity has goals. They're also forgetting (or, often, ignoring) that natural selection works by REPLACEMENT, not by improvement in place or preservation of successes.
I fully agree that something like persistence/[continued existence in ~roughly the same shape] is the most natural/appropriate/joint-carving way to think about whatever-natural-selection-is-selecting-for in its full generality. (At least that's the best concept I know at the moment.)
(Although there is still some sloppiness in what it means for a thing at time t0 to be "the same" as some other thing at time t1.)
This view is not entirely novel; see, e.g., Bouchard's PhD thesis (from 2004) or the SEP entry on "Fitness" (ctrl+F "persistence").
I also agree that [humans are]/[humanity is] obviously massively successful on that criterion.
I'm very uncertain as to what implications this has for AI alignment.
Natural selection is often charged with having goals for humanity, and humanity is often charged with falling down on them. The big accusation, I think, is of sub-maximal procreation. If we cared at all about the genetic proliferation that natural selection wanted for us, then this time of riches would be a time of fifty-child families, not one of coddled dogs and state-of-the-art sitting rooms.
But (the story goes) our failure is excusable, because instead of a deep-seated loyalty to genetic fitness, natural selection merely fitted humans out with a system of suggestive urges: hungers, fears, loves, lusts. Which all worked well together to bring about children in the prehistoric years of our forebears, but no more. In part because all sorts of things are different, and in part because we specifically made things different in that way on purpose: bringing about children gets in the way of the further satisfaction of those urges, so we avoid it (the story goes).
This is generally floated as an illustrative warning about artificial intelligence. The moral is that if you make a system by first making multitudinous random systems and then systematically destroying all the ones that don’t do the thing you want, then the system you are left with might only do what you want while current circumstances persist, rather than being endowed with a consistent desire for the thing you actually had in mind.
Observing acquaintances dispute this point recently, it struck me that humans are actually weirdly aligned with natural selection, more than I could easily account for.
Natural selection, in its broadest, truest, (most idiolectic?) sense, doesn’t care about genes. Genes are a nice substrate on which natural selection famously makes particularly pretty patterns by driving a sensical evolution of lifeforms through interesting intricacies. But natural selection’s real love is existence. Natural selection just favors things that tend to exist. Things that start existing: great. Things that, having started existing, survive: amazing. Things that, while surviving, cause many copies of themselves to come into being: especial favorites of evolution, as long as there’s a path to the first ones coming into being.
So natural selection likes genes that promote procreation and survival, but also likes elements that appear and don’t dissolve, ideas that come to mind and stay there, tools that are conceivable and copyable, shapes that result from myriad physical situations, rocks at the bottoms of mountains. Maybe this isn’t the dictionary definition of natural selection, but it is the real force in the world, of which natural selection of reproducing and surviving genetic clusters is one facet. Generalized natural selection—the thing that created us—says that the things that you see in the world are those things that exist best in the world.
So what did natural selection want for us? What were we selected for? Existence.
And while we might not proliferate our genes spectacularly well in particular, I do think we have a decent shot at a very prolonged existence. Or the prolonged existence of some important aspects of our being. It seems plausible that humanity makes it to the stars, galaxies, superclusters. Not that we are maximally trying for that any more than we are maximally trying for children. And I do think there’s a large chance of us wrecking it with various existential risks. But it’s interesting to me that natural selection made us for existing, and we look like we might end up just totally killing it, existence-wise. Even though natural selection purportedly did this via a bunch of hackish urges that were good in 200,000 BC but you might have expected to be outside their domain of applicability by 2023. And presumably taking over the universe is an extremely narrow target: it can only be done by so many things.
Thus it seems to me that humanity is plausibly doing astonishingly well on living up to natural selection’s goals. Probably not as well as a hypothetical race of creatures who each harbors a monomaniacal interest in prolonged species survival. And not so well as to be clear of great risk of foolish speciocide. But still staggeringly well.