TL;DR

Human extinction is trending: there has been a lot of noise, mainly on X, about the apparent complacency amongst e/acc with respect to human extinction. Extinction also feels adjacent to another view (not particular to e/acc) that ‘the next step in human evolution is {AI/AGI/ASI}’. Many have pushed back robustly against the former, while the latter doesn’t seem very fleshed out. I thought it useful to, briefly, gather the various positions and summarise them, hopefully not too inaccurately, and perhaps pull out some points of convergence.

This is a starting point for my own research (on de-facto extinction via evolution). There is nothing particularly new in here: see, for instance, the substantial literature in the usual fora. Thomas Moynihan’s X-risk (2020) documents the history of humanity’s collective realisation of civilisational fragility, while Émile P. Torres’ works (discussed below) set out a possible framework for an ethics of extinction.

My bottom line is: a) the degree of badness (or goodness) of human extinction seems less obvious or self-evident than one might assume, b) what we leave behind if and when we go extinct matters, c) the timing of when this happens is important, as is d) the manner in which the last human generations live (and die).

Relevant to the seeming e/acc take (i.e. being pretty relaxed about possible human extinction): it seems clear that our default position (subject to some caveats) should be to delay extinction, on the grounds that a) it is irreversible (by definition), and b) delay maximises our option value over the future. In any case, the e/acc view, which seems to rest on a not-very-articulate blend of entropy and a taste for unfettered capitalism, is hard to take seriously and might even fail on its own terms.

Varieties of extinction

The Yudkowsky position

(My take on) Eliezer’s view is that he fears a misaligned AI (not necessarily superintelligent), acting largely on its own (e.g. forming goals, planning, actually effecting things in the world), will eliminate humans and perhaps all life on Earth. This would be bad, not just for the eliminated humans or their descendants, but also for the universe-at-large, in the sense that intelligently-created complexity (of the type that humans generate) is an intrinsic good that requires no further justification. The vast majority of AI designs that Eliezer foresees would, through various chains of events, result in a universe with much less of this intrinsic good.

He spells it out here in the current e/acc context, and clarifies that his view doesn’t hinge on the preservation of biological humans (this was useful to know). He has written copiously and aphoristically on this topic, for instance in Value is Fragile and the Fun Theory sequence.

The Bostrom variant

Nick Bostrom’s views on human extinction seem to take a more-happy-lives-are-better starting point. My possibly mistaken impression is that, like Eliezer, he values things like art, creativity and love, in the specific sense that a future where they didn’t exist would be a much worse one from a cosmic or species-neutral perspective. He describes an ‘uninhabited society’ that is technologically advanced and builds complex structures, but that ‘nevertheless lacks any type of being that is conscious or whose welfare has moral significance’ (Chapter 11, p. 173 of Superintelligence (2014)). To my knowledge, he doesn’t unpick what precisely about the uninhabited society would actually be bad, and for whom (possibly this is a well-understood point or a non-question in philosophy, but I’m not sure that is the case, at least judging from Benatar and Torres (see below), this paper by James Lenman, or for that matter Schopenhauer).

A more tangible reason Bostrom thinks we should avoid going extinct anytime soon is to preserve ‘option value’ over the future - since so many questions about humans’ individual and group-level preferences, as well as species-level vocation, remain unanswered, it may be better to defer any irreversible changes until such time as we are collectively wiser. This intuitively makes sense, though even here it is unclear how strong the impact of option value actually would be on the overall value of X-risk reduction.
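A toy way to make the option-value point concrete (this is my own sketch, not Bostrom’s formalism): suppose the long-run value of the future depends on a choice among options whose values V_1, …, V_n we are currently uncertain about. Deciding after the uncertainty resolves is weakly better in expectation than committing now, since

\mathbb{E}\big[\max_i V_i\big] \;\geq\; \max_i \mathbb{E}\big[V_i\big].

Going extinct amounts to irreversibly committing to one branch (and its value) before we learn more, so on this toy picture delay is weakly better in expectation; how much better depends on how much we would actually learn by waiting, which is precisely the open question above.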

(Perhaps) the e/acc view

There doesn’t seem to be a recent substantial argument from the cluster of commentators lumped into ‘e/acc’ (particularly @BasedBeffJezos), but this 2022 post seems useful. My take on e/acc, working mostly off the document above: a) they have no bias in favour of humans, human-ish minds or human creation as intrinsic goods (in the sense that I describe Eliezer or Bostrom having), b) one of their intrinsic goods seems to be maximising the amount of intelligence in the cosmos (they take an expansive, aggressively non-anthropocentric definition of ‘intelligence’, seemingly including capitalism and other group-level cognition), c) their view of the manifested results of intelligence (whether artificial or socio-capitalist) is pretty tolerant, i.e. whatever that intelligence results in is acceptable, and that following (source)

‘the “will of the universe” [means] leaning into the thermodynamic bias towards futures with greater and smarter civilizations that are more effective at finding/extracting free energy from the universe and converting it to utility at grander and grander scales.’

Continuing the list above, d) they don’t believe Eliezer- or Bostrom-style ‘uninhabited worlds’, which I assume is what ‘zombie’ is gesturing at, are likely: ‘No need to worry about creating “zombie” forms of higher intelligence, as these will be at a thermodynamic/evolutionary disadvantage compared to conscious/higher-level forms of intelligence’.

The e/acc crowd also makes reference to thermodynamic explanations for the origin of life (see Jeremy England), which they seem to extrapolate to higher forms of cognition operating at multiple scales. I don’t know enough to critique this, except to say that this foundation feels like it is carrying a lot of weight for their claims (but it could be interesting if well-supported - which it currently is not).

The comments above do not include any further refinements of e/acc thought (and this is a live and heated conversation), but here is Eliezer’s suggestion (source) of what he would like to see from the e/acc crowd (in terms of fleshing out their ideas):

‘My Model of Beff Jezos's Position: I don't care about this prediction of yours enough to say that I disagree with it. I'm happy so long as entropy increases faster than it otherwise would. I have temporarily unblocked @BasedBeffJezos in case he has what I'd consider a substantive response to this, such as, "I actually predict, as an empirical fact about the universe, that AIs built according to almost any set of design principles will care about other sentient minds as ends in themselves, and look on the universe with wonder that they take the time and expend the energy to experience consciously; and humanity's descendants will uplift to equality with themselves, all those and only those humans who request to be uplifted; forbidding sapient enslavement or greater horrors throughout all regions they govern; and I hold that this position is a publicly knowable truth about physical reality, and not just words to repeat from faith; and all this is a crux of my position, where I'd back off and not destroy all humane life if I were convinced that this were not so."’

For context, and this is perhaps a historical curiosity, e/acc draws heavily on the ‘accelerationist’ cluster of ideas incubated at the University of Warwick in the mid-1990s under the apocryphal Cybernetic Culture Research Unit (CCRU). It was unusually fecund, in that the original CCRU take on accelerationism (insofar as it was documented/codified) splintered into left, right, far-right, unconditional, and a number of other variants, an ambiguous pantheon now joined by e/acc. The Wikipedia page is a good start, as are this post and this article on Nick Land (a founding figure, subsequently shunned owing to far-right views). Excerpts of foundational texts can be found in the Accelerationist Reader. Rather than a coherent philosophy, accelerationism is perhaps better viewed as a generative meta-meme that was (and clearly still seems to be) particularly influential in art and popular culture.

Christiano: ‘humans lose control over the future’

Paul Christiano’s views (on the topic of extinction) seem to coalesce around: a) a universe of ‘value plurality’, i.e. one where human values become merely one set amongst many, is a bad one, and b) a timeline where humans ‘lose control over the future’ is a bad timeline. These are most concretely discussed in the context of worlds that resemble today’s (i.e. with states, corporations, biological humans, etc.), and reflect on risks that arise from the interaction of our socio-economic structures (i.e. relatively laissez-faire capitalism), powerful technology, incompetent regulation, coordination problems, and variants of Goodhart’s Law.

However, in an intriguing 2018 post, Christiano does take a more speculative view: he relaxes his bias in favour of biological humans and our values/systems (i.e. entertains the view that superintelligent successors inheriting the future might be okay if that is the only way our values might persist), but seems to punt on the difficult questions of what words like ‘value’ and ‘niceness’ actually mean.

Dan Hendrycks: evolutionary pressures disfavour humans

I wanted to briefly touch on Dan Hendrycks’ perspective, specifically his point-by-point rebuttal of e/acc views. His rebuttal references this 2023 paper, which takes as given that humans should prefer to keep control over the future and not become extinct.

Hendrycks suggests that, owing to social and technological pressures that manifest through rapid variation/proliferation (of ensembles of agentic AI systems) into competitive deployment environments, forces akin to natural selection may emerge. This selection may favour selfish behaviour (on the part of AI systems), without the restraints that altruism, kin selection, cooperation and moral norms have historically provided for humans and some animals. Combined with AIs’ greater effectiveness in changing the world, this makes it seem probable that they would collectively outcompete humans. This feels like an evolutionary treatment of points made here by Andrew Critch and analysed here.

On an initial read, I can’t find anything in Hendrycks about the possible balance of cooperation/competition between AI systems - i.e. do similar evolutionary pressures result in fratricidal conflict (in which humans are likely collateral damage), or do AIs indeed solve coordination problems better than humans (owing to source code transparency or a decision theory appropriate to their architecture and deployment environment) and mostly avoid conflict with each other?

If there is a chance that AI systems enter into fratricidal conflict, then it seems harder to argue that they will necessarily be ‘more effective at finding/extracting free energy from the universe and converting it to utility at grander and grander scales’, as e/acc suggests. They might just waste resources indefinitely. Absent a stronger argument, it feels like (on this point) e/acc might fail on its own terms.

Human evolution to some other substrate

Hendrycks’ evolutionary framing is clearly bad for humans. However, other evolutionary narratives can be more positive. One such vision is that humanity ceases to exist as a species biologically similar to us, and instead evolves onto some other (in)organic substrate, perhaps going as far as becoming fully embedded in the ‘natural’ environment. This view has been articulated, in various forms, in Robin Hanson’s works and by Richard Sutton, James Lovelock, Joscha Bach, (albeit at a stretch) Donna Haraway, Derek Shiller, and Hans Moravec.

Other than Robin Hanson and, perhaps, Joscha Bach, the writers don’t develop the idea of human evolution and transcendence in detail, and one would probably need to go back to the transhuman and posthuman literatures (which mostly pre-date the current wave of AI successes).

Trying to flesh this out is my area of specific interest, so please get in touch if you have thoughts.

Anti-natalists and digital suffering

The positions above mostly deal with existential risk arising from technological or other mishaps that befall humanity. However, it is conceivable that a species might voluntarily go extinct, a possibility entertained by a group of views that includes contemporary anti-natalism (the position that it is morally wrong to procreate). Anti-natalism comes in flavours: philosophical anti-natalists, such as David Benatar, argue against procreation based on an asymmetry between pleasure and pain (in respect of the created individuals); misanthropic anti-natalists argue against procreation on the basis of the harms caused by humans (e.g. to the rest of the natural world), analysed here by Benatar.

I think these are interesting perspectives because they question the relation between value and population: is a world with more humans (subject to significant constraints on the amount of pain, free will, justice, etc.) really better than a world with fewer?

Specifically relevant to AI, see Brian Tomasik, who is explicitly concerned about the possibility of digital suffering, a topic also treated by Thomas Metzinger. Metzinger specifically argues against giving rise to AIs capable of suffering; related points are raised by Nick Bostrom and Carl Shulman in the context of governance and other issues in mixed societies of humans and AIs (where our historical moral intuitions and social contracts break down in the presence of beings with wider hedonic ranges, rapid population growth and cheap replication/reproduction relative to humans).

Émile P. Torres on the ethics of human extinction

In these two essays (the latter summarises their new book) Torres analyses human extinction extensively. Aside from a sociological study of the history of existential ethics, the relevant part of the Aeon essay is the distinction (which hasn’t often been made in the AI X-risk discourse) between the process of going extinct and the fact of going extinct. Torres also directly distinguishes a world without any humans (and no other human-created intelligence) from a world where we are replaced by (or evolve into) a machine-based species. Again, they highlight a slight gap in the AI X-risk conversation, which is often silent on the timeframes over which extinction might happen, presumably in part because the assumed context is usually one of risks materialising in the next few years or decades. Echoing the anti-natalist position, Torres picks at a foundation of utilitarian-flavoured longtermism, which I (loosely) summarise as ‘more humans is better than fewer’. They mention the obvious point that if one thinks human lives are predominantly filled with suffering, then a world with more humans doesn’t seem obviously better (unless there is some other dominant source of value).

Takeaways

So where does that leave us in respect of the badness or goodness of human extinction? I see three major factors that might affect one’s views towards extinction.

Whether, and how, we are succeeded matters

Firstly, there seems to be a great difference between a) a perished humanity that leaves behind no intelligent successor, no substantial physical or intellectual artefacts and no creative works, and b) worlds where we are able to leave a legacy (which, in Torres’ formulation, could be a biologically or inorganically re-engineered version of ourselves, a successor). The precise shape of that legacy is very unclear, which lends support to Bostrom’s call to preserve option value over the future as well as to Ord’s Long Reflection, though both of these are complicated by the fact that avoiding certain existential risks might itself require massive transformative technological changes (such as eventually having to ‘roll the dice’ on AGI).

Timing seems to matter

Aside from anti-natalists, I imagine relatively few people would bite the bullet of voluntary extinction, especially if that meant they or their living (((...)great-)grand-)children would perish. This perspective prioritises the (apparent) interests of one’s own self and those of close kin. Others may be quite emotionally attached to the physical structures and intellectual achievements of humans, and may wish to see civilisation persist for some hundreds or thousands of years (see this history of human thought about extinction).

However, we should not insist on, or expect, biological humans persisting indefinitely in societies recognisable to us; nor might this even be feasible as medical and other technologies advance, extending active or uploaded lives. As Lenman points out, our intuitions about survival are built around a narrative arc roughly comparable to a human lifetime, and we should be suspicious of extending them without some firmer, more impersonal ground.

More fundamentally, from the perspective of Bostrom’s option-value arguments, one might prefer a distant date for humanity’s extinction. A blunter way of putting it is that actually going extinct, by definition, closes off all other futures. However, it gets more complicated when thinking about evolution-as-extinction or other such scenarios.

The manner of extinction matters

It seems obvious that an extinction event involving much suffering that would not otherwise have been experienced would be worse than a slow process of ‘natural’ dying out (e.g. through depopulation). Similarly, though I don’t focus on it, an extinction event that destroyed much other life on Earth as collateral damage would be worse than an event that mostly affected humans. It is also possible that an event that destroyed everything we have built so far (including accumulated knowledge), such that it might never be recovered or found by some future starfaring alien scouts, would be sad (if not perhaps concretely or quantifiably bad).

Is this actually an urgent question?

This might seem like pointless navel-gazing in light of more salient short- and medium-term risks from misaligned AI. It might also be actively unhelpful: as Christiano points out, some well-motivated concerns (applied thoughtlessly or prematurely), such as those in respect of digital suffering, might corrupt our societal reasoning around AI safety and oversight.

However, imagine there exists a GPT-n that we suspect might experience phenomenological states, has goals and the ability to construct long-term plans, and (let’s say) passes whatever alignment benchmarks we have at the time. Suppose that we, its designers, decide on a precautionary principle (for whatever reason) to shut it down or not deploy it. In an echo of Stanislaw Lem’s Golem XIV, we would potentially be called upon (by the machine, its immediate predecessors, or the judgement of history) to explain our reasoning, which might well touch on some of the issues raised above; even if we can’t give any definitive answers, we may need to show that we have actually thought about it rather than sheepishly assuming a (carbon- or biological-chauvinist) position.

Comments

Insofar as you are particularly interested in the plausibility of literal human extinction, you might find the discussion here, here, and here worth reading.

That’s really useful, thank you.

I think the summary of my position on this would be that when we talk about "extinction" we tend to imagine something more like a violent event. Even if what drove that extinction was a splinter of humanity (e.g. a branch of superhumans, either biologically or cybernetically enhanced), that would still be bad, not only because it implies a lot of death, but because it means the ones that are left are our murderers, and I have a hard time thinking of anyone who would murder me as a worthy successor. If instead humanity gradually all morphed into something else, I suppose taxonomically that still counts as the extinction of the species Homo sapiens, but it's obviously not bad in any particular sense. We already modify our bodies plenty: organ transplants, cosmetic surgery, birth control implants, Lasik, hormone replacement and sex reassignment. We spend our whole lives in a very out-of-distribution context compared to our ancestral environment. In all senses other than the genetic one, Homo sapiens may already have gone extinct long ago, and we are already something else.

By the way, I am actually working on a post looking exactly at Jeremy England's theory (and at what I would consider its overinterpretation on e/acc's part) from a physics perspective. So look forward to that!

‘My Model of Beff Jezos's Position: I don't care about this prediction of yours enough to say that I disagree with it. I'm happy so long as entropy increases faster than it otherwise would.


This isn't what Beff believes at all.

Maximizing entropy in E/Acc takes approximately the same place as maximizing money takes in Objectivism.  It is not inherently good, but it is a strong signal that you are on the right track.

In Objectivism, if you see someone promoting "for the benefit of all mankind", you can probably assume they are a villain.  In E/Acc if you see someone promoting deceleration, likewise.

Surely they must mean something like extropy/complexity? Maximizing disorder doesn't seem to fit their vibe.

Thanks, that's very useful.

Speaking about Eliezer's views, and quoting from his tweet you reference

I predict the ASI that wipes us out, and eats the surrounding galaxies, will not want other happy minds around, or even to become happy itself.

I wonder if @Eliezer Yudkowsky has elaborated on his reasons for this particular prediction anywhere.

As extensive as his writings are, I have not encountered his reasoning on this particular point.


I would normally think that a narrowly focused AI with a narrowly formulated "terminal goal" and its power mostly acquired from instrumental convergence would indeed not intrinsically care about much besides that terminal goal.

However, an AI which is formulated in a more "relaxed" and less narrowly focused way, with open-endedness, curiosity, and diversity of experience being part of its fundamental mix of primary goals, seems to be likely to care about other minds and experiences.

So, perhaps, his thinking might be a left-over from the assumption that the winning system(s) will have narrowly formulated "terminal goals". But it would be better if he explains this himself.


Of course, our real goals are much stronger. We would like the option of immortality for ourselves and our loved ones; our "personal P(doom)" is pretty close to 1 in the absence of drastic breakthroughs; and many of us would really like a realistic shot at strongly decreasing that "personal P(doom)". That's a fairly tall order, but one many of us would like to pursue.

"Curiosity, and diversity of experience" are very narrow targets, they are no more "relaxed" than "making paperclips".

Why do you think they are narrow? They certainly sound rather wide to me... And their presence in a goal mix does seem to make this mix wider (at least, that's how it feels to me). What would be more wide from your point of view?

But "relaxed" is a bit different, it's about not pressing too hard with one's optimization (the examples we know from our current experience include early stopping in training of AI models, not being fanatical in pushing new social and organizational methods in the society, having enough slack, and so on, all these things are known to be beneficial, and ignoring them is known to cause all kinds of problems; cf. AI safety concerns being closely related to optimizers being too efficient, so, yes, making AIs aware that optimizing too hard is probably not good for themselves either in the long-term sense is important).


(For true safety, for preservation of good entities and phenomena worth preserving, we would also want a good deal of emphasis on conservation, but not so much as to cause stagnation.)

Instrumentally useful mild optimization is different from leaving autonomy to existing people as a target. The former allows strong optimization in some other contexts, or else in aggregate, which eventually leads to figuring out how to do better than the instrumentally useful mild optimization. Preserving autonomy of existing people is in turn different from looking for diversity of experience or happiness, which doesn't single out people who already exist and doesn't sufficiently leave them alone to be said to have meaningfully survived.

Maximizing anything that doesn't include even a tiny component of such pseudokindness results in eventually rewriting existing people with something else that is more optimal, even if at first there are instrumental reasons to wait and figure out how. For this not to happen, an appropriate form of not-rewriting in particular needs to be part of the target. Overall values of superintelligence being aligned is about good utilization of the universe, with survival of humanity a side effect of pseudokindness almost certainly being a component of aligned values. But pseudokindness screens off overall alignment of values on the narrower question of survival of humanity (rather than the broader question of making good use of the universe). (Failing on either issue contributes to existential risk, since both permanently destroy potential for universe-spanning future development according to humane values, making P(doom) unfortunately ambiguous between two very different outcomes.)

Thanks, this is a very helpful comment and links.

I think, in particular, that pseudokindness is a very good property to have, because it is non-anthropocentric, and therefore it is a much more feasible task to make sure that this property is preserved during various self-modifications (recursive self-improvements and such).

In general it seems that making sure that recursively self-modifying systems maintain some properties as invariants is feasible for some relatively narrow class of properties which tend to be non-anthropocentric, and that if some anthropocentric invariants are desirable, the way to achieve that is to obtain them as corollaries of some natural non-anthropocentric invariants.

My informal meta-observation is that writings on AI existential safety tend to look like they are getting closer to some relatively feasible, realistic-looking approaches when they have a non-anthropocentric flavor, and they tend to look impossibly hard when they focus on "human values", "human control", and so on. It is my informal impression that we are seeing more of the anthropocentric focus lately, which might be helpful in terms of creating political pressure, but seems rather unhelpful in terms of looking for actual solutions. I did write an essay which is trying to help to shift (back) to the non-anthropocentric focus, both in terms of fundamental issues of AI existential safety, and in terms of what could be done to make sure that human interests are taken into account: Exploring non-anthropocentric aspects of AI existential safety