This is a cosmic trolley problem: do we destroy an Earth's worth of value now to preserve the possibility of a vaster tomorrow? And then it repeats: do we sacrifice that tomorrow too for the sake of the day after, or a billion years after, and so on as long as we discover ever vaster possible tomorrows?
This is one of the standard paradoxes of utilitarianism: if you always sacrifice the present for a greater future, you never get any of those futures.
Hmm, I hadn't thought of the implications of chaining the logic behind the superintelligence's policy - thanks for highlighting it!
I guess the main aim of the post was to highlight the existence of an opportunity cost to prioritising contemporary beings, and the fact that alignment doesn't solve that issue, but there is also a normative claim that this policy could be justified.
Nevertheless, I'm not sure that the paradox necessarily applies to the policy in this scenario. Specifically, I think
>as long as we discover ever vaster possible tomorrows
doesn't hold. The accessible universe is finite and there is a finite amount of time before heat death, so there should be some ultimate possible tomorrow.
Also, I think that sacrifices of the nature described in the post come in discrete steps, with potentially large gaps of time between them, allowing you to realise the gains of a particular future before the next sacrifice arises.
Epistemic Status: I think the dilemma as outlined in Section 1 follows from well-established ideas about Astronomical Waste. However, given that I have not seen it stated anywhere before, I may have overlooked something. You don't know what you don't know, but maybe someone on LessWrong does.
UPDATE: I have found a reference to this scenario in a footnote on page 318 of Kaj Sotala's chapter on Disjunctive Scenarios of Catastrophic AI Risk in "Artificial Intelligence Safety and Security".
Introduction
The potential dangers posed by a misaligned superintelligence have been extensively explored on this forum. In this short post, however, I will outline a moral dilemma which suggests that beings living on Earth when a superintelligence emerges could be at risk even if the superintelligence is value-aligned[1]. In one sentence: once it can be done safely, the immense benefit of colonising the accessible universe quickly may vastly outweigh the welfare of the beings living on Earth at the time colonisation becomes possible, potentially justifying extreme disregard for their welfare.
Now, this is not the life-or-death question facing humanity, but it is potentially a life-or-death question that a generation of humans might have to consider. If nothing else, it is a scenario that may challenge readers to consider some of the critical assumptions that they use to justify longtermism and/or their motivations for working on the alignment problem.
In this post, I will simply outline the dilemma. If the dilemma turns out to be of interest to the community, I will write a follow-up post with a more extensive examination of the scenario.
The AI loves you, but you are made of atoms which it can use to help create an inconceivably large number of flourishing transhumans
Consider this: we succeed in creating a superintelligence that carries out actions aligned with whatever objective we give it. After thorough consideration, we choose an objective along the lines of "use the available energy in the accessible universe to create the greatest amount of 'human value'". I have a complete theory of human value, but she lives in Canada, so for now we will have to settle for the assumption that creating value inherently requires energy. Since, every second, available energy in the accessible universe is lost to entropy[2], any delay in harnessing this energy directly opposes our stated objective. Yet the resources immediately available to the superintelligence for accessing this energy are finite, and some of them will be required to sustain the beings living on Earth at the time. This introduces a trade-off in allocating resources between:
1. Sustaining the welfare of the beings living on Earth when the superintelligence emerges, and
2. Colonising the accessible universe as quickly as possible, before its energy is lost to entropy.
Critically, the amount of potential value that could be realised is incredibly significant. For example, consider a theory of value where value is measured in the number of flourishing sentient beings[3]. In his paper on Astronomical Waste, Nick Bostrom estimated that for every second we delay colonising our local galactic supercluster, enough energy to sustain up to 10^29 human[4] lives is lost. This is an unimaginably large number. For reference, only around 10^11 humans have ever lived, and the Earth's surface is about 5 × 10^18 cm^2. So 10^29 lives is roughly all the humans who have ever lived packed onto every square centimetre of a fifth of the Earth's surface. And this is per second! Another way to think about it: a delay in colonisation of 0.1 attoseconds (about the time it takes light to travel a quarter of the diameter of a hydrogen atom) is roughly equivalent to losing all 7.9 billion people currently living on Earth to entropy.
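For the sceptical reader, here is a minimal back-of-envelope sketch in Python that reproduces the order-of-magnitude comparisons above. All of the inputs are the rough figures assumed in this post (10^29 lives per second, roughly 10^11 humans ever born, 7.9 billion people alive today), not precise values:

```python
# Back-of-envelope check of the order-of-magnitude claims above.
# All figures are rough assumptions taken from the post, not precise values.

LIVES_PER_SECOND = 1e29       # Bostrom-style estimate for digital minds
HUMANS_EVER_LIVED = 1e11      # rough count of humans who have ever lived
EARTH_SURFACE_CM2 = 5.1e18    # Earth's surface area in square centimetres
CURRENT_POPULATION = 7.9e9    # people alive on Earth today (approx.)
SPEED_OF_LIGHT = 3e8          # metres per second
HYDROGEN_DIAMETER = 1.06e-10  # metres (about twice the Bohr radius)

# If one second's worth of lost lives were packed at "all humans who have
# ever lived" per square centimetre, how many Earth surfaces would that cover?
earths_covered = LIVES_PER_SECOND / (HUMANS_EVER_LIVED * EARTH_SURFACE_CM2)
print(f"Earth surfaces covered per second of delay: {earths_covered:.2f}")  # ~0.2

# How short a delay corresponds to losing today's entire population?
delay_seconds = CURRENT_POPULATION / LIVES_PER_SECOND
print(f"Delay equivalent to the current population: {delay_seconds:.1e} s")  # ~8e-20 s, i.e. ~0.1 attoseconds

# How far does light travel in that time, as a fraction of a hydrogen atom's diameter?
fraction_of_atom = SPEED_OF_LIGHT * delay_seconds / HYDROGEN_DIAMETER
print(f"Light travels ~{fraction_of_atom:.2f} of a hydrogen atom's diameter")  # ~0.2
```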
Given the astronomical amount of potential value that could be realised by space colonisation, unless we attribute a commensurate worth to the forms of value contemporary to the advent of superintelligence, their worth will be negligible by comparison. Consequently, the superintelligence could justifiably enact a policy that disregards the resource requirements of contemporary forms of value for the sake of realising that astronomical potential. If this policy were enacted, any resources accessible to the superintelligence would be used exclusively towards ends that further space colonisation. These would very likely include many of the same resources required to sustain life on Earth. Depending on the capability of the superintelligence and the requirements and externalities of the colonisation process, this policy could pose anything from a small risk to contemporary beings to perhaps even a terminal Global Catastrophic Risk[5][6].
This, then, is the dilemma: how do we deal with the trade-off between the astronomical amount of potential future value accessible through space colonisation and the value that already exists when space colonisation becomes possible? Moreover, how do we make this choice when the trade-off is so lopsided?
Will The Real Longtermists Please Stand Up?
As far as I can see, there are broadly two main questions this dilemma centres on:
1. How should we weigh the value of contemporary beings against the astronomical amount of potential value realisable through space colonisation?
2. Would realising that potential value actually require imposing a significant cost on contemporary beings?
The second question largely depends on the nature of superintelligence and space colonisation, both of which we know little about[7]. However, if we accept that there may be at least some cost to contemporary beings in realising the potential value, the first question becomes the important one: how do we evaluate that trade-off?
It is unclear what position a longtermist should take[8]. Trading off potential value against contemporary value is not a pertinent question for most longtermist causes, since the welfare of future generations is usually contingent on contemporary beings. However, if a superintelligence could safeguard future generations via embryos or digital minds, contemporary humans may no longer be necessary for the survival of humanity[9]. In particular, when the existence of these future beings is in direct conflict with contemporary beings, does the maxim of making the long-term future go as well as possible still hold?
For total utilitarians[10] the answer might be clear: we should sacrifice the beings contemporary to the emergence of a superintelligence for the sake of the much larger amount of potential value realisable through space colonisation. However, this has a... shall we say, abhorrent feel to it. Some sort of common-sense morality tells us that causing great harm to existing beings is bad. It even seems that avoiding terminal GCRs should be a requirement of any sound moral theory. Yet at what cost do we adhere to this sense of morality? From the perspective of the potential beings, it may seem far more abhorrent to sacrifice such a large amount of value for the sake of an essentially arbitrary generation that happened to be the one to instantiate a superintelligence.
Also critical to the analysis are the facts of the scenario: given the unimaginably large number of beings involved, some ordinary considerations may no longer apply. For instance, waiting for the voluntary consent of contemporary beings to be saved from catastrophe via uploading could not possibly be done in the 0.1 attoseconds it would take to realise a commensurate number of potential beings[11]. Even if your value system assigned enormous significance to the voluntary consent of contemporary beings, the cost of waiting would grow astronomically with every second it took to obtain it.
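As a crude illustration of how fast that cost compounds, here is a short sketch using the same assumed (and deliberately simplistic) linear rate of 10^29 potential lives per second; the delays are arbitrary examples, not estimates of how long obtaining consent would actually take:

```python
# Cumulative opportunity cost of delay under the post's assumed linear rate.
# The rate is the rough Bostrom-style figure used above, not a precise value.

LIVES_PER_SECOND = 1e29

delays = {
    "1 second": 1,
    "1 hour": 3_600,
    "1 day": 86_400,
    "1 year": 3.15e7,
}

for label, seconds in delays.items():
    lives_foregone = LIVES_PER_SECOND * seconds
    print(f"{label:>8}: ~{lives_foregone:.0e} potential lives foregone")
```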
Whatever position one ultimately takes on the trade-off, the fact remains that there may be some significant cost to transitioning to a post-singularity future. Whether it is imposed on contemporary beings or on future potential value is, for now, left as an exercise for the reader.
[1] This post does not take a position on what exactly a solution to the control problem or the alignment problem might look like. We simply explore the implications of what a reasonable person would consider a solution.
[2] This statement simplifies the concept of entropy for clarity. Entropy signifies disorder, and the Second Law of Thermodynamics indicates that entropy in an isolated system tends to increase; this does not cause an immediate loss of usable energy but rather its gradual dispersion into less useful forms. The phrase "available energy is lost to entropy" broadly captures the idea that the universe's energy becomes less capable of doing work over time.
[3] One could also consider the quality of life of these beings, which could arguably be much higher if their environment is curated from scratch by a superintelligence, but we will simply assume that all beings in this analysis have a commensurate quality of life.
[4] This is the figure for digital humans. If you find this objectionable, you can consider biological humans, for which the relevant figure is 10^14 human lives per second. Either way, the number is unimaginably large.
[5] The scope of a risk can be personal (affecting only one person), local (affecting some geographical region or a distinct group), global (affecting the entire human population or a large part thereof), or trans-generational (affecting humanity over all, or almost all, future generations). The severity of a risk can be classified as imperceptible (barely noticeable), endurable (causing significant harm but not completely ruining quality of life), or terminal (causing death). Nick Bostrom (2013)
[6] It is hard to extrapolate specific instances of how exactly space colonisation efforts could cause a terminal GCR, but given the superintelligence's policy (disregarding the welfare of contemporary beings for the sake of future potential beings), these scenarios are largely similar to those of a misaligned superintelligence, e.g. Infrastructure Profusion.
[7] One could use the nature of the superintelligence's budget constraint as a framework for evaluating this question, but that is beyond the scope of this short post.
[8] In the original paper on strong longtermism, Hilary Greaves and William MacAskill state (on page 3) that they believe, but notably do not argue, that there exist options that are near-best overall but not near-best for the near future.
[9] Whether humanity survives if all existing humans are destroyed but new humans are later raised elsewhere is a question I will not get into here.
[10] Total utilitarianism is not the only framework under which one could assess the sacrifice of contemporary beings as a moral or good outcome. For example, the equality principle might favour this outcome, as might a virtue ethics system that values altruism or sacrifice for the greater good.
[11] It is extremely dubious that even uploading alone could be achieved in this timeframe.