A Proposed Adjustment to the Astronomical Waste Argument

Nick Beckstead

This article has been cross-posted at http://effective-altruism.com/.

An existential risk is a risk “that threatens the premature extinction of Earth-originating intelligent life or the permanent and drastic destruction of its potential for desirable future development” (Bostrom, 2013). Nick Bostrom has argued that

“[T]he loss in expected value resulting from an existential catastrophe is so enormous that the objective of reducing existential risks should be a dominant consideration whenever we act out of an impersonal concern for humankind as a whole. It may be useful to adopt the following rule of thumb for such impersonal moral action:

Maxipok: Maximize the probability of an “OK outcome,” where an OK outcome is any outcome that avoids existential catastrophe.”

There are a number of people in the effective altruism community who accept this view and cite Bostrom’s argument as their primary justification. Many of these people also believe that the best ways of minimizing existential risk involve making plans to prevent specific existential catastrophes from occurring, and believe that the best giving opportunities must be with charities that primarily focus on reducing existential risk. They also appeal to Bostrom’s argument to support their views. (Edited to add: Note that Bostrom himself sees maxipok as neutral on the question of whether the best methods of reducing existential risk are very broad and general, or highly targeted and specific.) For one example of this, see Luke Muehlhauser’s comment:

“Many humans living today value both current and future people enough that if existential catastrophe is plausible this century, then upon reflection (e.g. after counteracting their unconscious, default scope insensitivity) they would conclude that reducing the risk of existential catastrophe is the most valuable thing they can do — whether through direct work or by donating to support direct work.”

I now think these views require some significant adjustments and qualifications, and given these adjustments and qualifications, their practical implications become very uncertain. I still believe that what matters most about what we do is how our actions affect humanity’s long-term future potential, and I still believe that targeted existential risk reduction and research is a promising cause, but it now seems unclear whether targeted existential risk reduction is the best area to look for ways of making the distant future go as well as possible. It may be and it may not be, and which is right probably depends on many messy details about specific opportunities, as well as general methodological considerations which are, at this point, highly uncertain. Various considerations played a role in my reasoning about this, and I intend to talk about more of them in greater detail in the future. I’ll talk about just a couple of these considerations in this post.

In this post, I argue that:

1. Though Bostrom’s argument supports the conclusion that maximizing humanity’s long-term potential is extremely important, it does not provide strong evidence that reducing existential risk is the best way of maximizing humanity’s future potential. There is a much broader class of actions which may affect humanity’s long-term potential, and Bostrom’s argument does not uniquely favor existential risk over other members in this class.
2. A version of Bostrom’s argument better supports a more general view: what matters most is that we make path-dependent aspects of the far future go as well as possible. There are important questions about whether we should accept this more general view and what its practical significance is, but this more general view seems to be a strict improvement on the view that minimizing existential risk is what matters most.
3. The above points favor very broad, general, and indirect approaches to shaping the far future for the better, rather than thinking about very specific risks and responses, though there are many relevant considerations and the issue is far from settled.

I think some prominent advocates of existential risk reduction already agree with these general points, and believe that other arguments, or other arguments together with Bostrom’s argument, establish that direct existential risk reduction is what matters most. This post is most relevant to people who currently think Bostrom’s arguments may settle the issues discussed above.

Path-dependence and trajectory changes

In thinking about how we might affect the far future, I've found it useful to use the concept of the world's development trajectory, or just trajectory for short. The world's development trajectory, as I use the term, is a rough summary of the way the future will unfold over time. The summary includes various facts about the world that matter from a macro perspective, such as how rich people are, what technologies are available, how happy people are, how developed our science and culture are along various dimensions, and how well things are going all-things-considered at different points of time. It may help to think of the trajectory as a collection of graphs, where each graph in the collection has time on the x-axis and one of these other variables on the y-axis.

With that concept in place, consider three different types of benefits from doing good. First, doing something good might have proximate benefits—this is the name I give to the fairly short-run, fairly predictable benefits that we ordinarily think about when we cure some child's blindness, save a life, or help an old lady cross the street. Second, there are benefits from speeding up development. In many cases, ripple effects from good ordinary actions speed up development. For example, saving some child's life might cause his country's economy to develop very slightly more quickly, or make certain technological or cultural innovations arrive more quickly. Third, our actions may slightly or significantly alter the world's development trajectory. I call these shifts trajectory changes. If we ever prevent an existential catastrophe, that would be an extreme example of a trajectory change. There may also be smaller trajectory changes. For example, if some species of dolphins that we really loved were destroyed, that would be a much smaller trajectory change.

The concept of a trajectory change is closely related to the concept of path dependence in the social sciences, though when I talk about trajectory changes I am interested in effects that persist much longer than standard examples of path dependence. A classic example of path dependence is our use of QWERTY keyboards. Our keyboards could have been arranged in any number of other possible ways. A large part of the explanation of why we use QWERTY keyboards is that it happened to be convenient for making typewriters, that a lot of people learned to use these keyboards, and there are advantages to having most people use the same kind of keyboard. In essence, there is path dependence whenever some aspect of the future could easily have been way X, but it is arranged in way Y due to something that happened in the past, and now it would be hard or impossible to switch to way X. Path dependence is especially interesting when way X would have been better than way Y. Some political scientists have argued that path dependence is very common in politics. For example, in an influential paper (with over 3000 citations) Pierson (2000, p. 251) argues that:

Specific patterns of timing and sequence matter; a wide range of social outcomes may be possible; large consequences may result from relatively small or contingent events; particular courses of action, once introduced, can be almost impossible to reverse; and consequently, political development is punctuated by critical moments or junctures that shape the basic contours of social life.

The concept of a trajectory change is also closely related to the concept of a historical contingency. If Thomas Edison had not invented the light bulb, someone else would have done it later. In this sense, it is not historically contingent that we have light bulbs, and the most obvious benefits from Thomas Edison inventing the light bulb are proximate benefits and benefits from speeding up development. Something analogous is probably true of many other technological innovations such as computers, candles, wheelbarrows, object-oriented programming, and the printing press. Some important examples of historical contingencies: the rise of Christianity, the creation of the US Constitution, and the writings of Karl Marx. Various aspects of Christian morality influence the world today in significant ways, but the fact that those aspects of morality, in exactly those ways, were part of a dominant world religion was historically contingent. Events like Jesus's death and Paul writing his epistles are therefore examples of trajectory changes. Likewise, the US Constitution was the product of deliberation among a specific set of men. The document affects government policy today and will affect it for the foreseeable future, but it could easily have been a different document. And now that the document exists in its specific legal and historical context, it is challenging to make changes to it, so the change is somewhat self-reinforcing.

Some small trajectory changes could be suboptimal

Persistent trajectory changes that do not involve existential catastrophes could have great significance for shaping the far future. It is unlikely that the far future will inherit many of our institutions exactly as they are, but various aspects of the far future—including social norms, values, political systems, and perhaps even some technologies—may be path dependent on what happens now, and sometimes in suboptimal ways. In general, it is reasonable to assume that if there is some problem that might exist in the future and we can do something to fix it now, future people would also be able to solve that problem. But if values or social norms change, they might not agree that some things we think are problems really are problems. Or, if people make the wrong decisions now, certain standards or conventions may get entrenched, and resulting problems may be too expensive to be worth fixing. For further categories of examples of path-dependent aspects of the far future, see these posts by Robin Hanson.

The astronomical waste argument and trajectory changes

Bostrom’s argument only works if reducing existential risk is the most effective way of maximizing humanity’s future potential. But there is no robust argument that trying to reduce existential risk is a more effective way of shaping the far future than trying to create other positive trajectory changes. Bostrom’s argument for the overwhelming importance of reducing existential risk can be summarized as follows:

1. The expected size of humanity's future influence is astronomically great.
2. If the expected size of humanity's future influence is astronomically great, then the expected value of the future is astronomically great.
3. If the expected value of the future is astronomically great, then what matters most is that we maximize humanity’s long-term potential.
4. Some of our actions are expected to reduce existential risk in not-ridiculously-small ways.
5. If what matters most is that we maximize humanity’s future potential and some of our actions are expected to reduce existential risk in not-ridiculously-small ways, what it is best to do is primarily determined by how our actions are expected to reduce existential risk.
6. Therefore, what it is best to do is primarily determined by how our actions are expected to reduce existential risk.

Call that the “astronomical waste” argument.

It is unclear whether premise (5) is true because it is unclear whether trying to reduce existential risk is the most effective way of maximizing humanity’s future potential. For all we know, it could be more effective to try to create other positive trajectory changes. Clearly, it would be better to prevent extinction than to improve our social norms in a way that indirectly makes the future go one millionth better, but, in general, “X is a bigger problem than Y” is only a weak argument that “trying to address X is more important than trying to address Y.” To be strong, the argument must be supplemented by looking at many other considerations related to X and Y, such as how much effort is going into solving X and Y, how tractable X and Y are, how much X and Y could use additional resources, and whether there are subsets of X or Y that are especially strong in terms of these considerations.
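The point that problem size alone does not settle the comparison can be made slightly more concrete with a toy expected-value calculation. This is only an illustrative sketch under simplifying assumptions: the symbols $V_c$, $p$, $\Delta p$, $q$, and $f$ are introduced here for illustration and do not appear in Bostrom's argument, and the value of the future after an existential catastrophe is assumed to be roughly zero.

```latex
% Illustrative symbols (assumptions, not from the original argument):
%   V_c      : expected value of the future, given no existential catastrophe
%   p        : current probability of existential catastrophe
%   \Delta p : reduction in p achieved by a targeted action A
%   q        : probability that a broad action B yields a persistent trajectory change
%   f        : fractional improvement in the future's value if that change occurs
\begin{align*}
\text{Baseline:}\quad & \mathbb{E}[\text{value}] = (1 - p)\,V_c \\
\text{Gain from } A:\quad & \bigl(1 - (p - \Delta p)\bigr)V_c - (1 - p)\,V_c = \Delta p \, V_c \\
\text{Gain from } B:\quad & (1 - p)\,(1 + q f)\,V_c - (1 - p)\,V_c = (1 - p)\, q f \, V_c \\
\text{So } A \text{ beats } B \iff\quad & \Delta p > (1 - p)\, q f
\end{align*}
```

On this sketch the astronomically large $V_c$ cancels out of the comparison. With $f = 10^{-6}$ (the "one millionth better" case), $q = 1$, and small $p$, action A wins whenever $\Delta p$ exceeds roughly $10^{-6}$. With $f = 0.01$ and $q = 0.1$, action B wins unless $\Delta p$ exceeds roughly $10^{-3}$. Which inequality holds depends on tractability-style quantities, not on the size of the future alone.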

Bostrom does have arguments that speeding up development and providing proximate benefits are not as important, in themselves, as reducing existential risk. And these arguments, I believe, have some plausibility. Since we don’t have an argument that reducing existential risk is better than trying to create other positive trajectory changes and an existential catastrophe is one type of trajectory change, it seems more reasonable for defenders of the astronomical waste argument to focus on trajectory changes in general. It would be better to replace the last three steps of the above argument with:

4’. Some of our actions are expected to change our development trajectory in not-ridiculously-small ways.

5’. If what matters most is that we maximize humanity’s future potential and some of our actions are expected to change our development trajectory in not-ridiculously-small ways, what it is best to do is primarily determined by how our actions are expected to change our development trajectory.

6’. Therefore, what it is best to do is primarily determined by how our actions are expected to change our development trajectory.

This seems to be a strictly more plausible claim than the original one, though it is less focused.

In response to the arguments in this post, which I e-mailed him in advance, Bostrom wrote a reply (see the end of the post). The key comment, from my perspective, is:

“Many trajectory changes are already encompassed within the notion of an existential catastrophe. Becoming permanently locked into some radically suboptimal state is an xrisk. The notion is more useful to the extent that likely scenarios fall relatively sharply into two distinct categories---very good ones and very bad ones. To the extent that there is a wide range of scenarios that are roughly equally plausible and that vary continuously in the degree to which the trajectory is good, the existential risk concept will be a less useful tool for thinking about our choices. One would then have to resort to a more complicated calculation. However, extinction is quite dichotomous, and there is also a thought that many sufficiently good future civilizations would over time asymptote to the optimal track.”

I agree that a key question here is whether there is a very large range of plausible equilibria for advanced civilizations, or whether civilizations that manage to survive long enough naturally converge on something close to the best possible outcome. The more confidence one has in the second possibility, the more interesting existential risk is as a concept. The less confidence one has in the second possibility, the more interesting trajectory changes in general are. However, I would emphasize that unless we can be highly confident in the second possibility, it seems that we cannot be confident that reducing existential risk is more important than creating other positive trajectory changes because of the astronomical waste argument alone. This would turn on further considerations of the sort I described above.

Broad and narrow strategies for shaping the far future

Both the astronomical waste argument and the fixed-up version of that argument conclude that what matters most is how our actions affect the far future. I am very sympathetic to this viewpoint, abstractly considered, but I think its practical implications are highly uncertain. There is a spectrum of strategies for shaping the far future that ranges from the very targeted (e.g., stop that asteroid from hitting the Earth) to the very broad (e.g., create economic growth, help the poor, provide education programs for talented youth), with options like “tell powerful people about the importance of shaping the far future” in between. The limiting case of breadth might be just optimizing for proximate benefits or for speeding up development. Defenders of the astronomical waste argument tend to be on the highly targeted end of this spectrum. I think it’s a very interesting question where on this spectrum we should prefer to be, other things being equal, and it’s a topic I plan to return to in the future.

The arguments I’ve offered above favor broader strategies for shaping the far future, though they don’t settle the issue. The main reason I say this is that the best ways of creating positive trajectory changes may be very broad and general, whereas the best ways of reducing existential risk may be more narrow and specific. For example, it may be reasonable to try to assess, in detail, questions like, “What are the largest specific existential risks?” and, “What are the most effective ways of reducing those specific risks?” In contrast, it seems less promising to try to make specific guesses about how we might create smaller positive trajectory changes because there are so many possibilities and many trajectory changes do not have significance that is predictable in advance. No one could have predicted the persistent ripple effects that Jesus's life had, for example. In other cases—such as the framing of the US Constitution—it's clear that a decision has trajectory change potential, but it would be hard to specify, in advance, which concrete measures should be taken. In general, it seems that the worse you are at predicting some phenomenon that is critical to your plans, the less your plans should depend on specific predictions about that phenomenon. Because of this, promising ways to create positive trajectory changes in the world may be more broad than the most promising ways of trying to reduce existential risk specifically. Improving education, improving parenting, improving science, improving our political system, spreading humanitarian values, or otherwise improving our collective wisdom as stewards of the future could, I believe, create many small, unpredictable positive trajectory changes.

I do not mean to suggest that broad approaches are necessarily best, only that people interested in shaping the far future should take them more seriously than they currently do. The way I see the trade-off between highly targeted strategies and highly broad strategies is as follows. Highly targeted strategies for shaping the far future often depend on highly speculative plans, often with many steps, which are hard to execute. We often have very little sense of whether we are making valuable progress on AI risk research or geo-engineering research. On the other hand, highly broad strategies must rely on implicit assumptions about the ripple effects of doing good in more ordinary ways. It is very subtle and speculative to say how ordinary actions are related to positive trajectory changes, and estimating magnitudes seems extremely challenging. Considering these trade-offs in specific cases seems like a promising area for additional research.

Summary

In this post, I argued that:

1. The astronomical waste argument becomes strictly more plausible if we replace the idea of minimizing existential risk with the idea of creating positive trajectory changes.
2. There are many ways in which our actions could unpredictably affect our general development trajectory, and therefore many ways in which our actions could shape the far future for the better. This is one reason to favor broad strategies for shaping the far future.

The trajectory change perspective may have other strategic implications for people who are concerned about maximizing humanity’s long-term potential. I plan to write about these implications in the future.[i]

Comment from Nick Bostrom on this post

[What follows is an e-mail response from Nick Bostrom. He suggested that I share his comment along with the post. Note that I added a couple of small clarifications to this post (noted above) in response to Bostrom's comment.]

One can arrive at a more probably correct principle by weakening, eventually arriving at something like 'do what is best' or 'maximize expected good'. There the well-trained analytic philosopher could rest, having achieved perfect sterility. Of course, to get something fruitful, one has to look at the world not just at our concepts.

Many trajectory changes are already encompassed within the notion of an existential catastrophe. Becoming permanently locked into some radically suboptimal state is an xrisk. The notion is more useful to the extent that likely scenarios fall relatively sharply into two distinct categories---very good ones and very bad ones. To the extent that there is a wide range of scenarios that are roughly equally plausible and that vary continuously in the degree to which the trajectory is good, the existential risk concept will be a less useful tool for thinking about our choices. One would then have to resort to a more complicated calculation. However, extinction is quite dichotomous, and there is also a thought that many sufficiently good future civilizations would over time asymptote to the optimal track.

In a more extended and careful analysis there are good reasons to consider second-order effects that are not captured by the simple concept of existential risk. Reducing the probability of negative-value outcomes is obviously important, and some parameters such as global values and coordination may admit of more-or-less continuous variation in a certain class of scenarios and might affect the value of the long-term outcome in correspondingly continuous ways. (The degree to which these complications loom large also depends on some unsettled issues in axiology; so in an all-things-considered assessment, the proper handling of normative uncertainty becomes important. In fact, creating a future civilization that can be entrusted to resolve normative uncertainty well wherever an epistemic resolution is possible, and to find widely acceptable and mutually beneficial compromises to the extent such resolution is not possible---this seems to me like a promising convergence point for action.)

It is not part of the xrisk concept or the maxipok principle that we ought to adopt some maximally direct and concrete method of reducing existential risk (such as asteroid defense): whether one best reduces xrisk through direct or indirect means is an altogether separate question.

[i] I am grateful to Nick Bostrom, Paul Christiano, Luke Muehlhauser, Vipul Naik, Carl Shulman, and Jonah Sinick for feedback on earlier drafts of this post.

Machine superintelligence appears to be a uniquely foreseeable and impactful source of stable trajectory change.

If you think (as I do) that a machine superintelligence is largely inevitable (Bostrom forthcoming), then it seems our effects on the far future must almost entirely pass through our effects on the development of machine superintelligence.

Someone once told me they thought that giving to the Against Malaria Foundation is, via a variety of ripple effects, more likely to positively affect the development of machine superintelligence than direct work on AI risk strategy and Friendly AI math. I must say I find this implausible, but I'll also admit that humanity's current understanding of ripple effects in general, and our understanding of how MIRI/FHI-style research in particular will affect the world, leaves much to be desired.

So I'm glad that GiveWell, MIRI, FHI, Nick Beckstead, and others are investing resources to figure out how these things work.

“a machine superintelligence singleton is largely inevitable”

So do you think that while we can't be very confident about when AI will be created, we can still be quite confident that it will be created?

...yes? This seems like a quite reasonable epistemic state.

Is there any time line where if it hasn't happened by that point you'd start doubting whether it will occur?

“Is there any time line where if it has happened by that point you'd start doubting whether it will occur?”

While I acknowledge that this sort of counterintuitive anti-inductivist position has precedent on this site, I suspect you mean "hasn't happened".

Yes, fixed, thank you.

My difficulty imagining a genuinely realistic mechanism of impossibility is such that I want to see the details of how it doesn't happen before I update. I could make up dumb stories but they would be the wrong explanation if it actually happened, because I don't think those dumb stories are actually plausible.

(1) I agree with the grandparent.

(2) Yes, of course. But I feel that there's enough evidence to assign very low probability to AGI not being inventable if humanity survives, but not enough evidence to assign very low probability to it being very hard and taking very long; eyeballing, it might well be thousands of years of no AGI before even considering AGI-is-impossible seriously (assuming that there is no other evidence cropping up why AGI is impossible, besides humanity having no clue how to do it; conditioning on impossible AGI, I would expect such evidence to crop up earlier). Eliezer might put less weight on the tail of the time-to-AGI distribution and may have to have a correspondingly shorter time before considering impossible AGI seriously.

If we have had von Neumann-level AGI for a while and still no idea how to make a more efficient AGI, my update towards "superintelligence is impossible" would be very much quicker than the update towards "AGI is impossible" in the above scenario, I think. [ETA: Of course I still expect you can run it faster than a biological human, but I can conceive of a scenario where it's within a few orders of magnitude of a von Neumann WBE, the remaining difference coming from the emulation overhead and inefficiencies in the human brain that the AGI doesn't have but that don't lead to super-large improvements.]

See my reply to diegocaleiro.

Aron, what makes you think otherwise?

Not sure whether I do think otherwise. But if Luke had written "smarter-than-human machine intelligence" instead, I probably wouldn't have reacted. In comparison, "machine superintelligence singleton" is much more specific, indicating both (i) that the machine intelligence will be vastly smarter than us, and (ii) that multipolar outcomes are very unlikely. Though perhaps there are very convincing arguments for both of these claims.

“If you think (as I do) that a machine superintelligence singleton is largely inevitable (Bostrom forthcoming)”

I can grant the "machine superintelligence" part as largely inevitable, but why "singleton"? Are you suggesting that Bostrom has a good argument for the inevitability of such a singleton, that he hasn't written down anywhere except in his forthcoming book?

To some degree, yes (I've seen a recent draft). But my point goes through without this qualification, so I've edited my original comment to remove "singleton."

On this specific question (AI risk strategy and math vs. AMF), I have similar intuitions, though maybe with somewhat less confidence. But see my comment exchange with Holden Karnofsky here.

I worry that this post seems very abstract.

The specific case I've made for "just build the damn FAI" does not revolve only around astronomical waste, but subtheses like:

Stable goals in sufficiently advanced self-improving minds imply very strong path dependence on the point up to where the mind is sufficiently advanced, and no ability to correct mistakes beyond that point
Friendly superintelligences negate other x-risks once developed
CEV (more generally indirect normativity) implies that there exists a broad class of roughly equivalently-expectedly-good optimal states if we can pass a satisficing test (i.e., in our present state of uncertainty, we would expect something like CEV to be around as good as it gets, given our uncertainty on the details of goodness, assuming you can build a CEV-SI); there is not much gain from making FAI programmers marginally nicer people or giving them marginally better moral advice provided that they are satisficingly non-jerks who try to build an indirectly normative AI
Very little path dependence of the far future on anything except the satisficing test of building a good-enough FAI, because a superintelligent singleton has enough power to correct any bad inertia going into that point
The Fragility of Value thesis implies that value drops off very fast short of a CEV-style FAI (making the kinds of mistakes that people like to imagine leading to flawed-utopia story outcomes will actually just kill you instantly when blown up to a superintelligent scale) so there's not much point in trying to make things nicer underneath this threshold
FAI is hard (relative to the nearly nonexistent quantity and quality of work that we've seen most current AGI people intending to put into it, or mainstream leaders anticipating a need for, or current agencies funding, FAI is technically far harder than that); so most of the x-risk comes from failure to solve the technical problem
Trying to ensure that "Western democracies remain the most advanced and can build AI first" or "ensuring that evil corporations don't have so much power that they can influence AI-building" is missing the point (and a rather obvious attempt to map the problem into someone's favorite mundane political hobbyhorse) because goodness is not magically sneezed into the AI from well-intentioned builders, and favorite-good-guy-of-the-week is not making anything like a preliminary good-faith-effort to do high-quality work on technical FAI problems, and probably won't do so tomorrow either

You can make a case for MIRI with fewer requirements than that, but my model of the future is that it's just a pass-fail test on building indirectly normative stable self-improving AI, before any event occurs which permanently prevents anyone from building FAI (mostly self-improving UFAI (possibly neuromorphic) but also things like nanotechnological warfare). If you think that building FAI is a done deal because it's such an easy problem (or because likely builders are already guaranteed to be supercompetent), you'd focus on preventing nanotechnological warfare or something along those lines. To me it looks more like we're way behind on our dues.

Eliezer, I see this post as a response to Nick Bostrom's papers on astronomical waste and not a response to your arguments that FAI is an important cause. I didn't intend for this post to be any kind of evaluation of FAI as a cause or MIRI as an organization supporting that cause. Evaluating FAI as a cause would require lots of analysis I didn't attempt, including:

Whether many of the claims you have made above are true
How effectively we can expect humanity in general to respond to AI risk absent our intervention
How tractable the cause of improving humanity's response is
How much effort is currently going into this cause
Whether the cause could productively absorb additional resources
What our leading alternatives are

My arguments are most relevant to evaluating FAI as a cause for people whose interest in FAI depends heavily on their acceptance of Bostrom's astronomical waste argument. Based on informal conversations, there seem to be a number of people who fall into this category. My own view is that whether FAI is a promising cause is not heavily dependent on astronomical waste considerations, and more dependent on many of these messy details.

Mm, k. I was trying more to say that I got the same sense from your post that Nick Bostrom seems to have gotten at the point where he worried about completely general and perfectly sterile analytic philosophy. Maxipok isn't derived just from the astronomical waste part, it's derived from pragmatic features of actual x-risk problems that lead to ubiquitous threshold effects that define "okayness" - most obviously Parfit's "Extinguishing the last 1000 people is much worse than extinguishing seven billion minus a thousand people" but also including things like satisficing indirect normativity and unfriendly AIs going FOOM. The degree to which x-risk thinking has properly adapted to the pragmatic landscape, not just been derived starting from very abstract a priori considerations, was what gave me that worried sense of overabstraction while reading the OP; and that triggered my reflex to start throwing out concrete examples to see what happened to the abstract analysis in that case.

It may be overly abstract. I'm a philosopher by training and I have a tendency to get overly abstract (which I am working on).

I agree that there are important possibilities with threshold effects, such as extinction and perhaps including your point about threshold effects with indirect normativity AIs. I also think that other scenarios, such as Robin Hanson's scenario, other decentralized market/democracy set-ups, and other scenarios we can't think of are live possibilities. More continuous trajectory changes may be very relevant in these other scenarios.

For what it's worth, I loved this post and don't think it was very abstract. Then again, my background is also in philosophy.

I see Nick's post as pointing out a nontrivial minimum threshold that x-risk reduction opportunities need to meet in order to be more promising than broad interventions, even within the astronomical waste framework. I agree that you have to look at the particulars of the x-risk reduction opportunities, and of the broad intervention opportunities, that are on the table, in order to argue for focus on broad interventions. But that's a longer discussion.

I agree but remark that so long as at least one x-risk reduction effort meets this minimum threshold, we can discard all non-xrisk considerations and compare only x-risk impacts to x-risk impacts, which is how I usually think in practice. The question "Can we reduce all impacts to probability of okayness?" seems separate from "Are there mundane-seeming projects which can achieve comparably sized xrisk impacts per dollar as side effects?", and neither tells us to consider non-xrisk impacts of projects. This is the main thrust of the astronomical waste argument and it seems to me that this still goes through.

It's important to note that:

1. There may be highly targeted interventions (other than x-risk reduction efforts) which can have big trajectory changes (including indirectly improving humans' ability to address x-risks).
2. With consideration #1 in mind, in deciding whether to support x-risk interventions, one has to consider room for more funding and marginal diminishing returns on investment.

(I recognize that the claims in this comment aren't present in the comment that you responded to, and that I'm introducing them anew here.)

Mm, I'm not sure what the intended import of your statement is, can we be more concrete? This sounds like something I would say in explaining why I directed some of my life effort toward CFAR - along with, "Because I found that really actually in practice the number of rationalists seemed like a sharp limiting factor on the growth of x-risk efforts, if I'd picked something lofty-sounding in theory that was supposed to have a side impact I probably wouldn't have guessed as well" and "Keeping in mind that the top people at CFAR are explicitly x-risk aware and think of that impact as part of their job".

Something along the lines of CFAR could fit the bill. I suspect CFAR could have a bigger impact if it targeted people with stronger focus on global welfare, and/or people with greater influence, than the typical CFAR participant. But I recognize that CFAR is still in a nascent stage, so that it's necessary to cooptimize for the development of content, and growth.

I believe that there are other interventions that would also fit the bill, which I'll describe in later posts.

CFAR is indeed so cooptimizing and trying to maximize net impact over time; if you think that a different mix would produce a greater net impact, make the case! CFAR isn't a side-effect project where you just have to cross your fingers and hope that sort of thing happens by coincidence while the leaders are thinking about something else, it's explicitly aimed that way.

There may be highly targeted interventions (other than x-risk reduction efforts) which can have big trajectory changes (including indirectly improving humans' ability to address x-risks).

This is, more or less, the intended purpose behind spending all this energy on studying rationality rather than directly researching FAI. I'm not saying I agree with that reasoning, by the way. But that was the initial reasoning behind Less Wrong, for better or worse. Would we be farther ahead if rather than working on rationality, Eliezer started working immediately on FAI? Maybe, but likely not. I could see it being argued both ways. But anyway, this shows an actual, very concrete, example of this kind of intervention.

Another issue is that if you accept the claims in the post, when you are comparing the ripple effects of different interventions, you can't just compare the ripple effects on x-risk. Ripple effects on other trajectory changes are non-negligible as well.

I agree with Jonah's point and think my post supports it.

The main reason to focus on existential risk generally, and human extinction in particular, is that anything else about posthuman society can be modified by the posthumans (who will be far smarter and more knowledgeable than us) if desired, while extinction can obviously never be undone. For example, any modification to the English language, the American political system, the New York Subway or the Islamic religion will almost certainly be moot in five thousand years, just as changes to Old Kingdom Egypt are moot to us now.

The only exception would be if the changes to post-human society are self-reinforcing, like a tyrannical constitution which is enforced by unbeatable strong nanotech for eternity. However, by Bostrom's definition, such a self-reinforcing black hole would be an existential risk.

Are there any examples of changes to post-human society which a) cannot ever be altered by that society, even when alteration is a good idea, b) represent a significant utility loss, even compared to total extinction, c) are not themselves total or near-total extinction (and are thus not existential risks), and d) we have an ability to predictably effect at least on par with our ability to predictably prevent x-risk? I can't think of any, and this post doesn't provide any examples.

After looking at the pattern of upvotes and downvotes on my replies, re-reading these comments, and thinking about this exchange I've concluded that I made some mistakes and would like to apologize.

I didn't acknowledge some important truths in this comment. Surely, the reason people worry more about human extinction than other trajectory changes is because we can expect most possible flaws in civilization to be detected and repaired by people alive at the time, provided the people have the right values and are roughly on the right track. And very plausibly any very specific aspects of most standards will wash out in the long run. Partly this was me getting defensive, and not checking my "combat reflexes," to use a phrase I learned from Julia Galef.

I also didn't really answer the question, though it was a reasonable question. I agree with amcknight that values changes are a plausible example that meets all of the conditions. I think if you had something that was a significant utility loss in comparison with extinction it would be an existential catastrophe by definition. I'll give some other examples of ways in which the future could be flawed, but not repairable or not repairable at reasonable costs. I am still thinking through these issues, but here are some tentative possibilities:

Resource distribution: You have some type of market set-up, and during early stages of a post-human civilization, shares of available resources are parceled out in a way that depends on individual wealth. People who want to spend the resources on good stuff have a smaller share of the wealth, and fewer resources ever get spent on good stuff. So the future is some fraction worse than it could have been. (This involves a values failure, but maybe you could change the outcome without changing values.)
Legal system: You have a constitutional legal system, and it requires a large supermajority of support to alter it. The current rules inefficiently favor certain classes of people over others. The support necessary to reform it never arrives, and the future is some fraction worse than it could have been. Perhaps you have some similar things happen with social norms, though less formally.
Standards that would have been better in the first place, but aren't worth switching to: Maybe you could have a certain type of hardware or software that was developed first, but wasn't totally optimal. After it is very generally used, there could be switching costs. You could pay them, but perhaps it wouldn't be worth it. (Perhaps the US is in a situation like this with the imperial system vs. the metric system.) This could only be an extremely small fraction of the possible value, as far as I can tell. But perhaps there could be many things like this in technology, government, and social norms.

I think I also should have done more to acknowledge the tentative nature of my claims. I certainly do not have quantitative arguments, and I don't see a clear way to develop them. I have some intuitions that further thought on these topics could be fruitful, but I wouldn't want to say that I've settled major issues by making this post. What I want to suggest is that further thought about smaller trajectory changes could lead us to see broad attempts to shape the far future more favorably. It is something I will think harder about in the future and try to defend better.

Value drift fits your constraints. Our ability to drift accelerates as enhancement technologies increase in power. If values drift substantially and in undesirable ways because of, e.g., peacock contests, then (a) our values lose what control they currently have, (b) we could lose significant utility because of the fragility of value, (c) it is not an extinction event, and (d) it seems as easy to effect as x-risk reduction.

You wrote:

For example, any modification to the English language, the American political system, the New York Subway or the Islamic religion will almost certainly be moot in five thousand years, just as changes to Old Kingdom Egypt are moot to us now.

I disagree, especially with the religion example. Religions partially involve values and I think values are a plausible area for path-dependence. And I'm not the only one who has the opposite intuition. Here is Robin Hanson:

S – Standards – We can become so invested in the conventions, interfaces, and standards we use to coordinate our activities that we each can’t afford to individually switch to more efficient standards, and we also can’t manage to coordinate to switch together. Conceivably, the genetic code, base ten math, ASCII, English language and units, Java, or the Windows operating system might last for trillions of years.

You wrote:

The only exception would be if the changes to post-human society are self-reinforcing, like a tyrannical constitution which is enforced by unbeatable strong nanotech for eternity. However, by Bostrom's definition, such a self-reinforcing black hole would be an existential risk.

Not all permanent suboptimal states are existential catastrophes, only ones that "drastically" curtail the potential for desirable future development.

You wrote:

Are there any examples of changes to post-human society which a) cannot ever be altered by that society, even when alteration is a good idea, b) represent a significant utility loss, even compared to total extinction, c) are not themselves total or near-total extinction (and are thus not existential risks), and d) we have an ability to predictably effect at least on par with our ability to predictably prevent x-risk? I can't think of any, and this post doesn't provide any examples.

It sounds like you are asking me for promising highly targeted strategies for addressing specific trajectory changes in the distant future. One of the claims in this post is that this is not the best way to create smaller trajectory changes. I said:

For example, it may be reasonable to try to assess, in detail, questions like, “What are the largest specific existential risks?” and, “What are the most effective ways of reducing those specific risks?” In contrast, it seems less promising to try to make specific guesses about how we might create smaller positive trajectory changes because there are so many possibilities and many trajectory changes do not have significance that is predictable in advance....Because of this, promising ways to create positive trajectory changes in the world may be more broad than the most promising ways of trying to reduce existential risk specifically. Improving education, improving parenting, improving science, improving our political system, spreading humanitarian values, or otherwise improving our collective wisdom as stewards of the future could, I believe, create many small, unpredictable positive trajectory changes.

For specific examples of changes that I believe could have very broad impact and lead to small, unpredictable positive trajectory changes, I would offer political advocacy of various kinds (immigration liberalization seems promising to me right now), spreading effective altruism, and supporting meta-research.

Religions partially involve values and I think values are a plausible area for path-dependence.

Please explain the influence that, eg., the theological writings of Peter Abelard, described as "the keenest thinker and boldest theologian of the 12th Century", had on modern-day values that might reasonably have been predictable in advance during his time. And that was only eight hundred years ago, only ten human lifetimes. We're talking about timescales of thousands or millions or billions of current human lifetimes.

Conceivably, the genetic code, base ten math, ASCII, English language and units, Java, or the Windows operating system might last for trillions of years.

This claim is prima facie preposterous, and Robin presents no arguments for it. Indeed, it is so farcically absurd that it substantially lowers my prior on the accuracy of all his statements, and it lowers my prior on your statements that you would present it with no evidence except a blunt appeal to authority. To see why, consider, eg., this set of claims about standards lasting two thousand years (a tiny fraction of a comparative eyeblink), and why even that is highly questionable. Or this essay about programming languages a mere hundred years from now, assuming no x-risk and no strong-AI and no nanotech.

For specific examples of changes that I believe could have very broad impact and lead to small, unpredictable positive trajectory changes, I would offer political advocacy of various kinds (immigration liberalization seems promising to me right now), spreading effective altruism, and supporting meta-research.

Do you have any numbers on those? Bostrom's calculations obviously aren't exact, but we can usually get key numbers (eg. # of lives that can be saved with X amount of human/social capital, dedicated to Y x-risk reduction strategy) pinned down to within an order of magnitude or two. You haven't specified any numbers at all for the size of "small, unpredictable positive trajectory changes" in comparison to x-risk, or the cost-effectiveness of different strategies for pursuing them. Indeed, it is unclear how one could come up with such numbers even in theory, since the mechanisms behind such changes causing long-run improved outcomes remain unspecified. Making today's society a nicer place to live is likely worthwhile for all kinds of reasons, but expecting it to have direct influence on the future of a billion years seems absurd. Ancient Minoans from merely 3,500 years ago apparently lived very nicely, by the standards of their day. What predictable impacts did this have on us?

Furthermore, pointing to "political advocacy" as the first thing on the to-do list seems highly suspicious as a signal of bad reasoning somewhere, sorta like learning that your new business partner has offices only in Nigeria. Humans are biased to make everything seem like it's about modern-day politics, even when it's obviously irrelevant, and Cthulhu knows it would be difficult finding any predictable effects of eg. Old Kingdom Egypt dynastic struggles on life now. Political advocacy is also very unlikely to be a low-hanging-fruit area, as huge amounts of human and social capital already go into it, and so the effect of a marginal contribution by any of us is tiny.

Please explain the influence that, eg., the theological writings of Peter Abelard, described as "the keenest thinker and boldest theologian of the 12th Century", had on modern-day values that might reasonably have been predictable in advance during his time. And that was only eight hundred years ago, only ten human lifetimes. We're talking about timescales of thousands or millions or billions of current human lifetimes.

My claim--very explicitly--was that lots of activities could indirectly lead to unpredictable trajectory changes, so I don't see this rhetorical question as compelling. I think it's conventional wisdom that major world religions involve path dependence, so I feel the burden of proof is on those who wish to argue otherwise.

This claim is prima facie preposterous, and Robin presents no arguments for it. Indeed, it is so farcically absurd that it substantially lowers my prior on the accuracy of all his statements, and it lowers my prior on your statements that you would present it with no evidence except a blunt appeal to authority.

You made a claim I disagreed with in a very matter-of-fact way, and I pointed to another person you were likely to respect and said that they also did not accept your claim. This was not supposed to be a "proof" that I'm right, but evidence that it isn't as cut-and-dried as your comments suggested. I honestly didn't think that hard about what he had said. I think if you weaken his claim so that he is saying these things could involve some path dependence, but not that they would last in their present form, then it does seem true to me that this could happen.

Do you have any numbers on those? Bostrom's calculations obviously aren't exact, but we can usually get key numbers (eg. # of lives that can be saved with X amount of human/social capital, dedicated to Y x-risk reduction strategy) pinned down to within an order of magnitude or two. You haven't specified any numbers at all for the size of "small, unpredictable positive trajectory changes" in comparison to x-risk, or the cost-effectiveness of different strategies for pursuing them. Indeed, it is unclear how one could come up with such numbers even in theory, since the mechanisms behind such changes causing long-run improved outcomes remain unspecified. Making today's society a nicer place to live is likely worthwhile for all kinds of reasons, but expecting it to have direct influence on the future of a billion years seems absurd. Ancient Minoans from merely 3,500 years ago apparently lived very nicely, by the standards of their day. What predictable impacts did this have on us?

I don't agree that popular x-risk charities have cost-effectiveness estimates that are nearly as uncontroversial as you claim. I know of no cost-effectiveness estimate for any x-risk organization at all that has uncontroversially been estimated within two orders of magnitude, and it's even rare to have cost-effectiveness estimates for global health charities that are uncontroversial within an order of magnitude.

I also don't see it as particularly damning that I don't have ready calculations and didn't base my arguments on such calculations. I was making some broad, big-picture claims, and using these as examples where lots of alternatives might work as well.

And just to be clear, political advocacy is not my favorite cause. It just seemed like it might be a persuasive example in this context.

Politics of the past did have some massive non-inevitable impacts on the present day. For example, if you believe Jesus existed and was crucified by Roman Prefect Pontius Pilate, then Pilate's rule may have been responsible for the rise of Christianity, which led to the Catholic Church, Islam, the Protestant Reformation, religious wars in Europe, religious tolerance, parts of the Enlightenment, parts of the US constitution, the Holocaust, Israel-Palestine disputes, the 9/11 attacks, and countless other major parts of modern life. Even if you think these things only ultimately matter through their effects on extinction risk, they matter a fair amount for extinction risk.

Where this breaks down is whether these effects were predictable in advance (surely not). But it's plausible there could be states of affairs today that are systematically more conducive to good outcomes than others.

In any event, even if you only want to address x-risk, it may be most effective to do so in the political arena.

Conceivably, the genetic code,

Interestingly, aside from the genetic code not being present in our inorganic machines, synthetic biologists including George Church are already at work on bioengineered organisms with new genetic codes because of advantages like disease resistance.

Also, it's important to note that when Robin talks about things lasting "a long time" he usually doesn't mean in the sense of astronomical waste and the true long-run, but in economic doublings, i.e. he considers something that lasts for 2 years after whole brain emulations get going to be long-lasting, even if it is a minuscule portion of future history.

I think values are a plausible area for path-dependence. And I'm not the only one who has the opposite intuition. Here is Robin Hanson...
