My guess would be that the most common variety of alien is “unconscious brethren”, followed by “unconscious squiggle maximizer”, then “conscious brethren”, then “conscious squiggle maximizer”.
It might sound odd to call an unconscious entity “brother”, but it's plausible to me that on reflection, humanity strongly prefers universes with evolved-creatures doing evolved-creature-stuff (relative to an empty universe), even if none of those creatures are conscious.
Somehow, thinking of ourselves from the perspective of an unconscious alien really drives home how extremely weird and meaningless-from-other-perspectives alien values are.
Like, we care about this very specific configuration of reflection, and if an entity doesn't have that very specific configuration, that changes its moral value from "a person, of nigh-inconceivable moral worth" to "an object or a houseplant; meh, whatever." But from the unconscious-alien perspective, this distinction is inane. We look crazy and spastic, that such small differences in algorithm make such an enormous difference to our sensibilities.
I think that this is good news about trade with other maximally advanced civilizations! Values in the universe might be super orthogonal, and so the gains from trade might be huge! We agree to adjust our computing substrate to have just the right shape of molecular squiggle, and they agree to run their optimization algorithms with just the right flavor of reflection, such that instead of their shard being dead and valueless, it is filled with joyful conscious life. The universe becomes many times more valuable by our lights, and all we had to do was incorporate a design choice that is so insignificant by our lights that we wouldn't have even bothered to pay any attention to it, otherwise.
Predictions, using the definitions in Nate's post:
| Outcome | | |
| --- | --- | --- |
| Strong Utopia | | |
| Weak Utopia | | |
| "Pretty good" outcome | | |
| Conscious Meh outcome | | |
| Unconscious Meh outcome | | |
| Weak dystopia | | |
| Strong dystopia | | |
Isn't "misaligned AI" by definition a bad thing, and "ASI-boosted humans" by definition a good thing? You're basically asking "How likely is <good outcome> given that we have <a machine that creates good outcomes>?"
The definitions given in the post are:
- ASI-boosted humans — We solve all of the problems involved in aiming artificial superintelligence at the things we’d ideally want.
[...]
- misaligned AI — Humans build and deploy superintelligent AI that isn’t aligned with what we’d ideally want.
I'd expect most people to agree that "We solve all of the problems involved in aiming artificial superintelligence at the things we'd ideally want" yields outcomes that are about as good as possible, and I'd expect most of the disagreement to turn (either overtly or in some subtle way) on differences in how we're defining relevant words (like "ideally", "good", and "problems").
I'd be fine with skipping over this question, except that some of the differences-in-definition might be important for the other questions, so this question may be useful for establishing a baseline.
With "misaligned AI", there are some definitional issues but I expect most of the disagreement to be substantive, since there are a lot of different levels of Badness you could expect even if you want to call all misaligned AI "bad" (at least relative to ASI-boosted humans).
In my own answers, I interpreted "misaligned AGI" as meaning: We weren't good enough at alignment to make the AGI do exactly what we wanted, so it permanently took control of the future and did "something that isn't exactly what we wanted" instead. (Which might be kinda similar to what we wanted, or might be wildly different, etc.)
If an alien only cared about maximizing the amount of computronium in the universe, and it built an AI that fills the universe with computronium because the AI values calculating pi, then I think I'd say that the AI is "aligned with that alien by default / by accident", rather than saying "the AI is misaligned with that alien but is doing ~exactly what we want anyway". So if someone thinks AI does exactly what humans want even with humans putting in zero effort to steer the AI toward that outcome, I'd classify that as "aligned-by-default AI", rather than "misaligned AI". (But there's still a huge range of possible-in-principle outcomes from misaligned AI, even if I think some a lot more likely than others.)
All the ASI-boosted-humans ones feel a bit tricky for me to answer, because it seems possible that we get strong aligned AI, in a distributed takeoff, but that we deploy it unwisely. Namely, the world immediately collapses into Moloch, whereby everyone follows their myopic incentives off a cliff.
That cuts my odds of good outcomes by a factor of two or so.
I don't think my responses to this are correct unless normalized to sum to 1. This might be better on Manifold.
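(For anyone who wants to do the renormalization explicitly: a minimal sketch, where the outcome names are from Nate's post and the raw numbers are placeholders, not my actual predictions.)

```python
# Renormalize a set of outcome probabilities so they sum to 1.
# The raw numbers below are placeholders, not anyone's real predictions.
raw = {
    "Strong Utopia": 0.06,
    "Weak Utopia": 0.12,
    '"Pretty good" outcome': 0.18,
    "Conscious Meh outcome": 0.24,
    "Unconscious Meh outcome": 0.42,
    "Weak dystopia": 0.12,
    "Strong dystopia": 0.06,
}
total = sum(raw.values())  # 1.20 here, so every entry scales down
normalized = {outcome: p / total for outcome, p in raw.items()}
assert abs(sum(normalized.values()) - 1.0) < 1e-9
```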
Moreover, I suspect that it would be good (in expectation) for humans to encounter aliens someday, even though this means that we’ll control a smaller universe-shard.
I suspect this would be a genuinely better outcome than us being alone, and would make the future more awesome by human standards.
I don't get this. If encountering aliens is so great, we could make it happen, even in an empty universe, by simulating evolution (and the development of civilization up to super-intelligence) and then being friends and partners with those alien civilizations, if we want to. [Note that this is in contrast to creating a species intentionally, according to our own spec, which maybe (or maybe not!) leaves out something cool about meeting naturally-evolved aliens.]
Maybe we give those aliens a sizeable fraction of the cosmic endowment to do what they think is good with. (By my lights, I think we do owe them something for creating them, even if we don't like their values very much.)
This seems like a strictly better scenario than encountering aliens?
We have strictly more options in this situation than in the situation where it turned out to be an empty universe. We would prefer to have aliens + a big universe shard, instead of aliens + a smaller universe shard, right?
Encountering evolved aliens, instead of making our own, means that we're hanging out with and trading with whoever evolution happened to spit out, rather than the best possible aliens, given our sense of life.
For a non-ascended species, it might very well be that being forced into a situation that you wouldn't have chosen is actually a secret boon, because what you want is not the same as what is best for you. But if we're positing that being forced by the situation into doing something that you wouldn't otherwise have chosen is actually better...that seems to suggest that civilization has failed in a very deep way and we're not actually optimizing Fun very well at all.
Concretely, wouldn't it be better if, instead of the ant-people who would murder us all if they could, we shared the universe with something equally alien but much more the sort of thing that we like?
Or to say it better: If we control the whole reachable universe, we can decide the relative proportions of human-descended minds, human-descended-designed minds, and luck-of-the-draw evolved alien minds. And we can balance that proportion to be optimal for cosmopolitan value and Fun.
Isn't that obviously better than having most of that division determined by luck, and forced upon us, regardless of what is optimal?
Really great overview. I’ll probably draw points from this in an explainer I’m going to write for a new audience.
Moreover, it’s observably the case that consciousness-ascription is hyperactive. We readily see faces and minds in natural phenomena. We readily imagine simple stick-figures in comic strips experiencing rich mental lives.
A concern I have with the whole consciousness discussion in EA-adjacent circles is that people seem to consider their empathic response to be important evidence about the distribution of qualia in Nature, despite the obvious hyperactivity.
This post is the single most persuasive piece of writing that I have encountered with regard to talking me out of my veganism.
Particularly the idea that humans' having conscious experience is a contingent fact of human evolution, such that other, related, intelligent species in nearby counterfactuals don't have anything that it is like to be them.
Considering that possibility, which seems hard to evaluate given that I only have one datapoint (one that is obviously influenced by anthropic considerations), makes it seem much more plausible that there's nothing that it is like to be a cow, and gives me a sense of a planet Earth that is much more dead and empty than the mental world I had been inhabiting 10 minutes ago.
If you cared to write up more of your understanding of "somebody being home", I would read it with avid interest. It seems more likely than anything else that I can think of (aside from perhaps a similar post by Eliezer?) to change my mind with regards to veganism and how I should weigh the values of animals in factory farms in my philanthropic budget.
That said, I, at least, am not making this error, I think:
Another concern I have is that most people seem to neglect the difference between “exhibiting an external behavior in the same way that humans do, and for the same reasons we do”, and “having additional follow-on internal responses to that behavior”.
An example: If we suppose that it’s very morally important for people to internally subvocalize “I sneezed” after sneezing, and you do this whenever you sneeze, and all your (human) friends report that they do it too, it would nonetheless be a mistake to see a dog sneeze and say: “See! They did the morally relevant thing! It would be weird to suppose that they didn’t, when they’re sneezing for the same ancestral reasons as us!”
The ancestral reasons for the subvocalization are not the same as the ancestral reasons for the sneeze; and we already have an explanation for why animals sneeze, that doesn’t invoke any process that necessarily produces a follow-up subvocalization.
None of this rules out that dogs subvocalize in a dog-mental-language, on its own; but it does mean that drawing any strong inferences here requires us to have some model of why humans subvocalize.
Seeing a pig "scream in pain" when you cut off its tail does not make it a foregone conclusion that the pig is experiencing anything at all, or anything like what pain means to me. But it does seem like a pretty good guess.
And I definitely don't look at a turtle doing any kind of planning at all and think "there must be an inner life in there!"
I'm real uncertain about what consciousness is and where it comes from, and there is an anthropic argument (which I don't know how to think clearly about) that it is rare among animals. But from my state of knowledge, it seems like a better than even chance that many mammals have some kind of inner listener. And if they have an inner listener at all, pain seems like one of the simplest and most convergent experiences to have.
Which makes industrial factory farming an unconscionable atrocity, much worse than American slavery. It is not okay to treat conscious beings like that, no matter how dumb they are, or how little they narrativize about themselves.
My understanding is that (assuming animal consciousness), there are 100 billion experience-years in factory farms every year.
It seems to me that, in my state of uncertainty, it is extremely irresponsible to say "eh, whatever" to the possible moral atrocity. We should shut up and multiply. My uncertainty about animal consciousness only reduces the expected number of experience-years of torture by a factor of 2 or so.
An expected 50 billion moral patients getting tortured as a matter of course is the worst moral catastrophe perpetrated by humans ever (with the exception of our rush to destroy all the value of the future).
Even if someone has more philosophical clarity than I do, they have to be confident at a level of around 100,000 to 1 that livestock animals are not experiencing beings, before the expected value of this moral catastrophe starts being of comparable scale to well-known moral catastrophes like the Holocaust, and American slavery, and the Mongol invasion of the world. Anything less than that, and the expected value of industrial meat production is beating every other moral catastrophe by orders of magnitude (again, with the exception of x-risk).
(Admittedly there are some assumptions here about the moral value of pain and fear, relative to other good and bad things that can happen to a person, which might influence how we weight the experiences of animals compared to people. But "pain and terror are really bad, and it is really bad for someone to persistently experience them" seems like a not-very-crazy assumption.)
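The shut-up-and-multiply step here is simple enough to spell out. A sketch in Python, where the 100-billion figure is the estimate quoted above and the 50% credence is my own rough hedge (both are assumptions, not established facts):

```python
# Expected-value sketch for the factory-farming estimate above.
# Assumptions: ~100 billion experience-years per year in factory farms
# if animals are conscious (the figure quoted above), and a ~50% credence
# that the relevant animals have an inner listener at all (my rough hedge).
experience_years_if_conscious = 100e9  # per year, conditional on consciousness
p_consciousness = 0.5                  # credence that livestock are conscious

expected_experience_years = p_consciousness * experience_years_if_conscious
print(f"{expected_experience_years:.0e}")  # 5e+10, i.e. ~50 billion expected

# Even extreme skepticism only shrinks the expected harm linearly:
p_residual = 1e-5  # a 100,000-to-1 skeptic's residual credence
print(f"{p_residual * experience_years_if_conscious:.0e}")  # 1e+06
```

The only point of the sketch is that uncertainty scales the expected harm linearly: halving your credence in animal consciousness only halves the expected number of experience-years at stake.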
Anyway, this is a digression from the point of this post, but I apparently had a rant in me, and I don't want animal welfare considerations to be weak-manned. A concern for animal welfare isn't fundamentally based on shoddy philosophy. It seems to me that it is a very natural starting point, given our state of philosophical confusion.
EDIT: Added in the correct links.
Assuming Yudkowsky's position is quite similar to Nate's (which it sounds like, given what both have written), I'd recommend reading this debate Yud was in to get a better understanding of this model[1]. Follow up on the posts Yud, Luke and Rob mention if you'd care to know more. Personally, I'm closer to Luke's position on the topic. He gives a clear and thorough exposition here.
Also, I anticipate that if Nate does have a fully fleshed-out model, he'd be reluctant to share it. I think Yud said he didn't wish to give too many specifics, as he was worried trolls might implement a maximally suffering entity. And, you know, 4chan exists. Plenty of people there would be inclined to do such a thing to signal disbelief or simply to upset others.
I think this kind of model would fall under the illusionism school of thought. "Consciousness is an illusion" is the motto. I parse it as "the concept you have of consciousness is an illusion, a persistent part of your map that doesn't match the territory. Just as you may be convinced these tables are of different shapes, even after rotating and matching them onto one another, so too may you be convinced that you have this property known as consciousness." That doesn't mean the territory has nothing like consciousness in it, just that it doesn't have the exact form you believed it to. You can understand on a deliberative level how the shapes are the same and the process that generates the illusion whilst still experiencing the illusion. EDIT: The same for your intuition that "consciousness has to be more than an algorithm" or "more than matter" or so on.
Huh. I'm a bit surprised. I guess I assumed otherwise, since a lot of the stuff I've read by Eliezer seems heavily influenced by Dennett. He's also a physicalist, and his approach also seems to be "explain our claims about consciousness". Plus there's all the stuff about self-reflection, how an algorithm feels from the inside, etc. I guess I was just bucketing that stuff together with (weak) illusionism. After writing that out, I can see how those points don't imply illusionism. Does Eliezer think we can save the phenomena of consciousness, and hence that calling it an illusion is a mistake? Or is there something else going on there?
I think Dennett's argumentation about the hard problem of consciousness has usually been terrible, and I don't see him as an important forerunner of illusionism, though he's an example of someone who soldiered on for anti-realism about phenomenal consciousness for long stretches of time where the arguments were lacking.
I think I remember Eliezer saying somewhere that he also wasn't impressed with Dennett's takes on the hard problem, but I forget where?
His approach also seems to be "explain our claims about consciousness".
There's some similarity between heterophenomenology and the way Eliezer/Nate talk about consciousness, though I guess I think of Eliezer/Nate's "let's find a theory that makes sense of our claims about consciousness" as more "here's a necessary feature of any account of consciousness, and a plausibly fruitful way to get insight into a lot of what's going on", not as an argument for otherwise ignoring all introspective data. Heterophenomenology IMO was always a somewhat silly and confused idea, because it's proposing that we a priori reject introspective evidence but it's not giving a clear argument for why.
(Or, worse, it's arguing something orthogonal to whether we should care about introspective evidence, while winking and nudging that there's something vaguely unrespectable about the introspective-evidence question.)
There are good arguments for being skeptical of introspection here, but "that doesn't sound like it's in the literary genre of science" should not be an argument that Bayesians find very compelling.
And we should expect the time machine and the infrastructure it builds to be well-defended, since "you can't make the coffee if you're dead".
Does that follow? The time machine doesn't do any planning. So I would expect that in one timeline, something happens that accidentally drops an anvil on the time machine, breaking the reset mechanism, and there's no more time loops after that.
Indeed, in practice, I expect this time machine to optimize to destroy itself, not to fill the universe with paperclips.
The "anvil dropped on the time machine" scenario seems like a much more probable outcome that technically satisfies the optimization criteria, which was not "the universe is filled with paperclips" but "the time machine stops running, either because the paperclip classifier evaluates this timeline to have maxed out the paperclips or for any other reason." (In exactly the same way that the outcome pump in this post has the true criterion "the Emergency Regret button was not pushed", and not "the user is satisfied with the outcome.")
In order for this optimizer to actually be fearsome, without doing any learning or steering, the timeline resetting mechanism would need to be supernaturally immune to harm.
Curated. I like the central thesis of this post, but a further point I like about it is it takes the conversation beyond a simple binary of "are we doomed or not?", and "how doomed are we?" to a more interesting discussion of possible outcomes, their likelihoods, and the gears behind them. And I think that's epistemically healthy. I think it puts things into a mode of "make predictions for reasons" over "argue for a simplified position". Plus, this kind of attention to values and their origins is also one thing I think that hasn't gotten as much airtime on LessWrong and is important, both in remembering what we're fighting for (in very broad terms) and how we need to fight (i.e. what's ok to build).
Hi Nate, great respect. Forgive a rambling stream-of-consciousness comment.
Without the advantages of maxed-out physically feasible intelligence (and the tech unlocked by such intelligence), I think we would inevitably be overpowered.
I think you move to the conclusion "if humans don't have AI, aliens with AI will stomp humans" a little promptly.
Hanson's estimate of when we'll meet aliens is 500 million years. I know very little about how Hanson estimated that & how credible the method is, and you don't appear to either: that might be worth investigating. But—
One million years is ten thousand generations of humans as we know them. If AI progress were impossible under the heel of a world-state, we could increase intelligence by a few points each generation. This already happens naturally and it would hardly be difficult to compound the Flynn effect.
Surely we could reach endgame technology that hits the limits of physical possibility/diminishing returns in one million years, let alone five hundred of those spans. You are aware of all we have done in just the past two hundred years. We can expect invention progress to eventually decelerate as untapped invention space narrows, but by the time that finally outweighs the accelerating factors of increasing intelligence and helpful technology, it seems likely that we will already be quite close to finaltech.
In comparative terms, a five-hundred-year sabbatical from AI would reduce the share of resources we could reach by only an epsilon, and if the AI-safety premises are sound, it would greatly increase EV.
This point is likely moot, of course. I understand that we do not live in a totalitarian world state and your intent is just to assure people that AI safety people are not neoluddites. (I suppose one could attempt to help a state establish global dominance, then attempt to steer really hard towards AI-safety, but that requires two incredible victories for sufficiently murky benefits such that you'd have to be really confident of AI doom and have nothing better to try.)
a guess that a fair number of alien species are smarter, more cognitively coherent, and/or more coordinated than humans at the time they reach our technological level. (E.g., a hive-mind species would probably have an easier time solving alignment, since they wouldn’t need to rush.)
Anyway, this was fun to think about. Have a good day!! :D
Thanks for the comment, Amelia! :)
One million years is ten thousand generations of humans as we know them. If AI progress were impossible under the heel of a world-state, we could increase intelligence by a few points each generation. This already happens naturally and it would hardly be difficult to compound the Flynn effect.
I think the "unboosted humans" hypothetical is meant to include mind-uploading (which makes the generation time an underestimate), but we're assuming that the simulation overlords stop us from drastically improving the quality of our individual reasoning.
Nate assigns "base humans, left alone" an ~82% chance of producing an outcome at least as good as “tiling our universe-shard with computronium that we use to run glorious merely-human civilizations", which seems unlikely to me if we can't upload humans at all. (But maybe I'm misunderstanding something about his view.)
Surely we could reach endgame technology that hits the limits of physical possibility/diminishing returns in one million years, let alone five hundred of those spans.
I think we hit the limits of technology we can think about, understand, manipulate, and build vastly earlier than that (especially if we have fast-running human uploads). But I think this limit is a lot lower than the technologies you could invent if your brain were as large as the planet Jupiter, you had native brain hardware for doing different forms of advanced math in your head, you could visualize the connection between millions of different complex machines in your working memory and simulate millions of possible complex connections between those machines inside your own head, etc.
Even when it comes to just winning a space battle using a fixed pool of fighters, I expect to get crushed by a superintelligence that can individually think about and maneuver effectively arbitrary numbers of nanobots in real time, versus humans that are manually piloting (or using crappy AI to pilot) our drones.
In comparative terms, a five-hundred-year sabbatical from AI would reduce the share of resources we could reach by only an epsilon, and if the AI-safety premises are sound, it would greatly increase EV.
Oh, agreed. But we're discussing a scenario where we never build ASI, not one where we delay 500 years.
This point is likely moot, of course. I understand that we do not live in a totalitarian world state and your intent is just to assure people that AI safety people are not neoluddites.
Yep! And more generally, to share enough background model (that doesn't normally come up in inside-baseball AI discussions) to help people identify cruxes of disagreement.
I suppose one could attempt to help a state establish global dominance
Seems super unrealistic to me, and probably bad if you could achieve it.
A different scenario that makes a lot more sense, IMO, is an AGI project pairing with some number of states during or after an AGI-enabled pivotal act. But that assumes you've already solved enough of the alignment problem to do at least one (possibly state-assisted) pivotal act.
I think there's kind of a lot of room between 95% of potential value being lost and 5%!!
My intuition is that capturing even 1% of the future's total value is an astoundingly conjunctive feat -- a narrow enough target that it's surprising if we can hit that target and yet not hit 10%, or 99%. Think less "capture at least 1% of the negentropy in our future light cone and use it for something", more "solve the first 999 digits of a 1000-digit decimal combination lock specifying an extremely complicated function of human brain-states that somehow encodes all the properties of Maximum Extremely-Weird-Posthuman Utility".
(This is based on the idea that even if the alignment problem is solved such that we know how to specify a goal rigorously to an AI, it doesn't follow that the people who happen to be programming the goal will be selfless. You work in AI so presumably you have practiced rebuttals to this concept; I do not so I'll state my thought but be clear that I expect this is well-worn territory to which I expect you to have a solid answer.)
Why do they need to be selfless? What are the selfish benefits to making the future less Fun for innumerable numbers of posthumans you'll never meet or hear anything about?
(The future light cone is big, and no one human can interact with very much of it. You swamp the selfish desires of every currently-living human before you've even used up the negentropy in one hundredth of a single galaxy. And then what do you do with the rest of the universe? We aren't guaranteed to use the rest of the universe well, but if we use it poorly the explanation probably can't be "selfishness".)
It seems to assume that things like hive-mind species are possible or common, which I don't have information about but maybe you do.
I dunno Nate's reasoning, but AFAIK the hive-mind thing may just be an example, rather than being central to his reasoning on this point.
I like this text but I find your take on Fermi paradox wholly unrealistic.
Let's even assume, for the sake of the argument, that both P(life) and P(sapience|life) are bigger than 1/googol (though why?), so your hunch about how many planets originally evolve sapient aliens is broadly correct. A very substantial part of the alternative histories of the last century result in humanity dead or thrown into possibly-irrecoverable barbarism. (I wanted to say "most", but most, of course, is uninteresting differences, such as whether a random human puts the right shoe or the left shoe on first.) The default fate for aliens that have evolved is to fail their version of the Berlin crisis, or the Cuban Missile Crisis, or whatever other near-total-destruction situation we've had even without AI. It needn't involve nuclear weapons, mind you: say, what if, instead of the pretty-harmless-in-comparison COVID, we got a sterilizing virus on the loose that attacks the reproductive organs instead of the olfactory nerves? Since its method of proliferation does not depend on the host's ability to procreate, you could imagine it sterilizing the population of the planet.

And then you tack on the fact that you also predict a very high chance of AGI ruin. So most of the hypothetical aliens that survived the kind of hurdles humanity somehow survived (again, with possibly totally different specifics) are replaced by misaligned AGI, throwing a huge hurdle into the cosmopolitan result you predict: meeting a paperclip-maximizer built by ant-people is more likely than meeting the ant-people themselves, given your background beliefs.
Much of the value of alien civilizations might well come from the interaction of their civilization and ours, and from the fairness (which may well turn out to be a major terminal human value) of them getting their just fraction of the universe.
Won't the size of the universe-shard that a civilization controls be determined entirely by how early or late they started grabbing galaxies? Which is itself almost entirely determined by how early or late they evolved?
That doesn't sound like a fair distribution to me.
I guess we could redistribute some of our galaxies to civilizations that were less lucky than ours in the race, but I wouldn't expect the same treatment from those that are more lucky than us... I think. Maybe acausal trade / a veil of ignorance does end up equalizing things here.
I was surprised by Nate's high confidence in Unconscious Meh given misaligned ASI. Other people also seem to be quite confident in the same way. In contrast, my own ass-numbers for {the misaligned ASI scenario} are something like
(And it would be closer to 50-50 between Unconscious Meh and Weak Dystopia, before I take into account others' views.)
In a lossy nutshell, my reasons for the relatively high Weak Dystopia probabilities are something like
I'm very curious about why people have high confidence in {Unconscious Meh given misaligned ASI}, and why people seem to assign such low probabilities to {(Weak) Dystopia given misaligned ASI}.
many approaches to training AGI currently seem to have as a training target something like "learn to predict humans", or some other objective that is humanly-meaningful but not-our-real-values,
I don't know whether this will continue in the future (all the way up to AGI). If it does, then it strikes me as a sufficiently coarse-grained approach (that's bad enough at inner alignment, and bad enough at outer-alignment-to-specific-things-we-actually-care-about) that I'd still be pretty surprised if the result (in the limit of superintelligence) bears any resemblance to stuff we care much about, good or bad.
E.g., there are many more "unconscious configurations of matter that bear some relation to things you learn in trying to predict humans" than there are "conscious configurations of matter that bear some relation to things you learn in trying to predict humans". Building an entire functioning conscious mind is still a very complicated end-state that requires getting lots of bits into the AGI's terminal goals correctly; it doesn't necessarily become that much easier just because we're calling the ability we're training "human prediction". Like, a superintelligent paperclipper would also be excellent at the human prediction task, given access to information about humans.
(I'll also mention that I think it's a terrible idea for safety-conscious AI researchers to put all their eggs in the "train AI via lots of data on humans" basket. But that's a separate question from what AI researchers are likely to do in practice.)
A big chunk of my uncertainty about whether at least 95% of the future’s potential value is realized comes from uncertainty about "the order of magnitude at which utility is bounded". That is, if unbounded total utilitarianism is roughly true, I think there is a <1% chance in any of these scenarios that >95% of the future's potential value would be realized. If decreasing marginal returns in the [amount of hedonium -> utility] conversion kick in fast enough for 10^20 slightly conscious humans on heroin for a million years to yield 95% of max utility, then I'd probably give >10% of strong utopia even conditional on building the default superintelligent AI. Both options seem significantly probable to me, causing my odds to vary much less between the scenarios.
This is assuming that "the future’s potential value" is referring to something like the (expected) utility that would be attained by the action sequence recommended by an oracle giving humanity optimal advice according to our CEV. If that's a misinterpretation or a bad framing more generally, I'd enjoy thinking again about the better question. I would guess that my disagreement with the probabilities is greatly reduced on the level of the underlying empirical outcome distribution.
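To make the bounded-vs-unbounded fork concrete, here's a toy parameterization (my own illustration, not anything from the post): let $n$ be the amount of hedonium and $N$ a saturation scale.

```latex
% Unbounded total utilitarianism: utility grows linearly in n, so no
% finite universe-shard captures 95% of a larger one's potential:
U_{\mathrm{unbounded}}(n) = c \, n
% A bounded alternative with fast-diminishing marginal returns:
U_{\mathrm{bounded}}(n) = U_{\max}\left(1 - e^{-n/N}\right),
% which reaches 95% of the maximum once n \ge N \ln 20 \approx 3N.
```

On the unbounded model, "fraction of potential value realized" tracks the raw amount of optimized matter; on the bounded model it saturates quickly, which is why the two assumptions pull the probability of ">95% of potential value" so far apart across scenarios.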
Since molecular squiggle maximizers and paperclip maximizers both result in a universe-shard that's a boring wasteland, despite the fact that they maximize different things, what's the practical difference between talking about molecular squiggle maximizers instead of paperclip maximizers?
The phrase "paperclip maximizer" was originally intended as a catch-all term for things analogous to molecular squiggle maximizers. Alas, it was often taken too literally, as being about literal paperclips.
My guess is that the aliens-control-the-universe-shard scenario is net-positive, but that it loses orders of magnitude of cosmopolitan utility compared to the “cognitively constrained humans” scenario.
Why? How?
It seems like something weird is happening if we claim that we expect human values to be more cosmopolitan than alien values. Is that what you're claiming?
That's what he's claiming, because he's claiming "cosmopolitan value" is itself a human value. (Just a very diversity-embracing one.)
Is super-intelligent AI necessarily AGI (for this amazing future), or can it be ANI?
i.e. why insist on all of the work-arounds we force with pursuing AGI, when, with ANI, don't we already have Safety, Alignment, Corrigibility, Reliability, and super-human ability, today?
Eugene
How are you defining "super-intelligent", "AGI", and "ANI" here?
I'd distinguish two questions:
My guess is that successful pivotal act AI will need to be AGI, though I'm not highly confident of this. By "AGI" I mean "something that's doing qualitatively the right kind of reasoning to be able to efficiently model physical processes in general, both high-level and low-level".
I don't mean that the AGI that saves the world necessarily actually has the knowledge or inclination to productively reason about arbitrary topics -- e.g., we might want to limit AGI to just reasoning about low-level physics (in ways that help us build tech to save the world), and keep the AGI from doing dangerous things like "reasoning about its operators' minds". (Similarly, I would call a human a "general intelligence" insofar as they have the right cognitive machinery to do science in general, even if they've never actually thought about physics or learned any physics facts.)
In the case of CEV-style AI, I'm much more confident that it will need to be AGI, and I strongly expect it to need to be aligned enough (and capable enough) that we can trust it to reason about arbitrary domains. If it can safely do CEV at all, then we shouldn't need to restrict it -- needing to restrict it is a flag that we aren't ready to hand it such a difficult task.
(Note: Rob Bensinger stitched together and expanded this essay based on an earlier, shorter draft plus some conversations we had. Many of the key conceptual divisions here, like "strong utopia" vs. "weak utopia" etc., are due to him.)
I hold all of the following views:
The reason I expect AGI to produce a “valueless wasteland” by default is not that I want my own present conception of humanity’s values locked into the end of time.
I want our values to be able to mature! I want us to figure out how to build sentient minds in silicon, who have different types of wants and desires and joys, to be our friends and partners as we explore the galaxies! I want us to cross paths with aliens in our distant travels who strain our conception of what’s good, such that we all come out the richer for it! I want our children to have values and goals that would make me boggle, as parents have boggled at their children for ages immemorial!
I believe machines can be people, and that we should treat digital people with the same respect we give biological people. I would love to see what a Matrioshka mind can do.[3] I expect that most of my concrete ideas about the future will seem quaint and outdated and not worth their opportunity costs, compared to the rad alternatives we'll see when we and our descendants and creations are vastly smarter and more grown-up.
Why, then, do I think that it will take a large effort by humanity to ensure that good futures occur? If I believe in a wondrously alien and strange cosmopolitan future, and I think we should embrace moral progress rather than clinging to our present-day preferences, then why do I think that the default outcome is catastrophic failure?
In short:
The practical take-away from the first point is “the AI alignment problem is very important”; the take-away from the second point is “we shouldn’t just destroy ourselves and hope aliens end up colonizing our future light cone, and we shouldn’t just try to produce AI via a more evolution-like process”;[4] and the take-away from the third point is “we shouldn’t just permanently give up on building superintelligent AI”.
To clarify my views, Rob Bensinger asked me how I’d sort outcomes into the following broad bins:
For each of the following four scenarios, Rob asked how likely I think it is that the outcome is a Strong Utopia, a Weak Utopia, etc.:
These probabilities are very rough, unstable, and off-the-cuff, and are “ass numbers” rather than the product of a quantitative model. I include them because they provide somewhat more information about my view than vague words like “likely” or “very unlikely” would.
(If you’d like to come up with your own probabilities before seeing mine, here’s your chance. Comment thread.)
.
.
.
.
.
(Spoiler space)
.
.
.
.
.
With rows representing odds ratios:
“~0” here means (in probabilities) “greater than 0%, but less than 0.5%”. Converted into (rounded) probabilities by Rob:
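As a side note on the arithmetic: converting a row of odds ratios into (rounded) probabilities is just normalization. Here is a minimal sketch; the odds values in the example are illustrative, not the post's actual numbers:

```python
def odds_to_probs(odds):
    """Normalize a list of relative odds (e.g. 1 : 4 : 5)
    into probabilities that sum to 1."""
    total = sum(odds)
    return [o / total for o in odds]

# Illustrative odds only, not the post's actual numbers:
probs = odds_to_probs([1, 4, 5])
# probs == [0.1, 0.4, 0.5]
```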
Below, I’ll explain why my subjective distributions look roughly like this.
Unboosted humans << Friendly superintelligent AI
I don’t think it’s plausible, in real life, that humanity goes without ever building superintelligence. I’ll discuss this scenario anyway, though, in order to explain why I think it would be a catastrophically bad idea to permanently forgo superintelligence.
If humanity were magically unable to ever build superintelligence, my default expectation (ass number: 4:1 odds in favor) is that we’d eventually be stomped by an alien species (or an alien-built AI). Without the advantages of maxed-out physically feasible intelligence (and the tech unlocked by such intelligence), I think we would inevitably be overpowered.
At that point, whether the future goes well or poorly would depend entirely on the alien's / AI’s values, with human values only playing a role insofar as the alien/AI terminally cares about our preferences.
Why think that humanity will ever encounter aliens?
My current tentative take on the Fermi paradox is:
This argument also suggests that we should expect the nearest aliens to be more than 100 million light-years away; and we shouldn't expect an alien civilization's head start (in years) to exceed its distance from us (in light-years). E.g., aliens that evolved a billion years earlier than we did are probably more than a billion light-years away.
This means that even if there are aliens in our future light-cone, and even if those aliens are friendly, there's still quite a lot at stake in humanity’s construction of AGI, in terms of whether the Earth-centered ~250-million-light-year-radius sphere of stars goes towards Fun vs. towards paperclips.
(Robin Hanson has made some related arguments about the Fermi paradox, and various parts of my model are heavily Hanson-influenced. I attribute many of the ideas above to him, though I haven’t actually read his “grabby aliens” paper and don’t know whether he would disagree with any of the above.)
Why think that most aliens succeed in their version of the alignment problem?
I don’t have much of an argument for this, just a general sense that the problem is “hard but not that hard”, and a guess that a fair number of alien species are smarter, more cognitively coherent, and/or more coordinated than humans at the time they reach our technological level. (E.g., a hive-mind species would probably have an easier time solving alignment, since they wouldn’t need to rush.)
I’m currently pessimistic about humanity’s odds of solving the alignment problem and escaping doom, but it seems to me that there are a decent number of disjunctive paths by which a species could be better-equipped to handle the problem, given that it’s strongly in their interest to handle it well.
If I have to put a number on it, I’ll wildly guess that 1/3 of technologically advanced aliens accidentally destroy themselves with misaligned AI.[6]
My ass-number distribution for “how well does the future go if humans just futz around indefinitely?” is therefore the sum of “50% chance we get stomped by evolved aliens, 30% chance we get stomped by misaligned alien-built AI, 20% chance we retain control of the universe-shard”:
As with many of the numbers in this post, I haven’t reflected on these much, and might revise them if I spent more minutes considering them. But, again, I figure unstable numbers are more informative in this context than just saying “(un)likely”.
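The "sum" of scenario distributions above is an ordinary probability mixture: weight each scenario's outcome distribution by that scenario's probability, then add. A hedged sketch, using made-up per-scenario outcome distributions (the post's actual per-scenario numbers are not reproduced here):

```python
def mixture(scenarios):
    """Combine per-scenario outcome distributions, each weighted
    by its scenario probability, into one overall distribution."""
    out = {}
    for weight, dist in scenarios:
        for outcome, p in dist.items():
            out[outcome] = out.get(outcome, 0.0) + weight * p
    return out

# Illustrative numbers only (not the post's):
overall = mixture([
    (0.5, {"good": 0.4, "meh": 0.6}),  # stomped by evolved aliens
    (0.3, {"good": 0.1, "meh": 0.9}),  # stomped by alien-built AI
    (0.2, {"good": 0.8, "meh": 0.2}),  # humanity retains control
])
# overall is approximately {"good": 0.39, "meh": 0.61}
```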
Why build superintelligence at all?
So that we don’t get stomped by superintelligent aliens or alien AI; and so that we can leverage superhuman intelligence to make the future vastly better.
(Seriously, humans, with our <10 working memory slots, are supposed to match minds that can attend to millions of complex thoughts at once, in all sorts of complex relationships??)
In real life, the reason I’m in a hurry to solve the AI alignment problem is that humanity is racing to build AGI, at which point we’ll promptly destroy ourselves (and all of the future value in our universe-shard) with misaligned AGI if the tech proliferates much. And AGI is software, so preventing proliferation is hard — hard enough that I haven’t heard of a more promising solution than “use one of the first AGIs to restrict proliferation”. But this requires that we be able to at least align that system well enough to perform that one act.
In the long run, however, the reason I care about the alignment problem is that “what should the future look like?” is a subtle and important problem, and humanity will surely be able to answer it better if we have access to reliable superintelligent cognition.
(Though “we need superintelligence for this” doesn’t entail “superintelligence will do everything for us”. It's entirely plausible to me that aligned AGI does something like "set up some guardrails for humanity, but then pass lots of the choices about how our future goes back to us", with the result that mere-humans end up having lots of say over how the future looks (including the sorts of weirder minds we build or become).)
The “easy” alignment problem is the problem of aiming AGI at a task that restricts proliferation (at least until we can get our act together as a species).
But the main point of restricting proliferation, from my perspective, is to give humanity as much time as it needs to ultimately solve the “hard” alignment problem: aiming AGI at arbitrary tasks, including ones that are far more open-ended and hard-to-formalize.
Intelligence is our world’s universal problem-solver; and more intelligence can mean the difference between finding a given solution quickly, and never finding it at all. So my default guess is that giving up on superintelligence altogether would result in a future that’s orders of magnitude worse than a future where we make use of fully aligned superintelligence.
Fortunately, I see no plausible path by which humanity would prevent itself from ever building superintelligence; and not many people are advocating for such a thing. (Instead, EAs are doing the sane thing of advocating for delaying AGI until we can figure out alignment.) But I still think it’s valuable to keep the big picture in view.
OK, but what if we somehow don't build superintelligence? And don’t get stomped by aliens or alien AI, either?
My ass-number distribution for that scenario, stated as an odds ratio, is something like:
I.e.:
The outcome’s goodness depends a lot on exactly how much intelligence amplification or AI assistance we allow in this hypothetical; and it depends a lot on whether we manage to destroy ourselves (or permanently cripple ourselves, e.g., with a stable totalitarian regime) before we develop the civilizational tech to keep ourselves from doing that.
If we really lock down on human intelligence and coordination ability, that seems real rough. But if there's always enough freedom and space-to-expand that pilgrims can branch off and try some new styles of organization when the old ones are collapsing under their bureaucratic weight or whatever, then I expect that eventually even modern-intelligence humans start capturing lots and lots of the stars and converting lots and lots of stellar negentropy into fun.[7]
If you don't have the pilgrimage-is-always-possible clause, then there's a big chance of falling into a dark attractor and staying there, and never really taking advantage of the stars.
In constraining human intelligence, you’re closing off the vast majority of the space of exploration (and a huge fraction of potential value). But there’s still a lot of mindspace to explore without going too far past current intelligence levels.
In good versions of this scenario, a lot of the good comes from humans being like, “I guess we do the same things the superintelligence would have done, but the long way.” At a lot of junctures, humanity has to do the work that a superintelligence would naturally handle, in order to make the future eudaimonic.
It’s probably possible to eventually do a decent amount of that work with current-human minds, if you have an absurdly large number of them collaborating just right, and if you’re willing to go very slow. (And if humanity hasn’t locked itself into a bad state.)
I’ll note in passing that the view I’m presenting here reflects a super low degree of cynicism relative to the surrounding memetic environment. I think the surrounding memetic environment says "humans left unstomped tend to create dystopias and/or kill themselves", whereas I'm like, "nah, you'd need somebody else to kill us; absent that, we'd probably do fine". (I am not a generic cynic!)
Still, ending up in this scenario would be a huge tragedy, relative to how good the future could go.
A different way of framing the question “how good is this scenario?” is “would you rather have really quite a lot of the alien ant-queen’s will, or a smidge of poorly-implemented fun?”.
In that case, I suspect (non-confidently) that I’d take the fun over the ant-queen’s will. My guess is that the aliens-control-the-universe-shard scenario is net-positive, but that it loses orders of magnitude of cosmopolitan utility compared to the “cognitively constrained humans” scenario.
To explain why I suspect this, I’ll state some of my (mostly low-confidence) guesses about the distribution of smart non-artificial minds.
Alien CEV << Human CEV
On the whole, I’m highly uncertain about the expected value of “select an evolved alien species at random, and execute their coherent extrapolated volition (CEV) on the whole universe-shard”.
(Quoting Eliezer: "In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.")
My point estimate is that this outcome is a whole lot better than an empty universe, and that the bad cases (such as aliens that both are sentient and are unethical sadists) are fairly rare. But humans do provide precedent for sadism and sentience! And it sure is hard to be confident, in either direction, from a sample size of 1.
Moreover, I suspect that it would be good (in expectation) for humans to encounter aliens someday, even though this means that we’ll control a smaller universe-shard.
I suspect this would be a genuinely better outcome than us being alone, and would make the future more awesome by human standards.
To explain my perspective on this, I'll talk about a few different questions in turn:
How many advanced alien species are sentient?
I expect more overlap between alien minds and human minds, than between AI minds (of the sort we’re likely to build first, using methods remotely resembling current ML) and human minds. But among aliens that were made by some process that’s broadly similar to how humans evolved, it's pretty unclear to me what fraction we would count as "having somebody home” in the limit of a completed science of mind.
I have high enough uncertainty here that picking a median doesn’t feel very informative. I have maybe 1:2 or 1:3 odds on “lots of advanced alien races are sentient” : “few advanced alien races are sentient”, conditional on my current models not including huge mistakes. (And I’d guess there’s something like a 1/4 chance of my models containing huge mistakes here, in which case I’m not sure what my distribution looks like.)
“A nonsentient race that develops advanced science and technology” may sound like a contradiction in terms: How could a species be so smart and yet lack "somebody there to feel things" in the way humans seem to? How could something perform such impressive computations and yet “the lights not be on”?
I won’t try to give a full argument for this conclusion here, as this would require delving into my (incomplete but nontrivial) models of what's going on in humans just before they insist that there's something it's like to be them.[8] (As well as my model of evolutionary processes and general intelligence, and how those connect to consciousness.) But I’ll say a few words to hopefully show why this claim isn’t a wild claim, even if you aren’t convinced of it.
My current best models suggest that the “somebody-is-home” property is a fairly contingent coincidence of our evolutionary history.
On my model, human-style consciousness is not a necessary feature of all optimization processes that can efficiently model the physical world and sort world-states by some criterion; nor is it a necessary feature of all optimization processes that can abstractly represent their own state within a given world-model.
To better intuit the idea of a very smart and yet unconscious process, it might help to consider a time machine that outputs a random sequence of actions, then resets time and outputs a new sequence of actions, unless a specified outcome occurs.
The time machine does no planning, no reflection, no learning, no thinking at all. It just detects whether an outcome occurs, and hits “refresh” on the universe if the outcome didn’t happen.
In spite of this lack of reasoning, this time machine is an incredibly powerful optimizer. It exhibits all the behavioral properties of a reasoner, including many of the standard difficulties of (outer) AI alignment.
If the machine resets any future that isn't full of paperclips, then we should expect it to reset until machinery exists that's busily constructing von Neumann probes for the sake of colonizing the universe and paperclipping it.
And we should expect the time machine and the infrastructure it builds to be well-defended, since "you can't make the coffee if you're dead", and you can’t make paperclips without manufacturing equipment. The optimization process exhibits convergent instrumental behavior and behaves as though it's "trying" to route around obstacles and adversaries, even though there's no thinking-feeling mind guiding it.[9]
You can’t actually build a time machine like this, but the example helps illustrate the fact that in principle, powerful optimization — steering the future into very complicated and specific states of affairs, including states that require long sequences of events to all go a specific way — does not require consciousness.
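As a toy computational analogue (my own illustrative sketch, not part of the thought experiment itself): the time machine is equivalent to brute-force rejection sampling over action sequences. There is no model, no planning, and no learning anywhere in the loop, just "retry until the goal predicate holds":

```python
import random

def time_machine(goal_reached, num_actions, seq_len, seed=0):
    """Brute-force 'optimizer': emit a random action sequence and
    'reset the universe' (retry) whenever the goal predicate fails.
    No planning, learning, or reflection occurs anywhere, yet
    arbitrarily specific outcomes get hit eventually."""
    rng = random.Random(seed)
    while True:
        actions = [rng.randrange(num_actions) for _ in range(seq_len)]
        if goal_reached(actions):
            return actions  # the one timeline that wasn't reset

# Example: the "goal" is one specific 4-step sequence out of 10,000.
result = time_machine(lambda a: a == [3, 1, 4, 1], num_actions=10, seq_len=4)
# result == [3, 1, 4, 1]
```

The interesting property is that `goal_reached` can encode outcomes of arbitrary complexity ("the universe is full of paperclips") and the loop's behavior is unchanged: pure selection, no cognition.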
We recognize that a textbook can store a lot of information and yet not “experience” that information. What’s less familiar is the idea that powerful optimization processes can optimize without “experience”, partly (I claim) because we live in a world where there are many simpler information-storing and computational systems, but where the only powerful optimization processes are humans.
Moreover, we don’t know on a formal level what general intelligence or sentience consists in, so we only have our evolved empathy to help us model and predict the (human) general intelligences in our environment. Our “subjective point of view”, from that empathic perspective, feels like something basic and intrinsic to every mental task we perform, rather than feeling like a complicated set of cogs and gears doing specific computational tasks.
So when something is not only “storing a lot of useful information” but “using that information to steer environments, like an agent”, it’s natural for us to use our native agent-modeling software (i.e., our human-brain-modeling software) to try to simulate its behavior. And then it just “feels as though” this human-like system must be self-aware, for the same reason it feels obvious that you’re conscious, that other humans are conscious, etc.
Moreover, it’s observably the case that consciousness-ascription is hyperactive. We readily see faces and minds in natural phenomena. We readily imagine simple stick-figures in comic strips experiencing rich mental lives.
A concern I have with the whole consciousness discussion in EA-adjacent circles is that people seem to consider their empathic response to be important evidence about the distribution of qualia in Nature, despite the obvious hyperactivity.
Another concern I have is that most people seem to neglect the difference between “exhibiting an external behavior in the same way that humans do, and for the same reasons we do”, and “having additional follow-on internal responses to that behavior”.
An example: If we suppose that it’s very morally important for people to internally subvocalize “I sneezed” after sneezing, and you do this whenever you sneeze, and all your (human) friends report that they do it too, it would nonetheless be a mistake to see a dog sneeze and say: “See! They did the morally relevant thing! It would be weird to suppose that they didn’t, when they’re sneezing for the same ancestral reasons as us!”
The ancestral reasons for the subvocalization are not the same as the ancestral reasons for the sneeze; and we already have an explanation for why animals sneeze, that doesn’t invoke any process that necessarily produces a follow-up subvocalization.
None of this rules out that dogs subvocalize in a dog-mental-language, on its own; but it does mean that drawing any strong inferences here requires us to have some model of why humans subvocalize.
We can debate what follow-on effects are morally relevant (if any), and debate what minds exhibit those effects. But it concerns me that “there are other parts downstream of the sneeze / flinch / etc. that are required for sentience, and not required for the sneeze” doesn’t seem to be in many people’s hypothesis space. Instead, they observe a behavioral analog, and move straight to a confident ascription “the internal processes accompanying this behavior must be pretty similar”.
In general, I want to emphasize that a blank map doesn't correspond to a blank territory. If you currently don't understand the machinery of consciousness, you should still expect that there are many, many details to learn, whether consciousness is prevalent among alien races or rare.
If a machine isn't built to notice how complicated or contingent it is when it does a mental action we choose to call "introspection", it doesn't thereby follow that the machine is simple, or that it can only be built one way.
Our prior shouldn’t be that consciousness is simple, given the many ways it appears to interact with a wide variety of human mental faculties and behaviors (e.g., its causal effects on the words I’m currently writing); and absent a detailed model of consciousness, you shouldn’t treat your empathic modeling as a robust way of figuring out whether an alien has this particular machinery, since the background facts that make empathic inference pretty reliable in humans (overlap in brain architecture, genes, evolutionary history, etc.) don’t hold across the human-alien gap.
Again, I haven’t given my fragments-of-a-model of consciousness here (which would be required to argue for my probabilities). But I’ve hopefully said enough to move my view from “obviously crazy” to “OK, I see how additional arguments could potentially plug in here to yield non-extreme credences on the prevalence of sapient-but-nonsentient evolved optimizers”.
How likely are extremely good and extremely bad outcomes?
If we could list out the things that 90+% of spacefaring alien races have in common, there’s no guarantee that this list would be very long. I recommend stories like Three Worlds Collide and “Kindness to Kin” for their depiction of genuinely different alien minds, as opposed to the humans in funny suits common to almost all sci-fi.
That said, I do think there’s more overlap (in expectation) between minds produced by processes similar to biological evolution, than between evolved minds and (unaligned) ML-style minds. I expect more aliens to care about at least some things that we vaguely recognize, even if the correspondence is never exact.
On my models, it’s entirely possible that there just turns out to be ~no overlap between humans and aliens, because aliens turn out to be very alien. But “lots of overlap” is also very plausible. (Whereas I don’t think “lots of overlap” is plausible for humans and misaligned AGI.)
To the extent aliens and humans overlap in values, it's unclear to me whether this mostly works in our favor or to our detriment. It could be that a random alien world tends to be worse than a random AI-produced world, exactly because the alien shares more goal-content in common with us, and is therefore more likely to optimize or pessimize quantities that we care about.
If I had to guess, though, I would guess that this overlap makes the alien scenario better in expectation than the misaligned-AI scenario, rather than worse.
A special case of “values overlap increases variance” is that the worst outcomes non-human optimizers produce, as well as the best ones, are likely to come from conscious aliens. This is because:
Since I think it’s pretty plausible that most aliens are nonsentient, I expect most alien universe-shards to look “pretty good” or “meh” from a human perspective, rather than “amazing” or “terrible”.
Note that there’s an enormous gap between "pretty dystopian" and "pessimally dystopian". Across all the scenarios (whether alien, human, or AI), I assign ~0% probability to Strong Dystopia, the sort of scenario you get if something is actively pessimizing the human utility function. "Aliens who we'd rather there be nothing than their CEV" is an immensely far cry from "negative of our CEV". But I’d guess that even Weak Dystopias are fairly rare, compared to “meh” or good outcomes of alien civilizations.
How will aliens feel about us?
Given that I think aliens plausibly tend to produce pretty cool universe-shards, a natural next question is: if we encounter a random alien race one day, will they tend to be glad that they found us? Or will they tend to be the sort of species that would have paid a significant number of galaxies to have paved over earth before we ascended, so that they could have had all our galaxies instead?
I think my point estimate there is "most aliens are not happy to see us", but I’m highly uncertain. Among other things, this question turns on how often the mixture of "sociality (such that personal success relies on more than just the kin-group), stupidity (such that calculating the exact fitness-advantage of each interaction is infeasible), and speed (such that natural selection lacks the time to gnaw the large circle of concern back down)" occurs in intelligent races’ evolutionary histories.
These are the sorts of features of human evolutionary history that resulted in us caring (at least upon reflection) about a much more diverse range of minds than “my family”, “my coalitional allies”, or even “minds I could potentially trade with” or “minds that share roughly the same values and faculties as me”.
Humans today don’t treat a family member the same as a stranger, or a sufficiently-early-development human the same as a cephalopod; but our circle of concern is certainly vastly wider than it could have been, and it has widened further as we’ve grown in power and knowledge.
My tentative median guess is that there are a lot of aliens out there who would be grudging trade partners (who would kill us if we were weaker), and also a smaller fraction who are friendly.
I don’t expect significant violent conflict (or refusal-to-trade) between spacefaring aliens and humans-plus-aligned-AGI, regardless of their terminal values, since I expect both groups to be at the same technology level (“maximal”) when they meet. At that level, I don’t expect there to be a cheap way to destroy rival multi-galaxy civilizations, and I strongly expect civilizations to get more of what they want via negotiation and trade than via a protracted war.[10]
I also don’t think humans ought to treat aliens like enemies just because they have very weird goals. And, extrapolating from humanity’s widening circle of concern and increased soft-heartedness over the historical period — and observing that this trend is caused by humans recognizing and nurturing seeds of virtue that they had within themselves already — I don’t expect our descendants in the distant future to behave cruelly toward aliens, even if the aliens are too weak to fight back.[11]
I also feel this way even if the aliens don’t reciprocate!
Like, one thing that is totally allowed to happen is that we meet the ant-people, and the ant-people don’t care about us (and wouldn’t feel remorse about killing us, a la the buggers in Ender’s Game). So they trade with us because they’re not able to kill us, and the humans are like “isn’t it lovely that there’s diversity of values and species! we love our ant-friends” while the aliens are like “I would murder you and lay eggs in your corpse given the slightest opening, and am refraining only because you’re well-defended by force-backed treaty”, and the humans are like “oh haha you cheeky ants” and make webcomics and cartoons featuring cute anthropomorphized ant-people discovering the real meaning of love and friendship and living in peace and harmony with their non-ant-person brothers and sisters.
To which the ant-person response is, of course, “You appear to be imagining empathic levers in my mind that did not receive selection pressure in my EEA. How I long to murder you and lay eggs in your corpse!”
To which my counter-response is, of course: “Oh, you cheeky ants!”
(Respectfully. I don’t mean to belittle them, but I can’t help but be charmed to some degree.)
Like, reciprocity helps, but my empathy and goodwill for others is not contingent upon reciprocation. We can realize the gains from peace, trade, and other positive-sum interactions without being best buddies; and we can like the ants even if the ants don’t like us back.
Cosmopolitan values are good even if they aren't reciprocated. This is one of the ways that you can tell that cosmopolitan values are part of us, rather than being universal: We'd still want to be fair and kind to the ant-folk, even if they were wanting to lay eggs in our corpse and were refraining only because of force-backed treaty.
This is part of my response to protests like “why are you looking at everything from the perspective of human values?” Regard for all sentients, including aliens, isn't up for grabs, regardless of whether it's found only in us, or also in them.
How likely (and how good) are various outcomes on the paperclipper-to-brethren continuum?
Short answer: I’m wildly uncertain about how likely various points on this continuum are, and (outside of the most extreme good and bad outcomes) I’m very uncertain about their utility as well.
I expect an alien’s core goals to reflect pretty different shatterings of evolution’s “fitness” goal, compared to core human goals, and compared to other alien races’ goals. (See also the examples in “Niceness is unnatural.”)
I expect most aliens either…
Figuring out the utility of different points on this continuum (from an optimally reasonable and cosmopolitan perspective) seems like a wide-open philosophy and (xeno)psychology question. Ditto for figuring out the probability of different classes of outcomes.
Concretely: I expect that there's a big swath of aliens whose minds and preferences are about as weird and unrecognizable to us as the races in Three Worlds Collide — crystalline self-replicators, entities with no brain/genome segregation, etc. — and that turn out to fall somewhere between “explosive self-replicating process that paperclipped the universe and doesn’t have feelings/experiences/qualia” and “buddies”.
If we cross this question with “how likely are aliens to be conscious?”, we get a 2x2 of scenarios:
I think my point estimate is "a lot more aliens fall on the very-alien-brethren side than on the squiggle-maximizer side”. But I wouldn’t be surprised to learn I’m wrong about that.
My guess would be that the most common variety of alien is “unconscious brethren”, followed by “unconscious squiggle maximizer”, then “conscious brethren”, then “conscious squiggle maximizer”.
It might sound odd to call an unconscious entity “brother”, but it's plausible to me that on reflection, humanity strongly prefers universes with evolved-creatures doing evolved-creature-stuff (relative to an empty universe), even if none of those creatures are conscious.
Indeed, I consider it plausible that “a universe full of humans trading with a weird extraterrestrial race of crystal formations that don’t have feelings” could turn out to be more awesome than the universe where we never run into any true aliens, even though this means that humans control a smaller universe-shard. It's plausible to me that we'd turn out not to care all that much about our alien buddies having first-person “experiences”, if they still make fascinating conversation partners, have an amazing history and a wildly weird culture, have complex and interesting minds, etc. (The question of how much we care about whether aliens are in fact sentient, as opposed to merely sapient, seems open to me.)
And also, it’s not clear that “feelings” or “experiences” or “qualia” (or the nearest unconfused versions of those concepts) are pointing at the right line between moral patients and non-patients. These are nontrivial questions, and (needless to say) not the kinds of questions humans should rush to lock in an answer on today, when our understanding of morality and minds is still in its infancy.
How should we feel about encountering alien brethren?
Suppose that we judge that the ant-queen is more like a brother than a squiggle maximizer. As I noted above, I think that encountering alien brethren would be a good thing, even though this means that the descendants of humanity will end up controlling a smaller universe-shard. (And I’d guess that many and perhaps most spacefaring aliens are probably brethren-ish, rather than paperclipper-ish.)
This is not to say that I think human-CEV and alien-CEV are equally good (as humans use the word “good”). It's real hard to say what the ratios are between "human CEV", “unboosted humans”, "random alien CEV (absent any humans)", and "random misaligned AI", but my vague intuition is that there's a big factor drop at each of those steps; and I would guess that this still holds even if we filter out the alien paperclippers and alien unethical sadists.
But it is to say that I think we would be enriched by getting to meet minds that were not ourselves, and not of our own creation. Intuitively, that sounds like an awesome future. And I think this sense of visceral fascination and excitement, the “holy shit that’s cool!” reaction, tends to be an important (albeit fallible) indicator of “which outcomes will we end up favoring upon reflection?”.
It’s a clue to our values that we find this scenario so captivating in our fiction, and that our science fiction takes such a strong interest in the idea of understanding and empathizing with alien minds.
Much of the value of alien civilizations might well come from the interaction of their civilization and ours, and from the fairness (which may well turn out to be a major terminal human value) of them getting their just fraction of the universe.
And in most scenarios like “we meet alien space ants and become trading partners”, I’d guess that the space ants’ own universe-shard probably has more cosmopolitan value than a literally empty universe-shard of the same size. It’s cool, at least! Maybe the ant-queens are even able to experience it, and their experiences are cool; that would make me much more confident that indeed, their universe-shard is a lot better than an empty one. And maybe the ant-queens come pretty close to caring about their kids, in ways that faintly echo human values; who knows?
We should be friendly toward an alien race like that, I claim. But still, I’d expect the vast majority of the cosmopolitan value in a mixed world of humans+ants to come from the humans, and from the two groups’ interaction.
So, for example, my guess is that we shouldn’t be indifferent about whether a particular galaxy ends up in our universe-shard versus an alien neighbor’s shard. (Though this is another question where it seems good to investigate far more thoroughly before locking in a decision.)
And if our reachable universe-shard turns out to be 3x as large and resource-rich as theirs, we probably shouldn’t give them a third of our stars to make it fifty-fifty. I think that humanity values fairness a great deal, but not enough to outweigh the other cosmopolitan value that would be burnt (in the vast majority of cases) if we offered such a gift.[12]
Hold up, how is this “cosmopolitan”?
A reasonable objection to raise here is: “Hold on, how can it be ‘cosmopolitan’ to favor human values over the values of a random alien race? Isn’t the whole point of ‘cosmopolitan value’ that you’re not supposed to prioritize human-specific values over strange and beautiful alien perspectives?”
In short, my response is to emphasize that cosmopolitanism is a human value. If it’s also an alien value, then that’s excellent news; but it’s at least a value that is in us.
When we speak of “better” or “worse” outcomes, we (probably) mean “better/worse according to cosmopolitan values (that also give fair fractions to the human-originated styles of Fun in particular)”, at least if these intuitions about cosmopolitanism hold on reflection. (Which I strongly suspect they do.)
In more detail, my response is:
1. Cosmopolitanism isn’t “indifference” or “take an average of all possible utility functions”.
E.g., a good cosmopolitan should be happier to hear that a weird, friendly, diverse, sentient alien race is going to turn a galaxy into an amazing megacivilization, than to hear that a paperclipper is going to turn a galaxy into paperclips. Cosmopolitanism (of the sort that we should actually endorse) shouldn’t be totally indifferent to what actually happens with the universe.
It’s allowed to turn out that we find a whole swath of universe that is the moral equivalent of "destroyed by the Blight", which kinda looks vaguely like life if you squint, but clearly isn't sentient, and we're like "well let's preserve some Blight in museums, but also do a cleanup operation". That's just also a way that interaction with aliens can go; the space of possible minds (and things left in that mind's wake) is vast.
And if we do find the Blight, we shouldn’t lie to ourselves that blighted configurations of matter are just as good as any other possible configuration of matter.
It’s allowed to turn out that we find a race of ant-people (who want to kill us and lay eggs in our corpse, yadda yadda), and that the ant-people are getting ready to annihilate the small Fuzzies that haven’t yet reached technological maturity, on a planet that’s inside the ant-people’s universe-shard.
Where, obviously, you trade rather than war for the rights of the Fuzzies, since war is transparently an inefficient way to resolve conflicts.
But the one thing you don’t do is throw away some of your compassion for the Fuzzies in order to “compromise” with the ant-people’s lack-of-compassion.
The right way to do cosmopolitanism is to care about the Fuzzies’ welfare along with the ant-people’s welfare — regardless of whether the Fuzzies or ant-people reciprocate, and regardless of how they feel about each other — and to step up to protect victims from their aggressors.
There’s a point here that the cosmopolitan value is in us, even though it’s (in some sense) not just about us.
These values are not necessarily in others, no matter how much we insist that our values aren’t human-centric, aren’t speciesist, etc. And because they’re in us, we’re willing to uphold them even when we aren’t reciprocated or thanked.
It’s those values that I have in mind when I say that outcomes are “better” or “worse”. Indeed, I don’t know what other standard I could appeal to, if not values that bear some connection to the contents of our own brains.
But, again, the fact that the values are in us, doesn’t mean that they’re speciesist. A human can genuinely prefer non-speciesism, for the same reason a citizen of a nation can genuinely prefer non-nationalism. Looking at the universe through a lens that is in humans does not mean looking at the universe while caring only about humans. The point is that we'll keep on caring about others, even if we turn out to be alone in that.
2. Cosmopolitan value is fragile, for the same reason unenlightened present-day human values are fragile.
See “Complex Value Systems Are Required to Realize Valuable Futures” and the Arbital article on cosmopolitan value.
There are many ways to lose an enormous portion of the future’s cosmopolitan value, because the simple-sounding phrase “cosmopolitan value” translates into a very complex logical object (making many separate demands of the future) once we start trying to pin it down with any formal precision.
Our prior shouldn’t be that a random intelligent species would happen to have a utility function pointing at exactly the right high-complexity object. So it should be no surprise if a large portion of the future’s value is lost in switching between different alien species’ CEVs, e.g., because half of the powerful aliens are the Blight and another half are the ant-queens, and both of them are steamrolling the Fuzzies before the Fuzzies can come into their own. (That's a way the universe could be, for all that we protest that cosmopolitanism is not human-centric.)
And even if the aliens turn out to have some respect for something roughly like cosmopolitan values, that doesn't mean that they'll get as close as they could if they had human buddies (who have another five hundred million years of moral progress under our belts) in the mix.
3. There is no radically objective View-From-Nowhere utility function, no value system written in the stars.
(... And if there were, the mere fact that it exists in the heavens would not be a reason for human CEV to favor it. Unless there’s some weird component of human CEV that says something like “if you encounter a pile of sand on a planet somewhere that happens to spell out a utility function in morse code, you terminally value switching to some compromise between your current utility function and that utility function”. … Which does not seem likely.)
If our values are written anywhere, they’re written in our brain states (or in some function of our brain states).
And this holds for relatively enlightened, cosmopolitan, compassionate, just, egalitarian, etc. values in exactly the same way that it holds for flawed present-day human values.
In the long run, we should surely improve on our brains dramatically, or even replace ourselves with an entirely new sort of mind (or a wondrously strange intergalactic patchwork of different sorts of minds).
But we shouldn’t be indifferent about which sorts of minds we become or create. And the answer to “which sorts of minds/values should we bring into being?” is some (complicated, not-at-all-trivial-to-identify) function of our current brain. (What else could it be?)
Or, to put it another way: the very idea that our present-day human values are “flawed” has to mean that they’re flawed relative to some value function that’s somehow pointed at by the human brain.
There’s nothing wrong (or even particularly strange) about a situation like “Humans have deeper, stronger (‘cosmopolitan’) values that override other human values like ‘xenophobia’”.
Mostly, we’re just not used to thinking in those terms because we’re used to navigating human social environments, where an enormous number of implicit shared values and meta-values can be taken for granted to some degree. It takes some additional care and precision to bring genuinely alien values into the conversation, and to notice when we’re projecting our own values. (Onto other species, or onto the Universe.)
If a value (or meta-value or meta-meta-value or whatever) can move us to action, then it must be in some sense a human value. We can hope to encounter aliens who share our values to some degree; but this doesn’t imply that we ought (in the name of cosmopolitanism, or any other value) to be indifferent to what values any alien brethren possess. We should probably assist the Fuzzies in staving off the Blight, on cosmopolitan grounds. And given value fragility (and the size of the cosmic endowment), we should expect the cosmopolitan-utility difference between totally independent evolved value systems to be enormous.
This, again, is no reason to be any less compassionate, fair-minded, or tolerant. But also, compassion and fair-mindedness and tolerance don’t imply indifference over utility functions either!
3. The superintelligent AI we’re likely to build by default << Aliens
In the case of aliens, we might imagine encountering them hundreds of millions or billions of years in the future — plenty of time to anticipate and plan for a potential encounter.
In the case of AI, the issue is much more pressing. We have the potential to build superintelligent AI systems very soon; and I expect far worse outcomes from misaligned AI optimizing a universe-shard than from a random alien doing the same (even though there’s obviously nothing inherently worse about silicon minds than about biological minds, alien crystalline minds, etc.).
For examples of why the first AGIs are likely to immediately blow human intelligence out of the water, see AlphaGo Zero and the Foom Debate and Sources of advantage for digital intelligence. For a discussion of why alignment seems hard, and why such systems are likely to kill us if we fail to align them, see So Far and AGI Ruin.
The basic reason why I expect AI systems to produce worse outcomes than aliens is that other evolved creatures are more likely to have overlap with us, by dint of their values being forged by more similar processes. And some of the particular ways in which misaligned AI is likely to differ from an evolved species suggest a much more homogeneous and simple future. (Like “a universe tiled with molecular squiggles”.)[13]
The classic example of AGI ruin is the "paperclip maximizer" (which should probably be called a "molecular squiggle maximizer" instead):
This example is obviously comically conjunctive; the point is in no way "we have a crystal ball, and can predict that things will go down in this ridiculously-specific way". Rather, the point is to highlight ways in which the development process of misaligned superintelligent AI is very unlike the typical process by which biological organisms evolve.
Some relatively important differences between intelligences built by evolution-ish processes and ones built by stochastic-gradient-descent-ish processes:
Evolution tends to build patterns that hang around and proliferate, whereas AGIs are likely to come from an optimization target that's more directly like "be good at these games that we chose with the hope that being good at them requires intelligence", and the shatterings of the latter are less likely to overlap with our values.[14]
To be clear, “I trained my AGI in a big pen of other AGIs and rewarded it for proliferating” still results in AGIs that kill you. Most ways of trying won't replicate the relevant properties of evolution. And many aliens would murder Earth in its cradle if they could too. And even if your goal were just “get killed by an AGI that produces a future as good as the average alien’s CEV”, I would expect the "reward AGI for proliferating" approach to result in almost-zero progress toward that goal, because there’s a huge architectural gap between AI and biology, and (in expectation) another huge gap in the various ways that you built the pen wrong.[15]
You've really got to have a lot of things line up favorably in order to get niceness into your AGI system; and evolution is much more likely to spit that out than AGI training is. So some aliens will be nice (even though we didn’t build them) to a far greater degree than AGIs will be nice (if we don’t figure out alignment).
I would also predict that aliens have a much higher rate of somebody-is-home (sentience, consciousness, etc.), because of the contingencies of evolutionary history that I think resulted in human consciousness. I have wide error bars on how common these contingencies are across evolved species, but a much lower probability that the contingencies also arise when you’re trying to make the thing smart rather than good-at-proliferating.
The mechanisms behind qualia seem to me to involve at least one epistemically-derpy shortcut — the sort of thing that’s plausibly rare among aliens, and very likely rare among misaligned AI systems.
If we get lucky on consciousness being a super common hiccup, I could see more worlds where misaligned AI produces good outcomes. My current probability is something like 90% that if you produced hundreds of random uncorrelated superintelligent AI systems, <1% of them would be conscious.[16]
The most important takeaway from this post, I’d claim, is: If humanity creates superintelligences without understanding much about how our creations reason, then our creations will kill literally everyone and do something boring with the universe instead.
I'm not saying "it will take joy in things that I don't recognize; but I want the future to have my values rather than the values of my child, like many a jealous parent before me." I'm saying that, by default, you get a wasteland of molecular squiggles.
We basically have to go for superintelligence at some point, given the overwhelming amount of value that we can expect to lose if we rely on crappy human brains to optimize the future. But we also have to achieve this transition to AGI in the right way, on pain of wiping out ~everything.
Right now it looks to me like the world is rushing headlong down the "wipe out ~everything" branch, for lack of having even put a nontrivial amount of serious thought into the question of how to shape good outcomes via highly capable AI.
And so I try to redirect that path, or protest against the most misdirected attempts to address the problem.
I note that we have no plan, we have no science of differentially selecting AGI systems that produce good outcomes, and a reasonable planet would not race off a cliff before thinking about the implications.
And when I do that, I worry that it's easy to misread me as being anti-superintelligence, and anti-singularity. So I’ve written this post in part for the benefit of the rare reader who doesn’t already know this: I'm pro-singularity.
I consider myself a transhumanist. I think the highest calling of humanity today is to bring about a glorious future for a wondrously strange universe of posthuman minds.
And I'd really appreciate it if we didn't kill literally everyone and turn the universe into an empty wasteland before then.
And my concept of “what makes life worth living” is very likely an impoverished one today, and a friendly superintelligence could guide us to discovering even cooler versions of things like “art” and “adventure”, transcending the visions of fun that humanity has considered to date. The limit of how good the universe could become, once humanity has matured and grown into its full potential, likely far surpasses what any human today can concretely imagine.
I’ll flag that I do think that some people overestimate how “unimaginable” the future is likely to be, out of some sense of humility/modesty.
I think there's a decent chance that if you showed me the future I'd be like “ah, so that's what computronium looks like” or “so reversible computers wrapped around black holes did turn out to be best”, and that when you show me the experiences running on those computers, I'm like "neato, yeah, lots of minds having fun, I'm sure some of that stuff would look pretty fun to me if you decoded it". I wouldn’t expect to immediately understand everything going on, but I wouldn’t be surprised if I can piece together the broad strokes.
In that sense, I find it plausible that ~optimal futures will turn out to be familiar/recognizable/imaginable to a digital-era transhumanist in a way they wouldn't be to an ancient Roman. We really are better able to see the whole universe and its trajectory than they were.
To be clear, it's very plausible to me that it'll somehow be unrecognizable or shocking to me, as it would have been to an ancient Roman, at least on some axes. But it's not guaranteed, and we don't have to pretend that it's guaranteed in order to avoid insinuating that we're in a better epistemic position than people were in the past. We are in a better epistemic position than people were in the past!
There's a separate point about how much translation work you need to do before I recognize a particular arc of fun unfolding before me as something actually fun. On that point I’m like, "Yeah, I'm not going to recognize/understand my niece's generation's memes, never mind a posthuman’s varieties of happiness, without a lot more context (and plausibly a much bigger and deeply-changed mind)".
Separately, I don't want to make any claims about how hard and fast humanity becomes "strongly transhuman" / changes to using minds that would be unrecognizable (as humans) to the present. I'd be surprised if it were super-fast for everyone, and I'd be surprised if some humans' minds weren’t very different a thousand sidereal years post-singularity. But I have wide error bars.
Provided that this turns out to be a good use of stellar resources. (I'm not confident one way or the other. E.g., I'm not confident that human-originated minds get relevantly more interesting/fun at Matrioshka-brain scales. Maybe we’ll learn that slapping on more matter at that scale lets you prove some more theorems or whatever, but isn’t the best way to convert negentropy into fun, compared to e.g. spending that compute on whole civilizations full of interacting and flourishing people who don't have star-sized brains.)
A separate reason it’s a terrible idea to destroy ourselves is that, e.g., if the nearest aliens are 500 million years away then our death means that a ~500 million lightyear radius sphere of stellar fuel is going to be entirely wasted, instead of spent on rad stuff.
As I’ll note later, this odds ratio is a result of giving 0.2x weight to “humans control the universe-shard”, 0.5x to “aliens control it”, and 0.3x to “unfriendly AI built by aliens controls it”. Rob rounded the resulting odds ratio in this table to 1 : 5 : 7 : 5 : 14 : 1 : ~0.
Also, as a general reminder: I’m giving my relatively off-the-cuff thoughts in this post, recognizing that I’ll probably recognize some of my numbers as inconsistent — or otherwise mistaken — if I reflect more. But absent more reflection, I don’t know which direction the inconsistencies would shake out.
I’d have some inclination to go lower, but for the one evolved species we've seen seeming dead-set on destroying itself.
Though another input to the value of the future, in this scenario, is “What happens to the places that the pilgrims had to leave behind until some pilgrim group hit upon a non-terrible organizational system?” Hopefully it’s not too terrible, but it’s hard to say with humans!
One note of optimism is that there’s likely to be a strong negative correlation (in this ~impossible hypothetical) between “how terrible is the civilization?” and “how interested is it in spreading to the stars, or spreading far?” Many ways of shutting down moral progress, robust civic debate, open exploration of ideas, etc. also cripple scientific and technological progress in various ways, or involve commitment to a backwards-looking ideology. It’s possible for the universe-shard to be colonized by Space Amish, but it’s a weirder hypothetical.
Note that I’ll use phrasings like “there’s something it’s like to be them”, “they’re sentient”, and “they’re conscious” interchangeably in this post. (This is not intended to be a bold philosophical stance, but rather a flailing attempt to wave at properties of personhood that seem plausibly morally relevant.)
Eliezer uses the term “outcome pump” to introduce a similar idea:
I think his example is underspecified, though. Suppose that you ask the outcome pump for paperclips, and physics says “sorry, this outcome is too improbable”, so the pump exhibits a mechanical failure instead. This would then mean that it’s true that the outcome pump outputting paperclips is “improbable”, which makes the hypothetical consistent. We need some way to resolve which internally-consistent set of physical laws compatible with this description (“make paperclips” or “don’t make paperclips”) actually occurs; the so-called "outcome pump" is not necessarily pumping the desired outcome.
Giving the time machine the ability to output a random sequence of actions addresses this problem: we can say that the machine only undergoes a mechanical failure if some large number (e.g., Graham’s number) of random action sequences all fail to produce the target outcome. We can then be confident that the outcome pump will eventually brute-force a solution, provided that one is physically possible.
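The patched-up outcome pump amounts to rejection sampling over action sequences. Here's a toy sketch of that mechanism (all names and the retry budget are illustrative, not from the original post; the real hypothetical uses something like Graham's number of tries, not ten thousand):

```python
import random

def outcome_pump(predicate, random_actions, max_tries):
    """Toy model of the patched outcome pump: re-roll a random action
    sequence until `predicate` holds, or give up after `max_tries`
    attempts (modeling the machine's "mechanical failure")."""
    for _ in range(max_tries):
        actions = random_actions()
        if predicate(actions):
            return actions
    return None  # mechanical failure: no sequence produced the outcome

# Toy usage: "pump" for a sequence of five digits summing to more than 25.
result = outcome_pump(
    predicate=lambda xs: sum(xs) > 25,
    random_actions=lambda: [random.randint(0, 9) for _ in range(5)],
    max_tries=10_000,
)
```

With an astronomically large retry budget, the pump fails only on physically impossible targets, which resolves the ambiguity: the pump really does brute-force any achievable outcome.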
Other examples of easily-understood non-conscious optimization processes that can achieve very impressive things include AIXI and natural selection. The AIXI example is made pedagogically complicated for present purposes, however, by the fact that AIXI’s hypothesis space contains many smaller conscious optimizers (that don't much matter to the point, but that might confuse those who can see that some hypotheses contain conscious reasoners and can't see their irrelevance to the point at hand); and the natural selection example is weakened by the fact that selection isn't a very powerful optimizer.
A possible objection here is “Human emotional responses often cause us to get into violent conflicts in cases where this foreseeably isn’t worth it; why couldn’t aliens be the same?”. But “technology for widening the space of profitable trades” is in the end just another technology, and ambitious spacefaring species are likely to discover such tech for the same reason they’re likely to discover other tech that’s generally useful for getting more of what you want. Humans have certainly gotten better at this over time, and if we continue to advance our scientific understanding, we’re likely to get far better still.
Like, we've seen that the seeds are there, and it would be pretty weird for us to go around uprooting seeds of value on a whim.
As a side-note: one of my hot takes about how morality shakes out is "we don't sacrifice anything (among the seeds of value)". Like, values like sadism and spite might be tricky to redeem, but if we do our job right I think we should end up finding a way to redeem them.
Unless we’ve made some bargain across counterfactual worlds that justifies our offering this gift in our world. But there are friction costs to bargains, and my guess is that the way it pans out is that you keep what you can get in your branch and it evens out across branches.
As a side-note, another possible implication of my view on “alien brethren” is: in the much less likely event that we meet weak young non-spacefaring aliens, the future might go drastically better if we help guide their development as a species, teaching them about the Magic of Friendship and all that.
(Or perhaps not. I remain very uncertain about whether it’s positive-human-EV to guide alien development.)
Though some aliens may shake out to be simple too! Humans are pretty far from "tile the universe with vats of genes", but it's not clear how contingent that fact is.
Though it should be emphasized that we're totally allowed to find that evolved life tends to go some completely different way than how humans shook out. Generalizing from one example is hard!!
And even if you succeeded, it’s not clear that you’d get any utility as a result; my guess that evolved aliens tend to be better than paperclippers can just be wrong, easily.
And even if you got some utility, it’s going to be a paltry amount compared to if you’d built aligned AGI.
Possibly this is too extreme; I haven’t refined these probabilities much, and am still just giving my off-the-cuff numbers.
In any case, I want to emphasize that my view isn’t “most misaligned AGIs aren’t sentient, but if you randomly spin up a large number of them you’ll occasionally get a sentient one”. Rather, my view is “almost no random misaligned AGIs are sentient” (but with some uncertainty about whether that’s true). I’m much more uncertain about whether this background view is true than I am about whether, given this background view, a given misaligned AGI will happen to be sentient.
(Like how I think the chance that the lightspeed limit turns out to be violable is greater than 1 in a billion; but that doesn't mean that if you threw a billion baseballs, I would expect one of them to break the lightspeed limit on average.)
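The distinction in this footnote — uncertainty about a background view versus per-draw chance under that view — can be made concrete with made-up numbers (every number below is assumed purely for illustration):

```python
# Illustrative (assumed) credences, not figures from the post:
p_view_true = 0.90         # credence that "almost no misaligned AGIs are sentient"
p_sentient_if_true = 1e-6  # per-AGI sentience chance if that view holds
p_sentient_if_false = 0.5  # per-AGI sentience chance if the view is wrong

# Marginal chance that one random misaligned AGI is sentient:
p_one = (p_view_true * p_sentient_if_true
         + (1 - p_view_true) * p_sentient_if_false)  # ~0.05
```

The point of the baseball analogy is that the draws are correlated through the shared background view: spinning up a billion AGIs mostly re-samples the same world, so the ~5% marginal chance doesn't mean ~5% of them come out sentient — either nearly none do (the likely world) or a great many do (the unlikely one).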