I think an AI takeover is reasonably likely to involve billions of deaths, but it's more like a 50% than a 99% chance. Moreover, I think this post is doing a bad job of explaining why the probability is more like 50% than 1%.
This whole thread (starting with Paul's comment) seems to me like an attempt to delve into the question of whether the AI cares about you at least a tiny bit. As explicitly noted in the OP, I don't have much interest in going deep into that discussion here.
The intent of the post is to present the very most basic arguments that if the AI is utterly indifferent to us, then it kills us. It seems to me that many people are stuck on this basic point.
Having bought this (as it seems to me like Paul has), one might then present various galaxy-brained reasons why the AI might care about us to some tiny degree despite total failure on the part of humanity to make the AI care about nice things on purpose. Example galaxy-brained reasons include "but what about weird decision theory" or "but what if aliens predictably wish to purchase our stored brainstates" or "but what about it caring a tiny degree by chance". These are precisely the sort of discussions I am not interested in getting into here, and that I attempted to ward off with the final section.
In my reply to Paul, I was (among other things) emphasizing various points of agreement. In my last bullet point in particular, I was emphasizing that, while I find these galaxy-brained retorts relatively implausible (see the list in the final section), I am not arguing for high confidence here. All of this seems to me orthogonal to the question of "if the AI is utterly indifferent, why does it kill us?".
Most people care a lot more about whether they and their loved ones (and their society/humanity) will in fact be killed than whether they will control the cosmic endowment. Eliezer has been going on podcasts saying that with near-certainty we will not see really superintelligent AGI because we will all be killed, and many people interpret your statements as saying that. And Paul's arguments do cut to the core of a lot of the appeals to humans keeping around other animals.
If it is false that we will almost certainly be killed (which I think is the case; I agree with Paul's comment approximately in full), and one believes that, then saying we will almost certainly be killed would be deceptive rhetoric that could scare people who care less about the cosmic endowment into worrying more about AI risk. Since you're saying you care much more about the cosmic endowment, and in practice this talk is shaped to have the effect of persuading people to do the thing you would prefer, it's quite important whether you believe the claim for good epistemic reasons. That matters for ruling out the hypothesis that this claim is being presented misleadingly, or was drifted into because of its rhetorical convenience without being vetted (where you would vet it if it were rhetorically inconvenient).
I think being right on this is important for the same sorts of reasons climate activists should not falsely say that failing to meet the latest emissions target on time will soon thereafter kill 100% of humans.
This thread continues to seem to me to be off-topic. My main takeaway so far is that the post was not clear enough about how it's answering the question "why does an AI that is indifferent to you, kill you?". In attempts to make this clearer, I have added the following to the beginning of the post:
This post is an answer to the question of why an AI that was truly indifferent to humanity (and sentient life more generally), would destroy all Earth-originated sentient life.
I acknowledge (for the third time, with some exasperation) that this point alone is not enough to carry the argument that we'll likely all die from AI, and that a key further piece of argument is that AI is not likely to care about us at all. I have tried to make it clear (in the post, and in comments above) that this post is not arguing that point, while giving pointers that curious people can use to get a sense of why I believe this. I have no interest in continuing that discussion here.
I don't buy your argument that my communication is misleading. Hopefully that disagreement is mostly cleared up by the above.
In case not, to clarify further: My reason for not thinking in great depth about this issue is that I ...
I assign that outcome low probability (and consider that disagreement to be off-topic here).
Thank you for the clarification. In that case my objections are on the object-level.
This post is an answer to the question of why an AI that was truly indifferent to humanity (and sentient life more generally), would destroy all Earth-originated sentient life.
This does exclude random small terminal valuations of things involving humans, but it leaves out the instrumental value for trade and science, and uncertainty about how other powerful beings might respond. I know you did an earlier post with your claims about trade for some human survival, but as Paul says above, it's a huge point for such small shares of resources. Given that kind of claim, much of Paul's comment still seems very on-topic (e.g. his bullet point ...
Insofar as you're arguing that I shouldn't say "and then humanity will die" when I mean something more like "and then humanity will be confined to the solar system, and shackled forever to a low tech level", I agree, and
Yes, close to this (although more like 'gets a small resource share' than necessarily confinement to the solar system or low tech level, both of ...
RE: decision theory w.r.t. how "other powerful beings" might respond: I really do think Nate has already argued this, and his arguments continue to seem more compelling to me than the opposition's. Relevant quotes include:
It’s possible that the paperclipper that kills us will decide to scan human brains and save the scans, just in case it runs into an advanced alien civilization later that wants to trade some paperclips for the scans. And there may well be friendly aliens out there who would agree to this trade, and then give us a little pocket of their universe-shard to live in, as we might do if we build an FAI and encounter an AI that wiped out its creator-species. But that's not us trading with the AI; that's us destroying all of the value in our universe-shard and getting ourselves killed in the process, and then banking on the competence and compassion of aliens.
[...]
...Remember that it still needs to get more of what it wants, somehow, on its own superintelligent expectations. Someone still needs to pay it. There aren’t enough simulators above us that care enough about us-in-particular to pay in paperclips. There are so many things to care about! Why us, rather than
If you condition on misaligned AI takeover, my current (extremely rough) probabilities are:
Edit: I now think mass death and extinction are notably less likely than these probabilities. Perhaps more like 40% on >50% of people killed and 20% on >99% of people killed.
By 'kill' here I'm not including things like 'the AI cryonically preserves everyone's brains and then revives people later'. I'm also not including cases where the AI lets everyone live a normal human lifespan but fails to grant immortality or continue human civilization beyond this point.
My beliefs here are due to a combination of causal/acausal trade arguments as well as some intuitions that it's likely that AIs will be slightly cooperative/nice for decision theory reasons (ECL mostly) or just moral reasons.
To be clear, it seems totally insane to depend on this or think that this makes the situation ok. Further, note that I think it's reasonably likely that there is a bloody and horrible conflict between AIs and humanity (it just seems unlikely that this conflict kills >99% of people...
None of this is particularly new; it feels to me like repeating obvious claims that have regularly been made [. . .] But I've been repeating them aloud a bunch recently
I think it's Good and Valuable to keep simplicity-iterating on fundamental points, such as this one, which nevertheless seem to be sticking points for people who are potential converts.
Asking people to Read the Sequences, with the goal of turning them into AI-doesn't-kill-us-all helpers, is not Winning given the apparent timescales.
I really hope this isn't a sticking point for people. I also strongly disagree with this being 'a fundamental point'.
Ryan is saying “AI takeover is obviously really bad and scary regardless of whether the AI is likely to literally kill everybody. I don’t see why someone’s sticking point for worrying about AI alignment would be the question of whether misaligned AIs would literally kill everyone after taking over.”
From observing recent posts and comments, I think this:
A sufficiently capable AI takes you apart instead of trading with you at the point that it can rearrange your atoms into an even better trading partner.
is where a lot of people get stuck.
To me, it feels very intuitive that there are levels of atom-rearranging capability that are pretty far above current-day human-level, and "atom rearranging," in the form of nanotech or biotech or advanced materials science, seems plausibly like the kind of domain in which AI systems could move through the human-level regime into superhuman territory pretty rapidly.
Others appear to have the opposite intuition: they find it implausible that this level of capabilities is attainable in practice, via any method. Even if such capabilities have not been conclusively ruled impossible by the laws of physics, they might be beyond the reach of even superintelligence. Personally, I am not convinced or reassured by these arguments, but I can see how others' intuitions might differ here.
When you write "the AI" throughout this essay, it seems like there is an implicit assumption that there is a singleton AI in charge of the world. Given that assumption, I agree with you. But if that assumption is wrong, then I would disagree with you. And I think the assumption is pretty unlikely.
No need to relitigate this core issue everywhere, just thought this might be useful to point out.
Here's a nice recent summary by Mitchell Porter, in a comment on Robin Hanson's recent article (can't directly link to the actual comment unfortunately):
...Robin considers many scenarios. But his bottom line is that, even as various transhuman and posthuman transformations occur, societies of intelligent beings will almost always outweigh individual intelligent beings in power; and so the best ways to reduce risks associated with new intelligences, are socially mediated methods like rule of law, the free market (in which one is free to compete, but also has incentive to cooperate), and the approval and disapproval of one's peers.
The contrasting philosophy, associated especially with Eliezer Yudkowsky, is what Robin describes with foom (rapid self-enhancement) and doom (superintelligence that cares nothing for simpler beings). In this philosophy, the advantages of AI over biological intelligence are so great, that the power differential really will favor the individual self-enhanced AI, over the whole of humanity. Therefore, the best way to reduce risks is through "alignment" of individual AIs - giving them human-friendly values by design, and also a disposition which will prefer
I too was talking about takeoff speeds. The website I linked to is takeoffspeeds.com.
I & the other LWers you criticize do not expect indefinite exponential growth based on exploiting a single resource; we are well aware that real-world growth follows sigmoidal curves. We are well aware of those constraints and considerations and are attempting to model them with things like the model underlying takeoffspeeds.com + various other arguments, scenario exercises, etc.
I agree that much of LW has moved past the foom argument and is solidly on Eliezer's side relative to Robin Hanson; Hanson's views seem increasingly silly as time goes on (though they seemed much more plausible a decade ago, before e.g. the rise of foundation models and the shortening of timelines to AGI). The debate is now more like Yud vs. Christiano/Cotra than Yud vs. Hanson. I don't think it's primarily because of selection effects, though I agree that selection effects do tilt the table towards foom here; sorry about that, & thanks for engaging. I don't think your downvotes are evidence for this, though; in fact the pattern of votes (lots of upvotes, but disagreement-downvotes) is evidence for the opposite.
I just skimmed Hanson's article and find I disagree with almost every paragraph. If you think there's a good chance you'll change your mind based on what I say, I'll take your word for it & invest time in giving a point-by-point rebuttal/reaction.
One of the unstated assumptions here is that an AGI has the power to kill us. I think it's at least feasible that the first AGI that tries to eradicate humanity will lack the capacity to eradicate humanity - and any discussion about what an omnipotent AGI would or would not do should be debated in a universe where a non-omnipotent AGI has already tried and failed to eradicate humanity.
I just want to express my surprise at the fact that it seems that the view that the default outcome from unaligned AGI is extinction is not as prevalent as I thought. I was under the impression that literally everyone dying was considered by far the most likely outcome, making up probably more than 90% of the space of outcomes from unaligned AGI. From comments on this post, this seems to not be the case.
I am now distinctly confused as to what is meant by "P(doom)". Is it the chance of unaligned AGI? Is it the chance of everyone dying? Is it the chance of just generally bad outcomes?
I think a motivation likely to form by default (in messy AI values vaguely inspired by training on human culture) is respect for boundaries of moral patients, with a wide scope of moral patienthood that covers things like humans and possibly animals. This motivation has nothing to do with caring about humans in particular. If humans weren't already present, such values wouldn't urge AIs to bring humans into existence. But they would urge to leave humans alone and avoid stepping on them, specifically because they are already present (even if humanity only g...
My main counterargument to such common "disassemble us for atoms" arguments is that they hinge on the idea that extremely efficient dry nanotechnology for this will ever be possible. Some problems, like the laws of thermodynamics, the speed of light, etc., simply cannot be solved by throwing more intelligence at them; they are likely to be "hard capped" by the basic principles of physical reality.
My completely uneducated guess is that the "supertech" that AI would supposedly use to wipe us out falls into one of three tiers:
Pipedreams (impossible, or at least unachie...
A humanity that just finished coughing up a superintelligence has the potential to cough up another superintelligence, if left unchecked. Humanity alone might not stand a chance against a superintelligence, but the next superintelligence humanity builds could in principle be a problem.
That's doubtful. A superintelligence is a much stronger, more capable builder of the next generation of superintelligences than humanity (that's the whole idea behind foom). So what the superintelligence needs to worry about in this sense is whether the next generations of...
How do you suppose the AGI is going to be able to wrap the sun in a Dyson sphere using only the resources available on Earth? Do you have evidence that there are enough resources on asteroids or nearby planets for their mining to be economically viable? At the current rate, mining an asteroid costs billions while their value is nothing. Even then, we don't know if they'll have enough of the exact kind of materials necessary to make a Dyson sphere around an object which has 12000x the surface area of Earth. You could have von Neumann replicators do the minin...
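The "12000x" figure in the comment above can be checked with a few lines of arithmetic. This is a minimal sketch assuming the IAU nominal solar radius (695,700 km) and a mean Earth radius of 6,371 km; it compares the Sun's own surface with Earth's, not a shell at some orbital distance, which would be far larger.

```python
import math

# Assumed reference radii in kilometres.
R_SUN_KM = 695_700.0   # IAU nominal solar radius
R_EARTH_KM = 6_371.0   # mean Earth radius


def sphere_area_km2(radius_km: float) -> float:
    """Surface area of a sphere: 4 * pi * r^2."""
    return 4.0 * math.pi * radius_km ** 2


ratio = sphere_area_km2(R_SUN_KM) / sphere_area_km2(R_EARTH_KM)
# Roughly 11,900, consistent with the comment's "12000x".
print(f"Sun/Earth surface-area ratio: {ratio:,.0f}")
```

Because both areas share the 4π factor, the ratio is just (R_sun / R_earth)², about 109².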
Last I checked, you can get about 10x as much energy from burning a square meter of biosphere as you can get by collecting a square meter of sunlight for a day.
Even if this is true, it's only because that square meter of biosphere has been accumulating solar energy over an extended period of time. Burning biofuel may help accelerate things in the short term, but it will always fall short of long-term sustainability. Of course, if humanity never makes it to the long-term, this is a moot point.
...Disassembling us for parts seems likely to be easier than buildin
Regarding the last point. Can you explain why existing language models, which seem to care more than a little about humans, aren't significant evidence against your view?
Current LLM behavior doesn't seem to me like much evidence that they care about humans per se.
I'd agree that they evidence some understanding of human values (but the argument is and has always been "the AI knows but doesn't care"; someone can probably dig up a reference to Yudkowsky arguing this as early as 2001).
I contest that the LLM's ability to predict how a caring human sounds is much evidence that the underlying cognition cares similarly (insofar as it cares at all).
And even if the underlying cognition did care about the sorts of things you can sometimes get an LLM to write as if it cares about, I'd still expect that to shake out into caring about a bunch of correlates of the stuff we care about, in a manner that comes apart under the extremes of optimization.
(Search terms to read more about these topics on LW, where they've been discussed in depth: "a thousand shards of desire", "value is fragile".)
The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.
Watching how image and now text generation are sweeping society, I think it's likely that the AI we invest in will resemble humanity more than you're giving it credit for. We seem to define "intelligence" in the AI sense as "humanoid behavior" when it comes down to it, and humanoid behavior seems inexorably intertwined with caring quite a lot about other individuals and species.
Of course, this isn't necessarily a good thing -- historically, when human societies have encountered intelligences that at the time were considered "lesser" and "not really people"...
You can make AI care about us with this one weird trick:
1. Train a separate agent action reasoning network. For LLM tech, this would mean training on completing interaction sentences (think "Alice pushed Bob. ___ fell due to ___"), with a tokenizer that generalizes agents (Alice and Bob) into generic {agent 1, agent n} and "self agent". Then we replace various Alices and Bobs in various action sentences with generic agent tokens, and train on guessing consequences or prerequisites of various actions from real situations that you can get from any text corpus.
2....
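Step 1 of the proposal above leaves the data preparation implicit. Here is a hypothetical sketch of only the agent-generalization part (no training is shown), assuming the agent names in each sentence are already known, and using made-up token spellings `<agent_1>`, `<agent_2>`, `<self>`; the function name `anonymize_agents` is likewise illustrative.

```python
import re
from typing import Dict, List, Optional


def anonymize_agents(text: str, agents: List[str], self_name: Optional[str] = None) -> str:
    """Replace named agents with generic tokens (hypothetical preprocessing step).

    `self_name`, if given, becomes <self>; every other agent becomes
    <agent_1>, <agent_2>, ... in the order it appears in `agents`.
    """
    mapping: Dict[str, str] = {}
    counter = 1
    for name in agents:
        if not name or name in mapping:
            continue  # skip empty and duplicate names
        if name == self_name:
            mapping[name] = "<self>"
        else:
            mapping[name] = f"<agent_{counter}>"
            counter += 1
    if not mapping:
        return text
    # One alternation, longest names first, with word boundaries so that
    # "Bob" does not match inside "Bobby"; a single pass never re-replaces tokens.
    alternation = "|".join(re.escape(n) for n in sorted(mapping, key=len, reverse=True))
    pattern = re.compile(rf"\b({alternation})\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)


print(anonymize_agents("Alice pushed Bob. ___ fell due to ___", ["Alice", "Bob"]))
```

Since the mapping depends only on the order of `agents`, calling the function on a completion such as "Bob fell due to Alice" with the same list yields `<agent_2> fell due to <agent_1>`, so prompt and target stay consistent.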
Two things.
I think being essentially homicidal and against nature is entirely a human construct. If I look at the animal kingdom, a lion does not needlessly go around killing everything it can in sight. Civilizations that were more in tune with the planet and nature than current civilizations never had the homicidal problems modern society has.
Why would AGI function any differently than any other being? Because it would not be 'a part of nature'? Why not? Almost 80% of the periodic table of elements is metal. The human body requires small amounts of several met...
But WHY would the AGI "want" anything at all unless humans gave it a goal (or goals)? If it's a complex LLM-predictor, what could it want besides calculating a prediction of its own predictions? Why, by default, would it want anything at all unless we assigned that as a goal and turned it into an agent? IF AGI got hell-bent on its own survival and on improving itself to maximize goal "X", even then it might value the informational formations of our atoms more than the energy it could gain from those atoms, depending on what "X" is. Same goes for other species: evolution ...
I think it's plausible the A.I. would reshape the world but not in a way that would kill us, at least not for a long time - and not because it cares about us a little, or because of acausal incentives, or because it won't be that powerful (though @paulfchristiano's story about this is somewhat likely and adds to mine more or less disjunctively).
If this seems impossible to you, perhaps you're imagining a gray goo scenario as the central outcome. But that is a very questionable assumption, and I think it is load bearing - if the A.G.I. does something m...
Status: Partially in response to We Don't Trade With Ants, partly in response to watching others try to make versions of this point that I didn't like. None of this is particularly new; it feels to me like repeating obvious claims that have regularly been made in comments elsewhere, and are probably found in multiple parts of the LessWrong sequences. But I've been repeating them aloud a bunch recently, and so might as well collect the points into a single post.
This post is an answer to the question of why an AI that was truly indifferent to humanity (and sentient life more generally), would destroy all Earth-originated sentient life.
Might the AGI let us live, not because it cares but because it has no particular reason to go out of its way to kill us?
As Eliezer Yudkowsky once said:
There's lots of energy in the biosphere! (That's why animals eat plants and animals for fuel.) By consuming it, you can do whatever else you were going to do better or faster.
(Last I checked, you can get about 10x as much energy from burning a square meter of biosphere as you can get by collecting a square meter of sunlight for a day. But I haven't done the calculation for years and years and am pulling that straight out of a cold cache. That energy boost could yield a speedup (in your thinking, or in your technological design, or in your intergalactic probes themselves), which translates into extra galaxies you manage to catch before they cross the cosmic event horizon!)
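The 10x figure is explicitly from memory, so here is a minimal back-of-envelope sketch for redoing it. Every input is an illustrative assumption, not a value from the post: about 7 kg of dry plant biomass per square metre of land, a heating value of about 18 MJ/kg for dry plant matter, and a day-and-night average of about 200 W/m² of sunlight reaching the surface.

```python
# Compare chemical energy stored in 1 m^2 of biosphere with the solar
# energy collected on 1 m^2 over one day. All inputs are assumptions.

SECONDS_PER_DAY = 86_400


def stored_biomass_energy_mj(dry_biomass_kg_per_m2: float, heating_value_mj_per_kg: float) -> float:
    """Energy released by burning the dry biomass on one square metre."""
    return dry_biomass_kg_per_m2 * heating_value_mj_per_kg


def daily_solar_energy_mj(mean_irradiance_w_per_m2: float) -> float:
    """Solar energy on one square metre over 24 hours (W = J/s; J -> MJ)."""
    return mean_irradiance_w_per_m2 * SECONDS_PER_DAY / 1e6


# Assumed inputs: ~7 kg/m^2 dry biomass (a rough land average),
# ~18 MJ/kg heating value, ~200 W/m^2 average surface irradiance.
biomass = stored_biomass_energy_mj(7.0, 18.0)
solar = daily_solar_energy_mj(200.0)
print(f"biomass: {biomass:.0f} MJ/m^2, sunlight: {solar:.1f} MJ/m^2/day, "
      f"ratio: {biomass / solar:.1f}x")
```

With these assumed inputs the ratio comes out at roughly 7x, the same order of magnitude as the quoted figure. The result is most sensitive to the biomass-density assumption, since a forest stores far more per square metre than grassland or desert.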
But there's so little energy here, compared to the rest of the universe. Why wouldn't it just leave us be, and go mine asteroids or something?
Well, for starters, there's quite a lot of energy in the sun, and if the biosphere isn't burned for fuel then it will freeze over when the AI wraps the sun in a Dyson sphere or otherwise rips it apart. It doesn't need to consume your personal biomass to kill you; consuming the sun works just fine.
And separately, note that if the AI is actually completely indifferent to humanity, the question is not "is there more energy in the biosphere or in the sun?", but rather "is there more energy available in the biosphere than it takes to access that energy?". The AI doesn't have to choose between harvesting the sun and harvesting the biosphere, it can just harvest both, and there are a lot of calories in the biosphere.
I still just think that it might decide to leave us be for some reason.
The answers above are sufficient to argue that the AI kills us (if the AI's goals are orthogonal to ours, and can be better achieved with more resources). But the answer is in fact overdetermined, because there's also the following reason.
A humanity that just finished coughing up a superintelligence has the potential to cough up another superintelligence, if left unchecked. Humanity alone might not stand a chance against a superintelligence, but the next superintelligence humanity builds could in principle be a problem. Disassembling us for parts seems likely to be easier than building all your infrastructure in a manner that's robust to whatever superintelligence humanity coughs up next. Better to nip that problem in the bud.[1]
But we don't kill all the cows.
Sure, but the horse population fell dramatically with the invention of the automobile.
One of the big reasons that humans haven't disassembled cows for spare parts is that we aren't yet skilled enough to reassemble those spare parts into something that is more useful to us than cows. We are trying to culture meat in labs, and when we do, the cow population might also fall off a cliff.
A sufficiently capable AI takes you apart instead of trading with you at the point that it can rearrange your atoms into an even better trading partner.[2] And humans are probably not the optimal trading partners.
But there's still a bunch of horses around! Because we like them!
Yep. The horses that are left around after they stopped being economically useful are around because some humans care about horses, and enjoy having them around.
If you can make the AI care about humans, and enjoy having them around (more than it enjoys having-around whatever plethora of puppets it could build by disassembling your body and rearranging the parts), then you're in the clear! That sort of AI won't kill you.
But getting the AI to care about you in that way is a big alignment problem. We should totally be aiming for it, but that's the sort of problem that we don't know how to solve yet, and that we don't seem on-track to solve (as far as I can tell).
Ok, maybe my objection is that I expect it to care about us at least a tiny bit, enough to leave us be.
This is a common intuition! I won't argue against it in depth here, but I'll leave a couple points in parting:
And disassembling us for spare parts sounds much easier than building pervasive monitoring that can successfully detect and shut down human attempts to build a competing superintelligence, even as the humans attempt to subvert those monitoring mechanisms. Why leave clever antagonists at your rear? ↩︎
Or a drone that doesn't even ask for payment, plus extra fuel for the space probes or whatever. Or actually before that, so that we don't create other AIs. But whatever. ↩︎