When people think about street-fights and what they should do when they find themselves in the unfortunate position of being in one, they tend to stumble across a pretty concerning thought relatively early on: "What if my attacker has a knife?" Then they will put loads of cognitive effort into strategies for dealing with attackers wielding blades. At first glance this makes sense. Knives aren't that uncommon and they are very scary, so it feels pretty dignified to have prepared for such scenarios (I apologize if this anecdote is horribly unrelatable to Statesians). The issue is that, all in all, knife-related injuries from brawls or random attacks aren't that common in most settings. Weapons of opportunity (a rock, a brick, a bottle, some piece of metal, anything you can pick up in the moment) are much more common. They are less scary, but everyone has access to them, and I've met few people without experience who come up with plans for defending against those before they start thinking about knives. It's not the really scary thing that kills you. It's the minimum viable thing.

When people consider poisons, they tend to think of the flashy, potent ones. Cyanide, strychnine, tetrodotoxin. Anything sufficiently scary, with LD50s in the low milligrams. The ones that are difficult to defend against and known first and foremost for their toxicity. On first pass this seems reasonable, but the fact that they are scary and hard to defend against means that it is very rare to encounter them. It is staggeringly more likely that you will suffer poisoning from acetaminophen or the like. OTC medications, cleaning products, batteries, pesticides, supplements. Poisons which are weak enough to be common. It's not the really scary thing that kills you. It's the minimum viable thing.

My impression is that people in AI safety circles follow a similar pattern of directing most of their attention at the very competent, very scary parts of risk-space, rather than the large parts. Unless I am missing something, it feels pretty clear that the majority of doom-worlds are ones in which we die stupidly. Not by the deft hands of some superintelligent optimizer tiling the universe with its will, but the clumsy ones of a process that is powerful enough to kill a significant chunk of humanity but not smart enough to do anything impressive after that point. Not a schemer but an unstable idiot placed a little too close to a very spooky button by other unstable idiots.

Killing enough of humanity that the rest will die soon after isn't that hard. We are very, very fragile. Of course the sorts of scenarios which kill everyone immediately are less likely in worlds where there isn't competent, directed effort, but the post-apocalypse is a dangerous place, and the odds that the people equipped to rebuild civilisation will be among the survivors, find themselves around the means to do so, make a few more lucky rolls on location and keep that spark going down a number of generations are low. Nowhere near zero, but low. Even in the bits of branch-space where bouncing back is technically possible, lots of timelines get shredded. You don't need a lot of general intelligence to design a bio-weapon or cause the leak of one. With militaries increasingly happy to hand weapons to black-boxes, you don't need to be very clever to start a nuclear incident. The meme that makes humanity destroy itself might also be relatively simple. In most worlds, before you get competent maximizers with the kind of goal-content integrity, embedded agency and all the rest needed to kill humanity deliberately, keep the lights on afterwards and have a plan for what to do next, you get a truly baffling number of flailing idiots next to powerful buttons, or things with some but not all of the relevant capabilities in place: competent within the current paradigm but with a world-model that breaks down in the anomalous environments it creates. Consider the humble rock.

Another way of motivating this intuition is great-filter flavoured. Not only do we not see particularly many alien civs whizzing around, we also don't see particularly many of the star-eating Super-Ints that might have killed them. AI as a great filter makes more sense if most of the failure modes are stupid – if the demon kills itself along with those who summoned it.

This is merely an argument for a recalibration of beliefs, not necessarily an argument that you should change anything about your policies. In fact, there are some compelling arguments for why the assumption that we're likely to die stupidly shouldn't change how you proceed in some relevant ways.

One of them is that the calculus doesn't work that way: 1/100 odds of an unaligned maximizer are significantly worse than 1/10 odds of a stupid apocalypse, because the stupid apocalypse only kills humanity. The competent maximizer kills the universe. This is an entirely fair point, but I'd like you to make sure that this is actually the calculus you're running, rather than a mere rationalization of pre-existing beliefs.

The second is that the calculus is irrelevant, because most people in AI-safety positions have much more sway on levers that lead to competent maximizers than they do on levers which lead to idiots trusting idiots with doomsday-tech. There is a Garrabrantian notion that most of your caring should be tangled up with outcomes that are significantly causally downstream from you, so while one of those risks is greater, you have a comparative advantage in minimizing the smaller one, and that outweighs the difference. This too might very well be true, and I'd merely ask you to check whether it's the real source of your beliefs or whether you are unduly worried about the scarier thing because it is scary. Because of a narrativistic way of thinking in which the story doesn't end in bathos. Where the threat is powerful. Where you don't just get hit over the head with a rock.

It might, in this specific case, be dignified to put all your effort into preparing for knife fights, but I think your calibration is off if you believe those are anything more than a small subset of the worlds in which we die. It's not the really scary thing that kills you. It's the minimum viable thing.

11 comments

I think the chances that something which doesn't immediately kill humanity, and isn't actively trying to kill humanity, polishes us off for good are pretty low, at the very least.

Humans have survived as hunter-gatherers for a million years. We've thrived in every possible climate under the sun. We're not just going to roll over and die because civilisation has collapsed.

Not that this is much of a comfort if 99% of humanity dies.

You can totally have something which is trying to kill humanity in this framework though. Imagine something in the style of ChaosGPT, locally agentic and competent enough to use state-of-the-art AI biotech tools to synthesize dangerous viruses or compounds to release into the atmosphere. (Note that in this example the critical part is the narrow-AI biotech tools, not the chaos agent.)

You don't need solutions to embedded agency, goal-content integrity & the like to build this. It is easier to build and is earlier in the tech-tree than crisp maximizers. It will not be stable enough to coherently take over the lightcone. Just coherent enough to fold some proteins and print them.

But why would anyone do such a stupid thing?

I think that for a long time the alignment community was betting on recursive self-improvement happening pretty early, in which case there wouldn't be a gap between AI being able to develop bioweapons and developing AGI.

Over time, people have updated towards thinking that the path towards AGI will be slower and more gradual, and I think people were hoping either that we could tank whatever harms happen in this period or that governments might deal with them.

Now that some of these harms are basically imminent, I think the community has updated towards being more worried about some of the less weird harms like misinformation or bioweapons. I agree with this, but I don't want to take it too far either, as I mostly just see this work as buying time for us to solve alignment.

I think part of the "calculus" being run by the AI safety folks is as follows:

  1. There are certainly both some dumb ways humanity could die (e.g., AI-enabled bioweapon terrorism that could have easily been prevented by some RLHF + basic checks at protein synthesis companies) and some very tricky, advanced ways (AI takeover by a superintelligence with a very subtle form of misalignment, using lots of brilliant deception, etc.)

  2. It seems like the dumber ways are generally more obvious / visible to other people (like military generals or the median voter), whereas these people are skeptical of the trickier paths (e.g., not taking the prospect of agentic, superintelligent AI seriously; figuring alignment will probably continue to be easy even as AI gets smarter; not believing that you could ever use AI to do useful AI research; etc.).

  3. The trickier paths also seem like ones we might need a longer head start on, think about more carefully, etc.

  4. Therefore, I (one of the rare believers in things like "deceptive misalignment is likely" or "superintelligence is possible") should work on the trickier paths; others (like the US military, or other government agencies, or whatever) will eventually recognize and patch the dumber paths.

re: your comments on the Fermi paradox -- if an alien super-civilization (or alien-killing AI) is expanding in all directions at close to the speed of light (which you might expect a superintelligence to do), then you mostly don't see them coming until it's nearly too late, since the civilization is expanding almost as fast as the light it emits. So it might look like the universe is empty, even if there are actually a couple of civilizations racing right towards you!

There is some interesting cosmological evidence that we are in fact living in a universe that will eventually be full of such civilizations; see the Robin Hanson idea of "Grabby Aliens": https://www.youtube.com/watch?v=l3whaviTqqg

"Close to the speed of light" has to be quite close to the speed of light for that argument to hold (at 0.8c about half of the volume in the light cone of an expanding civilization is outside of that civilization's expansion front).

It doesn't really need to be that fast, provided that the expansion front is deep. Seed probes that act as a nucleus for construction could be almost impossible to see, and the parent civilization might be very distant.

Even if the parent civilization did megaengineering of a galaxy (e.g. enclosing all the stars in Dyson swarms or outright disassembling them), we'd probably see that as a natural phenomenon. We can't tell what would otherwise have been there instead, and such large-scale changes probably do still take a long time to carry out even with advanced technology.

There are in fact a great many observations in astronomy where we don't really know what's happening. Obviously nobody is claiming "aliens did it", especially after the pulsar debacle last century. There are moderately plausible natural hypotheses. But if aliens were doing it, we probably couldn't conclusively say so.

Yes, it does have to be fast IMO, but I think fast expansion (at least among civilizations that decide to expand much at all) is very likely.

Of course the first few starships that a civilization sends to colonize the nearest stars will probably not be going anywhere near the speed of light.  (Unless it really is a paperclips-style superintelligence, perhaps.)  But within a million years or so, even with relatively slow-moving ships, you have colonized thousands of solar systems, built Dyson swarms around every star, have a total population in the bajillions, and have probably developed about all the technology that it is physically possible to develop.  So, at some point it's plausible that you start going very close to the speed of light, because you'll certainly have enough energy + technology to do so, and because it might be desirable for a variety of reasons:

- Maybe we are trying to maximize some maximizable utility function, be that paperclips or some more human notion, and want to minimize what Nick Bostrom calls "astronomical waste".
- Maybe we fail to coordinate (via a strong central government or etc), and the race to colonize the galaxy becomes a free-for-all, rewarding the fastest and most rapacious settlers, a la Robin Hanson's "Burning the cosmic commons".

Per your own comment -- if you only colonize at 0.8c so your ships can conserve energy, you are probably actually missing out on lots and lots of energy, since you will only be able to harvest resources from about half the volume that you could grab if you traveled at closer to lightspeed!


Maybe, but I have to also put this all in a somewhat different frame. Is the universe populated by birds or mice? Are the resources nice ground full of worms, or perhaps dangerous traps with the cheese we want?

So if we're birds and the universe's resources are worms, maybe it's a race. If we're all mice and the resources are those dangerous traps with cheese, well, the old saying applies: "The early bird might get the worm, but the second mouse gets the cheese." In a universe populated by mice & cheese, civilisation expansion may well be much slower and more measured.

Perhaps we can add one of the thoughts from the Three Body Problem series -- advertising your civilisation in the universe might be a sure way to kill yourself. This possibly fits with the Grabby Aliens thought, but would argue for a different type of expansion pattern, I would think.

That, and I'm not sure how the apparent solution to energy problems (apparently a civilization has no energy problem, so acceleration and deceleration costs don't really matter) impacts a desire for additional resources. And if the energy problem is not solved, then we need to know the cost curves for acceleration and deceleration to optimize speed in that resource search/grab.

What kinds of space resources are like "mice & cheese"?  I am picturing civilizations expanding to new star systems mostly for the matter and energy (turning asteroids & planets into a Dyson swarm of orbiting solar panels and supercomputers on which to run trillions of emulated minds, plus constructing new probes to send onwards to new star systems).

re: the Three Body Problem books -- I think the book series imagines that alien life is much, much more common (i.e., many civilizations per galaxy) than Robin Hanson imagines in his Grabby Aliens hypothesis, such that there are often new, not-yet-technologically-mature civilizations popping up near each other, around the same time as each other.  By contrast, an important part of the Grabby Aliens model is the idea that the evolution of complex life is actually spectacularly rare (which makes humans seem to have evolved extremely early relative to when you might expect, which is odd, but which is then explained by some anthropic reasoning related to the expanding grabby civilizations -- all new civilizations arise "early", because by the mid-game, everything has been colonized already).  If you think that the evolution of complex life on other planets is actually a very common occurrence, then there is no particular reason to put much weight on the Grabby Aliens hypothesis.

In The Three Body Problem, Earth would be wise to keep quiet so that the Trisolarans don't overhear our radio transmissions and try to come and take our nice temperate planet, with its nice regular pattern of seasons.  But there is nothing Earth could do about an oncoming "grabby" civilization -- the grabby civilization is already speeding towards Earth at near-lightspeed, and wants to colonize every solar system (inhabited and uninhabited, temperate planets with regular seasons or no, etc.), since it doesn't care about temperate continents, just raw matter that it can use to create Dyson swarms.  The grabby civilizations are already expanding as fast as possible in every direction, coming for every star -- so there is no point trying to "hide" from them.

Energy balance situation (rough numbers; see the sanity-check sketch after this list):
- the sun continually emits around 10^26 watts of light/heat/radiation/etc.
- per some relativity math at this forum comment, it takes around 10^18 joules to accelerate 1kg to 0.99c
- so, using just one second of the sun's energy emissions, you could afford to accelerate around 10^8 kg (about the mass of very large cargo ships, and of the RMS Titanic) to 0.99c.  Or if you spend 100 days' worth of solar energy instead of one second, you could accelerate about 10^15 kg, the mass of Mt. Everest, to 0.99c.
- of course then you have to slow down on the other end, which will take a lot of energy, so the final size of the von neumann probe that you can deliver to the target solar system will have to be much smaller than the Titanic or Mt Everest or whatever.
- if you go slower, at 0.8c, you can launch roughly 10x as much mass with the same energy (and you don't have to slow down as much on the other end, so maybe your final probe is 100x bigger), but of course you arrive more slowly -- if you're travelling 10 light years, you show up about 2.4 years later than the 0.99c probe.  If you're travelling 100 light years, you show up about 24 years later.
- which can colonize the solar system and build a Dyson swarm faster -- a tiny probe that arrives as soon as possible, or a 100x larger probe that arrives with a couple years' delay?  This is an open question that depends on how fast your von Neumann machine can construct solar panels, automated factories, etc.  Carl Shulman in a recent 80K podcast figures that a fully-automated economy pushing up against physical limits could double itself at least as quickly as once per year.  So maybe the 0.99c probe would do better over the 100 light-year distance (arriving about 24 years early gives time for roughly 24 doublings!), but not for the 10 light-year distance (the 0.99c probe would only have doubled itself about twice, to roughly 4x its initial mass, by the time the 0.8c probe shows up with 100x as much mass).
- IMO, if you are trying to rapaciously grab the universe as fast as possible (for the ultimate purpose of maximizing paperclips or whatever), probably you don't hop from nearby star to nearby star at efficient speeds like 0.8c, waiting to set up a whole new dyson sphere (which probably takes many years) at each stop.  Rather, your already-completed dyson swarms are kept busy launching new probes all the time, targeting ever-more-distant stars.  By the time a new dyson swarm gets finished, all the nearby stars have also been visited by probes, and are already constructing dyson swarms of their own.  So you have to fire your probes not at the nearest stars, but at stars some distance further away.  My intuition is that the optimal way to grab the most energy would end up favoring very fast expansion speeds, but I'm not sure.  (Maybe the edge of your cosmic empire expands at 0.99c, and then you "mop up" some interior stars at more efficient speeds?  But every second that you delay in capturing a star, that's a whopping 10^26 joules of energy lost!)
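A minimal back-of-envelope sketch of the arithmetic above (my own numbers: solar luminosity taken as ~3.8e26 W, deceleration at the far end and the mass of the launch hardware ignored, and the roughly one-doubling-per-year figure taken at face value):

```python
# Rough sanity check of the numbers in the list above (order of magnitude only).
# Assumed constants: solar luminosity ~3.8e26 W, c = 3e8 m/s.
L_SUN = 3.8e26   # W
C = 3.0e8        # m/s

def ke_per_kg(beta):
    """Relativistic kinetic energy (J) needed to bring 1 kg to beta * c."""
    gamma = 1.0 / (1.0 - beta ** 2) ** 0.5
    return (gamma - 1.0) * C ** 2

for beta in (0.8, 0.99):
    e = ke_per_kg(beta)
    print(f"{beta}c: ~{e:.1e} J/kg; one second of solar output launches ~{L_SUN / e:.1e} kg; "
          f"100 days launches ~{L_SUN * 100 * 86400 / e:.1e} kg")

# Arrival-time gap and the head start it buys (assuming ~1 doubling per year after arrival).
for d_ly in (10, 100):
    head_start = d_ly / 0.8 - d_ly / 0.99   # years, in the galaxy's rest frame
    print(f"{d_ly} ly: the 0.99c probe arrives ~{head_start:.1f} yr earlier, "
          f"time for ~{2 ** head_start:.1e}x growth vs the 0.8c probe's ~100x launch mass")
```

This reproduces the ~10^18 J/kg, ~10^8 kg per second of solar output, and ~10^15 kg per 100 days figures to within an order of magnitude, and gives head starts of about 2.4 and 24 years for the 10 and 100 light-year trips.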

Even at speeds as slow as 0.1c, after the first million years or so the galaxy is full and it's time to go intergalactic. A million years is nothing on the scale of the universe's age.

When sending probes millions of light years to other galaxies, the expense of 0.999c probes starts to look more worthwhile than that of 0.8c ones, saving hundreds of thousands of years. Chances are that it wouldn't just be one probe either, but billions of them, seeding each galaxy within plausible reach.
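A quick check of both claims (the ~100,000 light-year galactic disc and Andromeda at ~2.5 million light years are my example distances, not ones given above):

```python
# 1) Filling a galaxy at 0.1c: crossing a ~100,000 light-year disc.
print(f"galaxy crossing at 0.1c: ~{100_000 / 0.1:,.0f} years")

# 2) Intergalactic hop: time saved by a 0.999c probe over a 0.8c one
#    (example distance: Andromeda at ~2.5 million light years).
d_ly = 2.5e6
t_fast, t_slow = d_ly / 0.999, d_ly / 0.8
print(f"0.999c: ~{t_fast:,.0f} yr; 0.8c: ~{t_slow:,.0f} yr; saved ~{t_slow - t_fast:,.0f} yr")
```

That works out to about a million years to cross the galaxy at 0.1c, and roughly 600,000 years saved per intergalactic hop by going 0.999c instead of 0.8c.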

Though as with any discussion about these sorts of things, we have no idea what we don't know about what a civilization a million years old might achieve. Discussions of relativistic probes are probably even more laughably primitive than those of using swan's wings to fly to the abode of the Gods.