Special thanks to Kate Woolverton for comments and feedback.
There has been a lot of work and discussion surrounding the speed and continuity of AI takeoff scenarios. I do think these are important variables, but in my opinion they matter less than many of the other axes along which takeoff scenarios could differ.
In particular, one axis on which different takeoff scenarios can differ that I am particularly interested in is their homogeneity—that is, how similar are the different AIs that get deployed in that scenario likely to be? If there is only one AI, or many copies of the same AI, then you get a very homogenous takeoff, whereas if there are many different AIs trained via very different training regimes, then you get a heterogenous takeoff. Of particular importance is likely to be how homogenous the alignment of these systems is—that is, are deployed AI systems likely to all be equivalently aligned/misaligned, or some aligned and others misaligned? It's also worth noting that a homogenous takeoff doesn't necessarily imply anything about how fast, discontinuous, or unipolar the takeoff might be—for example, you can have a slow, continuous, multipolar, homogenous takeoff if many different human organizations are all using AIs and the development of those AIs is slow and continuous but the structure and alignment of all of them are basically the same (a scenario which in fact I think is quite plausible).
I expect a relatively homogenous takeoff, for the following reasons:
- I expect that the amount of compute necessary to train the first advanced AI system will vastly outpace the amount of compute necessary to run it, such that once you've trained an advanced AI system you will have the resources necessary to deploy many copies of it, and doing so will be much cheaper than training an entirely new system for each different application. Even in a CAIS-like scenario, I expect that most of what you'll be doing to create new services is fine-tuning existing ones rather than doing entirely new training runs.
- I expect training compute requirements to be high enough that, for most organizations, it will be far cheaper to simply buy/license/use a copy of the first advanced AI from the organization that built it than to train an entirely new competing system on their own.
- For those organizations that do choose to compete (because they're a state actor that's worried about the national security issues involved in using another state's AI, for example), I think it is highly likely that they will attempt to build competing systems in basically the exact same way as the first organization did, since the cost of a failed training run is likely to be very high and so the most risk-averse option is just to copy exactly what was already shown to work. Furthermore, even if an organization isn't trying to be risk averse, they're still likely to be building off of previous work in a similar way to the first organization such that the results are also likely to be fairly similar. More generally, I expect big organizations to take the path of least resistance, which is likely to be either buying or copying what already exists with only minimal changes.
- Once you start using your first advanced AI to help you build more advanced AI systems, if your first AI system is relatively competent at doing alignment work, then you should get a second system which has similar alignment properties to the first. Furthermore, to the extent that you're not using your first advanced AI to help you build your second, you're likely to still be using similar techniques, which will likely have similar alignment properties. This is especially true if you're using the first system as a base to build future ones (e.g. via fine-tuning). As a result, I think that homogeneity is highly likely to be preserved as AI systems are improved during the takeoff period.
- Eventually, you probably will start to get more risk-taking behavior as the barrier to entry gets low enough for building an equivalent to the first advanced AI and thus a larger set of actors become capable of doing so. By that point, however, I expect the state-of-the-art to be significantly beyond the first advanced AI such that any systems created by such smaller, lower-resourced, more risk-taking organizations won't be very capable relative to the other systems that already exist in that world—and thus likely won't pose an existential risk.
Once you accept homogenous takeoff, however, I think it has a bunch of far-reaching consequences, including:
- It's unlikely for there to exist both aligned and misaligned AI systems at the same time—either all of the different AIs will be aligned to approximately the same degree or they will all be misaligned to approximately the same degree. As a result, scenarios involving human coalitions with aligned AIs losing out to misaligned AI coalitions are relatively unlikely, which rules out some of the ways in which the strategy-stealing assumption might fail.
- Cooperation and coordination between different AIs are likely to be very easy, since the AIs are likely to be structurally very similar to each other, if not sharing basically all of the same weights. As a result, x-risk scenarios involving AI coordination failures or s-risk scenarios involving AI bargaining failures (at least those that don't involve acausal trade) are relatively unlikely.
- It's unlikely you'll get a warning shot for deceptive alignment, since if the first advanced AI system is deceptive and that deception is missed during training, then once it's deployed all the different deceptively aligned systems will likely be able to coordinate with each other relatively easily to defect simultaneously and ensure that their defection is unrecoverable (e.g. Paul's “cascading failures”).
- Homogeneity makes the alignment of the first advanced AI system absolutely critical (in a similar way to fast/discontinuous takeoff without the takeoff actually needing to be fast/discontinuous), since whether the first AI is aligned or not is highly likely to determine/be highly correlated with whether all future AIs built after that point are aligned as well. Thus, homogenous takeoff scenarios demand a focus on ensuring that the first advanced AI system is actually sufficiently aligned at the point when it's first built rather than relying on feedback mechanisms after the first advanced AI's development to correct issues.
Regardless, I'd very much like to see more discussion of the extent to which different people expect homogenous vs. heterogenous takeoff scenarios (similar to the existing discussion of slow vs. fast and continuous vs. discontinuous takeoffs), since in my opinion it's a very important axis on which takeoff scenarios can differ and one that I haven't seen much discussion of.
The different standards for what counts as a warning shot might be causing problems here -- if by warning shot you include minor ones like the boat race thing, then yeah I feel fairly confident that there'd be a discontinuity conditional on there being no warning shots. In case you are still curious, I've responded to everything you said below, using my more restrictive notion of warning shot (so, perhaps much of what I say below is obsolete).
Working backwards:
1. I mostly agree there are warning shots for deception in the case of humans. I think there are some human cases where there are no warning shots for deception. For example, suppose you are the captain of a ship and you suspect that your crew might mutiny. There probably won't be warning shots, because mutinous crewmembers will be smart enough to keep quiet about their treachery until they've built up enough strength (e.g. until morale is sufficiently low, until the captain is sufficiently disliked, until common knowledge has spread sufficiently much) to win. This is so even though there is no discontinuity in competence, or treacherousness, etc. What would you say about this case?
2. Yes, for purposes of this discussion I was assuming there are no warning shots and then arguing that there might nevertheless be no discontinuity. This is a reasonable approach, because what I was trying to do was justify my original claim, which was:
Which was my way of objecting to your claim here:
3.
I might actually agree with this, since I think discontinuities (at least in a loose, likely-to-happen sense) are reasonably likely. I also think it's plausible that in slow takeoff scenarios we'll get warning shots. (Indeed, the presence of warning shots is part of how I think we should define slow takeoff!) I chimed in just to say specifically that Evan's argument didn't depend on a discontinuity, at least as I interpreted it.
Hmmm. I thought I was giving you reasons when I said
and anyhow I'm happy to elaborate more if you like on some scenarios in which we get no warning shots despite no discontinuities.
In general though I feel like the burden of proof is on you here; if you were claiming that "If warning shots don't happen, it's definitely because of a discontinuity" then that's a strong claim that needs argument. If you are just claiming "If warning shots don't happen, it's probably because of a discontinuity" that's a weaker claim which I might actually agree with.
4. I like your arguments that AIs will be heterogenous. I think they are plausible. This is a different discussion, however, from the issue of whether homogeneity can lead to no-warning without the help of a discontinuity.
5. I do generally think slow implies continuous and I don't think that the world will be unipolar etc.