Big fan of this post. One thing worth highlighting IMO: The post assumes that governments will not react in time, so it's mostly up to the labs (and researchers who can influence the labs) to figure out how to make this go well.
TBC, I think it's a plausible and reasonable assumption to make. But I think this assumption ends up meaning that "the plan" excludes a lot of the work that could make the USG (a) more likely to get involved or (b) more likely to do good and useful things conditional on them deciding to get involved.
Here's an alternative frame: I would call the plan described in Marius's post something like the "short timelines plan assuming that governments do not get involved and assuming that technical tools (namely control/AI-automated AI R&D) are the only/main tools we can use to achieve good outcomes."
You could imagine an alternative plan described as something like the "short timelines plan assuming that technical tools in the current AGI development race/paradigm are not sufficient and governance tools (namely getting the USG to provide considerably more oversight into AGI development, curb race dynamics, make major improvements to security) are the only/main tools we can use to achieve good outcomes." This kind of plan would involve a very different focus.
Here are some examples of things that I think would be featured in a "government-focused" short timelines plan:
One possible counter is that under short timelines, the USG is super unlikely to get involved. Personally, I think we should have a lot of uncertainty RE how the USG will react. Examples of factors here: (a) a new Administration, (b) uncertainty over whether AI will produce real-world incidents, (c) uncertainty over how compelling demos will be, (d) ChatGPT being an illustrative example of a big increase in USG involvement that lots of folks didn't see coming, (e) examples of the USG suddenly becoming a lot more interested in a national security domain (e.g., 9/11 --> Patriot Act, the recent TikTok ban), and (f) Trump being generally harder to predict than most Presidents (e.g., more likely to form opinions for himself, less likely to trust establishment views in some cases).
(And just to be clear, this isn't really a critique of Marius's post. I think it's great for people to be thinking about what the "plan" should be if the USG doesn't react in time. Separately, I'd be excited for people to write more about what the short timelines "plan" should look like under different assumptions about USG involvement.)
At first glance, I don’t see how the point I raised is affected by the distinction between expert-level AIs vs earlier AIs.
In both cases, you could expect an important part of the story to be “what are the comparative strengths and weaknesses of this AI system.”
For example, suppose you have an AI system that dominates human experts at every single relevant domain of cognition. It still seems like there’s a big difference between “system that is 10% better at every relevant domain of cognition” and “system that is 300% better at domain X and only 10% better at domain Y.”
To make it less abstract, one might suspect that by the time we have AI that is 10% better than humans at "conceptual/serial" stuff, the same AI system is 1000% better at "speed/parallel" stuff. And this would have pretty big implications for what kind of AI R&D ends up happening (even if we restrict attention to systems that dominate experts in every relevant domain).
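Here's a minimal toy sketch of that arithmetic (mine, not anything from the post): treat AI R&D as a mix of serial/conceptual work and parallelizable/experimental work and apply an Amdahl's-law-style calculation. The 30% serial share and the capability multipliers are made-up assumptions for illustration.

```python
# Toy illustration (hypothetical numbers): how an uneven capability profile
# changes effective R&D speedup when work splits into a "conceptual/serial"
# portion and a "speed/parallel" portion. Amdahl's-law-style calculation.

def effective_speedup(serial_fraction: float, serial_mult: float, parallel_mult: float) -> float:
    """Overall speedup when the serial portion of the work is sped up by
    serial_mult and the remaining (parallelizable) portion by parallel_mult."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction / serial_mult + parallel_fraction / parallel_mult)

# Assume (arbitrarily) that 30% of AI R&D is serial/conceptual work.
uniform = effective_speedup(0.3, serial_mult=1.1, parallel_mult=1.1)   # "10% better at everything"
uneven  = effective_speedup(0.3, serial_mult=1.1, parallel_mult=11.0)  # "10% better serial, 1000% better parallel"

print(f"uniform profile: {uniform:.2f}x")  # ~1.10x
print(f"uneven profile:  {uneven:.2f}x")   # ~2.97x, capped by the serial portion
```

In this toy setup, even a huge advantage on the parallel portion only buys ~3x overall, because the serial portion becomes the bottleneck; which kinds of work sit in that serial bucket then matters a lot.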
Models that don’t even cause safety problems, and aren't even goal-directedly misaligned, but that fail to live up to their potential, thus failing to provide us with the benefits we were hoping to get when we trained them. For example, sycophantic myopic reward hacking models that can’t be made to do useful research.
Would this kind of model present any risk? Could a lab just say "oh darn, this thing isn't very useful– let's turn this off and develop a new model"?
Do you have any suggestions RE alternative (more precise) terms? Or do you think it's more of a situation where authors should use the existing terms but make sure to define them in the context of their own work? (e.g., "In this paper, when I use the term AGI, I am referring to a system that [insert description of the capabilities of the system].")
The point I make here is also likely obvious to many, but I wonder if the "X human equivalents" frame often implicitly assumes that GPT-N will be like having X humans. But if we expect AIs to have comparative advantages (and disadvantages), then this picture might miss some important factors.
The "human equivalents" frame seems most accurate in worlds where the capability profile of an AI looks pretty similar to the capability profile of humans. That is, getting GPT-6 to do AI R&D is basically "the same as" getting X humans to do AI R&D. It thinks in fairly similar ways and has fairly similar strengths/weaknesses.
The frame is less accurate in worlds where AI is really good at some things and really bad at other things. In this case, if you try to estimate the # of human equivalents that GPT-6 gets you, the result might be misleading or incomplete. A lot of fuzzier things will affect the picture.
The example I've seen discussed most is whether or not we expect certain kinds of R&D to be bottlenecked by "running lots of experiments" or "thinking deeply and having core conceptual insights." My impression is that one reason why some MIRI folks are pessimistic is that they expect capabilities research to be more easily automatable (AIs will be relatively good at running lots of ML experiments quickly, which helps capabilities more under their model) than alignment research (AIs will be relatively bad at thinking deeply or serially about certain topics, which is what you need for meaningful alignment progress under their model).
Perhaps more people should write about what kinds of tasks they expect GPT-X to be "relatively good at" or "relatively bad at". Or perhaps that's too hard to predict in advance. If so, it could still be good to write about how different "capability profiles" could allow certain kinds of tasks to be automated more quickly than others.
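To make the contrast concrete, here's a toy sketch (hypothetical numbers, not from the post or anyone's actual model) of tracking human equivalents per task type instead of as a single number:

```python
# Hypothetical numbers only: a sketch of why a single "X human equivalents"
# figure can hide an uneven capability profile. Each task type gets its own
# human-equivalent count; usefulness for a given research agenda depends on
# which task type is the bottleneck, not on the average.

from statistics import mean

# Assumed human-equivalent counts by task type for some future model (made up).
human_equivalents = {
    "running ML experiments": 10_000,   # relatively easy to automate in this toy example
    "writing infrastructure code": 3_000,
    "conceptual/serial research": 50,   # relatively hard to automate in this toy example
}

print(f"average human equivalents: {mean(human_equivalents.values()):,.0f}")

# If an agenda (e.g., certain kinds of alignment research) is bottlenecked on
# conceptual/serial work, the relevant number is the minimum over the task
# types that agenda needs, not the average.
agenda_needs = ["conceptual/serial research", "running ML experiments"]
bottleneck = min(human_equivalents[task] for task in agenda_needs)
print(f"effective human equivalents for this agenda: {bottleneck:,}")
```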
(I do think that the "human equivalents" frame is easier to model and seems like an overall fine simplification for various analyses.)
I'm glad you're doing this, and I support many of the ideas already suggested. Some additional ideas:
More ambitiously, I feel like I don't really understand Anthropic's plan for how to manage race dynamics in worlds where alignment ends up being "hard enough to require a lot more than RSPs and voluntary commitments."
From a policy standpoint, several of the most interesting open questions seem to be along the lines of "under what circumstances should the USG get considerably more involved in overseeing certain kinds of AI development" and "conditional on the USG wanting to get way more involved, what are the best things for it to do?" It's plausible that Anthropic is limited in how much work it could do on these kinds of questions (particularly in a public way). Nonetheless, it could be interesting to see Anthropic engage more with questions like the ones Miles raises here.
I think this is a helpful overview post. It outlines challenges to a mainstream plan (bootstrapped alignment) and offers a few case studies of how entities in other fields handle complex organizational challenges.
I'd be excited to see more follow-up research on organizational design and organizational culture. This work might be especially useful for helping folks think about various AI policy proposals.
For example, it seems plausible that at some point the US government will view certain kinds of AI systems as critical national security assets. At that point, the government might become a lot more involved in AI development. A new entity might be created (e.g., a DoD-led "Manhattan Project"), or the existing landscape might persist but with a lot more oversight (e.g., NSA and DoD folks show up as resident inspectors at OpenAI/DeepMind/Anthropic and have a lot more decision-making power).
If something like this happens, there will be a period of time in which relevant USG stakeholders (and AGI lab leaders) have to think through questions about organizational design, organizational culture, decision-making processes, governance structures, etc. It seems likely that some versions of this look a lot better than others.
Examples of questions I'd love to see answered/discussed:
(I'm approaching this from a government-focused AI policy lens, partly because I suspect that whoever ends up in charge of the "What Should the USG Do" decision will have spent less time thinking through these considerations than Sam Altman or Demis Hassabis. Also, it seems like many insights would be hard to implement in the context of the race toward AGI. But I suspect there might be insights here that are valuable even if the government doesn't get involved & it's entirely on the labs to figure out how to design/redesign their governance structures or culture toward bootstrapped alignment.)
@Nikola Jurkovic I'd be interested in timeline estimates for something along the lines of "AI that substantially increases AI R&D". Not exactly sure what the right way to operationalize this is, but something that says "if there is a period of Rapid AI-enabled AI Progress, this is when we think it would occur."
(I don't really love the "95% of fully remote jobs could be automated" frame, partly because I don't think it captures many of the specific domains we care about (e.g., AI-enabled R&D, other natsec-relevant capabilities) and partly because I suspect people have pretty different views of how easy/hard remote jobs are. Like, some people think that lots of remote jobs today are basically worthless and could already be automated, whereas others disagree. If the purpose of the forecasting question is to get a sense of how powerful AI will be, the disagreements about "how much do people actually contribute in remote jobs" seem like unnecessary noise.)
(Nitpicks aside, this is cool and I appreciate you running this poll!)
@elifland what do you think is the strongest argument for long(er) timelines? Do you think it's essentially just "it takes a long time for researchers to learn how to cross the gaps"?
Or do you think there's an entirely different frame (something in an ontology that just looks very different from the one presented in the "benchmarks + gaps" argument)?
@ryan_greenblatt can you say more about what you expect to happen during the period between "AI 10Xes AI R&D" and "AI takeover is very plausible"?
I'm particularly interested in getting a sense of what sorts of things will be visible to the USG and the public during this period. I'd be curious for your takes on how much of this stays relatively private/internal (e.g., only a handful of well-connected SF people know how good the systems are) vs. obvious/public/visible (e.g., the majority of the media-consuming American public is aware of the fact that AI research has been mostly automated) vs. somewhere in between (e.g., most DC tech policy staffers know this but most non-tech people are not aware).