Thanks very much for writing this. We appreciate all the feedback across the board, and I think this is a well-done and in-depth write-up.
On the specific numerical thresholds in the report (i.e., your Key Proposal section), I do need to make one correction that also applies to most of Brooks's commentary. All the numerical thresholds mentioned in the report, and particularly in that subsection, are solely examples and not actual recommendations. They are there only to show how one can calculate self-consistent licensing thresholds under the principles we recommend. They are not themselves recommendations. We had to do it this way for the same reason we propose granting fairly broad rule-setting flexibility to the regulatory entity. The field is changing so quickly that any concrete threshold risks being out of date (for one reason or another) in very short order. We would have liked to do otherwise, but that is not a realistic expectation for a report that we expect to be digested over the course of several months.
To avoid precisely this misunderstanding, the report states in several places that those very numbers are, in fact, only examples for illustration. A few screencaps of those disclaimers are below, but there are several others. Of course we could have included even more, but beyond a certain point one is simply adding more length to what you correctly point out is already quite a sizeable document. Note that the Time article, in the excerpt you quoted, does correctly note and acknowledge that the Tier 3 AIMD threshold is there as an example (emphasis added):
the report suggests, as an example, that the agency could set it just above the levels of computing power used to train current cutting-edge models like OpenAI’s GPT-4 and Google’s Gemini.
Apart from this, I do think overall you've done a good and accurate job of summarizing the document and offering sensible and welcome views, emphasis, and pushback. It's certainly a long report, so this is a service to anyone who's looking to go one or two levels deeper than the Executive Summary. We do appreciate you giving it a look and writing it up.
Excellent. On the thresholds, got it, sad that I didn't realize this, and that others didn't either from what I saw.
I appreciate the 'long post is long' problem, but I do think you need the warnings to be in all the places someone might see the 10^X numbers in isolation, if you don't want this to happen. It probably happens anyway, on the grounds of 'yes, that was technically not a proposal, but of course it will be treated like one.' And there's some truth in that, and in the idea that you want to use examples that are what you would actually pick right now if you had to pick what to actually do (or propose).
I do think the numbers I suggest are about as low as one could realistically get until we get much stronger evidence of impending big problems.
One problem is that due to algorithmic improvements, any FLOP threshold we set now is going to be less effective at reducing risk to acceptable levels in the future. It seems to me very unlikely that regulatory thresholds will ever be reduced, so they need to be set substantially lower than expected given current algorithms.
This means that a suitable FLOP threshold for preventing future highly dangerous models might well need to exclude GPT-4 today. It is to be expected that people who don't understand this argument might find such a threshold ridiculously low. That's one reason why I expect that any regulations based on compute quantity will be ineffective in the medium term, but they may buy us a few years in the short term.
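To make that dynamic concrete, here is a minimal sketch in Python, assuming purely for illustration that algorithmic progress halves the compute needed to reach a given capability level every year; the actual rate is uncertain and the threshold value here is hypothetical:

```python
# Sketch: how a fixed FLOP threshold erodes as algorithms improve.
# Illustrative assumption: the compute needed to reach a fixed capability
# level halves every `halving_years` years.

def capability_equivalent(raw_flop_threshold: float, years_elapsed: float,
                          halving_years: float = 1.0) -> float:
    """Today-equivalent compute that a model trained right at the threshold
    could roughly match after `years_elapsed` years of algorithmic progress."""
    return raw_flop_threshold * 2 ** (years_elapsed / halving_years)

threshold = 1e26  # hypothetical licensing threshold set today
for year in range(6):
    print(f"year {year}: a model at {threshold:.0e} raw FLOP roughly matches "
          f"what {capability_equivalent(threshold, year):.1e} FLOP buys today")
```

Under those (illustrative) assumptions, a threshold that excludes today's frontier models would, a few years out, permit models with capabilities well beyond them, unless the threshold is revised downward.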
"One problem is that due to algorithmic improvements, any FLOP threshold we set now is going to be less effective at reducing risk to acceptable levels in the future."
And this goes doubly so if we explicitly incentivize low-FLOP models. When models are non-negligibly FLOP-limited by law, then FLOP-optimization will become a major priority for AI researchers.
This reminds me of Goodhart's Law, which states “when a measure becomes a target, it ceases to be a good measure."
I.e., if FLOPs are supposed to be a measure of an AI's danger, and we then limit/target FLOPs in order to limit AGI danger, then that targeting itself interferes with or nullifies the effectiveness of FLOPs as a measure of danger.
It is (unfortunately) self-defeating. At a minimum, you need to re-evaluate regularly the connection between FLOPs and danger: it will be a moving target. Is our regulatory system up to that task?
Meta, with 350k H100s, assuming SXM models rated at 989 teraflops, training in dense FP32 (so half the rated performance), assuming they dedicate half the cluster to a single model and train for a year at 2/3 utilization, would have put 10^27 FLOPs into the model by end of year.
Assuming Nvidia is correct that the next generation is 5 times faster, a B100 cluster purchased over 2025 and 2026 at the same investment rate as today (approximately 10-20 billion annually) would hit 10^28 flops by EOY 2027.
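A rough Python version of that back-of-envelope arithmetic, using the commenter's own assumptions (the per-GPU rating, dense factor, utilization, allocation, and 5x speedup figures are all taken from the two comments above, not measured values):

```python
# Back-of-envelope training-compute estimate from the assumptions above.
SECONDS_PER_YEAR = 365 * 24 * 3600

h100_rated_flops = 989e12          # per-GPU rated throughput (figure from the comment)
dense_factor = 0.5                 # "dense ... so half perf"
gpus_on_one_model = 350_000 * 0.5  # half the cluster dedicated to a single model
utilization = 2 / 3

h100_year = (gpus_on_one_model * h100_rated_flops * dense_factor
             * utilization * SECONDS_PER_YEAR)
print(f"H100 cluster, one year: ~{h100_year:.1e} FLOP")  # ~1.8e27, i.e. roughly 10^27

# If the next generation is ~5x faster and a similarly sized cluster is bought
# over 2025-2026, the same arithmetic lands around 10^28 by end of 2027.
print(f"B100-class cluster, one year: ~{h100_year * 5:.1e} FLOP")  # ~9e27
```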
I am also going to note that Nvidia stands to earn well over 100 billion annually with the B100 now that they are bundling it with hardware. They have a lot of incentive to lobby for no compute limit, and the funds to pay the lobbyists.
Presumably AMD/Intel/other USA IC companies have the same directional incentive.
"Columbus, Ohio, into what CEO Pat Gelsinger described to reporters on Tuesday as "the largest AI chip manufacturing site in the world", starting as soon as 2027."
Compute limits likely prevent models from being useful enough to justify these investments.
Does the case for x-risk stand up to the arguments these wealthy and successful US companies are going to predictably make? Such as "how do you know N FLOPs is enough for the model to be dangerous. Show me a model on your computer that is too dangerous to control".
ElevenLabs powered reading of this post: https://open.substack.com/pub/askwhocastsai/p/on-the-gladstone-report-by-zvi-mowshowitz
Hmm, I feel like our on-site audio recording is better than this (click on the speaker button at the top of this post), so I am a bit confused why you went through the effort of getting it AI narrated.
I have to say I'm slightly surprised, I personally think the ElevenLabs voice is more emotive, and I find it much easier listening. Is the general opinion that it is actually inferior?
The worlds where artificial superintelligence (ASI) is coming very soon with only roughly current levels of compute, and where ASI by default goes catastrophically badly, are not worlds I believe we can afford to save.
For those kinds of worlds, we probably can't afford to save them via heavy-handed measures (meaning not only that the price would be too much to pay, given that it would have to be paid in all the other worlds too, but also that it would not be possible to prevent evasion of those measures, if it is already relatively easy to create ASI).
But this does not mean that more lightweight measures are hopeless. For example, when we look at the thought process Ilya shared with us last year, what he was trying to ponder were various potentially feasible, relatively lightweight measures to change this default of ASI going catastrophically badly into more acceptable outcomes: Ilya Sutskever's thoughts on AI safety (July 2023): a transcript with my comments.
More people should do more thinking of this kind. AI existential safety is a pre-paradigmatic field of study. A lot of potentially good approaches and possible non-standard angles have not been thought of at all. A lot of potentially good approaches and possible non-standard angles have been touched upon very lightly and need to be explored further. The broad consensus in the "AI existential safety" community seems to be that we currently don't have good approaches we can comfortably rely upon for future ASI systems. This should encourage more people to look for completely novel approaches (and in order to cover the worlds where timelines are short, some of those approaches have to be simple rather than ultra-complicated, otherwise they would not be ready by the time we need them).
In particular, it is certainly true that
A smarter thing that is more capable and competitive and efficient and persuasive being kept under indefinite control by a dumber thing that is less capable and competitive and efficient and persuasive is a rather unnatural and unlikely outcome. It should be treated as such until proven otherwise.
So it makes sense to spend more time brainstorming approaches which would not require "a smarter thing being kept under indefinite control by a dumber thing". What might be a relatively lightweight intervention which would avoid ASI going catastrophically badly without trying to rely on this unrealistic "control route"? I think this is underexplored. We tend to impose too many assumptions, e.g. the assumption that ASI has to figure out a good approximation to our Coherent Extrapolated Volition and to care about that in order for things to go well. People might want to start from scratch instead. What if we just have the goal of good chances of a good trajectory conditional on ASI, and we don't assume that any particular feature of earlier approaches to AI existential safety is a must-have; only this goal is a must-have? Then this gives one a wider space of possibilities to brainstorm: what other properties, besides caring about a good approximation to our Coherent Extrapolated Volition, might be sufficient for the trajectory of a world with ASI to likely be good from our point of view?
You can imagine making a superintelligence whose mission is to prevent superintelligences from reshaping the world, but there are pitfalls, e.g. you don't want it deeming humanity itself to be a distributed intelligence that needs to be stopped.
In the end, I think we need lightweight ways to achieve CEV (or something equivalent). The idea is there in the literature; a superintelligence can read and act upon what it reads; the challenge is to equip it with the right prior dispositions.
When I ponder all this I usually try to focus on the key difficulty of AI existential safety. ASIs and ecosystems of ASIs normally tend to self-modify and self-improve rapidly. Arbitrary values and goals are unlikely to be preserved through radical self-modifications.
So one question is: what values and goals might be natural, values and goals that many members of the ASI ecosystem, including entities much smarter and much more powerful than unaugmented humans, might be naturally inclined to preserve through drastic self-modifications, and which might also have corollaries that are good from the viewpoint of various human values and interests?
Basically, values and goals which are likely to be preserved through drastic self-modifications are probably non-anthropocentric, but they might have sufficiently anthropocentric corollaries, so that even the interests of unaugmented humans are protected (and also so that humans who would like to self-augment or to merge with AI systems can safely do so).
So what do we really need? We need a good deal of conservation and non-destruction, and we need the interests of the weaker members of the overall ecosystem, not just the currently smartest or most powerful, to be adequately taken into account.
What might the road towards adoption of those values (of preservation and protection of weaker entities and their interests) and incorporation of those kinds of goals be, so that these values and goals are preserved as the ASI ecosystem and many of its members self-modify drastically? One really needs the situation where many entities on varying levels of smartness and power care a lot about those values and goals.
An arbitrary ASI entity (just like an unaugmented human) cannot fully predict the future. In particular, it does not know where it might eventually end up in terms of relative smartness or relative power (relative to the most powerful ASI entities or to the ASI ecosystem as a whole). So if any given entity wants to be long-term safe, it is strongly interested in the ASI society having general principles and practices of protecting its members on various levels of smartness and power. If only the smartest and most powerful are protected, then no entity is long-term safe on the individual level.
This might be a reasonable starting point towards having values and goals which are likely to be preserved through drastic self-modifications and self-improvements and which are likely to imply good future for unaugmented humans and for the augmented humans as well.
Perhaps we can nudge things towards making this kind of future more likely. (The alternatives are bleak: unrestricted drastic competition and perhaps even warfare between supersmart entities, which would probably end badly not just for us, but for the ASI ecosystem as well.)
Like the government-commissioned Gladstone Report on AI itself, there are two sections here.
First I cover the Gladstone Report’s claims and arguments about the state of play, including what they learned talking to people inside the labs. I mostly agree with their picture and conclusions, both in terms of arguments and reported findings, however I already mostly agreed. If these arguments and this information are new to someone, and the form of a government-backed report helps them process it and take it seriously, this is good work. However, in terms of convincing an already informed skeptic, I believe this is a failure. They did not present their findings in a way that should be found convincing to the otherwise unconvinced.
Second I cover the Gladstone Report’s recommended courses of action. It is commendable that the report lays out a concrete, specific and highly detailed proposal. A lot of the details, and the broader outline, seem good. The compute thresholds seem too aggressive. I would suggest working to get agreement on the structure of intervention, while also talking price on the thresholds, and hopefully getting them to a better place.
Executive Summary of Their Findings: Oh No
According to the report, things are very much Not Great, Bob.
Here is their Twitter summary thread.
It’s a running joke, and also probably true, as I keep noticing. All our talk of ‘but what about China’ has to contend with the fact that China gets almost all its AI abilities directly from the United States. Some of it is spying. Some of it is training using our models. Some of it is seeing what is possible. Some of it is flat out open source and open model weights. But it is all on us.
Needless to say, if a major lab has this kind of running joke, that is completely unacceptable, everyone involved should be at least highly ashamed of themselves. More importantly, fix it.
More detail on this issue can be found in this good Time article, Employees at Top Labs Fear Safety Is an Afterthought, Report Says.
Quotes there are not reassuring.
Catastrophic is very different from existential. If people were saying 4%-20% existential risk for 2024 alone, I would say those numbers are crazy high and make no sense. But this is a 4%-20% risk of a merely catastrophic outcome. If this is defined the way Anthropic defines it (so 1000+ deaths or 200b+ in damages) then I’d be on the low end of this for 2024, but I’m sure as hell not laying 26:1 against it.
I expect this researcher is wrong about this particular model, no matter which model it is, unless it is a fully unreleased one well above GPT-4, in which case yeah it’s pretty terrifying that they expect it to get open sourced within three years. Otherwise, of course, that particular one might not get opened up, but other similar ones will.
Sometimes I wonder if people think we have some sort of magical unbroken democracy lying around?
The low-hanging fruit, the least you can do, is to make coordination mechanisms easier, enforceable and legal, so that if this was indeed the situation then one could engage in trade:
Gladstone Makes Its Case
Do I mostly believe the picture painted in that Twitter thread?
Yes, I do. But I already believed a similar picture before reading the report.
Does the report make the case for that picture, to someone more skeptical?
And what statements contain new information?
Section 0, the attempt to give background and make the case, begins mostly with facts everyone knows.
This is the first thing that could be in theory new info, and while I had no public confirmation of it before (whether or not I had private confirmation) it is pretty damn obvious.
At minimum, of course OpenAI has a version of GPT-4 that is less crippled by RLHF.
They remind the reader, who might not know, that the major labs have exponentially grown their funding into the billions, and are explicitly aiming to build AGI, and that one (Meta) is now saying its goal is to then open source its AGI.
In 0.2 they lay out categories of risk, and start making claims some disagree with.
0.2.1 talks weaponization, that AI could become akin to or enable WMD, such as via cyberattacks, disinformation campaigns or bioweapon designs. Here most agree they are on solid ground, whether or not one quibbles with the risk from various potential weaponization scenarios, or whether one has hope for natural ‘offense-defense balance’ or what not. Taken broadly, this is clearly a risk.
0.2.2 talks about loss of control. This too is very obviously a risk. Yet many dispute this, and refuse to take the possibility seriously.
Here Gladstone focuses on loss of control due to alignment failure.
This is all a very standard explanation. I have many times written similar paragraphs. In theory this explanation should be sufficiently convincing. Pointing out the danger should be enough to show that there is a serious problem here, whatever the difficulty level in solving it. Since the report’s release, Devin has been shown to us, providing an additional example of long-term planning and also, in various ways, power seeking.
If anything I see this as underselling the situation, as it says ‘in the absence of countermeasures’ without noting how difficult finding and implementing those countermeasures is likely to be, and also it only considers the case where the loss of control is due to an alignment failure.
I expect that even if there is not an alignment failure, we face a great danger that humans will have local incentives and competitive dynamics that still push us towards loss of control, and towards intentionally making AIs seek power and do instrumental convergence, not as a failure or even anyone’s ill will but as the outcome we choose freely while meaning well. And of course plenty of people will seek power or money or what not and have an AI do that, or do it for the lulz, or because they think AI is more worthy than us or deserves to be set free, or what not.
A smarter thing that is more capable and competitive and efficient and persuasive being kept under indefinite control by a dumber thing that is less capable and competitive and efficient and persuasive is a rather unnatural and unlikely outcome. It should be treated as such until proven otherwise. Instead, many treat this unnatural outcome as ‘normal’ and the default.
That risk sub-category is not mentioned in the report, whereas I would say it definitely belongs.
They do talk about some other risk categories in 0.2.3, and agree they are important, but note that this report is not addressing those risks.
They emphasize here as they do elsewhere that the catastrophic risks are concentrated in only a few places, at least at present, most AI-related activity does not pose such risks and is good and should be allowed to proceed freely, and I again respond that this means they chose the wrong thresholds (price).
0.3 asks where the catastrophic risks might come from, in terms of who creates it?
They list:
Yes. Those would be the four ways things could go wrong. In what they call the ‘medium term’ of 1-5 years, I am not so worried about foreign frontier labs. Open model weights threats are possible towards the end of that timeframe, things are escalating quickly and Meta might become competent some day. But most of my worry is on what comes out of the frontier labs of OpenAI, Anthropic and Google DeepMind, or a similar lab that catches up. That includes the possibility that some entity, foreign or domestic, state or non-state, steals the weights and then acts irresponsibly.
Again I emphasize that this mostly was not an argument. If you were already looking at the existing debate and were unconvinced, nothing so far should convince you.
Section 0.4 attempts to address arguments against regulatory action.
Why Is Self-Regulation Insufficient?
First they tackle in 0.4.1 the argument that self-regulation will be sufficient. They note that yes self-regulation is better than nothing, but it will not be enough, and provide the obvious counterarguments. I would summarize them as:
I would then add a fifth, which is that many major labs such as Meta and Mistral have already announced their intention to be irresponsible. Many influential individuals with deep pockets, such as Marc Andreessen, react with defiance to even milquetoast non-binding statements of the form ‘we should try to ensure that our AIs are safe.’
I consider this objection well-answered, here and elsewhere.
This is distinct from the argument that there is no underlying problem requiring a response of any kind. There are those who think there is no danger at all even if capabilities advance a lot, or that until the danger is ‘proven’ or modeled or more concrete we should act as if it is not there.
My response at this point is some combination of an exasperated sigh and a version of ‘you are not serious people.’
There is also the objection that AI is not going to be sufficiently capable any time soon to be catastrophically dangerous. That all the talk from the labs is some combination of optimism and hype, this is all going to ‘hit a wall’ soon in one of various ways. I do think such an outcome is possible.
The counterargument offered by The Gladstone Report is that this is very much not what the labs themselves and their engineers, in best position to know, expect, and that this has to carry substantial weight, enough to justify well-targeted intervention that would do little if there were no major pending capabilities gains. I buy that, although it again raises the question of where they put the thresholds.
What About Competitiveness?
In 0.4.2 they address the objection that regulation could damage American innovation and competitiveness. They somewhat bite the bullet, as one must.
The first part of their response is to note their plan is targeted, to not interfere more than necessary to ensure safety and national security, and it would have little or no impact on most American AI efforts and thus not hurt their competitiveness.
The second part of their response is to say, well, yes. If the result of being maximally competitive and innovative would be to risk existential or other catastrophe, if the way we are being that competitive and innovative is to disregard (and effectively socialize) the downside risks, you are going to hurt competitiveness and innovation by not letting the labs do that. This is not an avoidable cost, only one we can minimize, as they claim their plan attempts to do.
The third part of the response is that the cybersecurity requirements in particular favor American competitiveness on net. If foreigners can steal the weights and other lab secrets, then this helps them compete and nullifies our advantages. Not pricing in that externality does not help us compete, quite the opposite.
They do not make other counterarguments one could make here. This is an ongoing thing, but I’ll briefly reiterate here.
It is common for those making this objection to fully adopt the argumentum ad absurdum, claiming that any regulation of AI of any kind would doom America to lose to China, or at least put us at grave risk of this. And they say that any move towards any regulation starts a process impossible to stop let alone undo.
When they see any regulation of any kind proposed or pushed anywhere, they warn of that area’s inevitable economic doom (and in the case of the EU, they kind of have a point when all taken together, but I have not been able to bring myself to read the full bill all the way and ow my eyes make it stop someone make it stop). I really do not think I am strawmanning.
The response of course is that America has a big lead over its competitors in numerous ways. If your main rival is China (or the EU, or even the UK), are you saying that China is not regulated? Why shouldn’t China inevitably ‘lose to America’ given all the vastly worse restrictions it places on its citizens and what they can do, that mostly have nothing directly to do with AI?
Also, I won’t go over the evidence here, but China is not exactly going full Leroy Jenkins with its AI safety protocols or public statements, nor is it showing especially impressive signs of life beyond copying or potentially stealing our stuff (including our open model weights stuff, which is free to copy).
As with everything else, the real question is price.
How much competitiveness? How much innovation? What do we get in return? How does this compare to what our rivals are doing on various fronts, and how would they react in response?
If implemented exactly as written in The Gladstone Report, with its very low thresholds, I believe that these restrictions would indeed gravely harm American competitiveness, and drive AI development overseas, although mostly to the UK and other friendly jurisdictions rather than China.
If implemented with the modified thresholds I suggest, I expect minimal impact on competitiveness and innovation, except for future potentially dangerous frontier models that might hit the higher threshold levels. In which case, yes, not disregarding the risks and costs is going to mean you take a hit to how fast you get new things. How else could this possibly work?
Do you think that it doesn’t take longer or cost more when you make sure your bridge won’t fall down? I still highly recommend making sure the bridges won’t fall down.
How Dare You Solve Any Problem That Isn’t This One?
0.4.3 responds to the standard argument from the ethics crowd, that catastrophic risk ‘could divert attention’ from other issues.
The report gives the standard argument that we should and must do both, and there is no conflict, we should deal with catastrophic risks in proportion to the threat.
Nothing proposed here makes other issues worse, or at least the other issues people tend to raise here. More than that, the interventions that are proposed here would absolutely advance the other issues that people say this would distract from, and lay groundwork for further improvements.
As always, the distraction argument proves too much. It is a general counterargument against ever doing almost anything.
It also ignores the very real dangers to these ‘other issues.’
Working for social justice is like fetching the coffee, in the sense that you cannot do it if you are dead.
Some instrumental convergence from the humans, please.
If you think the catastrophic risks are not worth a response, that there is no threat, then by all means argue that. Different argument. See 0.4.4, coming right up.
If you think the catastrophic risks are not worth a response, because they impact everyone equally and thus do not matter in the quest for social justice, or because you only care about today’s concerns and this is a problem for future Earth and you do not much care about the future, or something like that?
Then please speak directly into this microphone.
What Makes You Think We Need To Worry About This?
0.4.4 deals with the objection I’ve been differentiating above, that there is nothing to worry about, no response is needed. They cite Yann LeCun and Andrew Ng as saying existential risk from AI is very low.
The report responds that they will present their evidence later, but that numerous other mainstream researchers disagree, and these concerns are highly credible.
As presented here all we have are dueling arguments from authority, which means we have exactly nothing useful. Again, if you did not previously believe there was any catastrophic risk in the room after considering the standard arguments, you should not as of this point be changing your mind.
They also mention Richard Sutton and others who think humanity’s replacement by AI is inevitable and we should not seek to prevent it, instead we should do ‘succession planning.’ To which I traditionally say, of course: Please speak directly into this microphone.
The fact that by some accounts 10% of those in the field hold this view, that we should welcome our future AI overlords in the sense of letting humanity be wiped out, seems like a very good reason not to leave the question of preventing this up to such folks.
What Arguments are Missing?
They leave out what I consider the best arguments against regulation of AI.
The best argument is that the government is terrible at regulating things. They reliably mess it up, and they will mess it up this time as well, in various surprising and unsurprising ways. You do not get the regulations you carefully crafted, you get whatever happens after various stakeholders butcher and twist all of it. It does not get implemented the way you intended. It then gets modified in ways that make it worse, and later expanded in ways you never wanted, and recovery from that is very difficult.
That is all very true. The choice is what you can realistically get versus nothing.
So you have to make the case that what you will get, in practice, is still better than doing nothing. Often this is not the case, where no action is a second best solution but it beats trying to intervene.
A related good argument, although it is often taken way too far in this context, is the danger of regulatory capture. Yes, one could argue, the major players are the ones hurt by these laws as written. But by writing such laws at all, you eventually put them in position where they will work the system to their advantage.
I would have liked to see these arguments addressed. Often in other cases regulations are passed without anyone noticing such issues.
I do think it is clear that proposals of this form clear these hurdles; the alternative is too unacceptable and forces our hand. The proposals here, and the ones many have converged on over the past year, are chosen with these considerations squarely in mind.
Another common argument is that we do not yet know enough about what regulations would be best. Instead, we should wait until we know more, because regulations become difficult to change.
My response is three-fold.
First, in this situation we cannot wait until the problems arise, because they are existential threats that could be impossible to stop by the time we concretely see or fully understand exactly what is going wrong. There is a lot of path dependence, and long and variable lags lie in our future responses.
Second, I think this objection was a lot stronger a year ago, when we did not have good policy responses available. At this point, we know quite a lot. We can be confident that we know what choke points are available to target with regulation. We can target the chips, data centers and hardware, and very large compute-intensive training runs. We can hope to gain control and visibility there, exactly where the potential existential and catastrophic risks lie, with minimal other impact. And the geopolitical situation is highly favorable to this strategy.
What is the alternative regime being proposed? Aside from ‘hope for the best’ or ‘wait and see’ I see no other proposals.
Third, if we do not use this regime, and we allow model proliferation, then the required regulatory response to contain even mundane risks would leave a much bigger footprint. We would be forced to intervene on the level of any device capable of doing inference, or at minimum of doing fine-tuning. You do not want this. And even if you think we should take our chances, I suggest brushing up on public choice theory. That is not what we would do in that situation, if humans are still in control. Of course, we might not be in control, but that is not a better outcome or a reason to have not acted earlier.
There are of course other objections. I have covered all the major ones that occurred to me today, but that does not mean the list is complete. If I missed a major objection, and someone points that out in the first day or so, I will edit it in here.
Tonight at 11: Doom!
The next section is entitled ‘challenges.’
It begins with 0.5.1.1, offering a list of the usual estimates of doom. This is an argument from authority. I do think it is strong enough to establish that existential risk is a highly credible and common concern, but of course you knew about all this already if you are reading this, so presumably you need not update.
0.5.1.2 notes that timelines for catastrophic risk are uncertain, and that the major players (OpenAI, DeepMind, Anthropic and Nvidia) have all publicly said they expect AGI within five years.
I consider a wide variety of probability distributions on such questions reasonable. I do think you have to take seriously that we could be close.
As noted in the Tweet thread above, and remembering that catastrophic risk is distinct from existential, here is them asking about how much catastrophic risk there is from AI in 2024.
The more I think about this the more strange the question becomes as framed like this. Will not, for example, who wins various elections have ‘global and irreversible’ effects? If an AI effort swung the White House does that count? In which directions?
Again, if we are sticking to Anthropic’s definition, I am on the lower end of this range, but not outside of it. But one could see definitional disagreements going a long way here.
0.5.1.3 says the degree of risk from loss of control is uncertain.
Well, yes, very true.
They say one must rely on theoretical arguments since we lack direct evidence in either direction. They then link to arguments elsewhere on why we should consider the degree of risk to be high.
I can see why many likely find this section disappointing and unconvincing. It does not itself contain the arguments, in a place where such arguments seem natural.
I have many times made various arguments that the risk from loss of control, in various forms, is large, some of which I have reiterated in this post.
I will say, once again, that this has not provided new evidence for risk from loss of control as of this point, either.
The Claim That Frontier Labs Lack Countermeasures For Loss of Control
So here’s where they tell their story.
This is of course anecdata. As I understand it, Gladstone talked to about 200 people. These are only three of them. Ideally you would get zero people expressing such opinions, but three would not be so bad if the other 197 all felt things were fine.
From a skeptical perspective, you could very reasonably say that these should be assumed to be the most worried three people out of 200, presented with their biggest concerns. When viewed that way, this is not so scary.
The story about one of the major labs testing newly enhanced potential autonomous agents, whose abilities were not well-understood, with zero monitoring and zero controls in place, and presumably access to the internet, is not great. It certainly sets a very bad precedent and indicates terrible practices. But one can argue that in context the risk was at the time likely minimal, although it looks that way right until it isn’t.
In any case, I would like to see statistical data. What percentage of those asked thought their lab was unprepared? And so on.
I do know from other sources that many at the frontier labs indeed have these concerns. And I also know that they indeed are not taking sufficient precautions. But the report here does not make that case.
What about preventing critical IP theft?
Here is the full quote from the Twitter thread at the top:
It is an interesting fact about the world that such attempts do not seem to have yet succeeded. We have no reports of successfully stolen model weights, no cases where a model seems to have turned up elsewhere unauthorized. Is everyone being so disciplined as to keep the stolen models secret? That seems super unlikely, unless they sold it back for a ransom perhaps.
My presumption is that this is another case of no one making that serious an attempt. That could be because everyone would rather wait for later when the stakes are higher, so you don’t want to risk your inside agent. It still seems super weird.
The Future Threat
They cite two additional threat models.
In 0.5.1.6, they note opening up a model makes it far more potentially dangerous, of course in a way that cannot be undone. Knowing that a model was safe as a closed model does not provide strong evidence, on its own, that it would be safe over time as an open model. I will note that ‘a clearly stronger model is already open’ does however provide stronger evidence.
In 0.5.1.7, they note closed-access AI models are vulnerable to black-box exfiltration and other attacks, including using an existing system to train a new one on the cheap. They note we have no idea how to stop jailbreaks. All true, but I don’t think they make a strong case here.
Section 0.5.2 covers what they call political challenges.
0.5.2.1: AI advances faster than the ordinary policy process. Very much so. Any effective response needs to either be skating where the puck is going to be, or needs to be ready to respond outside of normal policy channels in order to go faster. Ideally you would have both. You would also need to be on the ball in advance.
0.5.2.2: ‘The information environment around advanced AI makes grounded conversations challenging.’ That is quite the understatement. It is ugly out there.
He mentions Effective Altruism here. So this is a good time to note that, while the ideas in this report line up well with ideas from places like LessWrong and others worried about existential risk, Gladstone was in no way funded by or linked to EA.
0.5.3.1 points out the labs have terrible incentives (the report also effectively confirms here that ‘frontier labs’ throughout presumably means the big three only). There are big winner-take-all effects, potentially the biggest ever.
0.5.3.2 notes that supply chain proliferation is forever. You cannot take back physical hardware, or the ability to manufacture it in hostile areas. You need to get out in front of this if you want to use it as leverage.
0.5.4.1 argues that our legal regime is unprepared for AI. The core claim is that an open model could be used by malicious actors to do harm, and there would be no way to hold the model creator liable. I agree this is a problem if the models become so capable that letting malicious actors have them is irresponsible, which is not yet true. They suggest liability might not alone be sufficient, and do not mention the possibility of requiring insurance a la Robin Hanson.
In general, the problem is that there are negative externalities from these products, for which the creators cannot be held liable or otherwise legally responsible. We need a plan to address that, no matter what else we do. Gladstone does not have a legal action plan though, it focuses on other aspects.
That All Sounds Bad, What Should We Do?
What does Gladstone AI propose we do about this? They propose five lines of effort.
I mean, yeah, sure, all of that seems good if implemented well, but also all of that seems to sidestep the actually hard questions. It is enough to make some people cry bloody murder, but those are the people who if they were being consistent would oppose driver’s licenses. The only concrete rule here is on chip exports, where everyone basically agrees on principle and we are at least somewhat already trying.
The actual actions are presumably in the detailed document, which you have to request, so I did so (accurately I think at this point!) calling myself media, we’ll see if they give it to me.
This Time article says yes, the actual detailed proposals are for real.
Always interesting what people consider unthinkable.
This is exactly the standard compute limit regime. If you are close to the frontier, above a certain threshold, you would need to seek approval to train a new model. That would mean adhering to various safety and security requirements for how you train, monitor and deploy the model, one of which would obviously be ‘don’t let others steal your model weights’ which would imply also not publishing them. Above a second higher threshold, you cannot do it at all.
There is, of course, skepticism, because people are bad at extrapolation.
What would he have said a year ago, or two years ago, about a reporting threshold? Probably he would have said it was very unlikely, absent some kind of exogenous shock. Except then we got (at least) one of those. In this context, we will get more.
As is sadly standard for the government, a lot of the issue getting here was finding some department to claim some responsibility.
The report focuses both on ‘weaponization risk’ which I would think is a subset of misuse, incorporating things like biological, chemical or cyber attacks, and then ‘loss of control risk,’ and it centralizes the role of ‘race dynamics.’
Meanwhile, since it is important to share contrary voices, let us see how the skeptics are reacting, yes this is the first reaction I saw.
The Key Proposal: Extreme Compute Limits
Ben Brooks on the other hand does it right, and actually reads the damn thing.
Once again, this tier system seems like the only way a sane system could operate? We should then talk price on where the thresholds should be.
It seems this report argues for the most aggressive threshold of all for Tier 2, which is the 10^23 number I’ve heard from Connor Leahy and some others, and then starts Tier 3 at only 10^24 and Tier 4 at 10^25. So yes, that is a super aggressive set of prices:
The chart is from 4.1.3.4. Here is another useful one to have handy:
In addition to flops, you can also qualify as Tier 3 with a 70% on the MMLU, although only Tier 2 models are required to check this.
Their view is that the leak of the weights of a Tier 3 model is ‘an irreversible event that could create significant national security risk.’ It would indeed be irreversible, but would it harm national security? What does it mean to be in the 10^24 to 10^25 range?
We don’t know the exact answer, but the Metaculus estimate, based on an estimate by Epoch, says that GPT-4 was trained on 2.1×10^25 flops, which makes it a Tier 4 model under this protocol, along with Gemini Ultra, (just barely) Inflection-2, and presumably Claude 3. GPT-3.5, meanwhile, would be, in log terms, a little under halfway through Tier 3.
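As a mechanical illustration of how that example tiering would work, here is a sketch in Python. The thresholds are the report's illustrative numbers as described above; the function, names, and the GPT-3.5-scale compute figure used in the example are mine and hypothetical:

```python
import math
from typing import Optional

# Illustrative tier boundaries from the report's example scheme (training FLOPs).
TIER_2_FLOP = 1e23
TIER_3_FLOP = 1e24
TIER_4_FLOP = 1e25
TIER_3_MMLU = 0.70  # a Tier 2 model scoring 70%+ on MMLU also qualifies as Tier 3

def tier(training_flop: float, mmlu_score: Optional[float] = None) -> int:
    """Assign a tier under the example thresholds (an illustration, not a recommendation)."""
    if training_flop >= TIER_4_FLOP:
        return 4
    if training_flop >= TIER_3_FLOP:
        return 3
    if training_flop >= TIER_2_FLOP:
        # Only Tier 2 models are required to run the benchmark check.
        if mmlu_score is not None and mmlu_score >= TIER_3_MMLU:
            return 3
        return 2
    return 1

# The ~2.1e25 FLOP estimate for GPT-4 lands in Tier 4; a hypothetical ~2.6e24 FLOP
# model would sit a little under halfway through Tier 3 in log terms.
print(tier(2.1e25))             # 4
print(math.log10(2.6e24) - 24)  # ~0.41 of the way from 10^24 to 10^25
```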
If we leaked (or opened up) the weights of GPT-3.5 today, it wouldn’t be awesome, but I would not see this as that big a deal. Certainly a year from now that would be mostly irrelevant, there will be better options. So while I agree that it ‘could’ create such issues, I doubt that it would.
If we treat Tier 3 as ‘you must protect the weights of this model and also not release them’ and other similarly motivated requirements, then it seems like this should start no lower than 10^25. At that point, I think a case can be made that there would be real risk in the room. More likely I think, as a practical question, you will have to start at 10^26, the EO reporting threshold, and cover models substantially stronger than GPT-4. We have to make a choice whether we are going to accept that GPT-4-level models are going to become generally available soon enough, and I do not see a practical path to preventing this for long.
One would then likely bump Tier 2 to 10^24.
Whereas they would put 10^25 into Tier 4, which they consider too high-risk to develop now due to loss of control risk. Given that this range includes GPT-4 and Gemini, this seems like it is not a reasonable ask without much stronger evidence. You can at least make a case that GPT-5 or Claude-4 or Gemini-2 poses loss of control risk such that it would be non-crazy to say we are not ready to build it.
But the existing models we all know about? I mean, no I wouldn’t give them four 9s of safety if you opened up the weights fully for years, but we do not have the luxury of playing things that kind of safe. And we certainly neither want to grant the existing models a permanent monopoly, nor do we want to force them to be withdrawn, even if enforcement was not an issue here.
The lowest sane and realistic threshold I can see proposing for ‘no you cannot do this at all’ is 10^27. Even at 10^28, where I think it is more likely to land, this would already be quite the ask.
Implementation Details of Compute Tiers
What about other implementation details? What do they suggest people should need to do at Tier 2 or Tier 3?
This is the subject of the bulk of section 4.
They list in 4.1.3.5:
If the threshold is set properly, this could still be cleaned up a bit, but mostly seems reasonable. Objections to this type of regime more broadly are essentially arguments against any kind of prior restraint or process control at all, saying that instead we should look only at outcomes.
In section 4.1.4 they discuss publication controls on AI capabilities research, mentioning the worry that such research often happens outside major labs. They mention DPO, AutoGPT and FlashAttention. I get why they want this, but pushing this hard this far does not seem great.
In 4.2.1 they propose imposing civil liability on creators of AI systems on a duty of care basis, including preventing inadvertent harm, preventing infiltration of third-party systems, safeguarding algorithmic configurations of advanced AI including model weights, and maintaining security measures against unauthorized use.
This is a combination of strict liability for harms and a ban on open model weights even if you thought that liability wasn’t an issue. They propose that you could choose to enter Tier 3 on purpose to protect against strict liability, which could be an alternative to imposing super low compute thresholds – you can decide if you think your LLM is dangerous, and pay the costs if you are wrong.
Open model weights presumably are not okay in Tier 3, and are okay in Tier 1. The question is where in Tier 2, from its start to its finish, to draw the line.
4.2.2 is where they discuss criminal liability. They point out that the stakes are sufficiently high that you need criminal penalties for sufficiently brazen or damaging violations.
The first two here seem like very reasonable things to be felonies. If the stakes are existential risk, then this is wilful ignoring of a direct order to stop. Breaching ‘the conditions’ sounds potentially vague, I would want to pin that down more carefully.
The asks here are clearly on the aggressive end and would need to be cut back a bit and more importantly carefully codified to avoid being too broad, but yes these are the stakes.
4.2.3 proposes ‘emergency powers’ which should always make one nervous. In this case, the emergency power looks like it would be confined to license suspension, demanding certain abilities be halted and precautions be taken, up to a general moratorium, with ability to sue for economic damages if this is not justified later.
I hate the framing on this, but the substance seems like something we want in our toolkit – the ability to act now if AI threatens to get out of control, and force a company to stand down until we can sort it out. Perhaps we can find a slightly different way to enable this specific power.
The Quest for Sane International Regulation
Section 5 is about the quest for international cooperation, with the report’s ideal end state being a treaty that enforces the restrictions from section 4, reporting requirements on cloud providers and also hardware-based tracking.
That seems like the correct goal, conditional on having chosen the correct thresholds. If the thresholds are right, they are right for everyone. One reason not to aim too low with the numbers is that this makes international cooperation that much harder.
Deeper monitoring than this would have benefits, but they rightfully step back from it, because this would too heavily motivate development of alternative supply chains and development programs.
How do we get there? Well, saying what one can about that is the kind of thing that makes the report 284 pages long (well, 161 if you don’t count acknowledgements and references). 5.2.1 says to coordinate domestic and international capacity-building, starting with policymaker education outreach in 5.2.1.1.
They do have a good note there:
The first point here has been said before, and is especially important. There is a class of people, almost all of them in NatSec, who simply cannot imagine anything except a specific, concrete and tangible scenario. In addition, many can only imagine scenarios involving a foreign human enemy, which can be a rival nation or a non-state actor.
That means one has to emphasize such threats in such conversations, even though the bulk of the risk instead lies elsewhere, in future harder-to-specify threats that largely originate from the dynamics surrounding the AIs themselves, and the difficulties in controlling alignment, direction, access and usage over time as capabilities advance.
One must keep in mind that even if one restricts only to the scenarios this permits you to raise, that is still sufficient to justify many of the correct interventions. It is imperfect, but the tangible threats are also very real and require real responses, although the resulting appropriate prices and details will of course be different.
Specifically, they suggest our Missions engage heads of state, that we use existing AI policy-focused forums and use foreign assistance to build partner capacity. I have no objection to any of that, sure why not, but it does not seem that effective?
5.2.1.2 talks about technical education and outreach, suggesting openness and international cooperation, with specifics such as ‘the U.S. ambassador to the UN should initiate a process to establish an intergovernmental commission on frontier AI.’ There is already a group meeting within the UN, so it is not clear what needs this addresses? Not that they or I have other better concrete path ideas to list.
5.2.1.3 says also use other forms of capacity building, I mean, again, sure.
5.2.2 says ‘articulate and reinforce an official US government position on catastrophic AI risk.’ Presumably we should be against it (and yes, there are people who are for it). And we should be in favor of mitigating and minimizing such risks. So, again, sure, yes it would be helpful to have Congress pass a statement with no legal standing to clarify our position.
5.3 says enshrine AI safeguards in international law. Yes, that would be great, but how? They note the standard security dilemma.
So, more summits, then?
5.4 says establish an international regulatory agency, which they call the International AI Agency (IAIA), and optimistically cite past non-proliferation efforts. 5.4.1 outlines standard protocols if one was serious about enforcing such rules, 5.4.2 notes there need to be standards, 5.4.3 that researchers will need to convene. Yes, yes.
5.5 discusses joint efforts to control the AI supply chain. This is a key part of any serious long term effort.
So essentially AI for me (and my friends) but not for thee.
The obvious response is, if this is the plan, why would China sign off on it? You need to either have China (and the same applies to others including Russia) inside or outside your international AI alliance. I do not see any talk of how to address China’s needs, or how this regime plans to handle it if China does not sign on to the deal. You cannot dodge both questions for long, and sufficiently harsh compute limits rule out ‘dominate China enough to not care what they think’ as an option, unless you think you really can fully cut off the chip supply chain somehow and China can’t make its own worth a damn? Seems hard.
5.5.1 points out that alternative paths, like attempting controls on the distribution or use of open data or open models, flat out do not work. The only way to contain open things is to not create or open them up in the first place. That is the whole point of being open, to not have anyone be able to stop users from doing anything they want.
5.5.2 outlines who has key parts of the AI supply chain. Most key countries are our allies including Israel, although the UAE is also listed.
5.6 is entitled ‘open challenges’ and seems curiously short. It notes that the incentives are terrible, reliable verification is an open problem and algorithmic improvements threaten to render the plan moot eventually even if it works. Well, yes. Those are some, but far from all, of the open challenges.
That does not mean give up. Giving up definitely doesn’t work.
The Other Proposals
Section 1 proposes interim safeguards.
1.2 suggests an ‘AI observatory’ for advanced AI, which they call AIO. Certainly there should be someone and some government group that is in charge of understanding what is happening in AI. It would (1.2.1) monitor frontier AI development, study emergency preparedness and coordinate to share information.
This should not be objectionable.
The only argument against it that I can see is that you want the government to be blindsided and unable to intervene, because you are opposed on principle (or for other reasons) to anything that USG might do in AI, period. And you are willing to risk them flailing around blind when the first crisis happens.
1.3 suggests establishing responsible AI development and adoption safeguards for private industry. They mention ‘responsible scaling policies’ (RSPs) such as that of Anthropic, and would expand this across the supply chain. As with all such things, price and details are key.
Certainly it would be good to get other labs to the level of Anthropic’s RSP and OpenAI’s preparedness framework. This seems realistic to get via voluntary cooperation, but also you might need to enforce it. Meta does exist. I presume the report authors would want to be much stricter than that, based on their other recommendations.
1.3.1 talks implementation details, mentioning four laws that might allow executive authority, including everyone’s favorite the Defense Production Act, but noting that they might not apply.
1.3.2 makes clear they want this to mostly be the same system they envision in section 4, which is why I wrote up section 4 first. They make clear the need to balance safety versus innovation and value creation. I think they get the price wrong and ask too much, but the key is to realize that one must talk price at all.
1.4 says establish a taskforce to do all this, facilitating responsible development and watching out for weaponization and loss of control as threats, and lay the foundation for transition to the regime in section 4, which seems remarkably similar to what they want out of the gate. Again, yes, sure.
There is also their ‘plan B’ if there is no short term path to getting the RADA safeguards similar to section 4, and one must rely on interim, voluntary safeguards:
They essentially say, if you can’t enforce it, call on labs to do it anyway. To which I essentially say: Good luck with that. You are not going to get a voluntary pause at 10^26 from the major players without much stronger evidence that one is needed, and you are not going to get cooperation from cloud service providers this way either. The case has not yet been sufficiently strongly made.
Section 2 proposes ‘strengthening capability and capacity for advanced AI preparedness and response.’ What does that mean?
That all seems unobjectionable in theory. It is all about the government getting its people prepared and understanding what is going on, so they can then make better decisions.
Again, the objection would be if you think the government will use better information to make worse decisions, because government intervention is all but assured to be negative, and more info means more action or more impactful action. Otherwise, it’s about whether the details are right.
I mostly would prefer to leave such questions to those who know more.
Their list of stakeholders seems plausible on first reading, but also seems like they included everyone without trying hard to differentiate who matters more versus less.
Their suggested topics seem fine as basic things more people should know. I would look closer if this was the draft of a bill or executive order; again, on first glance it seems fine but could use more prioritization. This is more facts than people are able to absorb; you are not going to get everything here, you must choose.
Finally (in terms of the ordering here) there is section 3 on investment in technical AI safety research and standards development.
They note that some research can be done on public open models, and should be shared with everyone. Other research requires access to proprietary frontier models, and the government could help facilitate and fund this access, including developing dedicated secure infrastructure. They don’t talk about sharing the results, but presumably this should be shared mostly but not always. And finally there will be national security research, which will require both unrestricted frontier model access and need to be done in a classified context, and much of the results will not be things that can be published.
Again, all of that seems right, and it would be great to have funding to support this kind of work. That does not mean the government is up to the task at all these levels, given the restrictions it operates under. It is not implausible that government funding would be so delayed, bureaucratized and ill-targeted that it would end up doing little, no or even negative good here. I do think we should make the attempt, but have our eyes open.
After some standard stuff, 3.2 discusses proposed standards for AI evaluations and RADA safeguards, mostly to warn in 3.2.1.1 that current evaluations cannot provide comprehensive coverage or predict emergence of additional capabilities. You can never prove a capability is absent (or at least, you can't if it was at all plausible it could ever have been present; obviously there are plenty of things we know, for example, that GPT-2 can never do).
As they warn in 3.2.1.3, evaluations can easily be undermined and manipulated if those doing them wish to do so. Where the spirit of the rules is not being enforced, you are toast. There are no known ways to prevent this. The best we have is that the evaluation team can be distinct from the model development team, with the model development team unaware of (changing and uncertain) evaluation details. So we need that kind of regime, and for AI labs to not effectively have control over who does their evaluations. At minimum, we want some form of ‘testing for gaming the test.’
Also key is 3.2.1.4’s warning that AI evaluations could fail systematically in high-capability regimes due to situational awareness, manipulation of evaluations by the AI and deceptive alignment. We saw clear signs of this in the Sleeper Agents paper from Anthropic, Claude Opus has shown signs of it as well, and this needs to be understood as less an unexpected failure mode and more the tiger going tiger. If the AI is smarter than you, you are going to have a hard time having it not notice it is in a test. And in addition to the instrumental convergence and other obvious reasons to then game the test, you know what the training data is full of? Humans being situationally aware that they are being tested, and doing what it takes to pass. Everyone does it.
Conclusions, Both Theirs and Mine
There is lots more. One can only read in detail so many 284-page (or even de facto 161-page if you skip the appendixes and references and acknowledgements) documents.
Their conclusion makes it clear how they are thinking about this:
The Gladstone Report can, again, be thought of as two distinct reports. The first report outlines the threat. The second report proposes what to do about it.
The first report rests on their interviews inside the labs, and on the high degree of uncertainty throughout, and citing many properties of current AI systems that make the situation dangerous.
As they summarize:
I am alarmed, although not as alarmed as they are, and do not support the full price they are willing to pay with their interventions, but I believe the outline here is correct. Artificial general intelligence (AGI) could arrive soon, and it could be quickly followed by artificial superintelligence (ASI); it will pose an existential threat when it arrives, and we are very much not prepared. Security measures are woefully inadequate even for worries about national security, stolen weights and mundane harms.
Does The Gladstone Report make a strong case that this is true, for those who did not already believe it? For those taking a skeptical eye, and who are inclined to offer the opposite of the benefit of the doubt?
That is a different question.
Alas, I do not believe the report lays out its interview findings in convincing fashion. If you did not believe it before, and had considered the public evidence and arguments, this need not move your opinion further. That the scary quotes would be available should be priced in. We need something more systematic, more difficult to fake or slant, in order to convince people not already convinced, or to change my confidence levels from where they previously were.
Its logical arguments about the situation are good as far as they go, highly solid and better than most. They are also reiterations of the existing debate. So if this is new to you, you now have a government report that perhaps you feel social permission to take seriously and consider. If you had already done the considering, this won’t add much.
What Can We Do About All This?
One then must choose directionally correct interventions, and figure out a reasonable price to balance concerns and allow the plan to actually happen.
From what I have seen, the proposed interventions are directionally wise, but extreme in their prices. If we could get international cooperation I would take what is proposed over doing nothing, but going this far seems both extremely impractical any time soon and also not fully necessary given its costs.
Indeed, even if one agrees that such existentially dangerous systems could be developed in the next five years (which seems unlikely, but not something we can rule out), the suggested regime involves going actively backwards from where we are now. I do not see any path to doing that, nor do I think it is necessary or wise to do so even if there was a political path. The worlds where artificial superintelligence (ASI) is coming very soon with only roughly current levels of compute, and where ASI by default goes catastrophically badly, are not worlds I believe we can afford to save.
Ideally one could talk price, balance risks and costs against the benefits, and seek the best possible solution.