Bogdan Ionut Cirstea: I wish he had framed the governance side closer to something like [Drexler’s talk here about how to divide a large surplus] (or at least focused more on that aspect, especially in the longer run), instead of the strong competitive angle.
To clarify / add a bit more nuance: I wish Leopold had focused more of his discourse on a framing like the one in Drexler's talk: the pie would be very large indeed, and it matters more that we get to it safely and that the allocation is decent enough that we don't blow it all up than precisely how much each actor gets. (The talk justifies this with a toyish model in which each actor has logarithmic utility in the amount of resources, making it irrational to 'want it all' at the risk of also losing it all.)
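To illustrate the toy model the talk gestures at, here is a minimal sketch with made-up numbers (the specific pie size, share, and win probability are my own assumptions, not from the talk): with logarithmic utility, a secure 1% of a huge pie beats a coin-flip gamble for the whole thing.

```python
import math

# Toy sketch of the log-utility point (illustrative numbers of my own,
# not taken from Drexler's talk): compare a guaranteed modest share of a
# huge surplus against a gamble for the whole pie that risks near-total loss.

PIE = 1e12      # size of the surplus, arbitrary units
SHARE = 0.01    # a "decent enough" 1% allocation
P_WIN = 0.5     # chance the grab-it-all gamble succeeds
FLOOR = 1.0     # what is left after "blowing it all up"

safe = math.log(SHARE * PIE)                                    # ~23.0
gamble = P_WIN * math.log(PIE) + (1 - P_WIN) * math.log(FLOOR)  # ~13.8

print(f"sure 1% share: {safe:.1f}  vs  50/50 gamble for it all: {gamble:.1f}")
# Multiplying the prize 100x adds only log(100) ~ 4.6 utils, while risking
# ruin removes far more, so with log utility 'wanting it all' is irrational.
```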
Hmm I didn't spend 47 hours watching the interview, reading all three long posts, and clicking the central links but
Is this not just a dude tryna make a buck here?
I mean if you start a fund to make money on orange dogs (at the same exact time you wrote your essays) then you should be totally ignored when you talk about how great and necessary orange dogs are?
Based on how he engaged with me privately I am confident that he is not just a dude tryna make a buck.
(I am not saying he is not also trying to make a buck.)
Unlikely, since he could have walked away with a million dollars instead of doing this. (Per Zvi's other post, "Leopold was fired right before his cliff, with equity of close to a million dollars. He was offered the equity if he signed the exit documents, but he refused.")
So I generally think this type of incentive affecting people's views is important to consider. Though I wonder, couldn't you make counterarguments along the lines of "oh well, if they're really so great why don't you try to sell them and make money? Because they're not great." And "If you really believed this was important, you would bet proportional amounts of money on it."
My biggest problem with Leopold's project is this: in a world where his models hold up, where superintelligence is right around the corner, a US/China race is inevitable, and the winner really matters, publishing these essays on the open internet is very dangerous. It seems just as likely to help the Chinese side as the US.
If China prioritizes AI (if they decide it is even one tenth as important as Leopold suggests), I'd expect their administration to act more quickly and competently than the US. I don't have a good reason to think Leopold's essays will have a bigger impact in the US government than in the Chinese one, or vice versa (I don't think it matters much that they were written in English). My guess is that they've been read by some USG staffers, but I wouldn't be surprised if things die out with the excitement of the upcoming election and partisan concerns. On the other hand, I wouldn't be surprised if they're already circulating in Beijing. If not now, then maybe in the future: now that these essays are published on the internet, there's no way to take them back.
What's more, it seems possible to me that framing things as a race, and saying cooperation is "fanciful", may (in a self-fulfilling prophecy way) make a race more likely (and cooperation less).
Another complicating factor is that there's just no way the US could run a secret project without China getting word of it immediately. With all the attention paid to the top US labs and research scientists, they're not going to all just slip away to New Mexico for three years unnoticed. (I'm not sure if China could pull off such a secret project, but I wouldn't rule it out.)
A slight silver lining: I'm not sure a world in which China "wins" the race is all that bad. I'm genuinely uncertain. Let's take Leopold's objections, for example:
I genuinely do not know the intentions of the CCP and their authoritarian allies. But, as a reminder: the CCP is a regime founded on the continued worship of perhaps the greatest totalitarian mass-murderer in human history (“with estimates ranging from 40 to 80 million victims due to starvation, persecution, prison labor, and mass executions”); a regime that recently put a million Uyghurs in concentration camps and crushed a free Hong Kong; a regime that systematically practices mass surveillance for social control, both of the new-fangled (tracking phones, DNA databases, facial recognition, and so on) and the old-fangled (recruiting an army of citizens to report on their neighbors) kind; a regime that ensures all text messages passes through a censor, and that goes so far to repress dissent as to pull families into police stations when their child overseas attends a protest; a regime that has cemented Xi Jinping as dictator-for-life; a regime that touts its aims to militarily crush and “reeducate” a free neighboring nation; a regime that explicitly seeks a China-centric world order.
I agree that all of these are bad (very bad). But I think they're all means to preserve the CCP's control. With superintelligence, preservation of control is no longer a problem.
I believe Xi (or choose your CCP representative) would say that the ultimate goal is human flourishing, that all they do to maintain control is to preserve communism, which exists to make a better life for their citizens. If that's the case, then if both sides are equally capable of building it, does it matter whether the instruction to maximize human flourishing comes from the US or China?
(Again, I want to reiterate that I'm genuinely uncertain here.)
I believe Xi (or choose your CCP representative) would say that the ultimate goal is human flourishing
I'm very much worried that this sort of thinking is a severe case of Typical Mind Fallacy.
I think the main terminal values of the individuals constituting the CCP – and I do mean terminal, not instrumental – are the preservation of their personal status, power, and control, like the values of ~all dictatorships, and most politicians in general. Ideology is mostly just an aesthetic, a tool for internal and external propaganda/rhetoric, and the backdrop for internal status games.
There probably are some genuine shards of ideology in their minds. But I expect minuscule overlap between their at-face-value ideological messaging, and the future they'd choose to build if given unchecked power.
On the other hand, if viewed purely as an organization/institution, I expect that the CCP doesn't have coherent "values" worth talking about at all. Instead, it is best modeled as a moral-maze-like inertial bureaucracy/committee which is just replaying instinctive patterns of behavior.
I expect the actual "CCP" would be something in-between: it would intermittently act as a collection of power-hungry ideology-biased individuals, and as an inertial institution. I have no idea how this mess would actually generalize "off-distribution", as in, outside the current resource, technology, and power constraints. But I don't expect the result to be pretty.
Mind, something similar holds for the USG too, if perhaps to a lesser extent.
I would argue that leaders like Xi would not immediately choose general human flourishing as the goal. Xi has a giant chip on his shoulder. I suspect (not with any real proof, but just from a general intuition) that he feels western powers humiliated imperial China and that permanently disabling them is the first order of business. That means immediately dissolving western governments and placing them under CCP control. Part of human flourishing is the feeling of agency. Having a foreign government use AI to remove their government is probably not conducive to human flourishing. Instead, it will produce utter despair and hopelessness.
Consider what the US did with Native Americans using complete tech superiority. Subjugation and decimation in the name of "improvement" and "reeducation." Their governments were eliminated. They were often forcibly relocated at gunpoint. Schools were created to beat out "savage" habits from children. Their children were seized and rehomed with Whites. Their languages were forcibly suppressed and destroyed. Many killed themselves rather than submit. That is what I'd expect to happen to the West if China gets AGI.
Unfortunately, given the rate at which things are moving, I expect the West's slight lead to evaporate. They've already fast-copied Sora. The West is unprepared to contend with a fully operational China. The countermeasures are half-hearted and too late. I foresee a very bleak future.
The most obvious reason for skepticism of the impact that would cause follows.
David Manheim: I do think that Leopold is underrating how slow much of the economy will be to adopt this. (And so I expect there to be huge waves of bankruptcies of firms that are displaced / adapted slowly, and resulting concentration of power, but also some delay as assets change hands.)
I do not think Leopold is making that mistake. I think Leopold is saying both that the remote worker will be a seamless integration, and that he does not much care how fast most businesses adapt to it. As long as the AI labs (and those in their supply chains?) are using the drop-in workers, who else does so mostly does not matter. The local grocery store refusing to cut its operational costs won’t much postpone the singularity.
I want to clarify the point I was making - I don't think that this directly changes the trajectory of AI capabilities, I think it changes the speed at which the world wakes up to those possibilities. That is, I think that in worlds with the pace of advances he posits, the impacts on the economy are slower than the advances in AI, and we get a faster capabilities takeoff than we do economic impacts that make the transformation fully obvious to the rest of the world.
The more important point, in my mind, is what this means for geopolitics, which I think aligns with your skepticism. As I said responding to Leopold's original tweet: "I think that as the world wakes up to the reality, the dynamics change. The part of the extensive essay I think is least well supported, and least likely to play out as envisioned, is the geopolitical analysis. (Minimally, there's at least as much uncertainty as AI timelines!)"
I think the essay showed lots of caveats and hedging about the question of capabilities and timelines, but then told a single story about geopolitics - one that I think is both unlikely, and that fails to notice the critical fact that this is describing a world where government is smart enough to act quickly, but not smart enough to notice that we all die very soon. To quote myself again, "I think [this describes] a weird world where military / government "gets it" that AGI will be a strategic decisive advantage quickly enough to nationalize labs, but never gets the message that this means it's inevitable that there will be loss of control at best."
Natural gas is a fact question. I have multiple sources who confirmed Leopold’s claims here, so I am 90% confident that if we wanted to do this with natural gas we could do that. I am 99%+ sure we need to get our permitting act together, and would even without AI as a consideration…
A key consideration is that if there is not time to build green energy including fission, and we must choose, then natural gas (IIUC) is superior to oil and obviously vastly superior to coal.
My other comment outlined how >20% of US electricity could be freed up quickly by conservation driven by high electricity prices. The other way the US could get >20% of current US electricity for AI without building new power plants is running the ones we have more. This can be done quickly for natural gas by taking it away from other uses (the high price will drive conservation). There are not that many other uses for coal, but agricultural residues or wood could potentially be used to co-fire in coal power plants. If people didn’t mind spending a lot of money on electricity, petroleum distillates could be diverted to some natural gas power plants.
A reckless China-US race is far less inevitable than Leopold portrayed in his situational awareness report. We’re not yet in a second Cold War, and as things get crazier and leaders get more stressed, a “we’re all riding the same tiger” mentality becomes plausible.
I don't really get why people keep saying this. They do realize that the US's foreign policy starting in ~2010 has been to treat China as an adversary, right? To the extent that they arguably created the enemy they feared within just a couple of years? And that China is not in fact going to back down because it'd be really, really nice of them if they did, or because they're currently on the back foot with respect to AI?
At some point, "what if China decides that the west's chip advantage is unacceptable and glasses Taiwan and/or Korea about it" becomes a possible future outcome worth tracking. It's not a nice or particularly long one, but "flip the table" is always on the table.
Leopold’s is just one potential unfolding, but a strikingly plausible one. Reading it feels like getting early access to Szilard’s letter in 1939.
What, and that triggered no internal valence-washing alarms in you?
Getting a 4.18 means that a majority of your grades were A+, and that is if every grade was no worse than an A. I got plenty of As, but I got maybe one A+. They do not happen by accident.
One knows how the game is played, and is curious whether he took Calc I at Columbia (say). Obviously not sufficient, but there are kinds and kinds of 4.18 GPAs.
I have thoughts on the impact of AI on nuclear deterrence, and the claims made about it in the post.
But I'm uncertain whether it's wise to discuss such things publicly.
Curious if folks have takes on that. (The meta question)
My take is that in most cases it's probably good to discuss publicly (but I wouldn't be shocked to become convinced otherwise).
The main plausible reason I see for it potentially being bad is if it were drawing attention to a destabilizing technology that otherwise might not be discovered. But I imagine most thoughts are kind of going to be chasing through the implications of obvious ideas. And I think that in general having the basic strategic situation be closer to common knowledge is likely to reduce the risk of war.
(You might think the discussion could also have impacts on the amount of energy going into racing, but that seems pretty unlikely to me?)
If AI ends up intelligent enough and with enough manufacturing capability to threaten nuclear deterrence, I'd expect it to also deduce any conclusions I would.
So it seems mostly a question of what the world would do with those conclusions earlier, rather than not at all.
A key exception is if later AGI would be blocked on certain kinds of manufacturing to create its destabilizing tech, and if drawing attention to that earlier starts that serially blocking work earlier.
All our discussions will be repeated ad nauseam in DoD boardrooms with people whose job it is to talk about info hazards. And I also doubt discussion here will move the needle much if Trump and Jake Paul have already digested these ideas.
Lots of points of view in this; here is an AI "Narration" of this post where every unique quoted person is given their own distinct "Voice":
https://askwhocastsai.substack.com/p/the-leopold-model-analysis-and-reactions
Previously: On the Podcast, Quotes from the Paper
This is a post in three parts.
The first part is my attempt to condense Leopold Aschenbrenner’s paper and model into its load bearing elements and core logic and dependencies.
Two versions here, a long version that attempts to compress with minimal loss, and a short version that gives the gist.
The second part goes over where I agree and disagree, and briefly explains why.
The third part is the summary of other people’s reactions and related discussions, which will also include my own perspectives on related issues.
My goal is often to ask ‘what if.’ There is a lot I disagree with. For each subquestion, what would I think here, if the rest was accurate, or a lot of it was accurate?
Summary of Biggest Agreements and Disagreements
I had Leopold review a draft of this post. After going back and forth, I got a much better idea of his positions. They turned out to be a lot closer to mine than I realized on many fronts.
The biggest agreement is on cybersecurity and other security for the labs. I think this holds with or without the rest of the argument. This includes the need to build the data centers in America.
The arguments on power seem mostly right if the path to AGI and ASI is centrally hyperscaling on current approaches and happens quickly. I am highly uncertain about that being the path, but also America badly needs permitting reform and to build massive new sources of electrical power.
Those are both areas where the obvious responses are great ideas even if we are not on the verge of AGI and will remain in the ‘economic normal’ scenario indefinitely. We should do those things.
Otherwise, while Leopold’s scenarios are possible, I am not convinced.
I do not believe America should push to AGI and ASI faster. I am still convinced that advancing frontier models at major labs faster is a mistake and one should not do that. To change that, fixing security issues (so we didn’t fully hand everything over) and power issues (so we would be able to take advantage) would be necessary but not sufficient, due to the disagreements that follow.
The most central crux is that to me a path like Leopold predicts and advocates for too frequently dooms humanity. He agrees that it is a risky path, but has more hope in it than I do, and sees no better alternative.
Everyone loses if we push ahead without properly solving alignment. Leopold and I agree that this is the default outcome, period. Leopold thinks his plan would at least make that default less inevitable and give us a better chance, whereas I think the attitude here is clearly inadequate to the task and we need to aim for better.
We also have to solve for various dynamics after solving alignment. I do not see the solutions offered here as plausible. And we have to avoid this kind of conflict escalation turning into all-out war before decisive strategic advantage is achieved.
Alignment difficulty is a key crux for both of us. If I was as bullish on alignment as Leopold, that would change a lot of conclusions. If Leopold was sufficiently skeptical on alignment, that would greatly change his conclusions as well.
When I say ‘bullish on alignment’ I mean compared to what I believe is the reasonable range of outlooks – there are many who are unreasonably bullish, not taking the problem seriously.
The second central crux is that I am far less hopeless about alternative options even if Leopold’s tech model is mostly right. If I knew his tech model was mostly right, and was convinced that the only alternative future was China rushing ahead unsafely and being likely to go critical, then there would be no reasonable options.
If I was as bearish on alternatives as Leopold, but remained at my current level of bearishness on alignment, and the tech worked out like he predicts, then I would largely despair, p(doom) very high, but would try to lay groundwork for the least bad options.
I am also not so confident that governments can’t stay clueless, or end up not taking effective interventions, longer than they can stay governments.
I am far less confident in the tech model that scaling is all you need from here. I have less faith in the straight lines on graphs continuing, or meaning in practice what Leopold thinks they would. We agree that the stages of human development metaphor (‘preschooler,’ ‘smart undergraduate’ and so on) is a flawed analogy, but I think it is more flawed than Leopold does, although I think there is a decent chance it ends up getting an approximately right answer.
If we do ‘go critical’ and get to Leopold’s drop-in workers or similar, then I think if anything his timelines from there seem strangely slow.
I don’t see the ‘unhobbling’ terminology as the right way to think about scaffolding, although I’m not sure how much difference that makes.
Decision Theory is Important
I suspect a lot of this is a decision theory and philosophical disagreement.
Eliezer noted that in the past Leopold often seemingly used causal decision theory without strong deontology.
If you read the Situational Awareness paper, it is clear that much of it is written from the primary perspective of Causal Decision Theory (CDT), and in many places it uses utilitarian thinking, although there is some deontology around the need for future liberal institutions.
Thinking in CDT terms restricts the game theoretic options. If you cannot use a form of Functional Decision Theory (FDT) and especially if you use strict CDT, a lot of possible cooperation becomes inevitable conflict. Of course you would see no way to avoid a race other than to ensure no one else has any hope they could win one. Also, you would not ask if your type of thinking was exactly what was causing the race in the first place in various ways. And you would not notice the various reasons why one might not go down this road even if the utilitarian calculus had it as positive expected value.
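To make the point concrete, here is a minimal toy sketch (hypothetical payoffs of my own, not anything from Leopold or the paper): in a one-shot race-or-cooperate game between two actors running correlated decision procedures, a CDT reasoner holds the other side's move fixed and races regardless, while an FDT reasoner compares the correlated outcomes and can choose cooperation.

```python
# A toy sketch (my own hypothetical payoffs, not anything from the paper) of
# why the decision theory matters: two actors with correlated decision
# procedures each choose Race or Cooperate. Mutual cooperation beats mutual
# racing, but racing is the dominant move if the other side's choice is fixed.

PAYOFF = {  # (my_move, their_move) -> my payoff
    ("cooperate", "cooperate"): 3,
    ("cooperate", "race"): 0,
    ("race", "cooperate"): 4,
    ("race", "race"): 1,
}

def cdt_choice(their_move):
    """CDT holds the other actor's move fixed and best-responds: racing
    dominates no matter what they do."""
    return max(["cooperate", "race"], key=lambda m: PAYOFF[(m, their_move)])

def fdt_choice():
    """FDT treats the two (mirror-image) decision procedures as correlated:
    whatever this procedure outputs, the counterpart outputs too, so only
    the diagonal outcomes are on the table."""
    return max(["cooperate", "race"], key=lambda m: PAYOFF[(m, m)])

print("CDT, assuming they cooperate:", cdt_choice("cooperate"))  # race
print("CDT, assuming they race:     ", cdt_choice("race"))       # race
print("FDT vs a correlated twin:    ", fdt_choice())             # cooperate
```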
Another place this has a big impact is alignment.
If you think that alignment means you need to end up with an FDT agent rather than a CDT agent to avoid catastrophe, then that makes current alignment techniques look that much less hopeful.
If you think that ASIs will inevitably figure out FDT and thus gain things like the ability to all act as if they were one unified agent, or do other things that seem against their local incentives, then a lot of your plans to keep humans in control of the future or get an ASI to do the things you want and expect will not work.
Part 1: Leopold’s Model and Its Implications
The Long Version
In my own words, here is my attempt to condense Leopold’s core claims (with varying levels of confidence) while retaining all the load bearing arguments and points.
Category 1 (~sec 1-2): Capabilities Will Enable ASI (superintelligence) and Takeoff
Category 2 (~sec 3c): Alignment is an Engineering Problem
Category 3 (~sec 3a): Electrical Power and Physical Infrastructure Are Key
Category 4 (~sec 3b): We Desperately Need Better Cybersecurity
Category 5 (~sec 3d and 4): National Security and the Inevitable Conflicts
Thus the conclusion: The Project is inevitable and it must prevail. We need to work now to ensure we get the best possible version of it. It needs to be competent, to be fully alignment-pilled and safety-conscious with strong civilian control and good ultimate aims. It needs to be set up for success on all fronts.
That starts with locking down the leading AI labs and laying groundwork for needed electrical power, and presumably (although he does not mention this) growing state capacity and visibility to help The Project directly.
The Short Version
The categories here correspond to the sections in the long version.
Which Assumptions Are How Load Bearing in This Model?
There are many bold and controversial claims. Which of them are how load bearing?
The entire picture mostly depends on Category 1.
If AI is not going to ‘go critical’ in the relevant sense any time soon (the exact timing is not that relevant, 2027 vs. 2029 changes little), then most of what follows becomes moot. Any of the claims here could break this if sufficiently false. The straight lines could break, or they could not mean what Leopold expects in terms of practical benefits, or bottlenecks could stop progress from accelerating.
AI would still be of vital economic importance. I expect a lot of mundane utility and economic growth that is already baked in. It will still play a vital role in defense and national security. But it would not justify rapidly building a trillion dollar cluster. Strain on physical infrastructure and power generation would still be important, but not at this level. A lead in tech or in compute would not be critical in a way that justifies The Project, nor would the government attempt it.
The concerns about cybersecurity would still remain. The stakes would be lower, but would very much still be high enough that we need to act. Our cybersecurity at major labs is woefully inadequate even to mundane concerns. Similarly, AI would be an excellent reason to do what we should already be doing on reforming permitting and NEPA, and investing in a wealth of new green power generation including fission.
If Category 2 is wrong, and alignment or other associated problems are much harder or impossible, but the rest is accurate, what happens?
Oh no.
By default, this plays out the standard doomed way. People fool themselves into thinking alignment is sufficiently solved, or they decide to take the chance because the alternative is worse, or events get out of everyone’s control, and someone proceeds to build and deploy ASI anyway. Alignment fails. We lose control over the future. We probably all die.
Or ‘alignment’ superficially succeeds in the ‘give me what I want’ sense, but then various dynamics and pressures take over, and everything gets rapidly handed over to AI control, and again the future is out of our hands.
The ‘good scenarios’ here are ones where we realize that alignment and related issues are harder and that the current solutions probably or definitely fail, badly enough that people pause or take extreme measures despite the geopolitical and game theory nightmares involved. Different odds change the incentives and the game. If we are lucky, The Project means there are only a limited number of live players, so an agreement becomes possible, and can then be enforced against others.
Leopold’s strategy is a relatively safe play only in a narrow window: one where what matters is time near the end (presumably due to using the AI researchers), where what otherwise looks like a modest ‘margin of safety’ in calendar time would actually be used (you have to be sufficiently confident you have it or you won’t use it), and where that time works at turning losses into wins. If you don’t need that time at the end, you do not need to force a race to get a large lead, and you have likely taken on other risks needlessly. If the time at the end is not good enough, then you lose anyway.
Category 3 matters because it is the relevant resource where China most plausibly has an edge over the United States, and it might bring other players into the game.
America has a clear lead right now. We have the best AI labs and researchers. We have access to superior chips. If electrical power is not a binding constraint, then even if China did steal many of our secrets we would still have the advantage. In general the race would then seem much harder to lose and the lead likely to be larger, but we can use all the lead and safety margin we can get.
We would also not have to worry as much about environmental concerns. As usual, those concerns take forms mostly unrelated to the actual impact on climate. If we build a data center in the UAE that runs on oil instead of one in America that runs on natural gas, we have not improved the outlook on climate change. We have only made our ‘American’ numbers look better by foisting the carbon off on the UAE. We could of course avoid that by not building the data centers at all, but if Leopold is right then that is not a practical option nor would it be wise if it were.
Also, if Leopold is right and things escalate this quickly, then we could safely set climate concerns aside during the transition period, and use ASI to solve the problem afterwards. If we build ASI and we cannot use it to find a much better way to solve climate, then we (along with the rest of the biosphere, this is not only about humans) have much bigger problems and were rather dead anyway.
Category 4 determines whether we have a local problem in cybersecurity and other security, and how much we need to do to address this. The overall picture does not depend on it. Leopold’s routes to victory depend on indeed fixing this problem, at which point we are in that scenario anyway. So this being wrong would be good news, and reduce the urgency of government stepping in, but not alter the big picture.
Category 5 has a lot of load bearing components, where if you change a sufficient combination of them the correct and expected responses shift radically.
If you think having the most advanced ASI does not grant decisive strategic advantage (29 and 30, similar to category 1), then the implications and need to race fall away.
If you do not believe that China winning is that bad relative to America winning (31 and 32, also 34) (or you think China winning is actively good, as many Chinese do) then the correct responses obviously change. If you think China would be as good or better than us on alignment and safety, that might or might not be enough for you.
However the descriptive or expected responses do not change. Even if China ‘winning’ would be fine because we all ultimately want similar things and the light cone is big enough for everyone, decision makers in America will not be thinking that way. The same goes for China’s actions in reverse.
If you think China is not in it to win it and they are uncompetitive overall (various elements of 33 and 34, potentially 36, similar to category 3) or you can secure our lead without The Project, then that gives us a lot more options to have a larger margin of safety. As the government, if you get situationally aware you still likely want The Project because you do not want to hand the future to Google or OpenAI or allow them to proceed unsafely, and you do not want them racing each other or letting rogue actors get WMDs, but you can wait longer for various interventions. You would still want cybersecurity strengthened soon to defend against rogue actors.
If you think either or both governments will stay blind until it is too late to matter (35 and 36) then that changes your prediction. If you don’t know to start The Project you won’t, even if it would be right to do so.
If America’s response would be substantially different than The Project (37) even after becoming situationally aware, that alters what is good to do, especially if you are unlikely to impact whether America does start The Project or not. It might involve a purer form of nationalization or mobilization. It might involve something less intrusive. There is often a conflation of descriptive and normative claims in situational awareness.
If some combination of governments is more concerned with existential risk and alignment, or with peace and cooperation, than Leopold expects, or there is better ability to work out a deal that will stick (38, 39 and 40) then picking up the phone and making a deal becomes a better option. The same goes if the other side remains asleep and doesn’t realize the implications. The entire thesis of The Project, or at least of this particular project, depends on the assumption that a deal is not possible except with overwhelming strength. That would not mean any of this is easy.
It would be easy to misunderstand what Leopold is proposing.
He confirms he very much is NOT saying this:
I strongly disagree with this first argument. But so does Leopold.
Instead, he is saying something more like this:
This brings us to part 2.
Part 2: Where I Agree and Disagree
On the core logic of the incorrect first version above, since I think this is a coherent point of view to have and one that it is easy to come away considering:
This ‘strawman’ version very much relies on assumptions of Causal Decision Theory and a utilitarian framework, as well.
What about the second version, that (I think) better reflects Leopold’s actual thesis? In short:
Now I’ll go statement by statement.
The probabilities here are not strongly held or precise, rather they are here because some idea of where one is at is far better than none. I am discounting scenarios where we face unrelated fast existential risks or civilizational collapse.
On Category 1 on timelines, I see Leopold as highly optimistic about how quickly and reliably things will get to the point of going critical; after that, his timeline seems if anything slow to me.
I am not confident that LLMs, if scaled up, get you something good enough to fill this role or otherwise go critical. And I am certainly not so confident it happens this fast. But if they do offer that promise, the rest not only seems to follow; if anything Leopold’s timelines from there seem highly conservative.
Category 2 on alignment is where I think Leopold is most off base.
Category 3 on power is an area I haven’t thought about until recently, and where I know relatively little. I mostly buy the arguments here. If you do want to speed ahead, you will need the power.
Category 4 on cybersecurity I am pretty much in violent agreement with Leopold. He might be high on the value of current algorithmic improvements, and I worry he may be low on the value of current frontier models. But yeah. We need to fix this issue.
Category 5 brings it all together. Centrally I am higher on potential cooperation, and much higher on alignment difficulty and the chance racing forward causes everyone to lose, and I also am more confident in our lead if it comes to that. I also have less faith that the authorities will wake up in time.
What about considerations that are missing entirely?
Again, decision theory. If you are an FDT agent (you use functional decision theory) and you think other high stakes agents also plausibly use FDT including ASIs, then that changes many aspects of this.
Part 3: Reactions of Others
The Basics
Always start there.
I expected my p(drop-in remote workers by 2027) would be substantially lower than Aschenbrenner’s, but we did compare notes, and we are not that far apart.
The most obvious reason for skepticism of the impact that would cause follows.
I do not think Leopold is making that mistake. I think Leopold is saying both that the remote worker will be a seamless integration, and that he does not much care how fast most businesses adapt to it. As long as the AI labs (and those in their supply chains?) are using the drop-in workers, who else does so mostly does not matter. The local grocery store refusing to cut its operational costs won’t much postpone the singularity.
As always, it is great when people say what they believe, predict and will do.
I think parts of this are the lab playbook, especially the tech section, alas also largely the alignment section. Other parts are things those companies would prefer to avoid.
Perhaps I am overly naive on this one, but I do not think Leopold is being dishonest. I think the central model of the future here is what he expects, and he is advocating for what he thinks would be good. It is of course also rhetoric, designed to be convincing to certain people especially in national security. So it uses language they can understand. And of course he is raising funding, so he wants to excite investors. The places things seem odd and discordant are around all the talk of patriotism and the Constitution and separation of powers and such, which seems to be rather laid on thick, and in the level of expressed confidence at several other points.
Would an edge in ASI truly give decisive strategic advantage in a war? As Sam Harsimony notes here before arguing against it, that is load bearing. Harsimony thinks others will be able to catch up quickly, rather than the lead growing stronger, and that even a few years is not enough time to build the necessary stuff for a big edge, and that disabling nukes in time is a pipe dream.
I am with Leopold on this one. Unless the weights get stolen, I do not think ‘catching up’ will be the order of the day, and the effective lead will get longer not shorter over time. And I think that with that lead and true ASI, yes, you will not need that much time building physical stuff to have a decisive edge. And yes, I would expect that reliably disabling or defending against nukes will be an option, even if I cannot detail exactly how.
A Clarification from Eliezer Yudkowsky
Eliezer linked back to this after someone asked if Eliezer Yudkowsky disliked Aschenbrenner.
At the time, OpenAI had fired Leopold saying it was for leaking information, and people were trying to blame Yudkowsky by proxy, and saying ‘never hire alignment people, they cannot be trusted.’ The principle seems to be: Blame everything anyone vaguely safety-related does wrong on AI safety in general and often Eliezer Yudkowsky in particular.
That was always absurd, even before we learned how flimsy the ‘leak’ claim was.
It is easy to see, reading Situational Awareness, why Aschenbrenner was not optimistic about MIRI and Yudkowsky’s ideas, or the things they would want funded. These are two diametrically opposed strategies. Both world models have a lot in common, but both think the other’s useful things are not so useful and the counterproductive actions could be quite bad.
Rob Bensinger then relays a further update from Eliezer Yudkowsky, which I will relay in full.
As I note above, I would use stronger language, and am more confident that Leopold did not break confidentiality in a meaningful way.
How ironic that the shoe is now on the other foot, Mr. Bond.
Or is it?
From an outside perspective, Leopold is making extraordinary claims. One could ask how much he is updating based on others often having very different views, or why he is substituting his judgment and model rather than being modest.
From an insider perspective, perhaps Leopold is simply reflecting the consensus perspective at the major labs. We should have long ago stopped acting surprised when yet another OpenAI employee says that AGI is coming in the late 2020s. So perhaps that part is Leopold ‘feeling the AGI’ and the faith in straight lines on graphs. If you read how he writes about this, he does not sound like a person thinking he is making a bold claim.
Indeed, that is the thesis of situational awareness, that there is a group of a few hundred people who get it, and everyone else is about to get blindsided and their opinions should be discounted.
There was a clear philosophical disagreement on things like decision theory and also strong disagreements on strategy, as mentioned above.
I remember that situation, and yeah this seemed like a wise thing to say at the time.
Children of the Matrix
Many questioned Leopold’s metaphor of using childhood development as a stand-in for levels of intelligence.
I think Leopold’s predictions on effective capabilities could prove right, but that the metaphor was poor, and intelligence does need to be better defined.
For example:
Ate-a-Pi offers more in-depth thoughts here. Doubts the scaling parts but considers them plausible, thinks cost will slow things down, that not everyone wants the tech, that China is only 15 months behind and will not dare actually try that hard to steal our secrets, and might clamp down on AI rather than race to defend CCP supremacy.
Or here:
Or:
Similarly, with other thoughts as well:
Gary Marcus says this in the style of Gary Marcus. His wise central point is that no matter the timeline, the concerns about us being unprepared remain valid.
Aligning a Smarter Than Human Intelligence is Difficult
Seriously. Super hard. Way harder than Leopold thinks.
Leopold knows that alignment is difficult in some ways, but far from the complete set of ways and magnitudes that alignment is difficult.
Indeed, it is great that Leopold recognizes that his position is incredibly bullish on alignment techniques, and that he is taking a bold position saying that, rather than denying that there is a very difficult problem.
I am not attempting here to argue the case that alignment is super difficult and go all List of Lethalities, the same way that Leopold offered a sketch but did not offer an extensive argument for why we should be bullish on his alignment techniques.
One exception is that I will plant the flag that I do not believe, in the most important cases for AGI, that evaluation is easier than generation. Aaron Scher here offers an argument that this is untrue for papers. I think the same holds for outputs generally, in part because your evaluation needs to be not boolean but bespoke.
Another is I will point to the decision theory issues I raised earlier.
Beyond that, I am only listing responses that were explicitly aimed at Leopold.
Richard Ngo, a former OpenAI colleague of Leopold’s, outlines one failure scenario, listed because he wrote this as a deliberate response to Leopold rather than because I believe it is the strongest case.
Steve’s response at ‘Am I Stronger Yet?’ affirms that if you buy the premises, you should largely buy the conclusion, but emphasizes how difficult alignment seems in this type of scenario, and also does not buy anything like the 2030 timeline.
The Sacred Timeline
Similar to alignment, I do not share Leopold’s timeline assessments, although here I have much higher uncertainty. I have no right to be shocked if Leopold is right.
Others are more skeptical than that. For example, Dave Friedman’s response is all about why he thinks timelines are not so short, largely citing the data issues. Leopold has explained why he thinks we can overcome that.
No, no, say the people, even if you buy the premise the calculations are all wrong!
No, no, the calculation is super imprecise at best.
What if the timeline is right, though?
If the timelines are indeed this short, then quite a lot of plans are useless.
The Need to Update
Leopold used to have unreasonably skeptical and long timelines as late as 2021. Note that this was very early in his thinking about these problems.
I strongly agree that we should not penalize those who radically change their minds about AI and especially about AI timelines. I would extend this beyond 2023.
It still matters how and why someone made the previous mistakes, but mostly I am happy to accept ‘I had much worse information, I thought about the situation in the wrong ways, and I was wrong.’ In general, these questions are complex, they depend on many things, there are conflicting intuitions and comparisons, and it is all very difficult to face and grapple with. So yeah, basically free pass, and I mostly would continue that pass to this day.
The report also contained a view that doom only happens if a fixed set of things go wrong, so avoiding those failures would be enough to avoid doom. He no longer endorses that, but I find it important to offer a periodic reminder that my view of that is very different:
By construction, in these scenarios, we bring into existence large numbers of highly capable and intelligent systems, that are more efficient and competitive than humans. Those systems ending up with an increasing percentage of the resources and power and then in full control is the default outcome, the ‘natural’ outcome, the one requiring intervention to prevent. Solving alignment in the way that term is typically used would not be sufficient to change this.
Open Models and Insights Can Be Copied
Remarkable number of people in the replies here who think the way to deal with CCP stealing our secrets is to give away our secrets before they can be stolen? That’ll show them. Also others who do not feel the AGI, or warning that stoking such fears is inherently bad. A reminder that Twitter is not real life.
Similarly, here is Ritwik Gupta praising Leopold’s paper but then explaining that innovation does not happen in secret, true innovation happens in the open, and that to keep us competitive with the people who are currently way behind our closed models (which are developed in poorly-kept-from-our-enemies secret) it is vital that we instead keep development open and then deploy open models. Which the CCP can then copy. As usual, openness equals good is the assumption.
If you want to argue ‘open models and open sharing of ideas promotes faster development and wider deployment of powerful and highly useful AI’ then yes, it absolutely does that.
If you want to argue that matters and we should not be worried about rogue actors or beating China, then yes, that is also a position one can argue.
If you want to argue ‘open models and open sharing of ideas are how we beat China,’ as suddenly I see many saying, then I notice I am confused as to why this is not Obvious Nonsense.
You Might Not Be Paranoid If They’re Really Out to Get You
A key obviously true and important point from Leopold is that cybersecurity and other information security at American AI labs is horribly inadequate given the value of their secrets. We need to be investing dramatically more in information security and cybersecurity, ideally with the help of government expertise.
This seems so transparently, obviously true to me.
The counterarguments are essentially of the form ‘I do not want that to be true.’
OpenAI told Leopold that concerns about CCP espionage were ‘racist.’ This is Obvious Nonsense and we should not fall for it, nor do I think the admonishment was genuine. Yes, spying is a thing, countries spy on each other all the time, and we should focus most on our biggest and most relevant rivals. Which here means China, then Russia, then minor states with little to lose.
One can worry about espionage without gearing up for an all-out race or war.
For an outside example, in addition to many of the comments referred to in the previous section, Mr. Dee here speaks out against ‘this fear mongering,’ and talks of researchers from abroad who want to work on AI in America, suggesting national security should be ‘vetted at the chip distribution level’ rather than ‘targeting passionate researchers.’
This does not address the question of whether the Chinese are indeed poised to steal all the secrets from the labs. If so, and I think that it is so, then that seems bad. We should try and prevent that.
Obviously we should not try and prevent that by banning foreigners from working at the labs. Certainly not those from our allies. How would that help? How would that be something one would even consider anywhere near current security levels?
But also, if this was indeed an existential race for the future, let us think back to the last time we had one of those. Remember when Sir Ian Jacob, Churchill’s military secretary, said we won WW2 because ‘our German scientists were better than their German scientists’? Or that Leopold proposes working closely with our allies on this?
We Are All There Is
If there is one place I am in violent agreement with Leopold, it is that there are no reasonable authority figures. Someone has to step up, and no one else will.
I quoted this before, but it bears repeating.
Are there other live players who do not care about whether the world falls apart, and are out to grab the loot (in various senses) while they can? A few, but in the relevant senses here they do not count.
The Inevitable Conflict
What is the worst thing you can do? It is definitely one of the two things.
It is worth noting again that Leopold’s position is not all ‘gung ho, build it as fast as possible.’
Right. That. Rob Bensinger’s model and Leopold Aschenbrenner’s model are similar in many other ways, but (at least in relative terms, and relative to my assessment as well) Leopold dramatically downplays the difficulty of alignment and dealing with related risks, and the likely results.
Leopold says very clearly that this transition is super scary. He wants to do it anyway, because he sees no better alternatives.
Rob’s argument is that Leopold’s proposal is not practical and almost certainly fails. We cannot solve alignment, not with that attitude and level of time and resource allocation. Rob would say that Leopold talks of the need for ‘a margin of safety’ but the margins he seeks are the wrong OOM (order of magnitude) to get the job done. Cooperation may be unlikely, but at least there is some chance.
Leopold’s response to such arguments is that Rob’s proposal, or anything like it, is not practical, and almost certainly fails. There is no way to get an agreement on it or implement it. If it was implemented, it would be an unstable equilibrium. He repeatedly says China will never be willing to play it safe. Whereas by gaining a clear insurmountable lead, America can gain a margin of safety and that is probably enough. It is super scary, but it will probably be fine.
Any third option would need to prevent unsafe ASI from being built, which means either no ASIs, or finding a way to safely build ASI and then either preventing others from building unsafe ASIs or staying far enough ahead of them that you can otherwise defend. That means either finding a solution quickly, or ensuring you have the time to find one slowly.
We also should be concerned that if we do get into a race, that race could turn hot. Leopold very much acknowledges this.
The ‘good’ news here is the uncertainty. You do not know when someone else is close to getting to AGI, ASI or RSI, and all of those progressions are gradual. There is never a clear ‘now or never’ moment. My presumption is this is sufficient to keep the battle cold and conventional most of the time. But perhaps not, especially around Taiwan.
The biggest risk in Leopold’s approach, including his decision to write the paper, is that it is a CDT (causal decision theory)-infused approach that could wake up all sides and make cooperation harder, and thus risks causing the very crisis it predicts and wants to prevent.
Leopold does look forward to abundance and pricing galaxies, especially on the podcast, but doesn’t talk much about those problems presumably because they seem solvable once we come to them, especially with ASI help.
I do however think we need to be open to the possibility that conflict might become necessary.
We may wind up with only bad options.
There Are Only Least Bad Options
Pointing out something unacceptable about a proposed approach does not mean there is a better one. You have to find the better one. A solution that sounds good, that adheres to what are otherwise your principles, only works if it would actually work. I have not heard a solution proposed that does not involve solving impossible-level problems, or that does not involve large sacrifices or compromises.
If all potential outlooks are grim, one must choose whichever is least grim. I do not expect to get 99% confidence, nor do I think we can wait for that. If I thought the alignment and internal ASI dynamics issues in Leopold’s plan were 90% to work, I would be thrilled to take that, and I’d settle for substantially less. That is what realism requires. All the options are bad.
I am more optimistic about cooperative methods than Leopold, and more pessimistic about the difficulty of alignment and about which related problems must be solved in order to get through this.
A sufficiently strong change in either or both of these would be a crux. I am confident that ‘it will probably be fine’ is not the right attitude for going forward, but I am open to being convinced (if this scenario is coming to pass) that it could become our least bad option.
What happens if we are in this scenario except that Leopold and Rob are both right that the other’s solution is impractical? And there is no middle path or third option that is practical instead? Then we probably all die, no matter what path we attempt.
A Really Big Deal
Her timeline then seems to return to normal. But yes, responding to this by rethinking one’s life plan is a highly reasonable response if the information or reasoning was new to you and you buy the core arguments about the technology. Or even if you buy that they might be right, rather than definitely are right.
What Gives You the Right?
What about the fact that people pretty much everywhere would vote no on all this?
We have very strong evidence that people would vote no. If you educated them they would still vote no. Sure, if you knew and could prove a glorious future, maybe then they would vote yes, but you can’t know or prove that.
The standard answer is that new technology does not require a vote. If it did, we would not have our civilization. Indeed, consider the times we let the people veto things like nuclear power, and how much damage that did. The flip side is if we never put any restrictions on technologies, that is another scenario where we would not have our civilization. When the consequences and negative externalities get large enough, things change.
The better answer is ‘what do you mean the right?’ That is especially true in the context of a view that due to rivals like China, where people most definitely do not get a vote, development cannot be stopped. It does not matter what ‘the people’ want without a way for those people to get it.
We should absolutely choose and attempt to elect candidates that reflect our desire to not have anyone create superintelligence until we are ready, and convince our government to take a similar position. But we may or may not have a practical choice.
Random Other Thoughts
Peter Bowden notices the lack of talk about sentience and requests Leopold’s thoughts on the matter. I think it was right to not throw that in at this time, but yes I am curious.
There will always be bad takes, here’s a fun pure one.
Grady clarifies elsewhere that his primary objection is the timeline and characterization of AI capabilities, which is a fine place to object. But you have to actually say what is wrong and offer reasons.
Seriously, the Columbia thing is pretty insane to me.
Getting a 4.18 means that a majority of your grades were A+, and that is if every grade was no worse than an A. I got plenty of As, but I got maybe one A+. They do not happen by accident. The attitude to go that far class after class, that consistently, boggles my mind.
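A quick back-of-the-envelope check, assuming an A counts as 4.00 and an A+ as 4.33 (my assumption for Columbia's scale), every grade is at least an A, and all grades are equally weighted:

```python
# Back-of-the-envelope GPA check: assumes A = 4.00 and A+ = 4.33 (my
# assumption for Columbia's scale), every grade at least an A, equal weights.
A, A_PLUS, GPA = 4.00, 4.33, 4.18

p = (GPA - A) / (A_PLUS - A)  # fraction of grades that must be A+
print(f"minimum share of A+ grades: {p:.0%}")  # ~55%, i.e. a majority
```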
One way to summarize events in Leopold’s life, I suppose.