This is a linkpost for https://situational-awareness.ai/

The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace many college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word.

At the linked site, Leopold Aschenbrenner explains why he believes AGI is likely to arrive within the decade, with superintelligence following soon after. He does so in some detail; the website is well-organized, but the raw PDF is over 150 pages.

Leopold is a former member of OpenAI's Superalignment team; he was fired in April for allegedly leaking company secrets. However, he contests that portrayal of events in a recent interview with Dwarkesh Patel, saying he leaked nothing of significance and was fired for other reasons.[1]

That said, I am somewhat confused by the new business venture Leopold is now promoting: an "AGI Hedge Fund" aimed at generating strong returns based on his predictions of imminent AGI. In the Dwarkesh Patel interview, it sounds like his intention is to make sure financial resources are available to back AI alignment and any other moves necessary to help humanity navigate a turbulent future. However, the discussion in the podcast mostly focuses on whether such a fund would truly generate useful financial returns.

If you read this post, Leopold[2], could you please clarify your intentions in founding this fund?

  1. ^

    Specifically, he brings up a memo he sent to the old OpenAI board claiming that OpenAI wasn't taking security seriously enough. He was also one of very few OpenAI employees not to sign the letter asking for Sam Altman's reinstatement last November, and of course, the entire OpenAI superalignment team has collapsed for various reasons as well.

  2. ^

    Leopold does have a LessWrong account, but even after some time he hasn't linked his new website here. I hope he doesn't mind me posting in his stead.


(crossposted from twitter) Main thoughts: 
1. Maps pull the territory 
2. Beware what maps you summon 

Leopold Aschenbrenner's series of essays is a fascinating read: there are a ton of locally valid observations and arguments. A lot of the content is the type of stuff mostly discussed in private. Many of the high-level observations are correct.

At the same time, my overall impression is that the set of maps sketched pulls toward existential catastrophe, and this is true not only for the 'this is how things can go wrong' part, but also for the 'this is how we solve things' part. Leopold is likely aware of this angle of criticism, and deflects it with 'this is just realism' and 'I don't wish things were like this, but they most likely are'. I basically don't buy that claim.

[-]No77e

He's starting an AGI investment firm that invests based on his thesis, so he does have a direct financial incentive to make this scenario more likely 

(Though he also has an incentive to not die.)

I agree that it's a good read.

I don't agree that it "pulls towards existential catastrophe". Pulls towards catastrophe, certainly, but not existential catastrophe? He's explicitly not a doomer,[1] and is much more focused on really-bad-but-survivable harms like WW3, authoritarian takeover, and societal upheaval.

  1. ^

    Page 105 of the PDF, "I am not a doomer.", with a footnote where he links a Yudkowsky tweet agreeing that he's not a doomer. Also, he listed his p(doom) as 5% last year. I didn't see an updated p(doom) in Situational Awareness or his Dwarkesh interview, though I might have missed it.

For the question of 'pulls towards catastrophe', it doesn't matter whether the author believes their work pulls towards catastrophe. The direction of the pull is in the eye of the reader. Therefore, you must evaluate whether Jan (or you, or I) believe that the futures which Leopold's maps pull us toward will result in existential catastrophes. For a simplified explanation, imagine that Leopold is driving fast at night on a winding cliffside road, and his vision is obscured by a heads-up display of a map of his own devising. If his map directs him to take a left and he drives over the cliff edge... it doesn't matter where Leopold thought he would end up; it matters where he got to. If you are his passenger, you should care more about where you think his navigation is likely to actually land you than about where Leopold believes his navigation will end up.

I think this gets more tricky because of coordination. Leopold's main effect is in selling maps, not using them. If his maps list a town in a particular location, which consumers and producers both travel to expecting a town, then his map has reshaped the territory and caused a town to exist.

Pointing out one concrete dynamic here: most of his argument boils down to "we must avoid a disastrous AI arms race by racing faster than our enemies to ASI", but of course it is unclear whether an "AI arms race" would even exist if nobody were talking about an "AI arms race" - that is, if everyone were just following incentives and coordinating rationally with their competitors.

There's also obviously the classic "AGI will likely end the world, thus I should invest in / work on it since if it doesn't I'll be rich, therefore AGI is more likely to end the world" self-fulfilling prophecy that has been a scourge on our field since the founding of DeepMind.

Hm, I was interpreting 'pulls towards existential catastrophe' as meaning Leopold's map mismatches the territory because it overrates the chance of existential catastrophe.

If the argument is instead "Leopold publishing his map increases the chance of existential catastrophe" (by charging up race dynamics, for example), then I agree that's plausible. (Though I don't think the choice to publish it was inexcusable - the effects are hard to predict, and there's much to be said for trying to say true things.)

If the argument is "following Leopold's plan likely leads to existential catastrophe", same opinion as above.

Oh huh, I hadn't even considered that interpretation. Personally, I think Leopold's key error is in underrating how soon we will get to AGI if we continue as we have been, and in not thinking that that is as dangerous an achievement as I think it is.

So, if your interpretation of 'overrates chance of existential catastrophe' is correct, I am of the opposite opinion. Seems like Leopold expects we can make good use of AGI without a bunch more alignment. I think we'll just doom ourselves if we try to use it.

Personally, I think Leopold's key error is in underrating how soon we will get to AGI if we continue as we have been


His modal time-to-AGI is like 2027, with a 2-3 year intelligence explosion afterwards before humanity is ~ irrelevant. 

and in not thinking that that is as dangerous an achievement as I think it is.

Yeah this seems likely. 

Yes, and my modal time-to-AGI is late 2025 / early 2026. I think we're right on the brink of a pre-AGI recursive self-improvement loop which will quickly rocket us past AGI. I think we are already in a significant compute overhang and data overhang. In other words, software improvements alone could be more than sufficient. In short, I am concerned.

[-]Ann

The difference between these two estimates feels like it can be pretty well accounted for by reasonable expected development friction for prototype-humanish-level self-improvers, who will still be subject to many (minus some) of the same limitations that prevent "9 women from growing a baby in a month". You can predict they'll be able to lubricate more or less of that, but we can't currently strictly scale project speeds by throwing masses of software engineers and money at it.

I believe you are correct about the importance of taking these phenomena into account: the indivisibility of certain serial tasks, and the coordination overhead of larger team sizes. I do think that my model takes them into account.

It's certainly possible that my model is wrong. I feel like there's a lot of uncertainty in many key variables, and likely I have overlooked things. The phenomena you point out don't happen to be things that I neglected to consider though.

[-]Ann

I understand - my point is more that the difference between these two positions could be readily explained by you being slightly more optimistic in estimated task time when doing the accounting, and the voice of experience saying "take your best estimate of the task time, and double it, and that's what it actually is".

One example: Leopold spends a lot of time talking about how we need to beat China to AGI, and even talks about how we will need to build robo armies. He paints it as liberal democracy against the CCP. Seems that he would basically burn timeline and accelerate to beat China. At the same time, he doesn't really talk about his plan for alignment, which kind of shows his priorities. I think his narrative shifts the focus from the real problem (alignment).

This part shows some of his thinking. Dwarkesh makes some good counterpoints here, like asking how Donald Trump having this power is any better than Xi having it.

Page 87:

The clusters can be built in the US, and we have to get our act together to make sure it happens in the US.

No, we have to make sure it doesn't happen anywhere.

Page 110:

What we want is to add side-constraints: don’t lie, don’t break the law, etc.

That's very much not enough. A superintelligence will be much more economically powerful than humans. If it merely exhibits normal human levels of benevolence, truth-telling, law-obeying, money-seeking, power-seeking and so on, it will deprive humans of everything.

It's entirely legal to do jobs so cheaply that others can't compete, and to show people optimized messages to make them spend savings on consumption. A superintelligence merely doing these two things superhumanly well, staying within the law, is sufficient to deprive most people of everything. Moreover, the money incentives point to building superintelligences that will do exactly these things, while rushing to the market and spending the minimum on alignment.

Superintelligence requires super-benevolence. It must not be built for profit or for an arms race; it has to be built for good as the central goal. We've been saying this for decades. If AI researchers even now keep talking in terms like "add constraints to not break the law", we really are fucked.

That's very much not enough. A superintelligence will be much more economically powerful than humans. If it merely exhibits normal human levels of benevolence, truth-telling, law-obeying, money-seeking, power-seeking and so on, it will deprive humans of everything.

So, my guess at Leo's reaction is one of RLHF-optimism. Even a bizarre-sounding idea like "get your ASI to love the US constitution" might be rationalized as merely a way to get a normal-looking world after you do RLHF++. And sure, if your AI jumps straight to maximizing the reward process, it will manipulate the humans and bad things happen. But learning is path-dependent, and if you start with an AI with the right concepts and train it on non-manipulative cases first, the RLHF-optimist would say it's not implausible that we can get an AI that genuinely doesn't want to manipulate us like that.

Although I agree this is possible, and is in fact reason for modest optimism, it's also based on a sort of hacky, unprincipled praxis of getting the model to learn good concepts, which probably fails a large percentage of the time even if we try our best. And even if it succeeds, I'm aesthetically displeased by a world that builds transformative AI and then uses it to largely maintain the status quo ante - something has gone wrong there, and in the worst case that wrongness will be reflected in the values learned by the AI these people made.

I think that response basically doesn't work. But when I started writing in more detail about why it doesn't work, it morphed into a book review that I've wanted to write for the last couple of years but have always put off. So thank you for finally making me write it!

> So, my guess at Leo's reaction is one of RLHF-optimism.

This is more or less what he seems to say according to the transcript -- he thinks we will have legible, trustworthy chain of thought, at least for the initial automated AI researchers; we can RLHF them and then use them to do alignment research. This of course is not a new concept and has been debated here ad nauseam, but it's not a shocking view for a member of Ilya and Jan's team, and he clearly cosigns it in the interview.

[-]Wei Dai

Some questions for @leopold.

  1. Anywhere I can listen to or read your debates with "doomers"?
  2. We share a strong interest in economics, but apparently not in philosophy. I'm curious whether this is true, or whether you just didn't talk about it in the places I looked.
  3. What do you think about my worries around AIs doing philosophy? See this post or my discussion about it with Jan Leike.
  4. What do you think about my worries around AGI being inherently centralizing and/or offense-favoring and/or anti-democratic (aside from above problems, how would elections work when minds can be copied at little cost)? Seems like the free world "prevailing" on AGI might well be a Pyrrhic victory unless we can also solve these follow-up problems, but you don't address them.
  5. More generally, do you have a longer term vision of how your proposal leads to a good outcome for our lightcone, avoiding all the major AI-related x-risks and s-risks?
  6. Why are you not in favor of an AI pause treaty with other major nations? (You only talk about unilateral pause in the section "AGI Realism".) China is currently behind in chips and AI and it seems hard to surpass the entire West in a chips/AI race, so why would they not go for an AI pause treaty to preserve the status quo instead of risking a US-led intelligence explosion (not to mention x-risks)?

Re 6: At 1:24:30 in the Dwarkesh podcast, Leopold proposes that the US make an agreement with China to slow down (or pause) once the US has a 100GW cluster and is clearly going to win the race to build AGI, in order to buy time to get things right during the "volatile period" before AGI.

See this post

That link gives me a "Sorry, you don't have access to this draft"

I second questions 1, 5, and 6 after listening to the Dwarkesh interview.

[-]Linch

Slight tangent, but I'd be interested in takes/analysis by INR on the geopolitics side (though I understand it might be very difficult to get useful declassified information). I feel like many people in various clusters closeish to us (Leopold, Dario, DC EAish people, DC people in general) have quite strong (and frequently hawkish) opinions about geopolitics, without, as far as I can tell, a reliable track record to back such opinions up. I'd be interested in getting more sane analysis by people who actually have deep domain expertise + a good predictive track record.

[-]mishka

Specifically, he brings up a memo he sent to the old OpenAI board claiming that OpenAI wasn't taking security seriously enough.

Is it normal that an employee can be formally reprimanded for bringing information security concerns to members of the board? Or is this evidence of a serious pathology inside the company?

(The 15 min "What happened at OpenAI" fragment of the podcast with Dwarkesh starting at 2:31:24 is quite informative.)

I'm curious for opinions on what I think is a crux of Leopold's "Situational Awareness":

picking the many obvious low-hanging fruit on “unhobbling” gains should take us from chatbots to agents, from a tool to something that looks more like drop-in remote worker replacements.[1]

This disagrees with my own intuition - the gap between chatbot and agent seems stubbornly large. He suggests three main angles of improvement:[2]

  1. Large context windows allowing for fully "onboarding" LLMs to a job or task
  2. Increased inference-time compute allowing for building 'System 2' reasoning abilities
  3. Enabling full computer access

We already have pretty large context windows (which has been surprising to me, admittedly), but they've helped less than I expected - I mostly just don't need to move relevant code right next to my cursor as much when using Copilot. I haven't seen really powerful use cases; the closest is probably Devin, but that doesn't work very well. Using large context windows on documents works reasonably well, but LLMs are too unreliable, too biased towards the generic, and too memoryless for me to get solid benefit out of that, in my personal experience.

Put another way, I think large context windows are of pretty limited benefit when LLMs have poor working memory and can't properly keep track of what they're doing over the course of their output.

That leads into the inference-time compute argument, both the weakest and the most essential. By my understanding, the goal is to give LLMs a working memory, but how we get there seems really fuzzy. The idea presented is to produce OOMs more tokens and keep them on-track, but the "keep them on-track" part in his writing feels like merely a restatement of the problem to me. The only substantial suggestion I can see is this single line:

Perhaps a small amount of RL helps a model learn to error correct (“hm, that doesn’t look right, let me double check that”), make plans, search over possible solutions, and so on.[3]

And in a footnote on the same page he acknowledges:

Unlocking this capability will require a new kind of training, for it to learn these extra skills.

Not trivial or baked-in to current AI progress, I think? Maybe I'm misunderstanding something.
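
To make that error-correction idea concrete, here's a minimal sketch of what spending extra inference-time tokens on self-checking might look like. This is my own illustration, not anything from the PDF; `llm` is a hypothetical placeholder for a single model call, and the prompts and loop structure are assumptions:

```python
# Illustrative sketch only. `llm` is a hypothetical stand-in for one LLM call
# (e.g. a chat-completion request); prompts and loop structure are assumptions.

def llm(prompt: str) -> str:
    """Placeholder for a single LLM call."""
    raise NotImplementedError


def solve_with_self_check(task: str, max_attempts: int = 4) -> str:
    draft = llm(f"Solve the following task, showing your steps:\n{task}")
    for _ in range(max_attempts):
        # The "hm, that doesn't look right, let me double check that" step:
        # ask the model to critique its own draft.
        critique = llm(
            f"Task:\n{task}\n\nDraft answer:\n{draft}\n\n"
            "List any errors. If there are none, reply with exactly: OK"
        )
        if critique.strip() == "OK":
            break
        # Spend more tokens revising, carrying the critique forward as a
        # crude form of working memory.
        draft = llm(
            f"Task:\n{task}\n\nDraft answer:\n{draft}\n\n"
            f"Critique:\n{critique}\n\nWrite a corrected answer."
        )
    return draft
```

Whether RL can get a model to run this kind of loop internally and productively, rather than having it bolted on as external scaffolding, is exactly the part that doesn't seem trivial to me.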

As for enabling full computer access - yeah, multi-modal models should allow this within a few years, but it remains of limited benefit if the working memory problem isn't solved.

  1. ^

    Page 9 of the PDF.

  2. ^

    Pages 34-37 of the PDF.

  3. ^

    Page 36 of the PDF.

the inference-time compute argument, both the weakest and the most essential

I think this will be done via multi-agent architectures ("society of mind" over an LLM).

This does require plenty of calls to an LLM, so plenty of inference-time compute.

For example, the current leader of https://huggingface.co/spaces/gaia-benchmark/leaderboard is this relatively simple multi-agent concoction by a Microsoft group: https://github.com/microsoft/autogen/tree/gaia_multiagent_v01_march_1st/samples/tools/autogenbench/scenarios/GAIA/Templates/Orchestrator
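
To give a rough intuition for the pattern, here's a toy sketch of an orchestrator loop over an LLM. This is my own illustration, not the AutoGen code linked above; `llm` is a hypothetical placeholder for a single model call, and the prompts are assumptions:

```python
# Toy illustration of a "society of mind" orchestrator over an LLM.
# Not the AutoGen implementation; `llm` is a hypothetical single-call wrapper.

def llm(prompt: str) -> str:
    """Placeholder for a single LLM call."""
    raise NotImplementedError


def orchestrate(task: str, max_steps: int = 10) -> str:
    notes: list[str] = []  # shared scratchpad acting as working memory
    for _ in range(max_steps):
        plan = llm(
            f"Task: {task}\nProgress so far:\n" + "\n".join(notes)
            + "\nName the single next sub-task, or reply DONE if finished."
        )
        if plan.strip() == "DONE":
            break
        # Each "worker" is just another focused LLM call; real systems also
        # let workers run code, browse, or call tools.
        result = llm(f"Carry out this sub-task and report the result:\n{plan}")
        notes.append(f"{plan} -> {result}")
    return llm(
        f"Task: {task}\nNotes:\n" + "\n".join(notes) + "\nGive the final answer."
    )
```

Even this minimal version spends several model calls per step, which is the sense in which these architectures trade inference-time compute for capability.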

I think the cutting edge in this direction is probably non-public at this point (which makes a lot of sense).

Just skimmed the PDF. This is my first exposure to Aschenbrenner beyond "fired by OpenAI". I haven't listened to his interview with Dwarkesh yet.

For some reason, the PDF reminds me a lot of Drexler's Engines of Creation. Of course, that was a book which argued that nanotechnology would transform everything but would pose great perils, and shared a few ideas on how to counter those perils. Along the way it mentions that nanotechnology will lead to a great concentration of power, dubbed "the leading force", and says that the "cooperating democracies" of the world are the leading force for now, and can stay that way.

Aschenbrenner's opus is like an accelerated version of this that focuses on AI. For Drexler, nanotechnology was still decades away. For Aschenbrenner, superintelligence is coming later this decade, and the 2030s will see a speedrun through the possibilities of science and technology, culminating in a year of chaos in which the political character of the world will be decided (since superintelligent AI will be harnessed by some political system or other). Aschenbrenner's take is that liberal democracy needs to prevail, that it can do so if the US maintains its existing lead in AI, but that to do so, it has to treat frontier algorithms as the top national security issue and nationalize AI in some way or other.

At first read, Aschenbrenner's reasoning seems logical to me in many areas. For example, I think AI nationalization is the logical thing for the US to do, given the context he describes; though I wonder if the US has enough institutional coherence to do something so forceful. (Perhaps it is more consistent with Trump's autocratic style than with Biden's spokesperson-for-the-system demeanour.) Though the Harris brothers recently assured Joe Rogan that, as smart as Silicon Valley's best are, there are people like that scattered throughout the US government too; the hypercompetent people that @trevor has talked about.

When Aschenbrenner said that by the end of the 2020s there will be massive growth in electrical production (for the sake of training AIs), that made me a bit skeptical. I believe superintelligence can probably design and mass-produce transformative material technologies quickly, but I'm not sure I believe in the human economy's ability to do so. However, I haven't checked the numbers; this is just a feeling (a "vibe"?).

I become more skeptical when Aschenbrenner says there will be millions of superintelligent agents in the world - and the political future will still be at stake. I think, once you reach that situation, humanity exists at their mercy, not vice versa... Aschenbrenner also says he's optimistic about the solvability of superalignment; which I guess makes Anthropic important, since they're now the only leading AI company that's working on it. 

As a person, Aschenbrenner seems quite impressive (what is he, 25?). Apparently there is, or was, a post on Threads beginning like this: 

I feel slightly bad for AI's latest main character, Leopold Aschenbrenner. He seems like a bright young man, which is awesome! But there are some things you can only learn with age. There are no shortcuts

I can't find the full text or original post (but I am not on Threads). It's probably just someone being a generic killjoy - "things don't turn out how you expect, kid" - but I would be interested to know the full comment, just in case it contains something important. 

[-]Linch

As a person, Aschenbrenner seems quite impressive (what is he, 25?).

He graduated Columbia in 2021 at 19, so I think more like 22.

I am also on the hawkish side, and my projections of the future have a fair amount in common with Leopold's. Our recommendations for what to do are very different however. I am on team 'don't build and deploy AGI' rather than on team 'build AGI'. I don't think racing to AGI ensures the safety of liberal democracy, I believe it results in humanity's destruction. I think that even if we had AGI today, we wouldn't be able to trust it enough to use it safely without a lot more alignment work. If we trust it too much, we all die. If we don't trust it, it is not very useful as a tool to help us. Ryan and Buck's AI control theory helps somewhat with being able to use an untrustworthy AI, but that supposes that the creators will be wise enough to adhere to the careful control plan. I don't trust that they will. I think they'll screw up and unleash a demon.

There is a different path available: seeking an international treaty and establishing strong enforcement mechanisms, including inspections and constant monitoring of all datacenters and bio labs everywhere in the world. To ensure the safety of humanity, we must prevent the development of bioweapons and also prevent the rise of rogue AGI.

If we can't come to such an arrangement peacefully with all the nations of the world, we must prepare for war. If we fail to enforce the worldwide halt on AI and bioweapons, we all die.

But I don't think that preparing for war should include racing for AGI. That is a major point where Leopold and I differ in recommendation for current and future action.

It's literally the worst possible message ever. 

  1. AGI is awesome!
  2. Free World Uber Alles Must Prevail
  3. So, US government should race to AGI