What do you think are the most important points that weren't publicly discussed before?
I expect that outcomes like “AIs are capable enough to automate virtually all remote workers” and “the AIs are capable enough that immediate AI takeover is very plausible (in the absence of countermeasures)” come shortly after (median 1.5 years and 2 years after respectively under my views).
@ryan_greenblatt can you say more about what you expect to happen in the period between "AI 10Xes AI R&D" and "AI takeover is very plausible"?
I'm particularly interested in getting a sense of what sorts of things will be visible to the USG and the public during...
I don't feel very well informed and I haven't thought about it that much, but in short timelines (e.g. my 25th percentile): I expect that we know what's going on roughly within 6 months of it happening, but this isn't salient to the broader world. So, maybe the DC tech policy staffers know that the AI people think the situation is crazy, but maybe this isn't very salient to them. A 6 month delay could be pretty fatal even for us as things might progress very rapidly.
Big fan of this post. One thing worth highlighting IMO: The post assumes that governments will not react in time, so it's mostly up to the labs (and researchers who can influence the labs) to figure out how to make this go well.
TBC, I think it's a plausible and reasonable assumption to make. But I think this assumption ends up meaning that "the plan" excludes a lot of the work that could make the USG (a) more likely to get involved or (b) more likely to do good and useful things conditional on them deciding to get involved.
Here's an alternative frame: I wo...
I would love to see a post laying this out in more detail. I found writing my post a good exercise for prioritization. Maybe writing a similar piece where governance is the main lever brings out good insights into what to prioritize in governance efforts.
At first glance, I don’t see how the point I raised is affected by the distinction between expert-level AIs vs earlier AIs.
In both cases, you could expect an important part of the story to be “what are the comparative strengths and weaknesses of this AI system.”
For example, suppose you have an AI system that dominates human experts at every single relevant domain of cognition. It still seems like there’s a big difference between “system that is 10% better at every relevant domain of cognition” and “system that is 300% better at domain X and only 10% better...
I agree comparative advantages can still be important, but your comment implied a key part of the picture is "models can't do some important thing". (E.g., you talked about "The frame is less accurate in worlds where AI is really good at some things and really bad at other things," but models can't be really bad at almost anything if they strictly dominate humans at basically everything.)
And I agree that at the point AIs are >5% better at everything they might also be 1000% better at some stuff.
I was just trying to point out that talking about the number hu...
Models that don’t even cause safety problems, and aren't even goal-directedly misaligned, but that fail to live up to their potential, thus failing to provide us with the benefits we were hoping to get when we trained them. For example, sycophantic myopic reward hacking models that can’t be made to do useful research.
Would this kind of model present any risk? Could a lab just say "oh darn, this thing isn't very useful– let's turn this off and develop a new model"?
There isn't direct risk, but there could be substantial indirect risks because getting useful work out of these models could be important. The risk comes from not getting the work from this model that you needed to mitigate risks from later models.
If the AI company isn't well described as even trying to optimize for safety the analysis is less clear, but all else equal, we'd prefer to get safety critical labor from weaker systems that are less likely to be egregiously misaligned. (All else is not fully equal, as making models more useful also speeds up tim...
Do you have any suggestions RE alternative (more precise) terms? Or do you think it's more of a situation where authors should use the existing terms but make sure to define them in the context of their own work? (e.g., "In this paper, when I use the term AGI, I am referring to a system that [insert description of the capabilities of the system].")
The point I make here is also likely obvious to many, but I wonder if the "X human equivalents" frame often implicitly assumes that GPT-N will be like having X humans. But if we expect AIs to have comparative advantages (and disadvantages), then this picture might miss some important factors.
The "human equivalents" frame seems most accurate in worlds where the capability profile of an AI looks pretty similar to the capability profile of humans. That is, getting GPT-6 to do AI R&D is basically "the same as" getting X humans to do AI R&D. It thinks i...
I'm glad you're doing this, and I support many of the ideas already suggested. Some additional ideas:
I think this is a helpful overview post. It outlines challenges to a mainstream plan (bootstrapped alignment) and offers a few case studies of how entities in other fields handle complex organizational challenges.
I'd be excited to see more follow-up research on organizational design and organizational culture. This work might be especially useful for helping folks think about various AI policy proposals.
For example, it seems plausible that at some point the US government will view certain kinds of AI systems as critical national security assets. At that...
@Nikola Jurkovic I'd be interested in timeline estimates for something along the lines of "AI that substantially increases AI R&D". Not exactly sure what the right way to operationalize this is, but something that says "if there is a period of Rapid AI-enabled AI Progress, this is when we think it would occur."
(I don't really love the "95% of fully remote jobs could be automated" frame, partly because I don't think it captures many of the specific domains we care about (e.g., AI-enabled R&D, other natsec-relevant capabilities) and partly because...
@elifland what do you think is the strongest argument for long(er) timelines? Do you think it's essentially just "it takes a long time for researchers to learn how to cross the gaps"?
Or do you think there's an entirely different frame (something that's in an ontology that just looks very different from the one presented in the "benchmarks + gaps argument"?)
I think it depends on whether or not the new paradigm is "training and inference" or "inference [on a substantially weaker/cheaper foundation model] is all you need." My impression so far is that it's more likely to be the former (but people should chime in).
If I were trying to have the most powerful model in 2027, it's not like I would stop scaling. I would still be interested in using a $1B+ training run to make a more powerful foundation model and then pouring a bunch of inference into that model.
But OK, suppose I need to pause after my $1B+ training ru...
I'm pleased with this dialogue and glad I did it. Outreach to policymakers is an important & complicated topic. No single post will be able to explain all the nuances, but I think this post explains a lot, and I still think it's a useful resource for people interested in engaging with policymakers.
A lot has changed since this dialogue, and I've also learned a lot since then. Here are a few examples:
It's so much better if everyone in the company can walk around and tell you what are the top goals of the RSP, how do we know if we're meeting them, what AI safety level are we at right now—are we at ASL-2, are we at ASL-3—that people know what to look for because that is how you're going to have good common knowledge of if something's going wrong.
I like this goal a lot: Good RSPs could contribute to building common language/awareness around several topics (e.g., "if" conditions, "then" commitments, how safety decisions will be handled). As many have point...
I found the description of warning fatigue interesting. Do you have takes on the warning fatigue concern?
...Warning Fatigue
The playbook for politicians trying to avoid scandals is to release everything piecemeal. You want something like:
- Rumor Says Politician Involved In Impropriety. Whatever, this is barely a headline, tell me when we know what he did.
- Recent Rumor Revealed To Be About Possible Affair. Well, okay, but it’s still a rumor, there’s no evidence.
- New Documents Lend Credence To Affair Rumor. Okay, fine, but we’re not sure those documents are true.
- Pol
@Zvi I'm curious if you have thoughts on Buck's post here (and my comment here) about how empirical evidence of scheming might not cause people to update toward thinking scheming is a legitimate/scary threat model (unless they already had priors or theoretical context that made them concerned about this.)
Do you think this is true? And if so, what implications do you think this has for people who are trying to help the world better understand misalignment/scheming?
I really like the idea of soliciting independent reviews from folks, and I found the reviews interesting and thought-provoking. EG, this part of Jacob Andreas's review stood out (some bolding added):
...Fine-tuning situationally aware models via RL: These are, in my mind, the most surprising findings in the paper. Specifically: if we take a model prompted as above, and optimize it toward the new objective via RL, it exhibits small increases in non-faked compliance, and extreme increases in faked compliance, at training time. Non-compliant behavior is almost co
How well LLMs follow which decision theory affects their ability to cooperate. This could mean the difference between peace and conflict in AI-assisted political bargaining or enable AIs to collude when one is meant to monitor the other, undermining human control.
Do you have any thoughts on "red lines" for AI collusion? That is, "if an AI could do X, then we should acknowledge that AIs can likely collude with each other in monitoring setups."
What I believe about Sacks' views comes from regularly listening to the All-In Podcast, where he regularly talks about AI.
Do you have any quotes or any particular podcast episodes you recommend?
if you would ask the Department of Homeland Security for their justification there's a good chance that they would say "national security".
Yeah, I agree that one needs to have a pretty narrow conception of national security. In the absence of that, there's concept creep in which you can justify pretty much anything under a broad conception of national security. (Indeed, I ...
Can you quote (or link to) things Sacks has said that give you this impression?
My own impression is that there are many AI policy ideas that don't have anything to do with censorship (e.g., improving government technical capacity, transparency into frontier AI development, emergency preparedness efforts, efforts to increase government "situational awareness", research into HEMs and verification methods). Also things like "an AI model should not output bioweapons or other things that threaten national security" are "censorship" under some very narrow defini...
While a centralized project would get more resources, it also has more ability to pause/investigate things.
So EG if the centralized project researchers see something concerning (perhaps early warning signs of scheming), it seems more able to do a bunch of research on that Concerning Thing before advancing.
I think this makes the effect of centralization on timelines unclear, at least if one expects these kinds of “warning signs” on the pathway to very advanced capabilities.
(It’s plausible that you could get something like this from a decentralized model with su...
Out of curiosity, what’s the rationale for not having agree/disagree votes on posts? (I feel like pretty much everyone thinks it has been a great feature for comments!)
Thank you for the thoughtful response! It'll push me to move from "amateur spectator" mode to "someone who has actually read all of these posts" mode :)
Optional q: What would you say are the 2-3 most important contributions that have been made in AI control research over the last year?
A few quick thoughts on control (loosely held; I consider myself an "amateur spectator" on this topic and suspect a lot of my opinions may change if I spent 100+ hours engaging with this work):
I think the AI control discourse would improve if it focused more on discussing concrete control strategies (and their limitations) and focused less on the high-level idea that “AI control is super important and a bedrock approach for mitigating catastrophic risk.” I sometimes feel like the AI control discourse is relatively shallow and just cites the idea/meme of AI control instead of making concrete empirical, methodological, or strategic contributions.
I quite strongly disagree: I think discussions of AI control have had way more concrete empirical, meth...
Regulation to reduce racing. Government regulation could temper racing between multiple western projects. So there are ways to reduce racing between western projects, besides centralising.
Can you say more about the kinds of regulations you're envisioning? What are your favorite ideas for regulations for (a) the current Overton Window and (b) a wider Overton Window but one that still has some constraints?
I disagree with some of the claims made here, and I think there are several worldview assumptions that go into a lot of these claims. Examples include things like "what do we expect the trajectory to ASI to look like", "how much should we worry about AI takeover risks", "what happens if a single actor ends up controlling [aligned] ASI", "what kinds of regulations can we reasonably expect absent some sort of centralized USG project", and "how much do we expect companies to race to the top on safety absent meaningful USG involvement." (TBC though I don't think i...
What do you think are the biggest mistakes you/LightCone have made in the last ~2 years?
And what do you think a 90th percentile outcome looks like for you/LightCone in 2025? What would success look like?
(Asking out of pure curiosity– I'd have these questions even if LC wasn't fundraising. I figured this was a relevant place to ask, but feel free to ignore it if it's not in the spirit of the post.)
I worry that cos this hasn't received a reply in a bit, people might think it's not in the spirit of the post. I'm even more worried people might think that critical comments aren't in the spirit of the post.
Both critical comments and high-effort-demanding questions are in the spirit of the post, IMO! But the latter might take a while to get a response.
Thanks for spelling it out. I agree that more people should think about these scenarios. I could see something like this triggering central international coordination (or conflict).
(I still don't think this would trigger the USG to take different actions in the near-term, except perhaps "try to be more secret about AGI development" and maybe "commission someone to do some sort of study or analysis on how we would handle these kinds of dynamics & what sorts of international proposals would advance US interests while preventing major conflict." The second thing is a bit optimistic but maybe plausible.)
What kinds of conflicts are you envisioning?
I think if the argument is something along the lines of "maybe at some point other countries will demand that the US stop AI progress", then from the perspective of the USG, I think it's sensible to operate under the perspective of "OK so we need to advance AI progress as much as possible and try to hide some of it, and if at some future time other countries are threatening us we need to figure out how to respond." But I don't think it justifies anything like "we should pause or start initiating international agr...
@davekasten I know you posed this question to us, but I'll throw it back on you :) what's your best-guess answer?
Or perhaps put differently: What do you think are the factors that typically influence whether the cautious folks or the non-cautious folks end up in charge? Are there any historical or recent examples of these camps fighting for power over an important operation?
it's less clear that a non-centralized situation inevitably leads to a decisive strategic advantage for the leading project
Can you say more about what has contributed to this update?
Can you say more about scenarios where you envision a later project happening that has different motivations?
I think in the current zeitgeist, such a project would almost definitely be primarily motivated by beating China. It doesn't seem clear to me that it's good to wait for a new zeitgeist. Reasons:
If you could only have "partial visibility", what are some of the things you would most want the government to be able to know?
Another frame: If alignment turns out to be easy, then the default trajectory seems fine (at least from an alignment POV. You might still be worried about EG concentration of power).
If alignment turns out to be hard, then the policy decisions we make to affect the default trajectory matter a lot more.
This means that even if misalignment risks are relatively low, a lot of value still comes from thinking about worlds where alignment is hard (or perhaps "somewhat hard but not intractably hard").
What do you think are the most important factors for determining if it results in them behaving responsibly later?
For instance, if you were in charge of designing the AI Manhattan Project, are there certain things you would do to try to increase the probability that it leads to the USG "behaving more responsibly later?"
Good points. Suppose you were on a USG taskforce that had concluded they wanted to go with the "subsidy model", but they were willing to ask for certain concessions from industry.
Are there any concessions/arrangements that you would advocate for? Are there any ways to do the "subsidy model" well, or do you think the model is destined to fail even if there were a lot of flexibility RE how to implement it?
My own impression is that this would be an improvement over the status quo. Main reasons:
As you know, I have huge respect for USG natsec folks. But there are (at least!) two flavors of them: 1) the cautious, measure-twice-cut-once sort that have carefully managed deterrence for decades, and 2) the "fuck you, I'm doing Iran-Contra" folks. Which do you expect will get in control of such a program? It's not immediately clear to me which ones would.
@davekasten @Zvi @habryka @Rob Bensinger @ryan_greenblatt @Buck @tlevin @Richard_Ngo @Daniel Kokotajlo I suspect you might have interesting thoughts on this. (Feel free to ignore though.)
Suppose the US government pursued a "Manhattan Project for AGI". At its onset, it's primarily fuelled by a desire to beat China to AGI. However, there's some chance that its motivation shifts over time (e.g., if the government ends up thinking that misalignment risks are a big deal, its approach to AGI might change.)
Do you think this would be (a) better than the current situation, (b) worse than the current situation, or (c) it depends on XYZ factors?
My own impression is that this would be an improvement over the status quo. Main reasons:
If the project was fueled by a desire to beat China, the structure of the Manhattan project seems unlikely to resemble the parts of the structure of the Manhattan project that seemed maybe advantageous here, like having a single government-controlled centralized R&D effort.
My guess is if something like this actually happens, it would involve a large number of industry subsidies, and would create strong institutional momentum that even when things got dangerous, to push the state of the art forward, and in as much as there is pushback, continue dangerou...
We're not going to be bottlenecked by politicians not caring about AI safety. As AI gets crazier and crazier everyone would want to do AI safety, and the question is guiding people to the right AI safety policies
I think we're seeing more interest in AI, but I think interest in "AI in general" and "AI through the lens of great power competition with China" has vastly outpaced interest in "AI safety". (Especially if we're using a narrow definition of AI safety; note that people in DC often use the term "AI safety" to refer to a much broader set of concerns t...
A few quotes that stood out to me:
Greg:
I hope for us to enter the field as a neutral group, looking to collaborate widely and shift the dialog towards being about humanity winning rather than any particular group or company.
Greg and Ilya (to Elon):
...The goal of OpenAI is to make the future good and to avoid an AGI dictatorship. You are concerned that Demis could create an AGI dictatorship. So do we. So it is a bad idea to create a structure where you could become a dictator if you chose to, especially given that we can create some other structure that
Do we know anything about why they were concerned about an AGI dictatorship created by Demis?
Also this:
From Altman: [...] Admitted that he lost a lot of trust with Greg and Ilya through this process. Felt their messaging was inconsistent and felt childish at times. [...] Sam was bothered by how much Greg and Ilya keep the whole team in the loop with happenings as the process unfolded. Felt like it distracted the team.
Apparently airing such concerns is "childish" and should only be done behind closed doors, otherwise it "distracts the team", hm.
and recently founded another AI company
Potentially a hot take, but I feel like xAI's contributions to race dynamics (at least thus far) have been relatively trivial. I am usually skeptical of the whole "I need to start an AI company to have a seat at the table", but I do imagine that Elon owning an AI company strengthens his voice. And I think his AI-related comms have mostly been used to (a) raise awareness about AI risk, (b) raise concerns about OpenAI/Altman, and (c) endorse SB1047 [which he did even faster and less ambiguously than Anthropic].
The count...
I'm sympathetic to Musk being genuinely worried about AI safety. My problem is that one of his first actions after learning about AI safety was to found OpenAI, and that hasn't worked out very well. Not just due to Altman; even the "Open" part was a highly questionable goal. Hopefully Musk's future actions in this area would have positive EV, but still.
I agree with many points here and have been excited about AE Studio's outreach. Quick thoughts on China/international AI governance:
I think some international AI governance proposals have some sort of "kum ba yah, we'll all just get along" flavor/tone to them, or some sort of "we should do this because it's best for the world as a whole" vibe. This isn't even Dem-coded so much as it is naive-coded, especially in DC circles.
This inspired me to write a silly dialogue.
Simplicio enters. An engine rumbles like the thunder of the gods, as Sophistico focuses on ensuring his MAGMA-O1 racecar will go as fast as possible.
Simplicio: "You shouldn't play Chicken."
Sophistico: "Why not?"
Simplic...
Did they have any points that you found especially helpful, surprising, or interesting? Anything you think folks in AI policy might not be thinking enough about?
(Separately, I hope to listen to these at some point & send reactions if I have any.)
you have to spend several years resume-building before painstakingly convincing people you're worth hiring for paid work
For government roles, I think "years of experience" is definitely an important factor. But I don't think you need to have been specializing for government roles specifically.
Especially for AI policy, there are several programs that are basically like "hey, if you have AI expertise but no background in policy, we want your help." To be clear, these are often still fairly competitive, but I think it's much more about being generally capable/competent and less about having optimized your resume for policy roles.
I like the section where you list out specific things you think people should do. (One objection I sometimes hear is something like "I know that [evals/RSPs/if-then plans/misc] are not sufficient, but I just don't really know what else there is to do. It feels like you either have to commit to something tangible that doesn't solve the whole problem or you just get lost in a depressed doom spiral.")
I think your section on suggestions could be stronger by presenting more ambitious/impactful stories of comms/advocacy. I think there's something tricky about a ...
Thanks Akash for this substantive reply!
Last minute we ended up cutting a section at the end of the document called "how does this [referring to civics, communications, coordination] all add up to preventing extinction?" It was an attempt to address the thing you're pointing at here:
...I think there's something tricky about a document that has the vibe "this is the most important issue in the world and pretty much everyone else is approaching it the wrong way" and then pivots to "and the right way to approach it is to post on Twitter and talk to your friends."
I think there's something about Bay Area culture that can often get technical people to feel like the only valid way to contribute is through technical work. It's higher status and sexier and there's a default vibe that the best way to understand/improve the world is through rigorous empirical research.
I think this is an incorrect (or at least incomplete) frame, and I think on-the-margin it would be good for more technical people to spend 1-5 days seriously thinking about what alternative paths they could pursue in comms/policy.
I also think there are me...
One of the common arguments in favor of investing more resources into current governance approaches (e.g., evals, if-then plans, RSPs) is that there's nothing else we can do. There's not a better alternative– these are the only things that labs and governments are currently willing to support.
The Compendium argues that there are other (valuable) things that people can do, with most of these actions focusing on communicating about AGI risks. Examples:
...
- Share a link to this Compendium online or with friends, and provide your feedback on which ideas are correct
I appreciated their section on AI governance. The "if-then"/RSP/preparedness frame has become popular, and they directly argue for why they oppose this direction. (I'm a fan of preparedness efforts– especially on the government level– but I think it's worth engaging with the counterarguments.)
Pasting some content from their piece below.
High-level thesis against current AI governance efforts:
...The majority of existing AI safety efforts are reactive rather than proactive, which inherently puts humanity in the position of managing risk rather than controlling A
I feel this way and generally think that on-the-margin we have too much forecasting and not enough “build plans for what to do if there is a sudden shift in political will” or “just directly engage with policymakers and help them understand things not via longform writing but via conversations/meetings.”
Many details will be ~impossible to predict and many details will not matter much (i.e., will not be action-relevant for the stakeholders who have the potential to meaningfully affect the current race to AGI).
That’s not to say forecasting is always unhelpfu...