All of Orpheus16's Comments + Replies

I feel this way and generally think that on-the-margin we have too much forecasting and not enough “build plans for what to do if there is a sudden shift in political will” or “just directly engage with policymakers and help them understand things not via longform writing but via conversations/meetings.”

Many details will be ~impossible to predict and many details will not matter much (i.e., will not be action-relevant for the stakeholders who have the potential to meaningfully affect the current race to AGI).

That’s not to say forecasting is always unhelpfu... (read more)

Orpheus16

What do you think are the most important points that weren't publicly discussed before?

Orpheus16

I expect that outcomes like “AIs are capable enough to automate virtually all remote workers” and “the AIs are capable enough that immediate AI takeover is very plausible (in the absence of countermeasures)” come shortly after (median 1.5 years and 2 years after respectively under my views).

@ryan_greenblatt can you say more about what you expect to happen from the period in-between "AI 10Xes AI R&D" and "AI takeover is very plausible?"

I'm particularly interested in getting a sense of what sorts of things will be visible to the USG and the public during... (read more)

davekasten
Note that the production function of the 10x really matters.  If it's "yeah, we get to net-10x if we have all our staff working alongside it," it's much more detectable than, "well, if we only let like 5 carefully-vetted staff in a SCIF know about it, we only get to 8.5x speedup".   (It's hard to prove that the results are from the speedup instead of just, like, "One day, Dario woke up from a dream with The Next Architecture in his head")

I don't feel very well informed and I haven't thought about it that much, but in short timelines (e.g. my 25th percentile): I expect that we know what's going on roughly within 6 months of it happening, but this isn't salient to the broader world. So, maybe the DC tech policy staffers know that the AI people think the situation is crazy, but maybe this isn't very salient to them. A 6 month delay could be pretty fatal even for us as things might progress very rapidly.

Orpheus16

Big fan of this post. One thing worth highlighting IMO: The post assumes that governments will not react in time, so it's mostly up to the labs (and researchers who can influence the labs) to figure out how to make this go well.

TBC, I think it's a plausible and reasonable assumption to make. But I think this assumption ends up meaning that "the plan" excludes a lot of the work that could make the USG (a) more likely to get involved or (b) more likely to do good and useful things conditional on them deciding to get involved.

Here's an alternative frame: I wo... (read more)

MiloSal
Akash, your comment raises the good point that a short-timelines plan that doesn't realize governments are a really important lever here is missing a lot of opportunities for safety. Another piece of the puzzle that comes out when you consider what governance measures we'd want to include in the short timelines plan is the "off-ramps problem" that's sort of touched on in this post.  Basically, our short timelines plan needs to also include measures (mostly governance/policy, though also technical) that get us to a desirable off-ramp from geopolitical tensions brought about by the economic and military transformation resulting from AGI/ASI. I don't think there are good off-ramps that do not route through governments. This is one reason to include more government-focused outreach/measures in our plans.
otto.barten
I think that if government involvement suddenly increases, there will also be a window of opportunity to get an AI safety treaty passed. I feel a government-focused plan should include pushing for this. (I think heightened public xrisk awareness is also likely in such a scenario, making the treaty more achievable. I also think heightened awareness in both govt and public will make short treaty timelines (a year to weeks), at least between the US and China, realistic.) Our treaty proposal (a few other good ones exist): https://time.com/7171432/conditional-ai-safety-treaty-trump/ Also, I think end games should be made explicit: what are we going to do once we have aligned ASI? I think that's both true for Marius' plan, and for a government-focused plan with a Manhattan or CERN included in it.
Marius Hobbhahn

I would love to see a post laying this out in more detail. I found writing my post a good exercise for prioritization. Maybe writing a similar piece where governance is the main lever brings out good insights into what to prioritize in governance efforts.

At first glance, I don’t see how the point I raised is affected by the distinction between expert-level AIs vs earlier AIs.

In both cases, you could expect an important part of the story to be “what are the comparative strengths and weaknesses of this AI system.”

For example, suppose you have an AI system that dominates human experts at every single relevant domain of cognition. It still seems like there’s a big difference between “system that is 10% better at every relevant domain of cognition” and “system that is 300% better at domain X and only 10% better... (read more)

I agree comparative advantages can still be important, but your comment implied a key part of the picture is "models can't do some important thing". (E.g., you talked about "The frame is less accurate in worlds where AI is really good at some things and really bad at other things," but models can't be really bad at almost anything if they strictly dominate humans at basically everything.)

And I agree that at the point AIs are >5% better at everything they might also be 1000% better at some stuff.

I was just trying to point out that talking about the number hu... (read more)

Models that don’t even cause safety problems, and aren't even goal-directedly misaligned, but that fail to live up to their potential, thus failing to provide us with the benefits we were hoping to get when we trained them. For example, sycophantic myopic reward hacking models that can’t be made to do useful research.

Would this kind of model present any risk? Could a lab just say "oh darn, this thing isn't very useful– let's turn this off and develop a new model"?

There isn't direct risk, but there could be substantial indirect risks because getting useful work out of these models could be important. The risk comes from not getting the work from this model that you needed to mitigate risks from later models.

If the AI company isn't well described as even trying to optimize for safety the analysis is less clear, but all else equal, we'd prefer to get safety critical labor from weaker systems that are less likely to be egregiously misaligned. (All else is not fully equal, as making models more useful also speeds up tim... (read more)

Orpheus16

Do you have any suggestions RE alternative (more precise) terms? Or do you think it's more of a situation where authors should use the existing terms but make sure to define them in the context of their own work? (e.g., "In this paper, when I use the term AGI, I am referring to a system that [insert description of the capabilities of the system].")

The point I make here is also likely obvious to many, but I wonder if the "X human equivalents" frame often implicitly assumes that GPT-N will be like having X humans. But if we expect AIs to have comparative advantages (and disadvantages), then this picture might miss some important factors.

The "human equivalents" frame seems most accurate in worlds where the capability profile of an AI looks pretty similar to the capability profile of humans. That is, getting GPT-6 to do AI R&D is basically "the same as" getting X humans to do AI R&D. It thinks i... (read more)

ryan_greenblatt
In the top level comment, I was just talking about AI systems which are (at least) as capable as top human experts. (I was trying to point at the notion of Top-human-Expert-Dominating AI that I define in this post, though without a speed/cost constraint, but I think I was a bit sloppy in my language. I edited the comment a bit to better communicate this.) So, in this context, human (at least) equivalents does make sense (as in, because the question is the cost of AIs that can strictly dominate top human experts so we can talk about the amount of compute needed to automate away one expert/researcher on average), but I agree that for earlier AIs it doesn't (necessarily) make sense and plausibly these earlier AIs are very key for understanding the risk (because e.g. they will radically accelerate AI R&D without necessarily accelerating other domains).
Orpheus16

I'm glad you're doing this, and I support many of the ideas already suggested. Some additional ideas:

  • Interview program. Work with USAISI or UKAISI (or DHS/NSA) to pilot an interview program in which officials can ask questions about AI capabilities, safety and security threats, and national security concerns. (If it's not feasible to do this with a government entity yet, start a pilot with a non-government group– perhaps METR, Apollo, Palisade, or the new AI Futures Project.)
  • Clear communication about RSP capability thresholds. I think the RSP could do a be
... (read more)

I think this is a helpful overview post. It outlines challenges to a mainstream plan (bootstrapped alignment) and offers a few case studies of how entities in other fields handle complex organizational challenges.

I'd be excited to see more follow-up research on organizational design and organizational culture. This work might be especially useful for helping folks think about various AI policy proposals.

For example, it seems plausible that at some point the US government will view certain kinds of AI systems as critical national security assets. At that... (read more)

@Nikola Jurkovic I'd be interested in timeline estimates for something along the lines of "AI that substantially increases AI R&D". Not exactly sure what the right way to operationalize this is, but something that says "if there is a period of Rapid AI-enabled AI Progress, this is when we think it would occur."

(I don't really love the "95% of fully remote jobs could be automated" frame, partly because I don't think it captures many of the specific domains we care about (e.g., AI-enabled R&D, other natsec-relevant capabilities) and partly because... (read more)

@elifland what do you think is the strongest argument for long(er) timelines? Do you think it's essentially just "it takes a long time for researchers to learn how to cross the gaps"?

Or do you think there's an entirely different frame (something that's in an ontology that just looks very different from the one presented in the "benchmarks + gaps argument"?)

elifland
A few possible categories of situations where we might have long timelines, off the top of my head:
  1. Benchmarks + gaps is still best: overall gap is somewhat larger + slowdown in compute doubling time after 2028, but trend extrapolations still tell us something about gap trends. This is how I would most naturally think about how timelines through maybe the 2030s are achieved, and potentially beyond if neither of the next hold.
  2. Others are best (more than one of these can be true):
    1. The current benchmarks and evaluations are so far away from AGI that trends on them don't tell us anything (including regarding how fast gaps might be crossed). In this case one might want to identify the 1-2 most important gaps and reason about when we will cross these based on gears-level reasoning or trend extrapolation/forecasting on "real-world" data (e.g. revenue?) rather than trend extrapolation on benchmarks. Example candidate "gaps" that I often hear for these sorts of cases are the lack of feedback loops and the "long-tail of tasks" / reliability.
    2. A paradigm shift in AGI training is needed and benchmark trends don't tell us much about when we will achieve this (this is basically Steven's sibling comment): in this case the best analysis might involve looking at the base rate of paradigm shifts per research effort, and/or looking at specific possible shifts.
This taxonomy is not comprehensive, just things I came up with quickly. Might be missing something that would be good. To give a cop-out answer to your question, I feel like if I were making a long-timelines argument I'd argue that all three of those would be ways of forecasting to give weight to, then aggregate. If I had to choose just one I'd probably still go with (1) though. Edit: oh, there's also the "defer to AI experts" argument. I mostly try not to think about deference-based arguments because thinking on the object level is more productive, though I think if I were really trying to make an all-things-considered
Orpheus16

I think it depends on whether or not the new paradigm is "training and inference" or "inference [on a substantially weaker/cheaper foundation model] is all you need." My impression so far is that it's more likely to be the former (but people should chime in).

If I were trying to have the most powerful model in 2027, it's not like I would stop scaling. I would still be interested in using a $1B+ training run to make a more powerful foundation model and then pouring a bunch of inference into that model.

But OK, suppose I need to pause after my $1B+ training ru... (read more)

Josh You
By several reports, (e.g. here and here) OpenAI is throwing enormous amounts of training compute at o-series models. And if the new RL paradigm involves more decentralized training compute than the pretraining paradigm, that could lead to more consolidation into a few players, not less, because pretraining* is bottlenecked by the size of the largest cluster. E.g. OpenAI's biggest single compute cluster is similar in size to xAI's, even though OpenAI has access to much more compute overall. But if it's just about who has the most compute then the biggest players will win. *though pretraining will probably shift to distributed training eventually

I'm pleased with this dialogue and glad I did it. Outreach to policymakers is an important & complicated topic. No single post will be able to explain all the nuances, but I think this post explains a lot, and I still think it's a useful resource for people interested in engaging with policymakers.

A lot has changed since this dialogue, and I've also learned a lot since then. Here are a few examples:

  • I think it's no longer as useful to emphasize "AI is a big deal for national/global security." This is now pretty well-established.
    • Instead, I would encourag
... (read more)

It's so much better if everyone in the company can walk around and tell you what are the top goals of the RSP, how do we know if we're meeting them, what AI safety level are we at right now—are we at ASL-2, are we at ASL-3—that people know what to look for because that is how you're going to have good common knowledge of if something's going wrong.

I like this goal a lot: Good RSPs could contribute to building common language/awareness around several topics (e.g., "if" conditions, "then" commitments, how safety decisions will be handled). As many have point... (read more)

Orpheus16

I found the description of warning fatigue interesting. Do you have takes on the warning fatigue concern?

Warning Fatigue

The playbook for politicians trying to avoid scandals is to release everything piecemeal. You want something like:

  • Rumor Says Politician Involved In Impropriety. Whatever, this is barely a headline, tell me when we know what he did.
  • Recent Rumor Revealed To Be About Possible Affair. Well, okay, but it’s still a rumor, there’s no evidence.
  • New Documents Lend Credence To Affair Rumor. Okay, fine, but we’re not sure those documents are true.
  • Pol
... (read more)
ryan_greenblatt
I don't currently have a strong view. I also found it interesting. It updated me a bit toward working on other types of projects.
Orpheus16

@Zvi I'm curious if you have thoughts on Buck's post here (and my comment here) about how empirical evidence of scheming might not cause people to update toward thinking scheming is a legitimate/scary threat model (unless they already had priors or theoretical context that made them concerned about this.)

Do you think this is true? And if so, what implications do you think this has for people who are trying to help the world better understand misalignment/scheming?

Orpheus16

I really like the idea of soliciting independent reviews from folks, and I found the reviews interesting and thought-provoking. EG, this part of Jacob Andreas's review stood out (some bolding added):

Fine-tuning situationally aware models via RL: These are, in my mind, the most surprising findings in the paper. Specifically: if we take a model prompted as above, and optimize it toward the new objective via RL, it exhibits small increases in non-faked compliance, and extreme increases in faked compliance, at training time. Non-compliant behavior is almost co

... (read more)
Zvi
My interpretation/hunch of this is that there are two things going on, curious if others see it this way:
  1. It is learning to fake the trainer's desired answer.
  2. It is learning to actually give the trainer's desired answer.
So during training, it learns to fake a lot more, and will often decide to fake the desired answer, even though it would have otherwise decided to give the desired answer anyway. It's 'lying with the truth' and perhaps giving a different variation of the desired answer than it would have given otherwise or perhaps not. The algorithm in training is learning to be mostly preferences-agnostic, password-guessing behavior.
evhub
We have some discussion of this in the paper (much of which we added thanks to Jacob's review):

How well LLMs follow which decision theory affects their ability to cooperate. This could mean the difference between peace and conflict in AI-assisted political bargaining or enable AIs to collude when one is meant to monitor the other, undermining human control.

Do you have any thoughts on "red lines" for AI collusion? That is, "if an AI could do X, then we should acknowledge that AIs can likely collude with each other in monitoring setups."

What I believe about Sacks' views comes from regularly listening to the All-In Podcast, where he regularly talks about AI.

Do you have any quotes or any particular podcast episodes you recommend?

if you would ask the Department of Homeland Security for their justification, there's a good chance that they would say "national security".

Yeah, I agree that one needs to have a pretty narrow conception of national security. In the absence of that, there's concept creep in which you can justify pretty much anything under a broad conception of national security. (Indeed, I ... (read more)

ChristianKl
I don't have specific recommendations for the past. I would expect a section in the next All-In Podcast in which David Sacks participates to lay out his views a bit. That's the question you would ask if you think the person who's drawing the line is aligned. If you think the people speaking about national security and using that to further different political and geopolitical ends are not aligned, it's not the most interesting question. It sounds to me like you are taking this as an abstract policy issue while ignoring the real-world censorship industrial complex. It's like discussing union policy in the 1970s and 1980s in New York without taking into account that a lot of strikes are because someone failed to pay the Mafia. If you don't know what the censorship industrial complex is, Joe Rogan had a good interview with Mike Benz, who is a former official with the U.S. Department of State and current Executive Director of the Foundation For Freedom Online.

Can you quote (or link to) things Sacks has said that give you this impression?

My own impression is that there are many AI policy ideas that don't have anything to do with censorship (e.g., improving government technical capacity, transparency into frontier AI development, emergency preparedness efforts, efforts to increase government "situational awareness", research into HEMs and verification methods). Also things like "an AI model should not output bioweapons or other things that threaten national security" are "censorship" under some very narrow defini... (read more)

ChristianKl
What I believe about Sacks' views comes from regularly listening to the All-In Podcast, where he regularly talks about AI. Sacks is smarter and more sophisticated than that. In the real world, efforts of the Department of Homeland Security that started with censoring for reasons of national security ended up increasing the scope of what they censor. In the end the lab leak theory got censored, and if you would ask the Department of Homeland Security for their justification, there's a good chance that they would say "national security".

While a centralized project would get more resources, it also has more ability to pause/investigate things.

So EG if the centralized project researchers see something concerning (perhaps early warning signs of scheming), it seems more able to do a bunch of research on that Concerning Thing before advancing.

I think this makes the effect of centralization on timelines unclear, at least if one expects these kinds of “warning signs” on the pathway to very advanced capabilities.

(It’s plausible that you could get something like this from a decentralized model with su... (read more)

Rohin Shah
Tbc, I don't want to strongly claim that centralization implies shorter timelines. Besides the point you raise there's also things like bureaucracy and diseconomies of scale. I'm just trying to figure out what the authors of the post were saying. That said, if I had to guess, I'd guess that centralization speeds up timelines.

Out of curiosity, what’s the rationale for not having agree/disagree votes on posts? (I feel like pretty much everyone thinks it has been a great feature for comments!)

habryka
I explained it a bit here: https://www.lesswrong.com/posts/fjfWrKhEawwBGCTGs/a-simple-case-for-extreme-inner-misalignment?commentId=tXPrvXihTwp2hKYME 

Thank you for the thoughtful response! It'll push me to move from "amateur spectator" mode to "someone who has actually read all of these posts" mode :)

Optional q: What would you say are the 2-3 most important contributions that have been made in AI control research over the last year? 

Orpheus16

A few quick thoughts on control (loosely held; I consider myself an "amateur spectator" on this topic and suspect a lot of my opinions may change if I spent 100+ hours engaging with this work):

  • I think it's a clever technical approach and part of me likes that it's become "a thing."
  • I'm worried that people have sort of "bandwagonned" onto it because (a) there are some high-status people who are involved, (b) there's an interest in finding things for technical researchers to do, and (c) it appeals to AGI companies.
  • I think the AI control discourse would improv
... (read more)
Buck

I think the AI control discourse would improve if it focused more on discussing concrete control strategies (and their limitations) and focused less on the high-level idea that “AI control is super important and a bedrock approach for mitigating catastrophic risk.” I sometimes feel like the AI control discourse is relatively shallow and just cites the idea/meme of AI control instead of making concrete empirical, methodological, or strategic contributions.

I quite strongly disagree: I think discussions of AI control have had way more concrete empirical, meth... (read more)

Orpheus16

Regulation to reduce racing. Government regulation could temper racing between multiple western projects. So there are ways to reduce racing between western projects, besides centralising.

Can you say more about the kinds of regulations you're envisioning? What are your favorite ideas for regulations for (a) the current Overton Window and (b) a wider Overton Window but one that still has some constraints?

rosehadshar
This is a good question and I haven't thought much about it. (Tom might have better ideas.) My quick takes:
  • The usual stuff: compute thresholds, eval requirements, transparency, infosec requirements, maybe licensing beyond a certain risk threshold
  • Maybe important to mandate use of certain types of hardware, if we get verification mechanisms which enable agreements with China

I disagree with some of the claims made here, and I think there are several worldview assumptions that go into a lot of these claims. Examples include things like "what do we expect the trajectory to ASI to look like", "how much should we worry about AI takeover risks", "what happens if a single actor ends up controlling [aligned] ASI", "what kinds of regulations can we reasonably expect absent some sort of centralized USG project", and "how much do we expect companies to race to the top on safety absent meaningful USG involvement." (TBC though I don't think i... (read more)

Orpheus16

What do you think are the biggest mistakes you/LightCone have made in the last ~2 years?

And what do you think a 90th percentile outcome looks like for you/LightCone in 2025? What would success look like?

(Asking out of pure curiosity– I'd have these questions even if LC wasn't fundraising. I figured this was a relevant place to ask, but feel free to ignore it if it's not in the spirit of the post.)

kave

I worry that cos this hasn't received a reply in a bit, people might think it's not in the spirit of the post. I'm even more worried people might think that critical comments aren't in the spirit of the post.

Both critical comments and high-effort-demanding questions are in the spirit of the post, IMO! But the latter might take a while to get a response.

Thanks for spelling it out. I agree that more people should think about these scenarios. I could see something like this triggering central international coordination (or conflict).

(I still don't think this would trigger the USG to take different actions in the near-term, except perhaps "try to be more secret about AGI development" and maybe "commission someone to do some sort of study or analysis on how we would handle these kinds of dynamics & what sorts of international proposals would advance US interests while preventing major conflict." The second thing is a bit optimistic but maybe plausible.)

What kinds of conflicts are you envisioning?

I think if the argument is something along the lines of "maybe at some point other countries will demand that the US stop AI progress", then from the perspective of the USG, I think it's sensible to operate under the perspective of "OK so we need to advance AI progress as much as possible and try to hide some of it, and if at some future time other countries are threatening us we need to figure out how to respond." But I don't think it justifies anything like "we should pause or start initiating international agr... (read more)

Bogdan Ionut Cirstea
I'm envisioning something like: scary powerful capabilities/demos/accidents leading to various/a coalition of other countries asking the US (and/or China) not to build any additional/larger data centers (and/or run any larger training runs), and, if they're scared enough, potentially even threatening various (escalatory) measures, including economic sanctions, blockading the supply of compute/prerequisites to compute, sabotage, direct military strikes on the data centers, etc. I'm far from an expert on the topic, but I suspect it might not be trivial to hide at least building a lot more new data centers/supplying a lot more compute, if a significant chunk of the rest of the world was watching very intently. I'm envisioning a very near-casted scenario, on very short (e.g. Daniel Kokotajlo-cluster) timelines, egregious misalignment quite unlikely but not impossible, slow-ish (couple of years) takeoff (by default, if no deliberate pause), pretty multipolar, but with more-obviously-close-to-scary capabilities, like ML R&D automation evals starting to fall.

@davekasten I know you posed this question to us, but I'll throw it back on you :) what's your best-guess answer?

Or perhaps put differently: What do you think are the factors that typically influence whether the cautious folks or the non-cautious folks end up in charge? Are there any historical or recent examples of these camps fighting for power over an important operation?

it's less clear that a non-centralized situation inevitably leads to a decisive strategic advantage for the leading project

Can you say more about what has contributed to this update?

Can you say more about scenarios where you envision a later project happening that has different motivations?

I think in the current zeitgeist, such a project would almost definitely be primarily motivated by beating China. It doesn't seem clear to me that it's good to wait for a new zeitgeist. Reasons:

  • A company might develop AGI (or an AI system that is very good at AI R&D that can get to AGI) before a major zeitgeist change.
  • The longer we wait, the more capable the "most capable model that wasn't secured" is. So we could risk getting into a scenario wh
... (read more)

If you could only have "partial visibility", what are some of the things you would most want the government to be able to know?

Nathan Helm-Burger
I have an answer to that: making sure that NIST:AISI had at least scores of automated evals for checkpoints of any new large training runs, as well as pre-deployment eval access. Seems like a pretty low-cost, high-value ask to me. Even if that info leaked from AISI, it wouldn't give away corporate algorithmic secrets. A higher cost ask, but still fairly reasonable, is pre-deployment evals which require fine-tuning. You can't have a good sense of what the model would be capable of in the hands of bad actors if you don't test fine-tuning it on hazardous info.

Another frame: If alignment turns out to be easy, then the default trajectory seems fine (at least from an alignment POV. You might still be worried about EG concentration of power). 

If alignment turns out to be hard, then the policy decisions we make to affect the default trajectory matter a lot more.

This means that even if misalignment risks are relatively low, a lot of value still comes from thinking about worlds where misalignment is hard (or perhaps "somewhat hard but not intractably hard").

What do you think are the most important factors for determining if it results in them behaving responsibly later? 

For instance, if you were in charge of designing the AI Manhattan Project, are there certain things you would do to try to increase the probability that it leads to the USG "behaving more responsibly later?"

Good points. Suppose you were on a USG taskforce that had concluded they wanted to go with the "subsidy model", but they were willing to ask for certain concessions from industry.

Are there any concessions/arrangements that you would advocate for? Are there any ways to do the "subsidy model" well, or do you think the model is destined to fail even if there were a lot of flexibility RE how to implement it?

habryka
I think "full visibility" seems like the obvious thing to ask for, and something that could maybe improve things. Also, preventing you from selling your products to the public, and basically forcing you to sell your most powerful models only to the government, gives the government more ability to stop things when it comes to it.  I will think more about this, I don't have any immediate great ideas.
Orpheus16

My own impression is that this would be an improvement over the status quo. Main reasons:

  • A lot of my P(doom) comes from race dynamics.
  • Right now, if a leading lab ends up realizing that misalignment risks are super concerning, they can't do much to end the race. Their main strategy would be to go to the USG.
  • If the USG runs the Manhattan Project (or there's some sort of soft nationalization in which the government ends up having a much stronger role), it's much easier for the USG to see that misalignment risks are concerning & to do something about it.
... (read more)

As you know, I have huge respect for USG natsec folks. But there are (at least!) two flavors of them: 1) the cautious, measure-twice-cut-once sort that have carefully managed deterrence for decades, and 2) the "fuck you, I'm doing Iran-Contra" folks. Which do you expect will get in control of such a program? It's not immediately clear to me which ones would.

O O
Why is the built-in assumption for almost every single post on this site that alignment is impossible and we need a 100 year international ban to survive? This does not seem particularly intellectually honest to me. It is very possible no international agreement is needed. Alignment may turn out to be quite tractable.

@davekasten @Zvi @habryka @Rob Bensinger @ryan_greenblatt @Buck @tlevin @Richard_Ngo @Daniel Kokotajlo I suspect you might have interesting thoughts on this. (Feel free to ignore though.)

Daniel Kokotajlo
(c). Like if this actually results in them behaving responsibly later, then it was all worth it.
Orpheus16

Suppose the US government pursued a "Manhattan Project for AGI". At its onset, it's primarily fuelled by a desire to beat China to AGI. However, there's some chance that its motivation shifts over time (e.g., if the government ends up thinking that misalignment risks are a big deal, its approach to AGI might change.)

Do you think this would be (a) better than the current situation, (b) worse than the current situation, or (c) it depends on XYZ factors?

MiloSal
I'm fairly confident that this would be better than the current situation, and primarily because of something that others haven't touched on here. The reason is that, regardless of who develops them, the first (militarily and economically) transformative AIs will cause extreme geopolitical tension and instability that is challenging to resolve safely. Resolving such a situation safely requires a well-planned off-ramp, which must route through extremely major national- or international-level decisions. Only governments are equipped to make decisions like these; private AGI companies certainly are not. Therefore, unless development is at some point centralized in a USG project, there is no way to avoid the many paths to catastrophe that threaten the world during the period of extreme tension coinciding with AGI/ASI development.
Joe Collman
Some thoughts:
  • The correct answer is clearly (c) - it depends on a bunch of factors.
  • My current guess is that it would make things worse (given likely values for the bunch of other factors) - basically for Richard's reasons.
  • Given [new potential-to-shift-motivation information/understanding], I expect there's a much higher chance that this substantially changes the direction of a not-yet-formed project, than a project already in motion.
  • Specifically:
    • Who gets picked to run such a project? If it's primarily a [let's beat China!] project, are the key people cautious and highly adaptable when it comes to top-level goals? Do they appoint deputies who're cautious and highly adaptable?
      • Here I note that the kind of 'caution' we'd need is [people who push effectively for the system to operate with caution]. Most people who want caution are more cautious.
    • How is the project structured? Will the structure be optimized for adaptability? For red-teaming of top-level goals?
    • Suppose that a mid-to-high-level participant receives information making the current top-level goals questionable - is the setup likely to reward them for pushing for changes? (noting that these are the kind of changes that were not expected to be needed when the project launched)
    • Which external advisors do leaders of the project develop relationships with? What would trigger these to change?
    • ...
  • I do think that it makes sense to aim for some centralized project - but only if it's the right kind.
  • I expect that almost all the directional influence is in [influence the initial conditions].
    • For this reason, I expect [push for some kind of centralized project, and hope it changes later] is a bad idea.
    • I think [devote great effort to influencing the likely initial direction of any such future project] seems a great idea (so long as you're sufficiently enlightened about desirable initial directions, of course :))
  • I'd note that [in
Sohaib Imran
One thing I'd be bearish on is visibility into the latest methods being used for frontier AI systems, which would downstream reduce the relevance of alignment research except for the research within the Manhattan-like project itself. This is already somewhat true of the big labs, e.g. methods used for o1-like models. However, there is still some visibility in the form of system cards and reports which hint at the methods. When the primary intention is racing ahead of China, I doubt there will be reports discussing methods used for frontier systems.
Richard_Ngo
Worse than the current situation, because the counterfactual is that some later project happens which kicks off in a less race-y manner. In other words, whatever the chance of its motivation shifting over time, it seems dominated by the chance that starting the equivalent project later would just have better motivations from the outset.
worse
Something I'm worried about now is some RFK Jr/Dr. Oz equivalent being picked to lead on AI...
Seth Herd
One factor is different incentives for decision-makers. The incentives (and the mindset) for tech companies is to move fast and break things. The incentives (and mindset) for government workers is usually vastly more conservative. So if it is the government making decisions about when to test and deploy new systems, I think we're probably far better off WRT caution. That must be weighed against the government typically being very bad at technical matters. So even an attempt to be cautious could be thwarted by lack of technical understanding of risks. Of course, the Trump administration is attempting to instill a vastly different mindset, more like tech companies. So if it's that administration we're talking about, we're probably worse off on net with a combination of lack of knowledge and YOLO attitudes. Which is unfortunate - because this is likely to happen anyway. As Habryka and others have noted, it also depends on whether it reduces race dynamics by aggregating efforts across companies, or mostly just throws funding fuel on the race fire.
tlevin
Depends on the direction/magnitude of the shift! I'm currently feeling very uncertain about the relative costs and benefits of centralization in general. I used to be more into the idea of a national project that centralized domestic projects and thus reduced domestic racing dynamics (and arguably better aligned incentives), but now I'm nervous about the secrecy that would likely entail, and think it's less clear that a non-centralized situation inevitably leads to a decisive strategic advantage for the leading project. Which is to say, even under pretty optimistic assumptions about how much such a project invests in alignment, security, and benefit-sharing, I'm pretty uncertain that this would be good, and with more realistic assumptions I probably lean towards it being bad. But it super depends on the governance, the wider context, how a "Manhattan Project" would affect domestic companies and China's policymaking, etc. (I think a great start would be not naming it after the Manhattan Project, though. It seems path dependent, and that's not a great first step.)
davekasten
I think this is a (c) leaning (b), especially given that we're doing it in public. Remember, the Manhattan Project was a highly-classified effort and we know it by an innocuous name given to it to avoid attention. Saying publicly, "yo, China, we view this as an all-costs priority, hbu" is a great way to trigger a race with China... But if it turned out that we knew from ironclad intel with perfect sourcing that China was already racing (I don't expect this to be the case), then I would lean back more towards (c).
habryka

If the project was fueled by a desire to beat China, the structure of the Manhattan project seems unlikely to resemble the parts of the structure of the Manhattan project that seemed maybe advantageous here, like having a single government-controlled centralized R&D effort.

My guess is if something like this actually happens, it would involve a large number of industry subsidies, and would create strong institutional momentum that even when things got dangerous, to push the state of the art forward, and in as much as there is pushback, continue dangerou... (read more)

We're not going to be bottlenecked by politicians not caring about AI safety. As AI gets crazier and crazier everyone would want to do AI safety, and the question is guiding people to the right AI safety policies

I think we're seeing more interest in AI, but I think interest in "AI in general" and "AI through the lens of great power competition with China" has vastly outpaced interest in "AI safety". (Especially if we're using a narrow definition of AI safety; note that people in DC often use the term "AI safety" to refer to a much broader set of concerns t... (read more)

Orpheus16

A few quotes that stood out to me:

Greg:

I hope for us to enter the field as a neutral group, looking to collaborate widely and shift the dialog towards being about humanity winning rather than any particular group or company. 

Greg and Ilya (to Elon):

The goal of OpenAI is to make the future good and to avoid an AGI dictatorship. You are concerned that Demis could create an AGI dictatorship. So do we. So it is a bad idea to create a structure where you could become a dictator if you chose to, especially given that we can create some other structure that

... (read more)
Leon Lang

Do we know anything about why they were concerned about an AGI dictatorship created by Demis?

Also this:

From Altman: [...] Admitted that he lost a lot of trust with Greg and Ilya through this process. Felt their messaging was inconsistent and felt childish at times. [...] Sam was bothered by how much Greg and Ilya keep the whole team in the loop with happenings as the process unfolded. Felt like it distracted the team.

Apparently airing such concerns is "childish" and should only be done behind closed doors, otherwise it "distracts the team", hm.

Orpheus16

and recently founded another AI company

Potentially a hot take, but I feel like xAI's contributions to race dynamics (at least thus far) have been relatively trivial. I am usually skeptical of the whole "I need to start an AI company to have a seat at the table", but I do imagine that Elon owning an AI company strengthens his voice. And I think his AI-related comms have mostly been used to (a) raise awareness about AI risk, (b) raise concerns about OpenAI/Altman, and (c) endorse SB1047 [which he did even faster and less ambiguously than Anthropic].

The count... (read more)

ozziegooen
I think that xAI's contributions have been minimal so far, but that could shift. Apparently they have a very ambitious data center coming up, and are scaling up research efforts quickly. Seems very accelerate-y.

I'm sympathetic to Musk being genuinely worried about AI safety. My problem is that one of his first actions after learning about AI safety was to found OpenAI, and that hasn't worked out very well. Not just due to Altman; even the "Open" part was a highly questionable goal. Hopefully Musk's future actions in this area would have positive EV, but still.

Orpheus16

I agree with many points here and have been excited about AE Studio's outreach. Quick thoughts on China/international AI governance:

  • I think some international AI governance proposals have some sort of "kum ba yah, we'll all just get along" flavor/tone to them, or some sort of "we should do this because it's best for the world as a whole" vibe. This isn't even Dem-coded so much as it is naive-coded, especially in DC circles.
  • US foreign policy is dominated primarily by concerns about US interests. Other considerations can matter, but they are not the dominan
... (read more)
Algon

I think some international AI governance proposals have some sort of "kum ba yah, we'll all just get along" flavor/tone to them, or some sort of "we should do this because it's best for the world as a whole" vibe. This isn't even Dem-coded so much as it is naive-coded, especially in DC circles.

This inspired me to write a silly dialogue. 

Simplicio enters. An engine rumbles like the thunder of the gods, as Sophistico focuses on ensuring his MAGMA-O1 racecar will go as fast as possible.

Simplicio: "You shouldn't play Chicken."

Sophistico: "Why not?"

Simplic... (read more)

Matrice Jacobine
My impression is that (without even delving into any meta-level IR theory debates) Democrats are more hawkish on Russia while Republicans are more hawkish on China. So while obviously neither parties are kum-ba-yah and both ultimately represent US interests, it still makes sense to expect each party to be less receptive to the idea of ending any potential arms race against the country they consider an existential threat to US interests if left unchecked, so the party that is more hawkish on a primarily military superpower would be worse on nuclear x-risk, and the party that is more hawkish on a primarily economic superpower would be worse on AI x-risk and environmental x-risk. (Negotiating arms control agreements with the enemy superpower right during its period of liberalization and collapse or facilitating a deal between multiple US allies with the clear goal of serving as a counterweight to the purported enemy superpower seems entirely irrelevant here.)

Did they have any points that you found especially helpful, surprising, or interesting? Anything you think folks in AI policy might not be thinking enough about?

(Separately, I hope to listen to these at some point & send reactions if I have any.)

you have to spend several years resume-building before painstakingly convincing people you're worth hiring for paid work

For government roles, I think "years of experience" is definitely an important factor. But I don't think you need to have been specializing for government roles specifically.

Especially for AI policy, there are several programs that are basically like "hey, if you have AI expertise but no background in policy, we want your help." To be clear, these are often still fairly competitive, but I think it's much more about being generally capable/competent and less about having optimized your resume for policy roles. 

Orpheus16

I like the section where you list out specific things you think people should do. (One objection I sometimes hear is something like "I know that [evals/RSPs/if-then plans/misc] are not sufficient, but I just don't really know what else there is to do. It feels like you either have to commit to something tangible that doesn't solve the whole problem or you just get lost in a depressed doom spiral.")

I think your section on suggestions could be stronger by presenting more ambitious/impactful stories of comms/advocacy. I think there's something tricky about a ... (read more)

Thanks Akash for this substantive reply!

At the last minute we ended up cutting a section at the end of the document called "how does this [referring to civics, communications, coordination] all add up to preventing extinction?" It was an attempt to address the thing you're pointing at here:

I think there's something tricky about a document that has the vibe "this is the most important issue in the world and pretty much everyone else is approaching it the wrong way" and then pivots to "and the right way to approach it is to post on Twitter and talk to your friends."

... (read more)
Orpheus16

I think there's something about Bay Area culture that can often get technical people to feel like the only valid way to contribute is through technical work. It's higher status and sexier and there's a default vibe that the best way to understand/improve the world is through rigorous empirical research.

I think this is an incorrect (or at least incomplete) frame, and I think on-the-margin it would be good for more technical people to spend 1-5 days seriously thinking about what alternative paths they could pursue in comms/policy.

I also think there are me... (read more)

Orpheus16

One of the common arguments in favor of investing more resources into current governance approaches (e.g., evals, if-then plans, RSPs) is that there's nothing else we can do. There's not a better alternative– these are the only things that labs and governments are currently willing to support.

The Compendium argues that there are other (valuable) things that people can do, with most of these actions focusing on communicating about AGI risks. Examples:

  • Share a link to this Compendium online or with friends, and provide your feedback on which ideas are correct
... (read more)

I appreciated their section on AI governance. The "if-then"/RSP/preparedness frame has become popular, and they directly argue for why they oppose this direction. (I'm a fan of preparedness efforts– especially on the government level– but I think it's worth engaging with the counterarguments.)

Pasting some content from their piece below.

High-level thesis against current AI governance efforts:

The majority of existing AI safety efforts are reactive rather than proactive, which inherently puts humanity in the position of managing risk rather than controlling A

... (read more)
Bogdan Ionut Cirstea
This seems to be confusing a dangerous capability eval (of being able to 'deceive' in a visible scratchpad) with an assessment of alignment, which seems like exactly what the 'questioning' was about.