Epistemic status: not a lawyer, but I've worked with a lot of them.
As I understand it, an NDA isn't enforceable against a subpoena (though the former employer can seek a protective order for the testimony). Someone should really encourage law enforcement or Congress to subpoena the OpenAI resigners...
Okay, I spent much more time with the Anthropic RSP revisions today. Overall, two big thematic shifts stand out to me:
1. It's way more "professionally paranoid," but it needs to be even more so on non-cyber risks. A good start, but it needs more on being able to stop human intelligence (i.e., good old-fashioned spies).
2. It really has an aggressively strong vibe of "we are actually using this policy, and We Have Many Line Edits As A Result." You may not think that RSPs are sufficient -- I'm not sure I do, necessarily -- but I am slightly heartened that they genuinely seem to take the RSP seriously, to the point of having mildly-frustrated-about-process-hiccup footnotes about it. (Free advice to Anthropic PR: interview a bunch of staff about this on camera, cut it together, and post it; it will be lovely and humanizing and great recruitment material, I bet.)
I think one thing that is poorly-understood by many folks outside of DC is just how baseline the assumption is that China is a by-default faithless negotiating partner and that by-default China will want to pick a war with America in 2027 or later.
(I am reporting, not endorsing. For example, it is deeply unclear to me why we should take another country's statements about the year they're gonna do a war at face value.)
"want to pick a war with America" is really strange wording because China's strategic goals are not "win a war against nuclear-armed America", but things like "be able to control its claims in the South China Sea including invading Taiwan without American interference". Likewise Russia doesn't want to "pick a war with the EU" but rather annex Ukraine; if they were stupid enough to want the former they would have just bombed Paris. I don't know whether national security people relate to the phrasing the same way but they do understand this.
It's a small but positive sign that Anthropic sees taking 3 days beyond their RSP's specified timeframe to conduct a process without a formal exception as an issue. Signals that at least some members of the team there are extremely attuned to normalization of deviance concerns.
Ok, so it seems clear that we are, for better or worse, likely going to try to get AGI to do our alignment homework.
Who has thought through all the other homework we might give AGI that is as good an idea, assuming a model that isn't an instant game-over for us? E.g., I remember @Buck rattling off a list of other ideas in his talk at The Curve, but I feel like I haven't seen the list of, e.g., "here are all the ways I would like to run an automated counterintelligence sweep of my organization" ideas.
(Yes, obviously, if the AI is sneakily misaligned, you're just dead because it will trick you into firing all your researchers, etc.; this is written in a "playing to your outs" mentality, not an "I endorse this as a good plan" mentality.)
@ryan_greenblatt is working on a list of alignment research applications. For control applications, you might enjoy the long list of control techniques in our original post.
At LessOnline, there was a big discussion one night around the picnic tables with @Eliezer_Yudkowsky, @habryka, and some interlocutors from the frontier labs (you'll momentarily see why I'm being vague on the latter names).
One question was: "does DC actually listen to whistleblowers?" and I contributed that, in fact, DC does indeed have a script for this, and resigning in protest is a key part of it, especially ever since the Nixon years.
Here is a usefully publicly-shareable anecdote on how strongly this norm is embedded in national security decision-making, from the New Yorker article "The U.S. Spies Who Sound the Alarm About Election Interference" by David Kirkpatrick, Oct 21, 2024:
(https://archive.ph/8Nkx5)
The experts’ chair insisted that in this cycle the intelligence agencies had not withheld information “that met all five of the criteria”—and did not risk exposing sources and methods. Nor had the leaders’ group ever overruled a recommendation by the career experts. And if they did? It would be the job of the chair of the experts’ group to stand up or speak out, she told me: “That is why we pick a career civil servant who is retirement-eligible.” In other words, she can resign in protest.
Also of relevance is the wave of resignations over the past few days from the DC newspaper The Washington Post, after Jeff Bezos suddenly exerted control.
Basic Q: has anyone written much down about what sorts of endgame strategies you'd see just-before-ASI from the perspective of "it's about to go well, and we want to maximize the benefits of it" ?
For example: if we saw OpenPhil suddenly make a massive push to just mitigate mortality at the cost of literally every other development goal they have, I might suspect that they suspect that we're about to all be immortal under ASI, and they're trying to get as many people possible to that future...
A random observation from a think tank event last night in DC -- the average person in those rooms is convinced there's a problem, but that it's the near-term harms, the AI ethics stuff, etc. The highest-status and highest-rank people in those rooms seem to be much more concerned about catastrophic harms.
This is a very weird set of selection effects. I'm not sure what to make of it, honestly.
I really dislike the term "warning shot," and I'm trying to get it out of my vocabulary. I understand how it came to be a term people use. But if we think it might actually be something that happens, and when it happens, it plausibly and tragically results in the deaths of many folks, isn't the right term "mass casualty event"?
I think many mass casualty events would be warning shots, but not all warning shots would be mass casualty events. I think an agentic AI system getting most of the way towards escaping containment or a major fraud being perpetrated by an AI system would both be meaningful warning shots, but wouldn't involve mass casualties.
I do agree with what I think you are pointing at, which is that there is something Orwellian about the "warning shot" language. Like, in many of these scenarios we are talking about large negative consequences, and it seems good to have a word that owns that (in particular inasmuch as people are thinking about making warning shots more likely before an irrecoverable catastrophe occurs).
Has anyone thought about the things that governments are uniquely good at when it comes to evaluating models?
Here are at least 3 things I think they have as benefits:
1. Just an independent 3rd-party perspective generally
2. The ability to draw insights across multiple labs' efforts, and identify patterns that others might not be able to
3. The ability to draw on classified threat intelligence to inform their research (e.g., Country X is using model Y for bad behavior Z) and to test the model for classified capabilities (bright-line example: "can you design an accurate classified nuclear explosive lensing arrangement").
Are there others that come to mind?
One point that maybe someone's made, but I haven't run across recently: if you want to turn AI development into a Manhattan Project, you will by-default face some real delays from the reorganization of private efforts into one big national effort. In a close race, you might actually see pressures not to do so, because you don't want to give up 6 months to a year on reorg drama -- so in some possible worlds, the Project is actually a deceleration move in the short term, even if it accelerates in the long term!
It seems like the current meta is to write a big essay outlining your opinions about AI (see, e.g., Gladstone Report, Situational Awareness, various essays recently by Sam Altman and Dario Amodei, even the A Narrow Path report I co-authored).
Why do we think this is the case?
I can imagine at least 3 hypotheses:
1. Just path-dependence; someone did it, it went well, others imitated
2. Essays are High Status Serious Writing, and people want to obtain that trophy for their ideas
3. This is a return to the true original meaning of an essay, under Mont...
Well, what's the alternative? If you think there is something weird enough and suboptimal about essay formats that you are reaching for 'random chance' or 'monkey see monkey do' level explanations, that implies you think there is some much superior format they ought to be using instead. But I can't see what. I think it might be helpful to try to make the case for doing these things via some of the alternatives:
I think those are the meta because they have just enough space to not only give opinions but to mention reasons for those opinions and expertise/background to support the many unstated judgment calls.
Note that the essays by Altman and Amodei are popular beyond the others because their authors have not only demonstrable backgrounds in AI but lots of name recognition (we're mostly assuming Altman has bothered learning a lot about how Transformers work even if we don't like him). And the Gladstone Report got itself commissioned by at least a little piece of the government.
A Narrow Path just demonstrates in the text that you and your co-authors have thought deeply about the topic. Shorter essays leave more guesswork on the authors' expertise and depth of consideration.
Incidentally, spurred by @Mo Putera's posting of Vernor Vinge's A Fire Upon The Deep annotations, I want to remind folks that Vinge's Rainbows End is very good and doesn't get enough attention, and will give you a less-incorrect understanding of how national security people think.
I'm pretty sure that I think "infohazard" is a conceptual dead end that embeds some really false understandings of how secrets are used by humans. It is an orphan of a concept -- it doesn't go anywhere. Ok, the information's harmful. You need humans to touch that info anyways to do responsible risk-mitigation. So now what?
That "so now what" doesn't sound like a dead end to me. The question of how to mitigate risk when normal risk-mitigation procedures are themselves risky seems like an important one.
I have a few weeks off coming up shortly, and I'm planning on spending some of it monkeying around with AI and code stuff. I can think of two obvious tacks: 1. go do some fundamentals learning on technical stuff I don't have hands-on experience with, or 2. go build some new fun stuff.
Does anyone have particular lists of learning topics / syllabi / similar things that would be a good fit for a "fairly familiar with the broad policy/technical space, but his largest shipped chunk of code is a few hundred lines of Python" person like me?
Preregistering intent to write "Every Bay Area Walled Compound" (hat tip: Emma Liddell)
Unrelatedly, I am in Berkeley through Wednesday afternoon, let me know if you're around and would like to chat
I feel like one of the trivially most obvious signs that AI safety comms hasn't gone actually mainstream yet is that we don't say, "yeah, superintelligent AI is very risky. No, I don't mean Terminator. I'm thinking more Person of Interest, you know, that show with the guy from the Sound of Freedom and the other guy who was on Lost and Evil?"
I'll be in Berkeley Weds evening through next Monday, would love to chat with, well, basically anyone who wants to chat. (I'll be at The Curve Fri-Sun, so if you're already gonna be there, come find me there between the raindrops!)
Why are we so much more worried about LLMs posing CBRN risk than super-radicalization risk, precisely?
(or is this just an expected-harm metric rather than a probability metric?)
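To gesture at that second question with entirely hypothetical numbers (not real risk estimates): on an expected-harm view, a lower-probability CBRN scenario can still dominate a higher-probability radicalization scenario if its severity term is large enough, e.g.

$$\mathbb{E}[\text{harm}] = P(\text{event}) \times \text{severity}, \qquad \underbrace{10^{-3} \times 10^{5}}_{\text{CBRN}} = 100 \;>\; \underbrace{5 \times 10^{-2} \times 10^{3}}_{\text{radicalization}} = 50$$

(purely illustrative casualty-equivalent units.)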
I am (speaking personally) pleasantly surprised by Anthropic's letter. https://cdn.sanity.io/files/4zrzovbb/website/6a3b14a98a781a6b69b9a3c5b65da26a44ecddc6.pdf
I'll be at LessOnline this upcoming weekend -- would love to talk to folks about what things they wish someone would write about to explain how DC policy stuff and LessWrong-y themes could be better connected.
Hypothesis, super weakly held and based on anecdote:
One big difference between US national security policy people and AI safety people is that the "grieving the possibility that we might all die" moment happened, on average, more years ago for the national security policy person than the AI safety person.
This is (even more weakly held) because the national security field has existed for longer, so many participants literally had the "oh, what happens if we get nuked by Russia" moment in their careers in the Literal 1980s...
Ok, so Anthropic's new policy post (explicitly NOT linkposting it properly since I assume @Zac Hatfield-Dodds or @Evan Hubinger or someone else from Anthropic will, and figure the main convo should happen there, and don't want to incentivize fragmenting of conversation) seems to have a very obvious implication.
Unrelated, I just slammed a big AGI-by-2028 order on Manifold Markets.