Random note: Congressman Brad Sherman just held up If Anyone Builds It, Everyone Dies in a Congressional hearing and recommended it, saying (rough transcript, might be slight paraphrase): "they're [the AI companies] not really focused on the issue raised by this book, which I recommend, but the title tells it all, If Anyone Builds It Everyone Dies"
I think this is a clear and unambiguous example of the theory of change of the book having at least one success -- being an object that can literally be held and pointed to by someone in power.
It's important context that Sherman was concerned about Superintelligence risks, broadly construed, decades ago.
In 2007 he gave a speech in which he said:
There is one issue that I think is more explosive than even the spread of nuclear weapons: engineered intelligence. By that I mean, the efforts of computer engineers and bio-engineers who may create intelligence beyond that of a human being. In testimony at the House Science Committee, the consensus of experts testifying was that in roughly 25 years we would have a computer that passed the Turing Test, and more importantly, exceeded human intelligence.
As we develop more intelligent computers, we will find them useful tools in creating ever more intelligent computers, a positive feedback loop. I don't know whether we will create the maniacal Hal from 2001, or the earnest Data from Star Trek --- or perhaps both.
There are those who say don't worry, even if a computer is intelligent and malevolent --- it is in a box and it cannot affect the world. But I believe that there are those of our species who would sell their hands to Beelzebub, in return for a good stock tip.
How the heck has this guy been in Congress the whole time and we've not heard about him / he's not been in contact with the AI x-risk scene?
I think this was a major dropped ball. We had mostly ruled out political advocacy, so there was no one trying to do the "make connections with congresspeople" work that would have caused us to discover that someone had been thinking of this as an important issue for years.
That said, I know that several x-risk orgs have been in contact with his office in recent years.
That's totally right, until like 2020 or something the community was small and underresourced, such that things were gonna get dropped.
But I think we also did a somewhat bad job of effectively strategizing about how to approach the problem such that we ended up making worse allocation-of-effort choices than we could have, given the (unfair) benefit of hindsight.
I have a draft post about how I think we should have spent the period before takeoff started, in retrospect.
This is such a funny coincidence! I just wrote a post where Claude does research on every member of congress individually.
https://www.lesswrong.com/posts/WLdcvAcoFZv9enR37/what-washington-says-about-agi
It was actually inspired by Brad Sherman holding up the book. I just saw this shortform and it's funny because this thread roughly corresponds to my own thought process when seeing the original image!
Heads up -- if you're 1. on an H-1B visa AND 2. currently outside the US, there is VERY IMPORTANT, EXTREMELY TIME SENSITIVE stuff going on that might prevent you from getting back into the US after 21 September.
If this applies to you, immediately stop looking at LessWrong and look at the latest news. (I'm not providing a summary of it here because there are conflicting stories about who it will apply to and it's evolving hour by hour and I don't want this post to be out of date)
USCIS says this does not apply to existing H-1B visas, only to new applications: https://www.uscis.gov/sites/default/files/document/memos/H1B_Proc_Memo_FINAL.pdf
(I am not a lawyer nor a spokesperson for the US government and cannot advise on how likely it is that they will somehow backtrack on this.)
The Trump administration (or, more specifically, the White House Office of Science and Technology Policy, but they seem to be in the lead on most AI policy) is asking for comment on what its AI Action Plan should include. Literally anyone can comment on it. You should consider commenting: comments are due Saturday at 8:59pm PT/11:59pm ET via an email address. These comments will actually be read, and a large number of comments on an issue usually does influence any White House's policy. I encourage you to submit comments!
regulations.gov/document/NSF_FRDOC_0001-3479… (Note that all submissions are public and will be published)
(Disclosure: I am working on a submission for this for my dayjob but this particular post is in my personal capacity)
(Edit note: I originally said this was due Friday; I cannot read a calendar, it is in fact due 24 hours later. Consider this a refund that we have all received for being so good at remembering the planning fallacy all these years.)
In the future, there should be some organization or some group of individuals in the LW community who raise awareness about these sorts of opportunities and offer content and support to ensure submissions from the most knowledgeable and relevant actors. This seems like a very low-hanging fruit and is something several groups I know are doing.
I think on net, there are relatively fewer risks related to getting governments more AGI-pilled vs. them continuing on their current course; governments are broadly AI-pilled even if not AGI/ASI-pilled and are doing most of the accelerating actions an AGI-accelerator would want.
Epistemic status: not a lawyer, but I've worked with a lot of them.
As I understand it, an NDA isn't enforceable against a subpoena (though the former employer can seek a protective order for the testimony). Someone should really encourage law enforcement or Congress to subpoena the OpenAI resigners...
Okay, I spent much more time with the Anthropic RSP revisions today. Overall, I think it has two big thematic shifts for me:
1. It's way more "professionally paranoid," but needs to be even more so on non-cyber risks. A good start, but it needs more on being able to stop human intelligence (i.e., good old-fashioned spies).
2. It really has an aggressively strong vibe of "we are actually using this policy, and We Have Many Line Edits As A Result." You may not think that RSPs are sufficient -- I'm not sure I do, necessarily -- but I am heartened slightly that they genuinely seem to take the RSP seriously, to the point of having mildly-frustrated-about-process-hiccup footnotes about it. (Free advice to Anthropic PR: interview a bunch of staff about this on camera, cut it together, and post it; it will be lovely and humanizing and great recruitment material, I bet.)
Something I believe:
The reason that society isn't currently freaking out about AI taking artists' jobs is mainly that we've historically thought of artists' jobs as inherently precarious, and so a new report of them being precarious for new reasons doesn't surprise anyone. The moment it takes a "so stable your mom wants you to study it in school" job, that will all change for the stable folks, but not the artists, unfortunately. After all, they're just artists...
(If you say that this means that I don't care about artists or their financial challenges, you're wrong. This sucks, I'm predicting, not endorsing, a likely scenario.)
I hope you're right that job losses in more 'stable' fields will catalyze interest in a constructive response, but I was surprised over the last 20 years or so as the market power of workers in various traditionally stable industries collapsed for mundane economic reasons and not much changed in the policy world. Professors, lawyers, accountants, civil servants, and even some types of physicians have all been squeezed fairly heavily in the US just from globalization, monopolization, and deregulation. There was some brief pushback around the time of Occupy Wall Street, and then after that increased job insecurity became part of the new status quo.
Entry level software engineers are now facing serious pressure. It's debatable whether this is from pandemic-era over-hiring or from AI, but until last year "software engineering" was the paradigmatic example of a job more stable than art, to the point where artists went to coding bootcamp if they wanted to sell out. Now bootcamps seem mostly dead, but I don't hear serious cries to save them.
Wasn't translator supposed to be a normal, stable job? Allegedly translators were hit hard by Google Translate even pre-LLM.
Another factor here is that artists are mostly self-employed, so you don't see headlines like "ArtCo closes factory; lays off 3,500 workers." Instead, a diffuse group of people spread all over the country just have a harder time finding work.
I think one thing that is poorly-understood by many folks outside of DC is just how baseline the assumption is that China is a by-default faithless negotiating partner and that by-default China will want to pick a war with America in 2027 or later.
(I am reporting, not endorsing. For example, it is deeply unclear to me why we should take another country's statements about the year they're gonna do a war at face value.)
"want to pick a war with America" is really strange wording because China's strategic goals are not "win a war against nuclear-armed America", but things like "be able to control its claims in the South China Sea including invading Taiwan without American interference". Likewise Russia doesn't want to "pick a war with the EU" but rather annex Ukraine; if they were stupid enough to want the former they would have just bombed Paris. I don't know whether national security people relate to the phrasing the same way but they do understand this.
It's a small but positive sign that Anthropic sees taking 3 days beyond their RSP's specified timeframe to conduct a process without a formal exception as an issue. Signals that at least some members of the team there are extremely attuned to normalization of deviance concerns.
Ok, so it seems clear that we are, for better or worse, likely going to try to get AGI to do our alignment homework.
Who has thought through all the other homework we might give AGI that is as good an idea, assuming a model that isn't an instant-game-over for us? E.g., I remember @Buck rattling off a list of other ideas that he had in his The Curve talk, but I feel like I haven't seen the list of, e.g., "here are all the ways I would like to run an automated counterintelligence sweep of my organization" ideas.
(Yes, obviously, if the AI is sneakily misaligned, you're just dead because it will trick you into firing all your researchers, etc.; this is written in a "playing to your outs" mentality, not an "I endorse this as a good plan" mentality.)
@ryan_greenblatt is working on a list of alignment research applications. For control applications, you might enjoy the long list of control techniques in our original post.
At LessOnline, there was a big discussion one night around the picnic tables with @Eliezer_Yudkowsky, @habryka, and some interlocutors from the frontier labs (you'll momentarily see why I'm being vague on the latter names).
One question was: "does DC actually listen to whistleblowers?" and I contributed that, in fact, DC does indeed have a script for this, and resigning in protest is a key part of it, especially ever since the Nixon years.
Here is a usefully publicly-shareable anecdote on how strongly this norm is embedded in national security decision-making, from the New Yorker article "The U.S. Spies Who Sound the Alarm About Election Interference" by David Kirkpatrick, Oct 21, 2024:
(https://archive.ph/8Nkx5)
The experts’ chair insisted that in this cycle the intelligence agencies had not withheld information “that met all five of the criteria”—and did not risk exposing sources and methods. Nor had the leaders’ group ever overruled a recommendation by the career experts. And if they did? It would be the job of the chair of the experts’ group to stand up or speak out, she told me: “That is why we pick a career civil servant who is retirement-eligible.” In other words, she can resign in protest.
Also of relevance is the wave of resignations from the DC newspaper The Washington Post the past few days over Jeff Bezos suddenly exerting control.
I'm about to embark on the classic exercise of "think a bunch about AI policy."
Does anyone actually have an up to date collection of "here are all the existing AI safety policy proposals out there"?
(Yes, I know, your existing proposal is already great and we should just implement it as-is. Think of the goal of this exercise being to convince someone else who needs to see a spreadsheet of "here are all the ideas, here is why idea number three is the best one")
The elites do want you to know it: you can just email a Congressional office and get a meeting
You can definitely meet your own district's staff locally (e.g., if you're in Berkeley, Congresswoman Simon has an office in Oakland, Senator Padilla has an office in SF, and Senator Schiff's offices look not to be finalized yet but undoubtedly will include a Bay Area Office).
You can also meet most Congressional offices' staff via Zoom or phone (though some offices strongly prefer in-person meetings).
There is also indeed a meaningful rationalist presence in DC, though opinions vary as to whether the enclave is in Adams Morgan-Columbia Heights, Northern Virginia, or Silver Spring.*
*This trichotomy is funny, but hard to culturally translate unless you want a 15,000 word thesis on DC-area housing and federal office building policy since 1945 and its related cultural signifiers. Just...just trust me on this.
Basic Q: has anyone written much down about what sorts of endgame strategies you'd see just-before-ASI from the perspective of "it's about to go well, and we want to maximize the benefits of it" ?
For example: if we saw OpenPhil suddenly make a massive push to just mitigate mortality at the cost of literally every other development goal they have, I might suspect that they suspect that we're about to all be immortal under ASI, and they're trying to get as many people as possible to that future...
A random observation from a think tank event last night in DC -- the average person in those rooms is convinced there's a problem, but that it's the near-term harms, the AI ethics stuff, etc. The highest-status and highest-rank people in those rooms seem to be much more concerned about catastrophic harms.
This is a very weird set of selection effects. I'm not sure what to make of it, honestly.
We're hiring at ControlAI for folks who want to work on UK and US policy advocacy. Come talk to Congress and Parliament and stop risks from unsafe superintelligences! controlai.com/careers
(Admins: I don't tend to see many folks posting this sort of thing here, so feel free to nuke this post if not the sort of content you're going for. But given audience here, figured might be of interest)
Zach Stein-Perlman's recent quick take is confusing. It just seems like an assertion, followed by condemnation of Anthropic conditioned on us accepting his assertion blindly as true.
It is definitely the case that "insider threat from a compute provider" is a key part of Anthropic's threat model! They routinely talk about it in formal and informal settings! So what precisely is his threat model here that he thinks they're not defending adequately against?
(He has me blocked from commenting on his posts for some reason, which is absolutely his right, but insofar as he hasn't blocked me from seeing his posts, I wanted to explicitly register in public my objection to this sort of low-quality argument.)
I agree that whether Anthropic has handled insider threat from compute providers is a crux. My guess is that Anthropic and humans-at-Anthropic wouldn't claim to have handled this (outside of the implicit claim for ASL-3) and they would say something more like that's out of scope for ASL-3 or oops.
Separately, I just unblocked you. (I blocked you because I didn't like this thread in my shortform, not directly to stifle dissent. I have not blocked anyone else. I mention this because hearing about disagreement being hidden/blocked should make readers suspicious but that's mostly not correct in this case.)
Edit: also, man, I tried to avoid "condemnation" and I think I succeeded. I was just making an observation. I don't really condemn Anthropic for this.
I basically agree with Zach that based on public information it seems like it would be really hard for them to be robust to this and it seems implausible that they have justified confidence in such robustness.
I agree that he doesn't say the argument in very much depth. Obviously, I think it'd be great if someone made the argument in more detail. I think Zach's point is a positive contribution even though it isn't that detailed.
I am a bit confused why it's an assertion and not an "argument"? The argument is relatively straightforward:
Anthropic is currently shipping its weights to compute providers for inference. Those compute providers almost certainly do not comply with Anthropic's ASL-3 security standard, and the inference setup is likely not structured in a way that makes it impossible for the compute provider to somehow get access to the weights if they really wanted to. This means Anthropic is violating its RSP, as their ASL-3 security standard required them to be robust against this kind of attack.
It is true that "insider threat from a compute provider" is a key part of Anthropic's threat model! Anthropic is clearly not unaware of this attack chain. Indeed, in the whitepaper linked in Zach's shortform they call for various changes that would need to happen at compute providers to enable a zero-trust relationship here, but also implicitly in calling for these changes they admit that they are very likely not currently in place!
My guess is what happened here is that at least some people at Anthropic are probably aware that their RSP commits them to a higher level of security than they can currently rea...
Anthropic has released their own whitepaper where they call out what kind of changes would need to be required. Can you please engage with my arguments?
I have now also heard from 1-2 Anthropic employees about this. The specific people weren't super up-to-date on what Anthropic is doing here, and didn't want to say anything committal, but nobody I talked to had a reaction that suggested that they thought it was likely that Anthropic is robust to high-level insider threats at compute providers.
Like, if you want you can take a bet with me here, I am happy to offer you 2:1 odds on the opinion of some independent expert we both trust on whether Anthropic is likely robust to that kind of insider threat. I can also start going into all the specific technical reasons, but that would require restating half of the state of the art of computer security, which would be a lot of work. I really think that not being robust here is a relatively straightforward inference that most people in the field will agree with (unless Anthropic has changed operating procedure substantially from what is available to other consumers in their deals with cloud providers, which I currently think is unlikely, but not impossible).
I really dislike the term "warning shot," and I'm trying to get it out of my vocabulary. I understand how it came to be a term people use. But if we think it might actually be something that happens, and when it happens, it plausibly and tragically results in the deaths of many folks, isn't the right term "mass casualty event"?
I think many mass casualty events would be warning shots, but not all warning shots would be mass casualty events. I think an agentic AI system getting most of the way towards escaping containment or a major fraud being perpetrated by an AI system would both be meaningful warning shots, but wouldn't involve mass casualties.
I do agree with what I think you are pointing at, which is that there is something Orwellian about the "warning shot" language. Like, in many of these scenarios we are talking about large negative consequences, and it seems good to have a word that owns that (in particular inasmuch as people are thinking about making warning shots more likely before an irrecoverable catastrophe occurs).
Has anyone thought about the things that governments are uniquely good at when it comes to evaluating models?
Here are at least 3 things I think they have as benefits:
1. Just an independent 3rd-party perspective generally
2. The ability to draw insights across multiple labs' efforts, and identify patterns that others might not be able to
3. The ability to draw on classified threat intelligence to inform its research (e.g., Country X is using model Y for bad behavior Z) and to test the model for classified capabilities (bright line example: "can you design an accurate classified nuclear explosive lensing arrangement").
Are there others that come to mind?
One point that maybe someone's made, but I haven't run across recently: if you want to turn AI development into a Manhattan Project, you will by-default face some real delays from the reorganization of private efforts into one big national effort. In a close race, you might actually see pressures not to do so, because you don't want to give up 6 months to a year on reorg drama -- so in some possible worlds, the Project is actually a deceleration move in the short term, even if it accelerates in the long term!
It seems like the current meta is to write a big essay outlining your opinions about AI (see, e.g., Gladstone Report, Situational Awareness, various essays recently by Sam Altman and Dario Amodei, even the A Narrow Path report I co-authored).
Why do we think this is the case?
I can imagine at least 3 hypotheses:
1. Just path-dependence; someone did it, it went well, others imitated
2. Essays are High Status Serious Writing, and people want to obtain that trophy for their ideas
3. This is a return to the true original meaning of an essay, under Mont...
Well, what's the alternative? If you think there is something weird enough and suboptimal about essay formats that you are reaching for 'random chance' or 'monkey see monkey do' level explanations, that implies you think there is some much superior format they ought to be using instead. But I can't see what. I think it might be helpful to try to make the case for doing these things via some of the alternatives:
I think those are the meta because they have just enough space to not only give opinions but to mention reasons for those opinions and expertise/background to support the many unstated judgment calls.
Note that the essays by Altman and Amodei are popular because their positions are more central than the others': they have not only demonstrable backgrounds in AI but lots of name recognition (we're mostly assuming Altman has bothered learning a lot about how Transformers work even if we don't like him). And the Gladstone report got itself commissioned by at least a little piece of the government.
A Narrow Path just demonstrates in the text that you and your co-authors have thought deeply about the topic. Shorter essays leave more guesswork on the authors' expertise and depth of consideration.
I just reread https://www.lesswrong.com/posts/CoZhXrhpQxpy9xw9y/where-i-agree-and-disagree-with-eliezer by Paul Christiano from 2022 for somewhat random reasons[1] and wow is this a fascinating historical snapshot document, especially in the comment section.
Many of the main characters in AI from 2022 to 2025 swing by and say, essentially, "hello! I would like to foreshadow my character arc for the next 3 years!"
Too many open tabs, need to clean up or computer no do videoconference good
Incidentally, spurred by @Mo Putera's posting of Vernor Vinge's A Fire Upon The Deep annotations, I want to remind folks that Vinge's Rainbows End is very good and doesn't get enough attention, and will give you a less-incorrect understanding of how national security people think.
I'm pretty sure that I think "infohazard" is a conceptual dead-end concept that embeds some really false understandings of how secrets are used by humans. It is an orphan of a concept -- it doesn't go anywhere. Ok, the information's harmful. You need humans to touch that info anyways to do responsible risk-mitigation. So now what ?
That "so now what" doesn't sound like a dead end to me. The question of how to mitigate risk when normal risk-mitigation procedures are themselves risky seems like an important one.
I have a few weeks off coming up shortly, and I'm planning on spending some of it monkeying around with AI and code stuff. I can think of two obvious tacks: 1. go do some fundamentals learning on technical stuff I don't have hands-on experience with, or 2. go build some new fun stuff.
Does anyone have particular lists of learning topics / syllabi / similar things like that that would be a good fit for "fairly familiar with the broad policy/technical space, but his largest shipped chunk of code is a few hundred lines of python" person like me?
If you're someone who has[1], or will have, read If Anyone Builds It, Everyone Dies, I encourage you to post your sincere and honest review of the book on Amazon once you have read it -- I think it would be useful to the book's overall reputation.
But be a rationalist! Give your honest opinion.
When:
If you've already read it: Once Amazon accepts reviews, likely starting on the book launch date tomorrow.
If you haven't read it: Once you've read it. Especially if you've ordered a copy from Amazon so they know the review is coming from a ...
Idle musing: should we all be writing a Claude Constitution-esque set of posts about our hopes for how AIs help humans around the dangerous moments in takeoff, in hopes that this influences how the models advise not only us, but people who are coming into these issues fresher than us?
(Yes, I know that from the outside I have exponentially less influence on model behavior than Anthropic, and for MIRI-like reasons maybe this doesn't go well at all. But, you know, play to all of your outs.)
Preregistering intent to write "Every Bay Area Walled Compound" (hat tip: Emma Liddell)
Unrelatedly, I am in Berkeley through Wednesday afternoon, let me know if you're around and would like to chat
I feel like one of the trivially most obvious signs that AI safety comms hasn't gone actually mainstream yet is that we don't say, "yeah, superintelligent AI is very risky. No, I don't mean Terminator. I'm thinking more Person of Interest, you know, that show with the guy from the Sound of Freedom and the other guy who was on Lost and Evil?"
I'll be in Berkeley Weds evening through next Monday, would love to chat with, well, basically anyone who wants to chat. (I'll be at The Curve Fri-Sun, so if you're already gonna be there, come find me there between the raindrops!)
Why are we so much more worried about LLMs having CBRN risk than super-radicalization risk, precisely?
(or is this just an expected-harm metric rather than a probability metric?)
I am (speaking personally) pleasantly surprised by Anthropic's letter. https://cdn.sanity.io/files/4zrzovbb/website/6a3b14a98a781a6b69b9a3c5b65da26a44ecddc6.pdf
I'll be at LessOnline this upcoming weekend -- would love to talk to folks about what things they wish someone would write about to explain how DC policy stuff and LessWrong-y themes could be better connected.
Hypothesis, super weakly held and based on anecdote:
One big difference between US national security policy people and AI safety people is that the "grieving the possibility that we might all die" moment happened, on average, more years ago for the national security policy person than the AI safety person.
This is (even more weakly held) because the national security field has existed for longer, so many participants literally had the "oh, what happens if we get nuked by Russia" moment in their careers in the Literal 1980s...
Ok, so Anthropic's new policy post (explicitly NOT linkposting it properly since I assume @Zac Hatfield-Dodds or @Evan Hubinger or someone else from Anthropic will, and figure the main convo should happen there, and don't want to incentivize fragmenting of conversation) seems to have a very obvious implication.
Unrelated, I just slammed a big AGI-by-2028 order on Manifold Markets.