In recent years, I’ve found that people who self-identify as members of the AI safety community have increasingly split into two camps: Camp A) “Race to superintelligence safely”: People in this group typically argue that “superintelligence is inevitable because of X”, and it’s therefore better that their in-group (their company...
Subhash, Josh, and David were co-first authors on this project. The full paper is here, and our code can be found here. TL;DR: We empirically study the success of weak-to-strong oversight as we scale the intelligence of the weak overseer model (Guard) and strong adversary model (Houdini) in four oversight...
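As a rough illustration of the setup (my sketch, not the paper's actual code or fitted curves): oversight can be framed as a two-player game whose outcome depends on the intelligence gap between Guard and Houdini. The Elo-style logistic win model, the Elo values, and the function name below are assumptions I'm making purely for illustration.

```python
def houdini_win_prob(houdini_elo: float, guard_elo: float) -> float:
    """Standard Elo logistic: probability that the stronger Houdini
    model defeats the weaker Guard overseer in one oversight game.
    (Illustrative stand-in; the paper fits its own curves to data.)"""
    return 1.0 / (1.0 + 10.0 ** ((guard_elo - houdini_elo) / 400.0))

# Sweep the intelligence gap to see how oversight degrades as
# Houdini pulls ahead of Guard (hypothetical Elo values).
for gap in (0, 100, 200, 400):
    print(f"gap={gap:4d} Elo -> Houdini wins {houdini_win_prob(1200 + gap, 1200):.1%}")
```

Under this toy model, a 400-point gap already gives Houdini a ~91% win probability, which is why it is natural to study oversight success as a function of the Guard-Houdini intelligence gap.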
As humanity gets closer to Artificial General Intelligence (AGI), a new geopolitical strategy is gaining traction in US and allied NatSec, AI safety and tech circles. Anthropic CEO Dario Amodei and the RAND Corporation call it the “entente”, while others privately refer to it as “hegemony” or “crush...
There's been ample interest in the probability that the Ukraine conflict escalates into a nuclear war, ranging from predictions on Metaculus, Manifold and Polymarket to estimates by David Orr, myself and others. Aside from estimating this probability, what can we do to reduce it? Current discourse is often framed as...
Many people have asked me what I think the odds are of an imminent major US-Russia nuclear war. My current estimate is about the same as the odds of losing at Russian roulette: one in six. The goal of this post is to explain how I arrived at this estimate. Please forgive its...
Although I feel that Nick Bostrom’s new book “Superintelligence” is generally awesome and a much-needed milestone for the field, I do have one quibble: both he and Steve Omohundro appear more convinced than I am that an AI will naturally tend to retain its goals...