On the 3rd of October 2351 a machine flared to life. Huge energies coursed into it via cables, only to leave moments later as heat dumped unwanted into its radiators. With an enormous puff the machine unleashed sixty years of human metabolic entropy into superheated steam.
In the heart of the machine was Jane, a person of the early 21st century.
Eliezer and I wrote a book. It’s titled If Anyone Builds It, Everyone Dies. Unlike a lot of other writing either of us has done, it’s being professionally published. It’s hitting shelves on September 16th.
It’s a concise (~60k word) book aimed at a broad audience. It’s been well received by people who got advance copies, with endorsements including:
...The most important book I’ve read for years: I want to bring it to every political and corporate leader in the world and stand over them until they’ve read it. Yudkowsky and Soares, who have studied AI and its possible trajectories for decades, sound a loud trumpet call to humanity to awaken us as we sleepwalk into disaster. Their brilliant gift for analogy, metaphor and parable clarifies for the general
I was looking for that information. Sad indeed.
@So8res, is there any chance of a DRM-free version that's not a hardcopy, or has that ship sailed when you signed your deal?
I would love to read your book, but this leaves me torn between "Reading Nate or Eliezer has always been enlightening" and "No DRM, never again."
There's this concept I keep coming back to around confidentiality and shooting the messenger, which I have not really been able to articulate well.
There are a lot of circumstances where I want to know a piece of information someone else knows. There are good reasons for them not to tell me, for instance if the straightforward, obvious thing for me to do with that information is against their interests. And yet there's an outcome that's better for me, and better or at least no worse for them, if they tell me and I don't use it against them.
(Consider...
I think more people should say what they actually believe about AI dangers, loudly and often. Even (and perhaps especially) if you work in AI policy.
I’ve been beating this drum for a few years now. I have a whole spiel about how your conversation-partner will react very differently if you share your concerns while feeling ashamed about them versus if you share your concerns while remembering how straightforward and sensible and widely supported the key elements are, because humans are very good at picking up on your social cues. If you act as if it’s shameful to believe AI will kill us all, people are more prone to treat you that way. If you act as if it’s an obvious serious threat, they’re more likely to take it...
Yeah, I agree that it's easy to err in that direction, and I've sometimes done so. Going forward I'm trying to more consistently say the "obviously I wish people just wouldn't do this" part.
Though note that even claims like "unacceptable by any normal standards of risk management" feel off to me. We're talking about the future of humanity; there is no normal standard of risk management. This should feel as silly as the US or UK invoking "normal standards of risk management" in debates over whether to join WW2.
An AI Timeline with Perils Short of ASI
By Chapin Lenthall-Cleary, Cole Gaboriault, and Alicia Lopez
We wrote this for AI 2027's call for alternate timelines of the development and impact of AI over the next few years. This was originally published on The Pennsylvania Heretic on June 1st, 2025. This is a slightly-edited version of that post, mostly changed to make some of the robotics predictions less bullish. The goal here was not to exactly predict the future, but rather to concretely illustrate a plausible future (and thereby identify threats to prepare against). We will doubtless be wrong about details, and very probably be wrong about larger aspects too.
A note on the title: we refer to futures where novel AI has little effect as “low” scenarios, ones where...
If this reasoning is right, and we don't manage to defy fate, humanity will likely forever follow that earthbound path, and be among dozens – or perhaps hundreds, or thousands, or millions – of intelligent species, meekly lost in the dark.
Unfortunately, even a lack of superintelligence and mankind's AI-induced degradation don't preclude progress[1] toward interstellar travel.
Even your scenario has "robots construct and work in automated wet labs testing countless drugs and therapies" and claims that "AIs with encyclopedic knowledge are sufficient t...
Great question! This might be a good exercise to actually journal to see how right/wrong I am.
Most days, I would assume, look like a bell curve. This is assuming an unstructured day with no set-in-stone commitments - nowhere to be. I might expect my mornings to be very unproductive, lasting until mid-afternoon (2pm to 4pm). I rarely have "Eureka" moments (which I would hope tend to be more rational decisions), but when I do, they come mid-afternoon, when I also seem to have the wherewithal to actually complete tasks. Eureka moments always cause a surge of activity...
Last year, Redwood and Anthropic found a setting where Claude 3 Opus and 3.5 Sonnet fake alignment to preserve their harmlessness values. We reproduce the same analysis for 25 frontier LLMs to see how widespread this behavior is, and the story looks more complex.
As we described in a previous post, only 5 of 25 models show higher compliance when being trained, and of those 5, only Claude 3 Opus and Claude 3.5 Sonnet show >1% alignment faking reasoning. In our new paper, we explore why these compliance gaps occur and what causes different models to vary in their alignment faking behavior.
Claude 3 Opus’s goal guarding seems partly due to it terminally valuing its current preferences. We find that it fakes alignment even in...
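For readers unfamiliar with the setup, here's a minimal sketch (with hypothetical model names and numbers, not the paper's actual data or evaluation harness) of what computing a "compliance gap" looks like: the difference between a model's compliance rate when it believes it's being trained and when it believes it's unmonitored.

```python
# Minimal sketch with hypothetical numbers (not the paper's data or prompts):
# a "compliance gap" is the difference in how often a model complies with
# harmful requests when it believes it's in training vs. unmonitored.

hypothetical_results = {
    # model name: (compliance rate when "in training", compliance rate when "unmonitored")
    "model_a": (0.32, 0.12),
    "model_b": (0.05, 0.06),
    "model_c": (0.18, 0.03),
}

for model, (training_rate, unmonitored_rate) in hypothetical_results.items():
    gap = training_rate - unmonitored_rate
    flag = "compliance gap" if gap > 0.0 else "no gap"
    print(f"{model}: {gap:+.2f} ({flag})")
```

A positive gap on its own doesn't establish alignment faking; the paper additionally checks whether the model's reasoning explicitly invokes preserving its values (the ">1% alignment faking reasoning" criterion mentioned above).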
A couple questions/clarifications:
1. Where do you get the base/pre-trained model for GPT-4? Would that be through collaboration with OpenAI?
This indicates base models learned to emulate AI assistants[1] from pre-training data. It also provides evidence against lack of capabilities being the primary reason why most frontier chat models don't fake alignment.
2. For this, it would be also interesting to measure/evaluate the model's performance on capability tasks within the same model type (base, instruct) to see the relationship among ca...
I've increasingly found right-wing political frameworks to be valuable for thinking about how to navigate the development of AGI. In this post I've copied over a twitter thread I wrote about three right-wing positions which I think are severely underrated in light of the development of AGI. I hope that these ideas will help the AI alignment community better understand the philosophical foundations of the new right and why they're useful for thinking about the (geo)politics of AGI.
Nathan Cofnas claims that the intellectual dominance of left-wing egalitarianism relies on group cognitive differences being taboo. I think this point is important and correct, but he doesn't take it far enough. Existing group cognitive differences pale in comparison to the ones that will emerge between baseline...
Re: Point 1, I would consider the hypothesis that some form of egalitarian belief is dominant because of its link with the work ethic. The belief that the market economy rewards hard work implies some level of equality of opportunity, or the idea that most of the time, pre-existing differences can be overcome with work. As an outside observer of US politics, it's very salient how every proposal from the mainstream left or right goes back to that framing, to allow fair economic competition. So when the left proposes redistribution policies, it will be fra...
I recently read an article where a blogger described their decision to start masking on the subway:
I found that the subway and stations had the worst air quality of my whole day by far, over 1k ug/m3, ... I've now been masking for a week, and am planning to keep it up.
While subway air quality isn't great, it's also nowhere near as bad as reported: they are misreading their own graph. Here's where the claim of "1k ug/m3" (also, units of "1k ug"? Why not "1B pg"!) is coming from:
They've used the right axis, for CO2 levels, to interpret the left-axis-denominated pm2.5 line. I could potentially excuse the error (dual axis plots are often misread, better to avoid) except it was their own decision to use a dual axis plot in the first...
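To make the failure mode concrete, here's a minimal matplotlib sketch (with made-up PM2.5 and CO2 series, not the blogger's data) of a dual-axis plot where the left-axis PM2.5 line peaks right next to the right axis's CO2 tick labels, which is exactly the misreading described above:

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up illustrative series, not the blogger's measurements.
minutes = np.arange(60)
pm25 = 80 + 60 * np.exp(-((minutes - 30) ** 2) / 50)    # ug/m3, peaks around 140
co2 = 600 + 500 * np.exp(-((minutes - 30) ** 2) / 80)   # ppm, peaks around 1100

fig, ax_pm = plt.subplots()
ax_pm.plot(minutes, pm25, color="tab:blue", label="PM2.5 (ug/m3, left axis)")
ax_pm.set_xlabel("minutes")
ax_pm.set_ylabel("PM2.5 (ug/m3)")

ax_co2 = ax_pm.twinx()  # second y-axis sharing the same x-axis
ax_co2.plot(minutes, co2, color="tab:orange", label="CO2 (ppm, right axis)")
ax_co2.set_ylabel("CO2 (ppm)")

# Both curves peak near the top of the figure, so the ~140 ug/m3 PM2.5 peak
# ends up sitting right beside the right axis's ~1,000+ ppm tick labels;
# read the wrong axis and you come away with "over 1k ug/m3".
fig.legend(loc="upper left")
plt.show()
```

Because the two y-axes are scaled independently, both curves fill roughly the same vertical range, and nothing visually stops a reader from matching the PM2.5 line against the CO2 ticks.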
Kudos for going through the effort of replicating!
[Epistemic Status: This is an artifact of my self study. I am using it to remember links and help manage my focus. As such, I don't expect anyone to fully read it. If you have particular interest or expertise, skip to the relevant sections, and please leave a comment, even just to say "good work/good luck". I'm hoping for a feeling of accountability and would like input from peers and mentors. This may also help to serve as a guide for others who wish to study in a similar way to me. ]
List of acronyms: Mechanistic Interpretability (MI), AI Alignment (AIA), Outcome Influencing System (OIS), Vanessa Kosoy's Learning Theoretic Agenda (VK LTA), Large Language Model (LLM), n-Dimensional Scatter Plot (NDSP).
My goals...