I'm not writing this to alarm anyone, but it would be irresponsible not to report on something this important. On current trends, every car will be crashed in front of my house within the next week. Here's the data:
Until today, only two cars had crashed in front of my house, several months apart, during the 15 months I have lived here. But a few hours ago it happened again, mere weeks after the previous crash. This graph may look harmless enough, but now consider the frequency of crashes it implies over time:
The car crash singularity will occur in the early morning hours of Monday, April 7. As crash frequency approaches infinity, every car will be involved. You might be thinking that the same car could be involved in multiple crashes. This is true! But the same car can only withstand a finite number of crashes before it is no longer able to move. It follows that every car will be involved in at least one crash. And who do you think will be driving your car?
See, this is what happens when you extrapolate data points linearly into the future: you get totally unrealistic predictions. It's important to remember the physical constraints on whatever trend you're trying to extrapolate. In this case, the time between successive crashes can never be negative, so it is inappropriate to model the intervals with a straight line that crosses the time axis on April 7.
Instead, with so few data points, a more realistic model would take a log-transform of the inter-crash interval before fitting...
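For the curious, here is a minimal sketch of what that fix looks like, with made-up interval lengths (the post only gives rough figures like "several months" and "mere weeks"):

```python
import numpy as np

# Illustrative inter-crash intervals in days. These numbers are invented for
# the sketch; the post only says "several months apart", then "mere weeks".
intervals = np.array([150.0, 40.0])
crash_index = np.arange(len(intervals))  # 0, 1, ...

# Naive linear fit: the predicted interval goes negative after a couple more
# crashes, which is where the "singularity" comes from.
slope, intercept = np.polyfit(crash_index, intervals, 1)
print("linear fit, interval before crash #5:", slope * 3 + intercept, "days")

# Fit on log-intervals instead: predictions shrink geometrically but can
# never cross zero, respecting the physical constraint.
log_slope, log_intercept = np.polyfit(crash_index, np.log(intervals), 1)
print("log fit, interval before crash #5:",
      np.exp(log_slope * 3 + log_intercept), "days")
```

The linear fit happily predicts a negative interval, while the log-space fit can only ever approach zero.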
TL;DR Having a good research track record is some evidence of good big-picture takes, but it's weak evidence. Strategic thinking is hard and requires different skills. But people often conflate these skills, leading to excessive deference to researchers in the field without evidence that they are good at strategic thinking specifically. I certainly try to have good strategic takes, but it's hard, and you shouldn't assume I succeed!
I often find myself giving talks or Q&As about mechanistic interpretability research. But inevitably, I'll get questions about the big picture: "What's the theory of change for interpretability?", "Is this really going to help with alignment?", "Does any of this matter if we can’t ensure all labs take alignment seriously?". And I think people take my answers to these...
I suppose I mean influence over politics, policy, or governance (this is very high level since these are all distinct and separable), rather than actually being political necessarily. I do think there are some common skills, but actually being a politician weighs so many other factors more heavily that the strategic skill is not selected on very strongly at all. Being a politician's advisor, on the other hand...
Yes, it's a special case, but importantly one that is not evaluated by Brier score or Manifold bucks.
Epistemic status: Using UDT as a case study for the tools developed in my meta-theory of rationality sequence so far, which means all previous posts are prerequisites. This post is the result of conversations with many people at the CMU agent foundations conference, including particularly Daniel A. Herrmann, Ayden Mohensi, Scott Garrabrant, and Abram Demski. I am a bit of an outsider to the development of UDT and logical induction, though I've worked on pretty closely related things.
I'd like to discuss the limits of consistency as an optimality standard for rational agents. A lot of fascinating discourse and useful techniques have been built around it, but I think that it can be in tension with learning at the extremes. Updateless decision theory (UDT) is one of those...
Do you think a superintelligence will be able to completely rule out the hypothesis that our universe literally is a dovetailing program that runs every possible TM, or literally is a bank of UTMs running every possible program (e.g., by reproducing every time step and adding 0 or 1 to each input tape)? (Or the many other hypothetical universes that similarly contain a whole Level-4-like multiverse?) It seems to me that hypotheses like these will always collectively have a non-negligible weight, and have to be considered when making decisions.
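For readers unfamiliar with the construction: a dovetailer is just a scheduler that interleaves infinitely many computations so that each one gets unboundedly many steps. A toy sketch, not from the comment, with ordinary generators standing in for Turing machines:

```python
from itertools import count, islice

def machine(i):
    """Stand-in for the i-th program in some enumeration: a toy generator
    that just reports which step it is on, forever."""
    for step in count():
        yield (i, step)

def dovetail():
    """At stage n, start machine n and advance every started machine by one
    step, so every machine eventually runs for any number of steps."""
    running = []
    for n in count():
        running.append(machine(n))
        for m in running:
            yield next(m)

# First few scheduled (machine, step) pairs.
print(list(islice(dovetail(), 10)))
```

Every machine gets arbitrarily many steps despite there being infinitely many of them, which is all "runs every possible TM" requires.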
Another argum...
Our community is not prepared for an AI crash. We're good at tracking new capability developments, but not so good at tracking company financials. Currently, both OpenAI and Anthropic are losing over $5 billion a year, while under threat of losing users to cheap LLMs.
A crash will weaken the labs. Funding-deprived and distracted, execs will struggle to counter coordinated efforts to restrict their reckless actions. Journalists will turn on the tech darlings. Optimism will give way to mass outrage over all the wasted money and reckless harms.
You may not think a crash is likely. But if it happens, we can turn the tide.
Preparing for a crash is our best bet.[1] But our community is poorly positioned to respond. Core people positioned themselves inside institutions – to advise on how to maybe make AI 'safe',...
Glad to read your thoughts!
Agreed on being friends with communities who are not happy about AI.
I’m personally not a fan of working with OpenAI or Anthropic, given that they’ve defected on people here concerned about a default trajectory to mass extinction, and used our research for their own ends.
Epistemic status: This post aims at an ambitious target: improving intuitive understanding directly. The model for why this is worth trying is that I believe we are more bottlenecked by people having good intuitions guiding their research than, for example, by the ability of people to code and run evals.
Quite a few ideas in AI safety implicitly use assumptions about individuality that ultimately derive from human experience.
When we talk about AIs scheming, alignment faking, or goal preservation, we imply there is something that is scheming, faking alignment, wanting to preserve its goals, or trying to escape the datacentre.
If the system in question were human, it would be quite clear what that individual system is. When you read about Reinhold Messner reaching the summit of Everest, you would be curious about...
(Other than the thoughts on the consequences of said idea) This idea largely seems like a rehash of https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators (and frankly, so does the three layer model, but that does go into more mechanistic territory and I think it complements simulator theory well)
This post was adapted from an internal doc I wrote at Wave.
Welcome to being a manager! Your time-management problem just got a lot harder.
As an IC, you can often get away with a very simple time-management strategy:
As a team lead, this isn’t going to work, because much more of your work is interrupt-driven or gets blocked for long periods of time. One-on-ones! Code reviews! Design reviews! Prioritization meetings! Project check-ins! All of these are subject to an external schedule rather than being the type of thing that you can push on in a single focused block until it’s done.
Being a team lead means three big changes for your time management:
You no longer have a
A key step in the classic argument for AI doom is instrumental convergence: the idea that agents with many different goals will end up pursuing the same few subgoals, which include things like "gain as much power as possible".
If it wasn't for instrumental convergence, you might think that only AIs with very specific goals would try to take over the world. But instrumental convergence says it's the other way around: only AIs with very specific goals will refrain from taking over the world.
For pure consequentialists—agents that have an outcome they want to bring about, and do whatever they think will cause it—some version of instrumental convergence seems surely true[1].
But what if we get AIs that aren't pure consequentialists, for example because they're ultimately motivated by virtues? Do...
Something entirely new occurred around March 26th, 2025. Following the release of OpenAI’s 4o image generation, a specific aesthetic didn’t just trend—it swept across the virtual landscape like a tidal wave. Scroll through timelines, and nearly every image, every meme, every shared moment seemed spontaneously re-rendered in the unmistakable style of Studio Ghibli. This wasn’t just another filter; it felt like a collective, joyful migration into an alternate visual reality.
But why? Why this specific style? And what deeper cognitive or technological threshold did we just cross? The Ghiblification wave wasn’t mere novelty; it was, I propose, the first widely experienced instance of successful reality transfer: the mapping of our complex, nuanced reality into a fundamentally different, yet equally coherent and emotionally resonant, representational framework.
And Ghibli, it turns out, was...
A thing that gave me creeping horror about the Ghiblification is that I don't think the masses actually particularly understand Ghibli. And the result is an uneven simulacrum-mask that gives the impression of "rendered with love and care" without actually being so.
The Ghibli aesthetic is actually historically pretty important to me, and in particular important as a counterbalancing force against, among other things, what I expect to happen by default with AI. Some things I like about Ghibli:
I've been running meetups since 2019 in Kitchener-Waterloo. These were rationalist-adjacent from 2019-2021 (examples here) and then explicitly rationalist from 2022 onwards.
Here's a low-effort/stream of consciousness rundown of some meetups I ran in Q1 2025. Sometime late last year, I resolved to develop my meetup posts in such a way that they're more plug-and-play-able by other organizers who are interested in running meetups on the same topics. Below you'll find links to said meetup posts (which generally have an intro, required and supplemental readings, and discussion questions for sparking conversation—all free to take), and brief notes on how they went and how they can go better. Which is to say, this post might be kind of boring for non-organizers.
The first meetup of...
If the very-low-context NY meetup draws a lot of people, it might be worth running at least one very-low-context meetup per quarter, to see if that gets people in (or back) more?