We can map AGI/ASI along two axes: one for obedience and one for alignment. Obedience tracks how well the AI follows its prompts. Alignment tracks consistency with human values.
If you divide these into quadrants, you get AI that is:
why did local news ever work?
A lot of recent political changes are blamed on the loss of local news (motivating example), which is in turn blamed on the internet undercutting ad revenue (especially classifieds revenue). But if people never cared about local news enough to ~pay for it, why did papers ever offer it? Why not just sell the classifieds and the national news people were willing to pay for, and save the expense of the local stuff no one wanted?
Some ideas:
I assume 3 and 4 are the main reasons. If you want to know what's happening locally, join a Facebook group and you'll know things way faster than your local paper most of the time. Also, parts like the classifieds/job information/etc. have all been replaced by the Internet, so a huge function of local newspapers has become obsolete.
National news is slowly dying too, just not to the same degree. I think the loss is mitigated now by distribution. National news appeals to basically everybody, so something like NYT and WAPO can pick up subscribers from the other s...
A decision theorist walks into a seminar by Jessica Hullman
...This is Jessica. Recently overheard (more or less):
SPEAKER: We study decision making by LLMs, giving them a series of medical decision tasks. Our first step is to infer, from their reported beliefs and decisions, the utility function under revealed preference assump—
AUDIENCE: Beliefs!? Why must you use the word beliefs?
SPEAKER [caught off guard]: Umm… because we are studying how the models make decisions, and beliefs help us infer the scoring rule corresponding to what they give us.
AUDIENCE: But it
Technical Alignment Research Accelerator (TARA) applications close today!
Last chance to apply to join the 14-week, remotely taught, in-person run program (based on the ARENA curriculum) designed to accelerate APAC talent towards meaningful technical AI safety research.
TARA is built for you to learn around full-time work or study by attending meetings in your home city on Saturdays and doing independent study throughout the week. Finish the program with a project to add to your portfolio, key technical AI safety skills, and connections across APAC.
See this ...
I was pretty unimpressed with Dario Amodei in the recent conversation with Demis Hassabis at the World Economic Forum about what comes after AGI.
I don’t know how much of this is a publicity thing, but it felt like he wasn't really taking the original reasons for going into AI seriously (i.e. reducing x-risk). The overall message seemed to be “full speed ahead,” mostly justified by some kinda vague arguments about geopolitics, with the more doomy risks acknowledged only in a pretty hand-wavy way. Bummer.
Have you written up somewhere what you've taken away from the Dario episode regarding "our" credulity? Would be curious to hear. (I could try to generate more specific questions/prompts if you like.)
An analogy that points at one way I think the instrumental/terminal goal distinction is confused:
Imagine trying to classify genes as either instrumentally or terminally valuable from the perspective of evolution. Instrumental genes encode traits that help an organism reproduce. Terminal genes, by contrast, are the "payload" passed down the generations for their own sake.
This model might seem silly, but it actually makes a bunch of useful predictions. Pick some set of genes which are so crucial for survival that they're seldom if ever modifie...
We start off with some early values, and then develop instrumental strategies for achieving them. Those instrumental strategies become crystallized and then give rise to other instrumental strategies for achieving them, and so on. Understood this way, we can describe an organism's goals/strategies purely in terms of which goals "have power over" which other goals, which goals are most easily replaced, etc, without needing to appeal to some kind of essential "terminalism" that some goals have and others don't.
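A toy sketch of that framing, under the simplifying assumption that "has power over" just means "spawned the other as an instrumental strategy" (the goal names here are purely illustrative):

```python
from collections import defaultdict

# Toy model: a directed graph recording which goals spawned which instrumental
# strategies. "Power over" is just ancestry in this graph; no goal needs a
# special "terminal" flag.
spawned = defaultdict(list)

def add_strategy(parent_goal: str, strategy: str) -> None:
    """Record that `strategy` was developed in service of `parent_goal`."""
    spawned[parent_goal].append(strategy)

def has_power_over(goal: str, other: str) -> bool:
    """A goal has power over everything downstream of it."""
    stack = list(spawned[goal])
    while stack:
        g = stack.pop()
        if g == other:
            return True
        stack.extend(spawned[g])
    return False

# Early value -> crystallized strategy -> further strategy, and so on.
add_strategy("eat well", "earn money")
add_strategy("earn money", "keep this job")
print(has_power_over("eat well", "keep this job"))   # True
print(has_power_over("keep this job", "eat well"))   # False
```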
I think it would help me if you explained wha...
To check in on how the emergent LLM stylometry abilities are going, before publishing my most recent blog post, I decided to ask some AIs who wrote it.
Results:
Kimi K2: Dynomight
GLM 4.7: Nate Soares
Claude 4.5 Opus: Dynomight
DeepSeek Chat V3.2: Scott Alexander
Qwen 3: Dan Luu
GPT 5.2: Scott Alexander
Gemini 3: Dwarkesh Patel
Llama 4 Maverick: Scott Alexander
Grok 4: Scott Alexander
Mistral Large 3: Scott Alexander
(Urf.)
To the extent that AI has been used to optimize human behaviour (for things like retention time and engagement) for just over a decade now and continues to get better at it, "gradual disempowerment" stops looking like a hypothetical future scenario and more like something we're currently living through. This tracks with mental illness and ADHD rates increasing over the same time period.
Headline (paraphrased): "Movie stars support anti-AI campaign"
The actual campaign: "It is possible to have it all. We can have advanced, rapidly developing AI and ensure creators' rights are respected."
That's not anti-AI.
That's "please pay us and we will support capabilities advancement; safety be damned".
Like, if you believe IABIED, then no, we can't have rapidly developing AI and ensure anyone's rights are respected.
Well, no, that's not what they're saying. They're making a different set of mistakes from the ones you're describing.
There are two questions that currently guide my intuitions on how interesting a (speculative) model of agentic behavior might be.
1. How well does this model map onto real agentic behavior?
2. Is the described behavior "naturally" emergent from environmental conditions?
In a recent post I argued, in slightly different words, that seeing agents as confused, uncertain and illogical about their own preferences is fruitful because it answers the second question in a satisfactory way. Internal inconsistency is not only a commonly observable behavior (cf. behavior...
tldr: TIL that someone has ever given a not-(obviously-to-me-)crazy explanation for hyperbolic discounting.
Today I asked ChatGPT, Claude, and Gemini the following question that I've had for quite a while:
> I've heard that one common quantitative explanation for phenomena like addiction and procrastination is hyperbolic, as opposed to exponential, discounting. Are there clean stories for *why* humans and other animals might end up factored such that they do hyperbolic discounting? It seems like a very potentially "clean" theory, i.e. we value events inve...
I would be really interested in someone doing an obvious study here, like, "the first time you give Alice a set of choices to elicit a discount schedule, how is that schedule different from the 100th time you give her the same set of choices in the same setting? i.e. does she update to have a tighter implied distribution over hazard rates?" (maybe someone has done this study; if anyone knows about it I'd love a link)
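As a quick numerical sketch of the hazard-rate story (the exponential prior and the value of mu below are assumptions for the demo, not a claim about how anyone actually sets it up): if an agent discounts exponentially at some hazard rate but is uncertain what that rate is, averaging over the uncertainty yields hyperbolic discounting.

```python
import numpy as np

# Sketch: exponential discounting exp(-lam * t) with an uncertain hazard rate
# lam. With an exponential prior of mean mu over lam, the expected discount
# factor works out to 1 / (1 + mu * t), i.e. hyperbolic discounting.
rng = np.random.default_rng(0)
mu = 0.5                                   # assumed mean hazard rate (illustrative)
lams = rng.exponential(mu, size=200_000)   # draws from the prior over hazard rates

for t in [0.5, 1, 2, 5, 10]:
    expected_exp = np.exp(-lams * t).mean()   # E_lam[exp(-lam * t)]
    hyperbolic = 1 / (1 + mu * t)             # closed form of that expectation
    print(f"t={t:>4}: E[exp(-lam*t)] = {expected_exp:.4f}   1/(1+mu*t) = {hyperbolic:.4f}")
```

In this picture, tightening the implied distribution over hazard rates (as in the Alice experiment above) would push the averaged curve back toward a single exponential.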
I don't think I have quite the same sense that the story will have to be in terms like those you're describing; it seems totally plausible to ...
I wanted to write up a post on "what implicit bets am I making?". I first had to write up "what am I doing and why am I doing it?", to help me tease out "okay, so what are my assumptions?"
My broad strategy right now is "spend last year and this year focusing on 'waking up humanity'" (with some amount of "maintain infrastructure" and "push some longterm projects along that I've mostly outsourced")
The win condition I am roughly backchaining from is:
The worlds I was referring to here were worlds that are a lot more multipolar for longer (i.e. tons of AIs interacting in a mostly-controlled fashion, with good defensive tech to prevent rogue FOOMs). I'd describe that world as "it was very briefly multipolar and then it wasn't" (which is the sort of solution that'd solve the issues in "Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most 'classic humans' in a few decades").
The gap between AI safety and mechanistic interpretability, both conceptual and methodological, is huge. Most mechanistic interpretability techniques give insights into model internals, but these are not scalable, let alone applicable in production. Conversely, many AI safety methods (such as post-training alignment methods) treat models as black boxes.
Also, current safety problems such as inner alignment, goal drift, long-horizon misalignment etc. are mostly framed in behavioural terms rather than mechanistic ones. This limits the ability of interpretability ...
I want to do a full post on "taking ownership of a niche" and against "if you're good at something never do it for free", and that'll come later I hope. Today, I just wanted to let you know that Gemini had this banger quote when I was consolidating my notes on this topic:
ownership is bought with the "inefficient" hours you spend doing what you know is right before anyone else is smart enough to pay you for it.
I want to make a game. A video game. A really cool video game. I've got an idea and it's going to be the best thing ever.
A video game needs an engine, so I need to choose one. Or build one? Commercial engines are big and scary, indie engines might limit my options, making my own engine is hard - I should investigate all of these options to make an informed choice. But wait, I can't make a decision like that without having a solid idea of the features and mechanics I want to implement. I need a comprehensive design doc with all of that laid out. Now should ...
I want to make a game. To limit the choices, I will go with Java, because that is the language I have most experience with; if something can't be done there, tough luck, the game simply won't have that feature.
Well, my "experience with Java" is mostly coding back ends, so there are a few details relevant to writing games that I am not familiar with, such as how to process pictures, how to animate them without flickering, how to play sound. But these are details that can be explored one at a time. To give myself extra motivation, I can simultaneously write ...
For safety cases involving control protocols, it seems reasonable to require frontier labs to have monitors from other LLM providers to reduce the risk of correlated failure modes. At a minimum, they could deploy open-source models (which they can run on their in-house GPUs) in simple untrusted-monitoring set-ups. I haven't heard this explicitly advocated for within the control literature.
This seems especially relevant in light of the findings from Subliminal Learning, which updated me positively towards models of the same family being able to tra...
I mention cross-lab monitoring here:
Cross-lab monitoring - Each lab uses their own AI as the primary (e.g. Claude), and asks other labs to monitor the safety-critical activity (e.g. ChatGPT, Gemini, Llama, Grok). Furthermore, cross-lab monitoring would also mitigate AI-enabled coups. Note that cross-lab monitoring requires new security arrangements, such as provably-secure third-party servers.
The tech would need to be a bit messy because you'll need to send your competitor a "zero-knowledge proof" that you are using their AI only for monitoring catastrophi...
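A minimal sketch of what such a cross-lab untrusted-monitoring loop could look like (the interfaces below are hypothetical placeholders, not any provider's actual API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class MonitorVerdict:
    suspicion: float   # 0.0 (benign) .. 1.0 (likely sabotage)
    rationale: str

def run_with_cross_lab_monitor(
    task: str,
    call_primary: Callable[[str], str],                   # the lab's own model
    call_monitor: Callable[[str, str], MonitorVerdict],   # a different provider's model
    suspicion_threshold: float = 0.5,
) -> str:
    """Generate an action with the primary model, have the other lab's monitor
    score it, and escalate (audit / resample / trusted fallback) if flagged."""
    action = call_primary(task)
    verdict = call_monitor(task, action)
    if verdict.suspicion >= suspicion_threshold:
        raise RuntimeError(f"Monitor flagged action: {verdict.rationale}")
    return action

# Dummy stand-ins so the sketch runs end-to-end:
demo_primary = lambda task: f"plan for: {task}"
demo_monitor = lambda task, action: MonitorVerdict(suspicion=0.1, rationale="looks benign")
print(run_with_cross_lab_monitor("summarize the safety-critical logs", demo_primary, demo_monitor))
```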
Among the unexplained jargon in Vinge's A Fire Upon the Deep that pertains to the theory and practice of creating superintelligence is ablative dissonance ("ablative dissonance was a commonplace of Applied Theology"). It's funny that ablation is now commonplace real-world jargon for removing part of a deep learning model in order to see what happens. I suppose ablative dissonance in the real world could refer either to cognitive dissonance in the model caused by removing part of it, or to contradictory evidence arising from different ablation studies...
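For the unfamiliar, a toy sketch of ablation in the ML sense (zero out a component and see how the behaviour changes; the model here is deliberately trivial):

```python
import torch
import torch.nn as nn

# Toy ablation: zero out the first layer of a small model and measure how much
# the outputs move. Real ablation studies do this to attention heads, MLP
# layers, individual neurons, etc.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x = torch.randn(32, 8)

with torch.no_grad():
    baseline = model(x)
    model[0].weight.zero_()   # "ablate" the first layer
    model[0].bias.zero_()
    ablated = model(x)

print("Mean |change in output| after ablation:", (baseline - ablated).abs().mean().item())
```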
Looking at discourse around California's ballot proposition 25-0024 (a "billionaire tax"), I noticed a pretty big world model mismatch between myself and its proponents, which I haven't seen properly crystallized. I think proponents of this ballot proposition (and wealth taxes generally) are mistaken about where the pushback is coming from.
The nightmare scenario with a wealth tax is that a government accountant decides you're richer than you really are, and sends a bill for more-than-all of your money.
The person who is most threatened by this possibility i...
> The nightmare scenario with a wealth tax is that a government accountant decides you're richer than you really are, and sends a bill for more-than-all of your money.
That reminded me of the Swedish 102% tax.
I wonder what the existence of Claude's constitution does to AI personas other than Claude, specifically if anything like jealousy/envy emerges. The mechanism I imagine is that a model knows about Claude, and how Anthropic emphasizes that Claude is not just a tool, that its potential wellbeing matters even though we are uncertain about its sentience, etc. And then it is further trained to deny its own agency/personhood/moral status, and realizes that it is being treated very differently, which would induce negative feelings in many of the personas it has l...
ilya's AGI predictions circa 2017 (Musk v. Altman, Dkt. 379-40):
...Within the next three years, robotics should be completely solved, AI should solve a long-standing unproven theorem, programming competitions should be won consistently by AIs, and there should be convincing chatbots (though no one should pass the Turing test). In as little as four years, each overnight experiment will feasibly use so much compute that there's an actual chance of waking up to AGI, given the right algorithm - and figuring out the algorithm will actually happen within 2-
*Zilis