I disagree with Ben. I think the usage that Mark is referring to is a reference to Death with Dignity. A central example of my usage is
it would be undignified if AI takes over because we didn't really try off-policy probes; maybe they just work; someone should figure that out
It's playful and unserious, but "X would be undignified" roughly means "it would be an unfortunate error if we did X or let X happen", and it is used in the context of AI doom and our ability to affect P(doom).
Note: below is a hypothetical future written in strong terms and does not track my actual probabilities.
Throughout 2025, a huge amount of compute is spent on producing data in verifiable tasks, such as math[1] (w/ "does it compile as a proof?" being the ground truth label) and code (w/ "does it compile and pass unit tests?" being the ground truth label).
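To make "ground truth label" concrete, here is a minimal sketch (hypothetical function and a pytest-based checker, not any lab's actual pipeline) of what a binary verifiable reward for code could look like; the math case is analogous, with the proof checker playing the role of the unit tests:

```python
import subprocess

def verifiable_reward(repo_dir: str) -> float:
    """Binary ground-truth label for a generated patch: 1.0 if the project's
    unit tests pass, 0.0 otherwise. (Hypothetical sketch; a real pipeline
    would sandbox execution, pin dependencies, cap runtime, etc.)"""
    try:
        result = subprocess.run(
            ["pytest", "-q"], cwd=repo_dir, capture_output=True, timeout=300
        )
    except subprocess.TimeoutExpired:
        return 0.0  # treat timeouts as failures
    return 1.0 if result.returncode == 0 else 0.0
```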
In 2026, when the next giant compute clusters w/ their GB200s are built, labs train the next larger model over 100 days, then do some extra RL(H/AI)F and whatever else they've cooked up by then.
By mid-2026, we have a model that is very generally intelligent and superhuman in coding and math proofs.
Naively, 10x-ing research means releasing 10x as many papers of the same quality in a year; however, these...
Hey Logan, thanks for writing this!
We talked about this recently, but for others reading this: given that I'm working on building an org focused on this kind of work and wrote a relevant shortform lately, I wanted to ping anybody reading this to send me a DM if you are interested in either making this happen (I'm looking for a cracked CTO atm and will be entering phase 2 of Catalyze Impact in January) or providing feedback on an internal vision doc.
I have a lot of ideas about AGI/ASI safety. I've written them down in a paper and I'm sharing the paper here, hoping it can be helpful.
Title: A Comprehensive Solution for the Safety and Controllability of Artificial Superintelligence
Abstract:
As artificial intelligence technology rapidly advances, Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI) are likely to be realized in the future. Highly intelligent ASI systems could be manipulated by malicious humans or independently evolve goals misaligned with human interests, potentially leading to severe harm or even human extinction. To mitigate the risks posed by ASI, it is imperative that we implement measures to ensure its safety and controllability. This paper analyzes the intellectual characteristics of ASI and three conditions for ASI to cause catastrophes (harmful goals, concealed intentions,...
I didn't read the 100 pages, but the content seems extremely intelligent and logical. I really like the illustrations; they are awesome.
A few questions.
1: In your opinion, which idea in your paper is the most important, most new (not already focused on by others), and most affordable (can work without needing huge improvements in political will for AI safety)?
2: The paper suggests preventing AI from self-iteration, or recursive self-improvement. My worry is that once many countries (or companies) have access to AIs which are far better and faster than human...
There's a concept I first heard in relation to the Fermi Paradox, which I've ended up using a lot in other contexts.
Why do we see no aliens out there? A possible (though not necessarily correct) answer is that the aliens might not want to reveal themselves for fear of being destroyed by larger, older, hostile civilizations. There might be friendly civilizations worth reaching out to, but the upside of finding friendlies is smaller than the downside of risking getting destroyed.
Even old, powerful civilizations aren't sure that they're the oldest and most powerful civilization, and the eldest civilizations could be orders of magnitude more powerful still.
So, maybe everyone made an individually rational-seeming decision to hide.
A quote from the original sci-fi story I saw describing this:
...“The universe is a dark
The problem with Dark Forest theory is that, in the absence of FTL detection/communication, it requires a very high density of civilizations and an absurdly high proportion of them hiding. Without that, expansionary civilizations dominate. The only known civilization, us, is expansionary for reasons that don't seem path-dependent, so it seems unlikely that the preconditions for Dark Forest theory exist.
To explain:
Hiders have limited space and mass-energy to work with. An expansionary civilization, once in its technological phase, can spread to thousands of star sys...
Last week, a report was released on the potential risks of mirror biology - that is, biological systems with reversed chirality. I had previously looked into the topic and was skeptical of the viability of this as a global catastrophic or existential risk. This report was tremendously helpful in filling in the details of a scenario where, if mirror bacteria are developed, they could cause a global catastrophe. I will assume that readers here have read or will read at least a summary, if not the original Science article or perhaps the summaries in the (300-page) report itself. And before I get into details, I will say that I’m not a biologist; I have significant familiarity with relevant topics, but there is some chance I get technical...
I don't understand. You shouldn't get any changes from changing the encoding if it produces the same proteins - the difference with mirror life is that it would also mirror the proteins themselves, etc.
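To spell out the "same proteins" point: synonymous recoding changes the DNA/RNA letters but translates to an identical amino-acid sequence, whereas mirror life would flip the chirality of the amino acids themselves, which the genetic code doesn't represent at all. A toy illustration (a tiny subset of the real codon table, purely for demonstration):

```python
# Two different codon encodings, identical protein.
# (Tiny subset of the standard codon table; real translation uses all 64 codons.)
CODON_TABLE = {
    "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # four codons, all Alanine
    "AAA": "K", "AAG": "K",                          # two codons, both Lysine
}

def translate(rna: str) -> str:
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna), 3))

original = "GCUAAA"   # Ala-Lys
recoded  = "GCGAAG"   # different codons...
assert translate(original) == translate(recoded) == "AK"  # ...same protein

# Mirror life is different in kind: it would build proteins from D-amino acids,
# a change in the molecules themselves, not in the encoding.
```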
Historically, I've gotten the impression from the AI Safety community that theoretical AI alignment work is promising in long timeline situations but less relevant in short timeline ones. As timelines have shrunk and capabilities have accelerated, I feel like I've seen theoretical alignment work appear less frequently and with less fanfare.
However, I think we should challenge this communal assumption. In particular, not all capabilities advance at the same rate, and I expect that formally verified mathematical reasoning capabilities will accelerate particularly quickly. Formally verified mathematics has a beautifully clean training signal; it feels like the perfect setup for aggressive amounts of RL. We know that DeepMind's AlphaProof can get IMO Silver Medal performance (with caveats) while writing proofs in Lean. There's no reason to expect that performance...
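To illustrate why the training signal is so clean: a candidate Lean proof either type-checks or it doesn't, with no human grader in the loop. A toy example (standard Lean 4 syntax, obviously not AlphaProof's actual output):

```lean
-- Toy theorem: the Lean checker either accepts this proof or rejects it,
-- giving an unambiguous binary label that an RL setup can use as reward.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```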
See livestream, site, OpenAI thread, Nat McAleese thread.
OpenAI announced (but isn't yet releasing) o3 and o3-mini (skipping o2 because of telecom company O2's trademark). "We plan to deploy these models early next year." "o3 is powered by further scaling up RL beyond o1"; I don't know whether it's a new base model.
o3 gets 25% on FrontierMath, smashing the previous SoTA. (These are really hard math problems.[1]) Wow. (The dark blue bar, about 7%, is presumably one-attempt and most comparable to the old SoTA; unfortunately OpenAI didn't say what the light blue bar is, but I think it doesn't really matter and the 25% is for real.[2])
o3 also is easily SoTA on SWE-bench Verified and Codeforces.
It's also easily SoTA on ARC-AGI, after doing RL on the public ARC-AGI...
Thank you so much for your research! I would have never found these statements.
I'm still quite suspicious. Why would they be "including a (subset of) the public training set"? Is it accidental data contamination? They don't say so. Do they think simply including some questions and answers without reinforcement learning or reasoning would help the model solve other such questions? That's possible but not very likely.
Were they "including a (subset of) the public training set" in o3's base training data? Or in o3's reinforcement learning problem/answer sets?
A...
My median expectation is that AGI[1] will be created 3 years from now. This has implications for how to behave, and I will share some useful thoughts I and others have had on how to orient to short timelines.
I’ve led multiple small workshops on orienting to short AGI timelines and compiled the wisdom of around 50 participants (but mostly my thoughts) here. I’ve also participated in multiple short-timelines AGI wargames and co-led one wargame.
This post will assume median AGI timelines of 2027 and will not spend time arguing for this point. Instead, I focus on what the implications of 3-year timelines would be.
I didn’t update much on o3 (as my timelines were already short) but I imagine some readers did and might feel disoriented now. I hope...
Note that "The AI Safety Community" is not part of this list. I think external people without much capital just won't have that much leverage over what happens.
What would you advise for external people with some amount of capital, say $5M? How would this change for each of the years 2025-2027?
I haven’t found many credible reports on what algorithms and techniques have been used to train the latest generation of powerful AI models (including OpenAI’s o3). Some reports suggest that reinforcement learning (RL) has been a key part, which is also consistent with what OpenAI officially reported about o1 three months ago.
The use of RL to enhance the capabilities of AGI[1] appears to be a concerning development. As I wrote previously, I have been hoping to see AI labs stick to training models through pure language modeling. By “pure language modeling” I don’t rule out fine-tuning with RLHF or other techniques designed to promote helpfulness/alignment, as long as they don’t dramatically enhance capabilities. I’m also okay with the LLMs being used as part of more complex AI systems that invoke many...
I think we're working with a different set of premises, so I'll try to disentangle a few ideas.
First, I completely agree with you that building superhuman AGI carries a lot of risks, and that society broadly isn't prepared for the advent of AI models that can perform economically useful labor.
Unfortunately, economic and political incentives being what they are, capabilities research will continue to happen. My more specific claim is that conditional on AI being at a given capabilities level, I prefer to reach that level with less capable text generators an...