I think that's a pretty plausible way the world could be, yes.
I still expect the Singularity somewhere in the 2030s, even under that model.
GPT-5 ought to come around September 20, 2028, but Altman said today it'll be out within months
I don't think what he said meant what you think it meant. Exact words:
In both ChatGPT and our API, we will release GPT-5 as a system that integrates a lot of our technology, including o3
The "GPT-5" he's talking about is not the next generation of GPT-4, not an even bigger pretrained LLM. It is some wrapper over GPT-4.5/Orion, their reasoning models, and their agent models. My interpretation is that "GPT-5" the product and GPT-5 the hypothetical 100x-bigger GPT mo...
followed by the promised ‘GPT-5’ within months that Altman says is smarter than he is
Ha, good catch. I don't think GPT-5 the future pretrained LLM 100x the size of GPT-4 which Altman says will be smarter than he is, and GPT-5 the Frankensteinian monstrosity into which OpenAI plan to stitch all of their current offerings, are the same entity.
Edit: It occurred to me that this reading may not be obvious. Look at Altman's Exact Words:
...In both ChatGPT and our API, we will release GPT-5 as a system that integrates a lot of our technology, including o3. We will no
Technology for efficient human uploading. Ideally backed by theory we can independently verify as correct and doing what it's intended to do (rather than e. g. replacing the human upload with a copy of the AGI who developed this technology).
What are the current best theories regarding why Altman is doing the for-profit transition at all?
I don't buy the getting-rid-of-profit-caps motive. It seems to live in a fantasy universe in which they expect to have a controllable ASI on their hands, yet... still be bound by economic rules and regulations? If OpenAI controls an ASI, OpenAI's leadership would be able to unilaterally decide where the resources go, regardless of what various contracts and laws say. If the profit caps are there but Altman wants to reward loyal investors, all profits will go t...
If OpenAI controls an ASI, OpenAI's leadership would be able to unilaterally decide where the resources go, regardless of what various contracts and laws say. If the profit caps are there but Altman wants to reward loyal investors, all profits will go to his cronies. If the profit caps are gone but Altman is feeling altruistic, he'll throw the investors a modest fraction of the gains and distribute the rest however he sees fit. The legal structure doesn't matter; what matters is who physically types what commands into the ASI control terminal.
Sama knows this but the investors he is courting don't, and I imagine he's not keen to enlighten them.
Altman might be thinking in terms of ASI (a) existing and (b) holding all meaningful power in the world. All the people he's trying to get money from are thinking in terms of AGI limited enough that it and its owners could be brought to heel by the legal system.
I do think RL works "as intended" to some extent, teaching models some actual reasoning skills, much like SSL works "as intended" to some extent, chiseling-in some generalizable knowledge. The question is to what extent it's one or the other.
Some more evidence that whatever the AI progress on benchmarks is measuring, it's likely not measuring what you think it's measuring:
...AIME I 2025: A Cautionary Tale About Math Benchmarks and Data Contamination
AIME 2025 part I was conducted yesterday, and the scores of some language models are available here:
https://matharena.ai thanks to @mbalunovic, @ni_jovanovic et al. I have to say I was impressed, as I predicted the smaller distilled models would crash and burn, but they actually scored at a reasonable 25-50%.
That was surprising to me! Since these
One way to make this work is to just not consider your driven-to-madness future self an authority on the matter of what's good or not. You can expect to start wishing for death, and still take actions that would lead you to this state, because present!you thinks that existing in a state of wishing for death is better than not existing at all.
I think that's perfectly coherent.
No, I mean, I think some people actually hold that any existence is better than non-existence, so death is -inf for them and existence, even in any kind of hellscape, is above-zero utility.
I think "enforce NAP then give everyone a giant pile of resources to do whatever they want with" is a reasonable first-approximation idea regarding what to do with ASI, and it sounds consistent with Altman's words.
But I don't believe that he's actually going to do that, so I think it's just (3).
Good point regarding GPT-"4.5". I guess I shouldn't have assumed that everyone else has also read Nesov's analyses and immediately (accurately) classified them as correct.
that is a surprising amount of information
Is it? What of this is new?
To my eyes, the only remotely new thing is the admission that "there’s a lot of research still to get to [a coding agent]".
The estimate of the compute of their largest version ever (which is a very helpful way to phrase it) at only <=50x GPT-4 is quite relevant to many discussions (props to Nesov) and something Altman probably shouldn't've said.
The estimate of test-time compute at 1000x effective-compute is confirmation of looser talk.
The scientific research part is of uncertain importance but we may well be referring back to this statement a year from now.
You're not accounting for enemy action. They couldn't have been sure, at the onset, how successful the AI Notkilleveryoneism faction would be at raising the alarm, and in general, how blatant the risks would become to outsiders as capabilities progressed. And they have been intimately familiar with the relevant discussions, after all.
So they might've overcorrected, and considered that the "strategic middle ground" would be to admit the risk is plausible (but not as certain as the "doomers" say), rather than to deny it (which they might've expected to become a delusional-looking position in the future, so not a PR-friendly stance to take).
Or, at least, I think this could've been a relevant factor there.
My model is that Sam Altman regarded the EA world as a memetic threat, early on, and took actions to defuse that threat by paying lip service / taking openphil money / hiring prominent AI safety people for AI safety teams.
Like, possibly the EAs could have created a widespread vibe that building AGI is a cartoon evil thing to do, sort of the way many people think of working for a tobacco company or an oil company.
Then, after ChatGPT, OpenAI was a much bigger fish than the EAs or the rationalists, and he began taking moves to extricate himself from them.
Any chance you can post (or PM me) the three problems AIs have already beaten?
This really depends on the definition of "smarter". There is a valid sense in which Stockfish is "smarter" than any human. Likewise, there are many valid senses in which GPT-4 is "smarter" than some humans, and some valid senses in which GPT-4 is "smarter" than all humans (e. g., at token prediction). There will be senses in which GPT-5 will be "smarter" than a bigger fraction of humans compared to GPT-4, perhaps being smarter than Sam Altman under a bigger set of possible definitions of "smarter".
Will that actually mean anything? Who knows.
By playing with...
If anyone is game for creating an agentic research scaffold like that Thane describes
Here's the basic structure, as envisioned after 5 minutes' thought:
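A minimal sketch of the kind of loop I have in mind (everything below, the function names, the search/summarize helpers, the stopping rule, is an illustrative assumption, not a spec of any existing product):

```python
# Illustrative multi-turn research scaffold: the LLM call and the web-search
# call are abstract placeholders passed in by the user.
from typing import Callable, List

def research(question: str,
             llm: Callable[[str], str],          # placeholder: any text-in/text-out model call
             web_search: Callable[[str], str],   # placeholder: query -> concatenated result snippets
             max_rounds: int = 5) -> str:
    notes: List[str] = []
    for _ in range(max_rounds):
        # 1. Ask the model what it still needs to know.
        query = llm(
            f"Research question: {question}\n"
            f"Notes so far:\n{''.join(notes) or '(none)'}\n"
            "Propose ONE web-search query that would most reduce your uncertainty, "
            "or reply DONE if you have enough material."
        ).strip()
        if query.upper().startswith("DONE"):
            break
        # 2. Run the search and have the model compress the results into notes.
        results = web_search(query)
        notes.append(llm(
            f"Summarize the findings relevant to '{question}' from:\n{results}\n"
            "Keep citations/URLs."
        ) + "\n")
    # 3. One final, format-unconstrained call to write the long report.
    return llm(
        f"Using these notes:\n{''.join(notes)}\n"
        f"Write a thorough, well-structured report answering: {question}"
    )
```

The point is just the shape: multi-turn search interleaved with note-taking, plus an unconstrained long-form write-up at the end.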
I really wonder how much of the perceived performance improvement comes from agent-y training, as opposed to not sabotaging the format of the model's answer + letting it do multi-turn search.
Compared to most other LLMs, Deep Research is able to generate reports up to 10k words, which is very helpful for providing comprehensive-feeling summaries. And unlike Google's analogue, it's not been fine-tuned to follow some awful deliberately shallow report template[1].
In other words: I wonder how much of Deep Research's capability can be replicated by putting Claud...
Ah, I think I see my error.
Consider a CoT of the form "[problem statement]: ...........", which always outputs dots regardless of the problem statement. Can that still encode useful computations, with the information being passed through intermediate self-attention? Yes, but: this only increases parallel-computing capabilities, not serial-computing capabilities.
And in the framework I'd outlined, under the Paraphraser, for the purposes of scheme-y thinking, even a legible CoT would consist of such "filler" tokens.
So yep, you're right, this does force all scheming to be taking place within one forward pass.
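To spell out the depth-counting behind this (informal, for a generic decoder with $L$ layers emitting $T$ CoT tokens; not specific to any particular model):

$$
\text{filler CoT: depth} \le L \ (\text{independent of } T),
\qquad
\text{legible CoT: depth} \le L \cdot T,
$$

with width $O(T)$ in both cases. A sampled, information-bearing token re-enters the stack at layer $0$ of the next position, which is what buys the extra serial depth; fixed filler tokens don't, so extra positions only add parallel columns.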
I don't see why it would be harder to do.
Suppose you tasked a model with refactoring some database, and it secretly wants to sabotage its functionality. It outputs a CoT segment that goes, "I need to design unit tests for this module". Each word of this CoT is mapped to some internal representation, and the internal representation consists of "on-task" human-intended reasoning $t_i$, plus misaligned scheming $s_i$ (corresponding to, e. g., a tactical decision to fail to include unit tests for some subtle issue). "I" is mapped to $t_1 + s_1$, "need" ...
Some thoughts on protecting against LLM steganography.
tl;dr: I suspect the Paraphraser idea is near-completely ineffectual as you scale the capabilities, both for scheming and for general CoT interpretability.
In the reasoning-as-communication framework, the amount of information a fixed-length message/CoT can transfer is upper-bounded by information-theoretic constraints. That is, if we have a ten-bit message, it cannot communicate more than ten bits of information: it cannot always uniquely specify one internal state of an AI model from more than ...
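Concretely, the counting constraint I have in mind, for a CoT of $n$ tokens over a vocabulary $V$:

$$
I(\text{CoT}) \;\le\; n \log_2 |V| \ \text{bits}
\qquad\Longrightarrow\qquad
\#\{\text{distinguishable internal states}\} \;\le\; |V|^{\,n}.
$$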
Such an entity as described could absolutely be an existential threat to humanity
I agree. I think you don't even need most of the stuff on the "superhuman" list, the equivalent of a competent IQ-130 human upload probably does it, as long as it has the speed + self-copying advantages.
The problem with this neat picture is reward-hacking. This process wouldn't optimize for better performance on fuzzy tasks, it would optimize for performance on fuzzy tasks that looks better to the underlying model. And much like RLHF doesn't scale to superintelligence, this doesn't scale to superhuman fuzzy-task performance.
It can improve the performance a bit. But once you ramp up the optimization pressure, "better performance" and "looks like better performance" would decouple from each other and the model would train itself into idiosyncratic uselessne...
The above toy model assumed that we're picking one signal at a time, and that each such "signal" specifies the intended behavior for all organs simultaneously...
... But you're right that the underlying assumption there was that the set of possible desired behaviors is discrete (i. e., that X in "kidneys do X" is a discrete variable, not a vector of reals). That might've indeed assumed me straight out of the space of reasonable toy models for biological signals, oops.
Yeah but if something is in the general circulation (bloodstream), then it’s going everywhere in the body. I don’t think there’s any way to specifically direct it.
The point wouldn't be to direct it, but to have different mixtures of chemicals (and timings) to mean different things to different organs.
Loose analogy: Suppose that the intended body behaviors ("kidneys do X, heart does Y, brain does Z" for all combinations of X, Y, Z) are latent features, basic chemical substances and timings are components of the input vector, and there are dramatically more ...
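A toy numerical sketch of the kind of thing I mean (entirely my own made-up construction with arbitrary numbers, not a claim about actual biology): 60 commanded behaviors get addressed through a 50-dimensional broadcast signal, because each organ applies its own readout to the shared signal.

```python
# Toy model: one broadcast "chemical" vector, different per-organ readouts.
import numpy as np

rng = np.random.default_rng(0)
d = 50                       # dimensionality of the broadcast signal (chemicals x timings)
organs = ["kidneys", "heart", "brain"]
n_messages = 20              # distinct behaviors we might want to command per organ

# Each organ gets its own set of roughly-orthogonal readout directions.
readouts = {o: rng.standard_normal((n_messages, d)) / np.sqrt(d) for o in organs}

def broadcast(commands: dict) -> np.ndarray:
    """Build one shared signal encoding a command for every organ at once."""
    return sum(readouts[o][m] for o, m in commands.items())

def decode(organ: str, signal: np.ndarray) -> int:
    """Each organ picks whichever of its own directions best matches the signal."""
    return int(np.argmax(readouts[organ] @ signal))

commands = {"kidneys": 3, "heart": 17, "brain": 8}
signal = broadcast(commands)
# With high probability this recovers {'kidneys': 3, 'heart': 17, 'brain': 8}.
print({o: decode(o, signal) for o in organs})
```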
Have you looked at samples of CoT of o1, o3, deepseek, etc. solving hard math problems?
Certainly (experimenting with r1's CoTs right now, in fact). I agree that they're not doing the brute-force stuff I mentioned; that was just me outlining a scenario in which a system "technically" clears the bar you'd outlined, yet I end up unmoved (I don't want to end up goalpost-moving).
Though neither are they being "strategic" in the way I expect they'd need to be in order to productively use a billion-token CoT.
...Anyhow, this is nice, because I do expect that probably
Not for math benchmarks. Here's one way it can "cheat" at them: suppose that the CoT would involve the model generating candidate proofs/derivations, then running an internal (learned, not hard-coded) proof verifier on them, and either rejecting the candidate proof and trying to generate a new one, or outputting it. We know that this is possible, since we know that proof verifiers can be compactly specified.
This wouldn't actually show "agency" and strategic thinking of the kinds that might generalize to open-ended domains and "true" long-horizon tasks. In ...
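For concreteness, a schematic of the "cheating" loop I mean (the helpers are stand-ins for learned circuits inside the CoT, not anything disclosed about the o-series):

```python
# Generate-and-verify search: enough to ace checkable math benchmarks
# without any open-ended strategic planning.
from typing import Callable, Optional

def solve(problem: str,
          propose_proof: Callable[[str, int], str],   # placeholder: candidate derivation generator
          verify: Callable[[str, str], bool],         # placeholder: compact learned proof checker
          budget: int = 10_000) -> Optional[str]:
    for attempt in range(budget):
        candidate = propose_proof(problem, attempt)
        if verify(problem, candidate):
            return candidate        # output the first candidate that checks out
    return None                     # give up within the token budget
```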
Prooobably ~simultaneously, but I can maybe see it coming earlier and in a way that isn't wholly convincing to me. In particular, it would still be a fixed-length task; much longer-length than what the contemporary models can reliably manage today, but still hackable using poorly-generalizing "agency templates" instead of fully general "compact generators of agenty behavior" (which I speculate humans to have and RL'd LLMs not to). It would be some evidence in favor of "AI can accelerate AI R&D", but not necessarily "LLMs trained via SSL+RL are AGI-comp...
I wish we had something to bet on better than "inventing a new field of science,"
I've thought of one potential observable that is concrete, should be relatively low-capability, and should provoke a strong update towards your model for me:
If there is an AI model such that the complexity of R&D problems it can solve (1) scales basically boundlessly with the amount of serial compute provided to it (or to a "research fleet" based on it), (2) scales much faster with serial compute than with parallel compute, and (3) the required amount of human attention ("...
Well, he didn't do it yet either, did he? His new announcement is, likewise, just that: an announcement. Manifold is still 35% on him not following through on it, for example.
"Intensely competent and creative", basically, maybe with a side of "obsessed" (with whatever they're cracked at).
Supposedly Trump announced that back in October, so it should already be priced in.
(Here's my attempt at making sense of it, for what it's worth.)
Trump "announces" a lot of things. It doesn't matter until he actually does them.
Here's a potential interpretation of the market's apparent strange reaction to DeepSeek-R1 (shorting Nvidia).
I don't fully endorse this explanation, and the shorting may or may not have actually been due to Trump's tariffs + insider trading, rather than DeepSeek-R1. But I see a world in which reacting this way to R1 arguably makes sense, and I don't think it's an altogether implausible world.
If I recall correctly, the amount of money globally spent on inference dwarfs the amount of money spent on training. Most economic entities are not AGI labs training n...
I don't have deep expertise in the subject, but I'm inclined to concur with the people saying that the widely broadcast signals don't actually represent one consistent thing, despite your plausible argument to the contrary.
Here's a Scott Alexander post speculating why that might be the case. In short: there was an optimization pressure towards making internal biological signals very difficult to decode, because easily decodable signals were easy target for parasites evolving to exploit them. As the result, the actual signals are probably represented as "un...
Hmm. This does have the feel of gesturing at something important, but I don't see it clearly yet...
Free association: geometric rationality.
MIRI's old results argue that "corrigibility via uncertainty regarding the utility function" doesn't work, because if the agent maximizes expected utility anyway, it doesn't matter one whit whether we're taking expectation over actions or over utility functions. However, the corrigibility-via-instrumental-goals does have the feel of "make the agent uncertain regarding what goals it will want to pursue next". Is there, t...
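For reference, the collapse those old results point at is just linearity of expectation: define the mixture utility $\bar{u}(s) := \mathbb{E}_{u \sim P}[u(s)]$; then

$$
\arg\max_{a}\ \mathbb{E}_{u \sim P}\!\big[\,\mathbb{E}_{s \sim a}[u(s)]\,\big]
\;=\;
\arg\max_{a}\ \mathbb{E}_{s \sim a}\!\big[\,\bar{u}(s)\,\big],
$$

so an agent "uncertain about its utility function" that still maximizes expected utility is behaviorally identical to an agent certain of the fixed utility $\bar{u}$, and the uncertainty alone buys no corrigibility.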
However, the corrigibility-via-instrumental-goals does have the feel of "make the agent uncertain regarding what goals it will want to pursue next".
That's an element, but not the central piece. The central piece (in the subagents frame) is about acting-as-though there are other subagents in the environment which are also working toward your terminal goal, so you want to avoid messing them up.
The "uncertainty regarding the utility function" enters here mainly when we invoke instrumental convergence, in hopes that the subagent can "act as though other subage...
Coming back to this in the wake of DeepSeek r1...
I don't think the cumulative compute multiplier since GPT-4 is that high, I'm guessing 3x, except perhaps for DeepSeek-V3, which wasn't trained compute optimally and didn't use a lot of compute, and so it remains unknown what happens if its recipe is used compute optimally with more compute.
How did DeepSeek accidentally happen to invest precisely the amount of compute into V3 and r1 that would get them into the capability region of GPT-4/o1, despite using training methods that clearly have wildly different r...
Hm, that's not how it looks to me? Look at the 6-month plot:
The latest growth spurt started January 14th, well before the Stargate news went public, and this seems like just its straight-line continuation. The graph past the Stargate announcement doesn't look "special" at all. Arguably, we can interpret it as Stargate being part of the reference class of events that are currently driving GE Vernova up – not an outside-context event as such, but still relevant...
Except for the fact that the market indeed did not respond to Stargate immediately. Which ...
Overall, I think that the new Stargate numbers published may (call it 40%) be true, but I also think there is a decent chance this is new administration Trump-esque propaganda/bluster (call it 45%),
I think it's definitely bluster, the question is how much of a done deal it is to turn this bluster into at least $100 billion.
I don't think this changes the prior expected path of datacenter investment at all. It's precisely how the expected path was going to look, the only change is how relatively high-profile/flashy this is being. (Like, if they invest $3...
So, Project Stargate. Is it real, or is it another "Sam Altman wants $7 trillion"? Some points:
McLaughlin announced 2025-01-13 that he had joined OpenAI
He gets onboarded only on January 28th, for clarity.
My point there is that he was talking to the reasoning team pre-hiring (forget 'onboarding', who knows what that means), so they would be unable to tell him most things - including if they have a better reason than 'faith in divine benevolence' to think that 'more RL does fix it'.
I think GPT-4 to o3 represent non-incremental narrow progress, but only, at best, incremental general progress.
(It's possible that o3 does "unlock" transfer learning, or that o4 will do that, etc., but we've seen no indication of that so far.)
I'm sure he wants hype, but he doesn't want high expectations that are very quickly falsified
There's a possibility that this was a clown attack on OpenAI instead...
Valid, I was split on whether it's worth posting vs. it'd be just me taking my part in spreading this nonsense. But it'd seemed to me that a lot of people, including LW regulars, might've been fooled, so I erred on the side of posting.
As I'd said, I think he's right about the o-series' theoretic potential. I don't think there is, as of yet, any actual indication that this potential has already been harnessed, and therefore that it works as well as the theory predicts. (And of course, the o-series scaling quickly at math is probably not even an omnicide threat. There's an argument for why it might be – that the performance boost will transfer to arbitrary domains – but that doesn't seem to be happening. I guess we'll see once o3 is public.)
The thing is - last time I heard about OpenAI rumors it was Strawberry.
That was part of my reasoning as well, why I thought it might be worth engaging with!
But I don't think this is the same case. Strawberry/Q* was being leaked-about from more reputable sources, and it was concurrent with dramatic events (the coup) that were definitely happening.
In this case, all evidence we have is these 2-3 accounts shitposting.
Alright, so I've been following the latest OpenAI Twitter freakout, and here's some urgent information about the latest closed-doors developments that I've managed to piece together:
Here's the sequence of even...
It all started from Sam's six-word story. So it looks like organized hype.
I personally put a relatively high probability of this being a galaxy brained media psyop by OpenAI/Sam Altman.
Eliezer makes a very good point that confusion around people claiming AI advances/whistleblowing benefits OpenAI significantly, and Sam Altman has a history of making galaxy brained political plays (attempting to get Helen fired (and then winning), testifying to Congress that it is good he has oversight via the board and that he should not be in full control of OpenAI, and then replacing the board with underlings, etc.).
Sam is very smart and politically capable. This feels in character.
If you're wondering why OAers are suddenly weirdly, almost euphorically, optimistic on Twitter
For clarity, which OAers is this talking about, precisely? There's a cluster of guys – e. g. this, this, this – claiming to be OpenAI insiders. That cluster went absolutely bananas the last few days, claiming ASI achieved internally/will be in a few weeks, alluding to an unexpected breakthrough that has OpenAI researchers themselves scared. But none of them, as far as I can tell, have any proof that they're OpenAI insiders.
On the contrary: the Satoshi guy straight...
Current take on the implications of "GPT-4b micro": Very powerful, very cool, ~zero progress to AGI, ~zero existential risk. Cheers.
First, the gist of it appears to be:
...OpenAI’s new model, called GPT-4b micro, was trained to suggest ways to re-engineer the protein factors to increase their function. According to OpenAI, researchers used the model’s suggestions to change two of the Yamanaka factors to be more than 50 times as effective—at least according to some preliminary measures.
The model was trained on examples of protein sequences from many species, as
Fair enough, I suppose calling it an outright wrapper was an oversimplification. It still basically sounds like just the sum of the current offerings.