"Short AI timelines" have recently become mainstream. One now routinely hears the claim that somewhere in the 2026-2028 interval, we'll have AI systems that outperform humans in basically every respect.
For example, the official line from Anthropic holds that "powerful AI" will likely arrive in late 2026 or in 2027. Anthropic's OSTP submission (3/6/2025) says (emphasis in original):[1]
Based on current research trajectories, we anticipate that powerful AI systems could emerge as soon as late 2026 or 2027 [...]
Powerful AI technology will be built during this Administration [i.e. roughly by EOY 2028 -nost]
where "powerful AI" means, among other things:
- In terms of pure intelligence, it is smarter than a Nobel Prize winner across most relevant fields – biology, programming, math, engineering, writing, etc. This means it can prove unsolved mathematical theorems, write extremely good novels, write difficult codebases from scratch, etc.
- In addition to just being a “smart thing you talk to”, it has all the “interfaces” available to a human working virtually, including text, audio, video, mouse and keyboard control, and internet access. It can engage in any actions, communications, or remote operations enabled by this interface, including taking actions on the internet, taking or giving directions to humans, ordering materials, directing experiments, watching videos, making videos, and so on. It does all of these tasks with, again, a skill exceeding that of the most capable humans in the world.
Anthropic's expectations are relatively aggressive even by short-timelines standards, but it seems safe to say that many well-informed people expect something like "powerful AI" by 2030, and quite likely before that[2].
OK, so let's suppose that by some year 20XX, we will have AIs (probably scaffolded LLMs or similar) which are
smarter than a Nobel Prize winner across most relevant fields
and can
prove unsolved mathematical theorems, write extremely good novels, write difficult codebases from scratch, etc.
This would, obviously, be a system capable of writing things that we deem worth reading.
Amodei explicitly says it would be able to "write extremely good novels." And presumably it would also be able to write extremely good scientific papers, given the mention of the Nobel Prize.
What about blog posts, or blog comments? Surely it would be exceptionally good at those kinds of writing, too, right?
Indeed, "being good at blogging" is a vastly lower bar than the standards Amodei states or implies about the writing abilities of "powerful AI." Consider that:
- The de facto quality standards for blog posts/comments are much lower than the standards for novels or scientific papers.
- As readers, we mostly just require blog posts to be "interesting" or "thought-provoking" in some way or other, while being relatively relaxed about various elements that we hold to a higher standard in more "professional" modes of writing.
- This weighs in favor of LLMs being good at blogging (I think?). They have a ton of declarative knowledge (and thus they know a lot of things which you and I don't know, and could in principle synthesize novel insights from them), but they also tend to make a lot of weird and unpredictable errors that would seem strange coming from a human with a comparable amount of subject-matter knowledge. As readers, we expect a fairly high "error rate" in online content and mostly want it to provide us with interesting ideas that we can independently verify and possibly improve upon, and we value these ideas even if they come to us in a "flawed package."
- Blog posts and comments tend to be short (and in particular, much shorter than novels).
- This weighs in favor of LLMs being good at them, because LLMs seem to struggle with "long-horizon" tasks more than humans with a comparable amount of subject-matter knowledge.
- That is: one might worry that such long-horizon issues would hold LLMs back at novel-writing, even if they were good at writing short-form fiction. But blogging is inherently short-form, so these worries don't apply.[3]
- LLMs are, to a first approximation, generative models of text content scraped from the web.
- Intuitively, it seems like blogging should "come naturally" to them. While other tasks like coding or math-problem-solving might require specialized synthetic data, RL, etc., blogging seems like it should just "come for free" – a central example of a task which likelihood training on large-scale web data implicitly includes as a subtask.
- In Situational Awareness, Aschenbrenner argues that "automated AI researchers will be very smart" because – among other things – they'll "be able to read every single ML paper ever written." Insofar as this argument works for ML papers, it should also work for blogging: existing LLMs have presumably "read" a vastly larger quantity of internet discussion than any of us have, with (presumably...?) a commensurately deep familiarity with the norms and nuances of this form of human communication.
But note that currently existing LLMs do not cross this quality bar.
None of the blog content we read is primarily LLM-authored, except in special cases where someone is trying to prove a point[4].
The same is true for blog comments as well.
On LessWrong – which could well be the internet's premier hub for short-timelines views – LLM-written content is typically removed by moderators on grounds such as:
LLM content is generally not good enough for LessWrong, and in particular we don't want it from new users who haven't demonstrated a more general track record of good content.
More generally, it seems like the vast majority of people who engage with LLMs – even people who are bullish on capabilities, even people with short timelines – hold an extremely low opinion of LLM-written content, as such.
In cases where LLMs are considered useful or valuable, the text itself is typically a means to a narrow and user-specified end: we care about a specific judgment the LLM has made, or a specific piece of information it has relayed to us. If we actually read its outputs at all, it is usually for the sake of "extracting" a specific nugget of info that we expect to be there before we've even begun reading the words.
Very few people read this stuff in the expectation that they'll find "straightforwardly good," thought-provoking writing, of the sort that humans produce in large volumes every single day. And that's because, for the most part, LLMs do not produce this type of thing, even when we explicitly request it.
On the face of it, isn't this really, really weird?
We have these amazing systems, these artificial (quasi-?)minds that are proficient in natural language, with seriously impressive math and coding chops and long-tail expert knowledge...
...and their writing is "generally not good enough for LessWrong"?!
We have these spookily impressive AIs that are supposedly going to become world-class intellectuals within a few years – that will supposedly write novels (and "extremely good" novels at that!)[5], that will be capable of substituting in for large fractions of the workforce and doing Nobel-quality scientific thinking...
...and we don't let them post in online discussion venues, because (we claim) your average mildly-interesting non-expert online "poster" has some crucial capability which they still lack?
We have honest-to-god artificial intelligences that could almost certainly pass the Turing Test if we wanted them to...
...and we're not interested in what they have to say?
Here's a simple question for people who think something like "powerful AI" is coming very soon:
When do we expect LLMs to become capable of writing online content that we actually think is worth reading?[6]
(And why are they not already doing so?)
Assuming short timelines, the answer cannot be later than the time we expect "powerful AI" or its equivalent, since "powerful AI" trivially implies this capability.
However, the capability is not here yet, and it's not obvious to me where we specifically expect it to come from.
It's not a data problem: pretraining already includes more than enough blog posts (one would think?), and LLMs already "know" all kinds of things that could be interesting to blog about.
In some sense it is perhaps a "reasoning" problem – maybe LLMs need to think for a long time to come up with insights worthy of blogging about? – but if so, it is not the kind of reasoning problem that will obviously get solved "for free" through RL on math and coding puzzles.
(Likewise, one could arguably frame this as a problem about insufficient "agency," but it is mysterious to me where the needed "agency" is supposed to come from given that we don't have it already.
Or, to take yet another angle, this could be a limitation of HHH assistant chatbots which might be overcome by training for a different kind of AI "character" – but again, this is something that requires more than just scaling + automated AI researchers[7], and a case would need to be made that it will happen swiftly and easily in the near term, despite ~no progress on such things since the introduction of the current paradigm in Anthropic's pre-ChatGPT HHH research.)
What milestone(s) will near-future systems need to cross to grant them this capability? When should I expect those milestones to be crossed? And why hasn't this already happened?
P.S. This question feels closely related to Cole Wyeth's "Have LLMs Generated Novel Insights?" But it strikes me as independently interesting, because it sets a very concrete and low bar for the type and depth of "insight" involved.
You don't need to do groundbreaking science to write a blog worth reading; you don't need to be groundbreaking at all; you just need to say something that's in some way novel or interesting, with fairly generous and broad definitions of those terms. And yet...
- ^
See also Jack Clark's more specific formulation of the same timeframe here: "late 2026, or early 2027"
- ^
E.g. Miles Brundage (ex-OpenAI) writes:
AI that exceeds human performance in nearly every cognitive domain is almost certain to be built and deployed in the next few years.
and Daniel Kokotajlo (also ex-OpenAI) has held similar views for a long time now.
- ^
Although perhaps familiarity with recent discussion is a bottleneck here: to write a high-quality comment, you may need to read not only the post you're commenting on, but also all the other comments so far, and a number of other posts by the same author or from the same community. Still, the individual units of content are so short that one can imagine it all fitting within a single context window in a typical case, esp. if the model's context window is on the long end of today's SOTA (say, 1M+ tokens).
- ^
Or where a human is leveraging the deficiencies/weirdness of LLMs for the sake of art and/or comedy, as opposed to trying to produce "straightforwardly good" content that meets the standards we apply to humans. This was the case in my own LLM-authored blog project, which ran from 2019 to 2023.
- ^
As you may be able to tell, I am even more skeptical about Amodei's "extremely good" novel-writing thing than I am about most of the other components of the near-term "powerful AI" picture.
LLMs are remarkably bad at fiction writing (long-form especially, but even short-form). This is partially due to HHH chat tuning (base models are better), but not entirely, and anyways I don't see Amodei or anyone else saying "hey, we need to break out of the HHH chat paradigm because it's holding back fiction writing capabilities," so in practice I expect we'll continue to get HHH chatbots with atrocious fiction-writing abilities for the indefinite future.

As far as I can tell there's been very little progress on this front at least since GPT-4 (and possibly earlier), probably because of factors like:
- the (seemingly mistaken?) assumption that this is one of the capabilities that just comes for free with scaling
- the difficulty of programmatically measuring quality
- low/unclear economic value, compared to things like coding assistance
- the fact that it's not a capability that people at LLM labs seem to care about very much
Writing novels is much, much more intellectually challenging than blogging (I say, as someone who has done both). I focus on blogging in this post in part because it's such a low bar compared to stuff like this.
- ^
By "we" I mean something like "me, the guy writing this post, and you, the person reading it, and others with broadly similar preferences about what we read online."
And when I say "content that we think is worth reading," I'm just trying to say "content that would be straightforwardly good if a human wrote it." If LLMs become capable of writing some weird type of adversarial insight-porn that seems good despite not resembling anything a human would write, that doesn't count (though it would be very interesting, and of course bad, if that were to happen).
- ^
I mean, yes, there is some sense in which sufficiently good "automated AI researchers" would trivially solve every non-impossible problem. They're smart, aren't they? If there's a good idea, won't they find it, because they're smart? But this kind of thing runs into a chicken-and-egg problem: if these pre-AGI automated researchers are so smart, why aren't they good at blogging? (And if they are good at blogging, then it would have been our work – not theirs – that created the capability, and we still have to explain how we'll manage to do that.)
The quoted sentence is about what people like Dario Amodei, Miles Brundage, and @Daniel Kokotajlo predict that AI will be able to do by the end of the decade.
And although I haven't asked them, I would be pretty surprised if I were wrong here, hence "surely."
In the post, I quoted this bit from Amodei:

It can engage in any actions, communications, or remote operations enabled by this interface, including taking actions on the internet, taking or giving directions to humans, ordering materials, directing experiments, watching videos, making videos, and so on. It does all of these tasks with, again, a skill exceeding that of the most capable humans in the world.
Do you really think that he means "it can do 'any actions, communications, or remote operations enabled by this interface' with a skill exceeding that of the most capable humans in the world – except for writing blog posts or comments"?
Do you think he would endorse this caveat if I were to ask him about it?
If so, why?
Likewise with Brundage, who writes:

AI that exceeds human performance in nearly every cognitive domain is almost certain to be built and deployed in the next few years.
I mean, he did say "nearly every," so there are some "cognitive domains" in which this thing is still not superhuman. But do we really think that Brundage thinks "blogging" is likely to be an exception? Seriously?
(Among other things, note that both of these people are talking about AIs that could automate basically any job doable by a remote worker on a computer. There exist remote jobs which require communication skills + having-interesting-ideas skills such that doing them effectively involves "writing interesting blog posts," just in another venue, e.g. research reports, Slack messages... sometimes these things are even framed as "posts on a company-internal blog" [in my last job I often wrote up my research in posts on a "Confluence blog"].
If you suppose that the AI can do these sorts of jobs, then you either have to infer it's good at blogging too, or you have to invent some very weirdly shaped generalization failure gerrymandered specifically to avoid this otherwise natural conclusion.)