I think this is right; LLMs are mostly crystallized intelligence, but do have fluid intelligence that lags humans substantially. And I agree that this is good for safety as long as it lasts - as long as it doesn't create an overhang. Which it well could.
A related perspective is that LLMs lack metacognitive skills. This is only somewhat overlapping with fluid intelligence, but it has the same implications.
Another related issue: the importance of fluid intelligence is multiplied dramatically if you can never remember your solution to a new problem. Thus, adding any sort of effective learning during deployment would reduce the downsides of poor fluid intelligence, though more in efficiency than in total capability.
I'm not sure the situation in which LLMs are not good at fluid intelligence is good for AI Safety. Or at least, it is good in the sense that timelines are delayed by a few years, but it makes our odds of surviving an AGI worse.
A core component of modern technical AI safety is studying LLMs. Major figures in the field seem to be under the impression that their experiments will extrapolate to future models, at least to some extent. As a concrete example, the team at Google DeepMind endorses a safety approach premised on the assumption that "there will not be large discontinuous jumps in general AI capabilities" (p. 3). Authors on this paper include Rohin Shah, Neel Nanda, and Victoria Krakovna.*
My concern is that, if LLMs are terrible at fluid intelligence and fundamentally inefficient learners, then we'll see a temporary slowdown in progress. But this creates a strong economic incentive to explore alternative architectures, and at some point one will be found that learns faster with less data and energy. (This is Yann LeCun's current agenda.)
The end result is we will see a sudden jump in AI capabilities that we are unprepared for.
*This isn't to single out DeepMind's team; their paper just came to mind.
Oof, yeah, seems overconfident.
I wonder if a similar error is why Ants seem so confident in a very fast takeoff -- they assume the models are better at fluid intelligence than they actually are, because their capabilities are strongest in the domain Ants are best at evaluating.
Very good, thank you for articulating this distinction well, along with the crucial question around it and a decent initial stab at some implications and consequences.
Provisionally, I think 'mostly crystallised' is right. Among the things that crystallise (especially from diverse agentic training) are something like crystallised heuristics for runtime exploration and in-context updating. I note that it'd be great to have readier evidence and trends on this!
This is why I think exploration capability and its complement, sample-efficient learning, are totally central to any question of the impacts of AI on R&D (of AI and of other things).
They’re good at ARC-AGI despite presumably not having seen this type of challenge before.
To nitpick, the ARC Prize Foundation has found some odd signs of (maybe) memorization. E.g., Gemini 3's reasoning traces show it thinking:
… Target is Green (3). Pattern is Magenta (6) Solid. Result: Magenta Square on Green … (Gemini 3 Deep Think)
But the JSON it receives as input has no colors! It's clearly pretty familiar with the tests, even if it might not have seen a particular one before.
And while they can solve them, I'm not sure they're "good" (or human-level efficient) just yet.
LLMs typically use a lot of reasoning tokens. An early version of o3 scored 75% on ARC-AGI-1...but it spent $200 per task (!) doing so. That's an extreme outlier, granted. But all LLMs with humanlike scores (~80%) on ARC-AGI-2 are pretty expensive (typically a dollar to several dollars per task). The current best performer, GPT-5.5 on xHigh, costs $1.87/task. That's between 40k and 60k reasoning tokens (half the length of the first Harry Potter book) for every single task, which is quite a lot for puzzles that humans can (mostly) solve at a glance.
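As a rough sanity check on those numbers (the per-token price below is my assumption for illustration, not a published figure), here's the token count that $1.87/task would imply:

```python
# Back-of-envelope check: how many reasoning tokens does $1.87/task imply,
# assuming an output price somewhere around $30-40 per million tokens?
# (The price is an assumption for illustration, not a quoted figure.)

def implied_tokens(cost_per_task: float, price_per_million_tokens: float) -> float:
    """Token count implied by a per-task cost at a given $/1M-token price."""
    return cost_per_task / price_per_million_tokens * 1_000_000

for price in (30.0, 40.0):  # assumed $ per million output tokens
    print(f"${price:.0f}/M tokens -> ~{implied_tokens(1.87, price):,.0f} tokens/task")
# Prints roughly 62,333 and 46,750 -- consistent with the 40k-60k range above.
```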
To me, this indicates some degree of "brute-forcing" is still going on in ARC-AGI.
I broadly agree with the post as a whole.
Hm, I don't feel like I have good intuitions for what a GLUT-of-circuits could or couldn't do, so hard for me to assess if this is a good empirical fit. And I'm not technical enough to have a sense for what kinds of structures training is likely to produce. Doesn't seem like a crazy model.
Summary
LLMs are better at developing crystallized intelligence than fluid intelligence. That is: LLM training is good at building crystallized intelligence by learning patterns from training data, and this is sufficient to make them surprisingly skillful at lots of tasks. But for a given capability level in the areas they’ve trained on, LLMs have very weak fluid intelligence compared to humans. For example, two years ago I thought human-level SAT performance would mean AGI, but turns out LLMs can do great at the SAT while being mediocre at lots of other tasks.
I’m not saying LLMs are just parrots (that’s dumb).[1] There’s a continuity between crystallized and fluid intelligence.
Empirically, it’s unclear how fluid their intelligence is: we see both general reasoning skills and jaggedness.
It’s worth considering: what if fluid intelligence progress is relatively slow, and LLM capabilities mostly grow with relevant training data?
This could imply slower AI progress, especially if general-purpose data runs dry relatively soon. (Epoch estimates 2026-2032.) That means companies will need to prioritize specialized data collection/generation, which will lead to jagged capabilities growth favoring the prioritized areas.
[Epistemic status: I only put like 20% on worlds where this dynamic puts a serious damper on AI progress compared to e.g. the AI Futures Project’s median timelines. It’s important to stay aware of these possibilities, though, and track the relevant evidence.]
Implications for AI futures
This suggests that we shouldn’t naively extrapolate forward from e.g. the METR AI R&D benchmark to real-world AI R&D improvement, for two reasons:
Likewise, this suggests that simply scaling LLM training won’t get us to omni-competence.
But “just scaling LLMs” and “scale LLMs ‘til they’re superhuman AI R&D coders, then use those to build next-gen AI” are the two main stories for how we get to AGI very fast!
We should still expect significant progress on AI R&D. The AI labs are explicitly training for AI R&D, and have clearly hit superhuman capability in some coding-related areas (cybersecurity).
But the shape and speed of the takeoff curve matters. It matters a lot if, say, the METR time horizon hits 1 month, but we actually don’t have anything like a drop-in senior AI R&D researcher, just a really really good team of assistants. The labs still need to spend a bunch of serial time running compute-expensive experiments, and their AI tools can only moderately improve experiment selection. That could mean they get to, say, a 10x speedup over years of grueling effort. That’s much slower than AI2027 expects.
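To make that concrete with a toy Amdahl-style calculation (all fractions and speedups here are numbers I'm inventing purely for illustration): if most research wall-clock time is spent waiting on compute-gated experiments that AI tools only modestly accelerate, the overall speedup stays small no matter how fast the coding gets.

```python
# Toy Amdahl's-law-style sketch with made-up numbers: overall speedup stays
# bounded when serial, compute-gated experiments dominate research time.

def overall_speedup(serial_fraction: float, serial_speedup: float, rest_speedup: float) -> float:
    """Overall speedup when `serial_fraction` of time is accelerated by
    `serial_speedup` and the remaining time by `rest_speedup`."""
    return 1.0 / (serial_fraction / serial_speedup + (1.0 - serial_fraction) / rest_speedup)

# 60% of time waiting on experiments (2x faster via better experiment selection),
# 40% on coding/analysis (50x faster with AI assistants): overall only ~3x.
print(round(overall_speedup(0.6, 2.0, 50.0), 2))  # ~3.25
```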
Crucially, for as long as AIs are great at technical work but mediocre at fluid intelligence, that’s great news for AI safety.
But a major caveat is: I expect at some point we’ll see people devise new paradigms that are more data-efficient, and at that point all our safety techniques and assumptions might no longer hold.
We should check if this is true!
I’d be really excited for tests of capabilities like:
Modeling worlds where AI progress is hungry for domain data
Here’s a set of claims, call this the “hungry for domain data” hypothesis:
What types of areas see progress in this model?
I imagine we’ll have a base AI optimized for AI R&D, which gets trained to develop synthetic-data sources for domains that are amenable to simulation and/or to automated evaluation (for RL). Then those data sources are used to train AIs.
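A minimal sketch of that loop, with entirely hypothetical names and structure, just to make the pipeline concrete:

```python
# Hypothetical sketch of the synthetic-data pipeline described above:
# prioritize domains that can be simulated or automatically graded,
# have the base model build data sources for them, then train on the output.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Domain:
    name: str
    has_simulator: bool    # can we cheaply generate task instances?
    has_auto_grader: bool  # can we score attempts without humans (an RL signal)?

def prioritize(domains: List[Domain]) -> List[Domain]:
    """Domains amenable to simulation and/or automated evaluation go first."""
    return [d for d in domains if d.has_simulator or d.has_auto_grader]

def build_data_source(domain: Domain) -> Callable[[], Dict]:
    """Stand-in for 'base AI develops a synthetic-data source' for a domain."""
    def sample_task() -> Dict:
        return {"domain": domain.name, "prompt": "...", "auto_graded": domain.has_auto_grader}
    return sample_task

def train_on(sources: List[Callable[[], Dict]]) -> None:
    """Placeholder for the RL / fine-tuning step on the generated tasks."""
    for source in sources:
        batch = [source() for _ in range(4)]
        print(f"training on {len(batch)} synthetic tasks from {batch[0]['domain']}")

domains = [Domain("coding", True, True), Domain("wet-lab biology", False, False)]
train_on([build_data_source(d) for d in prioritize(domains)])
```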
Domains will see more progress if:
There are also stories for how advanced AIs could route around data bottlenecks:
Which concrete domains see progress?
Implications for AI takeoff
While it lasts, weak fluid intelligence is great news for reducing alignment risk
Successful scheming might rely on very good general reasoning. Otherwise, you could do some combo of…
A key bifurcation point: can AIs revolutionize AI R&D, or merely speed it up?
Coding is a data-rich domain, and AI companies prioritize generating data on AI R&D tasks, so we should expect AIs to get better at AI R&D over time — as we indeed see.
Case 1: AIs can significantly improve coder productivity and codebase efficiency, but they don’t reach supremacy. R&D progress is gated on compute-hungry research experiments and on expert research taste.
Case 2: AIs are able to intuit key principles of AI R&D. This lets them move smoothly from usefulness to outright supremacy. The best human experts are great at this, but they’re held back by brains with limited working memory and poor native resources for understanding massive-dimensional spaces and inhuman minds.
In both cases, my best guess is that improved AI R&D eventually leads to a paradigm that can scale to superhuman fluid intelligence. And since resources and R&D productivity are scaling so rapidly, “eventually” will probably come pretty soon.
But in case 1 especially, we’re likely to see a period where AI architectures evolve a lot. That has important implications:
Is this the world we live in?
Some evidence against: It seems like the “water level” of LLM capabilities is gradually rising in many areas, and that some of this is probably a generalizable-skills thing.
How can we test this hypothesis?
Places to look for fluid reasoning capabilities in LLMs:
Thanks to K, Adria, John, and Abi for comments.
[1] They’re more like a horde of precocious 12-year-olds, each with a different hyperfixation.