I've been wondering about superintelligence as a concept for a long time, and want to lay out two distinct possibilities: either it is boundedly complex and capable, or it is unbounded and can scale to levels that are impossible to understand.
In the first case, think of chess; superhuman chess still plays chess. You can watch AlphaZero's games and nod along—even if it's alien, you get what it's doing; the structure of the chess "universe" is such that unbounded intelligence still leads to mostly understandable moves. This seems to depend on the domain. For AlphaGo, it's unclear to me whether move 37 is fundamentally impossible to understand in Go-expert terms, or whether it's just a new style of play that humans can now understand by reformulating their understanding of Go in some way.
In the near term, there's reason to think that even superhuman AI would stay within human-legible ranges of decisions. An AI tasked with optimizing urban environments might give us efficient subway systems and walkable streets - but the essence of a city is to serve its human residents, and legibility and predictability are presumably critical criteria. If a superintelligent designer produced fractal, biologically active cities out of a Lovecraftian fever dream, illegible to humans, they would be ineffective cities.
A superhuman composer might write music that breaks our rules but still stirs our souls. But a superintelligence might instead write music that sounds to us like static, full of some brilliant structure that human brains have no ability to comprehend. Humans might be unable to tell whether it's genius or gibberish - but are such heights of genius a real thing? I am unsure.
The question I have, then, is whether heights of creation inaccessible to human minds are a real coherent idea. If they are, perhaps the takeover of superhuman AI would happen in ways that we cannot fathom. But if not, it seems far more likely that we end up disempowered rather than eliminated in the blink of an eye.
In the first case, think of chess; superhuman chess still plays chess. You can watch AlphaZero's games and nod along—even if it's alien, you get what it's doing; the structure of the chess "universe" is such that unbounded intelligence still leads to mostly understandable moves.
I guess the question here is how much is 'mostly'? We can point to areas of chess like the endgame databases, which are just plain inscrutable: when the databases play out some mate-in-50 game because that is what is provably optimal by checking every possible move, any human understanding is largely illusory. They are brute facts determined by the totality of the game tree, not any small self-contained explanation like 'knight forking'. (There is probably no 'understandability' even in principle for arbitrarily intelligent agents, similar to asking why the billionth digit of pi or Chaitin's omega is what it is.)
And if we want to expand it out to more realistic settings, we don't even get that: 'chess' doesn't exist in the real world - only specific implementations of chess. With an actual implementation in software, maybe we get something closer to a TAS speedrun, where the chess player twitches for a while and then a buffer overflow instantly wins without the opponent getting to even move a pawn.
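To make the "brute facts of the game tree" point concrete, here is a toy sketch (a trivial subtraction game, not chess, and purely my illustration rather than anything from the thread): every position is solved by exhaustive, memoized search, and the "best" move at each position has no explanation smaller than the solved tree itself.

```python
from functools import lru_cache

# Toy "tablebase": a subtraction game where a player removes 1, 3, or 4 stones,
# and taking the last stone wins. Every position is solved by exhaustive,
# memoized search; the optimal move is whatever the solved tree says it is.

MOVES = (1, 3, 4)

@lru_cache(maxsize=None)
def solve(n):
    """Return (outcome, plies, best_move) for the player facing n stones."""
    if n == 0:
        return ("loss", 0, None)  # no stones to take: the player to move has lost
    options = []
    for m in MOVES:
        if m <= n:
            outcome, plies, _ = solve(n - m)
            options.append((outcome, plies + 1, m))
    wins = [o for o in options if o[0] == "loss"]  # opponent left in a lost position
    if wins:
        plies, move = min((o[1], o[2]) for o in wins)   # win as fast as possible
        return ("win", plies, move)
    plies, move = max((o[1], o[2]) for o in options)    # otherwise lose as slowly as possible
    return ("loss", plies, move)

for n in range(1, 15):
    print(n, solve(n))
```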
But a superintelligence might instead write music that sounds to us like static, full of some brilliant structure that human brains have no ability to comprehend. Humans might be unable to tell whether it's genius or gibberish - but are such heights of genius a real thing? I am unsure.
But what part are you unsure about? There are surely many pragmatic ways to tell whether there is structure in the apparent static (even if it cannot be explained to us the way that, say, cryptographic algorithms can be explained to us and demonstrated by simply decrypting the 'static' into a very meaningful message): for example, simply see whether other superintelligences or algorithms can predict or compress the static. You and I can't see the 'non-robust features' in images that neural networks do, but we can observe them indirectly, by watching the performance of neural networks drop when we erase them, and see that they are really real.
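As a crude sketch of the "see if another algorithm can compress it" test (everything here is illustrative; a real test would use a stronger predictor, such as another model's log-loss on the data):

```python
import os
import zlib

# Compare a generic compressor's ratio on genuinely random bytes versus a
# sequence that looks like noise to a human but has simple hidden structure.

def compression_ratio(data: bytes) -> float:
    return len(zlib.compress(data, level=9)) / len(data)

true_static = os.urandom(100_000)                                  # incompressible
hidden_structure = bytes((i * i) % 251 for i in range(100_000))    # periodic, though it looks noisy

print("true static:     ", round(compression_ratio(true_static), 3))      # ~1.0: no structure found
print("hidden structure:", round(compression_ratio(hidden_structure), 3)) # far below 1.0: structure found
```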
We can point to areas of chess like the endgame databases, which are just plain inscrutable
I think there is a key difference in areas where the answers come from exhaustive search rather than from more intelligence - AI isn't better at that than humans, and from the little I understand, AI doesn't outperform in endgames (compared to its overperformance in general) via better policy engines; it does so via direct memorization or longer lookahead.
The difference here matters even more for other domains with far larger action spaces, since the exponential increase makes intelligence marginally less valuable at finding increasingly rare solutions. The design space for viruses is huge, and the design space for nanomachines using arbitrary configurations is even larger. If move-37-like intuitions are common, ASIs will be able to do things humans cannot understand, whereas if it's more like chess endgames, they will need to search an exponential space in ways that are infeasible even for them.
This relates closely to a folk theorem about NP-complete problems: many problems that are exponential in the worst case are approximately solvable with greedy algorithms in n log n or n^2 time. TSP is NP-complete, but actual salesmen find sufficiently efficient routes easily (a toy illustration below).
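Here is that toy illustration (nothing here is from the discussion; it's just the standard nearest-neighbour heuristic): the greedy pass runs in O(n^2) and routinely beats an arbitrary ordering by a wide margin, even though exact TSP is intractable in the worst case.

```python
import math
import random

# Greedy nearest-neighbour tour versus visiting cities in arbitrary order.

def tour_length(points, order):
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def nearest_neighbour_tour(points):
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

random.seed(0)
cities = [(random.random(), random.random()) for _ in range(200)]
print("arbitrary order:", round(tour_length(cities, list(range(len(cities)))), 2))
print("greedy tour:    ", round(tour_length(cities, nearest_neighbour_tour(cities)), 2))
```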
But what part are you unsure about?
Yeah, on reflection, the music analogy wasn't a great one. I am not concerned that pattern creation we can't intuit could exist - humans can do that as well. (For example, it's easy to make puzzles no one can solve.) The question is whether important domains are amenable to kinds of solutions that ASI can understand robustly in ways humans cannot. That is, can ASI solve "impossible" problems?
One specific concerning difference is whether ASI could play perfect social 12-D chess by being a better manipulator, despite all of the human-experienced uncertainties, and engineer arbitrary outcomes in social domains. There clearly isn't a feasible search strategy with exact evaluation, but if it is far smarter than "human-legible ranges" of thinking, it might be possible.
This isn't just relevant for AI risk, of course. Another area is biological therapies: for example, it seems likely that curing or reversing aging requires the same sort of brilliant insight into insane complexity - figuring out whether there would be long-term or unexpected out-of-distribution impacts years later, without actually conducting multi-decade, large-scale trials.
Why does this matter? To quote a Yudkowsky-ish example: maybe you can take a 16th-century human (before Newtonian physics was invented, after guns were invented) and explain to him how a nuclear bomb works. This doesn't matter for predicting the outcome of a hypothetical war between 16th century Britain and 21st century USA.
ASI inventions can be big surprises and yet be things that you could understand if someone taught you.
We could probably understand how a von Neumann probe or an anti-aging cure worked too, if someone taught us.
This doesn't matter for predicting the outcome of a hypothetical war between 16th century Britain and 21st century USA.
If AI systems can make 500 years of progress before we notice they're uncontrolled, that already assumes an insanely strong superintelligence.
We could probably understand how a von Neumann probe or an anti-aging cure worked too, if someone taught us.
Probably, if it's of a type we can imagine and is comprehensible in those terms - but that's assuming the conclusion! As Gwern noted, we can't understand chess endgames. Similarly, in the case of a strong ASI, the ASI-created probe or cure could look less like an engineered, purpose-driven system that is explainable at all, and more like a random set of actions, not explainable in our terms, that nonetheless cause the outcome.
As Gwern noted, we can't understand chess endgames.
On this example specifically: a) it's possible the AI is too stupid to have a good enough theory of mind of humans to write good chess textbooks on these endgames - maybe there is an elegant way of looking at them that isn't brute force; b) chess endgames are amenable to brute force in a way that "invent a microscope" is not. Scientific discovery is a search through an exponential space, so you need a good heuristic or model for every major step you take; you can't brute-force it.
I agree tech beyond human comprehension is possible. I'm just giving an intuition as to why a lot of radically powerful tech likely still lies within human comprehension. 500 [1] years of progress is likely to still be within comprehension; so is 50 years or 5 years.
The most complex tech that exists in the universe is arguably human brains themselves and we could probably understand a good fraction of their working too, if someone explained it.
Important point here being the AI has to want to explain it in simple terms to us.
If you get a 16th century human to visit a nuclear facility for a day that’s not enough information for them to figure out what it does or how it works. You need to provide them textbooks that break down each of the important concepts.
[1] Society in 2000 is explainable to society in 1500, but society in 2500 may or may not be explainable to society in 2000, because of acceleration.
Distinct views about LLM Intelligence
In discussing AI and existential risk, I’ve noticed that disagreements often hinge on views of AI capabilities—particularly whether large language models (LLMs) are "doing reasoning." This deserves clearer differentiation. Matt Yglesias once noted that “most optimists are not, fundamentally, AI optimists — they are superintelligence skeptics.” This feels deeply true, and it’s connected to the spectrum of beliefs about current AI systems’ reasoning abilities. To continue Yglesias’ dichotomy, I’ll sketch a brief and uncharitable pastiche of each view.
The first, which I’ll (unfairly) call the stochastic parrot view, is that LLMs don’t actually understand. What the models actually build is a textual model of the world with deeply confused relationships between different things. The successes on metrics like exams and reasoning tests are in large part artifacts of training on test data, i.e. leakage. Given that the training data contains some leakage, descriptions of all the different types of evaluation performed on LLMs, and substantive answers to similar or identical questions, statistics about the training text - unrelated to understanding - lead to many correct answers. This success, however, is essentially entirely due to the cognitive labor of those whose input data was used, and not to a successful world-model. The failures on slight variations of common tasks are largely an obvious result of this, since there isn’t an underlying coherent understanding.
The second, which I’ll (also unfairly) call the already AGI view, is that LLMs do have a largely coherent underlying model of the world. The textual data on which they are trained is rich enough for the model’s training to capture many true facts about the world, even though its implicit causal understanding is imperfect. The successes on metrics such as exams are similar to those of most humans who take exams: a combination of pattern matching based on training and cached implicit models of what is expected. At least a large portion of LLM failures on slight variations of common tasks occur because the model is implicitly being asked to reproduce the typical class of answers to such questions - and most humans pay only moderate attention and will make similar mistakes, especially in a single pass. The success of prompt design, such as “reason step-by-step,” is just telling the model to do what humans actually do. The abilities that are displayed are evidence that the first-pass failure is sloppiness, not inability.
I suspect that there are two stages of the crux between these views. The first is whether LLMs are doing “true” reasoning, and the second is whether humans are. I have mostly focused on the first, but to conclude, the second seems important as well - if there’s something called reasoning and humans don’t do it most of the time, and some humans don’t do it at all - as some have claimed - then given how much these systems have memorized and their extant competencies, I’m not sure that this form of reasoning is needed for AGI, even if it’s needed for ASI.
Just signal-boosting the obvious references to the second: Sarah Constantin's Humans Who Are Not Concentrating Are Not General Intelligences and Robin Hanson’s Better Babblers.
After eighteen years of being a professor, I’ve graded many student essays. And while I usually try to teach a deep structure of concepts, what the median student actually learns seems to mostly be a set of low order correlations. They know what words to use, which words tend to go together, which combinations tend to have positive associations, and so on. But if you ask an exam question where the deep structure answer differs from answer you’d guess looking at low order correlations, most students usually give the wrong answer.
Simple correlations also seem sufficient to capture most polite conversation talk, such as the weather is nice, how is your mother’s illness, and damn that other political party. Simple correlations are also most of what I see in inspirational TED talks, and when public intellectuals and talk show guests pontificate on topics they really don’t understand, such as quantum mechanics, consciousness, postmodernism, or the need always for more regulation everywhere. After all, media entertainers don’t need to understand deep structures any better than do their audiences.
Let me call styles of talking (or music, etc.) that rely mostly on low order correlations “babbling”. Babbling isn’t meaningless, but to ignorant audiences it often appears to be based on a deeper understanding than is actually the case. When done well, babbling can be entertaining, comforting, titillating, or exciting. It just isn’t usually a good place to learn deep insight.
It's unclear to me how much economically-relevant activity is generated by low order correlation-type reasoning, or whatever the right generalisation of "babbling" is here.
Thank you, definitely agree about linking those as relevant.
It's unclear to me how much economically-relevant activity is generated by low order correlation-type reasoning
I think one useful question is whether babbling can work to prune, and it seems the answer from reasoning models is yes.
My own take is that I'm fairly sympathetic to the "LLMs are already able to get to AGI" view, with the caveat that most of the difference between human and LLM learning - where humans are superior to LLMs - comes from being able to do meta-learning over long horizons, and we haven't yet been shown that this is possible for LLMs purely by scaling compute.
Indeed, I think it's the entire crux of the scaling hypothesis debate: whether scale enables meta-learning over longer and longer time periods.
An intuition pump you can try is to make them sit side by side with an AI and answer questions on a text in one minute, and check whose answers are better.
Toby Ord writes that “the required resources [for LLM training] grow polynomially with the desired level of accuracy [measured by log-loss].” He then concludes that this shows “very poor returns to scale,” and christens it the "Scaling Paradox." (He goes on to point out that this doesn’t imply it can’t create superintelligence, and I agree with him about that.)
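As a hedged sketch of what that claim amounts to (the constants below are invented for illustration, not taken from Ord or from any real scaling fit): any power-law loss curve of the form L(C) = L_inf + A·C^(-alpha) already has exactly this "polynomial resources per unit of accuracy" shape.

```python
# Assumed power-law scaling curve: L(C) = L_inf + A * C**(-alpha).
# Solving for C shows that compute grows polynomially in 1 / (L - L_inf),
# i.e. "resources grow polynomially with the desired accuracy".

L_inf, A, alpha = 1.7, 10.0, 0.05   # illustrative constants only

def compute_needed(target_loss):
    return (A / (target_loss - L_inf)) ** (1 / alpha)

for target in (2.2, 2.0, 1.9, 1.8):
    print(f"target loss {target}: compute ~ {compute_needed(target):.2e}")
```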
But what would it look like if this were untrue? That is, what would be the conceptual alternative, where required resources grow more slowly? I think the answer is that it’s conceptually impossible.
To start, there is a fundamental lower bound on loss at zero, since the best possible model perfectly predicts everything - it exactly learns the distribution. This can happen when overfitting a model, but it can also happen when there is a learnable ground truth; models that are trained to learn a polynomial function can learn it exactly.
But there is strong reason to expect the bound to be significantly above zero loss. The training data for LLMs contains lots of aleatory randomness - things that are fundamentally unpredictable. I think it’s likely that things like RAND’s random number book are in the training data, and it’s fundamentally impossible to predict randomness. I think something similar is generally true for many other things - predicting word choice among semantically equivalent words, predicting where typos occur, etc.
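A back-of-the-envelope sketch of that floor (the fraction below is purely illustrative): if some share of training tokens are genuinely random, their entropy contributes a fixed amount of loss that no amount of scale can remove.

```python
import math

# If a fraction p of tokens are uniform over k symbols (e.g. random digits),
# those tokens contribute p * ln(k) nats per token to the loss floor,
# regardless of model size or compute.

p = 0.01   # illustrative fraction of truly random digit tokens
k = 10     # uniform over the digits 0-9

print(f"irreducible contribution: {p * math.log(k):.4f} nats/token")  # ~0.023
```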
Aside from the bound being well above zero, there's a strong reason to expect that scaling is required to reduce loss for some tasks. In fact, it’s mathematically guaranteed to require significant computation to get near that level for many tasks that are in the training data. Eliezer pointed out that GPTs are predictors, and gives the example of a list of numbers followed by their two prime factors. It’s easy to generate such a list by picking pairs of primes, multiplying them, and then writing the answer first - but decreasing loss when generating the next token to predict the primes from the product is definitionally going to require exponentially more computation to perform better for larger primes.
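A minimal sketch of that asymmetry (the helper names and ranges are mine, purely for illustration): the training lines are trivial to generate, but a left-to-right predictor must effectively factor the product, which gets rapidly harder as the primes grow.

```python
import random

# Cheap to write, expensive to predict: each line is "product = p * q",
# so generating data only needs multiplication, while predicting the
# right-hand side from the left requires factoring the product.

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = [False] * len(sieve[i*i::i])
    return [i for i, ok in enumerate(sieve) if ok]

random.seed(0)
PRIMES = primes_up_to(10_000)

def training_line():
    p, q = sorted(random.sample(PRIMES, 2))
    return f"{p * q} = {p} * {q}"

for _ in range(3):
    print(training_line())
```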
And I don't think this is the exception, I think it's at least often the rule. The training data for LLMs contains lots of data where the order of the input doesn’t follow the computational order of building that input. When I write an essay, I sometimes arrive at conclusions and then edit the beginning to make sense. When I write code, the functions placed earlier often don’t make sense until you see how they get used later. Mathematical proofs are another example where this would often be true.
An obvious response is that we’ve been using exponentially more compute to get better at tasks that aren’t impossible in this way - but I’m unsure if that is true. Benchmarks keep getting saturated, and there’s no natural scale for intelligence. So I’m left wondering whether there’s any actual content in the “Scaling Paradox.”
(Edit: now also posted to my substack.)
One problem is that log-loss is not tied that closely to the types of intelligence that we care about. Extremely low log-loss necessarily implies extremely high ability to mimic a broad variety of patterns in the world, but that's sort of all you get. Moderate improvements in log-loss may or may not translate to capabilities of interest, and even when they do, the story connecting log-loss numbers to capabilities we care about is not obvious. (EG, what log-loss translates to the ability to do innovative research in neuroscience? How could you know before you got there?)
When there were rampant rumors about an AI slowdown in 2024, the speculation in the news articles often mentioned the "scaling laws" but never (in my haphazard reading) made a clear distinction between (a) frontier labs seeing that the scaling laws were violated, IE, improvements in loss are really slowing down, (b) there's a slowdown in the improvements to other metrics, (c) frontier labs are facing a qualitative slowdown, such as a feeling that GPT5 doesn't feel like as big of a jump as GPT4 did. Often these concepts were actively conflated.
Strongly agree. I was making a narrower point, but the metric is clearly different from the goal - if anything, it's surprising that we see as much correlation as we do, given how much it has been optimized.