I think it's premature to conclude that AGI progress will be large pre-trained transformers indefinitely into the future. They are surprisingly(?) effective, but for comparison they are not as effective as AlphaZero and AlphaStar are in their narrow domains, where value and policy networks paired with Monte-Carlo tree search use orders of magnitude fewer parameters. We don't know what MCTS on arbitrary domains will look like with 2-4-OOM-larger networks, which are within reach now. We also haven't formulated methods of self-play improvement for LLMs, and I think that's a potentially large overhang.
There's also a human limit to the types of RSI we can imagine and once pre-trained transformers exceed human intelligence in the domain of machine learning those limits won't apply. I think there's probably significant overhang in prompt engineering, especially when new capabilities emerge from scaling, that could be exploited by removing the serial bottleneck of humans trying out prompts by hand.
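As a toy sketch of removing that serial bottleneck: automated search over a space of candidate prompts against an automatic scorer. Everything here is invented for illustration; in particular, `score_prompt` is a hypothetical stand-in for whatever capability evaluation the prompts would actually be tuned against.

```python
import itertools
import random

# Hypothetical stand-in for a real capability evaluation; here it just
# rewards prompts containing phrases assumed (for illustration) to help.
def score_prompt(prompt: str) -> float:
    return float(sum(kw in prompt for kw in ("step by step", "check your work")))

def search_prompts(templates, fillers, trials=100, seed=0):
    """Enumerate template x filler combinations, sample up to `trials`
    of them, and return the highest-scoring candidate."""
    rng = random.Random(seed)
    candidates = [t.format(f) for t, f in itertools.product(templates, fillers)]
    sampled = rng.sample(candidates, min(trials, len(candidates)))
    return max(sampled, key=score_prompt)

best = search_prompts(
    templates=["Solve this {}.", "Solve this {} and check your work."],
    fillers=["step by step", "carefully"],
)
assert best == "Solve this step by step and check your work."
```

The point is only that this loop is embarrassingly parallel and needs no human in it; the expensive part (the scorer) is exactly where emergent capabilities from scaling would show up.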
Finally I don't think GOFAI is dead; it's still in its long winter waiting to bloom when enough intelligence is put into it. We don't know the intelligence/capability threshold necessary to make substantial progress there. Generally, the bottleneck has been identifying useful mappings from the real world to mathematics and algorithms. Humans are pretty good at that, but we stalled at formalizing effective general intelligence itself. Our abstraction/modeling abilities, working memory, and time are too limited and we have no idea where those limits come from, whether LLMs are subject to the same or similar limits, or how the limits are reduced/removed with model scaling.
If deep learning yields AGI, the question is how far its intelligence can jump beyond human level before it runs out of compute available in the world, using the improvements that can be made very quickly at the current level of intelligence. In short sprints, a hoard of handmade constants can look as good as asymptotic improvement. So the latter's hypothetical impossibility doesn't put convincing bounds on how far this could be pushed before running out of steam. And if by that point procedures for bootstrapping nanotech have become obvious, this keeps going, transitioning into disassembling the world for more compute without pause. All without refuting the bitter lesson.
I think you forgot one critical thing. Why does the normal argument for RSI's inevitability fail? The answer is: it doesn't.
Even though there is some research in the direction of a neural network changing its own weights directly, this isn't important to the main argument, because the argument is about improving source code. The weights are more like compiled code.
In the context of deep learning, the source code consists of:
So the question is whether a deep learning model could improve any of this code. The answer to whether it can improve its "compiled code" (the weights) is probably also yes, but that isn't what the argument is based on.
Then this runs into the issue that, I contend, there's just not that much gain to be had from such source code improvements.
A misaligned model might not want to do that, though, since it would be difficult for it to ensure that the output of the new training process is aligned to its goals.
It seems pretty clear to me that AIs could get really good at understanding and predicting the results of editing model weights in the same way they can get good at predicting how proteins will fold. From there, directly creating circuits that add XYZ reasoning functionality seems at least possible.
I don't actually share this intuition.
I don't think you can get the information of computing the gradient updates to particular weights without actually running that computation (or something equivalent to it).
And presumably one would need empirical feedback (i.e. the value of the objective function we're optimising the network for on particular inputs) to compute the desired gradient updates.
The idea of the system just predicting the desired gradient updates without any ground truth supervisory signal seems fanciful.
I agree that it seems possible. I have doubts that predicting the results of editing weights is a more compute-efficient way of causing a model to exhibit the desired behavior than giving it the obvious tools and using fine-tuning / RL to make it able to use those tools, though, or alternatively just doing the RL/fine-tuning directly. That's basically the heart of how I interpret the bitter lesson: it's not that you can't find more efficient ways to do what DL can do, it's that when you have a task that humans can do and computers can't, the approach of "introspect and think really hard about how to approach the task the right way" is outperformed by the approach of "lol more layers go brrrrr".
This is a solid argument inasmuch as we define RSI to be about self-modifying its own weights/other-inscrutable-reasoning-atoms. That does seem to be quite hard given our current understanding.
But there are tons of opportunities for an agent to improve its own reasoning capacity otherwise. At a very basic level, the agent can do at least two other things:
Most problems in computer science have superlinear time complexity
On one hand, sure, improving this is (likely) impossible in the limit because of fundamental complexity properties. On the other hand, the agent can still become vastly smarter than humans. A particular example: the human mind, without any assistance, is very bad at solving 3SAT. But we've invented computers, and then constraint solvers, and now we are able to solve 3SAT much, much faster, even though 3SAT is (likely) exponentially hard. So the RSI argument here is: the smarter (or faster) the model is, the more special-purpose tools it can create to efficiently solve specific problems and thus upgrade its reasoning ability. Not to infinity, but likely far beyond humans.
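As a minimal sketch of that point: brute force must check all 2^n assignments, while even a bare-bones DPLL solver using nothing but unit propagation prunes most of the search. Real constraint solvers (CDCL etc.) are vastly better still; this is only a toy illustration of "special-purpose tools upgrade reasoning ability".

```python
import itertools

# Literals are nonzero ints: literal k asserts variable |k| is True if k > 0.
def brute_force_sat(clauses, n):
    """Try all 2^n assignments; exponential in the number of variables."""
    for bits in itertools.product([False, True], repeat=n):
        if all(any((lit > 0) == bits[abs(lit) - 1] for lit in c) for c in clauses):
            return True
    return False

def dpll(clauses):
    """Bare-bones DPLL: unit propagation plus branching."""
    if not clauses:
        return True          # no constraints left: satisfiable
    if any(len(c) == 0 for c in clauses):
        return False         # empty clause: contradiction
    unit = next((c[0] for c in clauses if len(c) == 1), None)
    lit = unit if unit is not None else clauses[0][0]
    # A unit clause forces its literal; otherwise branch on both polarities.
    for choice in ([lit] if unit is not None else [lit, -lit]):
        reduced = []
        for c in clauses:
            if choice in c:
                continue     # clause already satisfied
            reduced.append(tuple(l for l in c if l != -choice))
        if dpll(reduced):
            return True
    return False

sat = [(1, 2), (-1, 3), (-2, -3)]    # satisfiable, e.g. x1=T, x2=F, x3=T
unsat = [(1,), (-1,)]                # x1 and not-x1: unsatisfiable
assert brute_force_sat(sat, 3) and dpll(sat)
assert not brute_force_sat(unsat, 1) and not dpll(unsat)
```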
To be clear, the complexity theory argument is against fast takeoff, not an argument that intelligence caps at some level relative to humans.
Analogy: log(x) approaches infinity, but it does so much slower than x.
I.e. the sublinear asymptotics would prevent AI from progressing very quickly to a vastly superhuman level (unless the AI is able to grow its available resources sufficiently quickly to dominate the poor asymptotics).
Alternatively, each order of magnitude increase in compute buys (significantly) less intelligence; thus progress from human level to a...
So I'm going to strong disagree here.
First of all, as it turns out in practice, scale was everything. This means that any AI idea you want to name, unless it was based on a transformer and worked on by roughly three labs, was never actually attempted.
We can just ignore all the other thousands of AI methods that humans tried because they were not attempted with a relevant level of scale.
Therefore, RSI has never been tried.
Second, you can easily design a variation on RSI that works fine with current paradigms.
It's not precisely RSI but it's functionally the same thing. Here are the steps:
[1] A benchmark of many tasks. Tasks must be autogradeable, human participants must be able to 'play' the tasks so we have a control group score, and tasks must push the edge of human cognitive ability (so the average human scores nowhere close to the max score, and top-1% humans do not max the bench either). There must be many tasks, with a rich permutation space (so it isn't possible for a model to memorize all permutations).
[2] A heuristic weighted score on this benchmark intended to measure how "AGI-like" a model is. It might be the RMSE across the benchmark, but with a lot of score weighting on zero-shot, cross-domain/multimodal tasks. That is, the kind of model that can use information from many different previous tasks on a complex exercise it has never seen before is closer to an AGI, or closer to replicating "Leonardo da Vinci", who had exceptional human performance presumably from all this cross-domain knowledge.
[3] In the computer science task set, there are tasks to design an AGI for a bench like this. The model proposes a design, and if that design has already been tested, immediately receives detailed feedback on how it performed.
The "design an AGI" subtask can be much simpler than "write all the boilerplate in Python", but these models will be able to do that if needed.
As task scores approach human level across a broad set of tasks, you have an AGI. You would expect it to almost immediately improve to a low superintelligence. As AGIs get used in the real world and fail to perform well at something, you add more tasks to the bench, and/or automate creating simulated scenarios that use robotics data.
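The weighted scoring in step [2] might be sketched like this. The weights, field names, and task records below are illustrative assumptions, not a real benchmark:

```python
# A minimal sketch of a heuristic "AGI-likeness" score: a weighted mean
# over per-task scores, upweighting zero-shot and cross-domain tasks.
def agi_score(results, zero_shot_weight=2.0, cross_domain_weight=2.0):
    """results: list of dicts with 'score' in [0, 1] and boolean
    'zero_shot' / 'cross_domain' flags (hypothetical schema)."""
    total, weight_sum = 0.0, 0.0
    for r in results:
        w = 1.0
        if r["zero_shot"]:
            w *= zero_shot_weight
        if r["cross_domain"]:
            w *= cross_domain_weight
        total += w * r["score"]
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0

results = [
    {"score": 0.9, "zero_shot": False, "cross_domain": False},
    {"score": 0.5, "zero_shot": True, "cross_domain": True},
]
# The zero-shot, cross-domain task carries 4x weight, pulling the
# aggregate toward 0.5 despite the high in-distribution score.
assert abs(agi_score(results) - 0.58) < 1e-9
```

The design choice doing the work: memorisable in-distribution performance is deliberately cheap, and "da Vinci"-style transfer is deliberately expensive to fake.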
Why aren't we already doing this if it's so simple?
Because each AGI candidate training run has to be at least twice as large as llama-65b, which means $2M+ in training costs per run. And you need to explore the possibility space pretty broadly, so you would figure several thousand runs to really get to a decent AGI design, which still will not be optimal.
This is one of the reasons foom cannot happen. At least not without a lot more compute than we have now. Each attempt is too expensive.
Can we refine the above algorithm into something more compute efficient? Yes, somewhat (by going to a modular architecture, where each "AGI candidate" is composed of hundreds of smaller networks, and we reuse most of them in between candidates), but it's going to still require a lot more compute than llama-65b took to train.
I've made a related Manifold question here:
I'm a believer in RSI-soon. I have an inside view that supports this. I expect convincing evidence of RSI to become apparent in the world before 2026. If you think I'm wrong, but this manifold question doesn't get at the root of things, I'll happily also vote on other markets related to this.
Direct self-improvement (i.e. rewriting itself at the cognitive level) does seem much, much harder with deep learning systems than with the sort of systems Eliezer originally focused on.
In DL, there is no distinction between "code" and "data"; it's all messily packed together in the weights. Classic RSI relies on the ability to improve and reason about the code (relatively simple) without needing to consider the data (irreducibly complicated).
Any verification that a change to the weights/architecture will preserve a particular non-trivial property (e.g. avoiding value drift) is likely to be commensurate in complexity to the complexity of the weights. So... very complex.
The safest "self-improvement" changes probably look more like performance/parallelization improvements than "cognitive" changes. There are likely to be many opportunities for immediate performance improvements[1], but that could quickly asymptote.
I think that recursive self-empowerment might now be a more accurate term than RSI for a possible source of foom. That is, the creation of accessory tools for capability increase. More like a metaphorical spider at the center of an increasingly large web. Or (more colorfully) a shoggoth spawning a multitude of extra tentacles.
The change is still recursive in the sense that marginal self-empowerment increases the ability to self-empower.
So I'd say that a "foom" is still possible in DL, but is both less likely and almost certainly slower. However, even if a foom is days or weeks rather than minutes, many of the same considerations apply. Especially if the AI has already broadly distributed itself via the internet.
Perhaps instead of just foom, we get "AI goes brrrr... boom... foom".
Hypothetical examples include: more efficient matrix multiplication, faster floating point arithmetic, better techniques for avoiding memory bottlenecks, finding acceptable latency vs. throughput trade-offs, parallelization, better usage of GPU L1/L2/etc caches, NN "circuit" factoring, and many other algorithmic improvements that I'm not qualified to predict.
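One of the listed improvements, better cache usage, can be illustrated with classic loop tiling for matrix multiplication. This is a toy sketch in pure Python (real speedups require native code and tuned tile sizes); the point is that only the iteration order changes, never the arithmetic:

```python
# Naive triple loop: strides across b column-wise, evicting cache lines.
def matmul_naive(a, b):
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c

# Blocked (tiled) version: processes small tiles so the working set of
# a, b, and c stays resident in fast cache. Same arithmetic, same result.
def matmul_blocked(a, b, tile=2):
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, m, tile):
            for j0 in range(0, p, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, m)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + tile, p)):
                            c[i][j] += aik * b[k][j]
    return c

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
assert matmul_blocked(a, b) == matmul_naive(a, b)
```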
What if the machine has a benchmark/training suite for performance. On the benchmark is a task for designing a better machine architecture.
Machine proposes a better architecture. The new architecture may be a brand new set of files defining the networks, topology, and training procedure, or it may reuse networks for components.
For example you might imagine an architecture that uses gpt-3.5 and -4 as subsystems but the "executive control" is from a new network defined by the architecture.
Given a very large compute budget (many billions), the company hosting...
I think takeoff from broadly human level to strongly superhuman is years, and potentially decades.
Foom in days or weeks still seems just as fanciful as before.
An interesting possibility is recursively self-improving prompt in auto-GPT.
The MIRI 2000s paradigm for an AI capable of self-improvement, was that it would be modular code with a hierarchical organization, that would potentially engage in self-improvement at every level.
The actual path we've been on has been: deep learning, scaling, finetuning with RLHF, and now (just starting) reflective agents built on a GPT base.
A reflective GPT-based agent is certainly capable of studying itself and coming up with ideas for improvement. So we're probably at the beginning of attempts at self-improvement, right now.
The inability of neural nets to quickly retrain new, larger versions of themselves will slow their progress, and thus many competing AIs are more likely to emerge. It is less likely that they can merge later by "merging utility functions", as NNs have no explicit utility functions. Thus a multipolar world is more likely.
Disclaimer
I wrote this three months ago, and abandoned it[1]. I currently do not plan to return to it anytime soon, but I have nonetheless found myself wanting a document I could point to for why I'm not particularly enthused by or overly concerned with "recursive self improvement". As such I publish anyway; I apologise for not being a more competent writer.
Epistemic Status
Unconfident on some object level details; I don't have a technical grasp of machine learning[2].
Author's Note
Despite appearances, this is not actually a rhetorical question — phrasing this as a question post was a deliberate decision — do please provide answers to the questions I posed.
The footnotes are unusually extensive in this post; I'd recommend the curious reader not skip them[3].
Related
Abstract
The main questions this post posits:
Introduction
Back in the day, Yudkowsky formulated recursive self improvement (RSI) as a viable[6] path to foom, under the paradigm of "seed AI". From the wiki[7] (emphasis mine):
Reading "Hard Takeoff", it seems that Yudkowsky's belief that a soft takeoff is unlikely is heavily dependent on his belief in recursive self-improvement being a component on the critical path to transformative AI:
I get the impression that Yudkowsky was imagining programming a seed AI that could then look upon its own source code with greater cognitive prowess than its programmers, identify obvious flaws, inefficiencies or suboptimal features in its source code and "quickly"[8][9] rewrite itself to greater capabilities (what I'm calling "seed AI flavoured recursive self-improvement"). From "Recursive Self-Improvement" (bolded mine):
The successor AI so produced would be even smarter and better able to make algorithmic/architectural improvements to its source code; this process recurses, leading to an intelligence explosion.
I don't necessarily think this was a well-founded position even within the seed AI paradigm[11][12][13], but whether that was the case is largely beside the point of this question.
I'm mostly curious if RSI is still a plausible hypothesis for deep learning systems.
Challenges
On the Feasibility of Recursive Self Improvement
Trillions to quadrillions of parameter models trained at the cost of tens of millions to billions of dollars do not seem particularly amenable to the kind of RSI Yudkowsky envisioned back in the day. They don't meet any of the criteria for "Seed AI":
Our most powerful models aren't designed but selected for via "search like" processes. They aren't by default well factored or particularly modular[14]. Even with advanced interpretability tools/techniques, it seems like it would be hard to make the kind of modifications/improvements that are possible in well written software programs.
An AI inspecting its mind, identifying flaws/inefficiencies and making nontrivial algorithmic/architectural improvements seems to not be particularly feasible under the deep learning paradigm.
Positive feedback loops of AI development are probable, but they look more like:
In other words, we get recursive self improvement at home. 😉
I'm under the impression that while training more capable successor systems is feasible, doing so would incur considerable economic costs (and thus serve as a taut constraint to the "speed" of takeoff via such feedback loops)[17][18]. It does not seem to necessarily be the case — or even particularly likely — that the above kind of positive feedback loops will lead to discontinuous AI progress.
I think it may be possible that deep learning AGI would eventually be able to look upon its own mind and factorise it[19], but it seems like that level of cognitive capabilities would come well after AI has become existentially dangerous (or otherwise transformative)[20]. I currently do not expect seed AI style RSI to be a component on the critical path to existential catastrophe/"the singularity".
On the Viability of Recursive Self Improvement
My current belief is that even if seed AI flavoured recursive self improvement was viable, the gains that can be eked out that way are not as radical as imagined.
To summarise my position:
My reasons for this belief are largely:
It does not seem to me like recursive self-improvement is relevant in the deep learning paradigm.
Implications
If the aforementioned objections are correct, then insomuch as one's intuitions around foom were rooted in some expectation of recursive self-improvement and insomuch as one believes that the first AGIs will be created within the deep learning paradigm[23] then the inapplicability of RSI to deep learning should update people significantly downwards on the likelihood of hard takeoff/foom[24].
However, that does not seem to have been the case. I get the sense that people are no longer necessarily expecting seed AI flavoured recursive self-improvement, but basically still posit very high confidence in foom[25]. This is prima facie pretty surprising/suspicious as RSI/seed AI was posited as the main reason to expect a hard takeoff. From the wiki page on recursive self-improvement (emphasis mine):
From "Hard Takeoff" (emphasis mine):
Yudkowsky, talking about a situation in which your goals changed but your beliefs about the steps to take did not said (underlines mine):
I feel like the update from deep learning hasn't fully propagated through the belief pools of many in the LW sphere. I think Yudkowsky (and those influenced by his views) should be significantly less confident in hard takeoff, and give more consideration to softer takeoff scenarios; otherwise it feels like too much epistemic inertia[26][27].
I think there are other avenues for hard takeoff that don't hinge so strongly on RSI (e.g. hardware/content overhang[28][29][30][31]), but they also seem to be somewhat weakened by the deep learning paradigm[32][33][34] (perhaps especially so if scaling maximalism is true)[35][36][37][38]. That said, my broader scepticism of foom deserves its own top level post.
I am also not persuaded by the justification for foom credulity based on AlphaGo. I don't think AlphaGo is necessarily as strong an indicator for foom as suggested: AlphaGo was able to blow past human performance in the narrow field of Go via 3 days of self-play; it does not seem that general competence in rich domains is similarly amenable to self-play[39][40].
Acknowledgements
I'm grateful to @beren, @janus, @tailcalled, @jessicata, @Alex Vermillion, @clockworklady and others for their valuable feedback on drafts of this post.
I added a new section (and some footnotes) to this post before publishing, but I did not otherwise bother to extensively review the post.
I have solicited feedback from people more knowledgeable on ML than myself and on reflection expect the main theses of this post to be directionally correct.
Nonetheless, I flag statements I am noticeably not confident in.
Many footnotes are more like appendices providing crucial context/elaboration on particular points and not merely authorial side comments such as this.
I also added in relevant commentary from reviewers who are more technically grounded in relevant aspects than myself.
How steep/rapid is it? Alternatively, just how big a discontinuity does the takeoff represent?
It's not particularly clear what people mean by "hard"/"fast" takeoff. From the taxonomy Barnett drew, I use "hard"/"fast" takeoff to refer to (a more generous version of) Yudkowsky/Bostrom's formulations of the term. I.e. local takeoff that unravels over a timespan too short for humans to react (tentatively hours to a few weeks).
[Where "takeoff" can be understood as the time for AI systems to transition from a par human regime to a strongly superhuman regime.]
"A viable" may be an understatement; my impression/cached thoughts is that RSI was presented as the most load bearing component on the path to hard takeoff. E.g. from "Hard Takeoff" (emphasis mine):
Or from "Recursive Self-Improvement":
I think the LW Wiki is a pretty accurate representation of the community's consensus at the time.
The main difference between "seed AI flavoured recursive self-improvement" and the kind of self-improvement that manifests in the deep learning paradigm (covered later) is probably the resource cost (time, computational and economic resources) to each iteration of the feedback process.
Having to spend ever greater economic and computational resources (growing at a superlinear rate) on training successors to attain constant capabilities progress in the feedback cycle will greatly change the dynamics of the flavour of self-improvement that manifests under deep learning. To borrow Yudkowsky's metaphor, such self-improvement seems much less likely to be "prompt critical".
In my terminology: the cumulative returns to cognitive capabilities from the investment of economic/computational resources are sublinear (see:[9]).
See:[17][32]
Alternatively, the marginal returns diminish at a superlinear rate.
To a first approximation, sublinear cumulative returns, implies that marginal returns are diminishing at a superlinear rate (and vice versa).
Though there are some subtleties here: marginal returns that diminish at an exponential rate imply bounded cumulative returns (geometric series with ratios <1 converge). Meanwhile, cumulative returns that grow at a logarithmic rate are unbounded; they do not converge.
I think whether to consider marginal returns or cumulative returns depends on the nature of the process you're considering (what exactly drives the returns).
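A toy formalisation of those two regimes (with $r$ the decay ratio of marginal returns and $C(n)$ the cumulative return after $n$ units of reinvestment):

```latex
% Exponentially diminishing marginal returns => bounded cumulative returns:
\sum_{n=1}^{\infty} r^{n} \;=\; \frac{r}{1-r} \;<\; \infty \qquad (0 < r < 1)

% Logarithmic cumulative returns => unbounded total, even though the
% marginal returns still vanish:
C(n) = \log n \;\implies\; C(n) - C(n-1) \approx \frac{1}{n} \to 0,
\qquad C(n) \to \infty
```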
The levels Yudkowsky is referring to:
Fuller context for the bolded:
In particular, it seems to me like the combinatorial explosion of the search space as one moves further down the concept tree may make recalcitrance higher (one needs to hit a much smaller target [at each step, we're searching for concepts that are improvements to the status quo] in a much larger search space [the search space should grow exponentially in the number of distinct laws/regularities/structures/relevant phenomena discovered]) and result in sublinear cumulative returns (see:[9]) to cognitive reinvestment. See:[12]
Alternatively, whatever is driving the "ideas are getting harder to find" phenomenon will probably also bound the "growth rate" of returns from cognitive reinvestment via recursive self-improvement within the seed AI paradigm.
Rather, I think it is a straightforward fact that the optimisation process constructing the system will need to do more "work" (measured in bits) to improve on the current status quo. One would need to argue separately that the gains to optimisation ability of the successor system more than offset the additional optimisation work that needs to be done (from both a much smaller target and much larger search space). I don't think Yudkowsky ever made a particularly persuasive argument for that claim.
On priors, I expect diminishing marginal returns to optimisation ability even within the seed AI paradigm. Most (all?) resource utilisation shows diminishing marginal returns eventually; I think it should be a strong default. The main question is whether the returns diminish before systems are strongly superhuman; that seems like more of an open question to me. See:[13]
beren says:
Compared to idiomatic software at least.
tailcalled says:
I think "AI metalearning better optimisation algorithms" and "AI automating AI architecture search" come closest to seed AI flavoured recursive self improvement, but they aren't quite it, as they seem more like AI creating more capable successor systems than a system improving itself directly. Succession can still induce foom (though see my [poorly raised] objections:[11][12]).
Admittedly, AGI would also have positive feedback loops in available economic resources, so the economic constraints to capability amplification may be slacker than I posited. Though I still expect there to be significant economic constraints because I believe that cumulative returns to cognitive capabilities from investment of computational resources are likely strongly sublinear (though see:[18]) and computational resources are one of the main/most straightforward ways to purchase additional capabilities with economic resources.
Prima facie, many (most?) interesting computational problems have worse than linear time (and/or space) complexity (some are much worse). I'm under the impression that neural networks don't implement correct deterministic algorithms to solve problems, but rather approximate them. As far as I'm aware, this approximation is not linear but generally polynomial with a smallish exponent (I've been told cubic, but have not independently verified/confirmed this). However, as long as the exponent is >1, it would still translate to sublinear cumulative returns to cognitive capabilities (where cognitive capabilities are measured in terms of the size of the problems NNs can approximately solve) from increased computational resources.
(This is another point I welcome those more technically grounded than me to enlighten me on.)
Furthermore, it seems that the empirical evidence from deep learning supports the "sublinear cumulative returns to computational resources (as measured via purchased cognitive capabilities)" hypothesis. See:[32]
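To spell out the arithmetic of that claim (assuming, as above, a polynomial approximation cost with exponent $k$):

```latex
% If approximately solving problems of size s costs compute C(s) ~ s^k
% with k > 1, then the tractable problem size as a function of compute is
s(C) \;\propto\; C^{1/k}, \qquad k > 1
% which is sublinear in C: each doubling of compute multiplies the
% tractable problem size by only 2^{1/k} < 2 (roughly 1.26x if k = 3).
```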
beren:
clockworklady:
Other phrasings that convey what I'm trying to.
Recursive self-improvement seems to not be all that likely to play a role in:
* The path to superintelligent systems
* The path to attaining decisive strategic advantage
I.e. I don't think recursive self-improvement is on the critical path of humanity's future; it is in this sense that I don't think it's particularly relevant in the deep learning paradigm.
There are caveats to this. They sound like:
* Often we don't want exact solutions, and we just want good enough approximations
* Or we do not care about deterministic correctness, and correctness with high probability is desired
* Or we do not care about worst case complexity and it's the expected complexity on real world probability distributions that matters
* Or ...
My reply to all such objections of the above form is that as far as I'm aware, these relaxations generally do not take the "effective complexity" of a problem from superlinear to linear or sublinear. Some problems have exponential time complexity even after all these relaxations and most others are generally super quadratic.
That said, the "complexity of intelligence" hypothesis does not rule out fast takeoff. Effective intelligence level as a function of time can still grow at a superlinear rate if computational resources grow sufficiently fast. Economic feedback loops could facilitate this.
I do have some scepticism of compute investment driven foom, but this post is about recursive self improvement in particular.
The Metaculus community prediction is at 90%.
A caveat to this claim (suggested by janus) is that even if the first AGI is created by deep learning, its successor (or some other relevant AI down the line) may not be. janus' model of Yudkowsky believes something similar to this. I think this caveat is sensible, and it made me revise down my estimates that LW folks whose credence in foom wasn't revised downwards after updating on the deep learning revolution were necessarily committing "epistemic malpractice".
That said, I'm sceptical of the caveat. I think the biggest update of the last decade of machine learning is that we don't actually know how to program intelligence (even for pretty narrow domains, e.g. as far as I'm aware, no one can write a program to classify handwritten digits with high accuracy). It may be possible to construct intelligent systems using more "design like" constructive optimisation processes, but I don't necessarily expect that it would be something very analogous to idiomatic programming (or inasmuch as it is analogous to idiomatic programming, I don't expect it to manifest soon enough to matter).
Of course, some systems may be able to hold trained quadrillion parameter ML models in their "working memory" and factorise that as one might a 10-line python script, but as with many advanced capabilities, I think this maps to a very high level of "strongly superhuman".
I'd expect AI systems to be radically transformative or existentially dangerous well before they can "program" their successors. Insomuch as "programming intelligence" is feasible, I don't expect it to be a component on the critical path.
Not being on the critical path to humanity's future is the main way I think seed AI flavoured recursive self-improvement is "not relevant"; if it does manifest in the deep learning paradigm, it would be too late to matter.
That said, I don't necessarily posit high confidence in these expectations. I don't really have a technically grounded compelling story for why deep learning systems would be unable to properly factorise systems of comparable complexity (this is not to say that one does not exist, I just don't intuit it). There's not necessarily any rigorous tech tree that dictates existentially dangerous capabilities must come before such advanced factorisation abilities, mostly these are just intuitions derived from language models. Moreover, this entire objection could be thrown entirely out of the window if we successfully automate the design of intelligent systems.
E.g. I was left with that impression listening to some of Yudkowsky's recent AI safety writing, and empirically in AI safety conversations I've had online, many people still seem to have high confidence in foom.
Again, if it's indeed the case that deep learning is not particularly amenable to seed AI flavoured recursive self-improvement and they expect AGI to arise within the deep learning paradigm.
Largely, I mean that many people are not making updates I think consistency demands they make given their previously stated beliefs, their reasons for those beliefs, and the available evidence.
Also, see:[24].
Within lifetime human learning seems to be remarkably more sample efficient than our best ML models, so I'm somewhat (a lot?) more sympathetic to the content overhang thesis than the hardware overhang one.
Even so, I don't expect the first AGIs to be anywhere near as sample efficient as the brain (in general, biology is way more resource (e.g. energy) efficient than human engineering: bird flight, animal locomotion in general, the brain's 10–12 watt energy budget [it seems to be 1+ orders of magnitude more energy efficient than GPUs of comparable processing power and within an order of magnitude of the theoretical limit given its size], etc.), and so I wouldn't be surprised if biology is way more data efficient than our first AIs (and I wouldn't necessarily expect AI to reach biology's data efficiency quickly). See:[29]
For largely the same reasons we didn't reach biology's energy efficiency quickly (in some domains we still haven't reached it). Basically, energy has been a much more taut constraint for evolution than for human engineering (ATP vs electricity, humans having access to much denser energy sources (e.g. fossil fuels, nuclear power), and generally a much larger energy budget). As a constructive optimisation process, evolution was subject to much harsher energy constraints than human engineering, so any product of evolution that works at all will necessarily be very energy efficient in comparison.
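To make the energy-efficiency claim concrete, here's a rough back-of-the-envelope sketch. All figures below are my own assumed ballpark estimates (synaptic event rates, GPU throughput and power draw), chosen only to illustrate the "1+ orders of magnitude" comparison, not measured values:

```python
# Rough comparison of brain vs GPU energy efficiency.
# Every constant here is an assumed ballpark estimate, for illustration only.

brain_watts = 12            # assumed brain power budget (the 10-12 watt figure)
brain_ops_per_sec = 1e14    # assumed synaptic events/s (~1e14 synapses, ~1 Hz average)

gpu_watts = 400             # assumed power draw of a datacenter-class GPU
gpu_ops_per_sec = 3e14      # assumed dense FP16 throughput, ~3e14 FLOP/s

brain_ops_per_joule = brain_ops_per_sec / brain_watts
gpu_ops_per_joule = gpu_ops_per_sec / gpu_watts

ratio = brain_ops_per_joule / gpu_ops_per_joule
print(f"brain: {brain_ops_per_joule:.1e} ops/J")
print(f"GPU:   {gpu_ops_per_joule:.1e} ops/J")
print(f"brain is ~{ratio:.0f}x more energy efficient under these assumptions")
```

Under these (very contestable) assumptions the gap comes out at roughly one order of magnitude, consistent with the claim above; different estimates of synaptic op rates would move this by an order of magnitude in either direction.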
I think a similar argument could be made about data efficiency; within-lifetime human learning simply does not have the data budget that our large language models do. Human language evolved in a much more data-sparse environment (6+ orders of magnitude less available data doesn't seem to me like an overestimate). Given current data availability (and the rate at which it's growing), it seems unlikely that data budget would be anywhere near as harsh a constraint for ML models. If there aren't strong selection pressures for data efficiency, I wouldn't expect data efficiency within an order of magnitude of humans.
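For a sense of scale on the within-lifetime comparison (as opposed to the evolutionary-environment one), here's a sketch with assumed numbers: the daily word exposure and the training-set size below are my own illustrative estimates, not figures from any source:

```python
# Rough comparison of within-lifetime human language exposure vs an LLM
# training budget. All constants are assumed ballpark estimates.
import math

words_per_day = 15_000      # assumed words a person hears/reads per day
years = 20
human_words = words_per_day * 365 * years   # ~1.1e8 words by adulthood

llm_tokens = 1.4e12         # assumed LLM training budget (~1.4T tokens)

gap_ooms = math.log10(llm_tokens / human_words)
print(f"human lifetime exposure: ~{human_words:.1e} words")
print(f"LLM training set:        ~{llm_tokens:.1e} tokens")
print(f"gap: ~{gap_ooms:.1f} orders of magnitude")
```

On these assumptions the within-lifetime gap is around four orders of magnitude; the "6+ orders of magnitude" figure above refers to the larger comparison against the data-sparse environment language evolved in.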
A major way I could be wrong is if general intelligence turns out to be relatively compact and selection for cross-domain performance comes bundled with data efficiency. I think this is somewhat plausible/don't rule it out by default. Or rather, I think sufficiently capable AI systems should be more data efficient than humans (I expect that sufficiently advanced engineering [e.g. nanotech] would be more energy efficient than biology). The question is if human level (or better) data efficiency is something that we get on the critical path. I don't have particularly strong intuitions on that.
Also note that unlike the case with energy, (as far as I'm aware) we don't have strong theoretical reasons to suspect that the brain is within an order of magnitude of the theoretical limit for data-efficient learning for a system of its size (though see:[30]). It seems intuitively plausible that a system with an ample energy budget could be much more sample efficient than the brain.
Overall, data efficiency by default for sufficiently capable systems seems more plausible than energy efficiency by default, so I think there's a stronger case for a content overhang than for a hardware overhang (see:[31]).
Maybe this would be true if we also condition on the brain's energy budget? Should we expect the brain to be near the theoretical limits for data efficiency given that it's near the theoretical limits for energy efficiency? I'm not quite sure. I'd appreciate commentary on this from people much more knowledgeable about the brain than me.
Hardware overhang is often framed in terms of available compute, not available data. Learning that the brain is very energy efficient updated my intuitions in complicated ways that I can't communicate clearly (in part because some parts are blank due to missing technical details), but roughly:
* Conditioning on the brain's energy efficiency and its grossly better sample efficiency than our current best models (I'm under the impression that it's orders of magnitude more sample efficient), I expect the brain to be very compute efficient given its energy budget. That is, I think the empirical evidence suggests that the brain is just extremely efficient in general.
* I expect (strongly?) sublinear cumulative returns to cognitive reinvestment (see:[11][12]) so I think it'll be extremely nontrivial to attain better compute efficiency than the brain.
* I don't have an intuitive story for why radically better compute efficiency than the brain is attainable the way we have intuitive stories for why better sample efficiency is readily attainable (and even then, I expect the much better sample efficiency to be accompanied by much higher compute utilisation).
* I think that cumulative returns to cognitive capabilities from computational resources are sublinear (superlinearly more computational resources are required to produce the same constant progress). See:[17][32]. Thus, I don't think an abundance of computing resources is by itself likely to be powerful enough to induce an intelligence explosion without considerable energy/compute efficiency; as mentioned earlier, I believe the brain performs very well along those dimensions.
Our extant scaling laws show a power-law relationship between data/model size and cross-entropy loss (with training compute scaling as the square of model/dataset size). I think this suggests that cumulative returns to cognitive capabilities from increased computational resources (training data and compute) are strongly sublinear.
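As a minimal sketch of what "strongly sublinear" means here, assume a stylised compute-loss power law L(C) = (C_c / C)^alpha. The exponent and reference scale below are illustrative assumptions, not fitted values from any published scaling law:

```python
# Sketch of why a power-law loss curve implies strongly sublinear
# cumulative returns to compute. Constants are illustrative assumptions.

alpha = 0.05        # assumed scaling exponent (loss falls slowly with compute)
C_c = 1.0           # assumed reference compute scale

def loss(C):
    return (C_c / C) ** alpha

# Compute multiplier needed to cut the loss by a constant factor k:
# C2/C1 = k ** (1/alpha), independent of where you start.
k = 1.1             # a 10% loss reduction
multiplier = k ** (1 / alpha)
print(f"each 10% loss reduction costs a ~{multiplier:.0f}x compute increase")

# Equal compute *multipliers* buy only equal multiplicative loss reductions,
# so each constant increment of progress gets ever more expensive.
for C in [1e3, 1e6, 1e9]:
    print(f"C = {C:.0e}: loss = {loss(C):.3f}")
```

With these assumptions, every 10% reduction in loss costs roughly a 7x increase in compute, and going from 10^3 to 10^9 units of compute (a millionfold increase) only roughly halves the loss; this is the sense in which an abundance of raw compute, by itself, buys progress at a sharply diminishing rate.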
I think the scaling laws are significant empirical evidence against the foom thesis within the deep learning paradigm, though see:[33][34].
That said, I'm not technically grounded in ML and so I wouldn't be surprised if I was mistaken here. I invite those more informed on the relevant details to comment here. I'd be particularly curious to hear from ML folks who still expect foom in light of the evidence from scaling laws.
janus disagrees here. I quote:
beren:
I am particularly unconfident about this point, and especially with respect to a hardware overhang; it seems to me that scaling maximalism could make a hardware overhang much more likely (see:[36]) as training is often (much? see:[37]) more expensive than inference. But I don't have a solid grasp of how easily/readily trained models can be scaled up to use more compute at inference and how that affects their performance (see:[38]).
I would be grateful if those more technically grounded than myself were to address this point in particular.
At least, not with amounts of compute available in the near term. But I could also be wrong on this; I'm not technically grounded enough here to feel strongly about my intuitions in this regard.
Alternatively, it's not necessarily the case that even if we had unbounded compute, we could specify training objectives that select for general intelligence under iterated self-play. janus suspects this: