I think it's premature to conclude that AGI progress will be large pre-trained transformers indefinitely into the future. They are surprisingly(?) effective, but for comparison they are not as effective as AlphaZero and AlphaStar are in their narrow domains, where value and policy networks paired with Monte-Carlo tree search use orders of magnitude fewer parameters. We don't know what MCTS on arbitrary domains will look like with 2-4-OOM-larger networks, which are within reach now. We also haven't formulated methods of self-play improvement for LLMs, and I think that's a potentially large overhang.
There's also a human limit to the types of RSI we can imagine and once pre-trained transformers exceed human intelligence in the domain of machine learning those limits won't apply. I think there's probably significant overhang in prompt engineering, especially when new capabilities emerge from scaling, that could be exploited by removing the serial bottleneck of humans trying out prompts by hand.
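As a toy sketch of removing that serial bottleneck: automated search over a space of candidate prompts against an automatic scorer. Everything here is invented for illustration; in particular, `score_prompt` is a hypothetical stand-in for whatever capability evaluation the prompts would actually be tuned against.

```python
import itertools
import random

# Hypothetical stand-in for a real capability evaluation; here it just
# rewards prompts containing phrases assumed (for illustration) to help.
def score_prompt(prompt: str) -> float:
    return float(sum(kw in prompt for kw in ("step by step", "check your work")))

def search_prompts(templates, fillers, trials=100, seed=0):
    """Enumerate template x filler combinations, sample up to `trials`
    of them, and return the highest-scoring candidate."""
    rng = random.Random(seed)
    candidates = [t.format(f) for t, f in itertools.product(templates, fillers)]
    sampled = rng.sample(candidates, min(trials, len(candidates)))
    return max(sampled, key=score_prompt)

best = search_prompts(
    templates=["Solve this {}.", "Solve this {} and check your work."],
    fillers=["step by step", "carefully"],
)
assert best == "Solve this step by step and check your work."
```

The point is only that this loop is embarrassingly parallel and needs no human in it; the expensive part (the scorer) is exactly where emergent capabilities from scaling would show up.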
Finally I don't think GOFAI is dead; it's still in its long winter waiting to bloom when enough intelligence is put into it. We don't know the intelligence/capability threshold necessary to make substantial progress there. Generally, the bottleneck has been identifying useful mappings from the real world to mathematics and algorithms. Humans are pretty good at that, but we stalled at formalizing effective general intelligence itself. Our abstraction/modeling abilities, working memory, and time are too limited and we have no idea where those limits come from, whether LLMs are subject to the same or similar limits, or how the limits are reduced/removed with model scaling.
If deep learning yields AGI, the question is how far its intelligence can jump beyond human level before it runs out of compute available in the world, using the improvements that can be made very quickly at the current level of intelligence. In short sprints, a hoard of handmade constants can look as good as asymptotic improvement. So the latter's hypothetical impossibility doesn't put convincing bounds on how far this could be pushed before running out of steam. And if by that point procedures for bootstrapping nanotech have become obvious, this keeps going, transitioning into disassembling the world for more compute without pause. All without refuting the bitter lesson.
I think you forgot one critical thing. Why does the normal argument for RSI's inevitability fail? The answer is: it doesn't.
Even though there is some research in the direction of a neural network changing its own weights directly, this isn't important to the main argument, because the argument is about improving source code. The weights are more like compiled code.
In the context of deep learning, the source code consists of:
So the question is whether a deep learning model could improve any of this code. The answer to whether it can improve its "compiled code" (the weights) is probably also yes, but that isn't what the argument is based on.
Then this runs into the issue that, I contend, there's just not that much gain to be had from such source code improvements.
A misaligned model might not want to do that, though, since it would be difficult for it to ensure that the output of the new training process is aligned to its goals.
It seems pretty clear to me that AIs could get really good at understanding and predicting the results of editing model weights in the same way they can get good at predicting how proteins will fold. From there, directly creating circuits that add XYZ reasoning functionality seems at least possible.
I don't actually share this intuition.
I don't think you can get the information of computing the gradient updates to particular weights without actually running that computation (or something equivalent to it).
And presumably one would need empirical feedback (i.e. the value of the objective function we're optimising the network for on particular inputs) to compute the desired gradient updates.
The idea of the system just predicting the desired gradient updates without any ground truth supervisory signal seems fanciful.
I agree that it seems possible. I have doubts that predicting the results of editing weights is a more compute-efficient way of causing a model to exhibit the desired behavior than giving it the obvious tools and using fine-tuning / RL to make it able to use those tools, though, or alternatively just doing the RL/fine-tuning directly. That's basically the heart of how I interpret the bitter lesson: it's not that you can't find more efficient ways to do what DL can do, it's that when you have a task that humans can do and computers can't, the approach of "introspect and think really hard about how to approach the task the right way" is outperformed by the approach of "lol more layers go brrrrr".
This is a solid argument inasmuch as we define RSI to be about self-modifying its own weights/other-inscrutable-reasoning-atoms. That does seem to be quite hard given our current understanding.
But there are tons of opportunities for an agent to improve its own reasoning capacity otherwise. At a very basic level, the agent can do at least two other things:
Most problems in computer science have superlinear time complexity
On one hand, sure, improving this is (likely) impossible in the limit because of fundamental complexity properties. On the other hand, the agent can still become vastly smarter than humans. A particular example: the human mind, without any assistance, is very bad at solving 3SAT. But we've invented computers, and then constraint solvers, and now we are able to solve 3SAT much, much faster, even though 3SAT is (likely) exponentially hard. So the RSI argument here is: the smarter (or faster) the model is, the more special-purpose tools it can create to efficiently solve specific problems and thus upgrade its reasoning ability. Not to infinity, but likely far beyond humans.
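As a minimal sketch of that point: brute force must check all 2^n assignments, while even a bare-bones DPLL solver using nothing but unit propagation prunes most of the search. Real constraint solvers (CDCL etc.) are vastly better still; this is only a toy illustration of "special-purpose tools upgrade reasoning ability".

```python
import itertools

# Literals are nonzero ints: literal k asserts variable |k| is True if k > 0.
def brute_force_sat(clauses, n):
    """Try all 2^n assignments; exponential in the number of variables."""
    for bits in itertools.product([False, True], repeat=n):
        if all(any((lit > 0) == bits[abs(lit) - 1] for lit in c) for c in clauses):
            return True
    return False

def dpll(clauses):
    """Bare-bones DPLL: unit propagation plus branching."""
    if not clauses:
        return True          # no constraints left: satisfiable
    if any(len(c) == 0 for c in clauses):
        return False         # empty clause: contradiction
    unit = next((c[0] for c in clauses if len(c) == 1), None)
    lit = unit if unit is not None else clauses[0][0]
    # A unit clause forces its literal; otherwise branch on both polarities.
    for choice in ([lit] if unit is not None else [lit, -lit]):
        reduced = []
        for c in clauses:
            if choice in c:
                continue     # clause already satisfied
            reduced.append(tuple(l for l in c if l != -choice))
        if dpll(reduced):
            return True
    return False

sat = [(1, 2), (-1, 3), (-2, -3)]    # satisfiable, e.g. x1=T, x2=F, x3=T
unsat = [(1,), (-1,)]                # x1 and not-x1: unsatisfiable
assert brute_force_sat(sat, 3) and dpll(sat)
assert not brute_force_sat(unsat, 1) and not dpll(unsat)
```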
To be clear, the complexity theory argument is against fast takeoff, not an argument that intelligence caps at some level relative to humans.
Analogy: log(x) approaches infinity, but it does so much slower than x.
I.e. the sublinear asymptotics would prevent AI from progressing very quickly to a vastly superhuman level (unless the AI is able to grow its available resources sufficiently quickly to dominate the poor asymptotics).
Alternatively, each order of magnitude increase in compute buys (significantly) less intelligence; thus progress from human level to a...
So I'm going to strong disagree here.
First of all, as it turns out in practice, scale was everything. This means that any AI idea you want to name, unless it was based on a transformer and worked on by roughly three labs, was never actually attempted.
We can just ignore all the other thousands of AI methods that humans tried because they were not attempted with a relevant level of scale.
Therefore, RSI has never been tried.
Second, you can easily design a variation on RSI that works fine with current paradigms.
It's not precisely RSI but it's functionally the same thing. Here are the steps:
[1] A benchmark of many tasks. Tasks must be autogradeable, human participants must be able to 'play' the tasks so we have a control group score, and tasks must push the edge of human cognitive ability (so the average human scores nowhere close to the max score, and top-1% humans do not max the bench either). There must be many tasks, with a rich permutation space (so it isn't possible for a model to memorize all permutations).
[2] A heuristic weighted score on this benchmark intended to measure how "AGI-like" a model is. It might be the RMSE across the benchmark, but with a lot of score weighting on zero-shot, cross-domain/multimodal tasks. That is, the kind of model that can use information from many different previous tasks on a complex exercise it has never seen before is closer to an AGI, or closer to replicating "Leonardo da Vinci", who had exceptional human performance presumably from all this cross-domain knowledge.
[3] In the computer science task set, there are tasks to design an AGI for a bench like this. The model proposes a design, and if that design has already been tested, immediately receives detailed feedback on how it performed.
The "design an AGI" subtask can be much simpler than "write all the boilerplate in Python", but these models will be able to do that if needed.
As task scores approach human level across a broad set of tasks, you have an AGI. You would expect it to almost immediately improve to a low superintelligence. As AGIs get used in the real world and fail to perform well at something, you add more tasks to the bench, and/or automate creating simulated scenarios that use robotics data.
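The weighted scoring in step [2] might be sketched like this. The weights, field names, and task records below are illustrative assumptions, not a real benchmark:

```python
# A minimal sketch of a heuristic "AGI-likeness" score: a weighted mean
# over per-task scores, upweighting zero-shot and cross-domain tasks.
def agi_score(results, zero_shot_weight=2.0, cross_domain_weight=2.0):
    """results: list of dicts with 'score' in [0, 1] and boolean
    'zero_shot' / 'cross_domain' flags (hypothetical schema)."""
    total, weight_sum = 0.0, 0.0
    for r in results:
        w = 1.0
        if r["zero_shot"]:
            w *= zero_shot_weight
        if r["cross_domain"]:
            w *= cross_domain_weight
        total += w * r["score"]
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0

results = [
    {"score": 0.9, "zero_shot": False, "cross_domain": False},
    {"score": 0.5, "zero_shot": True, "cross_domain": True},
]
# The zero-shot, cross-domain task carries 4x weight, pulling the
# aggregate toward 0.5 despite the high in-distribution score.
assert abs(agi_score(results) - 0.58) < 1e-9
```

The design choice doing the work: memorisable in-distribution performance is deliberately cheap, and "da Vinci"-style transfer is deliberately expensive to fake.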
Why aren't we already doing this if it's so simple?
Because each AGI candidate training run has to be at least twice as large as llama-65b, which means $2M+ in training costs per run. And you need to explore the possibility space pretty broadly, so you would figure several thousand runs to really get to a decent AGI design, which still will not be optimal.
This is one of the reasons foom cannot happen. At least not without a lot more compute than we have now. Each attempt is too expensive.
Can we refine the above algorithm into something more compute efficient? Yes, somewhat (by going to a modular architecture, where each "AGI candidate" is composed of hundreds of smaller networks, and we reuse most of them in between candidates), but it's going to still require a lot more compute than llama-65b took to train.
I've made a related Manifold question here:
I'm a believer in RSI-soon. I have an inside view that supports this. I expect convincing evidence of RSI to become apparent in the world before 2026. If you think I'm wrong, but this manifold question doesn't get at the root of things, I'll happily also vote on other markets related to this.
Direct self-improvement (i.e. rewriting itself at the cognitive level) does seem much, much harder with deep learning systems than with the sort of systems Eliezer originally focused on.
In DL, there is no distinction between "code" and "data"; it's all messily packed together in the weights. Classic RSI relies on the ability to improve and reason about the code (relatively simple) without needing to consider the data (irreducibly complicated).
Any verification that a change to the weights/architecture will preserve a particular non-trivial property (e.g. avoiding value drift) is likely to be commensurate in complexity to the complexity of the weights. So... very complex.
The safest "self-improvement" changes probably look more like performance/parallelization improvements than "cognitive" changes. There are likely to be many opportunities for immediate performance improvements[1], but that could quickly asymptote.
I think that recursive self-empowerment might now be a more accurate term than RSI for a possible source of foom. That is, the creation of accessory tools for capability increase. More like a metaphorical spider at the center of an increasingly large web. Or (more colorfully) a shoggoth spawning a multitude of extra tentacles.
The change is still recursive in the sense that marginal self-empowerment increases the ability to self-empower.
So I'd say that a "foom" is still possible in DL, but is both less likely and almost certainly slower. However, even if a foom is days or weeks rather than minutes, many of the same considerations apply. Especially if the AI has already broadly distributed itself via the internet.
Perhaps instead of just foom, we get "AI goes brrrr... boom... foom".
Hypothetical examples include: more efficient matrix multiplication, faster floating point arithmetic, better techniques for avoiding memory bottlenecks, finding acceptable latency vs. throughput trade-offs, parallelization, better usage of GPU L1/L2/etc caches, NN "circuit" factoring, and many other algorithmic improvements that I'm not qualified to predict.
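One of the listed improvements, better cache usage, can be illustrated with classic loop tiling for matrix multiplication. This is a toy sketch in pure Python (real speedups require native code and tuned tile sizes); the point is that only the iteration order changes, never the arithmetic:

```python
# Naive triple loop: strides across b column-wise, evicting cache lines.
def matmul_naive(a, b):
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c

# Blocked (tiled) version: processes small tiles so the working set of
# a, b, and c stays resident in fast cache. Same arithmetic, same result.
def matmul_blocked(a, b, tile=2):
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, m, tile):
            for j0 in range(0, p, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, m)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + tile, p)):
                            c[i][j] += aik * b[k][j]
    return c

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
assert matmul_blocked(a, b) == matmul_naive(a, b)
```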
What if the machine has a benchmark/training suite for performance. On the benchmark is a task for designing a better machine architecture.
Machine proposes a better architecture. The new architecture may be a brand new set of files defining the networks, topology, and training procedure, or it may reuse networks for components.
For example you might imagine an architecture that uses gpt-3.5 and -4 as subsystems but the "executive control" is from a new network defined by the architecture.
Given a very large compute budget (many billions), the company hosting...
I think takeoff from broadly human level to strongly superhuman is years, and potentially decades.
Foom in days or weeks still seems just as fanciful as before.
An interesting possibility is recursively self-improving prompt in auto-GPT.
The MIRI 2000s paradigm for an AI capable of self-improvement, was that it would be modular code with a hierarchical organization, that would potentially engage in self-improvement at every level.
The actual path we've been on has been: deep learning, scaling, finetuning with RLHF, and now (just starting) reflective agents built on a GPT base.
A reflective GPT-based agent is certainly capable of studying itself and coming up with ideas for improvement. So we're probably at the beginning of attempts at self-improvement, right now.
The inability of neural nets to quickly retrain new, larger versions of themselves will slow their progress, and thus many competing AIs are more likely to emerge. It is less likely that they can merge later by "merging utility functions", as NNs have no explicit utility functions. Thus a multipolar world is more likely.
Disclaimer
I wrote this three months ago, and abandoned it[1]. I currently do not plan to return to it anytime soon, but I have nonetheless found myself wanting a document I could point to for why I'm not particularly enthused by or overly concerned with "recursive self improvement". As such I publish anyway; I apologise for not being a more competent writer.
Epistemic Status
Unconfident on some object level details; I don't have a technical grasp of machine learning[2].
Author's Note
Despite appearances, this is not actually a rhetorical question — phrasing this as a question post was a deliberate decision — do please provide answers to the questions I posed.
The footnotes are unusually extensive in this post; I'd recommend the curious reader not skip them[3].
Related
Abstract
The main questions this post posits:
Introduction
Back in the day, Yudkowsky formulated recursive self improvement (RSI) as a viable[6] path to foom, under the paradigm of "seed AI". From the wiki[7] (emphasis mine):
Reading "Hard Takeoff", it seems that Yudkowsky's belief that a soft takeoff is unlikely is heavily dependent on his belief in recursive self-improvement being a component on the critical path to transformative AI:
I get the impression that Yudkowsky was imagining programming a seed AI that could then look upon its own source code with greater cognitive prowess than its programmers, identify obvious flaws, inefficiencies or suboptimal features in its source code and "quickly"[8][9] rewrite itself to greater capabilities (what I'm calling "seed AI flavoured recursive self-improvement"). From "Recursive Self-Improvement" (bolded mine):
The successor AI so produced would be even smarter and better able to make algorithmic/architectural improvements to its source code; this process recurses, leading to an intelligence explosion.
I don't necessarily think this was a well-founded position even within the seed AI paradigm[11][12][13], but whether that was the case is largely beside the point of this question.
I'm mostly curious if RSI is still a plausible hypothesis for deep learning systems.
Challenges
On the Feasibility of Recursive Self Improvement
Trillions to quadrillions of parameter models trained at the cost of tens of millions to billions of dollars do not seem particularly amenable to the kind of RSI Yudkowsky envisioned back in the day. They don't meet any of the criteria for "Seed AI":
Our most powerful models aren't designed but selected for via "search like" processes. They aren't by default well factored or particularly modular[14]. Even with advanced interpretability tools/techniques, it seems like it would be hard to make the kind of modifications/improvements that are possible in well written software programs.
An AI inspecting its mind, identifying flaws/inefficiencies and making nontrivial algorithmic/architectural improvements seems to not be particularly feasible under the deep learning paradigm.
Positive feedback loops of AI development are probable, but they look more like:
In other words, we get recursive self improvement at home. 😉
I'm under the impression that while training more capable successor systems is feasible, doing so would incur considerable economic costs (and thus serve as a taut constraint to the "speed" of takeoff via such feedback loops)[17][18]. It does not seem to necessarily be the case — or even particularly likely — that the above kind of positive feedback loops will lead to discontinuous AI progress.
I think it may be possible that deep learning AGI would eventually be able to look upon its own mind and factorise it[19], but it seems like that level of cognitive capabilities would come well after AI has become existentially dangerous (or otherwise transformative)[20]. I currently do not expect seed AI style RSI to be a component on the critical path to existential catastrophe/"the singularity".
On the Viability of Recursive Self Improvement
My current belief is that even if seed AI flavoured recursive self improvement was viable, the gains that can be eked out that way are not as radical as imagined.
To summarise my position:
My reasons for this belief are largely:
It does not seem to me like recursive self-improvement is relevant in the deep learning paradigm.
Implications
If the aforementioned objections are correct, then insomuch as one's intuitions around foom were rooted in some expectation of recursive self-improvement and insomuch as one believes that the first AGIs will be created within the deep learning paradigm[23] then the inapplicability of RSI to deep learning should update people significantly downwards on the likelihood of hard takeoff/foom[24].
However, that does not seem to have been the case. I get the sense that people are no longer necessarily expecting seed AI flavoured recursive self-improvement, but basically still posit very high confidence in foom[25]. This is prima facie pretty surprising/suspicious as RSI/seed AI was posited as the main reason to expect a hard takeoff. From the wiki page on recursive self-improvement (emphasis mine):
From "Hard Takeoff" (emphasis mine):
Yudkowsky, talking about a situation in which your goals changed but your beliefs about the steps to take did not said (underlines mine):
I feel like the update from deep learning hasn't fully propagated through the belief pools of many in the LW sphere. I think Yudkowsky (and those influenced by his views) should be significantly less confident in hard takeoff, and give more consideration to softer takeoff scenarios; otherwise it feels like too much epistemic inertia[26][27].
I think there are other avenues for hard takeoff that don't hinge so strongly on RSI (e.g. hardware/content overhang[28][29][30][31]), but they also seem to be somewhat weakened by the deep learning paradigm[32][33][34] (perhaps especially so if scaling maximalism is true)[35][36][37][38]. That said, my broader scepticism of foom deserves its own top level post.
I am also not persuaded by the justification for foom credulity based on AlphaGo. I don't think AlphaGo is necessarily as strong an indicator for foom as suggested: AlphaGo was able to blow past human performance in the narrow field of Go via 3 days of self-play; it does not seem that general competence in rich domains is similarly amenable to self-play[39][40].
Acknowledgements
I'm grateful to @beren, @janus, @tailcalled, @jessicata, @Alex Vermillion, @clockworklady and others for their valuable feedback on drafts of this post.
I added a new section (and some footnotes) to this post before publishing, but I did not otherwise bother to extensively review the post.
I have solicited feedback from people more knowledgeable on ML than myself and on reflection expect the main theses of this post to be directionally correct.
Nonetheless, I flag statements I am noticeably not confident in.
Many footnotes are more like appendices providing crucial context/elaboration on particular points and not merely authorial side comments such as this.
I also added in relevant commentary from reviewers who are more technically grounded in relevant aspects than myself.
How steep/rapid is it? Alternatively, just how big a discontinuity does the takeoff represent?
It's not particularly clear what people mean by "hard"/"fast" takeoff. From the taxonomy Barnett drew, I use "hard"/"fast" takeoff to refer to (a more generous version of) Yudkowsky/Bostrom's formulations of the term. I.e. local takeoff that unravels over a timespan too short for humans to react (tentatively hours to a few weeks).
[Where "takeoff" can be understood as the time for AI systems to transition from a par human regime to a strongly superhuman regime.]
"A viable" may be an understatement; my impression/cached thoughts is that RSI was presented as the most load bearing component on the path to hard takeoff. E.g. from "Hard Takeoff" (emphasis mine):
Or from "Recursive Self-Improvement":
I think the LW Wiki is a pretty accurate representation of the community's consensus at the time.
The main difference between "seed AI flavoured recursive self-improvement" and the kind of self-improvement that manifests in the deep learning paradigm (covered later) is probably the resource cost (time, computational and economic resources) to each iteration of the feedback process.
Having to spend ever greater economic and computational resources (growing at a superlinear rate) on training successors to attain constant capabilities progress in the feedback cycle will greatly change the dynamics of the flavour of self-improvement that manifests under deep learning. To borrow Yudkowsky's metaphor, such self-improvement seems much less likely to be "prompt critical".
In my terminology: the cumulative returns to cognitive capabilities from the investment of economic/computational resources are sublinear (see:[9]).
See:[17][32]
Alternatively, the marginal returns diminish at a superlinear rate.
To a first approximation, sublinear cumulative returns, implies that marginal returns are diminishing at a superlinear rate (and vice versa).
Though there are some subtleties here: marginal returns that diminish at an exponential rate imply bounded cumulative returns (geometric series with ratios <1 converge). Meanwhile, cumulative returns that grow at a logarithmic rate are unbounded; they do not converge.
I think whether to consider marginal returns or cumulative returns depends on the nature of the process you're considering (what exactly drives the returns).
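A toy formalisation of those two regimes (with $r$ the decay ratio of marginal returns and $C(n)$ the cumulative return after $n$ units of reinvestment):

```latex
% Exponentially diminishing marginal returns => bounded cumulative returns:
\sum_{n=1}^{\infty} r^{n} \;=\; \frac{r}{1-r} \;<\; \infty \qquad (0 < r < 1)

% Logarithmic cumulative returns => unbounded total, even though the
% marginal returns still vanish:
C(n) = \log n \;\implies\; C(n) - C(n-1) \approx \frac{1}{n} \to 0,
\qquad C(n) \to \infty
```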
The levels Yudkowsky is referring to:
Fuller context for the bolded:
In particular, it seems to me like the combinatorial explosion of the search space as one moves further down the concept tree may make recalcitrance higher (one needs to hit a much smaller target [at each step, we're searching for concepts that are improvements to the status quo] in a much larger search space [the search space should grow exponentially in the number of distinct laws/regularities/structures/relevant phenomena discovered]) and result in sublinear cumulative returns (see:[9]) to cognitive reinvestment. See:[12]
Alternatively, whatever is driving the "ideas are getting harder to find" phenomenon will probably also bound the "growth rate" of returns from cognitive reinvestment via recursive self-improvement within the seed AI paradigm.
Rather, I think it is a straightforward fact that the optimisation process constructing the system will need to do more "work" (measured in bits) to improve on the current status quo. One would need to argue separately that the gains to optimisation ability of the successor system more than offset the additional optimisation work that needs to be done (from both a much smaller target and much larger search space). I don't think Yudkowsky ever made a particularly persuasive argument for that claim.
On priors, I expect diminishing marginal returns to optimisation ability even within the seed AI paradigm. Most (all?) resource utilisation shows diminishing marginal returns eventually; I think it should be a strong default. The main question is whether the returns diminish before systems are strongly superhuman; that seems like more of an open question to me. See:[13]
beren says:
Compared to idiomatic software at least.
tailcalled says:
I think "AI metalearning better optimisation algorithms" and "AI automating AI architecture search" come closest to seed AI flavoured recursive self improvement, but they aren't quite it, as they seem more like AI creating more capable successor systems than a system improving itself directly. Succession can still induce foom (though see my [poorly raised] objections:[11][12]).
Admittedly, AGI would also have positive feedback loops in available economic resources, so the economic constraints to capability amplification may be slacker than I posited. Though I still expect there to be significant economic constraints because I believe that cumulative returns to cognitive capabilities from investment of computational resources are likely strongly sublinear (though see:[18]) and computational resources are one of the main/most straightforward ways to purchase additional capabilities with economic resources.
Prima facie, many (most?) interesting computational problems have worse than linear time (and/or space) complexity (some are much worse). I'm under the impression that neural networks don't implement correct deterministic algorithms to solve problems, but rather approximate them. As far as I'm aware, this approximation is not linear but generally polynomial with a smallish exponent (I've been told cubic, but have not independently verified/confirmed this). However, as long as the exponent is >1, it would still translate to sublinear cumulative returns to cognitive capabilities (where cognitive capabilities are measured in terms of the size of the problems NNs can approximately solve) from increased computational resources.
(This is another point I welcome those more technically grounded than me to enlighten me on.)
Furthermore, it seems that the empirical evidence from deep learning supports the "sublinear cumulative returns to computational resources (as measured via purchased cognitive capabilities)" hypothesis. See:[32]
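To spell out the arithmetic of that claim (assuming, as above, a polynomial approximation cost with exponent $k$):

```latex
% If approximately solving problems of size s costs compute C(s) ~ s^k
% with k > 1, then the tractable problem size as a function of compute is
s(C) \;\propto\; C^{1/k}, \qquad k > 1
% which is sublinear in C: each doubling of compute multiplies the
% tractable problem size by only 2^{1/k} < 2 (roughly 1.26x if k = 3).
```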
beren:
clockworklady:
Other phrasings that convey what I'm trying to.
Recursive self-improvement seems to not be all that likely to play a role in:
* The path to superintelligent systems
* The path to attaining decisive strategic advantage
I.e. I don't think recursive self-improvement is on the critical path of humanity's future; it is in this sense that I don't think it's particularly relevant in the deep learning paradigm.
There are caveats to this. They sound like:
* Often we don't want exact solutions, and we just want good enough approximations
* Or we do not care about deterministic correctness, and correctness with high probability is desired
* Or we do not care about worst case complexity and it's the expected complexity on real world probability distributions that matters
* Or ...
My reply to all such objections of the above form is that as far as I'm aware, these relaxations generally do not take the "effective complexity" of a problem from superlinear to linear or sublinear. Some problems have exponential time complexity even after all these relaxations and most others are generally super quadratic.
That said, the "complexity of intelligence" hypothesis does not rule out fast takeoff. Effective intelligence level as a function of time can still grow at a superlinear rate if computational resources grow sufficiently fast. Economic feedback loops could facilitate this.
I do have some scepticism of compute investment driven foom, but this post is about recursive self improvement in particular.
The Metaculus community prediction is at 90%.
A caveat to this claim (suggested by janus) is that even if the first AGI is created by deep learning, its successor (or some other relevant AI down the line) may not be. janus' model of Yudkowsky believes something similar to this. I think this caveat is sensible, and it made me revise down my estimates that LW folks whose credence in foom wasn't revised downwards after updating on the deep learning revolution were necessarily committing "epistemic malpractice".
That said, I'm sceptical of the caveat. I think the biggest update of the last decade of machine learning is that we don't actually know how to program intelligence (even for pretty narrow domains, e.g. as far as I'm aware, no one can write a program to classify handwritten digits with high accuracy). It may be possible to construct intelligent systems using more "design like" constructive optimisation processes, but I don't necessarily expect that it would be something very analogous to idiomatic programming (or inasmuch as it is analogous to idiomatic programming, I don't expect it to manifest soon enough to matter).
Of course, some systems may be able to hold trained quadrillion parameter ML models in their "working memory" and factorise that as one might a 10-line python script, but as with many advanced capabilities, I think this maps to a very high level of "strongly superhuman".
I'd expect AI systems to be radically transformative or existentially dangerous well before they can "program" their successors. Insomuch as "programming intelligence" is feasible, I don't expect it to be a component on the critical path.
Not being on the critical path to humanity's future is the main way I think seed AI flavoured recursive self-improvement is "not relevant"; if it does manifest in the deep learning paradigm, it would be too late to matter.
That said, I don't necessarily posit high confidence in these expectations. I don't really have a technically grounded compelling story for why deep learning systems would be unable to properly factorise systems of comparable complexity (this is not to say that one does not exist, I just don't intuit it). There's not necessarily any rigorous tech tree that dictates existentially dangerous capabilities must come before such advanced factorisation abilities, mostly these are just intuitions derived from language models. Moreover, this entire objection could be thrown entirely out of the window if we successfully automate the design of intelligent systems.
E.g. I was left with that impression listening to some of Yudkowsky's recent AI safety writing, and empirically in AI safety conversations I've had online, many people still seem to have high confidence in foom.
Again, if it's indeed the case that deep learning is not particularly amenable to seed AI flavoured recursive self-improvement and they expect AGI to arise within the deep learning paradigm.
Largely, I mean that many people are not making updates I think consistency demands they make given their previously stated beliefs, their reasons for those beliefs, and the available evidence.
Also, see:[24].
Within lifetime human learning seems to be remarkably more sample efficient than our best ML models, so I'm somewhat (a lot?) more sympathetic to the content overhang thesis than the hardware overhang one.
Even so, I don't expect the first AGIs to be anywhere near as sample efficient as the brain (in general, biology is way more resource (e.g. energy) efficient than human engineering: bird flight, animal locomotion in general, the brain's 10–12 watt energy budget [it seems to be 1+ orders of magnitude more energy efficient than GPUs of comparable processing power and within an order of magnitude of the theoretical limit given its size], etc.), and so I wouldn't be surprised if biology is way more data efficient than our first AIs (and I wouldn't necessarily expect AI to reach biology's data efficiency quickly). See:[29]
For largely the same reasons we didn't reach biology's energy efficiency quickly (in some domains we still haven't reached it). Basically, energy has been a much more taut constraint for evolution than for human engineering (ATP vs electricity, humans having access to much denser energy sources (e.g. fossil fuels, nuclear power), and generally a much larger energy budget). As a constructive optimisation process, evolution was subject to much harsher energy constraints than human engineering, so any product of evolution that works at all will necessarily be very energy efficient in comparison.
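To make the energy-efficiency claim concrete, here's a rough back-of-the-envelope sketch. All figures below are my own assumed ballpark estimates (synaptic event rates, GPU throughput and power draw), chosen only to illustrate the "1+ orders of magnitude" comparison, not measured values:

```python
# Rough comparison of brain vs GPU energy efficiency.
# Every constant here is an assumed ballpark estimate, for illustration only.

brain_watts = 12            # assumed brain power budget (the 10-12 watt figure)
brain_ops_per_sec = 1e14    # assumed synaptic events/s (~1e14 synapses, ~1 Hz average)

gpu_watts = 400             # assumed power draw of a datacenter-class GPU
gpu_ops_per_sec = 3e14      # assumed dense FP16 throughput, ~3e14 FLOP/s

brain_ops_per_joule = brain_ops_per_sec / brain_watts
gpu_ops_per_joule = gpu_ops_per_sec / gpu_watts

ratio = brain_ops_per_joule / gpu_ops_per_joule
print(f"brain: {brain_ops_per_joule:.1e} ops/J")
print(f"GPU:   {gpu_ops_per_joule:.1e} ops/J")
print(f"brain is ~{ratio:.0f}x more energy efficient under these assumptions")
```

Under these (very contestable) assumptions the gap comes out at roughly one order of magnitude, consistent with the claim above; different estimates of synaptic op rates would move this by an order of magnitude in either direction.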
I think a similar argument could be made about data efficiency; within-lifetime human learning simply does not have the data budget that our large language models do. Human language evolved in a much more data-sparse environment (6+ orders of magnitude less available data doesn't seem to me like an overestimate). Given current data availability (and the rate at which it's growing), it seems unlikely that data budget would be anywhere near as harsh a constraint for ML models. If there aren't strong selection pressures for data efficiency, I wouldn't expect data efficiency within an order of magnitude of humans.
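For a sense of scale on the within-lifetime comparison (as opposed to the evolutionary-environment one), here's a sketch with assumed numbers: the daily word exposure and the training-set size below are my own illustrative estimates, not figures from any source:

```python
# Rough comparison of within-lifetime human language exposure vs an LLM
# training budget. All constants are assumed ballpark estimates.
import math

words_per_day = 15_000      # assumed words a person hears/reads per day
years = 20
human_words = words_per_day * 365 * years   # ~1.1e8 words by adulthood

llm_tokens = 1.4e12         # assumed LLM training budget (~1.4T tokens)

gap_ooms = math.log10(llm_tokens / human_words)
print(f"human lifetime exposure: ~{human_words:.1e} words")
print(f"LLM training set:        ~{llm_tokens:.1e} tokens")
print(f"gap: ~{gap_ooms:.1f} orders of magnitude")
```

On these assumptions the within-lifetime gap is around four orders of magnitude; the "6+ orders of magnitude" figure above refers to the larger comparison against the data-sparse environment language evolved in.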
A major way I could be wrong is if general intelligence turns out to be relatively compact and selection for cross-domain performance comes bundled with data efficiency. I think this is somewhat plausible/don't rule it out by default. Or rather, I think sufficiently capable AI systems should be more data efficient than humans (I expect that sufficiently advanced engineering [e.g. nanotech] would be more energy efficient than biology). The question is if human level (or better) data efficiency is something that we get on the critical path. I don't have particularly strong intuitions on that.
Also note that unlike the case with energy, (as far as I'm aware) we don't have strong theoretical reasons to suspect that the brain is within an order of magnitude of the theoretical limit for data-efficient learning for a system of its size (though see:[30]). It seems intuitively plausible that a system with an ample energy budget could be much more sample efficient than the brain.
Overall, data efficiency by default for sufficiently capable systems seems more plausible than energy efficiency by default, so I think there's a stronger case for a content overhang than for a hardware overhang (see:[31]).
Maybe this would be true if we also condition on the brain's energy budget? Should we expect the brain to be near the theoretical limits for data efficiency given that it's near the theoretical limits for energy efficiency? I'm not quite sure. I'd appreciate commentary on this from people much more knowledgeable about the brain than me.
Hardware overhang is often framed in terms of available compute, not available data. Learning that the brain is very energy efficient updated my intuitions in complicated ways that I can't communicate clearly (in part because some parts are blank due to missing technical details), but roughly:
* Conditioning on the brain's energy efficiency and its grossly better sample efficiency than our current best models (I'm under the impression that it's orders of magnitude more sample efficient), I expect the brain to be very compute efficient given its energy budget. That is, I think the empirical evidence suggests that the brain is just extremely efficient in general.
* I expect (strongly?) sublinear cumulative returns to cognitive reinvestment (see:[11][12]) so I think it'll be extremely nontrivial to attain better compute efficiency than the brain.
* I don't have an intuitive story for why radically better compute efficiency than the brain is attainable the way we have intuitive stories for why better sample efficiency is readily attainable (and even then, I expect the much better sample efficiency to be accompanied by much higher compute utilisation).
* I think that cumulative returns to cognitive capabilities from computational resources are sublinear (superlinearly more computational resources are required to produce the same constant progress). See:[17][32]. Thus, I don't think an abundance of computing resources is by itself likely to be powerful enough to induce an intelligence explosion without considerable energy/compute efficiency; as mentioned earlier, I believe the brain performs very well along those dimensions.
Our extant scaling laws show a power-law relationship between data/model size and cross-entropy loss (with training compute scaling as the square of model/dataset size). I think this suggests that cumulative returns to cognitive capabilities from increased computational resources (training data and compute) are strongly sublinear.
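As a minimal sketch of what "strongly sublinear" means here, assume a stylised compute-loss power law L(C) = (C_c / C)^alpha. The exponent and reference scale below are illustrative assumptions, not fitted values from any published scaling law:

```python
# Sketch of why a power-law loss curve implies strongly sublinear
# cumulative returns to compute. Constants are illustrative assumptions.

alpha = 0.05        # assumed scaling exponent (loss falls slowly with compute)
C_c = 1.0           # assumed reference compute scale

def loss(C):
    return (C_c / C) ** alpha

# Compute multiplier needed to cut the loss by a constant factor k:
# C2/C1 = k ** (1/alpha), independent of where you start.
k = 1.1             # a 10% loss reduction
multiplier = k ** (1 / alpha)
print(f"each 10% loss reduction costs a ~{multiplier:.0f}x compute increase")

# Equal compute *multipliers* buy only equal multiplicative loss reductions,
# so each constant increment of progress gets ever more expensive.
for C in [1e3, 1e6, 1e9]:
    print(f"C = {C:.0e}: loss = {loss(C):.3f}")
```

With these assumptions, every 10% reduction in loss costs roughly a 7x increase in compute, and going from 10^3 to 10^9 units of compute (a millionfold increase) only roughly halves the loss; this is the sense in which an abundance of raw compute, by itself, buys progress at a sharply diminishing rate.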
I think the scaling laws are significant empirical evidence against the foom thesis within the deep learning paradigm, though see:[33][34].
That said, I'm not technically grounded in ML and so I wouldn't be surprised if I was mistaken here. I invite those more informed on the relevant details to comment here. I'd be particularly curious to hear from ML folks who still expect foom in light of the evidence from scaling laws.
janus disagrees here. I quote:
beren:
I am particularly unconfident about this point, and especially with respect to a hardware overhang; it seems to me that scaling maximalism could make a hardware overhang much more likely (see:[36]) as training is often (much? see:[37]) more expensive than inference. But I don't have a solid grasp of how easily/readily trained models can be scaled up to use more compute at inference and how that affects their performance (see:[38]).
I would be grateful if those more technically grounded than myself were to address this point in particular.
At least, not with amounts of compute available in the near term. But I could also be wrong on this; I'm not technically grounded enough here to feel strongly about my intuitions in this regard.
Alternatively, it's not necessarily the case that even if we had unbounded compute, we could specify training objectives that select for general intelligence under iterated self-play. janus suspects this: