I added a new section (and some footnotes) to this post before publishing, but I did not otherwise bother to extensively review the post.
I have solicited feedback from people more knowledgeable about ML than myself, and on reflection I expect the main theses of this post to be directionally correct.
Nonetheless, I flag statements I am noticeably not confident in.
Many footnotes are more like appendices providing crucial context/elaboration on particular points and not merely authorial side comments such as this.
I also added in relevant commentary from reviewers who are more technically grounded in relevant aspects than myself.
How steep/rapid is it? Alternatively, just how big a discontinuity does the takeoff represent?
It's not particularly clear what people mean by "hard"/"fast" takeoff. Following the taxonomy Barnett drew up, I use "hard"/"fast" takeoff to refer to (a more generous version of) Yudkowsky's and Bostrom's formulations of the term, i.e. a local takeoff that unfolds over a timespan too short for humans to react (tentatively, hours to a few weeks).
[Where "takeoff" can be understood as the time for AI systems to transition from a par human regime to a strongly superhuman regime.]
"A viable" may be an understatement; my impression/cached thoughts is that RSI was presented as the most load bearing component on the path to hard takeoff. E.g. from "Hard Takeoff" (emphasis mine):
RSI is the biggest, most interesting, hardest-to-analyze, sharpest break-with-the-past contributing to the notion of a "hard takeoff" aka "AI go FOOM", but it's nowhere near being the only such factor. The advent of human intelligence was a discontinuity with the past even without RSI...
Or from "Recursive Self-Improvement":
When you first build an AI, it's a baby - if it had to improve itself, it would almost immediately flatline. So you push it along using your own cognition, metaknowledge, and knowledge - not getting any benefit of recursion in doing so, just the usual human idiom of knowledge feeding upon itself and insights cascading into insights. Eventually the AI becomes sophisticated enough to start improving itself, not just small improvements, but improvements large enough to cascade into other improvements. (Though right now, due to lack of human insight, what happens when modern researchers push on their AGI design is mainly nothing.) And then you get what I. J. Good called an "intelligence explosion".
I think the LW Wiki is a pretty accurate representation of the community's consensus at the time.
The main difference between "seed AI flavoured recursive self-improvement" and the kind of self-improvement that manifests in the deep learning paradigm (covered later) is probably the resource cost (time, computational and economic resources) to each iteration of the feedback process.
Having to spend ever greater economic and computational resources (growing at a superlinear rate) on training successors to attain constant capabilities progress in the feedback cycle will greatly change the dynamics of the flavour of self-improvement that manifests under deep learning. To borrow Yudkowsky's metaphor, such self-improvement seems much less likely to be "prompt critical".
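To make the "prompt critical" contrast concrete, here is a toy sketch of a self-improvement loop; the functional forms are illustrative assumptions of mine, not claims about real systems:

```python
# Toy model of a self-improvement feedback loop. Purely illustrative: the
# functional forms are assumptions, not claims about real ML systems.
#
# Each step, the system converts a fixed resource budget into capability.
# Its search speed scales with its current capability c, but the cost of the
# next improvement scales as c**p ("improvements get harder to find").
# Net gain per step: budget * c / c**p.

def simulate(steps: int, budget: float, p: float) -> float:
    c = 1.0
    for _ in range(steps):
        c += budget * c / c**p
    return c

# p = 0: gains compound freely -- roughly exponential growth, "prompt critical".
# p = 2: per-improvement costs outpace the gains -- growth ~ sqrt(steps), no explosion.
print(simulate(steps=50, budget=0.1, p=0.0))
print(simulate(steps=50, budget=0.1, p=2.0))
```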
In my terminology: the cumulative returns to cognitive capabilities from the investment of economic/computational resources are sublinear (see:[9]).
Alternatively, the marginal returns diminish at a superlinear rate.
To a first approximation, sublinear cumulative returns imply that marginal returns diminish at a superlinear rate (and vice versa).
Though there are some subtleties here: marginal returns that diminish at an exponential rate imply bounded cumulative returns (a geometric series with ratio <1 converges). Meanwhile, cumulative returns that grow at a logarithmic rate are unbounded (the series does not converge).
I think whether to consider marginal returns or cumulative returns depends on the nature of the process you're considering (what exactly drives the returns).
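A quick numerical illustration of the subtlety above (made-up numbers):

```python
import math

# (a) Marginal returns that shrink exponentially (halving per unit invested):
# the cumulative return is bounded -- it can never exceed 2.
print(sum(0.5**k for k in range(1000)))  # ~2.0

# (b) Cumulative returns that grow logarithmically (marginal return ~ 1/k):
# marginal returns still diminish, but the total is unbounded.
for n in (10**3, 10**6, 10**9):
    print(n, math.log(n))  # keeps growing without bound
```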
The levels Yudkowsky is referring to:
So I shall stratify this causality into levels - the boundaries being semi-arbitrary, but you've got to draw them somewhere:
- "Metacognitive" is the optimization that builds the brain - in the case of a human, natural selection; in the case of an AI, either human programmers or, after some point, the AI itself.
- "Cognitive", in humans, is the labor performed by your neural circuitry, algorithms that consume large amounts of computing power but are mostly opaque to you. You know what you're seeing, but you don't know how the visual cortex works. The Root of All Failure in AI is to underestimate those algorithms because you can't see them... In an AI, the lines between procedural and declarative knowledge are theoretically blurred, but in practice it's often possible to distinguish cognitive algorithms and cognitive content.
- "Metaknowledge": Discoveries about how to discover, "Science" being an archetypal example, "Math" being another. You can think of these as reflective cognitive content (knowledge about how to think).
- "Knowledge": Knowing how gravity works.
- "Object level": Specific actual problems like building a bridge or something.
Fuller context for the bolded:
This means that the AI's metacognitive level - the optimization process responsible for structuring the AI's cognitive algorithms in the first place - has now collapsed to identity with the AI's object level.
For some odd reason, I run into a lot of people who vigorously deny that this phenomenon is at all novel; they say, "Oh, humanity is already self-improving, humanity is already going through a FOOM, humanity is already in a Singularity" etc. etc.
Now to me, it seems clear that - at this point in the game, in advance of the observation - it is pragmatically worth drawing a distinction between inventing agriculture and using that to support more professionalized inventors, versus directly rewriting your own source code in RAM. Before you can even argue about whether the two phenomena are likely to be similar in practice, you need to accept that they are, in fact, two different things to be argued about.
And I do expect them to be very distinct in practice. Inventing science is not rewriting your neural circuitry. There is a tendency to completely overlook the power of brain algorithms, because they are invisible to introspection. It took a long time historically for people to realize that there was such a thing as a cognitive algorithm that could underlie thinking. And then, once you point out that cognitive algorithms exist, there is a tendency to tremendously underestimate them, because you don't know the specific details of how your hippocampus is storing memories well or poorly - you don't know how it could be improved, or what difference a slight degradation could make. You can't draw detailed causal links between the wiring of your neural circuitry, and your performance on real-world problems. All you can see is the knowledge and the metaknowledge, and that's where all your causal links go; that's all that's visibly important.
To see the brain circuitry vary, you've got to look at a chimpanzee, basically. Which is not something that most humans spend a lot of time doing, because chimpanzees can't play our games.
You can also see the tremendous overlooked power of the brain circuitry by observing what happens when people set out to program what looks like "knowledge" into Good-Old-Fashioned AIs, semantic nets and such. Roughly, nothing happens. Well, research papers happen. But no actual intelligence happens. Without those opaque, overlooked, invisible brain algorithms, there is no real knowledge - only a tape recorder playing back human words. If you have a small amount of fake knowledge, it doesn't do anything, and if you have a huge amount of fake knowledge programmed in at huge expense, it still doesn't do anything.
So the cognitive level - in humans, the level of neural circuitry and neural algorithms - is a level of tremendous but invisible power. The difficulty of penetrating this invisibility and creating a real cognitive level is what stops modern-day humans from creating AI. (Not that an AI's cognitive level would be made of neurons or anything equivalent to neurons; it would just do cognitive labor on the same level of organization. Planes don't flap their wings, but they have to produce lift somehow.)
Recursion that can rewrite the cognitive level is worth distinguishing.
In particular, it seems to me like the combinatorial explosion of the search space as one moves further down the concept tree may make recalcitrance higher: at each step one is searching for a much smaller target (concepts that improve on the status quo) in a much larger search space (which should grow exponentially in the number of distinct laws/regularities/structures/relevant phenomena discovered), resulting in sublinear cumulative returns (see:[9]) to cognitive reinvestment. See:[12]
Alternatively, whatever is driving the "ideas are getting harder to find" phenomenon will probably also bound the "growth rate" of returns from cognitive reinvestment via recursive self-improvement within the seed AI paradigm.
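A toy way to see the race between gains in search ability and a growing search space (my own illustrative model, not something from the cited posts):

```python
# Toy model: does each improvement make the *next* one easier or harder to find?
# Assume the search space doubles with every regularity/concept discovered,
# while each improvement multiplies the searcher's evaluation speed by `gain`.
# Both assumptions are illustrative, not empirical claims.

def time_for_k_improvements(k: int, gain: float) -> float:
    speed, total = 1.0, 0.0
    for i in range(k):
        total += 2.0**i / speed   # exponentially growing space / current search speed
        speed *= gain             # the improvement speeds up future search
    return total

for gain in (1.5, 2.0, 3.0):
    print(f"gain {gain}: time for 30 improvements ~ {time_for_k_improvements(30, gain):.3g}")
# gain < 2: each improvement takes longer than the last (diminishing returns);
# gain > 2: the gains more than offset the extra search work, and the loop accelerates.
```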
Rather, I think it is a straightforward fact that the optimisation process constructing the system will need to do more "work" (measured in bits) to improve on the current status quo. One would need to argue separately that the gains to optimisation ability of the successor system more than offset the additional optimisation work that needs to be done (from both a much smaller target and much larger search space). I don't think Yudkowsky ever made a particularly persuasive argument for that claim.
On priors, I expect diminishing marginal returns to optimisation ability even within the seed AI paradigm. Most (all?) resource utilisation shows diminishing marginal returns eventually; I think it should be a strong default. The main question is whether the returns diminish before systems are strongly superhuman; that seems like more of an open question to me. See:[13]
beren says:
I agree. This is the main question that we need answering imho is scaling laws for RSI-like behaviours in existing models which are at or near the human-superhuman boundary. I agree that I don't think EY has strong arguments for. My prior is against as is current scaling law evidence but OTOH it seems that techniques like iterative finetuning/RLHF seems to be highly effective at current scales and may also get better with scale so I have some probability mass on positive or at least not strongly sublinear returns to scale in the current regime
Compared to idiomatic software at least.
tailcalled says:
AI doing prompt engineering. Especially if future AI systems will consist of a bunch of GPT-style prompts feeding data to each other. Then the AI could change the architecture in these pipelines.
Also, AI doing ordinary programming. (Not just discovering more efficient algorithms, there's lots and lots of ordinary programming work that can be done. Some of this can feed back into the AI systems.)
I think "AI metalearning better optimisation algorithms" and "AI automating AI architecture search" come closest to seed AI flavoured recursive self improvement, but they aren't quite it as it seems like AI creating more capable successor systems than a system improving itself directly. Succession can still induce foom (though see my [poorly raised] objections:[11][12]).
Admittedly, AGI would also have positive feedback loops in available economic resources, so the economic constraints to capability amplification may be slacker than I posited. Though I still expect there to be significant economic constraints because I believe that cumulative returns to cognitive capabilities from investment of computational resources are likely strongly sublinear (though see:[18]) and computational resources are one of the main/most straightforward ways to purchase additional capabilities with economic resources.
Prima facie, many (most?) interesting computational problems have worse than linear time (and/or space) complexity (some are much worse). I'm under the impression that neural networks don't implement correct deterministic algorithms to solve problems, but rather approximate them. As far as I'm aware, this approximation is not linear but generally polynomial with a smallish exponent (I've been told cubic, but have not independently verified/confirmed this). However, as long as the exponent is >1, it would still translate to sublinear cumulative returns to cognitive capabilities (where cognitive capabilities are measured in terms of the size of the problems NNs can approximately solve) from increased computational resources.
(This is another point I welcome those more technically grounded than me to enlighten me on.)
Furthermore, it seems that the empirical evidence from deep learning supports the "sublinear cumulative returns to computational resources (as measured via purchased cognitive capabilities)" hypothesis. See:[32]
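A back-of-the-envelope sketch of this argument, taking the (unverified) cubic figure at face value:

```python
# If solving problems of size n costs ~ n**k compute for some k > 1, then the
# problem size purchasable with compute C is C**(1/k): sublinear in C.
# k = 3 is the (unverified) cubic figure mentioned above; the qualitative point
# only needs k > 1.

def solvable_size(compute: float, k: float = 3.0) -> float:
    return compute ** (1.0 / k)

for compute in (1e6, 1e9, 1e12):
    print(f"compute {compute:.0e} -> max problem size ~ {solvable_size(compute):,.0f}")
# 1000x more compute buys only 10x larger problems at k = 3.
```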
beren:
For this what really matters is not the returns on cognitive capabilities for cognitive capabilities but the returns on cognitive capabilities for accumulating resources. I.e. even if we hit strong diminishing returns for the first, it isn't clear that the getting resources will share the same return structure. There seem to be lots of places in the economy where there is some kind of winner-take-all dynamic where being slightly better in some way (faster, slightly better execution, luckier etc) results in vastly outsized returns
clockworklady:
I'd like to express the controversial view that law of requisite variety, at minimum, renders full / reliable factorization of system's architecture/mind BY same system very nontrivial and very architecturally dependent. And yes, current approaches don't seem like they would have a good time sidestepping LRW (very large state/parameter count, enormous dependence on "scale" suggest that these systems just aren't architectures that would allow for efficient, compressed representations that would allow a system to successfully introspect all relevant aspects of itself)
Current systems seem to have enormous parameter counts and require enormous amount of training data ("scale is all you need" boils down uncomfortably to "you need astronomic amounts of scale") and that does not seem like an architecture that would be amenable to compressed, succinct representations of states or clever TMTO-like modeling hackage that would allow the system itself to efficiently and reliably self-model to the point of high-quality "planned" self-improvement.
If anything, current LLM seem to be very opposite of that .
NB! It doesn't mean that LLMs are at all "banned" from self-improvement, but that its quality will by virtue of their architecture remain massively below "factorizing its own mind" quality benchmark
Other phrasings that convey what I'm trying to say:
Recursive self-improvement does not seem all that likely to play a role in:
* The path to superintelligent systems
* The path to attaining decisive strategic advantage
I.e. I don't think recursive self-improvement is on the critical path of humanity's future; it is in this sense that I don't think it's particularly relevant in the deep learning paradigm.
There are caveats to this. They sound like:
* Often we don't want exact solutions, and we just want good enough approximations
* Or we do not care about deterministic correctness, and correctness with high probability is desired
* Or we do not care about worst case complexity and it's the expected complexity on real world probability distributions that matters
* Or ...
My reply to all such objections of the above form is that as far as I'm aware, these relaxations generally do not take the "effective complexity" of a problem from superlinear to linear or sublinear. Some problems have exponential time complexity even after all these relaxations and most others are generally super quadratic.
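One concrete instance of the pattern, as an illustration rather than a proof: plain Monte Carlo estimation accepts a probabilistic, approximate answer, yet its cost is still strongly superlinear in the accuracy demanded:

```python
import random

# Monte Carlo estimation of pi: a canonical "approximate, probably-correct
# answer is fine" relaxation. The error shrinks like 1/sqrt(samples), so each
# additional digit of accuracy costs ~100x more work -- the relaxation does not
# make the effective cost linear in the accuracy demanded.

def estimate_pi(samples: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    inside = sum(rng.random()**2 + rng.random()**2 <= 1.0 for _ in range(samples))
    return 4.0 * inside / samples

for n in (10**2, 10**4, 10**6):
    print(f"{n:>9} samples -> pi ~ {estimate_pi(n):.4f}")
```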
That said, the "complexity of intelligence" hypothesis does not rule out fast takeoff. Effective intelligence level as a function of time can still grow at a superlinear rate if computational resources grow sufficiently fast. Economic feedback loops could facilitate this.
I do have some scepticism of compute investment driven foom, but this post is about recursive self improvement in particular.
A caveat to this claim (suggested by janus) is that even if the first AGI is created by deep learning, its successor (or some other relevant AI down the line) may not be. janus' model of Yudkowsky believes something similar to this. I think this caveat is sensible, and it made me revise down my estimates that LW folks whose credence in foom wasn't revised downwards after updating on the deep learning revolution were necessarily committing "epistemic malpractice".
That said, I'm sceptical of the caveat. I think the biggest update of the last decade of machine learning is that we don't actually know how to program intelligence (even for pretty narrow domains; e.g. as far as I'm aware, no one can write down a program, as opposed to training one, that classifies handwritten digits with high accuracy). It may be possible to construct intelligent systems using more "design like" constructive optimisation processes, but I don't necessarily expect that it would be something very analogous to idiomatic programming (or inasmuch as it is analogous to idiomatic programming, I don't expect it to manifest soon enough to matter).
Of course, some systems may be able to hold trained quadrillion parameter ML models in their "working memory" and factorise them as one might a 10-line Python script, but as with many advanced capabilities, I think this maps to a very high level of "strongly superhuman".
I'd expect AI systems to be radically transformative or existentially dangerous well before they can "program" their successors. Insomuch as "programming intelligence" is feasible, I don't expect it to be a component on the critical path.
Not being on the critical path to humanity's future is the main way I think seed AI flavoured recursive self-improvement is "not relevant"; if it does manifest in the deep learning paradigm, it would be too late to matter.
That said, I don't necessarily posit high confidence in these expectations. I don't really have a technically grounded compelling story for why deep learning systems would be unable to properly factorise systems of comparable complexity (this is not to say that one does not exist, I just don't intuit it). There's not necessarily any rigorous tech tree that dictates existentially dangerous capabilities must come before such advanced factorisation abilities, mostly these are just intuitions derived from language models. Moreover, this entire objection could be thrown entirely out of the window if we successfully automate the design of intelligent systems.
E.g. I was left with that impression from some of Yudkowsky's recent AI safety writing, and empirically, in AI safety conversations I've had online, many people still seem to have high confidence in foom.
Again, if it's indeed the case that deep learning is not particularly amenable to seed AI flavoured recursive self-improvement and they expect AGI to arise within the deep learning paradigm.
Within lifetime human learning seems to be remarkably more sample efficient than our best ML models, so I'm somewhat (a lot?) more sympathetic to the content overhang thesis than the hardware overhang one.
Even so, I don't expect the first AGIs to be anywhere near as sample efficient as the brain. In general, biology is far more resource (e.g. energy) efficient than human engineering (bird flight, animal locomotion in general, the brain's ~10–12 watt energy budget [it seems to be 1+ orders of magnitude more energy efficient than GPUs of comparable processing power and within an order of magnitude of the theoretical limit given its size], etc.), so I wouldn't be surprised if biology is also way more data efficient than our first AIs (and I wouldn't necessarily expect AI to reach biology's data efficiency quickly). See:[29]
For largely similar reasons as why we didn't reach biology's energy efficiency quickly (for some domains we still haven't reached it). Basically, energy has been a much more taut constraint for evolution than for human engineering (ATP vs electricity, humans having access to much denser energy sources (e.g. fossil fuels, nuclear power), and generally a much larger energy budget). As a constructive optimisation process, evolution was subject to much harsher energy constraints than human engineering, so any product of evolution that works at all will necessarily be very energy efficient in comparison.
I think a similar argument could be made about data efficiency; within-lifetime human learning simply does not have the data budget that our large language models do. Human language evolved in a much more data-sparse environment (6+ orders of magnitude less available data doesn't seem to me like an overestimate). Given current data availability (and the rate at which it's growing), it seems unlikely that the data budget would be anywhere near as harsh a constraint for ML models. If there aren't strong selection pressures for data efficiency, I wouldn't expect data efficiency within an order of magnitude of humans.
A major way I could be wrong is if general intelligence turns out to be relatively compact and selection for cross-domain performance comes bundled with data efficiency. I think this is somewhat plausible/don't rule it out by default. Or rather, I think sufficiently capable AI systems should be more data efficient than humans (I expect that sufficiently advanced engineering [e.g. nanotech] would be more energy efficient than biology). The question is if human level (or better) data efficiency is something that we get on the critical path. I don't have particularly strong intuitions on that.
Also note that unlike the case with energy, (as far as I'm aware) we don't have strong theoretical reasons to suspect that the brain is within an order of magnitude of the theoretical limit for data efficient learning for a system of its size (though see:[30]). It seems intuitively plausible that a system (given an ample energy budget) could be much more sample efficient than the brain.
Overall data efficiency by default for sufficiently capable systems seems more plausible than energy efficiency by default, so I think there's a stronger case for a content overhang than a hardware overhang (see:[31]).
Maybe this would be true if we also condition on the brain's energy budget? Should we expect the brain to be near the theoretical limits for data efficiency given that it's near the theoretical limits for energy efficiency? I'm not quite sure. I'd appreciate commentary on this from people much more knowledgeable about the brain than me.
Hardware overhang is often framed in terms of available compute not available data. Learning that the brain is very energy efficient updated my intuitions in complicated ways that I can't communicate clearly (in part because some parts are blank due to missing technical details) but roughly:
* Conditioning on the brain's energy efficiency and grossly better sample efficiency than our current best models (I'm under the impression that it's orders of magnitude more sample efficient), I expect the brain to be very compute efficient given its energy budget. That is, I think the empirical evidence suggests that the brain is just extremely efficient in general.
* I expect (strongly?) sublinear cumulative returns to cognitive reinvestment (see:[11][12]) so I think it'll be extremely nontrivial to attain better compute efficiency than the brain.
* I don't have an intuitive story for why radically better compute efficiency than the brain is attainable the way we have intuitive stories for why better sample efficiency is readily attainable (and even then, I expect the much better sample efficiency to be accompanied by much higher compute utilisation).
* I think that cumulative returns to cognitive capabilities from computational resources are sublinear (superlinearly more computational resources are required to produce the same constant progress). See:[17][32]. Thus, I don't think an abundance of computing resources is by itself likely to be powerful enough to induce an intelligence explosion without considerable energy/compute efficiency; as mentioned earlier, I believe the brain performs very well along those dimensions.
Our extant scaling laws show a power law relationship between data/model size and cross-entropy loss (with training compute scaling as the square of model/dataset size). I think this suggests that cumulative returns to cognitive capabilities from increased computational resources (training data and compute) are strongly sublinear.
I think the scaling laws are significant empirical evidence against the foom thesis within the deep learning paradigm, though see:[33][34].
That said, I'm not technically grounded in ML and so I wouldn't be surprised if I was mistaken here. I invite those more informed on the relevant details to comment here. I'd be particularly curious to hear from ML folks who still expect foom in light of the evidence from scaling laws.
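As a rough illustration of what a power law in loss implies for returns, here is a sketch with a Chinchilla-style functional form; the coefficients are placeholders I made up, not the published fits:

```python
# Sketch of a Chinchilla-style scaling law L(N, D) = E + A/N**alpha + B/D**beta,
# evaluated along a compute-optimal-ish path where parameters N and tokens D
# both scale ~ sqrt(C / 6). The coefficients are illustrative placeholders,
# NOT the published fits.

E, A, B, alpha, beta = 1.7, 400.0, 400.0, 0.34, 0.28

def loss_at_compute(c_flops: float) -> float:
    n = (c_flops / 6.0) ** 0.5   # parameters
    d = (c_flops / 6.0) ** 0.5   # training tokens (so 6*N*D ~ C)
    return E + A / n**alpha + B / d**beta

for c in (1e20, 1e22, 1e24):
    print(f"{c:.0e} FLOPs -> loss ~ {loss_at_compute(c):.2f}")
# Each 100x of compute buys a smaller absolute drop in loss than the last:
# cumulative returns (loss reduction) are strongly sublinear in compute.
```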
janus disagrees here. I quote:
I disagree with this, because
- empirical scaling laws describe a regime below that of RSI
- scaling laws describe single-step predictive accuracy. Capabilities of simulacra across multiple steps, e.g. chain of thought accuracy, will scale more like exponentially with respect to single-step accuracy (see https://yaofu.notion.site/A-Closer-Look-at-Large-Language-Models-Emergent-Abilities-493876b55df5479d80686f68a1abd72f). RSI-ability is unlikely to correspond to single-step predictive accuracy, more likely to correspond to multi-step "accuracy", because RSI is presumably an activity the model does while generating trajectories.
- In general, scaling laws measure prediction loss over the entire training set, and don't directly tell us whether the model is improving at the things we care about. You cannot tell just from reduced loss whether the model has just memorized some more arbitrary facts or if it's improved at some intellectually difficult predictions. The latter may be the result of a major capabilities gain, but only be reflected in small improvements in loss because it is relevant to only a few select predictions in the training data.
beren:
I agree largely with Janus here. I want to stress though that this has goalpost moved a bit from the pure scale-pilled 'generate FOOM by pure scaling' which I think the scaling laws are strong evidence against. I agree that a.) if there are multistep tasks, under a simple binomial model this will improve much better with scale than pure loss and b.) currently it is unclear how the effects of RSI-like things -- i.e. self-prompting, iterative finetuning etc scale and these could potentially have both strong positive returns and/or show strong improvements with scale.
We also have very little information about scaling laws as applied to things like architecture search or active data selection, where better models could potentially improve the scaling law coefficients of successor models
I am particularly unconfident about this point, and especially with respect to a hardware overhang; it seems to me that scaling maximalism could make a hardware overhang much more likely (see:[36]) as training is often (much? see:[37]) more expensive than inference. But I don't have a solid grasp of how easily/readily trained models can be scaled up to use more compute at inference and how that affects their performance (see:[38]).
I would be grateful if those more technically grounded than myself were to address this point in particular.
beren:
I disagree with this point. Almost all the RSI methods seem to require some level of running training experiments and not just inference -- i.e. model has to fine-tune itself or train a successor. This will be primarily bottlenecked by training compute
beren:
is it really true that training is only slightly more expensive than inference? I mean I guess it depends on how much inference you do but the approximate cost of a training run is basically dataset-size / batch_size * N_epochs * 2 (assuming 1 forward and 1 backward pass of approx equal compute).
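For concreteness, here is a rough version of the accounting beren gestures at, using the common approximate FLOP-counting conventions; the model and token counts are just illustrative:

```python
# Rough FLOP accounting for the training vs. inference comparison above, using
# the common approximations of ~6 * params * tokens for training (forward +
# backward) and ~2 * params per token for inference. Numbers are illustrative.

params = 70e9          # model parameters
train_tokens = 1.4e12  # tokens seen during training
infer_tokens = 1e9     # tokens processed at inference for some downstream use

train_flops = 6 * params * train_tokens
infer_flops = 2 * params * infer_tokens

print(f"training  ~ {train_flops:.1e} FLOPs")
print(f"inference ~ {infer_flops:.1e} FLOPs")
print(f"ratio     ~ {train_flops / infer_flops:,.0f}x")
# Whether training dominates depends on how much inference is ultimately run,
# which is the dependence beren points to.
```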
beren:
This is potentially possible (i.e. adaptive compute) and the brain does something like this. But current ML systems don't really do this and so inference is just a forward pass with a fixed cost. The most probably increase in capabilities could come from finetuning which are essentially just very short additional training runs
At least, not with amounts of compute available in the near term. But I could also be wrong on this; I'm not technically grounded enough here to feel strongly about my intuitions in this regard.
Alternatively, it's not necessarily the case that even if we had unbounded compute, we could specify training objectives that select for general intelligence under iterated self-play. janus suspects this:
I agree with this, though not even primarily because of compute limitations (although that's also a factor). Unlike Go, we don't have the "true name" of a win condition for AGI operating in rich domains that is amenable to iterated games.
beren:
yeah I mean alpha-go is not actually an example of foom at all and self-play doesn't really have RSI like characteristics imho. It would be great to get scaling laws for self-play. I suspect diminishing returns but don't actually know it. To me self-play is more an example of standard scaling
Hypothetical examples include: more efficient matrix multiplication, faster floating point arithmetic, better techniques for avoiding memory bottlenecks, finding acceptable latency vs. throughput trade-offs, parallelization, better usage of GPU L1/L2/etc caches, NN "circuit" factoring, and many other algorithmic improvements that I'm not qualified to predict.
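As an illustrative sketch of just one item on that list (cache blocking), under the caveat that this is my toy example rather than anything an AI has actually discovered:

```python
import numpy as np

# One concrete instance from the list above: cache blocking ("tiling") a matrix
# multiplication so each tile of the operands is reused while it sits in fast
# memory. Real GPU kernels do a far more elaborate version of the same idea;
# the tile size here is an arbitrary illustrative choice.

def blocked_matmul(a: np.ndarray, b: np.ndarray, tile: int = 64) -> np.ndarray:
    n, k = a.shape
    k2, m = b.shape
    assert k == k2
    out = np.zeros((n, m), dtype=a.dtype)
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                # Each small tile is reused many times while it is cache-resident.
                out[i:i+tile, j:j+tile] += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
    return out

a, b = np.random.rand(256, 256), np.random.rand(256, 256)
assert np.allclose(blocked_matmul(a, b), a @ b)
```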
Alternatively, the marginal returns diminish at a superlinear rate.
To a first approximation, sublinear cumulative returns, implies that marginal returns are diminishing at a superlinear rate (and vice versa).
Though there are some subtleties here: marginal returns that diminish at an exponential rate imply bounded cumulative returns (geometric series with ratios <1 converges). Meanwhile cumulative returns that grow at a logarithmic rate are not bounded do not converge.
I think whether to consider marginal returns or cumulative returns depends on the nature of the process you're considering (what exactly drives the returns).
Admittedly, AGI would also have positive feedback loops in available economic resources, so the economic constraints to capability amplification may be slacker than I posited. Though I still expect there to be significant economic constraints because I believe that cumulative returns to cognitive capabilities from investment of computational resources are likely strongly sublinear (though see:[18]) and computational resources are one of the main/most straightforward ways to purchase additional capabilities with economic resources.
Prima facie, many (most?) interesting computational problems have worse than linear time (and/or space) complexity (some are much worse). I'm under the impression that neural networks don't implement correct deterministic algorithms to solve problems, but rather approximate them. As far as I'm aware, this approximation is not linear but generally polynomial with a smallish exponent (I've been told cubic, but have not independently verified/confirmed this. However, as long as the exponent is >1, it would still translate to sublinear cumulative returns to cognitive capabilities (where cognitive capabilities are measured in terms of the size of the problem sets NNs can approximately solve) from increased computational resources.
(This is another point I welcome those more technically grounded than me to enlighten me on.)
Furthermore, it seems that the empirical evidence from deep learning supports the "sublinear cumulative returns to computational resources (as measured via purchased cognitive capabilities)" hypothesis. See:[32]
Our extant scaling laws show a power law relationship between data/model size and cross-entropy loss (with training compute scaling as the square of model/dataset size). I think this suggests that cumulative returns to cognitive capabilities from increased computational resources (training data and compute) are strongly sublinear.
I think the scaling laws are significant empirical evidence against the foom thesis within the deep learning paradigm, though see:[33][34].
That said, I'm not technically grounded in ML and so I wouldn't be surprised if I was mistaken here. I invite those more informed on the relevant details to comment here. I'd be particularly curious to hear from ML folks who still expect foom in light of the evidence from scaling laws.
Alternatively, the marginal returns diminish at a superlinear rate.
To a first approximation, sublinear cumulative returns, implies that marginal returns are diminishing at a superlinear rate (and vice versa).
Though there are some subtleties here: marginal returns that diminish at an exponential rate imply bounded cumulative returns (geometric series with ratios <1 converges). Meanwhile cumulative returns that grow at a logarithmic rate are not bounded do not converge.
I think whether to consider marginal returns or cumulative returns depends on the nature of the process you're considering (what exactly drives the returns).
Rather, I think it is a straightforward fact that the optimisation process constructing the system will need to do more "work" (measured in bits) to improve on the current status quo. One would need to argue separately that the gains to optimisation ability of the successor system more than offset the additional optimisation work that needs to be done (from both a much smaller target and much larger search space). I don't think Yudkowsky ever made a particularly persuasive argument for that claim.
On priors, I expect diminishing marginal returns to optimisation ability even within the seed AI paradigm. Most (all?) resource utilisation shows diminishing marginal returns eventually; I think it should be a strong default. The main question is whether the returns diminish before systems are strongly superhuman; that seems like more of an open question to me. See:[13]
beren says:
I agree. This is the main question that we need answering imho is scaling laws for RSI-like behaviours in existing models which are at or near the human-superhuman boundary. I agree that I don't think EY has strong arguments for. My prior is against as is current scaling law evidence but OTOH it seems that techniques like iterative finetuning/RLHF seems to be highly effective at current scales and may also get better with scale so I have some probability mass on positive or at least not strongly sublinear returns to scale in the current regime
In particular, it seems to me like the combinatorial explosion of the search space as one moves further down the concept tree may make recalcitrance higher (one needs to hit a much smaller target [at each step, we're searching for concepts that are improvements to the status quo] in a much larger search space [the search space should grow exponentially in the number of distinct laws/regularities/structures/relevant phenomena discovered]) and result in sublinear cumulative returns (see:[9]) to cognitive reinvestment. See:[12]
Alternatively, whatever is driving the "ideas are getting harder to find" phenomenon will probably also bound the "growth rate" of returns from cognitive reinvestment via recursive self-improvement within the seed AI paradigm.
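A toy way to quantify the "smaller target in a much larger search space" worry (entirely my own toy model, with made-up parameters):

```python
import math

# Toy model: at concept-depth d there are ~branching**d candidate ideas/designs,
# while the number that actually improve on the status quo stays roughly
# constant at `targets`. The optimisation "work" needed to single out an
# improvement, measured in bits, is log2(search space / targets), which grows
# linearly with depth -- so the raw cost of an unguided search grows exponentially.

def bits_of_search(depth: int, branching: int = 10, targets: int = 100) -> float:
    """Bits needed to pick out one of `targets` improvements in a branching**depth space."""
    return depth * math.log2(branching) - math.log2(targets)

for d in [5, 10, 20, 40]:
    print(f"depth {d:>2}: ~{bits_of_search(d):.0f} bits of optimisation pressure needed")
# Each extra layer of the concept tree adds a constant number of bits, i.e. an
# exponentially shrinking hit rate for blind search.
```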
beren:
For this what really matters is not the returns on cognitive capabilities for cognitive capabilities but the returns on cognitive capabilities for accumulating resources. I.e. even if we hit strong diminishing returns for the first, it isn't clear that the getting resources will share the same return structure. There seem to be lots of places in the economy where there is some kind of winner-take-all dynamic where being slightly better in some way (faster, slightly better execution, luckier etc) results in vastly outsized returns
A caveat to this claim (suggested by janus) is that even if the first AGI is created by deep learning, its successor (or some other relevant AI down the line) may not be. janus' model of Yudkowsky believes something similar to this. I think this caveat is sensible, and it made me revise down my estimate that LW folks whose credence in foom wasn't revised downwards after updating on the deep learning revolution were necessarily committing "epistemic malpractice".
That said, I'm sceptical of the caveat. I think the biggest update from the last decade of machine learning is that we don't actually know how to program intelligence (even for pretty narrow domains; e.g. as far as I'm aware, no one can hand-write a conventional program, without machine learning, that classifies handwritten digits with high accuracy). It may be possible to construct intelligent systems using more "design-like" constructive optimisation processes, but I don't necessarily expect that it would be something very analogous to idiomatic programming (or inasmuch as it is analogous to idiomatic programming, I don't expect it to manifest soon enough to matter).
Of course, some systems may be able to hold trained quadrillion parameter ML models in their "working memory" and factorise that as one might a 10-line python script, but as with many advanced capabilities, I think this maps to a very high level of "strongly superhuman".
I'd expect AI systems to be radically transformative or existentially dangerous well before they can "program" their successors. Insomuch as "programming intelligence" is feasible, I don't expect it to be a component on the critical path.
Not being on the critical path to humanity's future is the main way I think seed AI flavoured recursive self-improvement is "not relevant"; if it does manifest in the deep learning paradigm, it would be too late to matter.
That said, I don't necessarily posit high confidence in these expectations. I don't really have a technically grounded, compelling story for why deep learning systems would be unable to properly factorise systems of comparable complexity (this is not to say that one does not exist; I just don't intuit it). There's not necessarily any rigorous tech tree that dictates existentially dangerous capabilities must come before such advanced factorisation abilities; mostly, these are just intuitions derived from language models. Moreover, this entire objection could be thrown out of the window if we successfully automate the design of intelligent systems.
For largely similar reasons as why we didn't reach biology's energy efficiency quickly (for some domains we still haven't reached it). Basically, energy has been a much more taut constraint for evolution than for human engineering (ATP vs electricity, humans having access to much denser energy sources (e.g. fossil fuels, nuclear power), and generally a much larger energy budget). As a constructive optimisation process, evolution was subject to much harsher energy constraints than human engineering, so any product of evolution that works at all will necessarily be very energy efficient in comparison.
I think a similar argument could be made about data efficiency; within-lifetime human learning simply does not have the data budget that our large language models do. Human language evolved in a much more data-sparse environment (6+ orders of magnitude less available data doesn't seem to me like an overestimate). Given current data availability (and the rate at which it's growing), it seems unlikely that data budget would be anywhere near as harsh a constraint for ML models. If there aren't strong selection pressures for data efficiency, I wouldn't expect data efficiency within an order of magnitude of humans.
A major way I could be wrong is if general intelligence turns out to be relatively compact and selection for cross-domain performance comes bundled with data efficiency. I think this is somewhat plausible/don't rule it out by default. Or rather, I think sufficiently capable AI systems should be more data efficient than humans (I expect that sufficiently advanced engineering [e.g. nanotech] would be more energy efficient than biology). The question is if human level (or better) data efficiency is something that we get on the critical path. I don't have particularly strong intuitions on that.
Also note that unlike the case with energy, (as far as I'm aware) we don't have strong theoretical reasons to suspect that the brain is within an order of magnitude of the theoretical limit for data-efficient learning for a system of its size (though see:[30]). It seems intuitively plausible that a system with an ample energy budget could be much more sample efficient than the brain.
Overall, data efficiency by default for sufficiently capable systems seems more plausible than energy efficiency by default, so I think there's a stronger case for a content overhang than a hardware overhang (see:[31]).
Maybe this would be true if we also condition on the brain's energy budget? Should we expect the brain to be near the theoretical limits for data efficiency given that it's near the theoretical limits for energy efficiency? I'm not quite sure, and I'd appreciate commentary on this from people much more knowledgeable on the brain than me.
Hardware overhang is often framed in terms of available compute, not available data. Learning that the brain is very energy efficient updated my intuitions in complicated ways that I can't communicate clearly (in part because some parts are blank due to missing technical details), but roughly:
* Conditioning on the brain's energy efficiency and grossly better sample efficiency than our current best models (I'm under the impression that it's orders of magnitude more sample efficient), I expect the brain to be very compute efficient given its energy budget. That is, I think the empirical evidence suggests that the brain is just extremely efficient in general.
* I expect (strongly?) sublinear cumulative returns to cognitive reinvestment (see:[11][12]), so I think it'll be extremely nontrivial to attain better compute efficiency than the brain.
* I don't have an intuitive story for why radically better compute efficiency than the brain is attainable the way we have intuitive stories for why better sample efficiency is readily attainable (and even then, I expect the much better sample efficiency to be accompanied by much higher compute utilisation).
* I think that cumulative returns to cognitive capabilities from computational resources are sublinear (superlinearly more computational resources are required to produce the same constant progress; see the sketch after this list). See:[17][32]. Thus, I don't think an abundance of computing resources is by itself likely to be powerful enough to induce an intelligence explosion without considerable energy/compute efficiency; as mentioned earlier, I believe the brain performs very well along those dimensions.
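A companion sketch to the last bullet (the inverse reading of the earlier power-law sketch; the numbers are again made up for illustration):

```python
# If capability progress follows a power law in compute, then the compute needed
# for each additional *constant* increment of progress grows superlinearly.
# The exponent k and baseline c below are illustrative assumptions.

def compute_needed(progress: float, k: float = 3.0, c: float = 1e18) -> float:
    """Compute required to reach `progress` units, assuming progress ~ (C/c)**(1/k)."""
    return c * progress**k

prev = 0.0
for p in range(1, 6):
    total = compute_needed(p)
    print(f"progress {p}: total ~{total:.2e} FLOPs, marginal cost of this step ~{total - prev:.2e} FLOPs")
    prev = total
# Each constant unit of progress costs more compute than the last (here the
# marginal cost grows roughly quadratically in the progress level).
```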
Admittedly, AGI would also have positive feedback loops in available economic resources, so the economic constraints to capability amplification may be slacker than I posited. Though I still expect there to be significant economic constraints because I believe that cumulative returns to cognitive capabilities from investment of computational resources are likely strongly sublinear (though see:[18]) and computational resources are one of the main/most straightforward ways to purchase additional capabilities with economic resources.
janus disagrees with the claim that the scaling laws are significant evidence against foom within the deep learning paradigm. I quote:
I disagree with this, because
- empirical scaling laws describe a regime below that of RSI
- scaling laws describe single-step predictive accuracy. Capabilities of simulacra across multiple steps, e.g. chain of thought accuracy, will scale more like exponentially with respect to single-step accuracy (see https://yaofu.notion.site/A-Closer-Look-at-Large-Language-Models-Emergent-Abilities-493876b55df5479d80686f68a1abd72f). RSI-ability is unlikely to correspond to single-step predictive accuracy, more likely to correspond to multi-step "accuracy", because RSI is presumably an activity the model does while generating trajectories.
- In general, scaling laws measure prediction loss over the entire training set, and don't directly tell us whether the model is improving at the things we care about. You cannot tell just from reduced loss whether the model has just memorized some more arbitrary facts or if it's improved at some intellectually difficult predictions. The latter may be the result of a major capabilities gain, but only be reflected in small improvements in loss because it is relevant to only a few select predictions in the training data.
beren:
I agree largely with Janus here. I want to stress though that this has goalpost moved a bit from the pure scale-pilled 'generate FOOM by pure scaling' which I think the scaling laws are strong evidence against. I agree that a.) if there are multistep tasks, under a simple binomial model this will improve much better with scale than pure loss and b.) currently it is unclear how the effects of RSI-like things -- i.e. self-prompting, iterative finetuning etc scale and these could potentially have both strong positive returns and/or show strong improvements with scale.
We also have very little information about scaling laws as applied to things like architecture search or active data selection, where better models could potentially improve the scaling law coefficients of successor models
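For concreteness, here is a minimal rendering of the "simple binomial model" beren mentions (my own sketch of it, not anyone's actual analysis):

```python
# If a task requires n sequential steps and each step independently succeeds
# with probability p, whole-task success is p**n, so modest gains in per-step
# accuracy compound into large gains on long-horizon tasks.

def task_success(p_step: float, n_steps: int) -> float:
    """Probability of completing all n_steps when each succeeds independently with prob p_step."""
    return p_step ** n_steps

for p in [0.90, 0.95, 0.99]:
    print(f"per-step accuracy {p:.2f}: "
          f"10-step success {task_success(p, 10):.3f}, 50-step success {task_success(p, 50):.4f}")
# Moving per-step accuracy from 0.90 to 0.99 takes 50-step success from ~0.005 to ~0.6.
```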
beren:
I disagree with this point. Almost all the RSI methods seem to require some level of running training experiments and not just inference -- i.e. model has to fine-tune itself or train a successor. This will be primarily bottlenecked by training compute
beren:
is it really true that training is only slightly more expensive than inference? I mean I guess it depends on how much inference you do but the approximate cost of a training run is basically dataset-size / batch_size * N_epochs * 2 (assuming 1 forward and 1 backward pass of approx equal compute).
beren:
This is potentially possible (i.e. adaptive compute) and the brain does something like this. But current ML systems don't really do this and so inference is just a forward pass with a fixed cost. The most probably increase in capabilities could come from finetuning which are essentially just very short additional training runs
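To put rough numbers on the training-versus-inference question beren raises, a back-of-envelope using the common approximations of ~6 FLOPs per parameter per training token and ~2 FLOPs per parameter per generated token (the model size and token counts below are illustrative assumptions, not figures from the post or the reviewers):

```python
# Back-of-envelope comparison of training vs inference compute.
N = 70e9                 # parameters (illustrative)
D_train = 1.4e12         # training tokens (illustrative)
tokens_generated = 1e9   # inference tokens for some hypothetical workload (illustrative)

train_flops = 6 * N * D_train           # ~6 FLOPs per parameter per training token
infer_flops = 2 * N * tokens_generated  # ~2 FLOPs per parameter per generated token

print(f"training:  ~{train_flops:.1e} FLOPs")
print(f"inference: ~{infer_flops:.1e} FLOPs")
print(f"ratio:     ~{train_flops / infer_flops:,.0f}x")
# Under these assumptions a single training run costs thousands of times the
# inference bill (the ratio shrinks in proportion to how much inference you
# assume), which is consistent with beren's point that anything requiring
# retraining or fine-tuning successors is dominated by training compute.
```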
beren:
For this what really matters is not the returns on cognitive capabilities for cognitive capabilities but the returns on cognitive capabilities for accumulating resources. I.e. even if we hit strong diminishing returns for the first, it isn't clear that the getting resources will share the same return structure. There seem to be lots of places in the economy where there is some kind of winner-take-all dynamic where being slightly better in some way (faster, slightly better execution, luckier etc) results in vastly outsized returns
Our extant scaling laws show a power law relationship between data/model size and cross-entropy loss (with training compute scaling as the square of model/dataset size). I think this suggests that cumulative returns to cognitive capabilities from increased computational resources (training data and compute) are strongly sublinear.
I think the scaling laws are significant empirical evidence against the foom thesis within the deep learning paradigm, though see:[33][34].
That said, I'm not technically grounded in ML and so I wouldn't be surprised if I was mistaken here. I invite those more informed on the relevant details to comment here. I'd be particularly curious to hear from ML folks who still expect foom in light of the evidence from scaling laws.
janus disagrees here. I quote:
I disagree with this, because
- empirical scaling laws describe a regime below that of RSI- scaling laws describe single-step predictive accuracy. Capabilities of simulacra across multiple steps, e.g. chain of thought accuracy, will scale more like exponentially with respect to single-step accuracy (see https://yaofu.notion.site/A-Closer-Look-at-Large-Language-Models-Emergent-Abilities-493876b55df5479d80686f68a1abd72f). RSI-ability is unlikely to correspond to single-step predictive accuracy, more likely to correspond to multi-step "accuracy", because RSI is presumably an activity the model does while generating trajectories.
- In general, scaling laws measure prediction loss over the entire training set, and don't directly tell us whether the model is improving at the things we care about. You cannot tell just from reduced loss whether the model has just memorized some more arbitrary facts or if it's improved at some intellectually difficult predictions. The latter may be the result of a major capabilities gain, but cause be reflected in small improvements in loss because it is relevant to only a few select predictions in the training data.
beren:
I agree largely with Janus here. I want to stress though that this has goalpost moved a bit from the pure scale-pilled 'generate FOOM by pure scaling' which I think the scaling laws are strong evidence against. I agree that a.) if there are multistep tasks, under a simple binomial model this will improve much better with scale than pure loss and b.) currently it is unclear how the effects of RSI-like things -- i.e. self-prompting, iterative finetuning etc scale and these could potentially have both strong positive returns and/or show strong improvements with scale.
We also have very little information about scaling laws as applied to things like architecture search or active data selection, where better models could potentially improve the scaling law coefficients of successor models
beren says:
I agree. This is the main question that we need answering imho is scaling laws for RSI-like behaviours in existing models which are at or near the human-superhuman boundary. I agree that I don't think EY has strong arguments for. My prior is against as is current scaling law evidence but OTOH it seems that techniques like iterative finetuning/RLHF seems to be highly effective at current scales and may also get better with scale so I have some probability mass on positive or at least not strongly sublinear returns to scale in the current regime
Alternatively, the marginal returns diminish at a superlinear rate.
To a first approximation, sublinear cumulative returns, implies that marginal returns are diminishing at a superlinear rate (and vice versa).
Though there are some subtleties here: marginal returns that diminish at an exponential rate imply bounded cumulative returns (geometric series with ratios <1 converges). Meanwhile cumulative returns that grow at a logarithmic rate are not bounded do not converge.
I think whether to consider marginal returns or cumulative returns depends on the nature of the process you're considering (what exactly drives the returns).
Rather, I think it is a straightforward fact that the optimisation process constructing the system will need to do more "work" (measured in bits) to improve on the current status quo. One would need to argue separately that the gains to optimisation ability of the successor system more than offset the additional optimisation work that needs to be done (from both a much smaller target and much larger search space). I don't think Yudkowsky ever made a particularly persuasive argument for that claim.
On priors, I expect diminishing marginal returns to optimisation ability even within the seed AI paradigm. Most (all?) resource utilisation shows diminishing marginal returns eventually; I think it should be a strong default. The main question is whether the returns diminish before systems are strongly superhuman; that seems like more of an open question to me. See:[13]
beren says:
I agree. This is the main question that we need answering imho is scaling laws for RSI-like behaviours in existing models which are at or near the human-superhuman boundary. I agree that I don't think EY has strong arguments for. My prior is against as is current scaling law evidence but OTOH it seems that techniques like iterative finetuning/RLHF seems to be highly effective at current scales and may also get better with scale so I have some probability mass on positive or at least not strongly sublinear returns to scale in the current regime
janus disagrees here. I quote:
I disagree with this, because
- empirical scaling laws describe a regime below that of RSI- scaling laws describe single-step predictive accuracy. Capabilities of simulacra across multiple steps, e.g. chain of thought accuracy, will scale more like exponentially with respect to single-step accuracy (see https://yaofu.notion.site/A-Closer-Look-at-Large-Language-Models-Emergent-Abilities-493876b55df5479d80686f68a1abd72f). RSI-ability is unlikely to correspond to single-step predictive accuracy, more likely to correspond to multi-step "accuracy", because RSI is presumably an activity the model does while generating trajectories.
- In general, scaling laws measure prediction loss over the entire training set, and don't directly tell us whether the model is improving at the things we care about. You cannot tell just from reduced loss whether the model has just memorized some more arbitrary facts or if it's improved at some intellectually difficult predictions. The latter may be the result of a major capabilities gain, but cause be reflected in small improvements in loss because it is relevant to only a few select predictions in the training data.
beren:
I agree largely with Janus here. I want to stress though that this has goalpost moved a bit from the pure scale-pilled 'generate FOOM by pure scaling' which I think the scaling laws are strong evidence against. I agree that a.) if there are multistep tasks, under a simple binomial model this will improve much better with scale than pure loss and b.) currently it is unclear how the effects of RSI-like things -- i.e. self-prompting, iterative finetuning etc scale and these could potentially have both strong positive returns and/or show strong improvements with scale.
We also have very little information about scaling laws as applied to things like architecture search or active data selection, where better models could potentially improve the scaling law coefficients of successor models
Maybe this would be true if we also condition on the brain's energy budget as well? Should we expect the brain to be near the theoretical limits for data efficiency given that it's near the theoretical limits for energy efficiency? I'm not quite sure. I'd appreciate commentary on this from people much more knowledgeable on the brain than me.as confident on this.
Hardware overhang is often framed in terms of available compute not available data. Learning that the brain is very energy efficient updated my intuitions in complicated ways that I can't communicate clearly (in part because some parts are blank due to missing technical details) but roughly:
* Conditioning on the brain's energy efficiency and grossly better sample efficiency than our current best models (I'm under the impression that its orders of magnitude more sample efficient), I expect the brain to be very compute efficient given its energy budget. That is, I think the empirical evidence suggests that the brain is just extremely efficient in general.
* I expect (strongly?) sublinear cumulative returns to cognitive reinvestment (see:[11][12]) so I think it'll be extremely nontrivial to attain better compute efficiency than the brain.
* I don't have an intuitive story for why radically better compute efficiency than the brain is attainable the way we have intuitive stories for why better sample efficiency is readily attainable (and even then, I expect the much better sample efficiency to be accompanied by much higher compute utilisation).
* I think that cumulative returns to cognitive capabilities from computational resources are sublinear (superlinearly more computational resources are required to produce the same constant progress). See:[17][32]. Thus, I don't think an abundance of computing resources is by itself likely to be powerful enough to induce an intelligence explosion without considerable energy/compute efficiency; as mentioned earlier, I believe the brain performs very well along those dimensions.
In particular, it seems to me like the combinatorial explosion of the search space as one moves further down the concept tree may make recalcitrance higher (one needs to hit a much smaller target [at each step, we're searching for concepts that are improvements to the status quo] in a much larger search space [the search space should grow exponentially in the number of distinct laws/regularities/structures/relevant phenomena discovered]) and result in sublinear cumulative returns (see:[9]) to cognitive reinvestment. See:[12]
Alternatively, whatever is driving the "ideas are getting harder to find" phenomenon will probably also bound the "growth rate" of returns from cognitive reinvestment via recursive self-improvement within the seed AI paradigm.
Rather, I think it is a straightforward fact that the optimisation process constructing the system will need to do more "work" (measured in bits) to improve on the current status quo. One would need to argue separately that the gains to optimisation ability of the successor system more than offset the additional optimisation work that needs to be done (from both a much smaller target and much larger search space). I don't think Yudkowsky ever made a particularly persuasive argument for that claim.
On priors, I expect diminishing marginal returns to optimisation ability even within the seed AI paradigm. Most (all?) resource utilisation shows diminishing marginal returns eventually; I think it should be a strong default. The main question is whether the returns diminish before systems are strongly superhuman; that seems like more of an open question to me. See:[13]
Admittedly, AGI would also have positive feedback loops in available economic resources, so the economic constraints to capability amplification may be slacker than I posited. Though I still expect there to be significant economic constraints because I believe that cumulative returns to cognitive capabilities from investment of computational resources are likely strongly sublinear (though see:[18]) and computational resources are one of the main/most straightforward ways to purchase additional capabilities with economic resources.
Prima facie, many (most?) interesting computational problems have worse than linear time (and/or space) complexity (some are much worse). I'm under the impression that neural networks don't implement correct deterministic algorithms to solve problems, but rather approximate them. As far as I'm aware, this approximation is not linear but generally polynomial with a smallish exponent (I've been told cubic, but have not independently verified/confirmed this. However, as long as the exponent is >1, it would still translate to sublinear cumulative returns to cognitive capabilities (where cognitive capabilities are measured in terms of the size of the problem sets NNs can approximately solve) from increased computational resources.
(This is another point I welcome those more technically grounded than me to enlighten me on.)
Furthermore, it seems that the empirical evidence from deep learning supports the "sublinear cumulative returns to computational resources (as measured via purchased cognitive capabilities)" hypothesis. See:[32]
Our extant scaling laws show a power law relationship between data/model size and cross-entropy loss (with training compute scaling as the square of model/dataset size). I think this suggests that cumulative returns to cognitive capabilities from increased computational resources (training data and compute) are strongly sublinear.
I think the scaling laws are significant empirical evidence against the foom thesis within the deep learning paradigm, though see:[33][34].
That said, I'm not technically grounded in ML and so I wouldn't be surprised if I was mistaken here. I invite those more informed on the relevant details to comment here. I'd be particularly curious to hear from ML folks who still expect foom in light of the evidence from scaling laws.
Alternatively, the marginal returns diminish at a superlinear rate.
To a first approximation, sublinear cumulative returns, implies that marginal returns are diminishing at a superlinear rate (and vice versa).
Though there are some subtleties here: marginal returns that diminish at an exponential rate imply bounded cumulative returns (geometric series with ratios <1 converges). Meanwhile cumulative returns that grow at a logarithmic rate are not bounded do not converge.
I think whether to consider marginal returns or cumulative returns depends on the nature of the process you're considering (what exactly drives the returns).
Rather, I think it is a straightforward fact that the optimisation process constructing the system will need to do more "work" (measured in bits) to improve on the current status quo. One would need to argue separately that the gains to optimisation ability of the successor system more than offset the additional optimisation work that needs to be done (from both a much smaller target and much larger search space). I don't think Yudkowsky ever made a particularly persuasive argument for that claim.
On priors, I expect diminishing marginal returns to optimisation ability even within the seed AI paradigm. Most (all?) resource utilisation shows diminishing marginal returns eventually; I think it should be a strong default. The main question is whether the returns diminish before systems are strongly superhuman; that seems like more of an open question to me. See:[13]
beren says:
I agree. This is the main question that we need answering imho is scaling laws for RSI-like behaviours in existing models which are at or near the human-superhuman boundary. I agree that I don't think EY has strong arguments for. My prior is against as is current scaling law evidence but OTOH it seems that techniques like iterative finetuning/RLHF seems to be highly effective at current scales and may also get better with scale so I have some probability mass on positive or at least not strongly sublinear returns to scale in the current regime
beren:
For this what really matters is not the returns on cognitive capabilities for cognitive capabilities but the returns on cognitive capabilities for accumulating resources. I.e. even if we hit strong diminishing returns for the first, it isn't clear that the getting resources will share the same return structure. There seem to be lots of places in the economy where there is some kind of winner-take-all dynamic where being slightly better in some way (faster, slightly better execution, luckier etc) results in vastly outsized returns
Our extant scaling laws show a power law relationship between data/model size and cross-entropy loss (with training compute scaling as the square of model/dataset size). I think this suggests that cumulative returns to cognitive capabilities from increased computational resources (training data and compute) are strongly sublinear.
I think the scaling laws are significant empirical evidence against the foom thesis within the deep learning paradigm, though see:[33][34].
That said, I'm not technically grounded in ML and so I wouldn't be surprised if I was mistaken here. I invite those more informed on the relevant details to comment here. I'd be particularly curious to hear from ML folks who still expect foom in light of the evidence from scaling laws.
janus disagrees here. I quote:
I disagree with this, because
- empirical scaling laws describe a regime below that of RSI- scaling laws describe single-step predictive accuracy. Capabilities of simulacra across multiple steps, e.g. chain of thought accuracy, will scale more like exponentially with respect to single-step accuracy (see https://yaofu.notion.site/A-Closer-Look-at-Large-Language-Models-Emergent-Abilities-493876b55df5479d80686f68a1abd72f). RSI-ability is unlikely to correspond to single-step predictive accuracy, more likely to correspond to multi-step "accuracy", because RSI is presumably an activity the model does while generating trajectories.
- In general, scaling laws measure prediction loss over the entire training set, and don't directly tell us whether the model is improving at the things we care about. You cannot tell just from reduced loss whether the model has just memorized some more arbitrary facts or if it's improved at some intellectually difficult predictions. The latter may be the result of a major capabilities gain, but cause be reflected in small improvements in loss because it is relevant to only a few select predictions in the training data.
beren:
I agree largely with Janus here. I want to stress though that this has goalpost moved a bit from the pure scale-pilled 'generate FOOM by pure scaling' which I think the scaling laws are strong evidence against. I agree that a.) if there are multistep tasks, under a simple binomial model this will improve much better with scale than pure loss and b.) currently it is unclear how the effects of RSI-like things -- i.e. self-prompting, iterative finetuning etc scale and these could potentially have both strong positive returns and/or show strong improvements with scale.
We also have very little information about scaling laws as applied to things like architecture search or active data selection, where better models could potentially improve the scaling law coefficients of successor models
beren:
For this what really matters is not the returns on cognitive capabilities for cognitive capabilities but the returns on cognitive capabilities for accumulating resources. I.e. even if we hit strong diminishing returns for the first, it isn't clear that the getting resources will share the same return structure. There seem to be lots of places in the economy where there is some kind of winner-take-all dynamic where being slightly better in some way (faster, slightly better execution, luckier etc) results in vastly outsized returns
Our extant scaling laws show a power law relationship between data/model size and cross-entropy loss (with training compute scaling as the square of model/dataset size). I think this suggests that cumulative returns to cognitive capabilities from increased computational resources (training data and compute) are strongly sublinear.
I think the scaling laws are significant empirical evidence against the foom thesis within the deep learning paradigm, though see:[33][34].
That said, I'm not technically grounded in ML and so I wouldn't be surprised if I was mistaken here. I invite those more informed on the relevant details to comment here. I'd be particularly curious to hear from ML folks who still expect foom in light of the evidence from scaling laws.
janus disagrees here. I quote:
I disagree with this, because
- empirical scaling laws describe a regime below that of RSI- scaling laws describe single-step predictive accuracy. Capabilities of simulacra across multiple steps, e.g. chain of thought accuracy, will scale more like exponentially with respect to single-step accuracy (see https://yaofu.notion.site/A-Closer-Look-at-Large-Language-Models-Emergent-Abilities-493876b55df5479d80686f68a1abd72f). RSI-ability is unlikely to correspond to single-step predictive accuracy, more likely to correspond to multi-step "accuracy", because RSI is presumably an activity the model does while generating trajectories.
- In general, scaling laws measure prediction loss over the entire training set, and don't directly tell us whether the model is improving at the things we care about. You cannot tell just from reduced loss whether the model has just memorized some more arbitrary facts or if it's improved at some intellectually difficult predictions. The latter may be the result of a major capabilities gain, but cause be reflected in small improvements in loss because it is relevant to only a few select predictions in the training data.
beren:
I agree largely with Janus here. I want to stress though that this has goalpost moved a bit from the pure scale-pilled 'generate FOOM by pure scaling' which I think the scaling laws are strong evidence against. I agree that a.) if there are multistep tasks, under a simple binomial model this will improve much better with scale than pure loss and b.) currently it is unclear how the effects of RSI-like things -- i.e. self-prompting, iterative finetuning etc scale and these could potentially have both strong positive returns and/or show strong improvements with scale.
We also have very little information about scaling laws as applied to things like architecture search or active data selection, where better models could potentially improve the scaling law coefficients of successor models
beren says:
I agree. This is the main question that we need answering imho is scaling laws for RSI-like behaviours in existing models which are at or near the human-superhuman boundary. I agree that I don't think EY has strong arguments for. My prior is against as is current scaling law evidence but OTOH it seems that techniques like iterative finetuning/RLHF seems to be highly effective at current scales and may also get better with scale so I have some probability mass on positive or at least not strongly sublinear returns to scale in the current regime
Alternatively, the marginal returns diminish at a superlinear rate.
To a first approximation, sublinear cumulative returns, implies that marginal returns are diminishing at a superlinear rate (and vice versa).
Though there are some subtleties here: marginal returns that diminish at an exponential rate imply bounded cumulative returns (geometric series with ratios <1 converges). Meanwhile cumulative returns that grow at a logarithmic rate are not bounded do not converge.
I think whether to consider marginal returns or cumulative returns depends on the nature of the process you're considering (what exactly drives the returns).
Rather, I think it is a straightforward fact that the optimisation process constructing the system will need to do more "work" (measured in bits) to improve on the current status quo. One would need to argue separately that the gains to optimisation ability of the successor system more than offset the additional optimisation work that needs to be done (from both a much smaller target and much larger search space). I don't think Yudkowsky ever made a particularly persuasive argument for that claim.
On priors, I expect diminishing marginal returns to optimisation ability even within the seed AI paradigm. Most (all?) resource utilisation shows diminishing marginal returns eventually; I think it should be a strong default. The main question is whether the returns diminish before systems are strongly superhuman; that seems like more of an open question to me. See:[13]
beren says:
I agree. This is the main question that we need answering imho is scaling laws for RSI-like behaviours in existing models which are at or near the human-superhuman boundary. I agree that I don't think EY has strong arguments for. My prior is against as is current scaling law evidence but OTOH it seems that techniques like iterative finetuning/RLHF seems to be highly effective at current scales and may also get better with scale so I have some probability mass on positive or at least not strongly sublinear returns to scale in the current regime
janus disagrees here. I quote:
I disagree with this, because
- empirical scaling laws describe a regime below that of RSI- scaling laws describe single-step predictive accuracy. Capabilities of simulacra across multiple steps, e.g. chain of thought accuracy, will scale more like exponentially with respect to single-step accuracy (see https://yaofu.notion.site/A-Closer-Look-at-Large-Language-Models-Emergent-Abilities-493876b55df5479d80686f68a1abd72f). RSI-ability is unlikely to correspond to single-step predictive accuracy, more likely to correspond to multi-step "accuracy", because RSI is presumably an activity the model does while generating trajectories.
- In general, scaling laws measure prediction loss over the entire training set, and don't directly tell us whether the model is improving at the things we care about. You cannot tell just from reduced loss whether the model has just memorized some more arbitrary facts or if it's improved at some intellectually difficult predictions. The latter may be the result of a major capabilities gain, but cause be reflected in small improvements in loss because it is relevant to only a few select predictions in the training data.
beren:
I agree largely with Janus here. I want to stress though that this has goalpost moved a bit from the pure scale-pilled 'generate FOOM by pure scaling' which I think the scaling laws are strong evidence against. I agree that a.) if there are multistep tasks, under a simple binomial model this will improve much better with scale than pure loss and b.) currently it is unclear how the effects of RSI-like things -- i.e. self-prompting, iterative finetuning etc scale and these could potentially have both strong positive returns and/or show strong improvements with scale.
We also have very little information about scaling laws as applied to things like architecture search or active data selection, where better models could potentially improve the scaling law coefficients of successor models
Maybe this would be true if we also condition on the brain's energy budget as well? Should we expect the brain to be near the theoretical limits for data efficiency given that it's near the theoretical limits for energy efficiency? I'm not quite sure. I'd appreciate commentary on this from people much more knowledgeable on the brain than me.as confident on this.
Hardware overhang is often framed in terms of available compute not available data. Learning that the brain is very energy efficient updated my intuitions in complicated ways that I can't communicate clearly (in part because some parts are blank due to missing technical details) but roughly:
* Conditioning on the brain's energy efficiency and grossly better sample efficiency than our current best models (I'm under the impression that its orders of magnitude more sample efficient), I expect the brain to be very compute efficient given its energy budget. That is, I think the empirical evidence suggests that the brain is just extremely efficient in general.
* I expect (strongly?) sublinear cumulative returns to cognitive reinvestment (see:[11][12]) so I think it'll be extremely nontrivial to attain better compute efficiency than the brain.
* I don't have an intuitive story for why radically better compute efficiency than the brain is attainable the way we have intuitive stories for why better sample efficiency is readily attainable (and even then, I expect the much better sample efficiency to be accompanied by much higher compute utilisation).
* I think that cumulative returns to cognitive capabilities from computational resources are sublinear (superlinearly more computational resources are required to produce the same constant progress). See:[17][32]. Thus, I don't think an abundance of computing resources is by itself likely to be powerful enough to induce an intelligence explosion without considerable energy/compute efficiency; as mentioned earlier, I believe the brain performs very well along those dimensions.
In particular, it seems to me like the combinatorial explosion of the search space as one moves further down the concept tree may make recalcitrance higher (one needs to hit a much smaller target [at each step, we're searching for concepts that are improvements to the status quo] in a much larger search space [the search space should grow exponentially in the number of distinct laws/regularities/structures/relevant phenomena discovered]) and result in sublinear cumulative returns (see:[9]) to cognitive reinvestment. See:[12]
Alternatively, whatever is driving the "ideas are getting harder to find" phenomenon will probably also bound the "growth rate" of returns from cognitive reinvestment via recursive self-improvement within the seed AI paradigm.
Rather, I think it is a straightforward fact that the optimisation process constructing the system will need to do more "work" (measured in bits) to improve on the current status quo. One would need to argue separately that the gains to optimisation ability of the successor system more than offset the additional optimisation work that needs to be done (from both a much smaller target and much larger search space). I don't think Yudkowsky ever made a particularly persuasive argument for that claim.
On priors, I expect diminishing marginal returns to optimisation ability even within the seed AI paradigm. Most (all?) resource utilisation shows diminishing marginal returns eventually; I think it should be a strong default. The main question is whether the returns diminish before systems are strongly superhuman; that seems like more of an open question to me. See:[13]
Admittedly, AGI would also have positive feedback loops in available economic resources, so the economic constraints to capability amplification may be slacker than I posited. Though I still expect there to be significant economic constraints because I believe that cumulative returns to cognitive capabilities from investment of computational resources are likely strongly sublinear (though see:[18]) and computational resources are one of the main/most straightforward ways to purchase additional capabilities with economic resources.
Prima facie, many (most?) interesting computational problems have worse than linear time (and/or space) complexity (some are much worse). I'm under the impression that neural networks don't implement correct deterministic algorithms to solve problems, but rather approximate them. As far as I'm aware, this approximation is not linear but generally polynomial with a smallish exponent (I've been told cubic, but have not independently verified/confirmed this. However, as long as the exponent is >1, it would still translate to sublinear cumulative returns to cognitive capabilities (where cognitive capabilities are measured in terms of the size of the problem sets NNs can approximately solve) from increased computational resources.
(This is another point I welcome those more technically grounded than me to enlighten me on.)
Furthermore, it seems that the empirical evidence from deep learning supports the "sublinear cumulative returns to computational resources (as measured via purchased cognitive capabilities)" hypothesis. See:[32]
Our extant scaling laws show a power law relationship between data/model size and cross-entropy loss (with training compute scaling as the square of model/dataset size). I think this suggests that cumulative returns to cognitive capabilities from increased computational resources (training data and compute) are strongly sublinear.
I think the scaling laws are significant empirical evidence against the foom thesis within the deep learning paradigm, though see:[33][34].
That said, I'm not technically grounded in ML and so I wouldn't be surprised if I was mistaken here. I invite those more informed on the relevant details to comment here. I'd be particularly curious to hear from ML folks who still expect foom in light of the evidence from scaling laws.
Alternatively, the marginal returns diminish at a superlinear rate.
To a first approximation, sublinear cumulative returns, implies that marginal returns are diminishing at a superlinear rate (and vice versa).
Though there are some subtleties here: marginal returns that diminish at an exponential rate imply bounded cumulative returns (geometric series with ratios <1 converges). Meanwhile cumulative returns that grow at a logarithmic rate are not bounded do not converge.
I think whether to consider marginal returns or cumulative returns depends on the nature of the process you're considering (what exactly drives the returns).
Rather, I think it is a straightforward fact that the optimisation process constructing the system will need to do more "work" (measured in bits) to improve on the current status quo. One would need to argue separately that the gains to optimisation ability of the successor system more than offset the additional optimisation work that needs to be done (from both a much smaller target and much larger search space). I don't think Yudkowsky ever made a particularly persuasive argument for that claim.
On priors, I expect diminishing marginal returns to optimisation ability even within the seed AI paradigm. Most (all?) resource utilisation shows diminishing marginal returns eventually; I think it should be a strong default. The main question is whether the returns diminish before systems are strongly superhuman; that seems like more of an open question to me. See:[13]
beren says:
I agree. This is the main question that we need answering imho is scaling laws for RSI-like behaviours in existing models which are at or near the human-superhuman boundary. I agree that I don't think EY has strong arguments for. My prior is against as is current scaling law evidence but OTOH it seems that techniques like iterative finetuning/RLHF seems to be highly effective at current scales and may also get better with scale so I have some probability mass on positive or at least not strongly sublinear returns to scale in the current regime
beren:
For this what really matters is not the returns on cognitive capabilities for cognitive capabilities but the returns on cognitive capabilities for accumulating resources. I.e. even if we hit strong diminishing returns for the first, it isn't clear that the getting resources will share the same return structure. There seem to be lots of places in the economy where there is some kind of winner-take-all dynamic where being slightly better in some way (faster, slightly better execution, luckier etc) results in vastly outsized returns
Our extant scaling laws show a power law relationship between data/model size and cross-entropy loss (with training compute scaling as the square of model/dataset size). I think this suggests that cumulative returns to cognitive capabilities from increased computational resources (training data and compute) are strongly sublinear.
I think the scaling laws are significant empirical evidence against the foom thesis within the deep learning paradigm, though see:[33][34].
That said, I'm not technically grounded in ML and so I wouldn't be surprised if I was mistaken here. I invite those more informed on the relevant details to comment here. I'd be particularly curious to hear from ML folks who still expect foom in light of the evidence from scaling laws.
janus disagrees here. I quote:
I disagree with this, because
- empirical scaling laws describe a regime below that of RSI- scaling laws describe single-step predictive accuracy. Capabilities of simulacra across multiple steps, e.g. chain of thought accuracy, will scale more like exponentially with respect to single-step accuracy (see https://yaofu.notion.site/A-Closer-Look-at-Large-Language-Models-Emergent-Abilities-493876b55df5479d80686f68a1abd72f). RSI-ability is unlikely to correspond to single-step predictive accuracy, more likely to correspond to multi-step "accuracy", because RSI is presumably an activity the model does while generating trajectories.
- In general, scaling laws measure prediction loss over the entire training set, and don't directly tell us whether the model is improving at the things we care about. You cannot tell just from reduced loss whether the model has just memorized some more arbitrary facts or if it's improved at some intellectually difficult predictions. The latter may be the result of a major capabilities gain, but cause be reflected in small improvements in loss because it is relevant to only a few select predictions in the training data.
beren:
I agree largely with Janus here. I want to stress though that this has goalpost moved a bit from the pure scale-pilled 'generate FOOM by pure scaling' which I think the scaling laws are strong evidence against. I agree that a.) if there are multistep tasks, under a simple binomial model this will improve much better with scale than pure loss and b.) currently it is unclear how the effects of RSI-like things -- i.e. self-prompting, iterative finetuning etc scale and these could potentially have both strong positive returns and/or show strong improvements with scale.
We also have very little information about scaling laws as applied to things like architecture search or active data selection, where better models could potentially improve the scaling law coefficients of successor models
beren says:
I agree. The main question we need answered, imho, is scaling laws for RSI-like behaviours in existing models which are at or near the human-superhuman boundary. I also don't think EY has strong arguments for this. My prior is against, as is the current scaling law evidence, but OTOH techniques like iterative finetuning/RLHF seem to be highly effective at current scales and may also get better with scale, so I have some probability mass on positive, or at least not strongly sublinear, returns to scale in the current regime.
In particular, it seems to me like the combinatorial explosion of the search space as one moves further down the concept tree may make recalcitrance higher (one needs to hit a much smaller target [at each step, we're searching for concepts that are improvements to the status quo] in a much larger search space [the search space should grow exponentially in the number of distinct laws/regularities/structures/relevant phenomena discovered]) and result in sublinear cumulative returns (see:[9]) to cognitive reinvestment. See:[12]
Alternatively, whatever is driving the "ideas are getting harder to find" phenomenon will probably also bound the "growth rate" of returns from cognitive reinvestment via recursive self-improvement within the seed AI paradigm.
Rather, I think it is a straightforward fact that the optimisation process constructing the system will need to do more "work" (measured in bits) to improve on the current status quo. One would need to argue separately that the gains to optimisation ability of the successor system more than offset the additional optimisation work that needs to be done (from both a much smaller target and much larger search space). I don't think Yudkowsky ever made a particularly persuasive argument for that claim.
On priors, I expect diminishing marginal returns to optimisation ability even within the seed AI paradigm. Most (all?) resource utilisation shows diminishing marginal returns eventually; I think it should be a strong default. The main question is whether the returns diminish before systems are strongly superhuman; that seems like more of an open question to me. See:[13]
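To gesture at how quickly the required optimisation work could grow, here is a toy calculation of my own; the branching factor and target size are made-up illustrative numbers, not estimates of anything.

```python
import math

# Toy model (assumption-laden): after discovering k distinct regularities, the
# concept search space has size ~ branching**k, while only `target` of those
# concepts are genuine improvements over the status quo.  The optimisation work
# needed to locate an improvement is then roughly log2(search_space / target)
# bits, which grows linearly in k even before accounting for a shrinking target.

def bits_of_work(k: int, branching: float = 10.0, target: float = 1e6) -> float:
    search_space = branching ** k
    return math.log2(search_space / target)

for k in (10, 20, 40, 80):
    print(f"k={k}: ~{bits_of_work(k):.0f} bits of optimisation work")

# k=10: ~13 bits, k=20: ~47 bits, k=40: ~113 bits, k=80: ~246 bits
```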
Admittedly, AGI would also have positive feedback loops in available economic resources, so the economic constraints to capability amplification may be slacker than I posited. Though I still expect there to be significant economic constraints because I believe that cumulative returns to cognitive capabilities from investment of computational resources are likely strongly sublinear (though see:[18]) and computational resources are one of the main/most straightforward ways to purchase additional capabilities with economic resources.
Prima facie, many (most?) interesting computational problems have worse than linear time (and/or space) complexity (some are much worse). I'm under the impression that neural networks don't implement correct deterministic algorithms to solve problems, but rather approximate them. As far as I'm aware, this approximation is not linear but generally polynomial with a smallish exponent (I've been told cubic, but have not independently verified/confirmed this). However, as long as the exponent is >1, it would still translate to sublinear cumulative returns to cognitive capabilities (where cognitive capabilities are measured in terms of the size of the problem sets NNs can approximately solve) from increased computational resources.
(This is another point I welcome those more technically grounded than me to enlighten me on.)
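A minimal sketch of the claim above, assuming (purely as a made-up example, not a measured figure) that the compute needed to approximately solve a problem of size n scales as n^3: the solvable problem size then grows only as the cube root of compute, i.e. cumulative returns are sublinear.

```python
# Illustrative only.  Assume compute C ~ n**K is needed to (approximately)
# solve problems of size n, with K > 1 (K = 3 is used here purely as an
# example).  Then the largest solvable size is n ~ C**(1/K): sublinear in C.

K = 3  # hypothetical exponent; the qualitative point only needs K > 1

def solvable_size(compute: float, k: int = K) -> float:
    return compute ** (1.0 / k)

for oom in range(0, 13, 3):        # 10^0 ... 10^12 times more compute
    c = 10.0 ** oom
    print(f"{oom:2d} OOMs more compute -> solvable problem size x{solvable_size(c):.0f}")

# 0 OOMs -> x1, 3 OOMs -> x10, 6 OOMs -> x100, 9 OOMs -> x1000, 12 OOMs -> x10000
```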
Furthermore, it seems that the empirical evidence from deep learning supports the "sublinear cumulative returns to computational resources (as measured via purchased cognitive capabilities)" hypothesis. See:[32]
Our extant scaling laws show a power law relationship between data/model size and cross-entropy loss (with training compute scaling as the square of model/dataset size). I think this suggests that cumulative returns to cognitive capabilities from increased computational resources (training data and compute) are strongly sublinear.
I think the scaling laws are significant empirical evidence against the foom thesis within the deep learning paradigm, though see:[33][34].
That said, I'm not technically grounded in ML and so I wouldn't be surprised if I was mistaken here. I invite those more informed on the relevant details to comment here. I'd be particularly curious to hear from ML folks who still expect foom in light of the evidence from scaling laws.
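For concreteness, here is a sketch of what that power-law shape implies. The functional form and coefficients are taken from Hoffmann et al. (2022) ("Chinchilla"); treat the specific numbers as illustrative rather than authoritative, since the fitted values vary across studies.

```python
# Sketch of how power-law scaling implies steeply diminishing loss returns.
# L(N, D) = E + A/N**alpha + B/D**beta, with the coefficients reported by
# Hoffmann et al. (2022); everything here is illustrative.

E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**ALPHA + B / n_tokens**BETA

n, d = 1e9, 2e10            # hypothetical starting point: 1B params, 20B tokens
for _ in range(5):
    c = 6 * n * d           # standard ~6*N*D FLOPs approximation for training compute
    print(f"N={n:.0e} D={d:.0e} compute~{c:.1e} FLOPs loss~{loss(n, d):.3f}")
    n *= 10                 # scaling N and D together by 10x ...
    d *= 10                 # ... multiplies training compute by ~100x

# Each successive ~100x increase in compute buys a smaller absolute reduction in loss.
```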
Alternatively, the marginal returns diminish at a superlinear rate.
To a first approximation, sublinear cumulative returns imply that marginal returns are diminishing at a superlinear rate (and vice versa).
Though there are some subtleties here: marginal returns that diminish at an exponential rate imply bounded cumulative returns (a geometric series with ratio <1 converges). Meanwhile, cumulative returns that grow at a logarithmic rate are unbounded (they do not converge).
I think whether to consider marginal returns or cumulative returns depends on the nature of the process you're considering (what exactly drives the returns).
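To spell out the two cases mentioned above (these are standard facts, stated here only for concreteness):

```latex
% Case 1: marginal returns diminish geometrically (ratio 0 < q < 1),
% so the cumulative return is bounded:
R_{\mathrm{cum}} = \sum_{k=0}^{\infty} r_0\, q^{k} = \frac{r_0}{1-q} < \infty .

% Case 2: cumulative returns grow logarithmically, so marginal returns
% R'(n) = 1/n vanish, yet the cumulative return is unbounded:
R_{\mathrm{cum}}(n) = \log n \;\longrightarrow\; \infty \quad \text{as } n \to \infty .
```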
beren:
For this, what really matters is not the returns on cognitive capabilities for cognitive capabilities, but the returns on cognitive capabilities for accumulating resources. I.e. even if we hit strong diminishing returns for the first, it isn't clear that getting resources will share the same return structure. There seem to be lots of places in the economy where there is some kind of winner-take-all dynamic, where being slightly better in some way (faster, slightly better execution, luckier, etc.) results in vastly outsized returns.
I think it's premature to conclude that AGI progress will be large pre-trained transformers indefinitely into the future. They are surprisingly(?) effective, but for comparison they are not as effective as AlphaZero and AlphaStar are in their narrow domains, where value and policy networks paired with Monte-Carlo tree search use orders of magnitude fewer parameters. We don't know what MCTS on arbitrary domains will look like with networks 2-4 OOM larger, which are within reach now. We also haven't formulated methods of self-play for improvement with LLMs, and I think that's a potentially large overhang.
There's also a human limit to the types of RSI we can imagine and once pre-trained transformers exceed human intelligence in the domain of machine learning those limits won't apply. I think there's probably significant overhang in prompt engineering, especially when new capabilities emerge from scaling, that could be exploited by removing the serial bottleneck of humans trying out prompts by hand.
Finally I don't think GOFAI is dead; it's still in its long winter waiting to bloom when enough intelligence is put into it. We don't know the intelligence/capability threshold necessary to make substantial progress there. Generally, the bottleneck has been identifying useful mappings from the real world to mathematics and algorithms. Humans are pretty good at that, but we stalled at formalizing effective general intelligence itself. Our abstraction/modeling abilities, working memory, and time are too limited and we have no idea where those limits come from, whether LLMs are subject to the same or similar limits, or how the limits are reduced/removed with model scaling.
If deep learning yields AGI, the question is how far can its intelligence jump beyond human level before it runs out of compute available in the world, using the improvements that can be made very quickly at the current level of intelligence. In short sprints, a hoard of handmade constants can look as good as asymptotic improvement. So the latter's hypothetical impossibility doesn't put convincing bounds on how far this could be pushed before running out of steam. And if by that point procedures for bootstrapping nanotech become obvious, this keeps going, transitioning into disassembling the world for more compute without pause. All without refuting the bitter lesson.
I think you forgot one critical thing. Why does the normal argument for RSI's inevitability fail? The answer is: it doesn't.
Even though there is some research in the direction of a neural network changing each of its weights directly, this isn't important to the main argument because it is about improving source code. The weights are more like compiled code.
In the context of deep learning, the source code consists of:
So the question is if a deep learning model could improve any of this code. The question of if it can improve its "compiled code" (the weights) is also probably yes, but isn't what the argument is based on.
It seems pretty clear to me that AIs could get really good at understanding and predicting the results of editing model weights, in the same way they can get good at predicting how proteins will fold. From there, directly creating circuits that add XYZ reasoning functionality seems at least possible.
I don't actually share this intuition.
I don't think you can get the information of computing the gradient updates to particular weights without actually running that computation (or something equivalent to it).
And presumably one would need empirical feedback (i.e. the value of the objective function we're optimising the network for on particular inputs) to compute the desired gradient updates.
The idea of the system just predicting the desired gradient updates without any ground truth supervisory signal seems fanciful.
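As a minimal illustration of the point (using PyTorch purely as an example): the gradient for particular weights is defined relative to a loss evaluated on concrete inputs and ground-truth targets, and obtaining it involves actually running the forward and backward computation on that data.

```python
import torch

# A tiny model: the "update to particular weights" below only exists relative
# to a loss computed on specific (x, y) data via a forward and backward pass.
model = torch.nn.Linear(4, 1)
x = torch.randn(8, 4)          # inputs (the empirical feedback)
y = torch.randn(8, 1)          # ground-truth supervisory signal

loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()                # actually runs the computation

print(model.weight.grad)       # the gradient is a product of that computation
                               # on that data; it is not available beforehand
```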
I agree that it seems possible. I have doubts, though, that predicting the results of editing weights is a more compute-efficient way of causing a model to exhibit the desired behavior than giving it the obvious tools and using fine-tuning / RL to make it able to use those tools, or alternatively just doing the RL/finetuning directly. That's basically the heart of how I interpret the bitter lesson - it's not that you can't find more efficient ways to do what DL can do, it's that when you have a task that humans can do and computers can't, the approach of "introspect and think really hard about how to approach the task the right way" is outperformed by the approach of "lol more layers go brrrrr".
This is a solid argument inasmuch as we define RSI to be about self-modifying its own weights/other-inscrutable-reasoning-atoms. That does seem to be quite hard given our current understanding.
But there are tons of opportunities for an agent to improve its own reasoning capacity otherwise. At a very basic level, the agent can do at least two other things:
Most problems in computer science have superlinear time complexity
On one hand, sure, improving this is (likely) impossible in the limit because of fundamental complexity properties. On the other hand, the agent can still become vastly smarter than humans. A particular example: the human mind, without any assistance, is very bad at solving 3SAT. But we've invented computers, and then constraint solvers, and we are now able to solve 3SAT much, much faster, even though 3SAT is (likely) exponentially hard. So the RSI argument here is: the smarter (or faster) the model is, the more special-purpose tools it can create to efficiently solve specific problems and thus upgrade its reasoning ability. Not to infinity, but likely far beyond humans.
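To make the tool-use point concrete (my own toy example, not the commenter's): here is the kind of brute-force search an unaided reasoner is effectively stuck with on 3-SAT. Dedicated tools (modern CDCL solvers such as MiniSat) routinely handle instances with orders of magnitude more variables than this could ever touch, even though the worst case remains exponential.

```python
from itertools import product

# Brute-force 3-SAT: clauses are tuples of non-zero ints, where +i / -i mean
# variable i must be true / false.  Runtime is O(2**n_vars * n_clauses).
def brute_force_sat(clauses, n_vars):
    for bits in product((False, True), repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits            # a satisfying assignment
    return None                    # unsatisfiable

# (x1 or x2 or not x3) and (not x1 or x3 or x2) and (not x2 or not x3 or x1)
clauses = [(1, 2, -3), (-1, 3, 2), (-2, -3, 1)]
print(brute_force_sat(clauses, n_vars=3))
```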
To be clear, the complexity theory argument is against fast takeoff, not an argument that intelligence caps at some level relative to humans.
Analogy: log(n) approaches infinity, but it does so much more slowly than 2^n.
I.e. the sublinear asymptotics would prevent AI from progressing very quickly to a vastly superhuman level (unless the AI is able to grow its available resources sufficiently quickly to dominate the poor asymptotics).
Alternatively, each order of magnitude increase in compute buys (significantly) less intelligence; thus progress from human level to a...
So I'm going to strongly disagree here.
First of all, as it turns out in practice, scale was everything. This means that any AI idea you want to name, unless it was based on a transformer and worked on by approximately 3 labs, was never actually attempted.
We can just ignore all the other thousands of AI methods that humans tried because they were not attempted with a relevant level of scale.
Therefore, RSI has never been tried.
Second, you can easily design a variation on RSI that works fine with current paradigms.
It's not precisely RSI but it's functionally the same thing. Here are the steps:
[1] A benchmark of many tasks. Tasks must be autogradeable, human participants must be able to 'play' the tasks so we have a control-group score, and tasks must push the edge of human cognitive ability (so the average human scores nowhere close to the max score, and the top 1% of humans do not max the bench either). There must be many tasks, with a rich permutation space (so it isn't possible for a model to memorize all permutations).
[2] A heuristically weighted score on this benchmark, intended to measure how "AGI-like" a model is. So it might be the RMSE across the benchmark, but with a lot of the score weighting on zero-shot, cross-domain/multimodal tasks. That is, the kind of model that can use information from many different previous tasks on a complex exercise it has never seen before is closer to an AGI, or closer to replicating "Leonardo da Vinci", who had exceptional human performance presumably because of all this cross-domain knowledge.
[3] In the computer science task set, there are tasks to design an AGI for a bench like this. The model proposes a design, and if that design has already been tested, immediately receives detailed feedback on how it performed.
The "design an AGI" subtask can be much simpler than "write all the boilerplate in Python", but these models will be able to do that if needed.
As task scores approach human level across a broad set of tasks, you have an AGI. You would expect it to almost immediately improve to a low superintelligence. As AGIs get used in the real world and fail to perform well at something, you add more tasks to the bench, and/or automate creating simulated scenarios that use robotics data.
Why aren't we already doing this if it's so simple?
Because each AGI candidate training run has to be at least twice as large as llama-65b, it means $2m+ in training costs per run. And you need to explore the possibility space pretty broadly, so you would figure several thousand runs to really get to a decent AGI design, which will still not be optimal.
This is one of the reasons foom cannot happen. At least not without a lot more compute than we have now. Each attempt is too expensive.
Can we refine the above algorithm into something more compute efficient? Yes, somewhat (by going to a modular architecture, where each "AGI candidate" is composed of hundreds of smaller networks, and we reuse most of them between candidates), but it's still going to require a lot more compute than llama-65b took to train.
I'm a believer in RSI-soon. I have an inside view that supports this. I expect convincing evidence of RSI to become apparent in the world before 2026. If you think I'm wrong, but this manifold question doesn't get at the root of things, I'll happily also vote on other markets related to this.
Direct self-improvement (i.e. rewriting itself at the cognitive level) does seem much, much harder with deep learning systems than with the sort of systems Eliezer originally focused on.
In DL, there is no distinction between "code" and "data"; it's all messily packed together in the weights. Classic RSI relies on the ability to improve and reason about the code (relatively simple) without needing to consider the data (irreducibly complicated).
Any verification that a change to the weights/architecture will preserve a particular non-trivial property (e.g. avoiding value drift) is likely to be commensurate with the complexity of the weights themselves. So... very complex.
The safest "self-improvement" changes probably look more like performance/parallelization improvements than "cognitive" changes. There are likely to be many opportunities for immediate performance improvements[1], but that could quickly asymptote.
I think that recursive self-empowerment might now be a more accurate term than RSI for a possible source of foom. That is, the creation of accessory tools for capability increase. More like a metaphorical spider at the center of an increasingly large web. Or (more colorfully) a shoggoth spawning a multitude of extra tentacles.
The change is still recursive in the sense that marginal self-empowerment increases the ability to self-empower.
So I'd say that a "foom" is still possible in DL, but is both less likely and almost certainly slower. However, even if a foom is days or weeks rather than minutes, many of the same considerations apply. Especially if the AI has already broadly distributed itself via the internet.
Perhaps instead of just foom, we get "AI goes brrrr... boom... foom".
Hypothetical examples include: more efficient matrix multiplication, faster floating point arithmetic, better techniques for avoiding memory bottlenecks, finding acceptable latency vs. throughput trade-offs, parallelization, better usage of GPU L1/L2/etc caches, NN "circuit" factoring, and many other algorithmic improvements that I'm not qualified to predict.
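To make one item on that list concrete, here is a hedged toy example of the cache-friendly flavour of improvement (tiling a matrix multiply); real gains come from tuned low-level kernels rather than Python, so treat this purely as an illustration of the idea:

```python
# Blocked (tiled) matrix multiply: operating on sub-blocks improves data reuse
# in fast memory, which is the idea behind the "better cache usage" item above.
import numpy as np

def blocked_matmul(A, B, block=64):
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                C[i:i+block, j:j+block] += A[i:i+block, p:p+block] @ B[p:p+block, j:j+block]
    return C

A, B = np.random.rand(256, 256), np.random.rand(256, 256)
assert np.allclose(blocked_matmul(A, B), A @ B)
```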
What if the machine has a benchmark/training suite for performance, and on the benchmark is a task for designing a better machine architecture?
The machine proposes a better architecture. The new architecture may be a brand new set of files defining the networks, topology, and training procedure, or it may reuse existing networks as components.
For example, you might imagine an architecture that uses gpt-3.5 and -4 as subsystems, but where the "executive control" comes from a new network defined by the architecture.
Given a very large compute budget (many billions), the company hosting...
The MIRI 2000s paradigm for an AI capable of self-improvement was that it would be modular code with a hierarchical organization, potentially engaging in self-improvement at every level.
The actual path we've been on has been: deep learning, scaling, finetuning with RLHF, and now (just starting) reflective agents built on a GPT base.
A reflective GPT-based agent is certainly capable of studying itself and coming up with ideas for improvement. So we're probably at the beginning of attempts at self-improvement, right now.
The inability of neural nets to quickly retrain new, larger versions will slow their progress, and thus many competing AIs are more likely to emerge. It is less likely that they can merge later by "merging utility functions", as NNs have no explicit utility functions. Thus a multipolar world is more likely.
Disclaimer
I wrote this three months ago and abandoned it[1]. I do not currently plan to return to it anytime soon, but I have nonetheless found myself wanting a document I could point to explaining why I'm not particularly enthused by, or overly concerned with, "recursive self improvement". As such, I'm publishing it anyway; I apologise for not being a more competent writer.
Epistemic Status
Unconfident on some object level details; I don't have a technical grasp of machine learning[2].
Author's Note
Despite appearances, this is not actually a rhetorical question; phrasing this as a question post was a deliberate decision, so please do provide answers to the questions I pose.
The footnotes are unusually extensive in this post; I'd recommend the curious reader not skip them[3].
Related
Abstract
The main questions this post poses:
Introduction
Back in the day, Yudkowsky formulated recursive self improvement (RSI) as a viable[6] path to foom, under the paradigm of "seed AI". From the wiki[7] (emphasis mine):
Reading "Hard Takeoff", it seems that Yudkowsky's belief that a soft takeoff is unlikely is heavily dependent on his belief in recursive self-improvement being a component on the critical path to transformative AI:
I get the impression that Yudkowsky was imagining programming a seed AI that could then look upon its own source code with greater cognitive prowess than its programmers, identify obvious flaws, inefficiencies or suboptimal features in its source code and "quickly"[8][9] rewrite itself to greater capabilities (what I'm calling "seed AI flavoured recursive self-improvement"). From "Recursive Self-Improvement" (bolded mine):
The successor AI so produced would be even smarter and better able to make algorithmic/architectural improvements to its source code; this process recurses, leading to an intelligence explosion.
I don't necessarily think this was a well-founded position even within the seed AI paradigm[11][12][13], but whether that was the case is largely beside the point of this question.
I'm mostly curious if RSI is still a plausible hypothesis for deep learning systems.
Challenges
On the Feasibility of Recursive Self Improvement
Models with trillions to quadrillions of parameters, trained at the cost of tens of millions to billions of dollars, do not seem particularly amenable to the kind of RSI Yudkowsky envisioned back in the day. They don't meet any of the criteria for "Seed AI":
Our most powerful models aren't designed but selected for via "search like" processes. They aren't by default well factored or particularly modular[14]. Even with advanced interpretability tools/techniques, it seems like it would be hard to make the kind of modifications/improvements that are possible in well written software programs.
An AI inspecting its mind, identifying flaws/inefficiencies and making nontrivial algorithmic/architectural improvements seems to not be particularly feasible under the deep learning paradigm.
Positive feedback loops of AI development are probable, but they look more like:
In other words, we get recursive self improvement at home. 😉
I'm under the impression that while training more capable successor systems is feasible, doing so would incur considerable economic costs (and thus serve as a taut constraint on the "speed" of takeoff via such feedback loops)[17][18]. It does not necessarily seem to be the case (or even particularly likely) that the above kind of positive feedback loops will lead to discontinuous AI progress.
I think it may be possible that deep learning AGI would eventually be able to look upon its own mind and factorise it[19], but it seems like that level of cognitive capabilities would come well after AI has become existentially dangerous (or otherwise transformative)[20]. I currently do not expect seed AI style RSI to be a component on the critical path to existential catastrophe/"the singularity".
On the Viability of Recursive Self Improvement
My current belief is that even if seed AI flavoured recursive self improvement was viable, the gains that can be eked out that way are not as radical as imagined.
To summarise my position:
My reasons for this belief are largely:
It does not seem to me like recursive self-improvement is relevant in the deep learning paradigm.
Implications
If the aforementioned objections are correct, then insomuch as one's intuitions around foom were rooted in some expectation of recursive self-improvement and insomuch as one believes that the first AGIs will be created within the deep learning paradigm[23] then the inapplicability of RSI to deep learning should update people significantly downwards on the likelihood of hard takeoff/foom[24].
However, that does not seem to have been the case. I get the sense that people are no longer necessarily expecting seed AI flavoured recursive self-improvement, but basically still posit very high confidence in foom[25]. This is prima facie pretty surprising/suspicious as RSI/seed AI was posited as the main reason to expect a hard takeoff. From the wiki page on recursive self-improvement (emphasis mine):
From "Hard Takeoff" (emphasis mine):
Yudkowsky, talking about a situation in which your goals changed but your beliefs about the steps to take did not, said (underlines mine):
I feel like the update from deep learning hasn't fully propagated through the belief pools of many in the LW sphere. I think Yudkowsky (and those influenced by his views) should be significantly less confident in hard takeoff and give more consideration to softer takeoff scenarios; otherwise it feels like too much epistemic inertia[26][27].
I think there are other avenues for hard takeoff that don't hinge so strongly on RSI (e.g. hardware/content overhang[28][29][30][31]), but they also seem to be somewhat weakened by the deep learning paradigm[32][33][34] (perhaps especially so if scaling maximalism is true)[35][36][37][38]. That said, my broader scepticism of foom deserves its own top level post.
I am also not persuaded by the justification for foom credulity based on AlphaGo. I don't think AlphaGo is necessarily as strong an indicator for foom as suggested: AlphaGo was able to blow past human performance in the narrow field of Go via 3 days of self-play; it does not seem that general competence in rich domains is similarly amenable to self-play[39][40].
Acknowledgements
I'm grateful to @beren, @janus, @tailcalled, @jessicata, @Alex Vermillion, @clockworklady and others for their valuable feedback on drafts of this post.
I added a new section (and some footnotes) to this post before publishing, but I did not otherwise bother to extensively review the post.
I have solicited feedback from people more knowledgeable on ML than myself and on reflection expect the main theses of this post to be directionally correct.
Nonetheless, I flag statements I am noticeably not confident in.
Many footnotes are more like appendices providing crucial context/elaboration on particular points and not merely authorial side comments such as this.
I also added in relevant commentary from reviewers who are more technically grounded in relevant aspects than myself.
How steep/rapid is it? Alternatively, just how big a discontinuity does the takeoff represent?
It's not particularly clear what people mean by "hard"/"fast" takeoff. From the taxonomy Barnett drew, I use "hard"/"fast" takeoff to refer to (a more generous version) of Yudkowsky/Bostrom's formulations of the term. I.e. local takeoff that unravels over a timespan too short for humans to react (tentatively hours to a few weeks).
[Where "takeoff" can be understood as the time for AI systems to transition from a par human regime to a strongly superhuman regime.]
"A viable" may be an understatement; my impression/cached thoughts is that RSI was presented as the most load bearing component on the path to hard takeoff. E.g. from "Hard Takeoff" (emphasis mine):
Or from "Recursive Self-Improvement":
I think the LW Wiki is a pretty accurate representation of the community's consensus at the time.
The main difference between "seed AI flavoured recursive self-improvement" and the kind of self-improvement that manifests in the deep learning paradigm (covered later) is probably the resource cost (time, computational and economic resources) to each iteration of the feedback process.
Having to spend ever greater economic and computational resources (growing at a superlinear rate) on training successors to attain constant capabilities progress in the feedback cycle will greatly change the dynamics of the flavour of self-improvement that manifests under deep learning. To borrow Yudkowsky's metaphor, such self-improvement seems much less likely to be "prompt critical".
In my terminology: the cumulative returns to cognitive capabilities from the investment of economic/computational resources are sublinear (see:[9]).
See:[17][32]
Alternatively, the marginal returns diminish at a superlinear rate.
To a first approximation, sublinear cumulative returns, implies that marginal returns are diminishing at a superlinear rate (and vice versa).
Though there are some subtleties here: marginal returns that diminish at an exponential rate imply bounded cumulative returns (a geometric series with ratio < 1 converges). Meanwhile, cumulative returns that grow at a logarithmic rate are not bounded; they do not converge.
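In symbols (standard facts, spelled out for concreteness):

```latex
% Geometrically diminishing marginal returns imply bounded cumulative returns:
\sum_{k=0}^{\infty} a\,r^{k} \;=\; \frac{a}{1-r}, \qquad 0 < r < 1.
% Whereas marginal returns of 1/k also shrink to zero, yet the cumulative
% return is unbounded (it grows, slowly, like a logarithm):
\sum_{k=1}^{n} \frac{1}{k} \;\approx\; \ln n \;\to\; \infty \quad (n \to \infty).
```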
I think whether to consider marginal returns or cumulative returns depends on the nature of the process you're considering (what exactly drives the returns).
The levels Yudkowsky is referring to:
Fuller context for the bolded:
In particular, it seems to me like the combinatorial explosion of the search space as one moves further down the concept tree may make recalcitrance higher (one needs to hit a much smaller target [at each step, we're searching for concepts that are improvements to the status quo] in a much larger search space [the search space should grow exponentially in the number of distinct laws/regularities/structures/relevant phenomena discovered]) and result in sublinear cumulative returns (see:[9]) to cognitive reinvestment. See:[12]
Alternatively, whatever is driving the "ideas are getting harder to find" phenomenon will probably also bound the "growth rate" of returns from cognitive reinvestment via recursive self-improvement within the seed AI paradigm.
Rather, I think it is a straightforward fact that the optimisation process constructing the system will need to do more "work" (measured in bits) to improve on the current status quo. One would need to argue separately that the gains to optimisation ability of the successor system more than offset the additional optimisation work that needs to be done (from both a much smaller target and much larger search space). I don't think Yudkowsky ever made a particularly persuasive argument for that claim.
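One way to make the "work measured in bits" framing concrete (my gloss; the quantities are schematic):

```latex
% Selection pressure needed to land in the (shrinking) target set of improvements
% within the (growing) search space:
\text{bits of optimisation required} \;\approx\; \log_{2}\!\frac{|\text{search space}|}{|\text{target set}|},
% which increases as the search space expands and the target shrinks.
```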
On priors, I expect diminishing marginal returns to optimisation ability even within the seed AI paradigm. Most (all?) resource utilisation shows diminishing marginal returns eventually; I think it should be a strong default. The main question is whether the returns diminish before systems are strongly superhuman; that seems like more of an open question to me. See:[13]
beren says:
Compared to idiomatic software at least.
tailcalled says:
I think "AI metalearning better optimisation algorithms" and "AI automating AI architecture search" come closest to seed AI flavoured recursive self improvement, but they aren't quite it as it seems like AI creating more capable successor systems than a system improving itself directly. Succession can still induce foom (though see my [poorly raised] objections:[11][12]).
Admittedly, AGI would also have positive feedback loops in available economic resources, so the economic constraints to capability amplification may be slacker than I posited. Though I still expect there to be significant economic constraints because I believe that cumulative returns to cognitive capabilities from investment of computational resources are likely strongly sublinear (though see:[18]) and computational resources are one of the main/most straightforward ways to purchase additional capabilities with economic resources.
Prima facie, many (most?) interesting computational problems have worse than linear time (and/or space) complexity (some are much worse). I'm under the impression that neural networks don't implement correct deterministic algorithms to solve problems, but rather approximate them. As far as I'm aware, this approximation is not linear but generally polynomial with a smallish exponent (I've been told cubic, but have not independently verified/confirmed this). However, as long as the exponent is >1, it would still translate to sublinear cumulative returns to cognitive capabilities (where cognitive capabilities are measured in terms of the size of the problem sets NNs can approximately solve) from increased computational resources.
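As a concrete illustration of why an exponent above 1 gives sublinear cumulative returns (using the unverified cubic figure purely as an assumption):

```latex
% If approximately solving problems of size n costs C(n) \propto n^{3} compute,
% then the attainable problem size as a function of compute is
n(C) \;\propto\; C^{1/3},
% i.e. a 1000x increase in compute buys only about a 10x increase in problem size.
```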
(This is another point I welcome those more technically grounded than me to enlighten me on.)
Furthermore, it seems that the empirical evidence from deep learning supports the "sublinear cumulative returns to computational resources (as measured via purchased cognitive capabilities)" hypothesis. See:[32]
beren:
clockworklady:
Other phrasings that convey what I'm trying to.
Recursive self-improvement seems to not be all that likely to play a role in:
* The path to superintelligent systems
* The path to attaining decisive strategic advantage
I.e. I don't think recursive self-improvement is on the critical path of humanity's future; it is in this sense that I don't think it's particularly relevant in the deep learning paradigm.
There are caveats to this. They sound like:
* Often we don't want exact solutions, and we just want good enough approximations
* Or we do not care about deterministic correctness, and correctness with high probability is desired
* Or we do not care about worst case complexity and it's the expected complexity on real world probability distributions that matters
* Or ...
My reply to all such objections of the above form is that as far as I'm aware, these relaxations generally do not take the "effective complexity" of a problem from superlinear to linear or sublinear. Some problems have exponential time complexity even after all these relaxations and most others are generally super quadratic.
That said the "complexity of intelligence" hypothesis does not rule out fast takeoff. Effective intelligence level as a function of time can still grow at a superlinear rate if computational resources grows sufficiently fast. Economic feedback loops could facilitate this.
I do have some scepticism of compute investment driven foom, but this post is about recursive self improvement in particular.
The Metaculus community prediction is at 90%.
A caveat to this claim (suggested by janus) is that even if the first AGI is created by deep learning, its successor (or some other relevant AI down the line) may not be. janus' model of Yudkowsky believes something similar to this. I think this caveat is sensible, and it made me revise down my estimates that LW folks whose credence in foom wasn't revised downwards after updating on the deep learning revolution were necessarily committing "epistemic malpractice".
That said, I'm sceptical of the caveat. I think the biggest update of the last decade of machine learning is that we don't actually know how to program intelligence (even for pretty narrow domains, e.g. as far as I'm aware, no one can write a program to classify handwritten digits with high accuracy). It may be possible to construct intelligent systems using more "design like" constructive optimisation processes, but I don't necessarily expect that it would be something very analogous to idiomatic programming (or inasmuch as it is analogous to idiomatic programming, I don't expect it to manifest soon enough to matter).
Of course, some systems may be able to hold trained quadrillion parameter ML models in their "working memory" and factorise that as one might a 10-line python script, but as with many advanced capabilities, I think this maps to a very high level of "strongly superhuman".
I'd expect AI systems to be radically transformative or existentially dangerous well before they can "program" their successors. Insomuch as "programming intelligence" is feasible, I don't expect it to be a component on the critical path.
Not being on the critical path to humanity's future is the main way I think seed AI flavoured recursive self-improvement is "not relevant"; if it does manifest in the deep learning paradigm, it would be too late to matter.
That said, I don't necessarily posit high confidence in these expectations. I don't really have a technically grounded compelling story for why deep learning systems would be unable to properly factorise systems of comparable complexity (this is not to say that one does not exist, I just don't intuit it). There's not necessarily any rigorous tech tree that dictates existentially dangerous capabilities must come before such advanced factorisation abilities, mostly these are just intuitions derived from language models. Moreover, this entire objection could be thrown entirely out of the window if we successfully automate the design of intelligent systems.
E.g. I was left with that impression listening to some of Yudkowsky's recent AI safety writing, and empirically in AI safety conversations I've had online, many people still seem to have high confidence in foom.
Again, if it's indeed the case that deep learning is not particularly amenable to seed AI flavoured recursive self-improvement and they expect AGI to arise within the deep learning paradigm.
Largely, I mean that many people are not making updates I think consistency demands they make given their previously stated beliefs, their reasons for those beliefs, and the available evidence.
Also, see:[24].
Within-lifetime human learning seems to be remarkably more sample efficient than our best ML models, so I'm somewhat (a lot?) more sympathetic to the content overhang thesis than the hardware overhang one.
Even so, I don't expect the first AGIs to be anywhere near as sample efficient as the brain. In general, biology is way more resource (e.g. energy) efficient than human engineering: bird flight, animal locomotion in general, the brain's "10-12 watt" energy budget (it seems to be 1+ orders of magnitude more energy efficient than GPUs of comparable processing power and within an order of magnitude of the theoretical limit given its size), etc. So I wouldn't be surprised if biology is way more data efficient than our first AIs (and I wouldn't necessarily expect AI to reach biology's data efficiency quickly). See:[29]
For largely similar reasons as why we didn't reach biology's energy efficiency quickly (for some domains we still haven't reached it). Basically, energy has been a much more taut constraint for evolution than for human engineering (ATP vs electricity, humans having access to much denser energy sources (e.g. fossil fuels, nuclear power), and generally a much larger energy budget). As a constructive optimisation process, evolution was subject to much harsher energy constraints than human engineering, so any product of evolution that works at all will necessarily be very energy efficient in comparison.
I think a similar argument could be made about data efficiency; within-lifetime human learning simply does not have the data budget that our large language models do. Human language evolved in a much more data-sparse environment (6+ orders of magnitude less available data doesn't seem to me like an overestimate). Given current data availability (and the rate at which it's growing), it seems unlikely that data budget would be anywhere near as harsh a constraint for ML models. If there aren't strong selection pressures for data efficiency, I wouldn't expect data efficiency within an order of magnitude of humans.
A major way I could be wrong is if general intelligence turns out to be relatively compact and selection for cross-domain performance comes bundled with data efficiency. I think this is somewhat plausible/don't rule it out by default. Or rather, I think sufficiently capable AI systems should be more data efficient than humans (I expect that sufficiently advanced engineering [e.g. nanotech] would be more energy efficient than biology). The question is if human level (or better) data efficiency is something that we get on the critical path. I don't have particularly strong intuitions on that.
Also note that unlike the case with energy, (as far as I'm aware) we don't have strong theoretical reasons to suspect that the brain is within an order of magnitude of the theoretical limit for data-efficient learning for a system of its size (though see:[30]). It seems intuitively plausible that a system (given an ample energy budget) could be much more sample efficient than the brain.
Overall, data efficiency by default for sufficiently capable systems seems more plausible than energy efficiency by default, so I think there's a stronger case for a content overhang than a hardware overhang (see:[31]).
Maybe this would be true if we also condition on the brain's energy budget? Should we expect the brain to be near the theoretical limits for data efficiency given that it's near the theoretical limits for energy efficiency? I'm not quite sure. I'd appreciate commentary on this from people much more knowledgeable about the brain than me.
Hardware overhang is often framed in terms of available compute not available data. Learning that the brain is very energy efficient updated my intuitions in complicated ways that I can't communicate clearly (in part because some parts are blank due to missing technical details) but roughly:
* Conditioning on the brain's energy efficiency and its grossly better sample efficiency than our current best models (I'm under the impression that it's orders of magnitude more sample efficient), I expect the brain to be very compute efficient given its energy budget. That is, I think the empirical evidence suggests that the brain is just extremely efficient in general.
* I expect (strongly?) sublinear cumulative returns to cognitive reinvestment (see:[11][12]) so I think it'll be extremely nontrivial to attain better compute efficiency than the brain.
* I don't have an intuitive story for why radically better compute efficiency than the brain is attainable the way we have intuitive stories for why better sample efficiency is readily attainable (and even then, I expect the much better sample efficiency to be accompanied by much higher compute utilisation).
* I think that cumulative returns to cognitive capabilities from computational resources are sublinear (superlinearly more computational resources are required to produce the same constant progress). See:[17][32]. Thus, I don't think an abundance of computing resources is by itself likely to be powerful enough to induce an intelligence explosion without considerable energy/compute efficiency; as mentioned earlier, I believe the brain performs very well along those dimensions.
Our extant scaling laws show a power law relationship between data/model size and cross-entropy loss (with training compute scaling as the square of model/dataset size). I think this suggests that cumulative returns to cognitive capabilities from increased computational resources (training data and compute) are strongly sublinear.
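For reference, the commonly cited compute-optimal ("Chinchilla") parameterisation of these scaling laws has roughly the following shape (constants approximate and quoted from memory; treat as illustrative):

```latex
L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad \alpha \approx 0.34,\;\; \beta \approx 0.28,
% with training compute C \approx 6\,N\,D. Scaling N and D together, the
% reducible part of the loss falls only as a small negative power of C, so each
% constant reduction in loss costs multiplicatively more compute.
```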
I think the scaling laws are significant empirical evidence against the foom thesis within the deep learning paradigm, though see:[33][34].
That said, I'm not technically grounded in ML and so I wouldn't be surprised if I was mistaken here. I invite those more informed on the relevant details to comment here. I'd be particularly curious to hear from ML folks who still expect foom in light of the evidence from scaling laws.
janus disagrees here. I quote:
beren:
I am particularly unconfident about this point, and especially with respect to a hardware overhang; it seems to me that scaling maximalism could make a hardware overhang much more likely (see:[36]) as training is often (much? see:[37]) more expensive than inference. But I don't have a solid grasp of how easily/readily trained models can be scaled up to use more compute at inference and how that affects their performance (see:[38]).
I would be grateful if those more technically grounded than myself were to address this point in particular.
beren:
beren:
beren:
At least, not with amounts of compute available in the near term. But I could also be wrong on this; I'm not technically grounded enough here to feel strongly about my intuitions in this regard.
Alternatively, it's not necessarily the case that even if we had unbounded compute, we could specify training objectives that select for general intelligence under iterated self-play. janus suspects this:
beren: