Review

Epistemic status: The idea here has likely been articulated before; I just haven't come across it, so it might be worth pointing out again.

Foom describes the idea of a rapid AI takeoff caused by an AI's ability to recursively improve itself. Most discussions about Foom assume that each next iteration of improved models can in principle be developed and deployed in a short amount of time. Current LLMs require huge amounts of data and compute to be trained. Even if GPT-4 or similar models were able to improve their own architecture, they would still need to be trained from scratch using that new architecture. This would take a long time and can't easily be done without people noticing. The most extreme Foom scenarios, in which models advance many generations in under 24 hours, therefore seem unlikely in the current LLM training paradigm.

There could be paths towards Foom with current LLMs that don't require new, improved models to be trained from scratch:

  1. A model might figure out how to adjust its own weights in a targeted way. This would essentially mean that the model has solved interpretability. It seems unlikely to me that it is possible to get to this point without running a lot of compute-intensive experiments.
  2. It's conceivable that the recursive self-improvement that leads to Foom doesn't happen on the level of the base LLM, but on a level above that, where multiple copies of a base model are called in a way that results in emergent behavior or agency, similar to what Auto-GPT is trying to do. I think this approach can potentially go a long way, but it might ultimately be limited by how smart the base model is.


Insofar as it is required to train a new model with 100s of billions of parameters from scratch in order to make real progress towards AGI, there is an upper limit to how fast recursive self-improvement can progress.

9 comments

Agreed that self-improvement is somewhat hard in the current paradigm. But note that this is a one-time cost rather than a permanent slowdown.

If an AI is better than humans at AI design, and can get the resources to experiment and train successors, it's going to have incentives to design successors that are better at self-improvement than it is. At which point FOOM resumes apace.

Also, in the current paradigm, overhang in one part can allow for sudden progress in other parts. For example, if you have an agent with a big complicated predictive model of the world wrapped in some infrastructure that dictates how it makes plans and takes actions, then if the big complicated predictive model is powerful but the wrapping infrastructure is suboptimal, there can be sudden capability gains by optimizing the surrounding infrastructure.

It's not clear to me that it's necessarily possible to get to a point where a model can achieve rapid self-improvement without expensive training or experimenting. Evolution hasn't figured out a way to substantially reduce the time and resources required for any one human's cognitive development.

I agree that even in the current paradigm there are many paths towards sudden capability gains, like the suboptimal infrastructure scenario you pointed to. I just don't know if I would consider that FOOM, which in my understanding implies rapid recursive self-improvement.

Maybe this is just a technicality. I expect things to advance pretty rapidly from now on with no end in sight. But before we had these huge models, FOOM with very fast recursive self-improvement seemed almost inevitable to me. Now I think that it's possible that model size and training compute put at least some cap on the rate of self-improvement (maybe weeks instead of minutes).

Funny, I had exactly the same thought and was just considering writing a short post on it. So I agree and I do think it's a very relevant model update. Some people probably already updated before. I also agree though with your second point about Auto-GPT and similar peripherals. So it looks like we're in a not-too-fast take-off with humans pretty solidly in the loop for now?

As long as there's no autonomous self-improvement of the core model, maybe an existential event could look like this: GPT-X gets trained and released, open sourcers build unsafe peripherals around it, and one of these peripherals turns out to be sufficiently capable (perhaps by self-improving its peripheral capabilities) to take over the world. Or: GPT-X itself turns out to be powerful enough to take over the world and it does, without self-improvement, just after training.

I'm curious whether in the medium term, AI progress is talent-constrained (intelligence important, self-improvement important) or compute/data-constrained (AI doesn't add much, because intelligence doesn't add much, therefore no self-improvement).

I upvoted your post because I think this is a useful discussion to have, although I am not inclined to use the same argument and my position is more conditional. I learned this lesson from the time I played with GPT-3, which seemed to me a safe pathway toward AGI, but I failed to anticipate how all the guardrails meant to scale back deployment would be overrun by other concerns, such as profits. It is like taking a safe pathway and incrementally making it more dangerous over time. In the future, I expect something similar to happen to GPT-4, e.g. people developing hardware to put it directly on a box/device and selling it in stores: not just as a service, but as a tool where the hardware is patented and marketed. For now, it looks like training is the bottleneck for deployment, but I don't expect this to last, as there are many incentives to bring training costs down. Also, I think one should be careful about using flaws of the architecture as an argument against the path toward self-improvement. There could be a corresponding architecture design that provides a cheaper workaround. The basic problem is that any single person sees only a limited number of options, while the world explores many more options in parallel.

The post gives an argument for FOOM not happening right away after AGI. I think solid examples of FOOM are superintelligence that fits into modern compute, and as-low-as-human-level intelligence on nanotech-manufactured massive compute. LLMs are fast enough that if they turn AGI and get some specialized hardware from ordinary modern fabs, they can do serial theoretical research 10s to 100s of times faster than humans, even if they aren't smarter. Also, they are superhumanly erudite and have read literally everything. No need to redesign them while this is happening, unless that's even more efficient than not doing it.

This gives decades of AI theory and chip design in a year, which could buy a lot of training efficiency, possibly enough for the superintelligence FOOM if that's possible directly (in an aligned way), or at least for further acceleration of research if it's not. That further acceleration of research gets it to nanotech, and then compute becomes many OOMs more abundant, very quickly. That's FOOM enough even without superintelligence, though not having superintelligence by that point seems outlandish.

Yann LeCun describes LLMs as "an off-ramp on the road to AGI," and I'm inclined to agree. LLMs themselves aren't likely to "turn AGI." Each generation of LLMs demonstrates the same fundamental flaws, even as they get better at hiding them.

But I also completely buy the "FOOM even without superintelligence" angle, as well as the argument that they'll speed up AI research by an unpredictable amount.

"Current LLMs require huge amounts of data and compute to be trained."


Well, newer/larger LLMs seem to unexpectedly gain new capabilities. So, it's possible that future LLMs (e.g., GPT-5, GPT-6, etc.) could have a vastly improved ability to understand how LLM weights map to functions and actions. Maybe the only reason why humans need to train new models "from scratch" is that humans don't have the brainpower to understand how the weights in these LLMs work. Humans are naturally limited in their ability to conceptualize and manipulate massive multi-dimensional spaces, and maybe that's the bottleneck when it comes to interpretability?

Future LLMs could solve this problem, then be able to update their own weights or the weights of other LLMs. This ability could be used to quickly and efficiently expand training data, knowledge, understanding, and capabilities within itself or other LLM versions, and then... foom!

"A model might figure out how to adjust its own weights in a targeted way. This would essentially mean that the model has solved interpretability. It seems unlikely to me that it is possible to get to this point without running a lot of compute-intensive experiments."

Yes, exactly this.

While it's true that this could require "a lot of compute-intensive experiments," that's not necessarily a barrier. OpenAI is already planning to reserve 20% of their GPUs for an LLM to do "Alignment" on other LLMs, as part of their Super Alignment project. 

As part of this process, we can expect the Alignment LLM to be "running a lot of compute-intensive experiments" on another LLM. And the humans are not likely to have any idea what those "compute-intensive experiments" are doing. They could also be adjusting the other LLM's weights to vastly increase its training data, knowledge, intelligence, capabilities, etc., along with producing the insights needed to similarly update the weights of other LLMs. Then, those gains could be fed back into the Super Alignment LLM, then back into the "Training" LLM... and back and forth, and... foom!

Super-human LLMs running RL(M)F and "alignment" on other LLMs, using only "synthetic" training data.... 
What could go wrong?

Interesting! I agree that in the current paradigm, Foom within days seems very unlikely. But I predict that we will soon step out of the LLM paradigm into something that works better. Take coding: GPT-4 is great at it purely from predicting code, without any weight updates from trial-and-error experience of the kind a human uses to improve. I expect it will become possible to take an LLM base model and then train it using RL on tasks like writing full programs, apps, or websites, where the feedback comes from executing the code and comparing the results with its expectation. You might be able to create a dataset of websites, for example, and give it the goal of "recreate this" so that the reward can be given autonomously. The LLM process brings common sense (according to Bubeck, lead author of the Sparks of AGI paper, in his YouTube presentation), plausible idea generation, and the ability to look up other people's ideas online. If you add learning from trying out ideas on real tasks like coding full programs, capability might rise very fast. And in doing this you create an agentic AI that, unlike Auto-GPT, does learn from experience.
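
To make the "feedback comes from executing the code" idea a bit more concrete, here is a minimal sketch of what an autonomous reward signal could look like. Everything in it is an assumption for illustration: the function name `execution_reward`, the binary 0/1 reward, and the choice to compare printed output against an expected string are all hypothetical, and this is not how any actual lab trains its models. It only scores one candidate program; the RL update that would use the score is left out entirely.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def execution_reward(program_source: str, expected_stdout: str,
                     timeout_s: float = 5.0) -> float:
    """Hypothetical reward: 1.0 if the candidate program exits cleanly and
    prints exactly the expected output (ignoring surrounding whitespace),
    otherwise 0.0."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(program_source)
        try:
            # Run the candidate in a separate interpreter process so that a
            # crash or an infinite loop cannot take down the caller.
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0.0
    if result.returncode != 0:
        return 0.0
    return 1.0 if result.stdout.strip() == expected_stdout.strip() else 0.0


if __name__ == "__main__":
    # Stand-ins for programs sampled from a model (assumed, not real samples).
    candidates = ["print(2 + 2)", "print(5)", "raise SystemExit(1)"]
    for source in candidates:
        print(repr(source), execution_reward(source, "4"))
```

A separate process is not a real sandbox, so a serious setup would need much stronger isolation before running model-written code. A graded reward (for example, the fraction of test cases passed) would probably give a richer training signal than this all-or-nothing score.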

Hi, writing this while on the go, but just throwing it out there: this seems to be Sam Altman's intent with OpenAI in pursuing fast timelines with slow takeoffs.