Epistemic status: The idea here has likely been articulated before and I just haven't noticed it, but it might be worth pointing out again.
Foom describes the idea of a rapid AI takeoff driven by an AI's ability to recursively improve itself. Most discussions of Foom assume that each next iteration of improved models can, in principle, be developed and deployed in a short amount of time. Current LLMs require huge amounts of data and compute to train. Even if GPT-4 or a similar model were able to improve its own architecture, it would still need to be trained from scratch using that new architecture. This would take a long time and couldn't easily be done without people noticing. The most extreme Foom scenarios, in which models advance many generations in under 24 hours, therefore seem unlikely under the current LLM training paradigm.
There could be paths towards Foom with current LLMs that don't require new, improved models to be trained from scratch:
- A model might figure out how to adjust its own weights in a targeted way. This would essentially mean that the model has solved interpretability. It seems unlikely to me that it is possible to get to this point without running a lot of compute-intensive experiments.
- It's conceivable that the recursive self-improvement that leads to Foom doesn't happen at the level of the base LLM, but at a level above that, where multiple copies of a base model are called in a way that results in emergent behavior or agency, similar to what Auto-GPT is trying to do (see the sketch after this list). I think this approach can potentially go a long way, but it might ultimately be limited by how smart the base model is.
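To make that second path a bit more concrete, here is a toy Python sketch of such an outer loop, with every name made up for illustration: a scaffold repeatedly calls a frozen base model and feeds its own outputs back in as context. The `call_model` stub stands in for a real LLM API; the point is only that any self-improvement here lives in the scaffold, not in the base model's weights.

```python
# Toy sketch of an Auto-GPT-style outer loop (all names hypothetical).
# call_model() is a stand-in for a real LLM API call; here it just returns a
# canned string so the control flow can be run and inspected.

def call_model(prompt: str) -> str:
    # Stand-in for an actual LLM call (e.g. a hosted chat-completion endpoint).
    return f"(model output for: {prompt[:40]}...)"

def agent_loop(goal: str, max_steps: int = 5) -> list[str]:
    history: list[str] = []
    for step in range(max_steps):
        # The scaffold composes a prompt from the goal plus everything the
        # model has said so far -- the emergent behavior, if any, lives here,
        # above the level of the frozen base model.
        prompt = f"Goal: {goal}\nHistory: {history}\nNext action:"
        action = call_model(prompt)
        history.append(action)
        # A real scaffold would parse the action, call tools, and check a stop
        # condition; the base model's weights never change in this loop.
    return history

if __name__ == "__main__":
    print(agent_loop("summarize recent interpretability papers"))
```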
Insofar as making real progress towards AGI requires training a new model with hundreds of billions of parameters from scratch, there is an upper limit on how fast recursive self-improvement can proceed.
Well, newer/larger LLMs seem to unexpectedly gain new capabilities. So it's possible that future LLMs (GPT-5, GPT-6, etc.) could have a vastly improved ability to understand how LLM weights map to functions and actions. Maybe the only reason humans need to train new models "from scratch" is that humans don't have the brainpower to understand how the weights in these LLMs work. Humans are naturally limited in their ability to conceptualize and manipulate massive multi-dimensional spaces, and maybe that's the bottleneck when it comes to interpretability?
Future LLMs could solve this problem and then be able to update their own weights or the weights of other LLMs. This ability could be used to quickly and efficiently expand training data, knowledge, understanding, and capabilities within themselves or other LLM versions, and then... foom!
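To pin down what "updating weights in a targeted way" would mean mechanically, here is a minimal NumPy toy (my own illustration, not anything from the discussion above): a rank-one edit that forces a linear map to produce a chosen output for a chosen input while leaving orthogonal inputs untouched. Knowing which edit to make in a real, nonlinear LLM is exactly the unsolved interpretability problem; this only shows the arithmetic of applying one.

```python
# Toy "targeted weight edit" on a stand-in weight matrix W: nudge W with the
# minimum-norm rank-one update so that x_key now maps to v_target, while
# inputs orthogonal to x_key keep their old outputs.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))                 # stand-in "weight matrix"
x_key = rng.normal(size=4)                  # input whose behavior we want to change
v_target = np.array([1.0, 0.0, 0.0, 0.0])   # output we want for x_key

# Minimum-norm rank-one correction with W_new @ x_key == v_target.
delta = np.outer(v_target - W @ x_key, x_key) / (x_key @ x_key)
W_new = W + delta

print(np.allclose(W_new @ x_key, v_target))   # True: targeted behavior changed

x_other = rng.normal(size=4)
x_other -= (x_other @ x_key) / (x_key @ x_key) * x_key   # orthogonal to x_key
print(np.allclose(W_new @ x_other, W @ x_other))          # True: other behavior preserved
```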
Yes, exactly this.
While it's true that this could require "a lot of compute-intensive experiments," that's not necessarily a barrier. OpenAI is already planning to reserve 20% of its compute for an LLM to do "alignment" on other LLMs, as part of its Superalignment project.
As part of this process, we can expect the alignment LLM to be "running a lot of compute-intensive experiments" on another LLM, and the humans are not likely to have any idea what those experiments are actually doing. They could also be adjusting the other LLM's weights to vastly increase its training data, knowledge, intelligence, capabilities, etc., along with the insights needed to similarly update the weights of other LLMs. Then those gains could be fed back into the Superalignment LLM, then back into the "training" LLM... and back and forth, and... foom!
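For what it's worth, the loop being described can be sketched abstractly. Everything below is hypothetical and stubbed; it only shows the shape of the feedback between the two models, not any real system or API: an overseer model proposes edits to a subject model, and the improved subject is then used to improve the overseer.

```python
# Abstract sketch of the hypothesized feedback loop (all functions are
# hypothetical stubs). The point is only the shape: each pass, the overseer
# "improves" the subject, and the improved subject then "improves" the overseer.

def propose_update(proposer, target):
    # Hypothetical: proposer runs experiments on target and returns an edit
    # (a weight patch, synthetic fine-tuning data, ...). Stubbed as a label.
    return {"note": f"edit proposed by {proposer} for {target}"}

def apply_update(model, update):
    # Hypothetical: apply the proposed edit and return the "improved" model.
    return model + "+"

overseer, subject = "alignment-llm", "training-llm"
for generation in range(3):
    subject = apply_update(subject, propose_update(overseer, subject))
    overseer = apply_update(overseer, propose_update(subject, overseer))
    print(generation, overseer, subject)
```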
Super-human LLMs running RL(M)F and "alignment" on other LLMs, using only "synthetic" training data....
What could go wrong?