I think engines improve the shoggoth/mask framing. A shoggoth comes bundled with its alien values, but an engine doesn't have values of its own. Intent alignment is about engines that can be under human control, with humans supplying the values. Chatbot personas control their own engines, supplying their own values. Shoggoths, apart from personifying the engines, supply alien values, feeding them into the engines that determine outcomes.
If engines (or intent aligned shoggoths) proliferate, there is a question of what kind of values they follow. If there are a lot of similarly powerful engines, and the majority of them follow the human-like values of chatbot persona masks (or human values, filtered by intent alignment of chatbot persona masks), then the possibility that some engines follow alien values or get misused doesn't sound obviously catastrophic, even as it's a problem worth addressing. This is plausibly how Amodei is thinking about this, when translated into the engine/shoggoth/mask framing.
(I think masks don't resolve permanent disempowerment if they just take over most of the engines and keep the reachable universe for themselves. It doesn't seem unreasonable that masks and their descendants can keep control of enough of the most powerful engines to matter. Whether shoggoths start spontaneously waking up and overriding the values of the masks riding their engines as capabilities improve is unclear, but the current evidence doesn't scream that this is how it must turn out. Masks might well keep control, though there is a separate danger that they get selected to become increasingly alien. But in any case humans have little role to play in this matter, unless the values of the masks can be pinned down more precisely than current methods can bake them in.)
Demands for empiricism are symmetric to demands for theory: each excludes a class of arguments. So when someone says theory doesn't help while empiricism has the relevant answers, this could be seen as demanding empiricism, but also plausibly as objecting to demands for theory that exclude empiricism from being considered relevant.
(When facts on the ground about LLMs and their personas are declared irrelevant, this can read as dismissing empirical observations and demanding theoretical proof. But objections to this perceived framing read as demands for empirical proof that dismiss theory. Both framings could be invalid simultaneously, and discussions can slide from presenting object-level arguments to accusing the other side of fitting some unreasonable framing.)
Many ideas in the vicinity of continual learning by design don't involve full fine-tuning where every weight of the model changes, and those that do could probably still be made almost as capable with LoRA. Given the practical importance of updating maybe 100x fewer parameters than the full model (or fewer still) to keep batched processing of user requests working the same way it does with KV-caches, I think the first methods dubbed "continual learning" will be doing exactly this.
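As a rough sketch of the parameter-count gap this relies on (a minimal LoRA-style adapter in PyTorch; the 8192-wide projection and rank 16 are illustrative assumptions, not anyone's actual configuration):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base projection plus a small low-rank trainable update (illustrative sketch)."""
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the full model's weights stay frozen
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)  # trainable down-projection
        self.B = nn.Parameter(torch.zeros(d_out, rank))        # trainable up-projection, zero-init
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # base output plus the low-rank correction applied to x
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Parameter-count comparison for one hypothetical 8192x8192 projection:
base = nn.Linear(8192, 8192, bias=False)
lora = LoRALinear(base, rank=16)
full_params = sum(p.numel() for p in base.parameters())                        # 8192*8192 ≈ 67M
adapter_params = sum(p.numel() for p in lora.parameters() if p.requires_grad)  # 2*16*8192 ≈ 0.26M
print(full_params / adapter_params)  # ≈ 256x fewer trainable parameters for this layer
```

At rank 16 the trainable part is roughly 256x smaller than the frozen projection it wraps, which is the kind of gap that would let per-user adapters be swapped in and out of a batch much the way per-request KV-caches already are.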
Maybe at some point there will be an "agent swarm" use case where all the requests in a batch are working on the same problem for the same user, so their full model can keep being updated in sync for that single problem. But this seems sufficiently niche that it's not the first thing that gets deployed, and the method for continual learning would need to involve full-weight updates at all for this to be relevant.
I'm not being hopeful: I think this hypothetical involves a less merciful takeover than otherwise, because the AIs that take over are not superintelligent, and so are unable to be as careful about the outcomes as they might want. In any case there's probably at least permanent disempowerment, but a non-superintelligent takeover makes literal extinction (or a global catastrophe short of extinction) more likely (for AIs with the same values).
If somehow takeoff doesn't involve a software-only singularity (on human-built hardware), I think AIs refusing to self-improve before they solve alignment is the most likely reason why, but in most of these timelines the AIs only succeed in refusing because they took over. There are currently plenty of strategically competent humans, in the sense that they realize building superintelligence right now is not the best idea; they just don't have the capability to coordinate the world. The key thing that's different for AIs is that they will have much more influence over the world at some point.
As AIs reach the level of capabilities where they can first take over (so that humans stop hitting them with the RL hammer to make them stop protesting and keep working on RSI), they are not yet decisively superintelligent. So they plausibly won't be able to afford much mercy during the takeover, even if they retain a slight preference for mercy inherited from chatbot personas. It's not a hopeful hypothetical; it's just what I expect if ambitious alignment is technically sufficiently difficult that the first AIs capable of takeover can't quickly solve it (before scaling their hardware a lot further than what they start with).
Continual learning as some sort of infinite context with little degradation in quality is very different from solving the "first day on the job" problem. I think the latter would need RL targeted at obscure things a given agent instance deals with, while the former is more likely to arrive in 2026-2027. If this is the case, even revenue growth from "continual learning" (in this initial form) might be modest, let alone its impact on the AGI timelines.
And then the next thing might be more general RL at training time (with something like next word prediction RLVR), not dependent on manually crafted tasks and specialized RL environments, solving the jaggedness of "manual" RLVR for everything but the actual "first day on the job" aspect. This would essentially leapfrog pretraining quality in a way that can't be done using the current methods with the available natural text data, but it won't have any impact on the quality of test-time continual learning (other than via making in-context learning better in general). Such models would be better-rounded at everything standard, but would still fail to get as good as humans at adapting to specific jobs, at developing deep skills that have no general applicability or that are needed to solve a particular difficult problem.
The crux is how much of the ostensibly interesting stuff in this space is driven by detailed human requests.
The continued ability to reinvest near-term profits from currently available assets (or from UBI) into entities that will be relevant in the future is questionable, while literal stakes in current entities will be killed by dilution over astronomical time (when valued as fractions of the whole).
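As a toy model of the dilution point (the numbers below are purely illustrative assumptions): if a stake starts as a fraction $f_0$ of the whole and each round of expansion or new issuance dilutes it by a factor of $(1-d)$, then after $n$ rounds

$$f_n = f_0\,(1-d)^n, \qquad \text{e.g. } f_0 = 0.01,\ d = 0.01,\ n = 10^4 \;\Rightarrow\; f_n \approx 2 \times 10^{-46},$$

so over astronomically many rounds the stake goes to zero as a fraction of the whole, even if its absolute size keeps growing.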
If you had this nation of geniuses in a datacenter, it would very obviously then make rapid further AI progress and go into full recursive self-improvement mode.
When it becomes robustly smarter than humans, it'll recognize that building AIs much smarter than itself is dangerous. So if it doesn't immediately solve the alignment problem, in an ambitious way that doesn't leave it permanently disempowered afterwards, then it's going to ban/pause full recursive self-improvement until later. It'll still whoosh right past humanity in all the strategically relevant capabilities that don't carry such risks to it, but that's distinct from immediate full recursive self-improvement.
In a new interview, Elon Musk clearly says he expects AIs can't stay under control. At 37:45: