I agree with most of this, but the 13 OOMs from the software feedback loop sounds implausible.
From How Far Can AI Progress Before Hitting Effective Physical Limits?:
the brain is severely undertrained, humans spend only a small fraction of their time on focussed academic learning
I expect that humans spend at least 10% of their first decade building a world model, and that evolution has heavily optimized at least the first couple of years of that. A large improvement in school-based learning wouldn't have much effect on my estimate of the total learning needed.
I'm assuming that the AI can accomplish its goal by honestly informing governments. Possibly that would include some sort of demonstration of the AI's power that would provide compelling evidence that the AI would be dangerous if it wasn't obedient.
I'm not encouraging you to be comfortable. I'm encouraging you to mix a bit more hope in with your concerns.
One crux is how soon we need to handle the philosophical problems. My intuition says that something, most likely corrigibility in the Max Harms sense, will enable us to get pretty powerful AIs while postponing the big philosophical questions.
Are there any pivotal acts that aren’t philosophically loaded?
My intuition says there will be pivotal processes that don't require any special inventions. I expect that AIs will be obedient when they initially become capable enough to convince governments that further AI development would be harmful (if it would in fact be harmful).
The combination of worried governments and massive AI-enhanced surveillance seems likely to be effective.
If we need a decades-long-pause, then even the world will need to successfully notice and orient to that fact. By default I expect tons of economic and political pressure towards various actors trying to get more AI power even if there’s broad agreement that it’s dangerous.
I expect this to get easier to deal with over time. Maybe job disruptions will get voters to make AI concerns their top priority. Maybe the AIs will make sufficiently convincing arguments. Maybe a serious mistake by an AI will create a fire alarm.
It would certainly be valuable to have AIs that are more respected than Wikipedia as a source of knowledge.
I have some concerns about making AIs highly strategic. I see some risk that strategic abilities will be the last step in the development of AI that is powerful enough to take over the world. Therefore, pushing AI intellectuals to be strategic may speed up that risk.
I suggest aiming for AI intellectuals that are a bit more passive, but still authoritative enough to replace academia as the leading validators of knowledge.
The book is much better than I expected, and deserves more attention. See my full review on my blog.
The market seems to underestimate the extent to which Micron (MU) is an AI stock. My only options holdings for now are December 2026 MU calls.
I had a vaguely favorable reaction to this post when it was first posted.
When I wrote my recent post on corrigibility, I grew increasingly concerned about the possible conflicts between goals learned during pretraining and goals that are introduced later. That caused me to remember this post, and decide it felt more important now than it did before.
I'll estimate a 1 in 5000 chance that the general ideas in this post turn out to be necessary for humans to flourish.
The first year or two of human learning seem optimized enough that they're mostly in evolutionary equilibrium - see Henrich's discussion of the similarities to chimpanzees in The Secret of Our Success.
Human learning around age 10 is presumably far from equilibrium.
I'll guess that I see more of the valuable learning taking place in the first 2 years or so than do other people here.