Indeed, I find it somewhat notable that high-level arguments for AI risk rarely attend in detail to the specific structure of an AI’s motivational system, or to the sorts of detailed trade-offs a not-yet-arbitrarily-powerful-AI might face in deciding whether to engage in a given sort of problematic power-seeking. [...] I think my power-seeking report is somewhat guilty in this respect; I tried, in my report on scheming, to do better.
Your 2021 report on power-seeking does not appear to discuss the cost-benefit analysis that a misaligned AI would conduct before attempting takeover, or the possibility that this analysis might not favor takeover. Others have been making that point for a long time, and in this post you seem to have come around to their argument and fleshed it out.
It's admirable that you've changed your mind in response to new ideas, and it takes a lot of courage to publicly own mistakes. But given the tremendous influence of your report on power-seeking, I think it's worth reflecting more on your update that one of its core arguments may have been incorrect or incomplete.
Most centrally, I'd like to point out that several people have already made versions of the argument presented in this post, some of them as direct criticisms of your 2021 report on power-seeking. You haven't cited any of them here, and I think it would be worthwhile to recognize their contributions:
2023, about the report: "It is important to separate Likelihood of Goal Satisfaction (LGS) from Goal Pursuit (GP). For suitably sophisticated agents, (LGS) is a nearly trivial claim.
"Most agents, including humans, superhumans, toddlers, and toads, would be in a better position to achieve their goals if they had more power and resources under their control... From the fact that wresting power from humanity would help a human, toddler, superhuman or toad to achieve some of their goals, it does not yet follow that the agent is disposed to actually try to disempower all of humanity.
"It would therefore be disappointing, to say the least, if Carlsmith were to primarily argue for (LGS) rather than for (ICC-3). However, that appears to be what Carlsmith does...
"What we need is an argument that artificial agents for whom power would be useful, and who are aware of this fact, are likely to go on to seek enough power to disempower all of humanity. And so far we have literally not seen an argument for this claim."
There are important differences between their arguments and yours, such as your focus on the ease of takeover as the key factor in the cost-benefit analysis. But one central argument is the same; in your words: "even for an AI system that estimates some reasonable probability of success at takeover if it goes for it, the strategic calculus may be substantially more complex."
Why am I pointing this out? Because I think it's worth keeping track of who has been right and who has been wrong in longstanding intellectual debates. Yudkowsky was wrong about takeoff speeds, and Paul was right. Bostrom was wrong about the difficulty of value specification. Given that most people cannot evaluate most debates at the object level (especially debates spanning hundreds of pages written by people with PhDs in philosophy), paying attention to the intellectual track records of people and communities serves a genuinely useful epistemic function.
Retracted. I apologize for mischaracterizing the report and for the unfair attack on your work.