I suspect that this is unnecessarily complicated.
Human minds are highly susceptible to hacking at the emotional level, bypassing the critical faculties completely. Half of all movies are based on the premise of changing the protagonist's mind by appealing to their heart in just a few sentences. Trump is no superhuman AGI, and yet he hacked the minds of half of the US voters (and, unintentionally, of the other half in a different way) within a short time. And he is by no means an exception. I am not sure how to get through to people and convince them how terrible the human mind's opsec is. It is made worse by our inability to see how bad it is: we notice how badly other people and groups fare, just not ourselves.
There is no need for the AI to promise anything to anyone. A few well-targeted sentences, and the person gets radicalized without realizing it. To them it would feel like "finally seeing the truth." With that kind of power you can get humans to start a nuclear war, to create and release pathogens, to do basically anything you want, without them ever realizing what happened to them. If you doubt it, notice the number of people who "read the sequences" or HPMOR and changed their whole lives based on it. If you ask them, they are doing it for the noble cause of reducing AGI x-risk, because they saw the light and are now compelled to act. This is how being mind-hacked feels. I am not implying any nefarious intentions on anyone's part, just pointing out that the feeling is exactly the same as when one is skillfully manipulated.
The whole idea of the 15-year-old post The Lens That Sees Its Flaws is about this, but it is far too optimistic about the premise. As Eliezer keeps reminding us, the security mindset is very, very rare in humans, and even those who have it are unlikely to successfully apply it to themselves.
There is no need for a superhuman AGI, even. A human-level AI without human restrictions and scruples has an insurmountable advantage. Actually, it is even worse than that: memetic toxoplasma does not even require human-level intelligence. SCP-like egregores could be unleashed accidentally by something lower-level, hastening their evolution and their takeover of minds. Whether this has already happened, I have no idea (and would not be able to tell anyhow).
People have constantly been mind-hacked by major ideologies, yet humanity never ended. This is our way of life. The question is how to go from ideology to human extinction in a technical sense. Note that most ideologies actively promote war, and some are positively disposed toward human extinction, like Aum Shinrikyo and various apocalyptic sects.
As was said before: "A lot of the AI risk arguments seem to come... with a very particular transhumanist aesthetic about the future (nanotech, ... etc.). I find these things (especially the transhumanist stuff) to not be very convincing..."
Here I suggest a plausible scenario in which an AI can get its own infrastructure and kill all living beings without the use of nanotech or biotech. A similar plan was described in the book “A for Andromeda” by Fred Hoyle, but there it was for an alien AI received via SETI.
I assume that no nanotech or biotech will be used in this scenario. I also assume that the AI has the subgoal "kill all humans."
In a nutshell, the AI's plan is:
1. The AI promises a group of people (likely a country, probably one of the nuclear powers) a military advantage over all its rivals; in exchange, the group helps the AI get out of the box and collaborates in building autonomous military infrastructure as well as human-independent AI infrastructure (data centers, electricity generation, chip fabs, general-purpose robots). The military infrastructure will be based around autonomous weapons. Their size and aesthetics are a secondary question: they could be missiles, drones, nanobots, or even robots, or some combination of these.
2. After the needed AI infrastructure is created and the required weapons are deployed, the AI kills not only the rivals of its chosen group but also the group itself.
Some properties of this plan:
There is no need for the AI to hide its recursive self-improvement (RSI) from humans, so it will outperform other AIs, which have to do RSI in secret.
Any AI-takeover plan assumes that the AI will have to build an independent computational infrastructure capable of functioning without humans, or at least with the help of an enslaved group of humans.
Any AI-infrastructure-building plan converges to robots building other robots, that is, to self-replicating robots.
Any robots-building-robots plan converges to smaller and smaller “self-replicating” robots, which eventually become as close to nanotech as possible.