My perception, possibly misperception, is that you are too focused on vague hypotheticals. I believe it is not unlikely that future tool AI will be based on, or at least partly inspired by, previous generations of tool AI that did not turn themselves into agent AIs. I further believe that, instead of speculating about specific failure modes, it would be more fruitful to research whether we should expect some sort of black swan event in the development of these systems.
I think the prevailing idea around here is to expect a strong discontinuity and to almost completely dismiss current narrow AI systems. But this seems like black-and-white thinking to me. I don't think that current narrow AI systems are very similar to your hypothetical superintelligent tools. But I also don't think it is warranted to dismiss the possibility that we will arrive at those superintelligent tools through incremental improvements to our current systems.
What I am trying to communicate is that it seems much more important to me to technically define at what point you believe tools to turn into agents, rather than using that transition as a premise for speculative scenarios.
Another point I would like to make is that researching how to create the kind of tool AI you have in mind, and speculating about its failure modes, are completely intertwined problems. It seems futile to come up with vague scenarios of how these completely undefined systems might fail, and to expect to gain valuable insights from such speculation.
I also think it would make sense to discuss this with experts outside your social circles. Do they believe that your speculations are worthwhile at this point in time? If not, why not?
technically define at what point you believe tools to turn into agents
Just because I haven't posted on this doesn't mean I haven't been working on it :-) The work just isn't ready yet.
In the spirit of "satisficers want to become maximisers", here is a somewhat weaker argument (growing out of a discussion with Daniel Dewey) that "tool AIs" would want to become agent AIs.
The argument is simple. Assume the tool AI is given the task of finding the best plan for achieving some goal. The plan must be realistic and must stay within the resources of the AI's controller: energy, money, social power, and so on. The best plans are the ones that use these resources in the most effective and economical way to achieve the goal.
And the AI's controller has one special type of resource, uniquely effective at what it does: the AI itself. It is smart, potentially powerful, and could self-improve and pull all the usual AI tricks. So the best plan a tool AI could come up with, for almost any goal, is "turn me into an agent AI with that goal." The smarter the AI, the better this plan is. Of course, the plan need not read literally like that. It could simply be a complicated plan that, as a side effect, turns the tool AI into an agent. Or it could copy the AI's software into an agent design. Or it might just arrange things so that we always end up following the tool AI's advice and consulting it often, which is an indirect way of making it into an agent. Depending on how we've programmed the tool AI's preferences, it might be motivated to mislead us about this aspect of its plan, concealing the secret goal of unleashing itself as an agent.
In any case, it does us good to realise that "make me into an agent" is what a tool AI would consider the best possible plan for many goals. So, without a hint of agency, it is motivated to make us make it into an agent.