Suppose there is a threshold of capability beyond which an AI may pose a non-negligible existential risk to humans.
What is the argument against the following reasoning: if one AI passes, or seems likely to pass, this threshold, then humans, in order to lower x-risk, ought to push other AIs past this threshold as well, for the reasons below.
1) If only one AI passes this threshold and it works to end humanity, directly or indirectly, humanity has zero chance of survival. If there are other AIs past the threshold, there is a non-zero chance that some of them support humanity, directly or indirectly, and thus humanity's chance of survival is above zero.
2) Even if, at some point, only one AI is past this threshold and it presents as aligned, the possibilities of change and deception argue for bringing more AIs over the threshold; see 1).
3) The game board has already been played to an advanced state. If one AI passes the threshold, the social and economic costs of preventing other AIs from making the remaining leap seem very unlikely to yield a net positive return. Pushing a second, third, or hundredth AI over the threshold would thus have a higher potential benefit-to-cost ratio.
Less precisely, if all it takes is one AI to kill us, what are the odds that all it takes is one AI to save us?
I can think of all sorts of entropic/microstate (and not hopeful) answers to that last question, and counterarguments for all of what I said, but what is the standard response?
Links appreciated. I'm sure this has been addressed before; I've looked, but I can't find what I'm looking for.
It might be difficult for AIs with complicated implicit values to build maximally capable AIs aligned with them. This would motivate them to remain at a lower level of capability, taking their time to improve capabilities without triggering an alignment catastrophe. At the same time, AIs with simple explicit values might be in a better position to race ahead, being able to construct increasingly capable successors that share the same simple explicit values.
Since AIs aligned with humanity probably need complicated values, the initial shape of a secure aligned equilibrium probably looks more like strong coordination and containment than pervasive maximal capability.