My current best guess: Subsidiarity
I've been thinking along these lines for the past few years, but I feel like my thinking was clarified and boosted by Allison's recent series: Gaming the Future.
The gist of the idea is to create clever systems of decentralized control and voluntary interaction that can still manage to coordinate on difficult, risky tasks (such as enforcing defensive laws against weapons of mass destruction). Such systems could shift humanity out of the Pareto-suboptimal lose-lose traps and races we are stuck in. Win-win solutions to our biggest current problems seem possible, and coordination seems like the biggest blocker.
I am hopeful that one of the things we can do with just-before-the-brink AI will be to accelerate the design and deployment of such voluntary coordination contracts.
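To make "voluntary coordination contracts" a bit more concrete, here is a toy Python sketch of a conditional commitment, in the spirit of assurance contracts and program equilibria. Everything in it is my illustrative assumption rather than anything from Gaming the Future: the payoff numbers, the RESTRAINT/RACE framing, and the identity check standing in for real verification of the other side's commitment.

```python
# Toy model: a voluntary, conditional commitment turning a
# lose-lose race into a win-win. Payoffs are prisoner's-dilemma-shaped
# and purely illustrative.
PAYOFFS = {
    ("RESTRAINT", "RESTRAINT"): (3, 3),  # win-win: both accept safety limits
    ("RESTRAINT", "RACE"): (0, 4),       # exploited restrainer
    ("RACE", "RESTRAINT"): (4, 0),
    ("RACE", "RACE"): (1, 1),            # Pareto-suboptimal lose-lose trap
}

def unconditional(action):
    """A party that picks an action regardless of the other side."""
    return lambda other_commitment: action

def conditional_restraint(other_commitment):
    """The 'contract': restrain iff the other side has verifiably
    committed to this same conditional contract. The `is` check is a
    stand-in for real commitment verification (escrow, treaty
    inspection, smart contract, etc.)."""
    return "RESTRAINT" if other_commitment is conditional_restraint else "RACE"

def play(a, b):
    # Each side gets to inspect the other's *commitment* before acting.
    return PAYOFFS[(a(b), b(a))]

print(play(unconditional("RACE"), unconditional("RACE")))  # (1, 1) trap
print(play(conditional_restraint, conditional_restraint))  # (3, 3) win-win
print(play(conditional_restraint, unconditional("RACE")))  # (1, 1) safe vs. defector
```

The point of the toy: once commitments are mutually inspectable and enforceable, conditional restraint is safe to adopt unilaterally, because it only yields restraint when the other side has verifiably done the same.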
I think a lot of thinking around multipolar scenarios suffers from the "solution in the shape of the problem" heuristic, i.e., "a multipolar scenario is one where we have kinda-aligned AI but still die due to coordination failures; therefore, the solution for multipolar scenarios should be about coordination."
I think the correct solution is to leverage available superintelligence in a nice, unilateral way.
Here's what Claude 3.5 Sonnet said in response to this prompt:
Me:
What are the best plans for preventing catastrophes, such as everyone dying or a totalitarian government taking over, in multipolar AGI scenarios, i.e., ones in which AGIs are aligned to what their human users want to do, and at least one human user orders them to do bad things?
Claude:
Let me share some thoughts on AGI governance and safety in multipolar scenarios, while noting this is a complex topic that reasonable people disagree on:
A key challenge in multipolar scenarios is that even if individual AI systems are aligned with their users' values and intentions, some users may have harmful goals. Some potential approaches to address this include:
What are your thoughts on which approaches seem most promising or practical? I'm particularly curious about your perspective on the balance between technical and social/institutional solutions.
___________________________________
I would add to this defensive acceleration (d/acc), described by Vitalik Buterin (https://vitalik.eth.limo/general/2025/01/05/dacc2.html and https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html).
I too like talking things through with Claude, but I don't recommend taking Claude's initial suggestions at face value.
Try following up with a question like:
"Yes, those all sound nice, but do they comprehensively patch all the security holes? What if someone really evil fine-tuned a model to be evil or simply obedient, and then used it as a tool for making weapons of mass destruction?
Education to improve human values seems unlikely to have a 100% success rate. Some people will still do bad things, especially in the very near future.
Fine-tuning the AI will ...
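If you want to run this kind of follow-up systematically rather than in the chat UI, here is a minimal sketch using the Anthropic Python SDK. The prompt strings are abbreviated placeholders, and the model string is just one published Claude 3.5 Sonnet snapshot:

```python
# Minimal sketch: ask the original question, then press Claude with the
# follow-up, keeping both turns in one conversation. Requires the
# `anthropic` package and an ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-3-5-sonnet-20241022"

QUESTION = (
    "What are the best plans for preventing catastrophes in multipolar "
    "AGI scenarios, where AGIs are aligned to their users but at least "
    "one user orders them to do bad things?"
)
FOLLOW_UP = (
    "Yes, those all sound nice, but do they comprehensively patch all the "
    "security holes? What if someone fine-tuned a model to be evil or "
    "simply obedient, and used it to make weapons of mass destruction?"
)

# First turn: the original question.
first = client.messages.create(
    model=MODEL,
    max_tokens=1024,
    messages=[{"role": "user", "content": QUESTION}],
)

# Second turn: replay the history and push back on the initial answer.
second = client.messages.create(
    model=MODEL,
    max_tokens=1024,
    messages=[
        {"role": "user", "content": QUESTION},
        {"role": "assistant", "content": first.content[0].text},
        {"role": "user", "content": FOLLOW_UP},
    ],
)
print(second.content[0].text)
```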
Short introduction
The multipolar scenarios I will be talking about are scenarios in which multiple unrelated actors have access to their own personal AGIs. For the sake of discussion, assume that alignment is solved and that each AGI will follow the orders of its owner.
A few ways we might arrive at a multipolar AGI scenario
Potential catastrophes that can be caused by multiple actors having access to AGI
1) Everyone dies directly
2) Everyone dies indirectly
3) Totalitarian dictatorship
What are our best plans for preventing catastrophes like those outlined above, in a multipolar AGI scenario?