In this post I'll try to show a surprising link between two research topics on LW: game-theoretic cooperation between AIs (quining, Loebian cooperation, modal combat, etc.) and stable self-modification of AIs (tiling agents, the Loebian obstacle, etc.).
When you're trying to cooperate with another AI, you need to ensure that its action will fulfill your utility function. And when doing self-modification, you also need to ensure that the successor AI will fulfill your utility function. In both cases, naive utility maximization doesn't work, because you can't fully understand another agent that's as powerful and complex as you. That's a familiar difficulty in game theory, and in self-modification it's known as the Loebian obstacle (fully understandable successors become weaker and weaker).
In general, any AI will be faced with two kinds of situations. In "single player" situations, you're faced with a choice like eating chocolate or not, where you can figure out the outcome of each action. (Most situations covered by UDT are also "single player", involving identical copies of yourself.) In "multiplayer" situations, by contrast, your action gets combined with the actions of other agents to determine the outcome. Both cooperation and self-modification are "multiplayer" situations, and are hard for the same reason. When someone proposes a self-modification to you, you might as well evaluate it with the same code that you use for game theory contests.
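To make "the same code" concrete, here is a minimal sketch in Python. It assumes programs are represented as source strings and uses the crudest possible trust check, quining-style equality with your own source. The names `trusts`, `play_prisoners_dilemma` and `accept_self_modification` are hypothetical, made up for this illustration.

```python
# Hypothetical illustration: a program is represented by its source string.
MY_SOURCE = "cooperate iff other_source == my_source"

def trusts(my_source: str, other_source: str) -> bool:
    """Quining-style check: trust only an exact copy of yourself."""
    return other_source == my_source

def play_prisoners_dilemma(my_source: str, opponent_source: str) -> str:
    """Game theory contest: cooperate ("C") only with a trusted opponent."""
    return "C" if trusts(my_source, opponent_source) else "D"

def accept_self_modification(my_source: str, successor_source: str) -> bool:
    """Self-modification: hand over control only to a trusted successor."""
    return trusts(my_source, successor_source)
```

Both decisions go through the same `trusts` function. An exact-copy check is of course too strict to allow any real self-modification; a Loebian version would presumably swap the equality test for a bounded proof search, while the surrounding structure stays the same.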
If I'm right, then any good theory for cooperation between AIs will also double as a theory of stable self-modification for a single AI. That means neither problem can be much easier than the other, and in particular self-modification won't be a special case of utility maximization, as some people seem to hope. But on the plus side, we need to solve one problem instead of two, so creating FAI becomes a little bit easier.
The idea came to me while working on this mathy post on IAFF, which translates some game theory ideas into the self-modification world. For example, Loebian cooperation (from the game theory world) might lead to a solution for the Loebian obstacle (from the self-modification world) - two LW ideas with the same name that people didn't think to combine before!
I'm flashing back to reading Jim Fearon!
Fearon's paper concludes that pretty much only two mechanisms can explain why "rationally led states" would go to war instead of striking a peaceful bargain: private information, and commitment problems.
Your comment brushes off commitment problems in the case of superintelligences, which might turn out to be right. (It's not clear to me that superintelligence entails commitment ability, but nor is it clear that it doesn't entail commitment ability.) I'm less comfortable with setting aside the issue of private information, though.
Assuming rational choice, competing agents will truthfully share information only if they have an incentive to do so, or at least no incentive not to. But where war is a real possibility, I'd expect the incentives to actively encourage secrecy: exaggerating war-making power and/or resolve could let an agent drive a harder bargain.
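The incentive to misrepresent shows up even in a toy version of the standard bargaining setup. The sketch below assumes a prize normalized to 1, a probability `p` that side A wins a war, and a war cost for each side. The take-it-or-leave-it offer rule, the function names and the numbers are illustrative assumptions, not taken from Fearon's paper.

```python
def bargaining_range(p, cost_a, cost_b):
    """Shares for A that both sides prefer to war: [p - cost_a, p + cost_b]."""
    return max(0.0, p - cost_a), min(1.0, p + cost_b)

def offer_to_a(believed_p, cost_a):
    """B's cheapest offer that A would accept, given B's belief about p."""
    return max(0.0, believed_p - cost_a)

def a_accepts(offer, true_p, cost_a):
    """A accepts any share at least as good as its expected war payoff."""
    return offer >= true_p - cost_a

true_p, cost_a, cost_b = 0.5, 0.1, 0.1
low, high = bargaining_range(true_p, cost_a, cost_b)

honest_share = offer_to_a(true_p, cost_a)  # B believes the truth
bluff_share = offer_to_a(0.6, cost_a)      # B believes A's exaggerated claim
skeptic_share = offer_to_a(0.3, cost_a)    # B over-discounts A's claim

bluffing_pays = bluff_share > honest_share
war_with_skeptic = not a_accepts(skeptic_share, true_p, cost_a)
```

With full information a peaceful split always exists, since `low < high`. But A gets a larger share when B believes an inflated `p`, so A's claims aren't credible, and if B discounts them too heavily, the offer falls below what A can expect from fighting.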
You suggest that the ability to precommit could guarantee information sharing, but I'm uneasy about assuming that without a systematic argument or model. Did Schelling or anybody else formally analyze how that would work? My gut has the sinking feeling that drawing up the implied extensive-form game and solving for equilibrium would produce a non-zero probability of non-commitment, imperfect information exchange, and conflict.
Finally, I'll bring in a new point: Fearon's analysis explicitly relies on assuming unitary states. In practice, though, states are multipartite, and if the war-choosing part of the state can grab most of the benefits of a potential war while dumping most of the potential costs on another part, that can enable war. I expect something analogous could produce war between superintelligences, as I don't see why superintelligences have to be unitary agents.
That's a good question and I'm not sure my thinking is right. Let's say two AIs want to go to war for whatever reason. Then they can agree to some other procedure that predicts the outcome of war (e.g. war in 1% of the universe, or simulated war) and precommit to accept it as binding. It seems like both would benefit from that.
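Here is a rough sketch of why both sides would benefit, under strong assumptions: the prize is normalized to 1, the proxy procedure has the same win probability `p` as the real war, the precommitment is perfectly binding, and both sides agree on `p`. The function names and numbers are hypothetical.

```python
def war_payoffs(p, cost_a, cost_b):
    """Expected payoffs (A, B) of a full war over a prize normalized to 1."""
    return p - cost_a, (1 - p) - cost_b

def proxy_payoffs(p, cost_a, cost_b, scale):
    """Expected payoffs when a binding proxy contest decides the whole prize.

    scale is the fraction of the full war costs actually paid:
    0.01 for war in 1% of the universe, close to 0 for a simulated war.
    """
    return p - scale * cost_a, (1 - p) - scale * cost_b

full = war_payoffs(0.6, 0.1, 0.2)
small = proxy_payoffs(0.6, 0.1, 0.2, scale=0.01)

# Same odds, smaller costs: each side's expected payoff goes up.
both_better_off = small[0] > full[0] and small[1] > full[1]
```

This says nothing about the case where the AIs disagree about `p` or can't verify each other's commitment, which is where the objection about private information bites.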
That said, I agree that bargaining is very tricky. Coming up with an extensive-form game might not help, because what if the AIs use a different extensive-form game? There's been pretty much no progress on this for a decade, and I don't see any viable line of attack.