In this post I'll try to show a surprising link between two research topics on LW: game-theoretic cooperation between AIs (quining, Loebian cooperation, modal combat, etc) and stable self-modification of AIs (tiling agents, Loebian obstacle, etc).
When you're trying to cooperate with another AI, you need to ensure that its action will fulfill your utility function. And when doing self-modification, you also need to ensure that the successor AI will fulfill your utility function. In both cases, naive utility maximization doesn't work, because you can't fully understand another agent that's as powerful and complex as you. That's a familiar difficulty in game theory, and in self-modification it's known as the Loebian obstacle (fully understandable successors become weaker and weaker).
In general, any AI will be faced with two kinds of situations. In "single player" situations, you're faced with a choice like eating chocolate or not, where you can figure out the outcome of each action. (Most situations covered by UDT are also "single player", involving identical copies of yourself.) Whereas in "multiplayer" situations your action gets combined with the actions of other agents to determine the outcome. Both cooperation and self-modification are "multiplayer" situations, and are hard for the same reason. When someone proposes a self-modification to you, you might as well evaluate it with the same code that you use for game theory contests.
If I'm right, then any good theory for cooperation between AIs will also double as a theory of stable self-modification for a single AI. That means neither problem can be much easier than the other, and in particular self-modification won't be a special case of utility maximization, as some people seem to hope. But on the plus side, we need to solve one problem instead of two, so creating FAI becomes a little bit easier.
The idea came to me while working on this mathy post on IAFF, which translates some game theory ideas into the self-modification world. For example, Loebian cooperation (from the game theory world) might lead to a solution for the Loebian obstacle (from the self-modification world) - two LW ideas with the same name that people didn't think to combine before!
I imagine merging like this:
1) Bargain about a design for a joint AI, using any means of communication
2) Build it in a location monitored by both parties
3) Gradually transfer all resources to the new AI
4) Both original AIs shut down, new AI fulfills their combined goals
No proof of rationality required. You can design the process so that any deviation will help the opposing side.
I could imagine some failure modes, but surely I can't imagine the best one. For example, "both original AIs shut down" simultaneously is vulnerable for defecting.
I also have some busyness experience, and I found that almost every deal includes some cheating, and the cheating is everytime something new. So I always have to ask myself - where is the cheating from the other side? If don't see it, it's bad, as it could be something really unexpected. Personally, I hate cheating.