Perhaps they're not different problems. If we solve the decision problem and then the algorithm just replicates itself when we tell it to decide between decision algorithms, this suggests that the philosophy "behind" was actually not behind at all. But one way or the other, you don't want to be stuck with the problem of trying to choose according to a higher criterion using only human wisdom.
The set of quining algorithms is infinite, right? Restricting to the set of quining algorithms rules out many pathological cases, but using human wisdom to choose between them is still unavoidable.
But if there are leftover dilemmas then you haven't hit bottom. Then you want to step back and try to compact further.
How will you know when there are really no leftover dilemmas? In other words, how can you tell that a candidate quining decision algorithm is free of errors? For example, you once thought that it was harmless for a decision algorithm to use the Solomonoff prior (which assumes that the universe is computable). (I hope I've convinced you that's an error.) Such a decision algorithm can certainly be quining, so don't we need a way to prevent this type of error in the future?
I have an intuition that the functional parts can be formalized in a compact way, and that the rest is our own confusion.
Human philosophy is error prone, but also error tolerant and correcting in a way that no decision algorithm that anyone has ever proposed is. If you give a typical decision algorithm the wrong prior, utility function, or implicit theory of fairness, it will happily go on and do its thing without complaint. Restricting to the set of quining algorithms doesn't solve this problem. Human beings can somehow sense such errors and try to correct them. I think this ability needs to be understood and programmed into an AI before it can be considered safe.
Consider this game:
where the last payoff pair is very close to (3,2). I choose a row and you choose a column simultaneously, then I receive the first payoff in a pair and you receive the second. The game has no Nash equilibria in pure strategies, but that's beside the point right now because we drop the competitive setting and go all cooperative: all payoffs are in dollars and transferable, and we're allowed beforehand to sign a mutually binding contract about the play and the division of revenue. The question is, how much shall we win and how should we split it?
Game theory suggests we should convert the competitive game to a coalitional game and compute the Shapley value to divide the spoils. (Or some other solution concept, like the "nucleolus", but let's not go there. Assume for now that the Shapley value is "fair".) The first step is to assign a payoff to each of the 2N = 4 possible coalitions. Clearly, empty coalitions should receive 0, and the grand coalition (me and you) gets the maximum possible sum: 6 dollars. But what payoffs should we assign to the coalition of me and the coalition of you?
Now, there are at least two conflicting approaches to doing this: alpha and beta. The alpha approach says that "the value a coalition can get by itself" is its security value, i.e. the highest value it can win guaranteed if it chooses the strategy first. My alpha value is 2, and yours is 2+ϵ2. The beta approach says that "the value a coalition can get by itself" is the highest value that it cannot be prevented from winning if it chooses its strategy second. My beta value is 3+ϵ1, and yours is 3.
Astute readers already see the kicker: the Shapley value computed from alphas assigns 3-ϵ2/2 dollars to me and 3+ϵ2/2 dollars to you. The Shapley value of betas does the opposite for ϵ1. So who owes whom a penny?
That's disturbing.
Aha, you say. We should have considered mixed strategies when computing alpha and beta values! In fact, if we do so, we'll find that my alpha value equals my beta value and your alpha equals your beta, because that's true for games with mixed strategies in general (a result equivalent to the minimax theorem). My security value is (10+4ϵ1)/(4+ϵ1), and yours is (10-ϵ2)/(4-ϵ2).
This still means the signs of the epsilons determine who owes whom a penny. That's funny because, if you plot the game's payoffs, you will see that the game isn't a quadrilateral like the PD; it's a triangle. And the point (3+ϵ1,2+ϵ2) that determines the outcome, the point that we can ever-so-slightly wiggle to change who of us gets more money... lies inside that triangle. It can be reached by a weighted combination of the other three outcomes.
That's disturbing too.
...
Now, this whole rambling series of posts was spurred by Eliezer's offhand remark about "AIs with knowledge of each other's source code". I formalize the problem thus: all players simultaneously submit programs that will receive everyone else's source code as input and print strategy choices for the game as output. The challenge is to write a good program without running into the halting problem, Rice's theorem or other obstacles.
Without further ado I generalize the procedure described above and present to you an algorithm Freaky Fairness — implementable in an ordinary programming language like Python — that achieves a Nash equilibrium in algorithms and a Pareto optimum simultaneously in any N-player game with transferable utility:
Proof that all players using this algorithm is a Nash equilibrium: any coalition of players that decides to deviate (collectively or individually) cannot win total payoff greater than their group security value, by point 4. If they cooperate, they collectively get no less than their group security value, by superadditivity and construction of Shapley value.
(NB: we have tacitly assumed that all payoffs in the game are positive, so the Shapley value makes sense. If some payoffs are negative, give everyone a million dollars before the game and take them away afterward; both the Shapley value and the minimax survive such manipulations.)
In retrospect the result seems both obvious and startling. Obvious because it closely follows the historically original derivation of the Shapley value. Startling because we're dealing with a class of one-shot competitive games: players enter their programs blindly, striving to maximize only their own payoff. Yet all such games turn out to have Nash equilibria that are Pareto-optimal, and in pure strategies to boot. Pretty neat, huh?
I've seriously doubted whether to post this or not. But there might be mistakes, and many eyes will be more likely to spot them. Critique is welcome!
UPDATE 12.01.2011: benelliott found a stupid mistake in my result, so it's way less applicable than I'd thought. Ouch.