I'm writing up some math results developed on LW as a paper with the tentative title "Self-referential decision algorithms". Something interesting came up while I was cleaning up the Loebian cooperation result. Namely, how do we say precisely that Loebian cooperation is stable under minor syntactic changes? After all, if we define a "minor change" to program A as a change that preserves A's behavior against any program B, then quining cooperation is just as stable under such "minor changes" by definition. Digging down this rabbit hole, I seem to have found a nice new reformulation of the whole thing.
I will post some sections of my current draft in the comments to this post. Eventually this material is meant to become an academic paper (hopefully), so any comments on math mistakes, notation or tone would be much appreciated! And yeah, I have no clue about academic writing, so you're welcome to tell me that too.
1 Quining cooperation
In this section we will describe a variant of the Prisoner’s Dilemma where the players can achieve mutual cooperation whenever their decision processes are symmetric.
Let G be the following game: player 1 writes a computer program A that accepts the source code of another program B as an argument and returns either the value C (cooperate) or D (defect). At the same time, player 2 writes another such computer program B. The payoffs to the players are determined by computing the values A(B) and B(A) (that is, passing each program’s source code to the other program) and interpreting these values as moves in a classical Prisoner’s Dilemma. For example, if A(B)=C and B(A)=D, then player 2 receives a higher payoff because his program defected while player 1’s program cooperated. Note that the players’ strategies in G are the programs themselves: a Nash equilibrium of G is a pair of programs such that neither player can gain by submitting a different program, not a pair of moves in the underlying Prisoner’s Dilemma.
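To make the setup concrete, here is a minimal Python sketch of the game G. The function names, the use of inspect to pass source code, and the specific payoff numbers are illustrative choices, not part of the draft.

```python
# A minimal sketch of the program game G. Programs are modelled as Python
# functions that take the opponent's source code and return "C" or "D".
# The payoff numbers are the usual Prisoner's Dilemma values (DC > CC > DD > CD)
# and are illustrative only; the draft never fixes concrete payoffs.
import inspect

PAYOFFS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def play(program_a, program_b):
    """Compute A(B) and B(A) by handing each program the other's source code."""
    move_a = program_a(inspect.getsource(program_b))
    move_b = program_b(inspect.getsource(program_a))
    return PAYOFFS[(move_a, move_b)], PAYOFFS[(move_b, move_a)]
```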
Theorem 1: the game G has a Nash equilibrium which leads to the outcome (C,C).
Proof:
Let A be a program that refers to its own source code (using standard quine techniques) and returns the value C if and only if B=A, that is, the source code of B is equal to the source code of A. Otherwise A returns the value D.
The strategy pair (A,A) leads to the outcome (C,C) because A(A)=C. For any other program B≠A, we have A(B)=D, so the strategy pair (A,B) leads to either the outcome (D,C) or (D,D), both of which are worse for the second player than the outcome (C,C). Therefore program A is a best reply to itself, QED.
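For illustration only, here is a rough Python version of the program A from this proof. Instead of a genuine quine it reads its own source code via inspect; that shortcut sidesteps the self-reference construction but keeps the behavior described above.

```python
# A sketch of the program A from Theorem 1: cooperate exactly when the opponent's
# source code is identical to our own, defect otherwise. Reading our own source
# with inspect stands in for the quine construction used in the proof.
import inspect

def clique_bot(opponent_source: str) -> str:
    own_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == own_source else "D"

def defect_bot(opponent_source: str) -> str:
    return "D"

if __name__ == "__main__":
    print(clique_bot(inspect.getsource(clique_bot)))  # C: opponent is an exact copy
    print(clique_bot(inspect.getsource(defect_bot)))  # D: any other program
```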
2 Löbian cooperation
In this section we will continue discussing the Prisoner’s Dilemma variant described in the previous section. We will describe a program that cooperates whenever the opponent’s program is a similar enough implementation of the same algorithm, regardless of irrelevant details like coding style or comments. To achieve that, we will use a proof checker for some formal theory.
A note on language: when talking about programs, statements, or proofs, we implicitly refer to the Gödel numbers of those objects. Kleene’s recursion theorem allows us to define self-referential and mutually referential objects, like programs that check proofs about themselves. In this and subsequent sections we will freely use compact jargon, for example, “let A be a program that searches for a proof that A()=1”, which expands to “by Kleene’s recursion theorem, there exists a program A that successively checks, for each natural number, whether it is the Gödel number of a valid proof in the formal theory T of the statement that a program with the same Gödel number as A returns the value 1”.
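As a concrete reminder of what such self-reference looks like at the level of source code, here is a standard two-line Python quine, a program whose output is exactly its own source; the programs discussed below rely on the Gödel-number analogue of the same trick.

```python
# A standard Python quine: running this program prints exactly its own source.
# Kleene's recursion theorem provides the arithmetized version of this trick.
s = 's = %r\nprint(s %% s)'
print(s % s)
```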
Let G be the game described in section 1. Let T be a formal theory that includes Peano arithmetic. Let F(N) be a computable function from natural numbers to natural numbers, to be defined below (in the proof of Theorem 2.2). For each natural number N, let Coop(N) be the set of all programs A that satisfy the following two conditions:
1) For any program B, A(B)=C implies B(A)=C.
2) The formal theory T can prove the following statement in N symbols or less: “For any program B, if the formal theory T can prove that A(B)=B(A) in F(N) symbols or less, then A(B)=C”.
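In symbols, the two conditions can be restated as follows; the bounded provability notation is an addition for readability only and does not appear in the draft’s prose.

```latex
% \vdash_T^{\le k} \varphi means "the formal theory T proves \varphi in at most
% k symbols" (a meta-level claim), and \Box_T^{\le k} \varphi is the arithmetized
% version of the same assertion inside T.
\[
  \text{(1)}\quad \forall B:\ A(B)=C \;\rightarrow\; B(A)=C,
\]
\[
  \text{(2)}\quad \vdash_T^{\le N}\ \Bigl[\,\forall B:\ \Box_T^{\le F(N)}\bigl(A(B)=B(A)\bigr)\;\rightarrow\; A(B)=C\,\Bigr].
\]
```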
Theorem 2.1: if the number N is large enough, then the set of programs Coop(N) is non-empty.
Proof:
Let A be a program that enumerates all proofs in the formal theory T that are at most F(N) symbols long, looking for a proof of the statement "A(B)=B(A)". If such a proof is found, then program A returns the value C, otherwise D.
If A(B)=C, then A found a proof that A(B)=B(A). Assuming that the formal theory T is sound, we obtain that A(B)=B(A), and since A(B)=C, also B(A)=C, so condition 1 holds. The formal theory T can prove the statement in condition 2 by inspection of program A. The length of that proof grows with the number of symbols needed to represent the number N within program A, which is logarithmic in N. Therefore we can choose N to be larger than the length of that proof, thus satisfying condition 2. Then program A is a member of the set Coop(N), QED.
Note that, by an analogous argument, the set Coop(N) also includes different implementations of the same algorithm, as well as other programs that enumerate proofs up to a slightly higher limit than F(N), programs that use theories stronger than the formal theory T, etc.
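As a very schematic illustration of the program A constructed in the proof of Theorem 2.1, the following Python sketch shows its overall shape. The proof checker is only a stub (a real implementation would need an actual checker for the theory T), the constant F_N is a placeholder for the bound F(N), and the self-reference again uses inspect as a shortcut.

```python
# Schematic sketch of the proof-searching cooperator from Theorem 2.1.
# `is_valid_proof` is a stub for a real proof checker of the theory T; it always
# answers False here, so this sketch only shows the program's structure.
import inspect

F_N = 10**6  # illustrative proof-length bound, standing in for F(N)

def is_valid_proof(candidate: int, statement: str) -> bool:
    """Stub: should decide whether `candidate` encodes a valid T-proof of `statement`."""
    return False

def loebian_bot(opponent_source: str) -> str:
    own_source = inspect.getsource(loebian_bot)  # shortcut for the quine/recursion theorem
    statement = ("A(B) = B(A), where A is the program {!r} "
                 "and B is the program {!r}").format(own_source, opponent_source)
    # Enumerate all candidate proofs of length at most F(N) symbols.
    for candidate in range(F_N):
        if is_valid_proof(candidate, statement):
            return "C"  # found a short enough proof that the two programs agree
    return "D"  # no short enough proof found

if __name__ == "__main__":
    # With the stub checker this always prints "D"; with a real checker and a
    # large enough bound, two identical copies would find the Loebian proof
    # and print "C".
    print(loebian_bot(inspect.getsource(loebian_bot)))
```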
Theorem 2.2: for any natural number N, if both programs A and B are in the set Coop(N), then the strategy pair (A,B) is a Nash equilibrium of the game G that leads to the outcome (C,C).
Proof:
Since both programs A and B satisfy condition 2 from the definition of Coop(N), the formal theory T can prove the following statement in 2*N symbols or less: "If T can prove that A(B)=B(A)=C in F(N) symbols or less, then A(B)=B(A)=C". By the bounded Löb's theorem (see Appendix 1), we can now define the function F(N)=L(2*N) and obtain that the formal theory T proves that A(B)=B(A)=C. Assuming that the formal theory T is sound, this means the strategy pair (A,B) leads to the outcome (C,C).
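In the bounded provability notation introduced after the definition of Coop(N) (again, only a reading aid), the Löb step amounts to:

```latex
% S stands for the statement "A(B)=B(A)=C".
\[
  \vdash_T^{\le 2N} \Bigl(\Box_T^{\le L(2N)}\,S \;\rightarrow\; S\Bigr)
  \quad\Longrightarrow\quad \vdash_T S,
  \qquad\text{taking } F(N)=L(2N).
\]
```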
To prove that program B is a best reply to program A, let’s assume that some other program B’ is a better reply to A. The only way that can happen is if the strategy pair (A,B’) leads to the outcome (C,D). But program A satisfies condition 1 from the definition of Coop(N), so A(B’)=C implies B’(A)=C. Contradiction. By the symmetric argument, using condition 1 for program B, program A is also a best reply to program B, QED.
(...more to come...)
Appendix 1
Bounded Löb's theorem: for any formal theory T that includes Peano arithmetic, there exists a computable function L(N) such that, for any statement S in the language of T and for any natural number N, if T proves in N symbols or less that the statement "T proves S in L(N) symbols or less" implies S, then T proves S.
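In the same bounded provability notation used earlier (a reading aid, not part of the draft's formal machinery), the theorem reads:

```latex
% \vdash_T^{\le k} is provability in T within k symbols (meta-level), and
% \Box_T^{\le k} is its arithmetized counterpart inside T.
\[
  \forall S\;\forall N:\qquad
  \vdash_T^{\le N} \Bigl(\Box_T^{\le L(N)}\,S \;\rightarrow\; S\Bigr)
  \;\Longrightarrow\; \vdash_T S.
\]
```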
Proof:
It's awesome that you're writing this up! I'd like to help in any way I can.
I suggest you add a sentence to make it even clearer at the outset that you're discussing Nash equilibria on the level of "deciding which program to submit" rather than the usual usage of "as a program, should you output C or D"; even though you've written it out correctly, a reader who's not already aware of these ideas could easily misinterpret it and be confused.