You know that automated proof verifiers exist, right? And also that programs can know their own source code? Well, here's a puzzle for you:
Consider a program A that knows its own source code. The algorithm of A is as follows: generate and check all possible proofs up to some huge size (3^^^^3). If A finds a proof that A returns 1, it returns 1. If the search runs to the end and fails, it returns 0. What will this program actually return?
Wait, that was the easy version. Here's a harder puzzle:
Consider programs A and B that both know their own, and each other's, source code. The algorithm of A is as follows: generate and check all proofs up to size 3^^^^3. If A finds a proof that A returns the same value as B, it returns 1. If the search fails, it returns 0. The algorithm of B is similar, but possibly with a different proof system and different length limit. What will A and B return?
This second puzzle is a formalization of a Prisoner's Dilemma strategy proposed by Eliezer: "I cooperate if and only I expect you to cooperate if and only if I cooperate". So far we only knew how to make this strategy work by "reasoning from symmetry", also known as quining. But programs A and B can be very different - a human-created AI versus an extraterrestrial crystalloid AI. Will they cooperate?
I may have a tentative proof that the answer to the first problem is 1, and that in the second problem they will cooperate. But: a) it requires you to understand some logic (the diagonal lemma and Löb's Theorem), b) I'm not sure it's correct because I've only studied the subject for the last four days, c) this margin is too small to contain it. So I put it up here. I submit this post with the hope that, even though the proof is probably wrong or incomplete, the ideas may still be useful to the community, and someone else will discover the correct answers.
Edit: by request from Vladimir Nesov, I reposted the proofs to our wiki under my user page. Many thanks to all those who took the time to parse and check them.
Edit: Wow, I really am an idiot. I assumed the second statement was true about every statement, but I just realized (by reading one extra comment after posting) that by Lob's Theorem that's not true. But my original idea, that the first statement is all that's required to prove anything, still seems to hold.
Okay, I can follow the first proof when I assume statement 1, but I don't quite understand how cousin_it got to 1. The Diagonal Lemma requires a free variable inside the formula, but I can't seem to find it.
In fact, I think I totally misunderstand the Diagonal Lemma, because it seems to me that you could use it to prove anything. If you replace "outputs(1)" by "2==3", the proof still seems to hold. Statement 2 would still be true with "2==3" (it is true about any statement, after all), and all the logic that follows from those two statements would be true. In fact, by an unclearly written chain of reasoning which I originally intended to post before realizing that it would be much simpler to just say this, all you seem to require is the first statement to be able to prove anything. If I am mistaken, which is probable, then I expect my error lies in the assumption that "outputs(1)" could be replaced by any string of code.
For my original unclear explanation of why the first statement in the proof seems to allow anything to be proven, in case anyone cares for it:
Anyone care to point me to my mistake (or, to satisfy my wildest of dreams, say that I appear to be right ;) )?
I don't understand how the first statement can be used to prove anything...
The second statement might be true for every statement, but it's not necessarily provable for every statement, which is required in the proof. In fact, the second statement is provable for "outputs(1)" by inspection of the program (because the program searches for proofs of "outputs(1)"), but not provable for "2==3" (because then "2==3" would be true, by Lob's theorem).