Thanks for the analysis. I'll try to clarify those points, then.
Of course, I'm not suggesting my answer as a solution to the prisoner's dilemma. I actually don't think this kind of mathematical analysis is even relevant to the prisoner's dilemma in practice -- I think the latter is best dealt with by pre-commitments, reputation tracking etc. to change the payoff matrix and in particular change the assumption that the interaction is one-off -- which it typically isn't anyway. (Note that in cases involving actual criminals, we need witness protection programs to make it even approximately one-off.)
As for the logic engine, hey, I just unrolled the search loop then deleted 3^^^^3 - 1 lines of unreachable code. No different than optimizing compilers do every day :-)
If we look at the logic puzzle aspect though, this is in similar vein to this statement is true (the reverse of the classic this statement is false liar paradox -- it has two consistent solutions instead of none -- note that return 0 is just as valid a solution as return 1).
And if you ask what sort of proof system permits that, you start investigating higher-order logic, and self-reference, and the classic paradoxes thereof, and type theory and other possible solutions thereto, which are topics that are of very real significance to software verification and other branches of AI.
In summary, I think this puzzle goes places interesting enough that it's actually worth taking apart and analyzing.
Actually, this statement is provable. This is the "Henkin sentence" and it is indeed provable in Peano arithmetic, so no need for stronger things like type theory etc. My proof in 1 is basically a translation of the proofs of the Henkin sentence and Löb's theorem (which is used to prove the former).
You know that automated proof verifiers exist, right? And also that programs can know their own source code? Well, here's a puzzle for you:
Consider a program A that knows its own source code. The algorithm of A is as follows: generate and check all possible proofs up to some huge size (3^^^^3). If A finds a proof that A returns 1, it returns 1. If the search runs to the end and fails, it returns 0. What will this program actually return?
Wait, that was the easy version. Here's a harder puzzle:
Consider programs A and B that both know their own, and each other's, source code. The algorithm of A is as follows: generate and check all proofs up to size 3^^^^3. If A finds a proof that A returns the same value as B, it returns 1. If the search fails, it returns 0. The algorithm of B is similar, but possibly with a different proof system and different length limit. What will A and B return?
This second puzzle is a formalization of a Prisoner's Dilemma strategy proposed by Eliezer: "I cooperate if and only I expect you to cooperate if and only if I cooperate". So far we only knew how to make this strategy work by "reasoning from symmetry", also known as quining. But programs A and B can be very different - a human-created AI versus an extraterrestrial crystalloid AI. Will they cooperate?
I may have a tentative proof that the answer to the first problem is 1, and that in the second problem they will cooperate. But: a) it requires you to understand some logic (the diagonal lemma and Löb's Theorem), b) I'm not sure it's correct because I've only studied the subject for the last four days, c) this margin is too small to contain it. So I put it up here. I submit this post with the hope that, even though the proof is probably wrong or incomplete, the ideas may still be useful to the community, and someone else will discover the correct answers.
Edit: by request from Vladimir Nesov, I reposted the proofs to our wiki under my user page. Many thanks to all those who took the time to parse and check them.