# AI cooperation in practice

You know that automated proof verifiers exist, right? And also that programs can know their own source code? Well, here's a puzzle for you:

*Consider a program A that knows its own source code. The algorithm of A is as follows: generate and check all possible proofs up to some huge size (3^^^^3). If A finds a proof that A returns 1, it returns 1. If the search runs to the end and fails, it returns 0. What will this program actually return?*

Wait, that was the easy version. Here's a harder puzzle:

*Consider programs A and B that both know their own, and each other's, source code. The algorithm of A is as follows: generate and check all proofs up to size 3^^^^3. If A finds a proof that A returns the same value as B, it returns 1. If the search fails, it returns 0. The algorithm of B is similar, but possibly with a different proof system and different length limit. What will A and B return?*

This second puzzle is a formalization of a Prisoner's Dilemma strategy proposed by Eliezer: "I cooperate if and only I expect you to cooperate if and only if I cooperate". So far we only knew how to make this strategy work by "reasoning from symmetry", also known as quining. But programs A and B can be very different - a human-created AI versus an extraterrestrial crystalloid AI. Will they cooperate?

I may have a tentative proof that the answer to the first problem is 1, and that in the second problem they will cooperate. But: a) it requires you to understand some logic (the diagonal lemma and Löb's Theorem), b) I'm not sure it's correct because I've only studied the subject for the last four days, c) this margin is too small to contain it. So I put it up here. I submit this post with the hope that, even though the proof is probably wrong or incomplete, the ideas may still be useful to the community, and someone else will discover the correct answers.

**Edit:** by request from Vladimir Nesov, I reposted the proofs to our wiki under my user page. Many thanks to all those who took the time to parse and check them.

## Comments (157)

BestCousin_it, do you have an argument that n1 ... n6 < nmax? It seems like one way for the proof to fail is if no matter what nmax is, step 11 forces n1 > nmax.

*3 points [-]Only a very rough one, I'm afraid. The number n1 is chosen to "contain" the blow-up from the short proof of 10 to the proof of 11, which grows with the computational complexities of n1,...,nmax, not their values. These computational complexities can be made way smaller than n1.

*4 points [-]Ok, I was checking if there was an obvious flaw that you might have missed. Now I'm trying to actually understand your proof, and I'm getting stuck on the first step, where you invoke the Diagonal Lemma. Now a typical statement of the Diagonal Lemma is:

Diagonal Lemma: If T is a theory in which diag is representable, then for any formula B(x) with exactly one free variable x there is a formula G such that G <=> B(g(G)) is a theorem in T. (g(G) is the Gödel number of G. This was copied from page 2 of http://www.cs.cornell.edu/courses/cs4860/2009sp/lec-23.pdf)

If you look at page 3 of that PDF, it gives an explicit formula for G, which happens to contain an existential quantifier. Now my question is, how do I translate that G into a code string (the one named "box" in your proof)? Also, is this supposed to be obvious, or are you assuming some background knowledge beyond the standard expositions of the Diagonal Lemma and Löb's Theorem?

*2 points [-]You can build "box" by quining, I think. Like this:

Where s contains the source code of the entire snippet. I'd kinda assumed that quining and the diagonal lemma were the same thing, I guess that was wrong.

*2 points [-]Quining is implied by the diagonal lemma, and but you can use the diagonal lemma like you've written. See this webpage on quines to understand the distinction. Wei Dai, it's easy to convert between code strings and Gödel numbers, at least in theory. I don't know about your particular example's numbering, because that link isn't working for me. You may want to find something specific to complexity theory, instead of math, which I suspect you (personally) would find more understandable. The quine page has a brief intro, which includes a proof of the fixed-point theorem.

In this case I think we have B(x) = implies(proves(x, n1), eval(outputs(1))) which is fine, but doesn't bound the size of BOX. However, cousin_it specifies a way to keep BOX small using quining, so this proof seems legit to me. Its conclusion is just as unintuitive as Lob's Theorem, but the steps all look all right.

Edit: For an explicit way to compute fixed points, see the Y combinator.

*1 point [-]You don't need that to be a program for the proof to go through.

Why is this true? I am trying to understand. An obvious way to prove that eval(box) returns true is to go through the execution step by step. But when I write down what box is, it seems to call proves(box, n1) and eval(outputs(1)), which take much more than n1 steps.

*1 point [-]n1 is chosen as to contain the

sizeof the "box" statement, not its execution time.*16 points [-]I notice a curious pattern in this post: technically illiterate critical comments get voted up to +1/+2 (even after getting debunked), while technically accurate arguments (replies) stay at 0/+1. Please vote up only things you know you understand, and do vote up comments that debunk incorrect arguments.

ETA: Mostly corrected now.*6 points [-]I wanted to say that yesterday, but chose instead to stay silent on the meta-issue, try as best I can to clarify the technical issues (though having to explain that there

areactual technical issues was a nasty surprise for me), and refrain from downvoting illiterate critiques because I could be biased.So thanks for saying what I wanted to say - it sounds better when someone else says it.

Actually, someone shot down my proposed my.C = (my.C == your.C) algorithm, because if two prisoners both defect, that is in a mathematical sense "cooperating if and only if you cooperate". So if the other prisoner is Defection Rock, then my.C = (my.C == your.C) has no consistent solution.

*2 points [-]My algorithm defects against the Defection Rock just fine :-)

Does your proof of that take more than 3^^^^3 steps? If not, then doesn't your algorithm see that it defects against Defection Rock, and therefore it cooperates, which is inconsistent? If it does, when did you do that proof?

In other words, it seems to me that played against Defection Rock, your algorithm freezes-- it doesn't output either 1 or 0.

*5 points [-]Here's my proof that A defects against Defection Rock: by assumption, A's proof checker is correct. If A cooperates against the Defection Rock, A must have arrived at a proof that A's choice is equal to B's choice. But A's choice is not equal to Rock's choice. Therefore A's proof checker is incorrect, contradiction, QED.

This proof doesn't take more than 3^^^^3 steps, but it doesn't matter. A can't use it anyway because it can't assume (or prove) that its proof checker is correct. Goedel's second theorem: if a formal system asserts or proves its own consistency, it is inconsistent. That's why all proof systems that people actually use

cannotprove their own consistency.*4 points [-]This comment, and another one ("No, A doesn't know that because it doesn't know B's proof checker is correct.") are buried deep down in the comment thread, but I think they deserve their own mentions in the main post, because they are extremely helpful for the inexperienced reader to understand what the real issues are here.

Nice separation of levels in this comment. It's essential to point out the difference between what can be done by the automated proof-checker and what can be shown 'from the outside' about the operation of the proof-checker.

I second DanielVarga: you should incorporate this discussion into the post itself, since it's

exactlythe part that people get stuck on.That proves that A can't cooperate, but it doesn't prove that it defects, since there remains the possibility that the program doesn't output anything.

*2 points [-]The program is guaranteed to finish because there's a finite number of proofs to check, and checking every proof takes finite time (a proof checker cannot loop forever on a given proof).

So with an argument like the one that A defects, I can prove that your mind is inconsistent:

"Cousin_it will never prove this statement to be true."

I know that this is true, and presumably you know it too... which is a contradiction.

*0 points [-]I don't know that it's true.

And I don't see how this makes the proof wrong.

It doesn't make the proof wrong.

*1 point [-]Oh, sorry then :-)

"Lucas cannot consistently assert this sentence" is an old philosophers' joke, actually.

The informal principle still can find a formalization, and this post gives witness to that.

*2 points [-]Edit: The following is not obviously possible, see this comment.This can be generalized to provide even better algorithms for more general games. Let the following be a simplified outline of the algorithm for cooperation problem, with bounds omitted (players are A and B, our player is A):

The proof that the outcome is "Cooperate" basically hinges on the provability of Löb statement

from which we conclude that A()==B() and so A()=="Cooperate". The statement can be constructed from

and so

Now, we can get basically the same proof if we use a simpler algorithm for the players:

This gets rid of the more general relation, but achieves basically the same result. We can improve on this by defecting against cooperating rocks:

This suggests a general scheme for construction of decision-making algorithms: for all possible actions a and b of players A and B, sort the pairs <a,b> by utility for A, starting from highest, and implement the Löb algorithm for each possibility in turn:

This plays against arbitrary opponents quite well.

*3 points [-]Sorry if I'm being naive, but if we amend the program that way then doesn't it lose the ability to prove the Löb statement implies(proves(P), P) where P = "A()=='Cooperate' && B()=='Cooperate'"? (Due to the unprovability of unprovability.)

*1 point [-]Can this problem be solved by adjusting maximal length of checked proofs such that unprovability of proves("A()=="Defect" && B()="Cooperate"", maxDC) is provable by proves(..., maxsteps)?

Yes, that occurred to me too, and I

thinkit does the job.I wonder if there are theorems of the form "There's no significantly better way to show that P isn't provable in N steps than by listing all proofs of length N and observing that P isn't proved". Then of course maxsteps would need to be exponential in maxDC.

*1 point [-]On the other hand the fact that proves("A()=="Cooperate" && B()=="Cooperate"") was called means that A()=="Defect" && B()=="Cooperate" is unprovable under maxDC proof length. The question is can this fact be used in proof. The proof checker can be programmed to add such self-evident propositions into set of axioms, but I can't clearly see consequences of this. And I have a bad feeling of mixing meta- and object levels.

*1 point [-]I'm not sure if your idea will work, but have some notation to help you avoid mixing the levels. Here's how you say "try to prove this fact conditional on some other fact":

*0 points [-]"P isn't provable in N steps" is in Co-NP. Since in general it's an open question whether P=Co-NP, there might be a polynomial time algorithm to solve it, or their might not, respectively.

On the other hand, if you 1) Showed some large class of particular problems (ones that can reduced to from [no typo there] nonequivalent SAT instances, which I suspect is most of them) weren't provable in N=cP steps IF it's unprovable, then 2) You would have shown a sufficiently large class of SAT problems weren't easily solvable in N=kP steps, 3) which would imply P!=NP (and also P!=Co-NP).

If on the other hand your whole class of problems is unprovable, then you can prove P isn't provable by your theorem, which is presumably smaller than N in this context.

So, in summary my guess interesting classes of P, your question is equivalent to "Does P=NP?", which we know is a tough nut to crack.

*0 points [-]This is true, and it's a bit disturbing that it took so long for someone to point that out. I feel that there should be a way out, but don't see what it is (and finding it might suggest interesting limitations).

*2 points [-]According to my experience, the study of game-playing programs has two independent components: fairness/bargaining and enforcement. If you have a good method of enforcement (quining, löbbing, Wei Dai's joint construction), you can implement almost any concept of fairness. Even though Freaky Fairness relied on quining, I feel you can port it to löbbing easily enough. But fairness in games with non-transferable utility is so difficult that it should really be studied separately, without thinking of a specific enforcement mechanism.

Bargaining is about logical uncertainty, something we avoid using sufficiently high upper bounds in this post.

*1 point [-]Suppose we have two possible actions X,Y and A prefers <X,Y> to <Y,X> to <X,X> to <Y,Y> and B prefers <Y,X> to <X,Y> to <X,X> to <Y,Y>. What will be the outcome if both of them use main4() with their respective orderings and the same proves() function?

*0 points [-]Probably <X,X>, which is not quite optimal. This doesn't solve the bargaining problem, which is exactly what your example is.

*0 points [-]Your idea is similar to preferential voting.

*3 points [-]In the first version, there are different ways to go about programming algorithm A, and some will output 1, and some will output 0. In other words, "if A finds a proof that A returns 1" is ambiguous, because it depends on the exact value of A, not the generic description given.

In the second version, both programs output 1, since this is the only possible consistent result. If either program might output a 0, then both programs would have to, which implies that both should have output 1.

In the first version, are you sure? I gave an argument that A will output 1, have you looked at it?

In the second version, you confuse truth and provability. If both programs output 0, that doesn't imply they can

provethat their output will be the same.*0 points [-]You don't specify the implementation of "proves(code, maxsteps)".

The output would appear to depend on that.

Of course, there's the whole business of universal heat death once again...

Nothing depends on the details of

proof verifier, since it completely covers all proofs up to some length, a set of proofs which is completely defined by the logical language. The arguments of proves(-,-) determine its value (and it's even a primitive recursive function).*1 point [-]On the other hand, pathological formal language can be constructed, that makes proof of "main()==1" arbitrary large (e.g. where 1 is represented by 3^^^3 symbols). Thus it is better to be proven that formal language isn't pathological, i.e there is no prover

proves'that satisfies condition*-2 points [-]...but the the "logical language" is not specified! That is one of the implementation details that is missing in the problem specification.

*7 points [-]You know, there are always unstated assumptions, like sanity of the reader.

*-1 points [-]Usually, when posing puzzles, it is best to try to minimise the number of unstated assumptions the solution depends upon - or there won't be a unique answer.Enumeration order of proofs is not specified, so we can construct pathological prover, that reorders proofs such that all proofs of "main()==1" require more steps than maxsteps.

*4 points [-]You can't influence the length of a proof by placing it at a different position in some list of proofs. Parameter 'maxsteps' specifies the maximum length (or Gödel number) of the proofs that get checked, not the maximum number of steps performed by the proof checker algorithm.

Ouch. maxsteps means maxprooflenght. My inattention, sorry.

*2 points [-]The argument is admittedly sketchy, but I don't think it depends much on the implementation of "proves". Of course I'd want the proof system to hold the same properties that are used in the proof of Löb's Theorem, because my argument is almost an exact replica of that; but pretty much all proof systems actually used by people (apart from some extremely weak ones) satisfy these properties. And you're guaranteed not to become worse off if you make your proof system stronger while your opponent stays the same. That's a really big deal, if my argument is indeed correct.

About heat death: even though the two programs say they can iterate up to 3^^^3 or something, the actual time to cooperation will be much less, because the reasoning steps that I outlined are exactly the reasoning steps they'd need to stumble upon. Yes, that would make a proof of length 10000 or so (which would mean 2^10000 proofs tried), but a that's long way from 3^^^3. If the programs use some kind of heuristics to search for working proofs (but still guarantee that they will do the brute force search if heuristics fail), the proofs can be found quicker still.

Re: "apart from some extremely weak ones"

It does depend on the implementation - because it is relatively easy to imagine poor provers that result in a failure to understand and prove anything much. So, you would need to specify some constraints on the prover before you can make statements about what it outputs.

*6 points [-]timtyler: I think you're confused about how proof verifiers work. A proof verifier takes a proof as input, and verifies that the given proof is correct. It doesn't have to generate the proof itself, and it gets to assume that each step is some small, easily-checkable application from a finite set of axioms.

Put another way, if the proof you give a verifier makes a large leap in reasoning, then the verifier will just tell you that the proof is wrong. Moreover, by the formalization that we demand of those proofs, the proof

iswrong.So, the problem that proof verifiers solve is computable. So, we need only to assume that the proof verifier that these programs have is correctly implemented, we don't really need to know their internal details.

(If you already understand this, though, then I'm confused about your confusion. Could happen.)

[edit:] Actually, I think I might see the point you're making; we have to assume the theory the prover understands is rich enough to model stepwise program semantics. This probably should have been specified, but we can just assume it.

It's a nice result, but I doubt it'll ever be practically relevant.

You could implement the algorithm today, although with a limit of less than 3^^^3. Rough guesstimates suggest that about a limit of 50 would be workable at home, 100 for a worldwide distributed computing effort and 500 for a universe-sized computer that does one operation per bit per plank time. In all cases, nowhere near 10000 (let alone 3^^^3).

Heuristics could help a bit, but making the algorithm more complex is likely to also make the proof that it works longer and so harder to find. The fallback to brute-force search is unhelpful because no computer that could actually exist would be able to find the proof that way.

*1 point [-]If program A uses heuristics, program B doesn't need to prove that these heuristics are correct. It only needs to see that A's use of heuristics makes A either output 1 or move on to the brute-force search.

Agreed with the rest of your comment. An easier way to reach the same conclusion: if program B is implemented as "always defect", program A has to run the brute-force search to the end, there's no shortcut (because having any shortcut in the code would make the proofs invalid if B were a copy of A).

I've looked at the argument, and it seems to me that Tim Tyler is right, the first case depends on the implementation of "proves (code, maxsteps)."

Regarding the second case, suppose A runs through checking for proofs, and doesn't find any. It then is about to output a 0. But in this case, it is impossible that B should output a 1, because then B would have a proof that A and B output the same thing, and this would be false. Therefore B also must output a 0. Therefore A and B output the same thing. So we should add a line of code to A to express this: "If A is about to output 0, then output 1, because it is proved (by the above reasoning) that A and B output the same thing." The same code should be included in B, and so both will output 1.

*1 point [-]Your proposed change will make A always output 1, and thus become a rather poor player in the Prisoner's Dilemma. You don't want to cooperate with everybody.

However, B will as well, and you do want to cooperate with everybody who cooperates with you, so my proposed change works very well.

Not if they will cooperate regardless of what you do. Then you can win more by defecting.

A knows B's source code, and so it knows that B will only output a 1 if A and B output the same thing, i.e. only if A and B both output 1.

*3 points [-]No, A doesn't know that because it doesn't know B's proof checker is correct.

You really have to be triple extra careful when talking about this stuff. I'm trying to.

Yes, I see that.

*0 points [-]I replied here.

I changed my mind anyway.

The most obvious implementation for the easy version will return 0 (or, if the length limit is lifted, fail to return at all).

It's important, however, to be clear on just what constitutes a valid proof. Consider the following implementation:

Not valid because it didn't find a proof, you say? On the contrary, assuming we are dealing with idealized mathematical computers that are perfectly deterministic and have no outside inputs, the fact that a program returns a given value on this run does indeed constitute a proof that it will do so on every run.

The point being that, while you

canuse concepts like proof to draw rigorous conclusions, in the face of counterintuitive things like self reference, you need some experience actually nailing things down before you can be sure you've closed all the loopholes.I'm curious, can the downvoters give any reasons for disagreeing with me?

*5 points [-]I did not downvote (you were at -3 and -2 when I wrote this), but I think the there are many possible reasons to downvote you depending on the voter's implicit or explicit theory of voting.

This might be seen as a pure "puzzle" post, and you aren't offering a puzzle solution that takes the question at face value. Your program doesn't do anything quine-like (like put its own source code into a input variable), nor does it have any logic engine (even by implication).

This is the kind of solution evolution might work out in very certain specific environments where it happened to "just work", but if you put this code into a place where it actually had to "work for a living" by analyzing other programs using roughly the same routines it uses to analyze itself and it gained or lost by modulation of its behavior with them (and some programs were suckers, some were cheaters, and some could be implicitly bargained with, so there were real opportunities to gain or lose) this code would not give you "optimal play ignoring computation costs" (it is either a sucker or a cheater, depending on what "1" means and in some contexts either of these strategies will lose).

Maybe another way of putting the criticism is that your "clever trick" is in noticing that the problem can sort of be solved

in practiceby collapsing down to havenoextra levels of self awareness. But for theoretical robustness inothercircumstances it would be preferable to have a system that scales to infinity, so that agents capable of modeling other agent's models of their own attempts to model other agents can still be "trusted by bad modelers". (Remember that part of the unspoken motivation here is to build skills towards building a verifiably friendly AGI.)Also, the people here are all super smart, so telling them "just be dumber and people will trust you" isn't an answer they want to hear, and saying this (even indirectly) could

maybebe seen as an attempt to troll, or to drag people into a topic area that is too difficult to discus, given LW's current sanity waterline (which is the only reason Ipersonallyvote things below zero).Another possibility is that people don't see the reversed Tortoise and Achilles joke and just think (1) you said something dumb and (2) they should "punish dumbness" in order to maintain LW's purity (IE the signal to noise ratio).

The more reasons to downvote you expose, the more chance you have to accumulate a really low score, and that comment exposed several.

Thanks for the analysis. I'll try to clarify those points, then.

Of course, I'm not suggesting my answer as a solution to the prisoner's dilemma. I actually don't think this kind of mathematical analysis is even relevant to the prisoner's dilemma in practice -- I think the latter is best dealt with by pre-commitments, reputation tracking etc. to change the payoff matrix and in particular change the assumption that the interaction is one-off -- which it typically isn't anyway. (Note that in cases involving actual criminals, we need witness protection programs to make it even approximately one-off.)

As for the logic engine, hey, I just unrolled the search loop then deleted 3^^^^3 - 1 lines of unreachable code. No different than optimizing compilers do every day :-)

If we look at the logic puzzle aspect though, this is in similar vein to

this statement is true(the reverse of the classicthis statement is falseliar paradox -- it has two consistent solutions instead of none -- note thatreturn 0is just as valid a solution asreturn 1).And if you ask what sort of proof system permits that, you start investigating higher-order logic, and self-reference, and the classic paradoxes thereof, and type theory and other possible solutions thereto, which are topics that are of very real significance to software verification and other branches of AI.

In summary, I think this puzzle goes places interesting enough that it's actually worth taking apart and analyzing.

*4 points [-]Actually,

this statement is provable. This is the "Henkin sentence" and it is indeed provable in Peano arithmetic, so no need for stronger things like type theory etc. My proof in 1 is basically a translation of the proofs of the Henkin sentence and Löb's theorem (which is used to prove the former).Yes, in that sense you're right, this is isomorphic to the Henkin sentence, which is provable in first-order Peano arithmetic,

ifyou set everything up just right in terms of precisely chosen axioms and their correspondence to the original code. (Note that there is no guarantee of such correspondence --return 0is, after all, also a correct solution to the problem.)But the metaproof in which you showed this, and the sketch you gave of how you would go about implementing it, are

notwritten in first-order PA. And by the time you have actually written out all the axioms, implemented the proof checker etc., it will be apparent that almost all the real work is not being done on the computer, but in your brain, using methods far more powerful than first-order PA.If we don't regard this as a problem, then we should be willing to accept

return 1as a solution. But if we do regard this as a problem (and I think we should; apart from anything else, it means nearly all problems that would benefit from formal solution, don't actually receive such, because there just isn't that much skilled human labor to go around -- this problem is itself an example of that; you didn't have time to more than sketch the implementation) then we have to climb the ladder of more powerful formal techniques.And once again I will ask the downvoters, which part of my comment do you disagree with?

I upvoted the grandparent and parent, because what you said seems right to me. I wish people wouldn't downvote someone asking why e was downvoted.

Upvoted entire ancestor tree, for similar reasons.

*1 point [-]I don't understand this. Do you mean the proof checker fails to check anything? Wouldn't that be a

wrongimplementation, rather than an obvious one?No, I mean the obvious implementation would have the program basically asking "what does this program return? Well, it tries to find out what this program returns. What does this program return? Well, it tries to find out..." until it hits the length limit and comes back with no solution found (or, if the length limit is lifted, keeps going forever). This is true even if the proof

checkeralways returns quickly.*5 points [-]This is absolutely not the obvious implementation of the algorithm in the post. My program does not "try to find out" something. It mechanically checks all proofs up to some maximum length. You can't just replace a piece of code with another piece that has a vaguely similar intent, but fails to halt!

The 'fails to halt' part in either case would be only if the length limit were removed. But you are right that mechanically checking all proofs up to a maximum length is a mathematically valid implementation and can only be replaced with another piece of code that is guaranteed to give an identical result.

However, that implementation can still be reasonably described as trying to find out something. The intent can be seen not in the simple loop that iterates through all proofs up to a certain length, but in the choice of logic and axioms to encode the problem. The proofs generated from suitably chosen axioms are isomorphic to execution paths of an appropriately written program (as whpearson observed, a way to see this is to consider the Curry-Howard correspondence).

This isn't the same as the calculator that asks itself what value it will return, because there's a subtle separation of levels here.

The Curry-Howard correspondence is what you are referring to right?

*0 points [-]That program does not prove its return value under the Curry-Howard Correspondence without a lot of footwork (see first caveat). Under the CH interpretation, programs are proofs that their types are inhabited, not that they can produce their return values. Thus, the canonical thing one could say the above program proves is "int".

Some caveats. First, one could imagine a sufficiently complex type system where there is a type of all integers which are 1. Then, one could say that the above program is a proof of the type "type-of-ints-that-are-1". However, when adding this much specificity to one's type system, one has to be careful that the resulting system can decidably type terms; adding richness often leads to undecidability.

Second, there is simple constructive proof that the the above program produces 1. Just run it and see; if it returns 1 in finite time, that's a proof.

That is certainly one way to look at it.

Another way to look at it is to consider purely computational proofs, such as the four color theorem, or the solutions to checkers and 5 x 5 Go.

*0 points [-]And interestingly, posters here only started criticizing this game-theoretic reasoning when I used it. c=/

Why?

*3 points [-]Your original formulation did not include the "I expect you to" part.

*-1 points [-]Without the very long limit you've added, the easy problem becomes formally undecidable as it leads to an infinite regress. In order to return 1, the program has to determine whether it will return 1, and in order to do that it has to determine whether it will return 1, and in order to do that it has to determine whether it will return 1, and..... You'll use up your 3^^...^^3 iterations and return 0.

It's a bit like this problem. http://en.wikipedia.org/wiki/Halting_problem

*2 points [-]Without the limit, the first program would return 1. My proof keeps working almost without modifications. The only change will be that main() will use a while loop instead of an if, which is only used in deriving (2) and doesn't change its validity. Your argument doesn't show that the program will never find a proof - it only shows that it can never find the

specificproof that you outlined, because it leads to infinite regress or whatever. It will just find another proof instead.OK. Let's suppose you successfully write an algorithm a() that can do everything that you said.

Now let's make some changes. Suppose you were, in the course of your search of proof space, to come across a proof that a() returned 0? You could simply return 0 at that point rather than continuing to search for proofs that a() returned 1. So let's make that change.

Now comes the fun bit. Change your algorithm a() such that it returns 1 when it hits the iteration limit instead of 0. Not that hard.

Now comes the really fun bit. Invert the output of this algorithm. Let's call it b(). This function now has the interesting property that it returns 1 when a() returns 0, and vice versa - except in the case where no proofs can be found of a value for a() short of the termination condition, where both now return 0, It's also interesting that it has exactly the same definition as a() does....

*3 points [-]What Wei Dai said. The first change you propose breaks A's ability to obtain step 2 by simple reasoning about itself. Originally I came up with a different problem: systematically search all proofs up to a large length, and if we find a proof that we output some number, then output that number. That problem is way more crazy than mine, I don't know what the answer is, it might actually depend on the proof checker's implementation.

If you make this change, then step 2 of cousin_it's proof won't go through anymore, I think.

So do you also think that this statement is true: "This statement is true if and only if it can be proven to be true"?

*3 points [-]See this PDF, paragraph 44.3. It constructs exactly the statement you gave (Henkin's statement) and proves it's provable.

Ok, you've convinced me. Saying that this statement is true is basically the same as saying program A outputs 1.

*0 points [-]Yes, exactly. The proof is a little fussy because it has to keep track of bounded proof lengths everywhere, but it works basically the same.

Does it help you understand the second problem better? It uses the same trick, but with a twist. (You need to get creative when you talk about two proof systems instead of one.)

I still agree with what I said originally about the second problem. You could compare it to these two statements:

A) A is true if and only if A and B are either both provable, or both non-provable. B) B is true if and only if B and A are either both provable, or both non-provable.

It is much more obvious that both of these statements are necessarily true and provable (by symmetry), than it is that "this is true if and only if it is provable" is true or provable. This is why at first I only accepted the second case.

*0 points [-]Actually, after thinking about it more, the statement is obviously provable, without the long derivation.

If "this statement is true iff it is provable" is false, then it naturally cannot be provable. But then it is both false and not provable, which logically implies that it is true iff provable. Therefore it is both true and provable.

So the first program has to output 1 for the same reason.

*3 points [-]When you have a self-referential sentence, it does no good to prove it in the "real world", because the commonsense logic we use in the "real world" becomes inconsistent when applied to self-referential statements. ("This statement is false.") You really need a proof of your statement within the formal system. And within the system, the proof you gave doesn't work: the system cannot prove that falsity improves non-provability, because that would be the system assuming its own consistency all over again. So the long derivation is needed after all.

I agree that the system can't use this argument, but that doesn't mean the argument doesn't work, just like your argument for why your algorithm must defect against a program that always defects.

The difference between my argument and "this statement is false" is that by directly depending on its own truth, "this statement is false" leaves its truth or falsity insufficiently defined. But "this statement is true iff it is provable" depends not directly on its truth but on provability, which is already defined. So I think that my "real world" argument does establish that the statement will be true also within a formal system, even though a formal system cannot establish it with this argument.

*1 point [-]Um, "this statement is true iff it is provable" directly depends on

bothtruth and provability. It does refer to its own "platonic" truth value in the "real world".It refers to its truth value but not in a paradox generating way.

*1 point [-]The generalized action idea, applied to this setting, works as follows. Agents A and B can surrender control over their output to procedures they don't fully understand, but expect to produce good results. That they expect good result is reflected, as in previous construction, in the order in which they check for the possibility:

Here, the programs might be unable to prove anything specific about the output of Foo() and Bar(), but can still use the trick to coordinate their simultaneous invocation. This is also one way to look at how to solve the coordination problem.

Note that Foo() and Bar() might as well implement the algorithms similar to what we have in main(), so choosing them is a

high-leveldecision, while the low-level decisions still remain to be madeby them. Deciding hierarchically makes individual proofs, and individual decisions, easier to perform, so that you don't need a ridiculously big upper bound on which proofs to consider. That we can forget how the hierarchical code is written (say, by inlining most functions) also shows the potential for finding similarly simplifying patterns in the more monolithic code.This is awesome - great post!

I've scanned briefly through the comments, and didn't see anyone ask this, but apologies if I missed it:

You use the diagonal lemma to establish that there is a piece of code called "box" such that the following is provable:

eval(box) == implies( proves(box, n1), eval(outputs(1)))

But couldn't there likewise be a "box2" such that:

eval(box2) == implies( proves(box2, n1), eval(outputs(2)))

And then couldn't you patch up the proof somehow to show that program has to output 2? Specifically, to get the analogue of step 2, can't you prove that proves(outputs(2), n) implies the negation of proves(outputs(1), n), and hence prove that proves(outputs(2), n) implies outputs(2)?

Even if this is correct, it only applies to the program you use in your proof, where if proves(outputs(2), n) then 2 is returned; whereas the puzzle above doesn't allow for 2 to be returned at all. However, if I'm right then presumably something must be going wrong in the proof?

*4 points [-]To prove that, the proof system would need to know its own consistency, because an inconsistent proof system could prove

bothof these statements and indeed any statement at all. But a proof system cannot know its own consistency without actually being inconsistent (Goedel's second theorem).*1 point [-]For a finite set of proofs, that a given statement is not provable by a proof from that set, is provable, possibly by a proof from that set.

*2 points [-]Technically true. You could always just evaluate the two statements and there you have your proof, with length exponential in n. I very much doubt the proof could be made shorter than n though, because the statement is equivalent to "you can't find a contradiction in these axioms using less than n steps", the difficulty of which should grow way faster than linearly in n.

Aha - very cunning.

Does the obvious Lobian proof for 1 still go through if there's a bound on the proof length built into all the sentences?

*1 point [-]In the linked text file I tried to do just that, carefully tracking proof lengths and making sure the bounds stay bounded. But this could really use some checking.

The second problem uses the same trick, of course: "if we both prove that we output the same thing, then we output the same thing", then move ahead using "we both prove" instead of "I prove" everywhere.

*0 points [-]Edit: Wow, I really am an idiot. I assumed the second statement was true about every statement, but I just realized (by reading

oneextra comment after posting) that by Lob's Theorem that's not true. But my original idea, that the first statement is all that's required to prove anything, still seems to hold.Okay, I can follow the first proof when I assume statement 1, but I don't quite understand how cousin_it got to 1. The Diagonal Lemma requires a free variable inside the formula, but I can't seem to find it.

In fact, I think I totally misunderstand the Diagonal Lemma, because it seems to me that you could use it to prove anything. If you replace "outputs(1)" by "2==3", the proof still seems to hold. Statement 2 would still be true with "2==3" (it is true about any statement, after all), and all the logic that follows from those two statements would be true. In fact, by an unclearly written chain of reasoning which I originally intended to post before realizing that it would be much simpler to just say this, all you seem to require is the first statement to be able to prove anything. If I am mistaken, which is probable, then I expect my error lies in the assumption that "outputs(1)" could be replaced by any string of code.

For my original unclear explanation of why the first statement in the proof seems to allow anything to be proven, in case anyone cares for it:

Anyone care to point me to my mistake (or, to satisfy my wildest of dreams, say that I appear to be right ;) )?

*0 points [-]I don't understand how the first statement can be used to prove anything...

The second statement might be true for every statement, but it's not necessarily provable for every statement, which is required in the proof. In fact, the second statement is provable for "outputs(1)" by inspection of the program (because the program searches for proofs of "outputs(1)"), but not provable for "2==3" (because then "2==3" would be true, by Lob's theorem).

I'm sorry, my comment grew into a mess, I should have cleaned it up a bit before posting. Anyway, I agree fully about the second statement only applying to this program, that's what I realized in the edit.

But for the first statement, I'll try to be a bit more clear.

My first claim is that "eval(box) == implies(proves(box, n1), eval('2==3'))" is a true statement, proven by the Diagonal Lemma. I'll refer to it as "statement 1", or "the first statement".

If "eval(box)" returns false, then for the first statement to be true, "implies(proves(box, n1), eval('2==3'))" must return false. "implies" only returns false if "proves(box, n1)" is true and "eval('2==3')" is false. Therefore for statement 1 to be true when "eval(box)" is false, then "proves(box, n1)" has to be true, which is a contradiction: "eval(box)" can't be false and also provably true. Therefore, "eval(box)" must be true.

So let's say "eval(box)" is true, that means that "implies(proves(box, n1), eval('2==3'))" must also return true for statement 1 to be true. One way for the "implies" statement to return true is for "proves(box, n1)" to return false. Since I have proven above that "eval(box)" is true, any good definition of the "proves" function will also return true, because if it finds no other, it will at least find my proof. Therefore, "proves(box, n1)" will return true.

So there is only one other way for the "implies" statement to return true: "eval('2==3')" must return true. Therefore, "eval('2==3')" returns true, and it follows from this that 2=3.

So, where did I go wrong?

*2 points [-]Your proof of eval(box) relied on the fact that eval(box) can't be false and also provably true. But that fact is equivalent to consistency of the formal theory, so it can't be proven within the formal theory itself, by Godel's theorem. Of course, since eval(box) is a terminating computation that returns true (your proof of that is correct), the formal theory can eventually prove that by step-by-step simulation. But that proof will be much longer than your proof above, and much longer than n1. In fact, proves(box,n1) will return false, and your comment serves as a nice proof of that by reductio.

Don't get discouraged, it's a very common mistake =)

I think it's possible to coordinate without the huge computational expense, if the programs would directly provide their proofs to each other. Then each of them would only need to check a proof, not find it.

The input to the programs would be a pair (opponent's source, opponent's proof), and the output would be the decision: cooperate or defect.

The algorithm for A would be: Check the B's proof, which must prove that B would cooperate if the proof it gets in its input checks out. If the proof checks out - cooperate, otherwise - defect.

*0 points [-]If A needs to give B the proof of some theorem that involves both A and B (as in the post), how does it obtain such a proof quickly without knowing the source code of B in advance? And if it's a proof of some theorem involving only A, then what is that theorem and how does the whole setup work?

Let ChecksOut(

code,proof, X) = true iffproofis a proof thatcodeis a function of two parametersPCodeandPProof, which returns "C" if ChecksOut(PCode,PProof).Def A(

code,proof): if(ChecksOut(code,proof)) return "C", otherwise return "D".The definition of ChecksOut is kinda circular, but it should be fixable with some kind of diagonalization. Like:

SeedChecksOut(

code,proof, X) = true iffproofis a proof thatcodeis a function of two parametersPCodeandPProof, which returns "C" if eval(X(PCode,PProof)),ChecksOut(

code,proof) = SeedChecksOut(code,proof, #SeedChecksOut)*0 points [-]Hmm, that's pretty cool, thanks! But it's still not obvious to me that it solves the same problem as the algorithm in the post. There are probably many mutually incompatible implementations of ChecksOut, because there are many ways to diagonalize that differ only syntactically. Do you have an argument why they will end up cooperating with each other? It seems to me that they won't, just like different implementations of quining cooperation.

Hmm, yes, you're right, it's quining cooperation all over again - as defined, the programs would only cooperate with opponents that are sufficiently similar syntactically. And they need compatible proof systems in any case. So it's not a solution to the same problem, only a possible optimization.

*0 points [-]I believe Program A in "the easy version" would return 0. Assuming zero run-time errors, its structure implements the logical structure:

nsteps, then A returns 1.nsteps, then A returns 0.However

nis defined (the post proposesn= 3^^^^3), it can be shown by the definition of the word "proof" that:nsteps, then A returns 1.and therefore the first proposition holds for every program, and cannot be used to show that A returns 1.

However, the second proposition

alsocannot be used to show that A returns 1. If the given condition holds, A does not return 1; if the given condition does not hold, the second proposition demonstrates nothing.Therefore no property of the program can be used to demonstrate that the program must return 1. Therefore no proof can demonstrate that the program must return 1. Therefore the program will find no proof that the program returns 1. Therefore the program will return 0.

Q.E.D.

IIRC, the modification of Gödel's statement which instead has the interpretation "I

canbe proved in this formal system" is called a Henkin sentence, and does in fact have a finite proof in that system. This seems weird in the intuitive sense you're talking about, but it's actually the case.*1 point [-]Yep. The Henkin sentence has already come up multiple times in the comments here. Understanding the proof of the Henkin sentence takes you about 95% of the way to understanding my original argument, I think.

*1 point [-]...Nope.

This logic stuff is pretty tricky, but I'll try to explain how

exactlythe distinction between truth and provability matters in this problem. It was quite an epiphany when it hit me.The first proposition indeed holds for every program. But it is

notwhat is used to show that A returns 1. What is really used is this: A can prove that the first proposition holds for A by looking at the source code of A. This is definitely not the case for every program! (Provability-of-provability does not imply provability-of-truth. This is the whole point of Löb's Theorem.) A's proof of the first proposition relies on how the code of A looks - specifically, that it says explicitly "if provable then return 1". This proof would break if the structure of A were changed. For example, if A were searching forboth proofs and disproofsthat it would output 1, it wouldn't be able to prove the first proposition quite so easily, and maybe not at all.I'm still confused, sadly.

Can you follow Eliezer's cartoon proof of Löb's Theorem, linked from the post? It's kind of a prerequisite to understand what's happening here.

I believe follow Eliezer's proof - should I look over your proof again?

*0 points [-]Well, my proof is basically a rewrite of Eliezer's proof in another notation and with tracking of proof lengths. The exact same steps in the exact same sequence. Even the name of the "box" statement is taken from there. So you can try, yeah.