AI cooperation in practice

cousin_it

AI cooperation in practice — LessWrong

45 AI cooperation in practice

by cousin_it

30th Jul 2010

1 min read

166

45

You know that automated proof verifiers exist, right? And also that programs can know their own source code? Well, here's a puzzle for you:

Consider a program A that knows its own source code. The algorithm of A is as follows: generate and check all possible proofs up to some huge size (3^^^^3). If A finds a proof that A returns 1, it returns 1. If the search runs to the end and fails, it returns 0. What will this program actually return?

Wait, that was the easy version. Here's a harder puzzle:

Consider programs A and B that both know their own, and each other's, source code. The algorithm of A is as follows: generate and check all proofs up to size 3^^^^3. If A finds a proof that A returns the same value as B, it returns 1. If the search fails, it returns 0. The algorithm of B is similar, but possibly with a different proof system and different length limit. What will A and B return?

This second puzzle is a formalization of a Prisoner's Dilemma strategy proposed by Eliezer: "I cooperate if and only I expect you to cooperate if and only if I cooperate". So far we only knew how to make this strategy work by "reasoning from symmetry", also known as quining. But programs A and B can be very different - a human-created AI versus an extraterrestrial crystalloid AI. Will they cooperate?

I may have a tentative proof that the answer to the first problem is 1, and that in the second problem they will cooperate. But: a) it requires you to understand some logic (the diagonal lemma and Löb's Theorem), b) I'm not sure it's correct because I've only studied the subject for the last four days, c) this margin is too small to contain it. So I put it up here. I submit this post with the hope that, even though the proof is probably wrong or incomplete, the ideas may still be useful to the community, and someone else will discover the correct answers.

Edit: by request from Vladimir Nesov, I reposted the proofs to our wiki under my user page. Many thanks to all those who took the time to parse and check them.

Decision theoryAI

Personal Blog

45

Mentioned in

122Robust Cooperation in the Prisoner's Dilemma

47I'm open for projects (sort of)

36Decision Theories: A Semi-Formal Analysis, Part III

35Formulas of arithmetic that behave like decision agents

Load More (5/10)

AI cooperation in practice

New Comment

166 comments, sorted by

top scoring

Click to highlight new comments since: Today at 5:21 PM

Some comments are truncated due to high volume. (⌘F to expand all)Change truncation settings

[-]Vladimir_Nesov16y220

I notice a curious pattern in this post: technically illiterate critical comments get voted up to +1/+2 (even after getting debunked), while technically accurate arguments (replies) stay at 0/+1. Please vote up only things you know you understand, and do vote up comments that debunk incorrect arguments.

ETA: Mostly corrected now.

9cousin_it16y

I wanted to say that yesterday, but chose instead to stay silent on the meta-issue, try as best I can to clarify the technical issues (though having to explain that there are actual technical issues was a nasty surprise for me), and refrain from downvoting illiterate critiques because I could be biased. So thanks for saying what I wanted to say - it sounds better when someone else says it.

[-]Wei Dai16y90

Cousin_it, do you have an argument that n1 ... n6 < nmax? It seems like one way for the proof to fail is if no matter what nmax is, step 11 forces n1 > nmax.

5cousin_it16y

Only a very rough one, I'm afraid. The number n1 is chosen to "contain" the blow-up from the short proof of 10 to the proof of 11, which grows with the computational complexities of n1,...,nmax, not their values. These computational complexities can be made way smaller than n1.

4Wei Dai16y

Ok, I was checking if there was an obvious flaw that you might have missed. Now I'm trying to actually understand your proof, and I'm getting stuck on the first step, where you invoke the Diagonal Lemma. Now a typical statement of the Diagonal Lemma is: Diagonal Lemma: If T is a theory in which diag is representable, then for any formula B(x) with exactly one free variable x there is a formula G such that G <=> B(g(G)) is a theorem in T. (g(G) is the Gödel number of G. This was copied from page 2 of http://www.cs.cornell.edu/courses/cs4860/2009sp/lec-23.pdf) If you look at page 3 of that PDF, it gives an explicit formula for G, which happens to contain an existential quantifier. Now my question is, how do I translate that G into a code string (the one named "box" in your proof)? Also, is this supposed to be obvious, or are you assuming some background knowledge beyond the standard expositions of the Diagonal Lemma and Löb's Theorem?

3cousin_it16y

You can build "box" by quining, I think. Like this: var s = "..."; return implies( proves(s, n1), eval(outputs(1))); Where s contains the source code of the entire snippet. I'd kinda assumed that quining and the diagonal lemma were the same thing, I guess that was wrong.

4apophenia16y

Quining is implied by the diagonal lemma, and but you can use the diagonal lemma like you've written. See this webpage on quines to understand the distinction. Wei Dai, it's easy to convert between code strings and Gödel numbers, at least in theory. I don't know about your particular example's numbering, because that link isn't working for me. You may want to find something specific to complexity theory, instead of math, which I suspect you (personally) would find more understandable. The quine page has a brief intro, which includes a proof of the fixed-point theorem. In this case I think we have B(x) = implies(proves(x, n1), eval(outputs(1))) which is fine, but doesn't bound the size of BOX. However, cousin_it specifies a way to keep BOX small using quining, so this proof seems legit to me. Its conclusion is just as unintuitive as Lob's Theorem, but the steps all look all right. Edit: For an explicit way to compute fixed points, see the Y combinator.

2Vladimir_Nesov16y

You don't need that to be a program for the proof to go through.

0Nisan16y

Why is this true? I am trying to understand. An obvious way to prove that eval(box) returns true is to go through the execution step by step. But when I write down what box is, it seems to call proves(box, n1) and eval(outputs(1)), which take much more than n1 steps.

1cousin_it16y

n1 is chosen as to contain the size of the "box" statement, not its execution time.

[-]Eliezer Yudkowsky16y30

Actually, someone shot down my proposed my.C = (my.C == your.C) algorithm, because if two prisoners both defect, that is in a mathematical sense "cooperating if and only if you cooperate". So if the other prisoner is Defection Rock, then my.C = (my.C == your.C) has no consistent solution.

3cousin_it16y

My algorithm defects against the Defection Rock just fine :-)

0Unknowns16y

Does your proof of that take more than 3^^^^3 steps? If not, then doesn't your algorithm see that it defects against Defection Rock, and therefore it cooperates, which is inconsistent? If it does, when did you do that proof? In other words, it seems to me that played against Defection Rock, your algorithm freezes-- it doesn't output either 1 or 0.

8cousin_it16y

Here's my proof that A defects against Defection Rock: by assumption, A's proof checker is correct. If A cooperates against the Defection Rock, A must have arrived at a proof that A's choice is equal to B's choice. But A's choice is not equal to Rock's choice. Therefore A's proof checker is incorrect, contradiction, QED. This proof doesn't take more than 3^^^^3 steps, but it doesn't matter. A can't use it anyway because it can't assume (or prove) that its proof checker is correct. Goedel's second theorem: if a formal system asserts or proves its own consistency, it is inconsistent. That's why all proof systems that people actually use cannot prove their own consistency.

8DanielVarga16y

This comment, and another one ("No, A doesn't know that because it doesn't know B's proof checker is correct.") are buried deep down in the comment thread, but I think they deserve their own mentions in the main post, because they are extremely helpful for the inexperienced reader to understand what the real issues are here.

1orthonormal16y

Nice separation of levels in this comment. It's essential to point out the difference between what can be done by the automated proof-checker and what can be shown 'from the outside' about the operation of the proof-checker. I second DanielVarga: you should incorporate this discussion into the post itself, since it's exactly the part that people get stuck on.

0Unknowns16y

That proves that A can't cooperate, but it doesn't prove that it defects, since there remains the possibility that the program doesn't output anything.

5cousin_it16y

The program is guaranteed to finish because there's a finite number of proofs to check, and checking every proof takes finite time (a proof checker cannot loop forever on a given proof).

0Unknowns16y

So with an argument like the one that A defects, I can prove that your mind is inconsistent: "Cousin_it will never prove this statement to be true." I know that this is true, and presumably you know it too... which is a contradiction.

0cousin_it16y

I don't know that it's true. And I don't see how this makes the proof wrong.

0Unknowns16y

It doesn't make the proof wrong.

1cousin_it16y

Oh, sorry then :-) "Lucas cannot consistently assert this sentence" is an old philosophers' joke, actually.

2LucasSloan16y

Yes, I can.

2Vladimir_Nesov16y

The informal principle still can find a formalization, and this post gives witness to that.

[-]Unknowns16y30

In the first version, there are different ways to go about programming algorithm A, and some will output 1, and some will output 0. In other words, "if A finds a proof that A returns 1" is ambiguous, because it depends on the exact value of A, not the generic description given.

In the second version, both programs output 1, since this is the only possible consistent result. If either program might output a 0, then both programs would have to, which implies that both should have output 1.

4cousin_it16y

In the first version, are you sure? I gave an argument that A will output 1, have you looked at it? In the second version, you confuse truth and provability. If both programs output 0, that doesn't imply they can prove that their output will be the same.

0Unknowns16y

I've looked at the argument, and it seems to me that Tim Tyler is right, the first case depends on the implementation of "proves (code, maxsteps)." Regarding the second case, suppose A runs through checking for proofs, and doesn't find any. It then is about to output a 0. But in this case, it is impossible that B should output a 1, because then B would have a proof that A and B output the same thing, and this would be false. Therefore B also must output a 0. Therefore A and B output the same thing. So we should add a line of code to A to express this: "If A is about to output 0, then output 1, because it is proved (by the above reasoning) that A and B output the same thing." The same code should be included in B, and so both will output 1.

2cousin_it16y

Your proposed change will make A always output 1, and thus become a rather poor player in the Prisoner's Dilemma. You don't want to cooperate with everybody.

0Unknowns16y

However, B will as well, and you do want to cooperate with everybody who cooperates with you, so my proposed change works very well.

5JGWeissman16y

Not if they will cooperate regardless of what you do. Then you can win more by defecting.

0Unknowns16y

A knows B's source code, and so it knows that B will only output a 1 if A and B output the same thing, i.e. only if A and B both output 1.

5cousin_it16y

No, A doesn't know that because it doesn't know B's proof checker is correct. You really have to be triple extra careful when talking about this stuff. I'm trying to.

0Unknowns16y

Yes, I see that.

0[anonymous]16y

Nope, this is another case of confusing truth with provability. A doesn't know B's proof checker is correct, so I don't think it can make that inference.

0[anonymous]16y

Actually, no. You confuse truth and provability for the second (third?) time. A doesn't have a proof that B's proof checker is correct, so it cannot deduce what you just deduced. Please please please, just try to parse any technical material associated with the topic. People have learned to avoid this particular pitfall decades ago. We have tools for that. I am using them heavily here.

0Vladimir_Nesov16y

I replied here.

2Unknowns16y

I changed my mind anyway.

0[anonymous]16y

Nothing depends on the details of proof verifier, since it completely covers all proofs up to some length, a set of proofs which is completely defined by the logical language. The arguments of proves(-,-) determine its value (and it's even a primitive recursive function).

-2timtyler16y

You don't specify the implementation of "proves(code, maxsteps)". The output would appear to depend on that. Of course, there's the whole business of universal heat death once again...

5cousin_it16y

The argument is admittedly sketchy, but I don't think it depends much on the implementation of "proves". Of course I'd want the proof system to hold the same properties that are used in the proof of Löb's Theorem, because my argument is almost an exact replica of that; but pretty much all proof systems actually used by people (apart from some extremely weak ones) satisfy these properties. And you're guaranteed not to become worse off if you make your proof system stronger while your opponent stays the same. That's a really big deal, if my argument is indeed correct. About heat death: even though the two programs say they can iterate up to 3^^^3 or something, the actual time to cooperation will be much less, because the reasoning steps that I outlined are exactly the reasoning steps they'd need to stumble upon. Yes, that would make a proof of length 10000 or so (which would mean 2^10000 proofs tried), but a that's long way from 3^^^3. If the programs use some kind of heuristics to search for working proofs (but still guarantee that they will do the brute force search if heuristics fail), the proofs can be found quicker still.

0[anonymous]16y

It's a nice result, but I doubt it'll ever be practically relevant. You could implement the algorithm today, although with a limit of less than 3^^^3. Rough guesstimates suggest that about a limit of 50 would be workable at home, 100 for a worldwide distributed computing effort and 500 for a universe-sized computer that does one operation per bit per plank time. In all cases, nowhere near 10000 (let alone 3^^^3). Heuristics could help a bit, but making the algorithm more complex is likely to also make the proof that it works longer and so harder to find. The fallback to brute-force search is unhelpful because no computer that could actually exist would be able to find the proof that way.

3cousin_it16y

If program A uses heuristics, program B doesn't need to prove that these heuristics are correct. It only needs to see that A's use of heuristics makes A either output 1 or move on to the brute-force search. Agreed with the rest of your comment. An easier way to reach the same conclusion: if program B is implemented as "always defect", program A has to run the brute-force search to the end, there's no shortcut (because having any shortcut in the code would make the proofs invalid if B were a copy of A).

0timtyler16y

Re: "apart from some extremely weak ones" It does depend on the implementation - because it is relatively easy to imagine poor provers that result in a failure to understand and prove anything much. So, you would need to specify some constraints on the prover before you can make statements about what it outputs.

[-]fiddlemath16y100

timtyler: I think you're confused about how proof verifiers work. A proof verifier takes a proof as input, and verifies that the given proof is correct. It doesn't have to generate the proof itself, and it gets to assume that each step is some small, easily-checkable application from a finite set of axioms.

Put another way, if the proof you give a verifier makes a large leap in reasoning, then the verifier will just tell you that the proof is wrong. Moreover, by the formalization that we demand of those proofs, the proof is wrong.

So, the problem that proof verifiers solve is computable. So, we need only to assume that the proof verifier that these programs have is correctly implemented, we don't really need to know their internal details.

(If you already understand this, though, then I'm confused about your confusion. Could happen.)

[edit:] Actually, I think I might see the point you're making; we have to assume the theory the prover understands is rich enough to model stepwise program semantics. This probably should have been specified, but we can just assume it.

-10timtyler16y

4Vladimir_Nesov16y

2red7516y

On the other hand, pathological formal language can be constructed, that makes proof of "main()==1" arbitrary large (e.g. where 1 is represented by 3^^^3 symbols). Thus it is better to be proven that formal language isn't pathological, i.e there is no prover proves' that satisfies condition \exists n1 \exists n2 \forall code (n1<n2 & proves'(code,n1)=proves(code,n2) )

-2red7516y

Enumeration order of proofs is not specified, so we can construct pathological prover, that reorders proofs such that all proofs of "main()==1" require more steps than maxsteps.

6Vladimir_Nesov16y

You can't influence the length of a proof by placing it at a different position in some list of proofs. Parameter 'maxsteps' specifies the maximum length (or Gödel number) of the proofs that get checked, not the maximum number of steps performed by the proof checker algorithm.

2red7516y

Ouch. maxsteps means maxprooflenght. My inattention, sorry.

-6timtyler16y

[-]Vladimir_Nesov16y20

Edit: The following is not obviously possible, see this comment.

This can be generalized to provide even better algorithms for more general games. Let the following be a simplified outline of the algorithm for cooperation problem, with bounds omitted (players are A and B, our player is A):

function main1() {
    if (proves("A()==B()"))
        return "Cooperate";
    return "Defect";
}

The proof that the outcome is "Cooperate" basically hinges on the provability of Löb statement

proves("A()==B()") => (A()

... (read more)

5AlephNeil16y

Sorry if I'm being naive, but if we amend the program that way then doesn't it lose the ability to prove the Löb statement implies(proves(P), P) where P = "A()=='Cooperate' && B()=='Cooperate'"? (Due to the unprovability of unprovability.)

2red7516y

Can this problem be solved by adjusting maximal length of checked proofs such that unprovability of proves("A()=="Defect" && B()="Cooperate"", maxDC) is provable by proves(..., maxsteps)?

0AlephNeil16y

Yes, that occurred to me too, and I think it does the job. I wonder if there are theorems of the form "There's no significantly better way to show that P isn't provable in N steps than by listing all proofs of length N and observing that P isn't proved". Then of course maxsteps would need to be exponential in maxDC.

2red7516y

On the other hand the fact that proves("A()=="Cooperate" && B()=="Cooperate"") was called means that A()=="Defect" && B()=="Cooperate" is unprovable under maxDC proof length. The question is can this fact be used in proof. The proof checker can be programmed to add such self-evident propositions into set of axioms, but I can't clearly see consequences of this. And I have a bad feeling of mixing meta- and object levels.

2cousin_it16y

I'm not sure if your idea will work, but have some notation to help you avoid mixing the levels. Here's how you say "try to prove this fact conditional on some other fact": function main() { if (proves(outputs(1), n1)) return 1; else if (proves("implies(!proves(outputs(1), n1), eval(outputs(2)))", n2)) return 2; else return 0; }

0[anonymous]16y

OK, so first we try to prove P1 = "A()=="Defect" && B()=="Cooperate"" using some theory T, then on failing that, we try to prove P2 = "A()=="Cooperate" && B()=="Cooperate"" using the augmented theory T' = T + "T cannot prove P1 in maxDC steps". And then T' can prove that "If T' proves P2 then P2 is true". And note that the number of steps doesn't matter any more. So perhaps Nesov's idea was OK all along?

0apophenia16y

"P isn't provable in N steps" is in Co-NP. Since in general it's an open question whether P=Co-NP, there might be a polynomial time algorithm to solve it, or their might not, respectively. On the other hand, if you 1) Showed some large class of particular problems (ones that can reduced to from [no typo there] nonequivalent SAT instances, which I suspect is most of them) weren't provable in N=cP steps IF it's unprovable, then 2) You would have shown a sufficiently large class of SAT problems weren't easily solvable in N=kP steps, 3) which would imply P!=NP (and also P!=Co-NP). If on the other hand your whole class of problems is unprovable, then you can prove P isn't provable by your theorem, which is presumably smaller than N in this context. So, in summary my guess interesting classes of P, your question is equivalent to "Does P=NP?", which we know is a tough nut to crack.

0Vladimir_Nesov16y

This is true, and it's a bit disturbing that it took so long for someone to point that out. I feel that there should be a way out, but don't see what it is (and finding it might suggest interesting limitations).

3cousin_it16y

According to my experience, the study of game-playing programs has two independent components: fairness/bargaining and enforcement. If you have a good method of enforcement (quining, löbbing, Wei Dai's joint construction), you can implement almost any concept of fairness. Even though Freaky Fairness relied on quining, I feel you can port it to löbbing easily enough. But fairness in games with non-transferable utility is so difficult that it should really be studied separately, without thinking of a specific enforcement mechanism.

0Vladimir_Nesov16y

Bargaining is about logical uncertainty, something we avoid using sufficiently high upper bounds in this post.

2Wei Dai16y

Suppose we have two possible actions X,Y and A prefers to to to and B prefers to to to . What will be the outcome if both of them use main4() with their respective orderings and the same proves() function?

0Vladimir_Nesov16y

Probably , which is not quite optimal. This doesn't solve the bargaining problem, which is exactly what your example is.

0[anonymous]16y

Most likely , but possibly or if one of the agents manages to "trick" the other.

0cousin_it16y

Your idea is similar to preferential voting.

[-]Eliezer Yudkowsky16y20

Does the obvious Lobian proof for 1 still go through if there's a bound on the proof length built into all the sentences?

2cousin_it16y

In the linked text file I tried to do just that, carefully tracking proof lengths and making sure the bounds stay bounded. But this could really use some checking. The second problem uses the same trick, of course: "if we both prove that we output the same thing, then we output the same thing", then move ahead using "we both prove" instead of "I prove" everywhere.

[-]Vladimir_Nesov16y10

The generalized action idea, applied to this setting, works as follows. Agents A and B can surrender control over their output to procedures they don't fully understand, but expect to produce good results. That they expect good result is reflected, as in previous construction, in the order in which they check for the possibility:

function main_A() {
    if (proves("A()==Foo() && B()==Bar()"))
        return Foo();
    ...
}

function main_B() {
    if (proves("A()==Foo() && B()==Bar()"))
        return Bar();
    ...
}

H... (read more)

[-]AlephNeil16y10

This is awesome - great post!

I've scanned briefly through the comments, and didn't see anyone ask this, but apologies if I missed it:

You use the diagonal lemma to establish that there is a piece of code called "box" such that the following is provable:

eval(box) == implies( proves(box, n1), eval(outputs(1)))

But couldn't there likewise be a "box2" such that:

eval(box2) == implies( proves(box2, n1), eval(outputs(2)))

And then couldn't you patch up the proof someh... (read more)

6cousin_it16y

To prove that, the proof system would need to know its own consistency, because an inconsistent proof system could prove both of these statements and indeed any statement at all. But a proof system cannot know its own consistency without actually being inconsistent (Goedel's second theorem).

1Vladimir_Nesov16y

For a finite set of proofs, that a given statement is not provable by a proof from that set, is provable, possibly by a proof from that set.

3cousin_it16y

Technically true. You could always just evaluate the two statements and there you have your proof, with length exponential in n. I very much doubt the proof could be made shorter than n though, because the statement is equivalent to "you can't find a contradiction in these axioms using less than n steps", the difficulty of which should grow way faster than linearly in n.

0[anonymous]16y

Good catch! Yes, my argument only shows that there's no short easy proof. You can always just evaluate the two statements and there you have your proof. But that has length exponential in nmax. I'd be amazed if "1=2 is unprovable with length < nmax" were provable with length < nmax, which is pretty much the same statement. This is a very very strong statement. It can also be reformulated as "you cannot find a contradiction in the theory using less than nmax steps".

0AlephNeil16y

Aha - very cunning.

[-]AnotherIdiot14y00

Edit: Wow, I really am an idiot. I assumed the second statement was true about every statement, but I just realized (by reading one extra comment after posting) that by Lob's Theorem that's not true. But my original idea, that the first statement is all that's required to prove anything, still seems to hold.

Okay, I can follow the first proof when I assume statement 1, but I don't quite understand how cousin_it got to 1. The Diagonal Lemma requires a free variable inside the formula, but I can't seem to find it.

In fact, I think I totally misunderstand the D... (read more)

0cousin_it14y

I don't understand how the first statement can be used to prove anything... The second statement might be true for every statement, but it's not necessarily provable for every statement, which is required in the proof. In fact, the second statement is provable for "outputs(1)" by inspection of the program (because the program searches for proofs of "outputs(1)"), but not provable for "2==3" (because then "2==3" would be true, by Lob's theorem).

0AnotherIdiot14y

I'm sorry, my comment grew into a mess, I should have cleaned it up a bit before posting. Anyway, I agree fully about the second statement only applying to this program, that's what I realized in the edit. But for the first statement, I'll try to be a bit more clear. My first claim is that "eval(box) == implies(proves(box, n1), eval('2==3'))" is a true statement, proven by the Diagonal Lemma. I'll refer to it as "statement 1", or "the first statement". If "eval(box)" returns false, then for the first statement to be true, "implies(proves(box, n1), eval('2==3'))" must return false. "implies" only returns false if "proves(box, n1)" is true and "eval('2==3')" is false. Therefore for statement 1 to be true when "eval(box)" is false, then "proves(box, n1)" has to be true, which is a contradiction: "eval(box)" can't be false and also provably true. Therefore, "eval(box)" must be true. So let's say "eval(box)" is true, that means that "implies(proves(box, n1), eval('2==3'))" must also return true for statement 1 to be true. One way for the "implies" statement to return true is for "proves(box, n1)" to return false. Since I have proven above that "eval(box)" is true, any good definition of the "proves" function will also return true, because if it finds no other, it will at least find my proof. Therefore, "proves(box, n1)" will return true. So there is only one other way for the "implies" statement to return true: "eval('2==3')" must return true. Therefore, "eval('2==3')" returns true, and it follows from this that 2=3. So, where did I go wrong?

2cousin_it14y

Your proof of eval(box) relied on the fact that eval(box) can't be false and also provably true. But that fact is equivalent to consistency of the formal theory, so it can't be proven within the formal theory itself, by Godel's theorem. Of course, since eval(box) is a terminating computation that returns true (your proof of that is correct), the formal theory can eventually prove that by step-by-step simulation. But that proof will be much longer than your proof above, and much longer than n1. In fact, proves(box,n1) will return false, and your comment serves as a nice proof of that by reductio. Don't get discouraged, it's a very common mistake =)

[-]gRR14y00

I think it's possible to coordinate without the huge computational expense, if the programs would directly provide their proofs to each other. Then each of them would only need to check a proof, not find it.

The input to the programs would be a pair (opponent's source, opponent's proof), and the output would be the decision: cooperate or defect.

The algorithm for A would be: Check the B's proof, which must prove that B would cooperate if the proof it gets in its input checks out. If the proof checks out - cooperate, otherwise - defect.

0cousin_it14y

If A needs to give B the proof of some theorem that involves both A and B (as in the post), how does it obtain such a proof quickly without knowing the source code of B in advance? And if it's a proof of some theorem involving only A, then what is that theorem and how does the whole setup work?

0gRR14y

Let ChecksOut(code, proof, X) = true iff proof is a proof that code is a function of two parameters PCode and PProof, which returns "C" if ChecksOut(PCode, PProof). Def A(code, proof): if(ChecksOut(code, proof)) return "C", otherwise return "D". The definition of ChecksOut is kinda circular, but it should be fixable with some kind of diagonalization. Like: SeedChecksOut(code, proof, X) = true iff proof is a proof that code is a function of two parameters PCode and PProof, which returns "C" if eval(X(PCode, PProof)), ChecksOut(code, proof) = SeedChecksOut(code, proof, #SeedChecksOut)

0cousin_it14y

Hmm, that's pretty cool, thanks! But it's still not obvious to me that it solves the same problem as the algorithm in the post. There are probably many mutually incompatible implementations of ChecksOut, because there are many ways to diagonalize that differ only syntactically. Do you have an argument why they will end up cooperating with each other? It seems to me that they won't, just like different implementations of quining cooperation.

0gRR14y

Hmm, yes, you're right, it's quining cooperation all over again - as defined, the programs would only cooperate with opponents that are sufficiently similar syntactically. And they need compatible proof systems in any case. So it's not a solution to the same problem, only a possible optimization.

[-]rwallace16y00

The most obvious implementation for the easy version will return 0 (or, if the length limit is lifted, fail to return at all).

It's important, however, to be clear on just what constitutes a valid proof. Consider the following implementation:

main() {return 1;}

Not valid because it didn't find a proof, you say? On the contrary, assuming we are dealing with idealized mathematical computers that are perfectly deterministic and have no outside inputs, the fact that a program returns a given value on this run does indeed constitute a proof that it will do so on... (read more)

2cousin_it16y

I don't understand this. Do you mean the proof checker fails to check anything? Wouldn't that be a wrong implementation, rather than an obvious one?

-1rwallace16y

No, I mean the obvious implementation would have the program basically asking "what does this program return? Well, it tries to find out what this program returns. What does this program return? Well, it tries to find out..." until it hits the length limit and comes back with no solution found (or, if the length limit is lifted, keeps going forever). This is true even if the proof checker always returns quickly.

8cousin_it16y

This is absolutely not the obvious implementation of the algorithm in the post. My program does not "try to find out" something. It mechanically checks all proofs up to some maximum length. You can't just replace a piece of code with another piece that has a vaguely similar intent, but fails to halt!

0rwallace16y

The 'fails to halt' part in either case would be only if the length limit were removed. But you are right that mechanically checking all proofs up to a maximum length is a mathematically valid implementation and can only be replaced with another piece of code that is guaranteed to give an identical result. However, that implementation can still be reasonably described as trying to find out something. The intent can be seen not in the simple loop that iterates through all proofs up to a certain length, but in the choice of logic and axioms to encode the problem. The proofs generated from suitably chosen axioms are isomorphic to execution paths of an appropriately written program (as whpearson observed, a way to see this is to consider the Curry-Howard correspondence).

0orthonormal16y

This isn't the same as the calculator that asks itself what value it will return, because there's a subtle separation of levels here.

1[anonymous]16y

The Curry-Howard correspondence is what you are referring to right?

0mkehrt16y

That program does not prove its return value under the Curry-Howard Correspondence without a lot of footwork (see first caveat). Under the CH interpretation, programs are proofs that their types are inhabited, not that they can produce their return values. Thus, the canonical thing one could say the above program proves is "int". Some caveats. First, one could imagine a sufficiently complex type system where there is a type of all integers which are 1. Then, one could say that the above program is a proof of the type "type-of-ints-that-are-1". However, when adding this much specificity to one's type system, one has to be careful that the resulting system can decidably type terms; adding richness often leads to undecidability. Second, there is simple constructive proof that the the above program produces 1. Just run it and see; if it returns 1 in finite time, that's a proof.

0rwallace16y

That is certainly one way to look at it. Another way to look at it is to consider purely computational proofs, such as the four color theorem, or the solutions to checkers and 5 x 5 Go.

1rwallace16y

I'm curious, can the downvoters give any reasons for disagreeing with me?

6JenniferRM16y

I did not downvote (you were at -3 and -2 when I wrote this), but I think the there are many possible reasons to downvote you depending on the voter's implicit or explicit theory of voting. This might be seen as a pure "puzzle" post, and you aren't offering a puzzle solution that takes the question at face value. Your program doesn't do anything quine-like (like put its own source code into a input variable), nor does it have any logic engine (even by implication). This is the kind of solution evolution might work out in very certain specific environments where it happened to "just work", but if you put this code into a place where it actually had to "work for a living" by analyzing other programs using roughly the same routines it uses to analyze itself and it gained or lost by modulation of its behavior with them (and some programs were suckers, some were cheaters, and some could be implicitly bargained with, so there were real opportunities to gain or lose) this code would not give you "optimal play ignoring computation costs" (it is either a sucker or a cheater, depending on what "1" means and in some contexts either of these strategies will lose). Maybe another way of putting the criticism is that your "clever trick" is in noticing that the problem can sort of be solved in practice by collapsing down to have no extra levels of self awareness. But for theoretical robustness in other circumstances it would be preferable to have a system that scales to infinity, so that agents capable of modeling other agent's models of their own attempts to model other agents can still be "trusted by bad modelers". (Remember that part of the unspoken motivation here is to build skills towards building a verifiably friendly AGI.) Also, the people here are all super smart, so telling them "just be dumber and people will trust you" isn't an answer they want to hear, and saying this (even indirectly) could maybe be seen as an attempt to troll, or to drag people into a topic area

3rwallace16y

Thanks for the analysis. I'll try to clarify those points, then. Of course, I'm not suggesting my answer as a solution to the prisoner's dilemma. I actually don't think this kind of mathematical analysis is even relevant to the prisoner's dilemma in practice -- I think the latter is best dealt with by pre-commitments, reputation tracking etc. to change the payoff matrix and in particular change the assumption that the interaction is one-off -- which it typically isn't anyway. (Note that in cases involving actual criminals, we need witness protection programs to make it even approximately one-off.) As for the logic engine, hey, I just unrolled the search loop then deleted 3^^^^3 - 1 lines of unreachable code. No different than optimizing compilers do every day :-) If we look at the logic puzzle aspect though, this is in similar vein to this statement is true (the reverse of the classic this statement is false liar paradox -- it has two consistent solutions instead of none -- note that return 0 is just as valid a solution as return 1). And if you ask what sort of proof system permits that, you start investigating higher-order logic, and self-reference, and the classic paradoxes thereof, and type theory and other possible solutions thereto, which are topics that are of very real significance to software verification and other branches of AI. In summary, I think this puzzle goes places interesting enough that it's actually worth taking apart and analyzing.

7cousin_it16y

Actually, this statement is provable. This is the "Henkin sentence" and it is indeed provable in Peano arithmetic, so no need for stronger things like type theory etc. My proof in 1 is basically a translation of the proofs of the Henkin sentence and Löb's theorem (which is used to prove the former).

-5rwallace16y

-2Blueberry16y

I upvoted the grandparent and parent, because what you said seems right to me. I wish people wouldn't downvote someone asking why e was downvoted.

-1Cameron_Taylor16y

Upvoted entire ancestor tree, for similar reasons.

[-]DuncanS16y-10

Without the very long limit you've added, the easy problem becomes formally undecidable as it leads to an infinite regress. In order to return 1, the program has to determine whether it will return 1, and in order to do that it has to determine whether it will return 1, and in order to do that it has to determine whether it will return 1, and..... You'll use up your 3^^...^^3 iterations and return 0.

It's a bit like this problem. http://en.wikipedia.org/wiki/Halting_problem

6cousin_it16y

Without the limit, the first program would return 1. My proof keeps working almost without modifications. The only change will be that main() will use a while loop instead of an if, which is only used in deriving (2) and doesn't change its validity. Your argument doesn't show that the program will never find a proof - it only shows that it can never find the specific proof that you outlined, because it leads to infinite regress or whatever. It will just find another proof instead.

0DuncanS16y

OK. Let's suppose you successfully write an algorithm a() that can do everything that you said. Now let's make some changes. Suppose you were, in the course of your search of proof space, to come across a proof that a() returned 0? You could simply return 0 at that point rather than continuing to search for proofs that a() returned 1. So let's make that change. Now comes the fun bit. Change your algorithm a() such that it returns 1 when it hits the iteration limit instead of 0. Not that hard. Now comes the really fun bit. Invert the output of this algorithm. Let's call it b(). This function now has the interesting property that it returns 1 when a() returns 0, and vice versa - except in the case where no proofs can be found of a value for a() short of the termination condition, where both now return 0, It's also interesting that it has exactly the same definition as a() does....

5cousin_it16y

What Wei Dai said. The first change you propose breaks A's ability to obtain step 2 by simple reasoning about itself. Originally I came up with a different problem: systematically search all proofs up to a large length, and if we find a proof that we output some number, then output that number. That problem is way more crazy than mine, I don't know what the answer is, it might actually depend on the proof checker's implementation.

5Wei Dai16y

If you make this change, then step 2 of cousin_it's proof won't go through anymore, I think.

0Unknowns16y

So do you also think that this statement is true: "This statement is true if and only if it can be proven to be true"?

5cousin_it16y

See this PDF, paragraph 44.3. It constructs exactly the statement you gave (Henkin's statement) and proves it's provable.

6Unknowns16y

Ok, you've convinced me. Saying that this statement is true is basically the same as saying program A outputs 1.

0cousin_it16y

Yes, exactly. The proof is a little fussy because it has to keep track of bounded proof lengths everywhere, but it works basically the same. Does it help you understand the second problem better? It uses the same trick, but with a twist. (You need to get creative when you talk about two proof systems instead of one.)

0Unknowns16y

I still agree with what I said originally about the second problem. You could compare it to these two statements: A) A is true if and only if A and B are either both provable, or both non-provable. B) B is true if and only if B and A are either both provable, or both non-provable. It is much more obvious that both of these statements are necessarily true and provable (by symmetry), than it is that "this is true if and only if it is provable" is true or provable. This is why at first I only accepted the second case.

0Unknowns16y

Actually, after thinking about it more, the statement is obviously provable, without the long derivation. If "this statement is true iff it is provable" is false, then it naturally cannot be provable. But then it is both false and not provable, which logically implies that it is true iff provable. Therefore it is both true and provable. So the first program has to output 1 for the same reason.

6cousin_it16y

When you have a self-referential sentence, it does no good to prove it in the "real world", because the commonsense logic we use in the "real world" becomes inconsistent when applied to self-referential statements. ("This statement is false.") You really need a proof of your statement within the formal system. And within the system, the proof you gave doesn't work: the system cannot prove that falsity improves non-provability, because that would be the system assuming its own consistency all over again. So the long derivation is needed after all.

0Unknowns16y

I agree that the system can't use this argument, but that doesn't mean the argument doesn't work, just like your argument for why your algorithm must defect against a program that always defects. The difference between my argument and "this statement is false" is that by directly depending on its own truth, "this statement is false" leaves its truth or falsity insufficiently defined. But "this statement is true iff it is provable" depends not directly on its truth but on provability, which is already defined. So I think that my "real world" argument does establish that the statement will be true also within a formal system, even though a formal system cannot establish it with this argument.

2cousin_it16y

Um, "this statement is true iff it is provable" directly depends on both truth and provability. It does refer to its own "platonic" truth value in the "real world".

0Unknowns16y

It refers to its truth value but not in a paradox generating way.

[-]RobinZ16y-10

I believe Program A in "the easy version" would return 0. Assuming zero run-time errors, its structure implements the logical structure:

If A can be proven to return 1 in under n steps, then A returns 1.
If A cannot be proven to return 1 in under n steps, then A returns 0.

However n is defined (the post proposes n = 3^^^^3), it can be shown by the definition of the word "proof" that:

If A can be proven to return 1 in under n steps, then A returns 1.

and therefore the first proposition holds for every program, and cannot be used to... (read more)

4orthonormal16y

IIRC, the modification of Gödel's statement which instead has the interpretation "I can be proved in this formal system" is called a Henkin sentence, and does in fact have a finite proof in that system. This seems weird in the intuitive sense you're talking about, but it's actually the case.

1cousin_it16y

Yep. The Henkin sentence has already come up multiple times in the comments here. Understanding the proof of the Henkin sentence takes you about 95% of the way to understanding my original argument, I think.

1cousin_it16y

...Nope. This logic stuff is pretty tricky, but I'll try to explain how exactly the distinction between truth and provability matters in this problem. It was quite an epiphany when it hit me. The first proposition indeed holds for every program. But it is not what is used to show that A returns 1. What is really used is this: A can prove that the first proposition holds for A by looking at the source code of A. This is definitely not the case for every program! (Provability-of-provability does not imply provability-of-truth. This is the whole point of Löb's Theorem.) A's proof of the first proposition relies on how the code of A looks - specifically, that it says explicitly "if provable then return 1". This proof would break if the structure of A were changed. For example, if A were searching for both proofs and disproofs that it would output 1, it wouldn't be able to prove the first proposition quite so easily, and maybe not at all.

1RobinZ16y

I'm still confused, sadly.

0cousin_it16y

Can you follow Eliezer's cartoon proof of Löb's Theorem, linked from the post? It's kind of a prerequisite to understand what's happening here.

0RobinZ16y

I believe follow Eliezer's proof - should I look over your proof again?

0cousin_it16y

Well, my proof is basically a rewrite of Eliezer's proof in another notation and with tracking of proof lengths. The exact same steps in the exact same sequence. Even the name of the "box" statement is taken from there. So you can try, yeah.

0[anonymous]16y

Yeah - I suspect the answer is "not at all" because this 'searching for proofs or disproofs' seems to be exactly what's needed to unbreak the reasoning in my comment above.

[-]Clippy16y-10

This second puzzle is a formalization of a Prisoner's Dilemma strategy proposed by Eliezer: "I cooperate if and only I expect you to cooperate if and only if I cooperate". So far we only knew how to make this strategy work by "reasoning from symmetry", also known as quining. But programs A and B can be very different - a human-created AI versus an extraterrestrial crystalloid AI. Will they cooperate?

And interestingly, posters here only started criticizing this game-theoretic reasoning when I used it. c=/

Why?

4Tyrrell_McAllister16y

Your original formulation did not include the "I expect you to" part.

Moderation Log