I notice a curious pattern in this post: technically illiterate critical comments get voted up to +1/+2 (even after getting debunked), while technically accurate arguments (replies) stay at 0/+1. Please vote up only things you know you understand, and do vote up comments that debunk incorrect arguments.
ETA: Mostly corrected now.
Cousin_it, do you have an argument that n1 ... n6 < nmax? It seems like one way for the proof to fail is if no matter what nmax is, step 11 forces n1 > nmax.
Actually, someone shot down my proposed my.C = (my.C == your.C) algorithm, because if two prisoners both defect, that is in a mathematical sense "cooperating if and only if you cooperate". So if the other prisoner is Defection Rock, then my.C = (my.C == your.C) has no consistent solution.
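A tiny brute-force sketch of that failure (the function name and framing are mine, just to make the fixed-point issue explicit):

def consistent_solutions(your_c):
    # Values of my_c that actually satisfy the equation my_c == (my_c == your_c).
    return [my_c for my_c in (False, True) if my_c == (my_c == your_c)]

print(consistent_solutions(False))  # []: no consistent answer against Defection Rock
print(consistent_solutions(True))   # [False, True]: against a cooperator the equation pins nothing down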
In the first version, there are different ways to go about programming algorithm A, and some will output 1, and some will output 0. In other words, "if A finds a proof that A returns 1" is ambiguous, because it depends on the exact value of A, not the generic description given.
In the second version, both programs output 1, since this is the only possible consistent result. If either program might output a 0, then both programs would have to, which implies that both should have output 1.
timtyler: I think you're confused about how proof verifiers work. A proof verifier takes a proof as input, and verifies that the given proof is correct. It doesn't have to generate the proof itself, and it gets to assume that each step is some small, easily-checkable application from a finite set of axioms.
Put another way, if the proof you give a verifier makes a large leap in reasoning, then the verifier will just tell you that the proof is wrong. Moreover, by the standard of formalization we demand of those proofs, the proof really is wrong.
So the problem that proof verifiers solve is computable. We need only assume that the proof verifier these programs use is correctly implemented; we don't really need to know its internal details.
(If you already understand this, though, then I'm confused about your confusion. Could happen.)
[edit:] Actually, I think I might see the point you're making; we have to assume the theory the prover understands is rich enough to model stepwise program semantics. This probably should have been specified, but we can just assume it.
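To make the point concrete, here's a toy checker of my own devising (nothing like this is specified in the post; it just illustrates that checking a given proof is a simple, bounded, computable task, quite unlike searching for one):

def follows_by_modus_ponens(line, earlier):
    # `line` follows if some earlier line says "p implies line" and p also appears earlier.
    return any(("implies", p, line) in earlier for p in earlier)

def verify(proof, axioms):
    """Check a proof: a list of formulas, each an axiom or a modus ponens consequence."""
    earlier = []
    for line in proof:
        if not (line in axioms or follows_by_modus_ponens(line, earlier)):
            return False  # a leap in reasoning: reject the whole proof
        earlier.append(line)
    return True

# Example: from axioms "p" and "p -> q", the two-step derivation of "q" checks out,
# but claiming "q" with no justification does not.
axioms = {"p", ("implies", "p", "q")}
print(verify(["p", ("implies", "p", "q"), "q"], axioms))  # True
print(verify(["q"], axioms))                               # False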
Edit: The following is not obviously possible, see this comment.
This can be generalized to provide even better algorithms for more general games. Let the following be a simplified outline of the algorithm for the cooperation problem, with bounds omitted (the players are A and B; our player is A):
function main1() {
  if (proves("A()==B()"))
    return "Cooperate";
  return "Defect";
}
The proof that the outcome is "Cooperate" basically hinges on the provability of the Löb statement
proves("A()==B()") => (A()...

Does the obvious Löbian proof for 1 still go through if there's a bound on the proof length built into all the sentences?
The generalized action idea, applied to this setting, works as follows. Agents A and B can surrender control over their output to procedures they don't fully understand, but expect to produce good results. That they expect a good result is reflected, as in the previous construction, in the order in which they check for the possibility:
function main_A() {
  if (proves("A()==Foo() && B()==Bar()"))
    return Foo();
  ...
}

function main_B() {
  if (proves("A()==Foo() && B()==Bar()"))
    return Bar();
  ...
}
H...
This is awesome - great post!
I've scanned briefly through the comments, and didn't see anyone ask this, but apologies if I missed it:
You use the diagonal lemma to establish that there is a piece of code called "box" such that the following is provable:
eval(box) == implies( proves(box, n1), eval(outputs(1)))
But couldn't there likewise be a "box2" such that:
eval(box2) == implies( proves(box2, n1), eval(outputs(2)))
And then couldn't you patch up the proof someh...
Edit: Wow, I really am an idiot. I assumed the second statement was true about every statement, but I just realized (by reading one extra comment after posting) that by Löb's Theorem that's not true. But my original idea, that the first statement is all that's required to prove anything, still seems to hold.
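For reference, the standard statement of Löb's theorem (my paraphrase, not spelled out in the post): for a theory $T$ containing enough arithmetic, with provability predicate $\mathrm{Prov}$,

$$ T \vdash \mathrm{Prov}(\ulcorner P \urcorner) \rightarrow P \quad\Longrightarrow\quad T \vdash P. $$

So $\mathrm{Prov}(\ulcorner P \urcorner) \rightarrow P$ is provable only for sentences $P$ that are already provable; it is not a theorem for arbitrary $P$, which is exactly why the box2 version doesn't go through.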
Okay, I can follow the first proof when I assume statement 1, but I don't quite understand how cousin_it got to 1. The Diagonal Lemma requires a free variable inside the formula, but I can't seem to find it.
In fact, I think I totally misunderstand the D...
I think it's possible to coordinate without the huge computational expense, if the programs would directly provide their proofs to each other. Then each of them would only need to check a proof, not find it.
The input to the programs would be a pair (opponent's source, opponent's proof), and the output would be the decision: cooperate or defect.
The algorithm for A would be: check B's proof, which must prove that B will cooperate if the proof it receives as input checks out. If the proof checks out, cooperate; otherwise, defect.
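A rough sketch of how that protocol might look (the check_proof interface and the claim string are my own assumptions; a real verifier for the agreed-on proof system would go where the stub is):

def check_proof(proof, claim):
    # Stub: should return True iff `proof` is a valid formal proof of `claim`
    # in the proof system both players have agreed on.
    return False

def player_A(opponent_source, opponent_proof):
    # B's proof must establish: "B cooperates whenever the proof B receives
    # as input checks out." If that verifies, cooperating is safe for A.
    claim_about_B = f"program {opponent_source} cooperates if the proof given to it verifies"
    if check_proof(opponent_proof, claim_about_B):
        return "Cooperate"
    return "Defect"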
The most obvious implementation for the easy version will return 0 (or, if the length limit is lifted, fail to return at all).
It's important, however, to be clear on just what constitutes a valid proof. Consider the following implementation:
main() {return 1;}
Not valid because it didn't find a proof, you say? On the contrary, assuming we are dealing with idealized mathematical computers that are perfectly deterministic and have no outside inputs, the fact that a program returns a given value on this run does indeed constitute a proof that it will do so on...
Without the very long limit you've added, the easy problem becomes formally undecidable as it leads to an infinite regress. In order to return 1, the program has to determine whether it will return 1, and in order to do that it has to determine whether it will return 1, and in order to do that it has to determine whether it will return 1, and..... You'll use up your 3^^...^^3 iterations and return 0.
It's a bit like this problem: http://en.wikipedia.org/wiki/Halting_problem
I believe Program A in "the easy version" would return 0. Assuming zero run-time errors, its structure implements the logical structure:
However n is defined (the post proposes n = 3^^^^3), it can be shown by the definition of the word "proof" that:
and therefore the first proposition holds for every program, and cannot be used to...
This second puzzle is a formalization of a Prisoner's Dilemma strategy proposed by Eliezer: "I cooperate if and only if I expect you to cooperate if and only if I cooperate". So far we only knew how to make this strategy work by "reasoning from symmetry", also known as quining. But programs A and B can be very different - a human-created AI versus an extraterrestrial crystalloid AI. Will they cooperate?
And interestingly, posters here only started criticizing this game-theoretic reasoning when I used it. c=/
Why?
I did not downvote (you were at -3 and -2 when I wrote this), but I think there are many possible reasons to downvote you, depending on the voter's implicit or explicit theory of voting.
This might be seen as a pure "puzzle" post, and you aren't offering a puzzle solution that takes the question at face value. Your program doesn't do anything quine-like (like put its own source code into an input variable), nor does it have any logic engine (even by implication).
This is the kind of solution evolution might work out in certain very specific environments where it happened to "just work". But if you put this code somewhere it actually had to "work for a living", analyzing other programs using roughly the same routines it uses to analyze itself, and gaining or losing by modulating its behavior toward them (with some programs being suckers, some cheaters, and some open to implicit bargaining, so there were real opportunities to gain or lose), then this code would not give you "optimal play ignoring computation costs": it is either a sucker or a cheater, depending on what "1" means, and in some contexts either of those strategies will lose.
Maybe another way of putting the criticism is that your "clever trick" is in noticing that the problem can sort of be solved in practice by collapsing down to no extra levels of self-awareness. But for theoretical robustness in other circumstances it would be preferable to have a system that scales to infinity, so that agents capable of modeling other agents' models of their own attempts to model other agents can still be "trusted by bad modelers". (Remember that part of the unspoken motivation here is to build skills towards building a verifiably friendly AGI.)
Also, the people here are all super smart, so telling them "just be dumber and people will trust you" isn't an answer they want to hear, and saying this (even indirectly) could maybe be seen as an attempt to troll, or to drag people into a topic area that is too difficult to discuss, given LW's current sanity waterline (which is the only reason I personally vote things below zero).
Another possibility is that people don't see the reversed Tortoise and Achilles joke and just think (1) you said something dumb and (2) they should "punish dumbness" in order to maintain LW's purity (i.e. the signal-to-noise ratio).
The more reasons to downvote you expose, the more chance you have to accumulate a really low score, and that comment exposed several.
Thanks for the analysis. I'll try to clarify those points, then.
Of course, I'm not suggesting my answer as a solution to the prisoner's dilemma. I actually don't think this kind of mathematical analysis is even relevant to the prisoner's dilemma in practice -- I think the latter is best dealt with by pre-commitments, reputation tracking etc. to change the payoff matrix and in particular change the assumption that the interaction is one-off -- which it typically isn't anyway. (Note that in cases involving actual criminals, we need witness protection program...
You know that automated proof verifiers exist, right? And also that programs can know their own source code? Well, here's a puzzle for you:
Consider a program A that knows its own source code. The algorithm of A is as follows: generate and check all possible proofs up to some huge size (3^^^^3). If A finds a proof that A returns 1, it returns 1. If the search runs to the end and fails, it returns 0. What will this program actually return?
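If it helps to see the control flow, here's a minimal sketch of how I read this (the proof enumerator and checker are placeholder stubs of my own; the real bound is 3^^^^3, which of course cannot actually be enumerated):

HUGE_BOUND = 10  # stand-in for 3^^^^3; the real bound is astronomically larger

def proofs_up_to(size_limit):
    # Placeholder: should yield every candidate proof up to the given size.
    return iter(())

def verifies(proof, claim):
    # Placeholder: should return True iff `proof` is a valid proof of `claim`.
    return False

def A():
    for proof in proofs_up_to(HUGE_BOUND):
        if verifies(proof, "A() == 1"):  # a proof that A itself returns 1
            return 1
    return 0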
Wait, that was the easy version. Here's a harder puzzle:
Consider programs A and B that both know their own, and each other's, source code. The algorithm of A is as follows: generate and check all proofs up to size 3^^^^3. If A finds a proof that A returns the same value as B, it returns 1. If the search fails, it returns 0. The algorithm of B is similar, but possibly with a different proof system and different length limit. What will A and B return?
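And a similarly rough sketch of the second puzzle, mainly to highlight that A and B may use different proof systems and different length limits (again, every helper here is a placeholder stub of my own):

BOUND_A, BOUND_B = 10, 20  # stand-ins for the two (possibly different) length limits

def proofs_up_to(size_limit):
    return iter(())  # placeholder proof enumerator

def verifies_A(proof, claim):
    return False     # placeholder for A's proof system

def verifies_B(proof, claim):
    return False     # placeholder for B's (possibly different) proof system

def A():
    for proof in proofs_up_to(BOUND_A):
        if verifies_A(proof, "A() == B()"):
            return 1
    return 0

def B():
    for proof in proofs_up_to(BOUND_B):
        if verifies_B(proof, "A() == B()"):
            return 1
    return 0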
This second puzzle is a formalization of a Prisoner's Dilemma strategy proposed by Eliezer: "I cooperate if and only if I expect you to cooperate if and only if I cooperate". So far we only knew how to make this strategy work by "reasoning from symmetry", also known as quining. But programs A and B can be very different - a human-created AI versus an extraterrestrial crystalloid AI. Will they cooperate?
I may have a tentative proof that the answer to the first problem is 1, and that in the second problem they will cooperate. But: a) it requires you to understand some logic (the diagonal lemma and Löb's Theorem), b) I'm not sure it's correct because I've only studied the subject for the last four days, c) this margin is too small to contain it. So I put it up here. I submit this post with the hope that, even though the proof is probably wrong or incomplete, the ideas may still be useful to the community, and someone else will discover the correct answers.
Edit: by request from Vladimir Nesov, I reposted the proofs to our wiki under my user page. Many thanks to all those who took the time to parse and check them.