Unknowns comments on AI cooperation in practice - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (157)
In the first version, there are different ways to go about programming algorithm A, and some will output 1, and some will output 0. In other words, "if A finds a proof that A returns 1" is ambiguous, because it depends on the exact value of A, not the generic description given.
In the second version, both programs output 1, since this is the only possible consistent result. If either program might output a 0, then both programs would have to, which implies that both should have output 1.
In the first version, are you sure? I gave an argument that A will output 1, have you looked at it?
In the second version, you confuse truth and provability. If both programs output 0, that doesn't imply they can prove that their output will be the same.
You don't specify the implementation of "proves(code, maxsteps)".
The output would appear to depend on that.
Of course, there's the whole business of universal heat death once again...
Nothing depends on the details of proof verifier, since it completely covers all proofs up to some length, a set of proofs which is completely defined by the logical language. The arguments of proves(-,-) determine its value (and it's even a primitive recursive function).
On the other hand, pathological formal language can be constructed, that makes proof of "main()==1" arbitrary large (e.g. where 1 is represented by 3^^^3 symbols). Thus it is better to be proven that formal language isn't pathological, i.e there is no prover proves' that satisfies condition
...but the the "logical language" is not specified! That is one of the implementation details that is missing in the problem specification.
You know, there are always unstated assumptions, like sanity of the reader.
Usually, when posing puzzles, it is best to try to minimise the number of unstated assumptions the solution depends upon - or there won't be a unique answer.
Enumeration order of proofs is not specified, so we can construct pathological prover, that reorders proofs such that all proofs of "main()==1" require more steps than maxsteps.
You can't influence the length of a proof by placing it at a different position in some list of proofs. Parameter 'maxsteps' specifies the maximum length (or Gödel number) of the proofs that get checked, not the maximum number of steps performed by the proof checker algorithm.
Ouch. maxsteps means maxprooflenght. My inattention, sorry.
The argument is admittedly sketchy, but I don't think it depends much on the implementation of "proves". Of course I'd want the proof system to hold the same properties that are used in the proof of Löb's Theorem, because my argument is almost an exact replica of that; but pretty much all proof systems actually used by people (apart from some extremely weak ones) satisfy these properties. And you're guaranteed not to become worse off if you make your proof system stronger while your opponent stays the same. That's a really big deal, if my argument is indeed correct.
About heat death: even though the two programs say they can iterate up to 3^^^3 or something, the actual time to cooperation will be much less, because the reasoning steps that I outlined are exactly the reasoning steps they'd need to stumble upon. Yes, that would make a proof of length 10000 or so (which would mean 2^10000 proofs tried), but a that's long way from 3^^^3. If the programs use some kind of heuristics to search for working proofs (but still guarantee that they will do the brute force search if heuristics fail), the proofs can be found quicker still.
Re: "apart from some extremely weak ones"
It does depend on the implementation - because it is relatively easy to imagine poor provers that result in a failure to understand and prove anything much. So, you would need to specify some constraints on the prover before you can make statements about what it outputs.
timtyler: I think you're confused about how proof verifiers work. A proof verifier takes a proof as input, and verifies that the given proof is correct. It doesn't have to generate the proof itself, and it gets to assume that each step is some small, easily-checkable application from a finite set of axioms.
Put another way, if the proof you give a verifier makes a large leap in reasoning, then the verifier will just tell you that the proof is wrong. Moreover, by the formalization that we demand of those proofs, the proof is wrong.
So, the problem that proof verifiers solve is computable. So, we need only to assume that the proof verifier that these programs have is correctly implemented, we don't really need to know their internal details.
(If you already understand this, though, then I'm confused about your confusion. Could happen.)
[edit:] Actually, I think I might see the point you're making; we have to assume the theory the prover understands is rich enough to model stepwise program semantics. This probably should have been specified, but we can just assume it.
It's a nice result, but I doubt it'll ever be practically relevant.
You could implement the algorithm today, although with a limit of less than 3^^^3. Rough guesstimates suggest that about a limit of 50 would be workable at home, 100 for a worldwide distributed computing effort and 500 for a universe-sized computer that does one operation per bit per plank time. In all cases, nowhere near 10000 (let alone 3^^^3).
Heuristics could help a bit, but making the algorithm more complex is likely to also make the proof that it works longer and so harder to find. The fallback to brute-force search is unhelpful because no computer that could actually exist would be able to find the proof that way.
If program A uses heuristics, program B doesn't need to prove that these heuristics are correct. It only needs to see that A's use of heuristics makes A either output 1 or move on to the brute-force search.
Agreed with the rest of your comment. An easier way to reach the same conclusion: if program B is implemented as "always defect", program A has to run the brute-force search to the end, there's no shortcut (because having any shortcut in the code would make the proofs invalid if B were a copy of A).
I've looked at the argument, and it seems to me that Tim Tyler is right, the first case depends on the implementation of "proves (code, maxsteps)."
Regarding the second case, suppose A runs through checking for proofs, and doesn't find any. It then is about to output a 0. But in this case, it is impossible that B should output a 1, because then B would have a proof that A and B output the same thing, and this would be false. Therefore B also must output a 0. Therefore A and B output the same thing. So we should add a line of code to A to express this: "If A is about to output 0, then output 1, because it is proved (by the above reasoning) that A and B output the same thing." The same code should be included in B, and so both will output 1.
Your proposed change will make A always output 1, and thus become a rather poor player in the Prisoner's Dilemma. You don't want to cooperate with everybody.
However, B will as well, and you do want to cooperate with everybody who cooperates with you, so my proposed change works very well.
Not if they will cooperate regardless of what you do. Then you can win more by defecting.
A knows B's source code, and so it knows that B will only output a 1 if A and B output the same thing, i.e. only if A and B both output 1.
No, A doesn't know that because it doesn't know B's proof checker is correct.
You really have to be triple extra careful when talking about this stuff. I'm trying to.
Yes, I see that.
I replied here.
I changed my mind anyway.