RobinZ comments on AI cooperation in practice - Less Wrong

26 Post author: cousin_it 30 July 2010 04:21PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (157)

You are viewing a single comment's thread.

Comment author: RobinZ 30 July 2010 07:04:25PM *  0 points [-]

I believe Program A in "the easy version" would return 0. Assuming zero run-time errors, its structure implements the logical structure:

  • If A can be proven to return 1 in under n steps, then A returns 1.
  • If A cannot be proven to return 1 in under n steps, then A returns 0.

However n is defined (the post proposes n = 3^^^^3), it can be shown by the definition of the word "proof" that:

  • If A can be proven to return 1 in under n steps, then A returns 1.

and therefore the first proposition holds for every program, and cannot be used to show that A returns 1.

However, the second proposition also cannot be used to show that A returns 1. If the given condition holds, A does not return 1; if the given condition does not hold, the second proposition demonstrates nothing.

Therefore no property of the program can be used to demonstrate that the program must return 1. Therefore no proof can demonstrate that the program must return 1. Therefore the program will find no proof that the program returns 1. Therefore the program will return 0.

Q.E.D.

Comment author: orthonormal 31 July 2010 06:19:05PM 2 points [-]

IIRC, the modification of Gödel's statement which instead has the interpretation "I can be proved in this formal system" is called a Henkin sentence, and does in fact have a finite proof in that system. This seems weird in the intuitive sense you're talking about, but it's actually the case.

Comment author: cousin_it 31 July 2010 06:56:48PM *  1 point [-]

Yep. The Henkin sentence has already come up multiple times in the comments here. Understanding the proof of the Henkin sentence takes you about 95% of the way to understanding my original argument, I think.

Comment author: cousin_it 30 July 2010 07:19:19PM *  1 point [-]

...Nope.

This logic stuff is pretty tricky, but I'll try to explain how exactly the distinction between truth and provability matters in this problem. It was quite an epiphany when it hit me.

The first proposition indeed holds for every program. But it is not what is used to show that A returns 1. What is really used is this: A can prove that the first proposition holds for A by looking at the source code of A. This is definitely not the case for every program! (Provability-of-provability does not imply provability-of-truth. This is the whole point of Löb's Theorem.) A's proof of the first proposition relies on how the code of A looks - specifically, that it says explicitly "if provable then return 1". This proof would break if the structure of A were changed. For example, if A were searching for both proofs and disproofs that it would output 1, it wouldn't be able to prove the first proposition quite so easily, and maybe not at all.

Comment author: RobinZ 30 July 2010 07:37:15PM 1 point [-]

I'm still confused, sadly.

Comment author: cousin_it 30 July 2010 07:47:11PM 0 points [-]

Can you follow Eliezer's cartoon proof of Löb's Theorem, linked from the post? It's kind of a prerequisite to understand what's happening here.

Comment author: RobinZ 30 July 2010 08:15:08PM 0 points [-]

I believe follow Eliezer's proof - should I look over your proof again?

Comment author: cousin_it 30 July 2010 08:19:47PM *  0 points [-]

Well, my proof is basically a rewrite of Eliezer's proof in another notation and with tracking of proof lengths. The exact same steps in the exact same sequence. Even the name of the "box" statement is taken from there. So you can try, yeah.