In this post (based on results from MIRI's recent workshop), I'll be looking at whether reflective theories of logical uncertainty (such as Paul's design) still suffer from Löb's theorem.

Theories of logical uncertainty are theories which can assign probabilities to logical statements. Reflective theories are theories which know something about themselves within themselves. In Paul's theory, there is an external P, in the meta-language, which assigns probabilities to statements; an internal P, inside the theory, which computes probabilities of coded versions of the statements inside the language; and a reflection principle that relates these two P's to each other.
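For concreteness, the reflection principle in that design has roughly the following shape (a paraphrase of the schema, not a verbatim quote), where the outer ℙ is the meta-language probability and the inner P is its internal symbol:

$$\forall \varphi,\ \forall a, b \in \mathbb{Q}: \qquad a < \mathbb{P}(\varphi) < b \;\Longrightarrow\; \mathbb{P}\big(a < P(\ulcorner \varphi \urcorner) < b\big) = 1.$$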

And Löb's theorem is the result that if a (sufficiently complex, classical) system can prove that "a proof of Q implies Q" (often abbreviated as □Q → Q), then it can prove Q. What would be the probabilistic analogue? Let's use □aQ to mean P('Q')≥1-a (so that □0Q is the same as the old □Q; see this post on why we can interchange probabilistic and provability notions). Then Löb's theorem in a probabilistic setting could be:

Probabilistic Löb's theorem: for all a<1, if the system can prove □aQ → Q, then the system can prove Q.

To understand this condition, we'll go through the proof of Löb's theorem in a probabilistic setting, and see if and when it breaks down. We'll conclude with an example to show that any decent reflective probability theory has to violate this theorem.

Löb's proof fails

First, the derivation conditions. Using ⊢ to designate "the system can prove" (equivalently, "the system can show with probability 1"; note that ⊢ is the meta-language analogue of □), the classical derivation conditions are:

  1. If ⊢ A, then ⊢ □A.
  2. ⊢ □A → □□A.
  3. ⊢ □(A → B) → (□A → □B).

Informally, these state that "if the system can prove something, it can prove it can prove it", "the system can prove that a proof of something implies a proof of a proof of something" and "the system can prove that proofs obey modus ponens".

What are the equivalent statements for probabilistic theories? According to Paul's approach, one must always add small epsilons when talking about probabilities. So the following derivation principles seem reasonable, where the Latin indices (a, b, c, ...) are meant to represent numbers that can be arbitrarily close to zero:

  1. If ⊢ A, then ⊢ □aA.
  2. ⊢ □aA → □c□a+bA.
  3. ⊢ □a(A → B) → (□bA → □a+bB).

The first principle is quite clear - if the system can prove A, it can prove that A's probability is arbitrarily close to 1. The second one is a bit more complex - the system can prove that if the probability of A is at least 1-a, then for any b>0, the probability of "the probability of A being at least 1-a-b" is arbitrarily close to 1. Read that sentence again once or twice. The "b" in that derivation principle will be crucial - if b can be set to zero, Löb's theorem goes through; if not, it can't be proved in the usual way. And the third derivation principle simply states that if P('B|A') ≥ 1-a, then P('A') ≥ 1-b implies P('B') ≥ 1-a-b - and that the system itself can prove this. Unlike the second derivation principle, there is no need to add another layer of uncertainty here.
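As a sanity check on the third principle's arithmetic, here is a toy model (my own construction, not Paul's reflective P): treat P as an ordinary distribution over the four truth assignments to (A, B), read □a(A → B) as the material-implication claim P(¬A ∨ B) ≥ 1-a, and confirm the additive bound numerically.

```python
import random

# Toy sanity check of the third derivation principle's arithmetic:
# if P(A -> B) >= 1-a and P(A) >= 1-b, then P(B) >= 1-a-b.
# Worlds are the four truth assignments to (A, B): TT, TF, FT, FF.
for _ in range(10_000):
    w = [random.random() for _ in range(4)]
    total = sum(w)
    pTT, pTF, pFT, pFF = (x / total for x in w)  # a coherent distribution
    P_A = pTT + pTF
    P_B = pTT + pFT
    P_A_implies_B = 1 - pTF          # A -> B fails only at the world (T, F)
    a = 1 - P_A_implies_B            # tightest a with P(A -> B) >= 1-a
    b = 1 - P_A                      # tightest b with P(A) >= 1-b
    assert P_B >= 1 - a - b - 1e-12  # the additive bound of principle 3
print("principle 3's additive bound held in all sampled models")
```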

With these tools in hand, the proof can proceed in the usual way. Fix a, and let Q be a sentence such that the system can prove

□aQ → Q.

Then fix b < a, and construct, by diagonalisation, the Löb sentence:

(□bL → Q) ↔ L.

Then the proof steps are:

A) ⊢ □c(L → (□bL → Q))   1st derivation, def of L
B) ⊢ □dL → □d+c(□bL → Q)   3rd derivation, A)
C) ⊢ □dL → (□e□bL → □d+c+eQ)   3rd derivation, B)
D) ⊢ □dL → □e□d+fL   2nd derivation
E) ⊢ □dL → □d+c+eQ   C), D), setting d+f=b
F) ⊢ □dL → Q   E), def of Q, setting d+c+e=a

What we would want to deduce at this point is that the system can prove that L is true. This is only possible if we can set d=b. But we've shown that d+f=b, where f comes from the second derivation principle. If we can set that f to zero, we can proceed; if f>0, the proof stops here. So now imagine that we can indeed set f=0; then the rest of the proof is standard:

G) ⊢ L   F), def of L, with d=b
H) ⊢ □bL   1st derivation, G)
I) ⊢ Q   F), H), modus ponens

Hence for any probabilistic logic to avoid Löb's theorem, we need the second derivation principle to have a little extra uncertainty that can be made arbitrarily tiny - but never zero.
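To make the bookkeeping concrete, here is a worked budget with illustrative numbers (chosen freely; only the constraints matter). Take a = 0.1 and b = 0.06, with c = e = 0.02. Step F) requires d + c + e = a and step E) requires d + f = b, so:

$$d = a - c - e = 0.06 = b, \qquad f = b - d = 0.$$

If instead the second derivation principle forces some floor f > 0, we get d = b - f < b, and F) only yields ⊢ □dL → Q. Since d < b, □dL asserts P('L') ≥ 1-d > 1-b, a stronger hypothesis than □bL; so □dL → Q is strictly weaker than the □bL → Q that the definition of L converts into L, and step G) never fires.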

Löb's theorem fails

But we can do better than that. With a few more conditions on the probability function P, we can construct counter-examples to the probabilistic Löb theorem. What we need is for the system to know that probabilities of opposites sum to one; that is, for all Q,

⊢ P('¬Q')+P('Q')=1.

Then let F be any contradiction. The system can prove the tautology that is its negation:

⊢ ¬F

By the first derivation principle, the system can also prove, for any a<1,

⊢ □1-a ¬F.

This is just another way of writing that P('¬F') ≥ a. Since probabilities sum to one, the system can prove that P('F') ≤ 1-a, and hence that □bF (which claims P('F') ≥ 1-b) is false for any b<a. Since a false statement implies every statement, the system demonstrates:

⊢ □b F → F.

Now, we can choose any a<1 and any b<a - hence we can choose any b<1. This is exactly the condition for the probabilistic Löb theorem. And yet F is not only false: the system can disprove it!
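A minimal numeric rendering of this counterexample, assuming nothing beyond a coherent P that gives the contradiction F probability 0:

```python
# box_eps Q is the claim P('Q') >= 1 - eps; a coherent P gives any
# contradiction F probability 0, making box_b F false for every b < 1.
P_F = 0.0

def box(p, eps):
    """box_eps: the claim that a statement's probability is >= 1 - eps."""
    return p >= 1 - eps

for b in [0.999, 0.5, 0.01, 1e-9]:  # stand-ins for "any b < 1"
    assert not box(P_F, b)          # box_b F is false, so "box_b F -> F"
                                    # holds vacuously
print("the hypothesis of probabilistic Lob holds for all b < 1, yet F is refutable")
```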

Note that the result also holds if the system knows that the probabilities sum to something arbitrarily close to 1. As a corollary to these results, we can show that no coherent system of probabilistic logic can have the second derivation principle without the extra uncertainty, while simultaneously knowing that probabilities sum to 1.

The last echoes of Löb's theorem

There is one last possibility for a probabilistic Löb theorem, not ruled out by the counterexample above. It might be that all statements A are true if

⊢ P('A')>0  → A.

However, this doesn't seem to be a critical flaw - P('A')>0 is something that is 'almost always' true, if we see P('A') as an arbitrarily close approximation of the meta-probability P(A). And if the system shows that something 'almost always true' implies A, then A being true seems unsurprising.

And this result poses no real problems for identical probabilistic systems trusting each other (to within an arbitrarily small bound).

Comments

Even though Loeb's theorem can't be derived by a translation of the standard proof with two epsilon large bounds, might there still be a different proof?

Again, Christiano et al supposedly prove the existence of a coherent distribution with certain properties. (Someone who knows game theory better than I do should work out what the proof depends on.) Any such distribution necessarily violates the probabilistic Loeb's theorem, for roughly the reason given in the OP's second-to-last section.

Is there an intuitive justification for the two epsilon large bound other than that it stops Loeb's theorem from being derived?

Not as far as this lay-reader can see. Again, the distribution must satisfy derivation principle #1. I can't tell if it must obey #3, though if it does that would seem to rule out a stronger version of #2.

Do you folks at MIRI think this result is as significant an advance toward FAI as, say, Cox's theorem or Pearl-style causal inference?

I'm not at MIRI but I was at part of the workshop and my answer would be no (although I do think the result is worthwhile).

Although I would also place Cox's theorem and graphical models on pretty different standing; the latter seems much more important to me than the former.

Yes, Cox's result gives a different supporting argument for the use of probability, but didn't introduce new ways of doing probabilistic reasoning, whereas graphical models have had major applications in efficient probabilistic reasoning in practice.

This result isn't on its own (it's a possibility/impossibility result about a probability system that may not exist) - but if a variant of Paul's probability can work, then that would be a significant result.

P('¬F') ≥ a. Since probabilities sum to one, the system can prove that P('F') < 1-a.

You mean, because we can always replace the first "a" with some b>a?

This seems like a very cool - and bizarre - result.

Sorry, a bit sloppy with the > and ≥ - since you can always choose a slightly larger "a", it still works out. I've corrected the text.

And yet F is not only false, the system can disprove it!

Maybe this would be obvious if I knew anything about logic, but how do you know the system is consistent?

The result linked at the beginning shows that there exists, in principle, a coherent probability distribution with certain properties. Edit: in particular, it assigns probability 0 to F or any other contradiction. And while it doesn't always (ever?) know the exact probability it assigns, it does know that P(F)<1-a for any a<1. That statement itself has probability 1. Therefore the part about violating the probabilistic Lob's Theorem clearly holds.

I can't tell at a glance if the distribution satisfies derivation principle #3, but it certainly satisfies #1.

We don't - generally we build systems where we can show "system X is consistent iff Peano Arithmetic is consistent". And we assume that PA is consistent (or we panic).

Sorry my phrasing was bad; I actually do know that much about logic. But how do you know that this system is consistent iff Peano Arithmetic is consistent?

We don't have that system yet! Just that that is what we generally do with the systems we have.

So assuming that the Löb's theorem problem is actually solved, the next step is to define an ideal AI which can self modify without weakening its mathematical system? Is it clear how to do this once the Löb obstruction is dealt with, or is this another very difficult math problem?

Nope, it's pretty much done - you can trust your future copies probabilistically, without weakening. However, that would require a P that's both defined and computable - not a trivial task.


Why would you leave out quantifiers? Requiring the reader to stick their own existential or universal quantification in the necessary places isn't very nice.

Is this the correct interpretation of your assumptions? If not, what is it? I am not interested in figuring out which axioms are required to make your proof (which is also missing quantifiers) work.

  • If ⊢ A, then ⊢ ∀(a < 1)(□a A).
  • ⊢ □aA → ∀(c < 1)(∃(b > 0)(□c□a+b A)).
  • ⊢ □a(A → B) → ∃(b > 0)(□bA → □a+bB).

" So the following derivation principles seem reasonable, where the latin indexes (a,b,c...) are meant to represent numbers that can be arbitrarily close to zero"

so universally quantified, but in the meta language.

As Larks said, we can quantify (the meta language looking in), but the system itself can't quantify. Because then the system could reason that "∀x>0, P(A)<x" means "P(A)=0", which is the kind of thing that causes bad stuff to happen. Here, the system can show "P(A)<x" separately for any given x>0, but can't prove the same statement with the universal quantifier.

Is it unreasonable of me to be annoyed at that kind of writing?

If I understand what's going on correctly, you have a real-indexed schema of axioms and each of them is in your system.

When I read the axiom list the first time I saw that the letters were free variables (in the language you and I are writing in) and assumed that you did not intend for them to be free variables in the formula. My suggestion of how to bind the variables (in the language we are writing in) was (very) wrong, but I still think that it's unclear as written.

Am I confused?

Is it unreasonable of me to be annoyed at that kind of writing?

No, it's perhaps not the best explained post I've done. Though it was intended more for technical purposes.

Am I confused?

Not any more, I hope!

I'm still a bit vague on this Löb business. Is it a good thing or a bad thing (from a creating AI perspective) that "Löb's theorem fails" for the probabilistic version?

edit: The old post linked suggests that it's good that it doesn't apply here.

It's a good thing. Löb's theorem is an obstacle (the "Löbstacle," if you will).

Löb's theorem means an agent cannot trust future copies of itself, or simply identical copies of itself, to only prove true statements.

Er I don't think this is right. Lob's theorem says that an agent cannot trust future copies of itself, unless those future copies use strictly weaker axioms in their reasoning system.

The "can" has now been changed into "cannot". D'oh!

Good for creating AGI, maybe bad for surviving it. Hopefully the knowledge will also help us predict the actions of strong self-modifying AI.

It does seem promising to this layman, since it removes the best reason I could imagine for considering that last goal impossible.

It seems like you use (1-a-b) when what seems right to me is ((1-a)(1-b)).

That's important, because I can construct an infinite series such that the sum of that series is smaller than epsilon, but not an infinite series of terms edit: all of which are between 0 and 1 with a product that converges to anything but 0.

Plus, any system capable of proving a contradiction can disprove it; the ability to prove that 'no statement which implies a contradiction is provable' is almost the equivalent of the claim that 'all provable statements are true'.

That's important, because I can construct an infinite series such that the sum of that series is smaller than epsilon, but not an infinite series of terms >1 with a product that converges.

Let a(n) be any series whose sum converges and let b(n) = exp(a(n)). Then the product of b(n) converges iff the sum of a(n) converges. In particular, we can take b(n) = exp(1/2^n) for n = 0 to infinity, whose product converges to exp(2).
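A quick numerical check of that telescoping, using the example just given:

```python
import math

# Partial products of b(n) = exp(1/2^n) approach exp(2): the log of the
# product telescopes into the geometric sum 1 + 1/2 + 1/4 + ... = 2.
prod = 1.0
for n in range(60):
    prod *= math.exp(1 / 2**n)
print(prod, math.exp(2))  # both ~7.38905609893065
```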

My claim was intended to be limited to infinite products where each term is between 0 and 1.

exp(1/2^0)=exp(1)=e>1
exp(epsilon)>1
b(n)>1 for all positive n.

The addition (or removal) of a finite number of nonzero terms at the beginning of an infinite product does not change whether or not the product converges.

Plain language: if you remove enough of the first terms of an infinite product that converges, the product of remaining terms must converge to 1. I'm rusty on my math and LaTeX, but given an infinite product that converges, we can prove that the first N terms have a product within epsilon of the product of the series (by the definitions of limit and converge), which means that the remaining infinite number of terms must have a product within some different epsilon of 1 (again, simply by definitions).

Given an infinite series of numbers all between 0 and 1 and dropping the first N terms, I can find an epsilon such that the product of the remaining terms always stays at least epsilon below 1. That epsilon is 1-(the N+1th term).

I've made some error somewhere, since the infinite product of b(n)=.01 converges (to 0). I intuit that I may have proven only that the infinite product does not converge to 1. I suspect that there's a step somewhere around the point where I used the phrase 'some different epsilon'.

if you remove enough of the first terms of an infinite product that converges, the product of remaining terms must converge to 1

This is not sufficiently precise. More precisely: if T(n) is the tail product of all but the first n terms, then for each fixed n, T(n) is a convergent product, and as n approaches infinity, T(n) should converge to 1 (assuming the initial infinite product converges to a nonzero value).

I think the flaw in your proof is the confusion between convergence of T(n) for a fixed n, as an infinite product, and convergence of T(n), the value of that infinite product, as n goes to infinity.

Also, a more general family of counterexamples is the following: take any nonnegative series a(n) whose sum converges to A, and let b(n) = exp(-a(n)). Then 0<b(n)<1 and the product of b(n) converges to exp(-A). (Obviously this is more or less recycled from the grandparent.) For example, b(n) = exp(-1/2^n).

Concur - the infinite product of 0<b(n)<1 is in [0,1). That makes sense for probabilities.


Here's a counterexample to your claim:

Let a(n) be a decreasing series which tends to a nonzero limit (for example, a(n) = 1 + 1/n)

Then let b(0) = a(0), b(n) = a(n) / a(n-1)
a(n) is decreasing so for all n>0, b(n) < 1

But the product of the first N b(n)s is a(N), so the infinite product of b(n) converges to the same thing as a(n), which is nonzero.
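A quick check of this construction, shifting the index to start at n = 1 (a(n) = 1 + 1/n is undefined at n = 0):

```python
def a(n):           # decreasing, tends to the nonzero limit 1
    return 1 + 1 / n

prod = a(1)         # b(1) = a(1); b(n) = a(n)/a(n-1) for n >= 2
for n in range(2, 10**6):
    prod *= a(n) / a(n - 1)   # each factor < 1, since a is decreasing
print(prod, a(10**6 - 1))     # the product telescopes: both ~1.000001
```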

So, a first term greater than 1, and the remainder converges to 1/b(0)?

I only showed that an infinite product of terms strictly between 0 and 1 cannot be one. It can be 1-epsilon for any epsilon.

I don't think that particular a(n) converges, but that doesn't invalidate your point that b(n) selected such that the product from 0 to n equals the sum of a(n) from 0 to n must have a product that converges to the sum of a(n). But the sum of a(n) would have to be decreasing for the terms of b(n) to be between 0 and 1.


I wasn't summing a(n). It's the sequence a(n) that converges, not its sum, and the partial products of the b(n) are equal to the a(n), not the partial sums of the a(n).

Certainly an infinite product of 0<n<1 can't be one. Nobody's disputing that.

Oh, then it sounds like we are in perfect agreement that my initial claim was wrong; however, we can now generate an infinite series of probabilities less than one whose product remains higher than 1-epsilon for any given epsilon. If 1-epsilon is used as the threshold for what the system proves true, Löb's theorem holds.

(1-a)(1-b) = 1-a-b+ab > 1-a-b.

Hence P(A) > (1-a)(1-b) implies P(A) > (1-a-b). Since I let these quantities get arbitrarily close to zero, the quadratic difference term doesn't matter.

P(A) ≥ (1-a)(1-b) and (a,b)>0 implies P(A) > (1-a-b). That's a (very slightly) stronger form. I noticed what I thought was an error where you were adding improbabilities together instead of multiplying.

The proof doesn't need the stronger form, however.