# Probabilistic Löb theorem

In this post (based on results from MIRI's recent workshop), I'll be looking at whether reflective theories of logical uncertainty (such as Paul's design) still suffer from Löb's theorem.

Theories of logical uncertainty are theories which can assign probability to logical statements. Reflective theories are theories which know something about themselves within themselves. In Paul's theory, there is an external P, in the meta language, which assigns probabilities to statements, an internal P, inside the theory, that computes probabilities of coded versions of the statements inside the language, and a reflection principle that relates these two P's to each other.

And Löb's theorem is the result that if a (sufficiently complex, classical) system can prove that "a proof of Q implies Q" (often abbreviated as □Q → Q), then it can prove Q. What would be the probabilistic analogue? Let's use □_{a}Q to mean P('Q')≥1-a (so that □_{0}Q is the same as the old □Q; see this post on why we can interchange probabilistic and provability notions). Then Löb's theorem in a probabilistic setting could:

Probabilistic Löb's theorem: for all a<1, if the system can prove □_{a}Q → Q, then the system can prove Q.

To understand this condition, we'll go through the proof of Löb's theorem in a probabilistic setting, and see if and when it breaks down. We'll conclude with an example to show that any decent reflective probability theory has to violate this theorem.

## Löb's proof fails

First, the derivation conditions. Using ⊢ to designate the system can prove/the system can show with probability 1 (note that ⊢ is the meta-language equivalent of □), then the classical derivation conditions are:

- If ⊢ A, then ⊢ □A.
- ⊢ □A → □□A.
- ⊢ □(A → B) → (□A → □B).

Informally, these state that "if the system can prove something, it can prove it can prove it", "the system can prove that a proof of something implies a proof of a proof of something" and "the system can prove that proofs obey modus ponens".

What are the equivalent statements for probabilistic theories? According to Paul's approach, one must always add small epsilons when talking about probabilities. So the following derivation principles seem reasonable, where the latin indexes (a,b,c...) are meant to represent numbers that can be arbitrarily close to zero:

- If ⊢ A, then ⊢ □
_{a }A. - ⊢ □
_{a}A → □_{c}□_{a+b }A. - ⊢ □
_{a}(A → B) → (□_{b}A → □_{a+b}B).

The first principle is quite clear - if the system can prove A, it can prove that A's probability is arbitrarily close to 1. The second one is a bit more complex - the system can prove that if the probability of A is more than 1-a, then for any b>0, the system can prove that the probability of "the probability of A being greater than 1-a-b" is arbitrarily close to 1. Read that sentence again once or twice. The "b" in that derivation principle will be crucial - if b can be set to zero, Löb's theorem is true, if not, it can't be proved in the usual way. And the third derivation principle is simply stating that "if P(B|A)≥1-a", then "P(A)≥1-b" implies that "P(B)≥1-a-b" - and that the system itself can prove this. Unlike the second derivation principle, there is no need to add another layer of uncertainty here.

With these tools in hand, the proof can proceed in the usual way. Fix a, and Q be sentence such that the system can prove

□_{a}Q → Q.

Then fix b < a, and construct, by diagonalisation, the Löb sentence:

(□_{b}L → Q) ↔ L.

Then the proof steps are:

A) | ⊢ □_{c} (L → (□_{b}L → Q)) |
1st derivation, def of L |

B) | ⊢ □_{d} L → □_{d+c}(□_{b}L → Q) |
3rd derivation, A) |

C) | ⊢ □_{d} L → (□_{e}□_{b}L → □_{d+c+e}Q) |
3rd derivation, B) |

D) | ⊢ □_{d} L → (□_{e}□_{d+f }L) |
2rd derivation |

E) | ⊢ □_{d} L → □_{d+c+e}Q |
C), D), setting d+f=b |

F) | ⊢ □_{d} L → Q |
E), def of Q, setting d+c+e=a |

What we would want to deduce at this point is that the system can prove that L is true. This is only possible if we can set d=b. But we've shown that d+f=b, where f comes from the second derivation principle. If we can set that f to zero, we can proceed; if f>0, the proof stops here. So now imagine that we can indeed set f=0; then the rest of the proof is standard:

G) | ⊢ L | F), def of L, with d=b |

H) | ⊢ □_{b} L |
1st derivation, G) |

I) | ⊢ Q | 3rd derivation, B) |

Hence for any probabilistic logic to avoid Löb's theorem, we need the second derivation principle to have a little bit of extra uncertainty, that can be made arbitrarily tiny - but never zero.

## Löb's theorem fails

But we can do better than that. With a little bit more conditions on the probability function P, we can construct counter-examples to the probabilistic Löb theorem. What we need is that the system knows that probabilities of opposites sum to one; that for all Q,

⊢ P('¬Q')+P('Q')=1.

Then let F be any contradiction. Thus the system can prove the tautology that is its opposite:

⊢ ¬F

By the derivation principle, the system can also prove, for any a<1,

⊢ □_{1-a} ¬F.

This is just another way of writing that P('¬F') ≥ a. Since probabilities sum to one, the system can prove that P('F') ≤ 1-a. Hence that □_{b} F (which claims P('F') ≥ 1-b) is actually false for any b<a. Since a false statement implies every statement, the system demonstrates:

⊢ □_{b} F → F.

Now, we can choose any a<1 and any b<a - hence we can choose any b<1. This is the condition for the probabilistic Löb theorem. And yet F is not only false, the system can disprove it!

Note that the result also holds if the system know that the probabilities sum to something arbitrarily close to 1. As a corollary to these results, we can show that no coherent system of probabilistic logic can have the second derivation principle without the extra uncertainty, while simultaneously knowing that probabilities sum to 1.

## The last echoes of Löb's theorem

There is one last possibility for a probabilistic Löb theorem, not ruled out by the counterexample above. It might be that all statements A are true if

⊢ P('A')>0 → A.

However, this doesn't seem a critical flaw - P('A')>0 is something that is 'almost always' true, if we see P('A') as being an arbitrarily close *approximation* of the meta probability P(A). And if the system shows that something 'almost always true' implies A, then A being true seems unsurprising.

And this result poses no real problems for identical probabilistic systems trusting each other (to within an arbitrarily small bound).

## Comments (39)

Best*2 points [-]-

Again, Christiano et al supposedly prove the existence of a coherent distribution with certain properties. (Someone who knows game theory better than I do should work out what the proof depends on.) Any such distribution necessarily violates the probabilistic Loeb's theorem, for roughly the reason given in the OP's second-to-last section.

Not as far as this lay-reader can see. Again, the distribution must satisfy derivation principle #1. I can't tell if it must obey #3, though if it does that would seem to rule out a stronger version of #2.

Do you folks at MIRI think this result is as significant an advance toward FAI as, say, Cox's theorem or Pearl-style causal inference?

I'm not at MIRI but I was at part of the workshop and my answer would be no (although I do think the result is worthwhile).

Although I would also place Cox's theorem and graphical models on pretty different standing; the latter seems much more important to me than the former.

*2 points [-]Yes, Cox's result gives a different supporting argument for the use of probability, but didn't introduce new ways of doing probabilistic reasoning, whereas graphical models have had major applications in efficient probabilistic reasoning in practice.

Thisresult isn't on its own (it's a possibility/impossibility result about a probability system that may not exist) - but if a variant of Paul's probability can work, then that would be significant result.Maybe this would be obvious if I knew anything about logic, but how do you know the system is consistent?

We don't - generally we build systems where we can show "system X is consistent iff Peano Arithmetic is consistent". And we assume that PA is consistent (or we panic).

Sorry my phrasing was bad; I actually do know that much about logic. But how do you know that this system is consistent iff Peano Arithmetic is consistent?

We don't have that system yet! Just that that is what we generally do with the systems we have.

*-1 points [-]The result linked at the beginning shows that there exists, in principle, a coherent probability distribution with certain properties. Edit: in particular, it assigns probability 0 to F or any other contradiction. And while it doesn't always (ever?) know the exact probability it assigns, it does know that P(F)<1-a for any a<1. That statement itself has probability 1. Therefore the part about violating the probabilistic Lob's Theorem clearly holds.

I can't tell at a glance if the distribution satisfies derivation principle #3, but it certainly satisfies #1.

So assuming that the Löb's theorem problem is actually solved, the next step is to define an ideal AI which can self modify without weakening its mathematical system? Is it clear how to do this once the Löb obstruction is dealt with, or is this another very difficult math problem?

Nope, it's pretty much done - you can trust your future copies probabilistically, without weakening. However, that would require a P that's both defined and computable - not a trivial task.

*0 points [-]I'm still a bit vague on this Löb business. Is it a good thing or a bad thing (from a creating AI perspective) that "Löb's theorem fails" for the probabilistic version?

edit: The old post linked suggests that it's good that it doesn't apply here.

It's a good thing. Löb's theorem is an obstacle (the "Löbstacle," if you will).

*2 points [-]Löb's theorem's means an agent cannot trust future copies of itself, or simply identical copies of itself, to only prove true statements.

Er I don't think this is right. Lob's theorem says that an agent

cannottrust future copies of itself, unless those future copies use strictly weaker axioms in their reasoning system.The "can" has now been changed into "cannot". D'oh!

Good for creating AGI, maybe bad for surviving it. Hopefully the knowledge will also help us predict the actions of strong self-modifying AI.

It does seem promising to this layman, since it removes the best reason I could imagine for considering that last goal impossible.

*0 points [-]It seems like you use (1-a-b) when what seems right to me is ((1-a)(1-b)).

That's important, because I can construct an infinite series such that the sum of that series is smaller than epsilon, but not an infinite series of terms

edit: all of which are between 0 and 1with a product that convergesto anything but 0.Plus, any system capable of proving a contradiction can disprove it; the ability to prove that 'no statement which implies a contradiction is provable' is almost the equivalent of the claim that 'all provable statements are true'.

Let a(n) be any series whose sum converges and let b(n) = exp(a(n)). Then the product of b(n) converges iff the sum of a(n) converges. In particule, we can take b(n) = exp(1/2^n) for n = 0 to infinity, whose product converges to exp(2).

My claim was intended to be limited to infinite products where each term is between 0 and 1.

exp(1/2^0)=exp(1)=e>1

exp(epsilon)>1

b(n)>1 for all positive n.

The addition (or removal) of a finite number of nonzero terms at the beginning of an infinite product does not change whether or not the product converges.

Plain language: if you remove enough of the first terms of an infinite product that converges, the product of remaining terms must converge to 1. I'm rusty on my math and LaTeX, but given an infinite product that converges, we can prove that the first N terms have a product within epsilon of the product of the series (by the definitions of limit and converge), which means that the remaining infinite number of terms must have a product within some different epsilon of 1 (again, simply by definitions).

Given an infinite series of numbers all between 0 and 1 and dropping the first N terms, I can find an epsilon such that the product of the remaining terms is always greater than that epsilon from 1. That epsilon is 1-(the N+1th term).

I've made some error somewhere, since the infinite product of b(n)=.01 converges (to 0). I intuit that I may have proven only that the infinite product does not converge to 1. I suspect that there's a step somewhere around the point where I used the phrase 'some different epsilon'.

This is not sufficiently precise. More precisely: if T(n) is the tail product of all but the first n terms, then for each

fixedn, T(n) is a convergent product, and as n approaches infinity, T(n) should to converge to 1 (assuming the initial infinite product converges to a nonzero value).I think the flaw in your proof is the confusion between convergence of T(n) for a fixed n, as an infinite product, and convergence of T(n), the

valueof that infinite product, as n goes to infinity.Also, a more general family of counterexamples is the following: take any nonnegative series a(n) whose sum converges to A, and let b(n) = exp(-a(n)). Then 0<b(n)<1 and the product of b(n) converges to exp(-A). (Obviously this is more or less recycled from the grandparent.) For example, b(n) = exp(-1/2^n).

Concur- the infinite product of 0<b(n)<1 is in [0,1). That makes sense for probabilities.

*0 points [-]Here's a counterexample to your claim:

Let a(n) be a decreasing series which tends to a nonzero limit (for example, a(n) = 1 + 1/n)

Then let b(0) = a(0), b(n) = a(n) / a(n-1)

a(n) is decreasing so for all n>0, b(n) < 1

But the product of the first N b(n)s is a(N), so the infinite product of b(n) converges to the same thing as a(n), which is nonzero.

So, a first term greater than 1, and the remainder converges to 1/b(0)?

I did only show that the infinite product of 0<n<1 cannot be one. It can be 1-epsilon for any epsilon.

I don't think that particular a(n) converges, but that doesn't invalidate your point that b(n) selected such that the product from 0 to n equals the sum of a(n) from 0 to n must have a product that converges to the sum of a(n). But the

sumof a(n) would have to be decreasing for the terms of b(n) to be between 0 and 1.I wasn't summing a(n). It's the sequence a(n) that converges, not its sum, and the partial products of the b(n) are equal to the a(n), not the partial sums of the a(n).

Certainly an infinite product of 0<n<1 can't be one. Nobody's disputing that.

Oh, then it sounds like we are in perfect agreement that my initial claim was wrong; however, we can now generate an infinite series of probabilities less than one whose product remains higher than 1-epsilon for any epsilon. If 1-epsilon is used as the determinator of what the system proves true, Löb's theorem holds.

(1-a)(1-b) = 1-a-b+ab > 1-a-b.

Hence P(A) > (1-a)(1-b) implies P(A) > (1-a-b). Since I let these quantities get arbitrarily close to zero, the quadratic difference term doesn't matter.

P(A)

≥(1-a)(1-b) U (a,b)>0 implies P(A) > (1-a-b). That's a (very slightly) stronger form. I noticed what I thought was an error where you were adding improbabilities together instead of multiplying.The proof doesn't need the stronger form, however.

*0 points [-]Why would you leave out quantifiers? Requiring the reader to stick their own existential or universal quantification in the necessary places isn't very nice.

Is this the correct interpretation of your assumptions? If not, what is it? I am not interested in figuring out which axioms are required to make your proof (which is also missing quantifiers) work.

" So the following derivation principles seem reasonable, where the latin indexes (a,b,c...) are meant to represent numbers that can be arbitrarily close to zero"

so universally quantified, but in the meta language.

Thank you.

*0 points [-]As Larks said, we can quantify (the meta language looking in), but the system itself can't quantify. Because then the system could reason that "∀x>0, P(A)<x" means "P(A)=0", which is the kind of thing that causes bad stuff to happen. Here, the system can show "P(A)<x" separately for any given x>0, but can't prove the same statement with the universal quantifier.

Is it unreasonable of me to be annoyed at that kind of writing?

If I understand what's going on correctly, you have a real-indexed schema of axioms and each of them is in your system.

When I read the axiom list the first time I saw that the letters were free variables (in the language you and I are writing in) and assumed that you did not intend for them to be free variables in the formula. My suggestion of how to bind the variables (in the language we are writing in) was (very) wrong, but I still think that it's unclear as written.

Am I confused?

*0 points [-]No, it's perhaps not the best explained post I've done. Though it was intended more for technical purposes.

Not any more, I hope!

You mean, because we can always replace the first "a" with some b>a?

This seems like a very cool - and bizarre - result.

*0 points [-]Sorry, a bit sloppy with the > and ≥ - since you can always choose a slightly larger "a", it still works out. I've corrected the text.