
Followup To: Putting in the Numbers

Before talking about logical uncertainty, our final topic is the relationship between probabilistic logic and classical logic. A robot running on probabilistic logic stores probabilities of events, e.g. P(wet) for "the grass is wet outside," and when it collects new evidence it updates that probability to P(wet|evidence). A classical logic robot, on the other hand, deduces the truth of statements from axioms and observations. Maybe our robot starts out not being able to deduce whether the grass is wet, but then it observes that it is raining, and so it uses an axiom about rain causing wetness to deduce that "the grass is wet" is true.

Classical logic relies on complete certainty in its axioms and observations, and makes completely certain deductions. This is unrealistic when applied to rain, but later we're going to apply it to math (first-order, for starters), which is a better fit for classical logic.

The general pattern of the deduction "It's raining, and when it rains the grass is wet, therefore the grass is wet" was modus ponens: if 'U implies R' is true, and U is true, then R must be true. There is also modus tollens: if 'U implies R' is true, and R is false, then U has to be false too. Third, there is the law of non-contradiction: "It's simultaneously raining and not-raining outside" is always false.

We can imagine a robot that does classical logic as if it were writing in a notebook. Axioms are entered in the notebook at the start. Then our robot starts writing down statements that can be deduced by modus ponens or modus tollens. Eventually, the notebook is filled with statements deducible from the axioms. Modus tollens and modus ponens can be thought of as consistency conditions that apply to the contents of the notebook.
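To make the notebook metaphor concrete, here is a minimal sketch (mine, not part of the original post) of such a robot in Python; the tuple encoding of statements and the helper names are invented for illustration:

```python
# A minimal sketch of the notebook robot: statements are strings,
# negation is ("not", s), and implication is ("implies", a, b).
def negate(s):
    return s[1] if isinstance(s, tuple) and s[0] == "not" else ("not", s)

def deductive_closure(axioms):
    notebook = set(axioms)            # the axioms are entered at the start
    changed = True
    while changed:
        changed = False
        new = set()
        for s in notebook:
            if isinstance(s, tuple) and s[0] == "implies":
                _, a, b = s
                if a in notebook:                 # modus ponens
                    new.add(b)
                if negate(b) in notebook:         # modus tollens
                    new.add(negate(a))
        if not new <= notebook:
            notebook |= new
            changed = True
    return notebook

# Example: "rain" plus "rain implies wet" lets the robot write down "wet".
print(deductive_closure({"rain", ("implies", "rain", "wet")}))
```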

Doing math is one important application of our classical-logic robot. The robot can read from its notebook "If variable A is a number, A=A+0" and "SS0 is a number," and then write down "SS0=SS0+0."

Note that this requires the robot to interpret the variable A differently from the symbol SS0. This is one of many upgrades we can make to the basic robot so that it can interpret math more easily. We also want to program in special responses to symbols like 'and', so that if A and B are in the notebook our robot will write 'A and B', and if 'A and B' is in the notebook it will add A and B individually. In this light, modus ponens is just the robot having a programmed response to the 'implies' symbol.
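A sketch of those two upgrades, again with invented encodings (not from the original post): a substitution rule for instantiating a variable with a term like SS0, and programmed responses to the 'and' symbol.

```python
# Sketch of the upgrades: substitution into an axiom schema,
# and programmed responses to the "and" symbol.
def instantiate(schema, variable, term):
    """Replace the variable in an axiom like "A=A+0" with a term like "SS0"."""
    return schema.replace(variable, term)

print(instantiate("A=A+0", "A", "SS0"))           # -> SS0=SS0+0

def and_responses(notebook):
    """One pass of the 'and' rules over the notebook's plain statements."""
    new = set()
    for s in notebook:
        if isinstance(s, tuple) and s[0] == "and":
            new.update([s[1], s[2]])              # 'A and B' is there: add A and add B
    plain = [s for s in notebook if not isinstance(s, tuple)]
    for a in plain:
        for b in plain:
            new.add(("and", a, b))                # A and B are there: add 'A and B'
    return notebook | new

print(and_responses({"raining", "wet"}))
```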

Certainty about our axioms is what lets us use classical logic, but you can represent complete certainty in probabilistic logic too, by the probabilities 1 and 0. These two methods of reasoning shouldn't contradict each other - if a classical logic robot can deduce that it's raining out, a probabilistic logic robot with the same information should assign P(rain)=1.

If it's raining out, then my grass is wet. In the language of probabilities, this is P(wet|rain)=1. If I look outside and see rain, P(rain)=1, and then the product rule says that P(wet and rain) = P(rain)·P(wet|rain) = 1. Since P(wet) can't be smaller than P(wet and rain), my grass must be wet too. Hey, that's modus ponens!

The rules of probability can also behave like modus tollens (if P(B)=0, and P(B|A)=1, P(A)=0) and the law of the excluded middle (P(A|not-A)=0). Thus, when we're completely certain, probabilistic logic and classical logic give the same answers.
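Here is a quick numerical way to see the correspondence (an added check, not from the original post): enumerate joint distributions over (rain, wet) on a coarse grid and confirm that the certainties propagate the way modus ponens and modus tollens demand.

```python
from itertools import product

# Candidate joint distributions over (rain, wet) on a coarse probability grid.
grid = [i / 10 for i in range(11)]
cells = [(True, True), (True, False), (False, True), (False, False)]
joints = [dict(zip(cells, ps)) for ps in product(grid, repeat=4)
          if abs(sum(ps) - 1) < 1e-9]

def P(joint, event):
    """Probability of the (rain, wet) possibilities where `event` holds."""
    return sum(p for cell, p in joint.items() if event(*cell))

# Modus ponens: every joint with P(wet|rain)=1 (i.e. P(rain and not-wet)=0)
# and P(rain)=1 also has P(wet)=1.
for j in joints:
    if P(j, lambda r, w: r and not w) == 0 and P(j, lambda r, w: r) == 1:
        assert P(j, lambda r, w: w) == 1

# Modus tollens: P(wet|rain)=1 and P(wet)=0 force P(rain)=0.
for j in joints:
    if P(j, lambda r, w: r and not w) == 0 and P(j, lambda r, w: w) == 0:
        assert P(j, lambda r, w: r) == 0

print("certainty propagates like modus ponens and modus tollens")
```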

There's a very short way to prove this, which is that one of Cox's desiderata for how probabilities must behave was "when you're completely certain, your plausibilities should satisfy the rules of classical logic."

In Foundations of Probability, I alluded to the idea that we should be able to apply probabilities to math. Dutch book arguments work because our robot must act as if it had probabilities in order to avoid losing money. Savage's theorem applies because the results of our robot's actions might depend on mathematical results. Cox's theorem applies because beliefs about math behave like other beliefs.

This is completely correct. Math follows the rules of probability, and thus can be described with probabilities, because classical logic is the same as probabilistic logic when you're certain.

We can even use this correspondence to figure out what numbers the probabilities take on:

1 for every statement that follows from the axioms, 0 for their negations.
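In code, the induced assignment might look like this sketch (my own; the string encoding of negation is an assumption). Note that it stays silent on statements that are independent of the axioms, which is where the comments below pick up.

```python
def logical_probability(statement, theorems):
    """The probability assignment induced by classical provability: a sketch.
    `theorems` stands in for the deductively closed notebook, and the negation
    of a string statement s is written "not " + s (an invented encoding)."""
    if statement in theorems:
        return 1.0                    # follows from the axioms
    if "not " + statement in theorems:
        return 0.0                    # its negation follows from the axioms
    return None                       # independent of the axioms: not pinned down

notebook = {"rain", "wet", "not snowing"}        # pretend output of the robot above
print(logical_probability("wet", notebook))      # 1.0
print(logical_probability("snowing", notebook))  # 0.0
print(logical_probability("windy", notebook))    # None
```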

 

This raises an issue: what about betting on the last digit of the 3^^^3'th prime? We dragged probability into this mess because it was supposed to help our robot stop trying to prove the answer and just bet as if P(last digit is 1)=1/4. But it turns out that there is one true probability distribution over mathematical statements, given the axioms. The right distribution is obtained by straightforward application of the product rule - never mind that it takes 4^^^3 steps - and if you deviate from the right distribution that means you violate the product rule at some point.

This is why logical uncertainty is different. Even though our robot doesn't have enough resources to find the right answer, using logical uncertainty violates Savage's theorem and Cox's theorem. If we want our robot to act as if it has some "logical probability," it's going to need a stranger sort of foundation.

 

Part of the sequence Logical Uncertainty

Previous post: Putting in the Numbers

Next post: Approaching Logical Uncertainty


33 comments

This raises an issue: what about betting on the last digit of the 3^^^3'th prime?

This is a go-to example here of using subjective probability under resource constraints. There are plenty of more familiar examples, such as having to answer a multiple-choice question on a test in 30 seconds, and having to estimate the probabilities of each answer in order to pick the likeliest. Everyone has done it, and used basically the same tools as for the so-called "true" probabilities.

the law of the excluded middle (P(A|not-A)=0)

Isn't this the LNC?

(I had to look it up: Law of Non-Contradiction.)

Yeah, this looks more like the Law of Non-Contradiction than the Law of Excluded Middle to me (which makes Manfred's jokey response seem doubly foolish).

it turns out that there is one true probability distribution over mathematical statements, given the axioms

I guess you meant to say "mathematical statements without quantifiers"?

If you want quantifiers, you can just program your robot to respond to the symbol "for all" so that when it sees "for all x, x=y" it writes all the implications in the notebook, and when x=y for all x, it writes "for all x, x=y". This is an infinite amount of writing to do, but there was always an infinite amount of writing to do - the robot is infinitely fast, and anyway is just a metaphor for the rules of our language.
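(An illustrative sketch added here, not part of the exchange: a lazy generator can stand in for that infinite writing, instantiating a schema with the numerals 0, S0, SS0, ...)

```python
from itertools import count

def numerals():
    """0, S0, SS0, ... - the terms a schema over the naturals gets instantiated with."""
    for n in count():
        yield "S" * n + "0"

def instances(schema, variable="x"):
    """Lazily write out the instances of 'for all x, <schema>': a sketch of the
    infinite amount of writing (the generator never finishes on its own)."""
    for term in numerals():
        yield schema.replace(variable, term)

gen = instances("x=x+0")
print([next(gen) for _ in range(3)])   # ['0=0+0', 'S0=S0+0', 'SS0=SS0+0']
```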

Sorry, I should've said "statements that are provable or disprovable from the axioms", mentioning quantifiers was kinda irrelevant. Are you saying that your robot will eventually write out truth values for statements that are independent of the axioms as well? (Like the continuum hypothesis in ZFC.)

I feel like the robot metaphor may be outside of its domain of validity by now. Anyhow, I replied over in the other branch.


So if you give your robot the axioms of ZFC, it will eventually tell you if the continuum hypothesis is true or false?

[This comment is no longer endorsed by its author]

Are you assuming that x can only range over the natural numbers? If x can range over reals or sets, or some arbitrary kind of objects described by the axioms, then it's harder to describe what the robot should do. The first problem is that an individual x can have no finite description. The second, more serious problem is that translating statements with quantifiers into statements of infinite length would require the robot to use some "true" model of the axioms, but often there are infinitely many models by Lowenheim-Skolem and no obvious way of picking out a true one.

Also, my original comment was slightly misleading - the "one true distribution" would in fact cover many statements with quantifiers, and miss many statements without quantifiers. The correct distinction is between statements that are provable or disprovable from the axioms, and statements that are independent of the axioms. If the axioms are talking about natural numbers, then all statements without quantifiers should be covered by the "one true distribution", but in general that doesn't have to be true.

[This comment is no longer endorsed by its author]

Well, it's certainly a good point that there are lots of mathematical issues I'm ignoring. But for the topics in this sequence, I am interested not in those issues themselves, but in how they are different between classical logic and probabilistic logic.

This isn't trivial, since statements that are classically undetermined by the axioms can still have arbitrary probabilities (Hm, should that be its own post, do you think? I'll have to mention it in passing when discussing the correspondence between inconsistency and limited information). But in this post, the question is whether there is no difference for statements that are provable or disprovable from the axioms. I'm claiming there's no difference. Do you think that's right?

Yeah, I agree with the point that classical logic would instantly settle all digits of pi, so it can't be the basis of a theory that would let us bet on digits of pi. But that's probably not the only reason why we want a theory of logical uncertainty. The value of a digit of pi is always provable (because it's a quantifier-free statement), but our math intuition also allows us to bet on things like Con(PA), which is independent, or P!=NP, for which we don't know if it's independent. You may or may not want a theory of logical uncertainty that can cover all three cases uniformly.

But it turns out that there is one true probability distribution over mathematical statements, given the axioms. The right distribution is obtained by straightforward application of the product rule - never mind that it takes 4^^^3 steps - and if you deviate from the right distribution that means you violate the product rule at some point.

This does not seem right to me. I feel like you are sneakily trying to condition all of the robot's probabilities on mathematical proofs that it does not have a priori. E.g. consider A, A->B, therefore B. To learn that P(A->B)=1, the robot has to do a big calculation to obtain the proof. After this, it can conclude that P(B|A,A->B)=1. But before it has the proof, it should still have some P(B|A)!=1.

Sure, it seems tempting to call the probabilities you would have after obtaining all the proofs of everything the "true" probabilities, but to me it doesn't actually seem different to the claim that "after I roll my dice an infinity of times, I will know the 'true' probability of rolling a 1". I should still have some beliefs about a one being rolled before I have observed vast numbers of rolls.

In other words I suggest that proof of mathematical relationships should be treated exactly the same as any other data/evidence.

edit: in fact surely one has to consider this so that the robot can incorporate the cost of computing the proof into its loss function, in order to decide if it should bother doing it or not. Knowing the answer for certain may still not be worth the time it takes (not to mention that even after computing the proof the robot may still not have total confidence in it; if it is a really long proof, the probability that cosmic rays have caused lots of bit-flips to mess up the logic may become significant). If the robot knows it cannot ever get the answer with sufficient confidence within the given time constraints, it must choose an action which accounts for this. And the logic it uses should be just the same as how it knows when to stop rolling dice.

edit2: I realised I was a little sloppy above; let me make it clearer here:

The robot knows P(B|A,A->B)=1 apriori. But it does not know "A->B" is true apriori. It therefore calculates

P(B|A) = P(B|A,A->B) P(A->B|A) + P(B|A,not A->B) P(not A->B|A) = P(A->B|A)

After it obtains proof that "A->B", call this p, we have P(A->B|A,p) = 1, so

P(B|A,p) = P(B|A,A->B,p) P(A->B|A,p) + P(B|A,not A->B,p) P(not A->B|A,p)

collapses to

P(B|A,p) = P(B|A,A->B,p) = P(B|A,A->B) = 1

But I don't think it is reasonable to skip straight to this final statement, unless the cost of obtaining p is negligible.

edit3: If this somehow violates Savage or Cox's theorems I'd like to know why :).

If this somehow violates Savage or Cox's theorems I'd like to know why

Well, Cox's theorem has as a requirement that when your axioms are completely certain, you assign probability 1 to all classical consequences of those axioms. Assigning probability 0.5 to any of those consequences thus violates Cox's theorem. But this is kind of unsatisfying, so: where do we violate the product rule?

Suppose our robot knows that P(wet outside | raining) = 1. And it observes that it's raining, so P(rain)=1. But it's having trouble figuring out whether it's wet outside within its time limit, so it just gives up and says P(wet outside)=0.5. Has it violated the product rule? Yes. P(wet outside) >= P(wet outside and raining) = P(wet outside | rain) * P(rain) = 1.

If we accept that the axioms have probability 1, we can deduce the consequences with certainty using the product rule. If at any point we stop deducing the consequences with certainty, this means we have stopped using the product rule.
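(Added for illustration, not part of the exchange: plugging the robot's stated numbers into the product-rule chain above makes the inconsistency explicit.)

```python
# The robot's stated beliefs in the example above.
p_rain = 1.0
p_wet_given_rain = 1.0
p_wet_claimed = 0.5                      # the "give up" answer

# Product rule: P(wet and rain) = P(wet | rain) * P(rain)
p_wet_and_rain = p_wet_given_rain * p_rain

# But P(wet) can never be smaller than P(wet and rain), so 0.5 is inconsistent.
print(p_wet_claimed >= p_wet_and_rain)   # False
```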

Hmm this does not feel the same as what I am suggesting.

Let me map my scenario onto yours:

A = "raining"

B = "wet outside"

A->B = "It will be wet outside if it is raining"

The robot does not know P("wet outside" | "raining") = 1. It only knows P("wet outside" | "raining", "raining->wet outside") = 1. It observes that it is raining, so we'll condition everything on "raining", taking it as true.

We need some priors. Let P("wet outside") = 0.5. We also need a prior for "raining->wet outside", let that be 0.5 as well. From this it follows that

P("wet outside" | "raining") = P("wet outside" | "raining", "raining->wet outside") P("raining->wet outside"|"raining") + P("wet outside" | "raining", not "raining->wet outside") P(not "raining->wet outside"|"raining") = P("raining->wet outside"|"raining") = P("raining->wet outside") = 0.5

according to our priors [the first and second equalities are the same as in my first post, the third equality follows since whether or not it is "raining" is not relevant for figuring out if "raining->wet outside"].

So the product rule is not violated.

P("wet outside") >= P("wet outside" and "raining") = P("wet outside" | "raining") P("raining") = 0.5

Where the inequality is actually an equality because our prior was P("wet outside") = 0.5. Once the proof p that "raining->wet outside" is obtained, we can update this to

P("wet outside" | p) >= P("wet outside" and "raining" | p) = P("wet outside" | "raining", p) P("raining" | p) = 1

But there is still no product rule violation because

P("wet outside" | p) = P("wet outside" | "raining", p) P("raining" | p) + P("wet outside" | not "raining", p) P(not "raining" | p) = P("wet outside" | "raining", p) P("raining" | p) = 1.

In a nutshell: you need three pieces of information to apply this classical chain of reasoning; A, B, and A->B. All three of these propositions should have priors. Then everything seems fine to me. It seems to me you are neglecting the proposition "A->B", or rather assuming its truth value to be known, when we are explicitly saying that the robot does not know this.

edit: I just realised that I was lucky for my first inequality to work out; I assumed I was free to choose any prior for P("wet outside"), but it turns out I am not. My priors for "raining" and "raining->wet outside" determine the corresponding prior for "wet outside", in order to be compatible with the product rule. I just happened to choose the correct one by accident.

It seems to me you are neglecting the proposition "A->B"

Do you know what truth tables are? The statement "A->B" can be represented on a truth table. A and B can be possible. not-A and B can be possible. Not-A and not-B can be possible. But A and not-B is impossible.

A->B and the four statements about the truth table are interchangeable. Even though when I talk about the truth table, I never need to use the "->" symbol. They contain the same content because A->B says that A and not-B is impossible, and saying that A and not-B is impossible says that A->B. For example, "it raining but not being wet outside is impossible."

In the language of probability, saying that P(B|A)=1 means that A and not-B is impossible, while leaving the other possibilities able to vary freely. The product rule says P(A and not-B) = P(A) * P(not-B | A). What's P(not-B | A) if P(B | A)=1? It's zero, because it's the negation of our assumption.

Writing out things in classical logic doesn't just mean putting P() around the same symbols. It means making things behave the same way.
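(A quick enumeration of that truth table, added for illustration:)

```python
from itertools import product

# "A -> B" rules out exactly one row of the truth table: A true, B false.
# P(B|A)=1 says the same thing: that possibility gets probability zero.
for A, B in product([True, False], repeat=2):
    allowed = not (A and not B)          # the content of "A -> B"
    print(f"A={A!s:5} B={B!s:5} compatible with A->B: {allowed}")
```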

'They contain the same content because A->B says that A and not-B is impossible, and saying that A and not-B is impossible says that A->B. For example, "it raining but not being wet outside is impossible."'

If you're talking about standard propositional logic here, without bringing in probabilistic stuff, then this is just wrong or at best very misleadingly put. All 'A->B' says is that it is not the case that A and not-B - nothing modal.

Ok sure, so you can go through my reasoning leaving out the implication symbol, but retaining the dependence on the proof "p", and it all works out the same. The point is only that the robot doesn't know that A->B, therefore it doesn't set P(B|A)=1 either.

You had "Suppose our robot knows that P(wet outside | raining) = 1. And it observes that it's raining, so P(rain)=1. But it's having trouble figuring out whether it's wet outside within its time limit, so it just gives up and says P(wet outside)=0.5. Has it violated the product rule? Yes. P(wet outside) >= P(wet outside and raining) = P(wet outside | rain) * P(rain) = 1."

But you say it is doing P(wet outside)=0.5 as an approximation. This isn't true though, because it knows that it is raining, so it is setting P(wet outside|rain) = 0.5, which was the crux of my calculation anyway. Therefore when it calculates P(wet outside and raining) = P(wet outside | rain) * P(rain) it gets the answer 0.5, not 1, so it is still being consistent.

I'm just going to give up and hope you figure it out on your own.

You haven't been very specific about what you think I'm doing incorrectly so it is kind of hard to figure out what you are objecting to. I corrected your example to what I think it should be so that it satisfies the product rule; where's the problem? How do you propose that the robot can possibly set P("wet outside"|"rain")=1 when it can't do the calculation?

In your example, it can't. Because the axioms you picked do not determine the answer. Because you are incorrectly translating classical logic into probabilistic logic. And then, as one would expect, your translation of classical logic doesn't reproduce classical logic.

It was your example, not mine. But you made the contradictory postulate that P("wet outside"|"rain")=1 follows from the robot's prior knowledge and the probability axioms, and simultaneously that the robot was unable to compute this. To correct this I alter the robot's probabilities such that P("wet outside"|"rain")=0.5 until such time as it has obtained a proof that "rain" correlates 100% with "wet outside". Of course the axioms don't determine this; it is part of the robot's prior, which is not determined by any axioms.

You haven't convinced nor shown me that this violates Cox's theorem. I admit I have not tried to follow the proof of this theorem myself, but my understanding was that the requirement you speak of is that the probabilistic logic reproduces classical logic in the limit of certainty. Here, the robot is not in the limit of certainty because it cannot compute the required proof. So we should not expect to get the classical logic until updating on the proof and achieving said certainty.

It was your example, not mine.

No, you butchered it into a different example. Introduced the Lewis Carroll Paradox, even.

You haven't convinced nor shown me that this violates Cox's theorem.

He showed you. You weren't paying attention.

Here, the robot is not in the limit of certainty because it cannot compute the required proof.

It can compute the proof. The laws of inference are axioms; P(A|B) is necessarily known a priori.

such that P("wet outside"|"rain")=0.5 until such time as it has obtained a proof that "rain" correlates 100% with "wet outside".

There is no such time. Either it's true initially, or it will never be established with certainty. If it's true initially, that's because it is an axiom. Which was the whole point.

The laws of inference are axioms; P(A|B) is necessarily known a priori.

It does not follow that because someone knows some statements they also know the logical consequences of those statements.

When the someone is an idealized system of logic, it does. And we're discussing an idealized system of logic here. So it does.

No we aren't, we're discussing a robot with finite resources. I obviously agree that an omnipotent god of logic can skip these problems.

The limitations imposed by bounded resources are the next entry in the sequence. For this one, we're still discussing the unbounded case.

Very well, then I will wait for the next entry. But I thought the fact that we were explicitly discussing things the robot could not compute made it clear that resources were limited. There is clearly no such thing as logical uncertainty to the magic logic god of the idealised case.


Liked this post. One suggestion to improve readability would be for the first mention of a concept in this post (e.g. Savage's theorem) to hyperlink to the previous post that described it, or to a wiki article with details.

Thanks! I'll do that now.

Nevermind. Would be nice if you could actually delete comments in the first minute after posting them.

[This comment is no longer endorsed by its author]