Approaching Logical Probability

Manfred

13 Approaching Logical Probability

27th Feb 2014

4 min read

13

If we design a robot that acts as if it's uncertain about mathematical statements, that violates some desiderata for probability. But realistic robots cannot prove all theorems; they have to be uncertain about hard math problems.

In the name of practicality, we want a foundation for decision-making that captures what it means to make a good decision, even with limited resources. "Good" means that even though our real-world robot can't make decisions well enough to satisfy Savage's theorem, we want to approximate that ideal, not throw it out. Although I don't have the one best answer to give you, in this post we'll take some steps forward.

The objects we call probabilities are specified by desiderata that tell us how they behave. Any uncertainty about math problems violates those desiderata, but we still want to be able to assign logical probabilities that behave a lot like probabilities. The basic principles - not making up or ignoring information, not throwing away money or being inconsistent - should be deviated from as little as possible even when computing power is scarce. We want to develop a foundation for logical probabilities by starting from the rules governing ordinary probability, and then minimally restricting the application of those rules.

As we do this, it's important to keep track of what changes we make and why. Sometimes people just define logical probabilities, without worrying about desiderata. This is fine, when it works, and is often patchable if it doesn't have the right properties. But if you use it for something important and and get a surprise failure, it's really bad. My hope here is to construct logical probabilities that have the good properties, while keeping handwaving and mysterious assumptions to a minimum.

The perils of handwaving are more dire than they appear, and they are at their most dire in the hardest and most confusing reaches of physics. After better approaches fail, many theorists resort to just making up approximations and then trying to justify them. Doing this is known colloquially as a "1/ego expansion." Simply put, it doesn't work; there are too many vital little details. It's why even condensed matter theorists tell you not to trust condensed matter theorists about high temperature superconductivity.

We must abandon regular probabilities because our robot has limited time, but other parts of the decision-making process can also go over the time limit. If the robot's resources are limited, expected utility maximization breaks down at many points: there might be too many strategies to search through, too many outcomes to foresee, there might be probabilities that are too hard to find, and the utility of the outcomes might be too complicated.

The logical probabilities considered in this sequence will help approximate hard math problems, but they don't seem to help much when there are too many outcomes to consider, or if you want to make the best use of limited computational resources. They are only a part of the full solution.

Time for a desideratum: we want our robot to only assign a logical probability of 1 or 0 to a statement after it's actually checked the proof of that statement.

We can think of this as limiting what statements our robot is allowed to be certain about - only statements with short proofs can be found by our agent. However, this desideratum is not just about proof length, because a real robot won't check every checkable proof - it will spend time generating proofs, maybe trying to prove some specific statement, and will end up only checking some subset of short proofs.

Logical probabilities, unlike probabilities, are not determined just by the starting information. If our real robot only verifies some small collection of proofs, the robot's logical probabilities depend heavily upon what proof-steps were checked. One proof-step is just one truth-preserving step by our robot, like one application of modus ponens - it's a little proof one step long. The import is that they're the atomic unit of proofs, and once all the steps of a proof are checked, the proof is checked.

If we condition on which proof-steps get checked, does that determine the logical probabilities?

For any statement our robot is going to prove or disprove, we can use the checked proof steps to find whether it's logical probability 1 or 0. This gives the same answer as a real robot that checks steps according to some process and then returns 1 or 0 if it manages to prove or disprove the statement we give it. We just have to take the steps that the real robot ends up checking, and say that those are the proved steps for our abstract mathematical definition.

There's a problem, though. We haven't changed the old axioms, so they're still only satisfied if we get the right answer for everything. Meanwhile our new desideratum says we can't get the right answer for everything - we've made our axioms internally contradictory. In order to talk about the logical probabilities of unproven statements, we'll need to weaken the original axioms so that they no longer require certainty about everything. We'll explore ways to do this next time. Then we can assign numbers to statements in the usual way, by using our weakened axioms to find constraints, then maximizing entropy subject to those constraints.

Part of the sequence Logical Uncertainty

Previous Post: Logic as Probability

Next post: Solutions and Open Problems

Personal Blog

13

New Comment

Rendering 0/24 comments, sorted by

top scoring

(show more) Click to highlight new comments since: Today at 10:36 AM

Moderation Log

13 Approaching Logical Probability

by Manfred

27th Feb 2014

4 min read

13

Followup To: Logic as Probability

Time for a desideratum: we want our robot to only assign a logical probability of 1 or 0 to a statement after it's actually checked the proof of that statement.

If we condition on which proof-steps get checked, does that determine the logical probabilities?

Part of the sequence Logical Uncertainty

Previous Post: Logic as Probability

Next post: Solutions and Open Problems

Personal Blog

13

Mentioned in

18Logic as Probability

13Solutions and Open Problems

New Comment

Rendering 0/24 comments, sorted by

top scoring

(show more) Click to highlight new comments since: Today at 10:36 AM

Moderation Log

More from Manfred

Curated and popular this week

24Comments

Comment Permalink

Manfred12y00

Cox's theorem literally has as a desideratum that the results should be identical to classical logic when you're completely certain about the axioms. This is what's violated.

I'll try illustrating this with an example.

Suppose we have a calculator that wants to add 298+587. If it can only take small steps, it just has to start with 298, and then do 298+1=299 (keeping track that this is step 1), 299+1=300 (step 2), 300+1=301 (that's 3), etc, until it reaches 884+1=885 (step 587), at which point the calculator put's "885" on the screen as its last step. And so our calculator, when you input 298+587, outputs 885.

In order to add two big numbers, our calculator only ever had to add 1 to things - the stored number and the step count - which is pretty cool.

The output follow inexorably follows from the rules of this calculator. When the first input was 298 and the step count is 587, the stored number is 885, and when the step count is equal to the second input, the calculator puts the stored number on the screen. We cannot get a different answer without using a calculator that follows different rules.

Now suppose we build a calculator that does the same thing but with different labels. We ask it for P(298+587=885 | standard axioms). So it start with something it knows - P(298=298)=1. Then it moves to P(298+1=299)=1. Then P(298+2=300)=1. Eventually it reaches P(298+587=885)=1, so it outputs "1." This is just a different label on our normal calculator. Every step it increments two numbers, one on the left and one on the fight, and then it stops when the incremented number on the left reaches 587. It's the same process.

This is the translation that obey's Cox's theorem. It's the same process as classical logic, because when your'e certain about axioms probabilistic logic and classical logic are equivalent.

Logical uncertainty is like saying that P(298+587=885 | standard axioms) = 0.5. This violates Cox's theorem because it doesn't agree with our "re-labeled calculator." But why is that bad / weird?

Well, suppose that even when we're logically uncertain, we can agree that P(298=298)=1, and that P(298+1=299)=1. So why can't we just do the calculator process, and keep adding 1 to both sides until we reach the desired answer? There must be some step where the calculator process breaks down, where our logically uncertain agent accepts that , e.g., P(298+100=398)=1, but thinks P(298+101=399)=0.5

In classical probability, this is not allowed. If you think that P(298+100=398)=1, and that adding 1 to both sides preserves probability (one of the axioms that defines addition), then it follows that P(298+101=391)=1. Anything else is... illogical.

tom_cr12y00

Thanks for taking the time to elaborate.

I don't recall that desideratum in Jaynes' derivations. I think it is not needed. Why should it be needed? Certainty about axioms is a million miles from certainty about all their consequences, as seems to be the exact point of your series.

Help me out, what am I not understanding?

See in context