One, two, and three are all integers, and so is negative four. If you keep counting up, or keep counting down, you’re bound to encounter a whole lot more integers. You will not, however, encounter anything called “positive infinity” or “negative infinity,” so these are not integers.
Positive and negative infinity are not integers, but rather special symbols for talking about the behavior of integers. People sometimes say something like, “5 + infinity = infinity,” because if you start at 5 and keep counting up without ever stopping, you’ll get higher and higher numbers without limit. But it doesn’t follow from this that “infinity - infinity = 5.” You can’t count up from 0 without ever stopping, and then count down without ever stopping, and then find yourself at 5 when you’re done.
From this we can see that infinity is not only not-an-integer, it doesn’t even behave like an integer. If you unwisely try to mix up infinities with integers, you’ll need all sorts of special new inconsistent-seeming behaviors which you don’t need for 1, 2, 3 and other actual integers.
Even though infinity isn’t an integer, you don’t have to worry about being left at a loss for numbers. Although people have seen five sheep, millions of grains of sand, and septillions of atoms, no one has ever counted an infinity of anything. The same with continuous quantities—people have measured dust specks a millimeter across, animals a meter across, cities kilometers across, and galaxies thousands of lightyears across, but no one has ever measured anything an infinity across. In the real world, you don’t need a whole lot of infinity.1
In the usual way of writing probabilities, probabilities are between 0 and 1. A coin might have a probability of 0.5 of coming up tails, or the weatherman might assign probability 0.9 to rain tomorrow.
This isn’t the only way of writing probabilities, though. For example, you can transform probabilities into odds via the transformation O = (P/(1 - P)). So a probability of 50% would go to odds of 0.5/0.5 or 1, usually written 1:1, while a probability of 0.9 would go to odds of 0.9/0.1 or 9, usually written 9:1. To take odds back to probabilities you use P = (O∕(1 + O)), and this is perfectly reversible, so the transformation is an isomorphism—a two-way reversible mapping. Thus, probabilities and odds are isomorphic, and you can use one or the other according to convenience.
For example, it’s more convenient to use odds when you’re doing Bayesian updates. Let’s say that I roll a six-sided die: If any face except 1 comes up, there’s a 10% chance of hearing a bell, but if the face 1 comes up, there’s a 20% chance of hearing the bell. Now I roll the die, and hear a bell. What are the odds that the face showing is 1? Well, the prior odds are 1:5 (corresponding to the real number 1/5 = 0.20) and the likelihood ratio is 0.2:0.1 (corresponding to the real number 2) and I can just multiply these two together to get the posterior odds 2:5 (corresponding to the real number 2/5 or 0.40). Then I convert back into a probability, if I like, and get (0.4/1.4) = 2/7 = ~29%.
So odds are more manageable for Bayesian updates—if you use probabilities, you’ve got to deploy Bayes’s Theorem in its complicated version. But probabilities are more convenient for answering questions like “If I roll a six-sided die, what’s the chance of seeing a number from 1 to 4?” You can add up the probabilities of 1/6 for each side and get 4/6, but you can’t add up the odds ratios of 0.2 for each side and get an odds ratio of 0.8.
Why am I saying all this? To show that “odd ratios” are just as legitimate a way of mapping uncertainties onto real numbers as “probabilities.” Odds ratios are more convenient for some operations, probabilities are more convenient for others. A famous proof called Cox’s Theorem (plus various extensions and refinements thereof) shows that all ways of representing uncertainties that obey some reasonable-sounding constraints, end up isomorphic to each other.
Why does it matter that odds ratios are just as legitimate as probabilities? Probabilities as ordinarily written are between 0 and 1, and both 0 and 1 look like they ought to be readily reachable quantities—it’s easy to see 1 zebra or 0 unicorns. But when you transform probabilities onto odds ratios, 0 goes to 0, but 1 goes to positive infinity. Now absolute truth doesn’t look like it should be so easy to reach.
A representation that makes it even simpler to do Bayesian updates is the log odds—this is how E. T. Jaynes recommended thinking about probabilities. For example, let’s say that the prior probability of a proposition is 0.0001—this corresponds to a log odds of around -40 decibels. Then you see evidence that seems 100 times more likely if the proposition is true than if it is false. This is 20 decibels of evidence. So the posterior odds are around -40 dB + 20 dB = -20 dB, that is, the posterior probability is ~0.01.
When you transform probabilities to log odds, 0 goes to negative infinity and 1 goes to positive infinity. Now both infinite certainty and infinite improbability seem a bit more out-of-reach.
In probabilities, 0.9999 and 0.99999 seem to be only 0.00009 apart, so that 0.502 is much further away from 0.503 than 0.9999 is from 0.99999. To get to probability 1 from probability 0.99999, it seems like you should need to travel a distance of merely 0.00001.
But when you transform to odds ratios, 0.502 and 0.503 go to 1.008 and 1.012, and 0.9999 and 0.99999 go to 9,999 and 99,999. And when you transform to log odds, 0.502 and 0.503 go to 0.03 decibels and 0.05 decibels, but 0.9999 and 0.99999 go to 40 decibels and 50 decibels.
When you work in log odds, the distance between any two degrees of uncertainty equals the amount of evidence you would need to go from one to the other. That is, the log odds gives us a natural measure of spacing among degrees of confidence.
Using the log odds exposes the fact that reaching infinite certainty requires infinitely strong evidence, just as infinite absurdity requires infinitely strong counterevidence.
Furthermore, all sorts of standard theorems in probability have special cases if you try to plug 1s or 0s into them—like what happens if you try to do a Bayesian update on an observation to which you assigned probability 0.
So I propose that it makes sense to say that 1 and 0 are not in the probabilities; just as negative and positive infinity, which do not obey the field axioms, are not in the real numbers.
The main reason this would upset probability theorists is that we would need to rederive theorems previously obtained by assuming that we can marginalize over a joint probability by adding up all the pieces and having them sum to 1.
However, in the real world, when you roll a die, it doesn’t literally have infinite certainty of coming up some number between 1 and 6. The die might land on its edge; or get struck by a meteor; or the Dark Lords of the Matrix might reach in and write “37” on one side.
If you made a magical symbol to stand for “all possibilities I haven’t considered,” then you could marginalize over the events including this magical symbol, and arrive at a magical symbol “T” that stands for infinite certainty.
But I would rather ask whether there’s some way to derive a theorem without using magic symbols with special behaviors. That would be more elegant. Just as there are mathematicians who refuse to believe in the law of the excluded middle or infinite sets, I would like to be a probability theorist who doesn’t believe in absolute certainty.
1I should note for the more sophisticated reader that they do not need to write me with elaborate explanations of, say, the difference between ordinal numbers and cardinal numbers. I’m familiar with the different set-theoretic notions of infinity, but I don’t see a good use for them in probability theory.
This confuses object level and meta level. In probability theory, P(-A|A) = 0 and P(A|A) = 1, however uncertain you may be about Cox's theorem, or about whether you are actually thinking about the same A each time it appears in those formulas. No-one, as far as I know, has ever constructed a theory of probability in which these are assigned anything else but 0 and 1. That is not to say that it cannot be done, only that it has not been done. Until that is done, 0 and 1 are probabilities.
The title of the article is a rhetorical flourish to convey the idea elaborated in its body, that to assert a probability, as a measure of belief, of 0 or 1 is to assert that no possible evidence could update that belief, that 0 and 1 are probabilities that you should not find yourself assigning to matters about which there could be any real dispute, and to suggest odds ratios or their logarithms as a better concept when dealing with practical matters associated with very low or very high probabilities. There is a very large difference between saying that the probability of winning a lottery is tiny and saying that it cannot happen at all; with enough participants it is almost certain to happen to someone. That difference is made clear by the log-odds scale, which puts the chance of a lottery ticket at 60 or more decibels below zero, not infinitely far below. In a world with 7 billion people, billion-to-1 chances happen every day.
As an example of even tinier probabilities which are still detectably different from zero, consider a typical computer. A billion transistors in its CPU, clocked a billion times a second, running for a conveniently round length of time, a million seconds, which is about 12 days. Computers these days can easily do that without a single hardware error, which means that for every one of a million billion billion switching events, a transistor opened or closed exactly as designed. A million billion billion is about 1.5 times Avogadro's number. The corresponding log-odds is -240 decibels. And yet hardware glitches can still happen.
And P(A|A) is still 1, not any finite number of decibels.