Nice example of how using a probability of exactly zero can screw you over. Two observations.
Could have done with a link to Eliezer's 0 and 1 are not probabilities from back in 2008.
I say you can get to 99.99% confidence that 1159 is prime (if it actually is; I haven't checked); probably 99.9999%. Suppose you (a) write a program to check all possible divisors, test that it gives the right answers for everything up to 100, and run it multiple times in case of cosmic rays; (b) look it up in a table of prime numbers; (c) apply, by hand, one of the fancy number-theoretical primality tests (most of these are probabilistic -- but again you can find statements of the form "If a number less than 10^12 passes these specific tests and isn't one of the following short list of exceptions, it is prime"). Then I reckon that apart from theories where what's wrong is your brain a,b,c are clearly extremely close to independent; (a) has well less than 0.001 chance of failure, (b) well less than 0.01, and (c) well less than 0.1; so the only hypothesis you need to consider that might take the probability above 10^-6 is that you're delusional in some way that specifically messes up your ability to tell whether 1159 is prime. Now (d) this is surely extraordinarily rare -- delusions in general aren't so rare, but this is something very specific and weird; and (e) if your mind is that badly screwed up then attempting to work with probabilities is probably kinda meaningless anyway. (Theorems like Cox's say that you should use probabilities as measures of confidence, and compute with them in the standard ways, if you are a rational agent. If you are known to be a catastrophically irrational agent, then what probabilities you assign is probably not the greatest of your worries.)
apart from theories where what's wrong is your brain
(just amused by the possibility)
Also, it is possible that Peano arithmetic isn't consistent; if so, either the very concept of 'primality' doesn't make any sense, or it can just mess up the primality tests which were used in creation of (b) and (c), and the connection between "1159 if prime" and "this program outputs True and halts" as well.
Of course, it screws up any application of Cox's theorem here, even worse than in delusion case.
Eliezer wrote a post warning against unrealistically confident estimates, in which he argued that you can't be 99.99% sure that 53 is prime. Chris Hallquist replied with a post arguing that you can.
That particular case is tricky. There have been many independent calculations of the first hundred prime numbers. 53 is a small enough number that I think someone would notice if Wikipedia included it erroneously. But can you be 99.99% confident that 1159 is a prime? You found it in one particular source. Can you trust that source? It's large enough that no one would notice if it were wrong. You could try to verify it, but if I write a Perl or C++ program, I can't even be 99.9% sure that the compiler or interpreter will interpret it correctly, let alone that the program is correct.
Rather than argue over the number of nines to use for a specific case, I want to emphasize the the importance of not assigning things probability zero or one. Here's a real case where approximating 99.9999% confidence as 100% had disastrous consequences.
I developed a new gene-caller for JCVI. Genes are interpreted in units of 3 DNA nucleotides called codons. A bacterial gene starts with a start codon (usually ATG, TTG, or GTG) and ends at the first stop codon (usually TAG, TGA, or TAA). Most such sequences are not genes. A gene-caller is a computer program that takes a DNA sequence and guesses which of them are genes.
The first thing I tried was to create a second-order Markov model on codons, and train it on all of the large possible genes in the genome. (Long sequences without stop codons are unlikely to occur by chance and are probably genes.) That means that you set P = 1 and go down the sequence of each large possible gene, codon by codon, multiplying P by the probability of seeing each of the 64 possible codons in the third position given the codons in the first and second positions. Then I created a second Markov model from the entire genome. This took about one day to write, and plugging these two models into Bayes' law as shown below turned out to work better than all the other single-method gene-prediction algorithms developed over the past 30 years.
But what probability should you assign to a codon sequence that you've never seen? A bacterial genome might have 4 million base pairs, about half of which are in long possible genes and will be used for training. That means your training data for one genome has about 2 million codon triplets. Surprisingly, a little less than half of all possible codon triplets do not occur at all in that data (DNA sequences are not random). What probability do you assign to an event that occurs zero times out of 2 million?
This came up recently in an online argument. Another person said that, if the probability that X is true is below your detection threshold or your digits of accuracy, you should assign P(X) = 0, since any other number is just made up.
Well, I'd already empirically determined whether that was true for the gene caller. First, due to a coding error, I assigned such events P(X) = 1 / (64^3 * size of training set), which is too small by about 64^3. Next I tried P(X) = 0.5 / (size of training set), which is approximately correct. Finally I tried P(X) = 0. I tested the results on genomes where I had strong evidence for what where and were not genes.
How well do you think each P(X) worked?
The two non-zero probabilities gave nearly the same results, despite differing by 6 orders of magnitude. But using P(X) = 0 caused the gene-caller to miss hundreds of genes per genome, which is a disastrous result. Why?
Any particular codon triplet that was never found in the training set would have a prior of less than one in 4 million. But because a large number of triplets are in genes outside the training set, that meant some of those triplets (not most, but about a thousand of them) had true priors of being found somewhere in those genes of nearly one half. (You can work it out in more detail by assuming a Zipf law distribution of priors, but I won't get into that.)
So some of them did occur within genes in that genome, and each time one did, its assigned probability of zero annihilated all the hundreds of other pieces of evidence for the existence of that gene, making the gene impossible to detect.
You can think of this using logarithms. I computed
where P(sequence) and P(sequence | gene) are computed using the two Markov models. Each of them is the product of a sequence of Markov probabilities. Ignoring P(gene), which is constant, we can compute
You can think of this as adding the bits of information it would take to specify that triplet outside of a gene, and subtracting the bits of information it would take to specify that information inside a gene, leaving bits of evidence that it is in a gene.
If we assign P(codon3 | codon1, codon2, gene) = 0, the number of bits of information it would take to specify "codon3 | codon1, codon2" inside a gene is -log(0) = infinity. Assign P(X) = 0 is claiming to have infinite bits of information that X is false.
Going back to the argument, the accuracy of the probabilities assigned by the Markov model are quite low, probably one to three digits of accuracy in most cases. Yet it was important to assign positive probabilities to events whose probabilities were at least seven orders of magnitude below that.
It didn't matter what probability I assigned to them! Given hundreds of other bits scores to add up, changing the number of bits taken away by one highly improbable event by 10 had little impact. It just matters not to make it zero.