Today's post, 0 And 1 Are Not Probabilities, was originally published on 10 January 2008. A summary (taken from the LW wiki):
In the ordinary way of writing probabilities, 0 and 1 both seem like entirely reachable quantities. But when you transform probabilities into odds ratios, or log-odds, you realize that in order to get a proposition to probability 1 would require an infinite amount of evidence.
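To make the summary concrete, here is a small sketch in Python of the probability-to-log-odds transform. The decibel convention (10 · log10 of the odds ratio) follows the original essay's examples; the function name is mine, not from the post. Note how p = 1 maps to infinite log-odds, i.e. infinite evidence:

```python
import math

def log_odds_db(p):
    """Convert a probability to log-odds, measured in decibels."""
    if p <= 0.0:
        return float("-inf")  # probability 0 = infinitely strong evidence against
    if p >= 1.0:
        return float("inf")   # probability 1 = infinitely strong evidence for
    return 10 * math.log10(p / (1 - p))

# Each extra "9" of certainty costs only ~10 more decibels of evidence,
# but reaching 1.0 exactly would take infinitely many:
for p in [0.5, 0.9, 0.99, 0.999, 1.0]:
    print(p, log_odds_db(p))
```

On the log-odds scale, 0.5 sits at 0 dB and every additional nine (0.9, 0.99, 0.999, …) adds a roughly constant increment, which is why certainty of 1 is unreachable by any finite amount of evidence.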
Discuss the post here (rather than in the comments to the original post).
This post is part of the Rerunning the Sequences series, where we'll be going through Eliezer Yudkowsky's old posts in order so that people who are interested can (re-)read and discuss them. The previous post was Infinite Certainty, and you can use the sequence_reruns tag or rss feed to follow the rest of the series.
Sequence reruns are a community-driven effort. You can participate by re-reading the sequence post, discussing it here, posting the next day's sequence reruns post, or summarizing forthcoming articles on the wiki. Go here for more details, or to have meta discussions about the Rerunning the Sequences series.
Well, I have a problem with assigning a non-1 probability to the laws of probability. Not that I couldn't conceive of them being false - but if they are false, any reasoning done with probabilities is wrong anyway.
Put another way: P(A|A) = 1 is true by definition. And I claim that when you write P(A) and apply probability theorems to it, you are in fact manipulating P(A | the laws of probability). So P(an axiom of probability theory) is in fact P(an axiom of probability theory | the laws of probability) = 1.
For theorems, you can say that P(Bayes' Theorem) is not 1, because even if the axioms of probability theory are true, we may have erred in proving Bayes' Theorem from them. But as soon as you actually use Bayes' Theorem to obtain a P(A), what you have obtained is in fact P(A | Bayes' Theorem).
Successful use would count as evidence for the laws of probability providing "good" values, right? So if we use these laws quite a bit and they always work, we might have P(the laws of probability do what we think they do) = 0.99999. We could discount our output using this. We could also be more constructive and discount based on the complexity of the derivation, using the principle "long proofs are less likely to be correct", in the following way: each derivation can be done in terms of combinations of various sub-derivations, so we could ...
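A minimal sketch of that discounting idea in Python. This is purely illustrative and not from the comment: the per-step reliability `r` and the assumption that steps fail independently are mine, but they capture why "long proofs are less likely to be correct":

```python
def derivation_confidence(r, n_steps):
    """Confidence that an n-step derivation is correct, assuming each
    step is independently valid with probability r.

    Under this (strong) independence assumption, confidence decays
    geometrically with proof length: r ** n_steps.
    """
    return r ** n_steps

# Even with very reliable individual steps, a long enough chain
# of sub-derivations erodes the overall confidence:
short_proof = derivation_confidence(0.99999, 10)
long_proof = derivation_confidence(0.99999, 10_000)
print(short_proof, long_proof)
```

The discount from any single step is tiny, but it compounds multiplicatively, so the output probability of a long derivation can never be pushed all the way to 1 - consistent with the post's point that probability 1 would require infinite evidence.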