I'm completely flummoxed by the level of discussion here in the comments to Unknowns's post. When I wrote a post on logic and most commentors confused truth and provability, that was understandable because not everyone can be expected to know mathematical logic. But here we see people who don't know how to sum or reorder infinite series, don't know what a uniform distribution is, and talk about "the 1/infinity kind of zero". This is a rude wakeup call. If we want to discuss issues like Bayesianism, quantum mechanics or decision theory, we need to take every chance to fill the gaps in our understanding of math.
Yeah, it might help to make a list of "math recommended for reading and posting on Less Wrong" Unfortunately, the entire set is likely large enough such that even many physics majors won't have all of it (lots of physics people don't take logic or model theory classes). At this point the list of math topics seems to include Lebesque integration, Godel's theorems and basic model theory, basics of continuous and discrete probability spaces, and a little bit of theoretical compsci ranging over a lot of topics (both computability theory and complexity theory seem relevant). Some of the QM posts also require a bit of linear algebra to actually grok well but I suspect that anyone who meets most of the rest of the list will have that already. Am I missing any topics?
Am I missing any topics?
In getting to some of those things, there are some more basic subjects that would have to be mastered:
Algebra: giving names to unknown numerical quantities and then reasoning about them the way one would with actual numbers.
Calculus: the relationship between a rate of change and a total amount, and the basic differential equations of physics, e.g. Newtonian mechanics, the diffusion equation, etc.
If this all seems like a lot, many people spend until well into their twenties in school. Think where they would get to if all that time had been usefully spent!
Related to: Occam's Razor
If the Razor is defined as, “On average, a simpler hypothesis should be assigned a higher prior probability than a more complex hypothesis,” or stated in another way, "As the complexity of your hypotheses goes to infinity, their probability goes to zero," then it can be proven from a few assumptions.
1) The hypotheses are described by a language that has a finite number of different words, and each hypothesis is expressed by a finite number of these words. That this allows for natural languages such as English, but also for computer programming languages and so on. The proof in this post is valid for all cases.
2) A complexity measure is assigned to hypotheses in such a way that there are or may be some hypotheses which are as simple as possible, and these are assigned the complexity measure of 1, while hypotheses considered to be more complex are assigned higher integer values such as 2, 3, 4, and so on. Note that apart from this, we can define the complexity measure in any way we like, for example as the number of words used by the hypothesis, or in another way, as the shortest program which can output the hypothesis in a given programming language (e.g. the language of the hypotheses might be English but their simplicity measured according to a programming language; Eliezer Yudkowsky follows this way in the linked article.) Many other definitions would be possible. The proof is valid for all definitions that follow the conditions laid out.
3) The complexity measure should also be defined in such a way that there are a finite number of hypotheses given the measure of 1, a finite number given the measure of 2, a finite number given the measure of 3, and so on. Note that this condition is not difficult to satisfy; it would be satisfied by either of the definitions mentioned in condition 2, and in fact by any reasonable definition of simplicity and complexity. The proof would not be valid without this condition precisely because if simplicity were understood in such a way as to allow for an infinite number of hypotheses with minimum simplicity, the Razor would not be valid for that understanding of simplicity.
The Razor follows of necessity from these three conditions. To explain any data, there will be in general infinitely many mutually exclusive hypotheses which could fit the data. Suppose we assign prior probabilities for all of these hypotheses. Given condition 3, it will be possible to find the average probability for hypotheses of complexity 1 (call it x1), the average probability for hypotheses of complexity 2 (call it x2), the average probability for hypotheses of complexity 3 (call it x3), and so on. Now consider the infinite sum “x1 + x2 + x3…” Since all of these values are positive (and non-zero, since zero is not a probability), either the sum converges to a positive value, or it diverges to positive infinity. In fact, it will converge to a value less than 1, since if we had multiplied each term of the series by the number of hypotheses with the corresponding complexity, it would have converged to exactly 1—because probability theory demands that the sum of all the probabilities of all our mutually exclusive hypotheses should be exactly 1.
Now, x1 is a finite real number. So in order for this series to converge, there must be only a finite number of later terms in the series equal to or greater than x1. There will therefore be some complexity value, y1, such that all hypotheses with a complexity value greater than y1 have an average probability of less than x1. Likewise for x2: there will be some complexity value y2 such that all hypotheses with a complexity value greater than y2 have an average probability of less than x2. Leaving the derivation for the reader, it would also follow that there is some complexity value z1 such that all hypotheses with a complexity value greater than z1 have a lower probability than any hypothesis with a complexity value of 1, some other complexity value z2 such that all hypotheses with a complexity value greater than z2 have a lower probability than any hypothesis of complexity value 2, and so on.
From this it is clear that on average, or as the complexity tends to infinity, hypotheses with a greater complexity value have a lower prior probability, which was our definition of the Razor.
N.B. I have edited the beginning and end of the post to clarify the meaning of the theorem, according to some of the comments. However, I didn't remove anything because it would make the comments difficult to understand for later readers.