Related to: Where Recursive Justification Hits Bottom, Priors as Mathematical Objects, Probability is Subjectively Objective
Follow up to: A Proof of Occam's Razor
In my post on Occam’s Razor, I showed that a certain weak form of the Razor follows necessarily from standard mathematics and probability theory. Naturally, the Razor as used in practice is stronger and more concrete, and cannot be proven to be necessarily true. So rather than attempting to give a necessary proof, I pointed out that we learn by induction what concrete form the Razor should take.
But what justifies induction? Like the Razor, some aspects of it follow necessarily from standard probability theory, while other aspects do not.
Suppose we consider the statement S, “The sun will rise every day for the next 10,000 days,” assigning it a probability p, between 0 and 1. Then suppose we are given evidence E, namely that the sun rises tomorrow. What is our updated probability for S? According to Bayes’ theorem, our new probability will be:
P(S|E) = P(E|S)P(S)/P(E) = p/P(E),
because given that the sun will rise every day for the next 10,000 days, it will certainly rise tomorrow, so P(E|S) = 1. Unless we were already certain that the sun would rise tomorrow, P(E) is less than 1, so (as long as p is not 0) our new probability is greater than p. So this seems to justify induction, showing it to work of necessity. But does it?
In the same way we could argue that the probability that “every human being is less than 10 feet tall” must increase every time we see another human being less than 10 feet tall, since the probability of this evidence (“the next human being I see will be less than 10 feet tall”), given the hypothesis, is also 1. On the other hand, if we come upon a human being 9 feet 11 inches tall, our subjective probability that there is a 10 foot tall human being will increase, not decrease. So is there something wrong with the math here? Or with our intuitions?
In fact, the problem is neither with the math nor with the intuition. Given that every human being is less than 10 feet tall, the probability that “the next human being I see will be less than 10 feet tall” is indeed 1, but the probability that “there is a human being 9 feet 11 inches tall” is definitely not 1. So the math updates on a single aspect of our evidence, while our intuition is taking more of the evidence into account.
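As a rough numerical check on the update P(S|E) = p/P(E), here is a minimal sketch. The parameter q = P(E | not S), and the values chosen for p and q, are my own illustrative assumptions and not part of the argument above; the only thing taken from the text is that P(E|S) = 1.

```python
from fractions import Fraction


def posterior_universal(p, q):
    """Return P(S|E) when the hypothesis S entails the evidence E.

    p is the prior P(S); q is P(E | not S), a hypothetical parameter
    introduced only for this illustration.
    """
    # Law of total probability, using P(E|S) = 1.
    p_e = p + (1 - p) * q
    # Bayes' theorem with P(E|S) = 1 reduces to p / P(E).
    return p / p_e


p = Fraction(1, 10)   # assumed prior for S
q = Fraction(9, 10)   # assumed chance of sunrise tomorrow if S is false
post = posterior_universal(p, q)
print(post)           # 10/91, slightly larger than 1/10
```

If q is set to 1, so that the evidence was certain anyway, the function returns p unchanged, which is the only case in which confirmation fails to raise the probability of the universal.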
But this math seems to work because we are trying to induce a universal which includes the evidence. Suppose instead we try to go from one particular to another: I see a black crow today. Does it become more probable that a crow I see tomorrow will also be black? We know from the above reasoning that it becomes more probable that all crows are black, and one might suppose that it therefore follows that it is more probable that the next crow I see will be black. But this does not follow. The probability of “I see a black crow today”, given that “I see a black crow tomorrow,” is certainly not 1, and so the probability of seeing a black crow tomorrow, given that I see one today, may increase or decrease depending on our prior – no necessary conclusion can be drawn. Eliezer points this out in the article Where Recursive Justification Hits Bottom.
On the other hand, we would not want to draw a conclusion of that sort: even in practice we don’t always update in the same direction in such cases. If we know there is only one white marble in a bucket, and many black ones, then when we draw the white marble, we become very sure the next draw will not be white. Note however that this depends on knowing something about the contents of the bucket, namely that there is only one white marble. If we are completely ignorant about the contents of the bucket, then we form universal hypotheses about the contents based on the draws we have seen. And such hypotheses do indeed increase in probability when they are confirmed, as was shown above.
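The dependence on the prior can be made concrete with a small sketch of the marble case. The bucket size of 10, drawing without replacement, and the uniform prior over the number of white marbles (standing in for "completely ignorant") are all assumptions I am adding for illustration.

```python
from fractions import Fraction


def predictive_probs(prior, n):
    """Compare P(second draw white) with P(second white | first white).

    prior maps k (the number of white marbles among n) to its probability.
    Draws are without replacement. Returns (unconditional, conditional).
    """
    # P(first draw white), averaged over the prior on k.
    p_first = sum(w * Fraction(k, n) for k, w in prior.items())
    # P(first and second draws both white).
    p_both = sum(
        w * Fraction(k, n) * Fraction(k - 1, n - 1) for k, w in prior.items()
    )
    # By symmetry between draws, P(second white) equals P(first white).
    return p_first, p_both / p_first


n = 10
known = {1: Fraction(1)}                                   # exactly one white
ignorant = {k: Fraction(1, n + 1) for k in range(n + 1)}   # uniform over 0..10

print(predictive_probs(known, n))     # 1/10 before, 0 after seeing white
print(predictive_probs(ignorant, n))  # 1/2 before, 2/3 after seeing white
```

With the known bucket, seeing white lowers the probability of white on the next draw from 1/10 to 0; with the uniform prior, it raises it from 1/2 to 2/3. The same observation moves the prediction in opposite directions depending on the prior.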
The issue is too involved to give a full justification of induction here, but I will try to give a very general idea. (This was on my mind a while back as I got asked about it in an interview.)
Even if we don't assume that we can apply statistics in the sense of using past observations to tell us about future observations, or observations about some of the members of a group to tell us about other members of a group, I suggest we are justified in doing the following.
Given a reference class of possible worlds in which we could be, in the absence of any reason for thinking otherwise, we are justified in thinking that any world from the reference class is as likely as any other to be our world. (Now, this may seem an attempt to sneak statistics in - but, really, all I said was that if we have a list of possible worlds that we could be in, and we don't know which one is ours, then our views on probability merely indicate that we don't know.)
The next issue is how this reference class is constructed - more specifically, how each member of the reference class is constructed. It may seem to make sense to construct each world by "sticking bits of space-time together", but I suggest that this itself implies an assumption. After all, many things in a world can be abstract entities: How do we know that what appear to be basic things aren't? Furthermore, why build the reference class like that? What is the justification? It also forces a particular view of physics onto us. What about views of physics where space-time may not be fundamental? They would be eliminated from the reference class.
The only justifiable way of building the reference class is to say that the world is an object, and that the reference class of worlds is "Every formal description of a world". Rather than make assumptions about what space is, what time is, etc., we should insist that the description merely describes the world, including its history, as an object. Such a description is our situation at any time. At any time, we live in a world which has some description, and all I am saying is that the reference class is all possible descriptions. Now, it may seem that I am trying to sneak laws of nature and regular behavior in by the backdoor here, but I am not: If we can't demand that a world be formally describable, we are being incoherent. If we can't demand that the reference class contains every such formal description, surely the most general idea we could have of building a reference class, we are imposing something more specific, with all kinds of ontological assumptions, on it.
Now, if we see regular patterns in a world, this justifies expecting those patterns to continue. For a pattern to be produced by a description that specifies each element individually takes a lot of information. Therefore, the description must be highly specific, and only a small proportion of possible world-descriptions in the reference class will comply. On the other hand, if the pattern is produced by a small amount of information in the world-description, which describes the entire pattern, this is much less specific and a greater proportion of possible worlds will comply: We are demanding less specific information content in a possible world for it to be ours. Therefore, if we see a regular pattern, it is much more likely that our world is one of the large proportion of worlds where that pattern results from a small amount of information in the description than one of the much smaller proportion of worlds where it results from a much greater amount of information in the description.
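To make the counting concrete, here is a toy sketch. It assumes, purely for illustration, that world-descriptions are bit strings of a fixed length and that a description "contains" a piece of information when it begins with a particular bit pattern. Neither assumption is essential to the argument, and the specific bit strings below are hypothetical.

```python
from fractions import Fraction
from itertools import product


def fraction_with_prefix(prefix, length):
    """Fraction of all bit strings of the given length that start with prefix."""
    total = 0
    matches = 0
    for bits in product("01", repeat=length):
        total += 1
        if "".join(bits).startswith(prefix):
            matches += 1
    return Fraction(matches, total)


short_rule = "101"           # hypothetical 3-bit rule generating the pattern
long_listing = "10110100"    # hypothetical 8-bit element-by-element listing
length = 12                  # assumed total description length

f_short = fraction_with_prefix(short_rule, length)
f_long = fraction_with_prefix(long_listing, length)
print(f_short, f_long, f_short / f_long)   # 1/8 1/256 32
```

In this toy model, every extra bit demanded of the description halves the proportion of descriptions that comply, so the worlds with the 3-bit rule outnumber the worlds with the 8-bit listing by 32 to 1.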
A pattern which results from a small amount of information in the world description should be expected to be continued, because that is the very idea of a pattern generated by a small amount of information. For example, if you find yourself living in a world which looks like part of the Mandelbrot set, you should think it more likely that you live in a world where the Mandelbrot rule is part of the description of that world, and expect to see more of the Mandelbrot pattern in other places.
Therefore, patterns should be expected to be continued.
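A rough sketch of how this turns into a prediction, under assumptions that are mine and only illustrative: a description containing an m-bit specification gets weight 2^-m, worlds with the short rule always continue the pattern, and worlds where the pattern was listed element by element are taken to continue it only half the time.

```python
from fractions import Fraction


def continuation_probability(rule_bits, listing_bits):
    """Probability that the pattern continues, in a two-hypothesis toy model.

    rule_bits: bits needed for a short rule that generates the whole pattern.
    listing_bits: bits needed to list the observed elements one by one.
    """
    w_rule = Fraction(1, 2 ** rule_bits)
    w_listing = Fraction(1, 2 ** listing_bits)
    # Proportion of pattern-showing worlds in which the short rule is at work.
    post_rule = w_rule / (w_rule + w_listing)
    # Rule worlds continue for certain; listing worlds are assumed 50/50.
    return post_rule + (1 - post_rule) * Fraction(1, 2)


print(continuation_probability(3, 8))   # 65/66
print(continuation_probability(8, 8))   # 3/4
```

The larger the gap between the two amounts of information, the closer the probability of continuation gets to 1; when the "rule" costs as much as the listing, the advantage mostly disappears.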
I also suggest that Hume's problem of induction only appears in the first place because people have the misplaced idea that the reference class should be built up second by second, from the point of view of a being inside time, when it should ideally be built from the point of view of an observer not restricted in that way.
That's a great observation! Thanks!