In Probability Space & Aumann Agreement, I wrote that probabilities can be thought of as weights that we assign to possible world-histories. But what are these weights supposed to mean? Here I’ll give a few interpretations that I've considered and held at one point or another, and their problems. (Note that in the previous post, I implicitly used the first interpretation in the following list, since that seems to be the mainstream view.)
- Only one possible world is real, and probabilities represent beliefs about which one is real.
  - Which world gets to be real seems arbitrary.
  - Most possible worlds are lifeless, so we’d have to be really lucky to be alive.
  - We have no information about the process that determines which world gets to be real, so how can we decide what the probability mass function p should be?
- All possible worlds are real, and probabilities represent beliefs about which one I’m in.
  - Before I’ve observed anything, there seems to be no reason to believe that I’m more likely to be in one world than another, but we can’t let all their weights be equal.
- Not all possible worlds are equally real, and probabilities represent “how real” each world is. (This is also sometimes called the “measure” or “reality fluid” view.)
  - Which worlds get to be “more real” seems arbitrary.
  - Before we observe anything, we don't have any information about the process that determines the amount of “reality fluid” in each world, so how can we decide what the probability mass function p should be?
- All possible worlds are real, and probabilities represent how much I care about each world. (To make sense of this, recall that these probabilities are ultimately multiplied with utilities to form expected utilities in standard decision theories.)
  - Which worlds I care more or less about seems arbitrary. But perhaps this is less of a problem because I’m “allowed” to have arbitrary values.
  - Or, from another perspective, this drops yet another hard problem on top of the pile of problems called “values”, where it may never be solved.
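The parenthetical in the last interpretation, that the weights are ultimately multiplied with utilities, can be made concrete with a toy calculation. The following is only a minimal sketch: the world names, action names, weights, and utilities are all hypothetical and chosen purely for illustration. It shows that the arithmetic of choosing an action is the same whether the weights are read as beliefs or as degrees of caring.

```python
# Hypothetical weights over two possible worlds. Nothing in the arithmetic
# below depends on whether these are read as beliefs, "reality fluid",
# or how much the agent cares about each world.
weights = {"world_a": 0.7, "world_b": 0.3}

# Hypothetical utility of each action in each world.
utilities = {
    "act_x": {"world_a": 10, "world_b": 0},
    "act_y": {"world_a": 4, "world_b": 12},
}


def expected_utility(weights, utility_by_world):
    """Sum of weight * utility over all possible worlds."""
    return sum(weights[w] * utility_by_world[w] for w in weights)


# Pick the action with the highest weighted sum of utilities.
best_action = max(utilities, key=lambda a: expected_utility(weights, utilities[a]))
print(best_action)
```

Under these made-up numbers the choice depends only on the products of weights and utilities, which is why the "caring" reading can stand in for the "belief" reading without changing the decision procedure.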
As you can see, I think the main problem with all of these interpretations is arbitrariness. The unconditioned probability mass function is supposed to represent my beliefs before I have observed anything in the world, so it must represent a state of total ignorance. But there seems to be no way to specify such a function without introducing some information, which anyone could infer by looking at the function.
For example, suppose we use a universal distribution, where we believe that the world-history is the output of a universal Turing machine given a uniformly random input tape. But then the distribution contains the information of which UTM we used. Where did that information come from?
One could argue that we do have some information even before we observe anything, because we're products of evolution, which would have built some useful information into our genes. But to the extent that we can trust the prior specified by our genes, it must be that evolution approximates a Bayesian updating process, and our prior distribution approximates the posterior distribution of such a process. The "prior of evolution" still has to represent a state of total ignorance.
These considerations lead me to lean toward the last interpretation, which is the most tolerant of arbitrariness. This interpretation also fits well with the idea that expected utility maximization with Bayesian updating is just an approximation of UDT that works in most situations. I and others have already motivated UDT by considering situations where Bayesian updating doesn't work, but it seems to me that even if we set those aside, there is still reason to consider a UDT-like interpretation of probability where the weights on possible worlds represent how much we care about those worlds.
You're getting yourself in trouble because you assume that puzzling questions must have deep answers, when usually the question itself is flawed or misleading. In this case there just doesn't seem to be a need for any explanation of the kind you offer, nor would one be of any use anyway.
These 'explanations' of probability that you offer aren't really explaining anything. Certainly we do successfully use probability to reason about systems that behave in a deterministic classical fashion (rolling dice probably counts). No matter what sort of probability you believe in, you have to explain that application. So introducing 'objective' probability merely adds things we need to explain (possible worlds, etc.).
The correct approach is to step back and ask what it is that needs explaining. Well, probability is really nothing but a fancy way of counting up outcomes. So once we justify describing the world in a probabilistic fashion (even when it's deterministic in some sense), the application of mathematical inference to reformulate that description in more useful ways is untroubling. In other words, if it's reasonable to model rolling two six-sided dice as independent uniformly random variables on 1...6, then counting up the combinations and saying there is a 1/6 chance of getting a 7 doesn't raise any new difficulties.
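The dice figure can be checked by literally counting outcomes. This is a minimal sketch that assumes exactly the model described above: two independent dice, each uniform on 1...6, so all 36 ordered pairs are equally likely.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# The outcomes whose faces sum to 7: (1, 6), (2, 5), ..., (6, 1).
sevens = [roll for roll in outcomes if sum(roll) == 7]

# Probability as a ratio of counts, kept exact with Fraction.
p_seven = Fraction(len(sevens), len(outcomes))
print(p_seven)  # 1/6
```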
So the question just comes down to this: is it reasonable of us to model the world using random variables? I mean, one might worry that some worlds were deeply 'tricky', in that almost always when it appeared that two objects behaved like independent random variables, in reality there was some hidden correlation that would eventually pop out to bite you in the ass, and then once you'd taken that correlation into account another one would bite you, and so on and so on.
But if you think about it for a while, this isn't really so much a question about the nature of the world as it is a purely mathematical question. If we keep factoring out by our best predictions, will the remaining unaccounted-for variation in outcomes appear to be random, i.e., make modeling it as random variables an accurate way to make predictions? Well, that's actually kinda complicated. I have a theorem (well, a tiny tweak of someone else's theorem plus interpretation) which I believe says that yes, indeed it must work this way. I won't go into it here, but let me just say one thing to convince you of its plausibility.
Basically the argument is that things only fail to look random because we notice a more accurate way of predicting their behavior. The only evidence for a sequence of observations failing to be random according to the supposed distribution (call it R) would be a pattern in the observations not captured by R, which would in turn yield a more accurate distribution. So basically the claim is that we can always simply divide up any observable into the part we can predict (i.e., a distribution of outcomes) and the part we can't. Once you mod out by the part you can predict, by definition anything left is totally unpredictable to you (e.g., to computable machines) and thus can't detectably fail to look random according to its distribution, since that would be a better prediction.
This isn't rigorous (it's complicated), but the point is that randomness is nothing but our inability to make any better predictions.
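The idea that a detectable failure of randomness is the same thing as a better predictor can be illustrated with a toy sequence. This is only a sketch under stated assumptions, not the theorem alluded to above: the "sticky" bit source, its 0.8 repeat probability, the seed, and every function name here are hypothetical. The sequence is scored in average bits of surprise per observation, first under a fair-coin model and then under a model that has noticed the hidden correlation.

```python
import math
import random


def make_sticky_bits(n, stay=0.8, seed=0):
    """Hypothetical source: each bit repeats the previous one with probability `stay`."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1)]
    for _ in range(n - 1):
        prev = bits[-1]
        bits.append(prev if rng.random() < stay else 1 - prev)
    return bits


def average_log_loss(bits, predict_one):
    """Average of -log2(probability assigned to the bit that actually came next)."""
    total = 0.0
    for prev, nxt in zip(bits, bits[1:]):
        p_one = predict_one(prev)
        p_actual = p_one if nxt == 1 else 1.0 - p_one
        total -= math.log2(p_actual)
    return total / (len(bits) - 1)


def fair_coin(prev):
    """Supposed distribution: independent fair coin flips, ignoring the previous bit."""
    return 0.5


def sticky_model(prev):
    """Model that has noticed the pattern: the next bit tends to repeat the previous one."""
    return 0.8 if prev == 1 else 0.2


bits = make_sticky_bits(10_000)
loss_fair = average_log_loss(bits, fair_coin)
loss_sticky = average_log_loss(bits, sticky_model)
print(loss_fair, loss_sticky)
```

The fair-coin model pays one bit per observation no matter what, while the pattern-aware model pays less on this sequence. The pattern that makes the sequence "fail to look random" under the fair-coin model is exactly what the better predictor exploits.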