Many people believe that Bayesian probability is an exact theory of uncertainty, and other theories are imperfect approximations. In this post I'd like to tentatively argue the opposite: that Bayesian probability is an imperfect approximation of what we want from a theory of uncertainty. This post won't contain any new results, and is probably very confused anyway.
I agree that Bayesian probability is provably the only correct theory for dealing with a certain idealized kind of uncertainty. But what kinds of uncertainty actually exist in our world, and how closely do they agree with what's needed for Bayesianism to work?
In a Tegmark Level IV world (thanks to pragmatist for pointing out this assumption), uncertainty seems to be either indexical or logical. When I flip a coin, the information in my mind is either enough or not enough to determine the outcome in advance. If I have enough information - if it's mathematically possible to determine which way the coin will fall, given the bits of information that I have received - then I have logical uncertainty, which is no different in principle from being uncertain about the trillionth digit of pi. On the other hand, if I don't have enough information even given infinite mathematical power, it implies that the world must contain copies of me that will see different coinflip outcomes (if there were just one copy, mathematics would be able to pin it down), so I have indexical uncertainty.
The trouble is that both indexical and logical uncertainty are puzzling in their own ways.
With indexical uncertainty, the usual example that breaks probabilistic reasoning is the Absent-Minded Driver (AMD) problem. When the probability of you being this or that copy depends on the decision you're about to make, these probabilities are unusable for decision-making. Since Bayesian probability is in large part justified by decision-making, we're in trouble. And the AMD is not an isolated problem: in many imaginable situations faced by humans or idealized agents, there's a nonzero chance of returning to the same state of mind in the future, and that chance depends slightly on the current action. To the extent that's true, Bayesian probability is an imperfect approximation.
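To make this concrete, here's a toy calculation for the standard AMD setup (exiting at the first intersection X pays 0, exiting at the second intersection Y pays 4, continuing past both pays 1). The code is just an illustrative sketch, but it shows both the planning-optimal randomized policy and how the "probability of being at X" shifts with the very policy you're about to adopt.

```python
# Toy Absent-Minded Driver calculation (standard payoffs: 0 / 4 / 1).
def expected_utility(p):
    """Planning-stage expected utility of 'continue with probability p'."""
    exit_at_x = (1 - p) * 0        # exit at the first intersection
    exit_at_y = p * (1 - p) * 4    # continue once, then exit
    drive_past = p * p * 1         # continue at both intersections
    return exit_at_x + exit_at_y + drive_past

def prob_at_x(p):
    """Expected visits to X over expected total visits: the usual
    'probability of being at X', which depends on the policy p itself."""
    return 1 / (1 + p)

best_p = max((i / 1000 for i in range(1001)), key=expected_utility)
print(best_p, expected_utility(best_p))    # ~0.667, ~1.333
print(prob_at_x(0.0), prob_at_x(best_p))   # 1.0 vs ~0.6
```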
With logical uncertainty, the situation is even worse. We don't have a good theory of how logical uncertainty should work. (Though there have been several attempts, like Benja and Paul's prior, Manfred's prior, or my own recent attempt.) Since Bayesian probability is in large part justified by having perfect agreement with logic, it seems likely that the correct theory of logical uncertainty won't look very Bayesian, because the whole point is to have limited computational resources and only approximate agreement with logic.
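As a toy illustration of "approximate agreement with logic" (my own sketch, not one of the proposals above): a bounded reasoner can assign a uniform distribution to a quantity that is in fact logically determined, and only collapse it once the computation has been paid for.

```python
# The last decimal digit of 7**1000000 is a mathematically determined fact,
# but a reasoner who can't afford the computation might spread probability
# uniformly over the ten digits - usable for decisions, yet disagreeing with logic.
uninformed_belief = {d: 0.1 for d in range(10)}

# With enough compute, the uncertainty collapses to a point mass.
answer = pow(7, 10**6, 10)    # fast modular exponentiation; the digit is 1
informed_belief = {d: (1.0 if d == answer else 0.0) for d in range(10)}

print(answer, uninformed_belief[answer], informed_belief[answer])
```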
Another troubling point is that if Bayesian probability is suspect, the idea of "priors" becomes suspect by association. Our best ideas for decision-making under indexical uncertainty (UDT) and logical uncertainty (priors over theories) involve some kind of priors, or more generally probability distributions, so we might want to reexamine those as well. Though if we interpret a UDT-ish prior as a measure of care rather than belief, maybe the problem goes away...
I have also been leaning towards the view that there is a theory more general than probability theory, based on a few threads of thinking.
One thread is anthropic reasoning, where it is sometimes clear how to make decisions, yet probabilities don't make sense, and it feels to me that the information available in some anthropic situations just "doesn't decompose" into probabilities. Stuart Armstrong's paper on the Sleeping Beauty problem is, I think, valuable and greatly overlooked here.
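One way to see the "doesn't decompose" point is a toy betting calculation (my own numbers, not taken from Stuart's paper): the odds Sleeping Beauty should accept depend on whether bets are settled per awakening or per experiment, so no single probability does the job on its own.

```python
# A bet pays $1 if the coin was heads and costs $x to take.
def profit_per_awakening_bets(x):
    """The bet is offered and settled at every awakening:
    once if heads, twice if tails."""
    return 0.5 * (1 - x) + 0.5 * 2 * (0 - x)   # break-even at x = 1/3

def profit_per_experiment_bet(x):
    """The bet only counts once per experiment, however often it's offered."""
    return 0.5 * (1 - x) + 0.5 * (0 - x)       # break-even at x = 1/2

for x in (1/3, 1/2):
    print(x, round(profit_per_awakening_bets(x), 3),
             round(profit_per_experiment_bet(x), 3))
```

The decisions themselves are unambiguous in each case, but the "probability of heads" you'd back out of them is 1/3 in one setup and 1/2 in the other.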
Another thread is the limited-computation issue. We would all like to have a theory that pins down ideal reasoning, and then work out how to efficiently approximate that theory on a Turing machine as a completely separate problem. My intuition is that things just don't decompose this way. I think that a complete theory of reasoning will make direct reference to models of computation.
This site has collected quite a repertoire of decision problems that challenge causal decision theory. They all share the following property (including your example in the comment above): in a causal graph containing your decision as a node, there are links from your decision to your utility that do not go via your physical action (for Newcomb-like problems) or that do not go via the copy of you that is currently acting (for anthropic problems). In other words, your decisions are not independent of your beliefs about the world. The UDT solution says: "instead of drawing a graph containing your decision, draw one that contains your decision algorithm, and you will see that the independence between beliefs and decisions is restored!" This feels to me like a patch rather than a full solution, similar to saying "if your variables are correlated and you don't know how to deal with correlated distributions, try a linear change of variables -- maybe you'll find one that de-correlates them!" This only works if you're lucky enough to find a de-correlating change of variables. An alternate approach would be to work out how to deal with non-independent beliefs/decisions directly.
One thought experiment I like to do is to ask probability theory to justify itself in a non-circular way. For example, let's say I propose the following Completely Stupid Theory Of Reasoning. In CSTOR, belief states are represented by a large sheet of paper where I write down everything that I have ever observed. What is my belief state at time t, you ask? Why, it is simply the contents of the entire sheet of paper. But what is my belief state about a specific event? Again, the contents of the entire sheet of paper. How does CSTOR update on new evidence? Easy! I simply add a line of writing to the bottom of the sheet. How does CSTOR marginalize? It doesn't! Marginalization is just for dummies who use probability theory, and, as you can see, CSTOR can do all the things that a theory of reasoning should do without need for silly marginalization.
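In code, CSTOR might look something like this (a minimal sketch of the description above, with illustrative names):

```python
class CSTOR:
    """Completely Stupid Theory Of Reasoning: an append-only sheet of paper."""
    def __init__(self):
        self.sheet = []                     # everything ever observed, in order

    def update(self, observation):
        self.sheet.append(observation)      # add a line to the bottom of the sheet

    def belief_state(self):
        return list(self.sheet)             # the belief state is the whole sheet

    def belief_about(self, event):
        return list(self.sheet)             # ...and so is the belief about any event

cstor = CSTOR()
cstor.update("the coin came up heads")
print(cstor.belief_about("the next coinflip"))   # the whole sheet; no marginalization
```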
So what really distinguishes CSTOR from probability theory? I think the best non-circular answer is that probability theory gives rise to a specific algorithm for making decisions, whereas CSTOR doesn't. So I think we should look at decision making as primary, and then figure out how to decompose it into some abstract belief representation, plus some abstract notion of utility, plus some abstract algorithm for making decisions.
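For contrast, here's the decomposition that probability theory does supply and CSTOR doesn't: a belief representation (a distribution), a notion of utility, and a concrete algorithm for choosing an action (maximize expected utility). All the names and numbers below are illustrative.

```python
def decide(actions, states, prob, utility):
    """Standard expected-utility rule: argmax over actions of sum_s P(s) * U(s, a)."""
    def expected_utility(a):
        return sum(prob[s] * utility(s, a) for s in states)
    return max(actions, key=expected_utility)

# Toy usage: carry an umbrella iff rain is likely enough.
states = ["rain", "sun"]
prob = {"rain": 0.3, "sun": 0.7}
payoff = {("rain", "umbrella"): 0, ("rain", "none"): -10,
          ("sun", "umbrella"): -1, ("sun", "none"): 0}
print(decide(["umbrella", "none"], states, prob, lambda s, a: payoff[(s, a)]))  # 'umbrella'
```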
Can you try to come up with a situation where that independence is not restored? If we follow the analogy with correlations, it's always possible to find a linear map that decorrelates variables...
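A sketch of one standard construction of such a decorrelating linear map (whitening along the covariance's eigenvectors; the example numbers are arbitrary, and the construction works for any finite covariance matrix):

```python
# Sketch of the "linear change of variables" analogy: whitening correlated data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.multivariate_normal(mean=[0, 0], cov=[[2.0, 1.5], [1.5, 2.0]], size=10_000)

cov = np.cov(x, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
W = eigvecs / np.sqrt(eigvals)     # scale each eigenvector by 1/sqrt(its eigenvalue)
y = x @ W                          # linear map; components are now (nearly) uncorrelated

print(np.round(np.cov(y, rowvar=False), 3))   # approximately the identity matrix
```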