Kant thought that space being Euclidean was a priori logically necessary, hence determinable from pure thought, hence true without need for empirical fact checking... and in the end this turned out to be wrong. Einstein had the last laugh (so far).
I have wondered now and again whether Cox's Postulates are similar to Euclid's Postulates: whether they might have similarly subtle, exceptional discrepancies with physical reality in practice.
It is hard to form hypotheses here, partly for lack of vivid theoretical alternatives. I know of two claims floating around in the literature that hint at substantive alternatives to Bayes.
One approach involves abandoning at least one of Aristotle's three laws of thought (excluded middle, non-contradiction, and identity) and postulating, essentially, that reality itself might be ontologically ambiguous. If I had to pick one to drop, I think I'd drop excluded middle. Probably? Constructivist/intuitionist logic often throws that one out, and automated proof systems often leave it out by default. Under the keyword "fuzzy logic" there have been attacks on these laws that directly reference Jaynes. So this is maybe one way to find a crack in the universe out of which we might wiggle.
The only other approach I know of in the literature is (for me) centrally based on later chapters of Scott Aaronson's "Quantum Computing Since Democritus" (try clicking the link and then ^f "bayes"), where, via hints and allusions, Aaronson suggests that quantum mechanics can be thought of as Bayesian... except with complex numbers for the probabilities, and thus (maybe?) Bayesianism is essentially a potentially empirically false religion? Aaronson doesn't say this directly or at length. And his mere hints would be the place I left this summary... except that while hunting for evidence I ran across a link to what might be a larger and more direct attack on the physical reality of Bayesianism? (Looking at it: using axioms no less! With "the fifth axiom" having variations, just like Euclid?!)
So that arxiv paper by Lucien Hardy (that I missed earlier! (that was written in 2008?!?)) might just have risen to the top of my philosophy reading stack? Neat! <3
Maybe it is worth adding a third approach that I don't think really counts... When the number of variables in a belief net goes up, merely performing inference becomes very hard to compute; under relatively general assumptions the algorithms end up NP-hard. This "doesn't count as a real, deep, philosophically satisfying alternative to Bayes" for me because the practical upshot seems to be just that we need more CPU, and more causal isolation for the systems we care about (so their operation is more tractable to reason about). Like... the practical impossibility of applying Bayes in general to large systems would almost help FIGHT the other "possible true/deep alternatives" to Bayes, because it creates an alternative explanation for any subjective experience of sorta feeling like you had probabilities figured out, and then your probabilities came out very wrong. Like: maybe there were too many variables, and the NP-hardness just caught up with you? Would you really need to question the "laws of thought" themselves to justify your feeling of having been in the physical world and then ended up "surprisingly surprised"? Seriously?
Anyway. I was wondering if you, having recently looked at the pillars of pure thinking themselves, had thoughts about any cracks, or perhaps any even deeper foundations, that they might have :-)
You might also be interested in "General Bayesian Theories and the Emergence of the Exclusivity Principle" by Chiribella et al., which claims that quantum theory is the most general theory satisfying Bayesian consistency conditions.
By now there are actually quite a few attempts to reconstruct quantum theory from more "reasonable" axioms besides Hardy's. You can track the references in the paper above to find more of them.
Thank you for your well-thought-out comment. One of the desiderata used to derive the original product rule is the use of real numbers to represent degrees of plausibility. So it will be very interesting to see whether the result still holds if we relax that to complex numbers.
I really like this article.
It has helped me appreciate how product rules (or additivity, if we apply a log transform) arise in many contexts. One thing I hadn't appreciated when studying Cox's theorem is that you do not need to respect "commutativity" to get a product rule (though obviously this restricts how you can group information). This was made very clear to me in example 3.
One thing that confused me on first reading was that I misunderstood you as saying the third requirement was associativity of F. Rereading, this is not the case; you just say that the third requirement implies that F is associative. But I wish you had spelled out the implication, i.e., said so explicitly.
“We can group projects into subprojects without changing the overall return”
What if this were not true? Would that make the problem intractable?
Imagine we have a company with investment projects A, B, C, .... For instance, A might be a new high-speed Internet service, B might be a new advanced computer, C might be new inventory management software, etc. We are interested in calculating the total return from these investments at the company. This calculation could be fairly complicated since returns are context-dependent - e.g., new computer B might have higher return in the context of new Internet service A than it would without the new Internet service. But let’s assume that the returns satisfy three reasonable properties (unpacked below).
Surprisingly, given just these three properties, we can conclude that returns obey a “product rule” similar to the product rule in probability theory.
w[R(A,B)] = w[R(A)] w[R(B|A)], where w is some transformation of returns (e.g., it could be log-return, return-squared, etc.).
This is essentially the first step in Cox’s Theorem, a theorem used (most notably by Jaynes) to ground the logicalist interpretation of probability. But as this post will illustrate, core ideas of Cox’s Theorem apply to many real-world systems which we don’t usually think of as “probability theory”.
Let’s unpack those assumptions a bit more for our investment return example by defining explicit variables on projects and returns. The three key properties are:
- The total return R(A,B) from undertaking projects A and B depends only on the return R(A) of A alone and the return R(B|A) of B in the context of A: R(A,B) = F[R(A), R(B|A)] for some function F.
- All else equal, a higher return on a component means a higher total return: increasing R(A) while holding R(B|A) fixed (or vice versa) increases R(A,B).
- We can group projects into subprojects without changing the overall return: the total for A, B, C can be computed by first combining A with B, or by first combining B with C in the context of A.
The third rule implies that F is associative. The key idea we derive here is that every one-dimensional, increasing, associative function is either multiplication or some reversible transformation of multiplication (e.g., addition/subtraction is the log-transformation of multiplication).
Thus we get a product rule:
w[R(A,B)] = w[R(A)] w[R(B|A)], where w is some reversible transformation of R.
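To make this concrete, here is a minimal numeric sketch with made-up figures for the log-return case: gross returns compose multiplicatively, and taking logs turns that multiplication into addition.

```python
import math

# Hypothetical gross returns: +10% for project A alone, +5% for B given A.
R_A = 1.10
R_B_given_A = 1.05
R_AB = R_A * R_B_given_A  # combined gross return: 1.155

# With w = identity, w[R(A,B)] = w[R(A)] * w[R(B|A)] holds directly;
# applying log turns the product rule into additivity of log-returns.
assert math.isclose(math.log(R_AB), math.log(R_A) + math.log(R_B_given_A))
```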
More generally, to derive the product rule, we need some objects of interest like A, B, C, ..., which serve as inputs. We also need some kind of real-valued measurement R of those objects. Then the core requirements for the product rule are:
- R(A,B) is a function of R(A) and R(B|A): R(A,B) = F[R(A), R(B|A)] for some F.
- F is increasing in each argument: if R(A′) > R(A) and R(B|A′) = R(B|A), then R(A′,B) > R(A,B); alternatively, if R(B′|A) > R(B|A) with R(A) held fixed, then R(A,B′) > R(A,B).
- Grouping does not matter: R(A,B,C) = F[R(A,B), R(C|A,B)] = F[R(A), R(B,C|A)].
(Note that for the last assumption, we allow systems in which objects need to be kept in the same order - i.e., A before B before C. This is actually more general than the requirement for the product rule in probability theory, in which the objects are boolean logic variables, so “A and B” = “B and A”. If reordering is allowed, then our generalized-product-rule becomes generalized-Bayes-rule.)
The third assumption implies that F is associative. The second implies that it’s increasing. The first implies that it’s one-dimensional. So we get the generalized product rule: every such F can be written as F(x, y) = w⁻¹[w(x) w(y)] for some reversible transformation w.
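As a sanity check, here is a small numeric sketch (assuming w = exp, so w⁻¹ = log) showing that any reversible, increasing w generates an F with exactly these properties:

```python
import math

# F built from a reversible transformation w: F(x, y) = w^{-1}[ w(x) * w(y) ].
# With w = exp this makes F ordinary addition, but the checks below would
# work the same way for any other reversible, increasing w.
def F(x, y, w=math.exp, w_inv=math.log):
    return w_inv(w(x) * w(y))

x, y, z = 0.3, 1.7, 2.2
# Associativity: grouping does not matter.
assert math.isclose(F(F(x, y), z), F(x, F(y, z)))
# Increasing in each argument.
assert F(x + 0.1, y) > F(x, y) and F(x, y + 0.1) > F(x, y)
```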
What does this look like in the context of other real-world systems?
Example 1: Suppose I have an investment portfolio with stock A and bond B, and I want to calculate the standard deviation of portfolio return R(A,B) as a proxy for risk. This calculation is not trivial due to potential correlation between stock and bond returns. For instance, the risk (measured in standard deviation) of investing in stocks alone is usually higher than the risk of a portfolio with both stocks and bonds. Let’s assume the risks satisfy the same three properties: R(A,B) depends only on R(A) and the incremental risk R(B|A) of adding B to a portfolio already holding A; it increases when either of those increases; and assets can be grouped into sub-portfolios without changing the overall risk.
As a result, we can apply the product rule to investment risks:
w[R(A,B)] = w[R(A)] w[R(B|A)], where w is some transformation of incremental risk (e.g., exponentiation, assuming those incremental risks add).
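For a concrete (entirely made-up) illustration: compute the portfolio risk from volatilities and a correlation, define the incremental risk R(B|A) as the change in risk from adding B, and the exponentiated form follows because those incremental risks add by construction.

```python
import math

# Hypothetical inputs: stock volatility, bond volatility, their correlation,
# and portfolio weights. All numbers are for illustration only.
sigma_A, sigma_B, rho = 0.20, 0.05, -0.3
wt_A, wt_B = 0.6, 0.4

R_A = sigma_A  # risk of investing in the stock alone
R_AB = math.sqrt((wt_A * sigma_A) ** 2 + (wt_B * sigma_B) ** 2
                 + 2 * rho * (wt_A * sigma_A) * (wt_B * sigma_B))
R_B_given_A = R_AB - R_A  # incremental risk of adding bond B (negative here:
                          # diversification lowers risk below stock-only)

w = math.exp  # additive incremental risks become multiplicative under exp
assert math.isclose(w(R_AB), w(R_A) * w(R_B_given_A))
```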
Example 2: Let’s look at a different system in which we’re interested in the share of a team’s total points contributed by basketball players A, B, C, ... in a game. For instance, R(A) could be 30%, meaning Stephen Curry contributes 30% of the total team points in a game, R(B) could be 25%, meaning Klay Thompson contributes 25% of the total points made, etc. Again, we assume the same three properties: the combined contribution R(A,B) depends only on R(A) and R(B|A), it increases when either of those increases, and players can be grouped into sub-lineups without changing the total contribution.
Thus the product rule applies to basketball players’ point contributions:
w[R(A,B)] = w[R(A)] w[R(B|A)], where w is some transformation of player contribution.
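One hypothetical model that makes w explicit (not the only possible reading): if R(B|A) is interpreted as B's share of the points not already contributed by A, then the unclaimed shares multiply, so w(r) = 1 - r does the job.

```python
import math

# Assumed shares: Curry contributes 30% of team points; Thompson contributes
# 25% of the points that remain after Curry's. These numbers are illustrative.
R_A = 0.30
R_B_given_A = 0.25

def w(r):
    return 1.0 - r  # transformation: the "unclaimed" share of points

R_AB = 1.0 - w(R_A) * w(R_B_given_A)  # combined share: 1 - 0.70 * 0.75 = 0.475

assert math.isclose(w(R_AB), w(R_A) * w(R_B_given_A))
```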
Example 3: Let’s consider a modified version of the classic traveling salesman problem from theoretical computer science and operations research. We’re interested in finding the shortest travel time from an origin through cities A, B, C, ..., visited in that order. Presumably the shortest travel times satisfy three assumptions: the total R(A,B) depends only on R(A) and the shortest onward time R(B|A) from A to B; it increases when either leg takes longer; and legs can be grouped without changing the total, as long as the visit order is preserved.
With the three assumptions above, we can apply the generalized product rule to the shortest-travel-time problem:
w[R(A,B)] = w[R(A)] w[R(B|A)], where w is some reversible transformation of shortest travel time (e.g., exponentiated shortest travel time).
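A minimal sketch with made-up travel times: since the legs of a fixed-order trip add, exponentiation turns that additivity into the multiplicative product-rule form.

```python
import math

# Hypothetical shortest times (hours): origin -> A, then onward A -> B.
R_A = 2.0
R_B_given_A = 3.5
R_AB = R_A + R_B_given_A  # shortest time for the ordered trip A then B

w = math.exp  # additive travel times become multiplicative under exp
assert math.isclose(w(R_AB), w(R_A) * w(R_B_given_A))
```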
Summary
The product rule in probability, p(AB) = p(A)p(B|A), states that the probability p(AB) that both A and B are true can be calculated from the probability p(A) that A is true alone and the probability p(B|A) that B is true given that A is true. The conditions of the product rule suggest avenues for extending it beyond Boolean logic. In particular, this post applies the product rule to real-valued measurements of objects A, B, C, ... that satisfy a few fairly reasonable properties, and proposes the generalized form w[R(A,B)] = w[R(A)] w[R(B|A)], where R is some real-valued measurement and w is some reversible transformation of R. For instance, in the company investment project example, R represents the project return and w can be the log-return.