I'd like to take Kevin's $0.02 in the coin-flipping word search.
First, I'll buy a prediction contract that I will flip Heads. This will cost $0.50 for a $1 payout.
Second, I'll buy the right to a futures contract: After the word is revealed and his search is complete, I will be given a prediction contract which pays $1 if Tails is revealed. If his expected posterior for Heads is 0.52 then the futures contract would have a value of $0.48.
In aggregate, I've paid $0.98 for a guaranteed $1.00 return.
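The arithmetic of this trade can be checked with a minimal sketch. It assumes the futures contract really can be bought today at its expected value of $0.48; the names `HEADS_PRICE`, `TAILS_FUTURE_PRICE`, and `net_profit` are illustrative, not from Kevin's post.

```python
# Prices taken from the comment above; names are illustrative.
HEADS_PRICE = 0.50         # cost today of a contract paying $1 on Heads
TAILS_FUTURE_PRICE = 0.48  # 1 - expected posterior for Heads (0.52), assumed purchasable now

def net_profit(outcome: str) -> float:
    """Net result of holding one Heads contract and one Tails contract."""
    heads_payout = 1.0 if outcome == "Heads" else 0.0
    tails_payout = 1.0 if outcome == "Tails" else 0.0
    return heads_payout + tails_payout - HEADS_PRICE - TAILS_FUTURE_PRICE

for outcome in ("Heads", "Tails"):
    print(outcome, round(net_profit(outcome), 2))
```

Exactly one of the two contracts pays out, so the profit is the same $0.02 whichever way the coin lands.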
I think your argument is quite effective.
He may claim he is not willing to sell you this futures contract for $0.48 now. He expects to be willing to sell for that price in the future on average, but might refuse to do so now.
But then, why? Why would you not sell something for $0.49 now if you think, on average, it'll be worth less than that (to you) right after?
Epistemic status: trying hard to explain, not persuade. Part of me wants to fight the good fight and protect Bayesian orthodoxy against a corrupting heresy.
MIT Professor Kevin Dorst (LessWrong username: kevin-dorst) has a new argument that, contrary to standard Bayesianism, it can be rational to predict in which directions your beliefs will change.
He explicitly argues against the Conservation of Expected Evidence, and applies this argument to a variety of situations, particularly to political polarization.
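For reference, the principle at stake fits in one line. The notation here is an assumption for illustration, not Dorst's: H is any hypothesis and E is the evidence about to be observed, ranging over possible values e.

```latex
\mathbb{E}\big[P(H \mid E)\big]
  = \sum_{e} P(E = e)\, P(H \mid E = e)
  = \sum_{e} P(H, E = e)
  = P(H)
```

In words: the probability-weighted average of your possible posteriors equals your current prior, so under standard Bayesian updating you cannot expect your belief to move in any particular direction.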
His new paper has been generating attention, including from renowned figures such as Steven Pinker, and Dorst has been invited on podcasts to defend his position. Dorst also recently made a frontpage post in this community, titled Polarization is Not (Standard) Bayesian, making similar claims. These claims deserve to be addressed.
Reflection Violations
Here is one of his examples (from the paper):
Let s be a politically coded belief, e.g. that guns increase safety. You are currently unsure about s, say 50-50, but you also know that you're going to study at a liberal college. Students going to liberal colleges are likely to leave with more liberal opinions, and you don't think you are special regarding that effect. He argues that it can be rational both to hold your current belief regarding s and to expect your future belief on s to move in the liberal direction.
He calls situations like that reflection violations. Here Dorst gives us another example:
In the LessWrong article, he gives a different example of reflection violations (also called martingale violations), arguing that even if you don't know in which direction your beliefs are likely to move (e.g. in the liberal or conservative direction), there can still be a reflection violation on the correlation of beliefs:
(Supposed) Mechanism
Dorst argues that such reflection violations can derive from rational Bayesian updating. In his theory, this happens whenever beliefs need to be updated based on what he calls ambiguous evidence.
He thankfully gives a low-level example of a setting in which his "ambiguous evidence" could appear:
He argues there is an asymmetry between finding a word completion and not finding it, that updating based on the former is much easier than on the latter, and that this leads to ambiguity.
Here he argues that this asymmetry can generate a reflection violation:
Dorst's Objection to the Standard Response, and the Bayesian Refutation
Dorst correctly identifies the standard reply:
He proceeds to calculations showing that, under this standard approach, which he calls a "model", no reflection violation occurs. He then objects to the correctness of such a "model".
This objection unfortunately confuses many different concepts. He first confuses a prior over the joint distribution (Word, Found) with a model of how these variables interact. Having a prior over the joint distribution is not the same as having a "model that is always correct".
Quite the opposite: a good Bayesian prior is supposed to reflect the uncertainty regarding which models are correct. The prior distribution is what you get when, in theory, you marginalize over all possible models, or, in practice, when you use an approximation for that. Nowhere does a Bayesian have to place absolute trust in the correctness of any one model, contrary to what Dorst implies.
Rather, the common intuition against placing absolute trust in "models" is used to argue that one should have no prior at all over joint distributions of possible evidence. This is despite his concession that one can have priors over the simpler variables we may ultimately be interested in.
Of course, priors over joint distributions are nothing but a collection of current beliefs over each possibility in the product space. If you have a current belief for (Word & Found), another for (Word & ~Found), another for (~Word & Found) (maybe close to zero if your search generates few errors), and finally one for (~Word & ~Found), then these beliefs constitute the prior you need. He himself should concede this is possible, as he doesn't deny the existence of current beliefs. But once you accept having a more complex prior over the joint distribution (including the possible evidence), the evidence is no longer ambiguous, and any supposed reflection violation becomes impossible.
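To make this concrete, here is a minimal sketch with a made-up joint prior over (Word, Found). The four probabilities are purely hypothetical and are not taken from Dorst's paper; the point is only that, whatever numbers you pick, the expected posterior for Word equals the prior for Word.

```python
# Hypothetical joint prior over (word_exists, found); the numbers are illustrative only.
joint = {
    (True, True): 0.30,    # a completion exists and the search finds it
    (True, False): 0.20,   # a completion exists but the search misses it
    (False, True): 0.00,   # no completion, yet one is "found" (search error)
    (False, False): 0.50,  # no completion and none found
}

def prior_word(joint):
    """Current belief that a completion exists."""
    return sum(p for (word, _), p in joint.items() if word)

def prob_found(joint, found):
    """Current belief about the search outcome itself."""
    return sum(p for (_, f), p in joint.items() if f == found)

def posterior_word(joint, found):
    """Belief that a completion exists after seeing the search outcome."""
    return joint[(True, found)] / prob_found(joint, found)

def expected_posterior(joint):
    """Average of the possible posteriors, weighted by how likely each outcome is."""
    total = 0.0
    for found in (True, False):
        p = prob_found(joint, found)
        if p > 0:  # skip outcomes the prior rules out entirely
            total += p * posterior_word(joint, found)
    return total

print(prior_word(joint), expected_posterior(joint))
```

With these numbers the two printed values agree up to floating-point rounding, even though the posterior after finding a word and the posterior after not finding one are very different from each other.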
Dorst continues with an intuition:
Of course, the intuition for "betting the farm" is not a good framing for thinking about probabilities. Standard risk aversion means you probably shouldn't "bet the farm" either way! It may be particularly painful to "lose a bet" due to "missing out" on an easy word, or one you might think with the benefit of hindsight should have been "easy" to spot, but such asymmetry of feelings shouldn't interfere with our careful analysis of probabilities.
Moving forward, the expression “Oh! I should’ve seen that...” can mean any of the following:
More Ambiguity? Confusion?
In the same example, which serves as the main illustration for his crucial mechanism, Dorst continues arguing against the Standard Bayesian interpretation by appealing to additional complexity in the evidence, which in his opinion exemplifies the notion of "ambiguity".
Here he adds an additional variable: besides there being a completion (Word) and Haley finding it or not (Find), he adds a further evidence-variable (Word-like) denoting whether Haley finds the string to have "subtle hints that it's completable".
The example is complicated further by adding that Haley is unsure of whether she observes word-likeness or not. The careful reader starts to feel some force acting to obscure any concrete consideration of what, if anything, is being observed by Haley. We sure can't have a proper conversation about how to update beliefs based on observations when there is no agreement on what the observations are.
He details his model further (line breaks added by myself for clarity):
At this point a Bayesian can speculate that the entire concept of "ambiguous evidence" is rooted in deep confusion.
Rather than talking about whether the clues are "word-like" or not, a concept indistinguishable from the word existing (and therefore useless), and about Haley being "uncertain" of that, we should talk about whether Haley observes "wordlikeness" or not (a concrete observation, even if completely subjective). We could even make it more complicated by having the "wordlikeness" be continuous. But in any case we'd have a clear understanding, at least under this model, of what Haley's observation is.
With this powerful mental concept, we can now ask about Haley's prior distribution, not only over the possible observations, but also over the joint distribution of observations AND results (in this case, whether a word exists or not).
Using such concepts, the "ambiguous evidence" concept dissolves: ambiguity has to reside in the priors, or in the current beliefs, rather than in future evidence or in how best to interpret it. Given a detailed enough representation of the current beliefs, there can be no justification for predictable polarization or reflection violations of any sort.
We can hypothesize that ambiguity about how to interpret new evidence, or how to update beliefs based on it, comes from ambiguity about what the observations are, from inconsistent or undetermined previous beliefs, or from errors in approximation and computation.
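The same holds when the observation is a noisy, continuous "wordlikeness" signal. The simulation below is a rough sketch under invented assumptions: the prior `P_WORD` and the Gaussian signal model (`MU_WORD`, `MU_NO_WORD`, `SIGMA`) are hypothetical stand-ins and are not part of Dorst's example. It samples many worlds, updates on the signal by Bayes' rule, and averages the resulting posteriors.

```python
import math
import random

# Hypothetical model: all of these numbers are illustrative assumptions.
P_WORD = 0.4                                # prior that a completion exists
MU_WORD, MU_NO_WORD, SIGMA = 1.0, 0.0, 1.0  # mean wordlikeness with / without a word

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def posterior_word(signal):
    """Bayes' rule: belief that a word exists after observing the signal."""
    weight_word = normal_pdf(signal, MU_WORD, SIGMA) * P_WORD
    weight_none = normal_pdf(signal, MU_NO_WORD, SIGMA) * (1 - P_WORD)
    return weight_word / (weight_word + weight_none)

def average_posterior(n_samples=100_000, seed=0):
    """Monte Carlo estimate of the expected posterior under the joint prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        word = rng.random() < P_WORD  # draw whether a word exists
        signal = rng.gauss(MU_WORD if word else MU_NO_WORD, SIGMA)  # draw the observation
        total += posterior_word(signal)
    return total / n_samples

print(P_WORD, average_posterior())
```

The averaged posterior should land close to the prior, up to sampling noise. The signal can be as subjective and noisy as you like; as long as the observation and the joint prior are well defined, the martingale property survives.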
Conclusion
This is understandable, and may be a mechanism for predictable polarization in practice. After all, it is very hard to keep a consistent set of current beliefs about all the possible evidence you may be subject to, much less about the joint distribution of that evidence and the things you care about. It is also very hard to update this exponentially large space of beliefs according to the laws of probability; it is inevitable that approximations will be used to represent both the current beliefs and the updates.
It therefore should not be surprising if, in practice, updates to human beliefs end up violating martingale properties. But rather than accepting such violations as "rational", one should work to recognize them for what they are: evidence that at some point, present or future, one's rationality is falling short.