TLDR: You’re unsure about something. Then it happens—and you think to yourself, “I kinda expected that.” Such hindsight bias is commonly derided as irrational. But any Bayesian who (1) is unsure of exactly what they think, and (2) trusts their own judgment should exhibit hindsight bias.


Biden is old.

Ever since the disastrous debate, there’s been a flurry of discussion about whether he’s too old to campaign effectively.

Many pundits are saying that they’ve been worried about this for a long time. And—although some of them have—this often looks like hindsight bias.

Hindsight bias—aka the “I knew it all along” effect—is the finding that when people learn that an uncertain event happened, they increase their estimate for how much they expected it.

In this case, the uncertain claim is Biden is too old to make the case against Trump. Arguably, we’ve learned that it (or something near enough) is true. Given this, it’s natural for pundits to over-emphasize how worried they were, under-emphasize their uncertainty, and end up thinking that they saw this coming.

At the same time, it’s equally natural to look at the pundits, and scoff: “That’s just hindsight bias! They’re trying to make themselves look smarter than they are.”

But that’s wrong. Unless they're fools, they should commit hindsight bias.[1]

The finding

Hindsight bias is one of those empirical findings that’s both robust and easy to see in yourself. We shouldn’t worry about replication failures.

There are many variations, but here’s a classic study design:

  • Stage 1: Present people with a series of scenarios, and ask them to predict how likely they think an uncertain outcome is in each case.
  • Stage 2: Distract them for a while, or let them go about their day.
  • Stage 3: Inform them of the true outcome in each scenario, and ask them to recall how likely they originally thought it was.

The typical finding: people exhibit a hindsight shift. On average, their Stage-3 estimates for what their Stage-1 probabilities were are higher than their Stage-1 probabilities.

In our case: at Stage 1, the average person might say they think it’s 60%-likely that Biden is too old. Then, at Stage 3, they’ll say that they originally thought it was 70%-likely that he’s too old. That 70 – 60 = 10% gap is the hindsight shift.

For a concrete example, here are the results from Hoffrage et al. 2000—the relevant condition is the “Feedback” condition, with positive values indicating a positive hindsight shift:


Why it’s rational, intuitively

I say that reasonable people should commit hindsight bias—at least in most real-world and experimentally-studied conditions. Slightly more carefully: under conditions of ambiguity—when it’s unclear exactly what your prior opinions are—hindsight shifts are rational.

Let’s start with the intuitive argument.

First, notice that when you’re predicting something like whether Biden’s too old to campaign effectively, it makes sense to be unsure of exactly how confident you are.

To see this, contrast it with cases where it’s completely clear how confident you are: you know you’re (exactly) 50%-confident that this fair coin will land heads; you know that you’re (exactly) 25%-confident that the first card drawn from this deck will be a spade, etc.

Predicting (even a suitably sharpened[2] version of) whether Biden’s too old is not like that. If forced, you’ll name a number—say, 75%. But you’re not completely confident that this number accurately captures your opinion. Maybe, in fact, you really think it’s slightly more likely that Biden’s too old than that a coin won’t land heads twice in a row (in which case you’re more than 75%-confident); maybe you think it’s slightly less likely (in which case you’re less than 75%-confident). You shouldn’t be sure.

Second, consider what happens in the third-person case, when we’re unsure how confident someone else is of something.

Forget Biden—let’s talk about something important.

Do I like broccoli? I’m guessing you’re not sure. Now—I know this’ll be difficult—but what’s your estimate for how likely my brother thinks it is that I like broccoli? Hard to say, but pick a number. Say, 60%.

Now I’ll tell you a secret: I do like broccoli.

Given your new knowledge, what’s your updated estimate for how likely my brother thought it was that I like broccoli? Obviously, your estimate should go up! (Say, to 80% or 90%.) For now you should think: “Well, there was always a good chance Kevin’s brother knew whether he likes broccoli. Now that I know that Kevin does like broccoli, there’s a good chance his brother was confident of that.”

More generally: when you think a quantity, X (e.g. my brother’s subjective probability that I like broccoli) is correlated with the truth-value of a claim e (e.g. that I like broccoli), then learning e should raise your estimate of X.
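The broccoli reasoning can be sketched as a tiny Bayesian computation. All credences and joint weights below are invented for illustration; the only essential feature is that my brother’s credence is positively correlated with the truth:

```python
# Toy model: each world pairs my brother's credence X that I like broccoli
# with the truth of e ("Kevin likes broccoli"), plus a joint probability.
worlds = [
    (0.9, True,  0.4),  # brother is confident, and e is true
    (0.9, False, 0.1),  # brother is confident, but e is false
    (0.2, True,  0.1),  # brother is doubtful, but e is true
    (0.2, False, 0.4),  # brother is doubtful, and e is false
]

# Your prior estimate of X: a probability-weighted average over all worlds.
prior_estimate = sum(x * p for x, _, p in worlds)           # E(X) = 0.55

# After learning e, renormalize over the worlds where e is true.
p_e = sum(p for _, truth, p in worlds if truth)             # P(e) = 0.5
posterior_estimate = (
    sum(x * p for x, truth, p in worlds if truth) / p_e     # E(X|e) = 0.76
)
```

On these made-up numbers, learning e shifts your estimate of my brother’s credence from 0.55 up to 0.76—exactly the pattern the intuitive argument describes.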

Here’s the kicker: the same is true when X is your prior probability for e.

As we’ve said: you should be unsure what exactly your prior is that Biden is too old. And so long as you trust your judgment, you should think whatever your true prior is, it’s correlated with the truth.

Upshot: if you learn that Biden is too old to campaign effectively, that should increase your estimate for how likely you thought this was to begin with. You should exhibit a hindsight shift.

Let’s make the argument more precise.

Why it’s rational, precisely

Hindsight bias captures how your estimate of a fixed quantity—the probability you assigned to e at the initial time—changes over time. When people are asked how confident they are, they give one number. When they’re later asked how confident they were, they give a higher number.

In general, for any quantity X, a Bayesian’s estimate for X is their expectation E(X)—a probability-weighted average of X’s various possible values.[3]

For instance, if your probability distribution over how tall I am is the below bell curve, then the dashed line—the mean of the distribution—is your estimate for my height:

Prior probability distribution and estimate of my height

Your estimate for my height X after learning a proposition e is simply your expectation E(X|e) of X relative to your updated probability function P(•|e).[4] For instance, if you learn that I wear a shoe size greater than 10, that might shift your distribution over my possible heights—and resulting estimate—like this:

Posterior probability distribution and estimate of my height

So in this case, learning e raises your estimate for my height: E(X|e) > E(X).
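A discretized sketch of the same computation, with made-up heights and likelihoods (taller worlds make the shoe-size evidence likelier):

```python
# X is my height in cm; e is "shoe size > 10". All numbers invented.
worlds = [
    # (height, P(e | this height), prior weight)
    (170, 0.1, 0.25),
    (178, 0.3, 0.50),
    (186, 0.8, 0.25),
]

prior_estimate = sum(h * p for h, _, p in worlds)        # E(X) = 178.0

p_e = sum(pe * p for _, pe, p in worlds)                 # P(e) = 0.375
posterior_estimate = (
    sum(h * pe * p for h, pe, p in worlds) / p_e         # E(X|e) ≈ 181.7
)
```

Because height and the evidence are positively correlated here, conditioning on e pushes the estimate up from 178.0 to about 181.7 cm.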

Now let’s translate this to hindsight bias.

Let P be your subjective probability distribution when you’re making your initial judgment. Let e be the target claim of interest—say, that Biden is too old to campaign effectively. The quantity of interest is your prior probability for that claim, P(e).

Note that—like my height, X—P(e) is a quantity or random variable. It’s the sort of thing that can vary between possibilities. After all, I am uncertain what your prior probability is that Biden is too old—so to model my uncertainty about your priors, we need to model your priors as a variable. Likewise if you are uncertain about your priors. 

(Some of you are probably champing at the bit, thinking "higher-order probability is either trivial or nonsense". While it's true that many attempts to model higher-order probability are either incoherent or reduce to regular Bayesianism, it turns out that nontrivial higher-order probability does make conceptual and mathematical sense—we just need the tools of modal and epistemic logic. See e.g. this paper, this introductory paper, or the Appendix below, for mathematical details.)

Your prior estimate for your prior probability is simply E(P(e)).[5] Your posterior estimate for your prior, after learning e, is E(P(e)|e). You commit hindsight bias if the latter is higher than the former: E(P(e)|e) > E(P(e)).

So, when should you commit hindsight bias? Exactly when the following two assumptions hold:

  1. Prior Uncertainty: Your prior is uncertain what your prior credence in e is.
    1. (Precisely: P(P(e) = a) < 1, for all numbers a.)
  2. Prior Trust: You think your prior (whatever it is) is correlated with the truth.
    1. (Precisely: your prior P(e) and e’s truth-value 1e have positive covariance, Cov[P(e), 1e] > 0, relative to your prior probability function P.)[6]

(In fact, Prior Uncertainty follows from Prior Trust—I’m just making it explicit.)

Who would violate these assumptions? Only fools.

The only way to violate Prior Uncertainty is to be completely certain of exactly how likely you think it is that Biden is too old. You’d have to be willing to bet your life that the number you name isn’t off by 0.0001 percentage points. Only fools would be so confident in their ability to articulate their true opinions.

The only way to violate Prior Trust is to think that your true probability is either uncorrelated or negatively correlated with the truth. This is a radical form of distrusting your own judgment. Only (people who have reason to think that they are) fools would do so.

Here’s the kicker: any Bayesian who satisfies Prior Uncertainty and Prior Trust will commit hindsight bias:

Fact: E(P(e)|e) > E(P(e)) if and only if Cov[P(e), 1e] > 0. (Proof in Appendix.)

Informally: you should commit hindsight bias iff you trust yourself.

In other words: only fools avoid hindsight bias.
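The Fact can be checked numerically in a toy higher-order model (all weights are invented): each world specifies both your true prior P(e) and whether e is actually true, with the weights chosen so that higher priors go with e being true—i.e., Prior Trust holds.

```python
# Worlds: (your true prior P(e), truth-value of e, joint weight).
worlds = [
    (0.8, 1, 0.35),  # you were fairly confident, and e is true
    (0.8, 0, 0.15),
    (0.5, 1, 0.15),  # you were on the fence, yet e is true
    (0.5, 0, 0.35),
]

def E(f):
    """Expectation of f(prior, truth) under the joint distribution."""
    return sum(f(x, t) * p for x, t, p in worlds)

prior_est = E(lambda x, t: x)                  # E(P(e))       ≈ 0.65
p_e = E(lambda x, t: t)                        # P(e)          = 0.5
posterior_est = E(lambda x, t: x * t) / p_e    # E(P(e)|e)     ≈ 0.71
cov = E(lambda x, t: x * t) - prior_est * p_e  # Cov[P(e), 1e] ≈ 0.03

# Positive covariance and a positive hindsight shift go together:
assert cov > 0 and posterior_est > prior_est
```

On these numbers you’d report about 65% beforehand and recall about 71% after learning e—a hindsight shift of roughly 6 points, driven entirely by the positive covariance.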

What to make of this?

Obviously this doesn’t show that any given instance of hindsight bias is rational.

In fact, it gives us clear conditions under which hindsight bias is not rational: if (1) you are certain of what your prior judgment was—for example, when you have clear evidence about the objective probabilities—or (2) you don’t trust your prior judgment, then you shouldn’t commit hindsight bias.
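Condition (2) can be sketched with the same style of toy model (all weights invented): when your prior is independent of the truth, so Prior Trust fails, conditioning on e leaves your estimate of your prior exactly where it was.

```python
# Worlds: (your true prior P(e), truth-value of e, joint weight),
# with the prior statistically independent of the truth: Cov = 0.
worlds = [
    (0.8, 1, 0.25), (0.8, 0, 0.25),
    (0.5, 1, 0.25), (0.5, 0, 0.25),
]

prior_est = sum(x * p for x, t, p in worlds)                 # E(P(e)) = 0.65
p_e = sum(p for _, t, p in worlds if t)                      # P(e) = 0.5
posterior_est = sum(x * p for x, t, p in worlds if t) / p_e  # E(P(e)|e) = 0.65

assert abs(posterior_est - prior_est) < 1e-9  # learning e changes nothing
```

With zero covariance there is no hindsight shift—consistent with the Fact above.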

Indeed, there’s some empirical evidence that having clear objective probabilities—the sort of thing that should make you certain of what your prior was—reduces or eliminates hindsight bias. (See this paper, experiment 5.)

Moreover, nothing here shows that the degree to which people commit hindsight bias is rational—they may well trust their priors too much.

But the mere fact that people commit hindsight bias is not evidence that they are irrational. In fact, if they didn’t commit hindsight bias, that would be clear evidence that they’re irrational!

So let’s follow the pundits: we knew all along that Biden was too old. Probably.

Appendix

Here’s the Technical Appendix—including (1) an example probabilistic model illustrating hindsight bias, and (2) a proof of the Fact.

  1. ^

    The argument I’m going to give is inspired by (and a generalization of) Brian Hedden’s 2019 paper, “Hindsight bias is not a bias”.

  2. ^

    If you’re worried that “Biden’s too old to campaign effectively” is too vague to have a truth value, then replace it with a precise (but long-winded) alternative, like “Biden’s campaign will fail to recover its June 27 position in the polls by August 27”, or some such. The argument goes through either way.

  3. ^

    Precisely (assuming for simplicity that X has finite support) with P their probability function and X(w) the value of X at w: E(X) = Σw P(w)•X(w).

  4. ^

    Again assuming X has finite support: E(X|e) = Σw P(w|e)•X(w).

  5. ^

    As always, this is defined the same way: E(P(e)) = Σw P(w)•Pw(e), where Pw(e) is the value the variable P(e) takes at world w.

    This might turn out to equal your prior: E(P(e)) = P(e). But this doesn’t matter for our purposes; and I won’t assume anything about that. (In the context of higher-order uncertainty, that equation will often fail.)

    You might notice that if P is a random variable, then E(P(e)) is too—it can vary from world to world. When we write something like P(e) or E(P(e)) unembedded, they take the values that they take at the actual world @ in the model.

  6. ^

    The truth-value of e, 1e, is e’s indicator variable: it takes value 1 if e is true and 0 if it’s false. The covariance of two variables X and Y is a (non-normalized) measure of their correlation: Cov[X,Y] := E(XY) – E(X)•E(Y). In our case: Cov[P(e), 1e] = E(P(e)•1e) – E(P(e))•E(1e). The familiar correlation coefficient is the normalized covariance.

5 comments

Hindsight bias—aka the “I knew it all along” effect—is the finding that when people learn that an uncertain event happened, they increase their estimate for how much they expected it.

[...]

You commit hindsight bias if the latter is higher than the former: E(P(e)|e) > E(P(e)).

[...]

In other words: only fools avoid hindsight bias.

These paragraphs do not seem compatible. Throughout the latter two thirds of your post, you are presenting the idea of "hindsight bias" as being about "how your estimate of a fixed quantity—the probability you assigned to e at the initial time—changes over time."

So how is it not epistemically incorrect to say that the probability you assigned to an event at some earlier date (when you did not have access to the observation e) is in fact the new, conditional probability of e given the observation? Like, yes, your estimate of a fixed quantity changes over time, and if someone asks you what your estimate was at some earlier point in time, and you instead use your current estimate (or anything that builds on information which wasn't available to you at the time), that's about as far from "rational" as you can get. The fact that this often happens subconsciously, with the person being discussed generally not even consciously realizing they are making this substitution, makes it a bias.

In 2022, I ask somebody to predict who will be the next United States president, and they tell me what their probability distribution is. Then, today, I ask them what they remember their prediction to have been, and instead of giving the correct answer (which literally means copying the probability distribution they gave earlier)[1], they give a different answer by taking into account observations from events after 2022.

Of course this is epistemically broken and biased! This procedure, as described above, will almost always (under the two assumptions you mentioned) output a higher probability on the actual event than the forecaster did at the beginning. So this systematically overrates the accuracy of the prediction in a manner that is unduly self-serving; if you do not take this effect into account, you will not be able to properly assess the quality of this forecaster in the long run.

  1. ^

    up to a set of measure 0

I get where you're coming from, but where do you get off the boat? The result is a theorem of probability: if (1) you update by conditioning on e, and (2) you had positive covariance for your own opinion and the truth, then you commit hindsight bias.  So to say this is irrational we need to either say that (1) you don't update by conditioning, or (2) you don't have positive covariance between your opinion and the truth. Which do you deny, and why?

The standard route is to deny (2) by implicitly assuming that you know exactly what your prior probability was, at both the prior and future time.  But that's a radical idealization.

Perhaps more directly to your point: the shift only results in over-estimation if your INITIAL estimate is accurate. Remember we're eliciting (i) E(P(e)) and (ii) E(P(e) | e), not (iii) P(e) and (ii) E(P(e) | e). If (i) always equaled (iii) (you always accurately estimated what you really thought at the initial time), then yes, hindsight bias would decrease the accuracy of your estimates. But in contexts where you're unsure what you think, you WON'T always accurately estimate your prior.

In fact, that's a theorem.  If P has higher-order uncertainty, then there must be some event q such that P(q) ≠ E(P(q)).  See this old paper by Samet (https://www.tau.ac.il/~samet/papers/quantified.pdf), and this more recent one with a more elementary proof (https://philarchive.org/rec/DORHU).

I deny that "hindsight bias", as a term used in common and specialized parlance, has anything to do with (1). If you respond to the implicit question of "what did you expect at time t" with anything that involves updates from stuff after time t, you are likely committing hindsight bias.

If you are a Bayesian updater, you do change your credence in something by conditioning, as time passes. But it is precisely the action of changing the subjective probability distribution you are talking about from the old one to the new one that is epistemically incorrect, if you are focused on the question of what you had actually believed before obtaining new information (and thus doing the Bayesian update).

As I said earlier:

This procedure, as described above, will almost always (under the two assumptions you mentioned) output a higher probability on the actual event than the forecaster did at the beginning. So this systematically overrates the accuracy of the prediction in a manner that is unduly self-serving; if you do not take this effect into account, you will not be able to properly assess the quality of this forecaster in the long run.

You are assuming that there is some "real" opinion you had, but have imperfect knowledge of, as if we all contain a perfect reasoner about probabilities but have only imperfect access to its conclusions. I see no reason to imagine this phantom. We gave a probability; assuming we were honest, if anything was our probability at the time, then that was it.

If we see something exceptionally surprising, we may legitimately in hindsight re-examine how we arrived at the low probability we had assigned to it. We might detect errors in our assessment back then of the evidence then available. That does not change the answer that we gave, only the probability that we now assign to the event given only the information we had then.

Discovering the right answers to a set of examination questions does not allow us to claim that we "really" knew the answers when sitting the exam.

But you aren't asked about E(P(e)|e) (your current estimate of your prior). If you want to put it this way, it would be E(E(P(e))|e), your current estimate of your previous estimate. And you do have exact knowledge of what that estimate was.