Here is an intuitively compelling principle: hearing a bad argument for a view shouldn’t change your degree of belief in the view. After all, it is possible for bad arguments to be offered for anything, even the truth. For all you know, plenty of good arguments exist, and you just happened to hear a bad one.
But this intuitive principle is wrong. If you thought there was a reasonable chance you might hear a good argument but you end up hearing a bad one, that provides some evidence against the view.
Imagine I am pretty convinced that octopus and cuttlefish developed their complex nervous systems independently, and that their last common ancestor was not at all intelligent. Let’s say my p(intelligent ancestor) is 0.1. Imagine I have a friend, Richard, who disagrees. Richard is generally a smart and reasonable person. Prior to hearing what he has to say, I think there is a moderately high chance that he will give a good argument that the last common ancestor of octopus and cuttlefish was highly intelligent. But Richard’s argument is totally unconvincing; the evidence he cites is irrelevant. Should my p(intelligent ancestor) now be:
(A) 0.1
(B) < 0.1
The correct answer is (B). Explaining why requires some simple probability math. The law of total probability holds that the probability of a proposition equals the average of its conditional probabilities given each possible observation, weighted by the probability of making that observation. In symbols:

$$p(h) = p(h \mid e)\,p(e) + p(h \mid \neg e)\,p(\neg e)$$
Here p(e) is the probability that Richard gives you good evidence, p(¬e) is the probability that he doesn’t, and p(h) is the probability that the last common ancestor of octopus and cuttlefish was intelligent. The | symbol means “given”, so p(h|e) is the probability of h given that Richard gives you good evidence. The law of total probability can be straightforwardly derived from the Kolmogorov axioms and the definition of conditional probability. This brings us to another closely related theorem, the law of total expectation:

$$\mathbb{E}[X] = \mathbb{E}\big[\mathbb{E}[X \mid Y]\big]$$
Here Y ranges over all possible observations (in this case, getting good evidence or getting bad evidence from Richard), and the fancy E means expectation or mean. In its application to Bayesian updating, the law of total expectation implies that the prior is equal to the expectation of the posterior. This is why the principle of “conservation of expected evidence” holds, and it is why Richard’s failure to give a good argument lowers p(intelligent ancestor). If Richard had given a good argument, I would have increased my degree of belief in h: p(h|e) > p(h). But p(h) is a weighted average of p(h|e) and p(h|¬e), with weights p(e) and p(¬e). As long as both weights are nonzero, a weighted average of two numbers lies strictly between them, so if one is higher than the average, the other must be lower. So if p(h|e) > p(h), then p(h|¬e) < p(h).
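To make the weighted-average step concrete, here is a minimal sketch that solves the law of total probability for p(h|¬e). Only the prior of 0.1 comes from the example above. The values p(e) = 0.4 and p(h|e) = 0.2 are assumptions I picked for illustration, and the function name is hypothetical.

```python
def posterior_given_no_good_argument(p_h, p_e, p_h_given_e):
    """Solve p(h) = p(h|e) p(e) + p(h|not e) p(not e) for p(h|not e)."""
    if not 0.0 < p_e < 1.0:
        raise ValueError("p_e must be strictly between 0 and 1")
    p_h_given_not_e = (p_h - p_e * p_h_given_e) / (1.0 - p_e)
    if not 0.0 <= p_h_given_not_e <= 1.0:
        raise ValueError("these inputs do not form a coherent set of probabilities")
    return p_h_given_not_e


p_h = 0.1          # prior p(intelligent ancestor), from the example above
p_e = 0.4          # ASSUMED chance that Richard gives a good argument
p_h_given_e = 0.2  # ASSUMED posterior if his argument had been good

p_h_given_not_e = posterior_given_no_good_argument(p_h, p_e, p_h_given_e)

# Conservation of expected evidence: the expected posterior equals the prior.
expected_posterior = p_e * p_h_given_e + (1.0 - p_e) * p_h_given_not_e

print(f"p(h | bad argument) = {p_h_given_not_e:.4f}")
print(f"expected posterior  = {expected_posterior:.4f}")
```

Under these assumed numbers, the posterior after the bad argument comes out below the prior of 0.1, and averaging the two possible posteriors recovers the prior.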
So whenever you hear a bad argument when you previously thought you might hear a good one, you should conclude the view is less likely to be true than you had previously thought. The size of the update grows with how likely you previously thought it was that you would hear a good argument. The more weight is given to the p(h|e) term, the smaller the p(h|¬e) term has to be for the weighted average to equal p(h). If Richard is the world expert on cephalopod evolution, his failure to give a good argument is more informative than it would be if he were a layman known to be bad at arguing for his views.
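The dependence on p(e) can be sketched numerically. Rearranging the law of total probability gives p(h) − p(h|¬e) = p(e) / (1 − p(e)) × (p(h|e) − p(h)), so for a fixed p(h|e) the drop gets larger as p(e) rises. The snippet below is only an illustration: the value p(h|e) = 0.2 and the list of candidate p(e) values are assumptions, and `update_size` is a hypothetical helper.

```python
def update_size(p_h, p_e, p_h_given_e):
    """Return p(h) - p(h|not e): how far belief drops after a bad argument."""
    if not 0.0 < p_e < 1.0:
        raise ValueError("p_e must be strictly between 0 and 1")
    if p_e * p_h_given_e > p_h:
        # Otherwise p(h|not e) would have to be negative.
        raise ValueError("these inputs do not form a coherent set of probabilities")
    return p_e / (1.0 - p_e) * (p_h_given_e - p_h)


p_h = 0.1          # prior, from the example above
p_h_given_e = 0.2  # ASSUMED posterior after a good argument

# ASSUMED range of prior expectations about Richard, from weak to strong arguer.
candidate_p_e = [0.1, 0.2, 0.3, 0.4]
updates = [update_size(p_h, p_e, p_h_given_e) for p_e in candidate_p_e]

for p_e, drop in zip(candidate_p_e, updates):
    print(f"p(e) = {p_e:.1f}: belief drops by {drop:.4f} to {p_h - drop:.4f}")
```

The more confident I was beforehand that Richard would produce a good argument, the further his failure pushes my belief below 0.1.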
Lots of people say that Bayesian formalism adds nothing to informal arguments. But I don’t agree. I think that it is both non-obvious and extremely important that if p(h|e) > p(h), then p(h|¬e) < p(h).[1]
[1] Thanks to Pablo Stafforini and Karthik Tadepalli. Pablo explained this point to me, and Karthik reviewed an earlier draft.
No wait, the order of these two things matters. Is P(intelligent ancestor|just my background information) = 0.1 or is P(intelligent ancestor|my background information + the fact that Richard disagrees) = 0.1? I agree that if the latter holds, conservation of expected evidence comes into play and gives the conclusion you assert. But the former doesn't imply the latter.
I agree that the order matters, and I should have discussed that in the post, but I think the conclusion will hold either way. In the case where P(intelligent ancestor|just my background information) = 0.1, and I learn that Richard disagrees, the probability then goes above 0.1. But then when I learn that Richard's argument is bad, it goes back down. And I think it should still go below 0.1, assuming you antecedently knew that there were some smart people who disagreed. You've learned that, for at least some smart believers in an intelligent ancestor, the arguments were worse than you expected.