jsteinhardt comments on Unknown unknowns - Less Wrong

Post author: ciphergoth 05 August 2011 12:55PM




Comment author: jsteinhardt 06 August 2011 09:53:08AM

I don't think entropy quite works that way. For notational convenience, let Q(p) denote the entropy of p. Just because Q(p) > Q(q), it does not follow that q is strictly more informative than p. In other words, there is no total ordering on distributions such that, for any p, q with Q(p) > Q(q), I can get from p to q with Q(p) - Q(q) bits of information. The closest statement you can make would be in terms of KL divergence, but it is important to note that KL(p||q) and KL(q||p) are both nonnegative (and generally unequal), so KL is providing a divergence between distributions, not an ordering.
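A quick numeric sketch of this point (the two distributions here are hypothetical, chosen only for illustration): p has higher entropy than q, yet the entropy gap Q(p) - Q(q) matches neither KL(p||q) nor KL(q||p), and the two KL values differ from each other.

```python
import math

# Two hypothetical distributions over a 3-outcome space.
p = [0.5, 0.25, 0.25]   # entropy exactly 1.5 bits
q = [0.7, 0.2, 0.1]     # lower entropy

def entropy(d):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in d if x > 0)

def kl(a, b):
    """KL divergence KL(a||b) in bits."""
    return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

print(entropy(p), entropy(q))   # Q(p) > Q(q)
print(kl(p, q), kl(q, p))       # both positive, and unequal
print(entropy(p) - entropy(q))  # the entropy gap matches neither KL value
```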

Also note that entropy does not in fact decrease with more information: it decreases only in expectation, and even then only relative to the agent's own belief distribution. Nor is this a particularly special property of entropy. Jensen's inequality together with conservation of expected evidence implies that, instead of Q(p) = E[-log(p(x))], we could have taken any concave function Q over the space of probability distributions, which includes every functional of the form Q(p) = E[f(p(x))] with 2f'(z) + z f''(z) <= 0 for all z (taking f(z) = -log(z) recovers entropy).

[Proof of the statement about Jensen: Let p2 be the distribution we get from p after updating. Then E[Q(p2) | p] <= Q(E[p2 | p]) = Q(p), where the inequality is Jensen applied to the concave function Q, and E[p2 | p] = p is conservation of expected evidence.]
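Both halves of the argument can be checked on a toy Bayesian update (all numbers hypothetical): conservation of expected evidence holds exactly, and the expected value of Q after updating drops for Shannon entropy and also for a second concave functional satisfying the 2f'(z) + z f''(z) <= 0 condition, here f(z) = 1 - z (so 2f' + zf'' = -2).

```python
import math

# Toy setup: two hypotheses about a coin, with hypothetical likelihoods.
prior = [0.6, 0.4]
p_heads = [0.8, 0.3]            # P(heads | hypothesis)

def update(prior, likelihood):
    """Return (probability of the observation, posterior)."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    z = sum(joint)
    return z, [j / z for j in joint]

pr_h, post_h = update(prior, p_heads)
pr_t, post_t = update(prior, [1 - l for l in p_heads])

# Conservation of expected evidence: E[p2 | p] = p, exactly.
expected = [pr_h * a + pr_t * b for a, b in zip(post_h, post_t)]
assert all(abs(e - p) < 1e-12 for e, p in zip(expected, prior))

def shannon(d):                  # f(z) = -log2(z)
    return -sum(x * math.log2(x) for x in d if x > 0)

def gini(d):                     # f(z) = 1 - z, so 2f'(z) + z*f''(z) = -2 <= 0
    return 1 - sum(x * x for x in d)

# E[Q(p2) | p] <= Q(p) for both concave functionals.
for Q in (shannon, gini):
    assert pr_h * Q(post_h) + pr_t * Q(post_t) <= Q(prior)
```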

EDIT: For the interested reader, this is also strongly related to Doob's martingale convergence theorem, as your beliefs are a martingale and any concave function of them is a supermartingale.
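The martingale picture can also be simulated (a sketch with hypothetical parameters). The key subtlety is that beliefs are a martingale only under the agent's own prior, so each run must first sample the true hypothesis from the prior and then sample observations from that hypothesis. Averaged over runs, the belief stays flat while its entropy trends downward.

```python
import math
import random

random.seed(0)

def entropy2(p):
    """Entropy in bits of a Bernoulli(p) belief."""
    return -sum(x * math.log2(x) for x in (p, 1 - p) if x > 0)

# Hypothetical setup: coin is either fair (bias 0.5) or loaded (bias 0.8).
PRIOR, STEPS, RUNS = 0.5, 20, 50000
mean_belief = [0.0] * (STEPS + 1)
mean_entropy = [0.0] * (STEPS + 1)

for _ in range(RUNS):
    # Sample the hypothesis from the prior, so flips follow the agent's model.
    bias = 0.8 if random.random() < PRIOR else 0.5
    p = PRIOR                        # current P(loaded)
    for t in range(STEPS + 1):
        mean_belief[t] += p / RUNS
        mean_entropy[t] += entropy2(p) / RUNS
        if t < STEPS:
            heads = random.random() < bias
            like_loaded = 0.8 if heads else 0.2
            like_fair = 0.5
            p = p * like_loaded / (p * like_loaded + (1 - p) * like_fair)

print(mean_belief[0], mean_belief[-1])    # martingale: stays near the prior
print(mean_entropy[0], mean_entropy[-1])  # supermartingale: trends downward
```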