jsteinhardt comments on Unknown unknowns - Less Wrong

Post author: ciphergoth 05 August 2011 12:55PM




Comment author: jsteinhardt 06 August 2011 09:53:08AM

I don't think entropy quite works that way. For notational convenience, let Q(p) denote the entropy of p. Just because Q(p) > Q(q), it does not follow that q is strictly more informative than p. In other words, there is no total ordering on distributions such that, for any p, q with Q(p) > Q(q), I can get from p to q with Q(p) - Q(q) bits of information. The closest statement you can make would be in terms of KL divergence, but it is important to note that KL(p||q) and KL(q||p) are both nonnegative (and generally unequal), so KL is providing a divergence between distributions, not an ordering.
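A quick numeric sketch of this point (the two distributions here are hypothetical, chosen only for illustration): p has higher entropy than q, yet the entropy gap Q(p) - Q(q) matches neither KL(p||q) nor KL(q||p), and the two KL values differ from each other.

```python
import math

# Two hypothetical distributions over a 3-outcome space.
p = [0.5, 0.25, 0.25]   # entropy exactly 1.5 bits
q = [0.7, 0.2, 0.1]     # lower entropy

def entropy(d):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in d if x > 0)

def kl(a, b):
    """KL divergence KL(a||b) in bits."""
    return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

print(entropy(p), entropy(q))   # Q(p) > Q(q)
print(kl(p, q), kl(q, p))       # both positive, and unequal
print(entropy(p) - entropy(q))  # the entropy gap matches neither KL value
```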

Also note that entropy does not in fact decrease with more information: it decreases only in expectation, and even then only relative to the agent's own belief distribution. Nor is this a particularly special property of entropy. Jensen's inequality together with conservation of expected evidence implies that, instead of Q(p) = E[-log(p(x))], we could have taken any concave function Q over the space of probability distributions, which includes every functional of the form Q(p) = E[f(p(x))] with 2f'(z) + z f''(z) <= 0 for all z (taking f(z) = -log(z) recovers entropy).

[Proof of the statement about Jensen: Let p2 be the distribution we get from p after updating. Then E[Q(p2) | p] <= Q(E[p2 | p]) = Q(p), where the inequality is Jensen applied to the concave function Q, and E[p2 | p] = p is conservation of expected evidence.]
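Both halves of the argument can be checked on a toy Bayesian update (all numbers hypothetical): conservation of expected evidence holds exactly, and the expected value of Q after updating drops for Shannon entropy and also for a second concave functional satisfying the 2f'(z) + z f''(z) <= 0 condition, here f(z) = 1 - z (so 2f' + zf'' = -2).

```python
import math

# Toy setup: two hypotheses about a coin, with hypothetical likelihoods.
prior = [0.6, 0.4]
p_heads = [0.8, 0.3]            # P(heads | hypothesis)

def update(prior, likelihood):
    """Return (probability of the observation, posterior)."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    z = sum(joint)
    return z, [j / z for j in joint]

pr_h, post_h = update(prior, p_heads)
pr_t, post_t = update(prior, [1 - l for l in p_heads])

# Conservation of expected evidence: E[p2 | p] = p, exactly.
expected = [pr_h * a + pr_t * b for a, b in zip(post_h, post_t)]
assert all(abs(e - p) < 1e-12 for e, p in zip(expected, prior))

def shannon(d):                  # f(z) = -log2(z)
    return -sum(x * math.log2(x) for x in d if x > 0)

def gini(d):                     # f(z) = 1 - z, so 2f'(z) + z*f''(z) = -2 <= 0
    return 1 - sum(x * x for x in d)

# E[Q(p2) | p] <= Q(p) for both concave functionals.
for Q in (shannon, gini):
    assert pr_h * Q(post_h) + pr_t * Q(post_t) <= Q(prior)
```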

EDIT: For the interested reader, this is also strongly related to Doob's martingale convergence theorem, as your beliefs are a martingale and any concave function of them is a supermartingale.
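The martingale picture can also be simulated (a sketch with hypothetical parameters). The key subtlety is that beliefs are a martingale only under the agent's own prior, so each run must first sample the true hypothesis from the prior and then sample observations from that hypothesis. Averaged over runs, the belief stays flat while its entropy trends downward.

```python
import math
import random

random.seed(0)

def entropy2(p):
    """Entropy in bits of a Bernoulli(p) belief."""
    return -sum(x * math.log2(x) for x in (p, 1 - p) if x > 0)

# Hypothetical setup: coin is either fair (bias 0.5) or loaded (bias 0.8).
PRIOR, STEPS, RUNS = 0.5, 20, 50000
mean_belief = [0.0] * (STEPS + 1)
mean_entropy = [0.0] * (STEPS + 1)

for _ in range(RUNS):
    # Sample the hypothesis from the prior, so flips follow the agent's model.
    bias = 0.8 if random.random() < PRIOR else 0.5
    p = PRIOR                        # current P(loaded)
    for t in range(STEPS + 1):
        mean_belief[t] += p / RUNS
        mean_entropy[t] += entropy2(p) / RUNS
        if t < STEPS:
            heads = random.random() < bias
            like_loaded = 0.8 if heads else 0.2
            like_fair = 0.5
            p = p * like_loaded / (p * like_loaded + (1 - p) * like_fair)

print(mean_belief[0], mean_belief[-1])    # martingale: stays near the prior
print(mean_entropy[0], mean_entropy[-1])  # supermartingale: trends downward
```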