Another PT:LoS question. In Chapter 8 ("Sufficiency, Ancillarity and all that"), there's a section on Fisher information. I'm very interested in understanding it, because the concept has come up in important places in my statistics classes without any conceptual discussion of it: it's in the Cramér-Rao bound and the Jeffreys prior, but it looks so arbitrary to me.
Jaynes's explanation of it as a difference in the information different parameter values give you about large samples is really interesting, but there's one step of the math that I just can't follow. He does what looks like a second-order Taylor approximation of log p(x|theta), but there's no first-order term and the second-order term is negative for some reason?! What happened there?
but there's no first-order term and the second-order term is negative for some reason?! What happened there?
There's no first-order term because you are expanding around a maximum of the log posterior density. Similarly, the second-order term is negative (well, negative definite) precisely because the posterior density falls off away from the mode. What's happening in rough terms is that each additional piece of data has, in expectation, the effect of making the log posterior curve down more sharply (around the true value of the parameter) by the amount ...
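If it helps to see those two facts numerically, here is a minimal sketch, not taken from Jaynes. It assumes a Bernoulli model with made-up data (30 successes in 100 trials) and a flat prior, so that the log posterior equals the log likelihood up to a constant. The helper names `log_lik` and `derivatives` are my own, for illustration only. It estimates the first and second derivatives at the maximum by finite differences and compares the curvature with minus n times the Bernoulli Fisher information, 1/(theta(1-theta)).

```python
import math

def log_lik(theta, k, n):
    """Bernoulli log-likelihood for k successes in n trials."""
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

def derivatives(f, x, h=1e-4):
    """Central finite-difference estimates of f'(x) and f''(x)."""
    first = (f(x + h) - f(x - h)) / (2 * h)
    second = (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)
    return first, second

# Made-up data (an assumption for illustration): 30 successes in 100 trials.
k, n = 30, 100
theta_hat = k / n  # maximum of the log-likelihood (flat prior: also the posterior mode)

first, second = derivatives(lambda t: log_lik(t, k, n), theta_hat)

# Fisher information of a single Bernoulli observation, evaluated at theta_hat.
fisher_per_obs = 1.0 / (theta_hat * (1.0 - theta_hat))

print("first derivative at the maximum:", first)    # close to zero
print("second derivative at the maximum:", second)  # negative
print("minus n times Fisher information:", -n * fisher_per_obs)
```

The first derivative comes out at essentially zero, which is why the expansion has no first-order term, and the second derivative is negative and agrees with -n times the per-observation Fisher information. That is the "each data point adds curvature" picture in concrete form, at least for this toy model.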