witzvo comments on Open Thread, September 30 - October 6, 2013 - Less Wrong Discussion

Post author: Coscott 30 September 2013 05:18AM


Comment author: witzvo 05 October 2013 09:51:44PM

but there's no first-order term and the second-order term is negative for some reason?! What happened there?

There's no first-order term because you are expanding around a maximum of the log posterior density, where the gradient is zero. Similarly, the second-order term is negative (well, the Hessian is negative definite) precisely because the posterior density falls off away from the mode. Roughly speaking, each additional piece of data has, in expectation, the effect of making the log posterior curve down more sharply around the true value of the parameter, by the amount of one copy of the Fisher information matrix (this all assumes the model is true, etc.). You might also be interested in the concept of "observed information," which is the negative of the Hessian of the actual (not expected) log-likelihood, evaluated at the mode.
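To make this concrete, here is a minimal numerical sketch. It assumes a Bernoulli model with a flat Beta(1, 1) prior, an illustrative choice that is not from the thread, and the helper names (`log_posterior`, `derivatives_at`, `fisher_information`, `summarize`) are hypothetical. It estimates the first and second derivatives of the log posterior at its mode by central finite differences: the first derivative comes out near zero, the second is negative, and the observed information grows in proportion to n. For this particular model the observed information at the mode equals exactly n times the Fisher information at the mode; in general the two agree only approximately.

```python
import math

# Illustrative model (an assumption, not from the thread): Bernoulli trials
# with a flat Beta(1, 1) prior, so the log posterior equals the
# log-likelihood up to an additive constant.
def log_posterior(theta, k, n):
    """Unnormalized log posterior for k successes in n trials."""
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

def derivatives_at(f, x, h=1e-4):
    """Central finite-difference estimates of f'(x) and f''(x)."""
    first = (f(x + h) - f(x - h)) / (2.0 * h)
    second = (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)
    return first, second

def fisher_information(theta):
    """Expected (Fisher) information carried by one Bernoulli observation."""
    return 1.0 / (theta * (1.0 - theta))

def summarize(k, n):
    """Derivatives and information measures at the posterior mode."""
    mode = k / n  # posterior mode under the flat prior
    first, second = derivatives_at(lambda t: log_posterior(t, k, n), mode)
    observed_info = -second  # negative second derivative at the mode
    return first, second, observed_info, n * fisher_information(mode)

# Same success rate with ten and a hundred times more data.
for k, n in [(3, 10), (30, 100), (300, 1000)]:
    first, second, obs, expected = summarize(k, n)
    print(f"n={n:5d}  f'={first:+.1e}  f''={second:9.2f}  "
          f"observed={obs:8.2f}  n*I(mode)={expected:8.2f}")
```

The step size h=1e-4 trades truncation error against floating-point rounding, which is why the printed first derivatives are small but not exactly zero.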

Comment author: alex_zag_al 07 October 2013 03:03:42AM

Ah, thank you! It makes me so happy to finally see why that first term disappears.

But now I don't see why you subtract the second-order terms.

I mean, I do see that since you're at a maximum, the value of the function has to decrease as you move away from it.

But, in the single-parameter case, Jaynes's formula becomes

But that second derivative is negative. And since we're subtracting it, the function grows as we move away from the maximum!

Comment author: witzvo 07 October 2013 05:00:19AM

Yes, that formula doesn't make sense (you forgot the 1/2, by the way). I believe 8.52/8.53 should not have a minus sign there, and 8.54 is missing a minus sign that it should have. Also, 8.52 should use expected values or big-O-in-probability (O_p) notation. This is a frequentist calculation, so I'd suggest a more standard reference like Ferguson.
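For reference, here is the standard single-parameter second-order Taylor expansion about the posterior mode, writing D for the data and \hat\theta for the mode. This is a textbook expansion, not a transcription of Jaynes's 8.52 to 8.54. The first-order term vanishes because the derivative is zero at the mode, and the 1/2 appears in the second-order term:

```latex
\log p(\theta \mid D) \approx \log p(\hat\theta \mid D)
  + \frac{1}{2}\,(\theta - \hat\theta)^2
    \left.\frac{d^2}{d\theta^2}\log p(\theta \mid D)\right|_{\theta = \hat\theta}
  = \log p(\hat\theta \mid D) - \frac{1}{2}\,J(\hat\theta)\,(\theta - \hat\theta)^2,
\qquad
J(\hat\theta) = -\left.\frac{d^2}{d\theta^2}\log p(\theta \mid D)\right|_{\theta = \hat\theta} > 0.
```

Written with the raw second derivative, which is negative at a maximum, the quadratic term takes a plus sign; written with J, which is positive, it takes a minus sign. Either way the approximation decreases away from the mode. With a flat prior, J coincides with the observed information of the log-likelihood.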