
PhilGoetz comments on Open thread, Mar. 9 - Mar. 15, 2015 - Less Wrong Discussion

5 Post author: MrMind 09 March 2015 07:48AM



Comment author: PhilGoetz 11 March 2015 11:22:25PM * 2 points

I think I was wrong to say that 1 bit of evidence = a likelihood multiplier of 2.

If you have a signal S, and P(x|S) = 1 while P(x|~S) = .5, then the likelihood multiplier is 2 and you get 1 bit of information, as computed by the KL divergence. But that signal would in fact require an infinite amount of evidence to drive P(x|S) all the way to 1, I think, so it's a theoretical signal found only in math problems, like a frictionless surface in physics.

If instead you have a signal S with P(x|S) = .5 and P(x|~S) = .25, then the likelihood multiplier is still 2, but the KL divergence says you get only about .2075 bits of information.
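A minimal sketch of both calculations, treating each example as a two-outcome distribution (x vs. ~x) and computing D(P(·|S) || P(·|~S)) in bits:

```python
import math

def kl_bits(p, q):
    """KL divergence D(p || q) in bits for two discrete distributions
    given as matching lists of probabilities."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# First example: P(x|S) = 1, P(x|~S) = .5  ->  exactly 1 bit
print(kl_bits([1.0, 0.0], [0.5, 0.5]))    # 1.0

# Second example: P(x|S) = .5, P(x|~S) = .25  ->  about .2075 bits
print(kl_bits([0.5, 0.5], [0.25, 0.75]))  # 0.2075...
```

The second value is .5·log2(.5/.25) + .5·log2(.5/.75) = .5 − .2925 ≈ .2075, so the same 2× likelihood multiplier yields far less than 1 bit.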

There's a discussion of a similar question on stats.stackexchange.com. It appears that the sum, over a series of observations x, of

log(likelihood ratio) = log( P(x | model 2) / P(x | model 1) )

approximates the information gained by switching from model 1 to model 2, but not on a term-by-term basis. The approximation holds only when the observations in the series are in fact drawn from a distribution close to model 2.
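A quick simulation of that claim, reusing the two hypothetical distributions from the second example above (model 2 plays the role of the "true" distribution the observations are drawn from):

```python
import math
import random

random.seed(0)

# Hypothetical two-outcome models; model 2 is the distribution
# the observations are actually drawn from.
model1 = {0: 0.25, 1: 0.75}
model2 = {0: 0.50, 1: 0.50}

n = 100_000
xs = random.choices([0, 1], weights=[model2[0], model2[1]], k=n)

# Sum of per-observation log likelihood ratios, in bits.
total = sum(math.log2(model2[x] / model1[x]) for x in xs)
avg = total / n  # per-observation average

# Exact KL divergence D(model2 || model1) for comparison.
kl = sum(model2[x] * math.log2(model2[x] / model1[x]) for x in (0, 1))

print(avg, kl)  # avg is close to kl (~0.2075), but no single term equals it
```

Note that the individual terms are either log2(2) = 1 or log2(2/3) ≈ −0.585; only their average over many draws from model 2 approaches the ~0.2075-bit information gain, which is the "not on a term-by-term basis" point.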