wnoise comments on Open Thread: March 2010, part 2 - Less Wrong

Post author: RobinZ 11 March 2010 05:25PM

Comment author: wnoise 16 March 2010 09:16:55PM * 1 point

Doing some simulations on a similar problem (1-d, with p(x) = x), I'm getting results indicating that this isn't working well at all. The reduction in entropy from having large numbers of samples in one bin seems to override the reduction in entropy from making the probabilities more skewed, at least in this case. I am surprised, and a bit perplexed.
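
For concreteness, here is a minimal sketch of the kind of simulation described here. The specific choices are mine: samples uniform on [0, 1] labelled "same" with probability p(x) = x, and the measure taken to be the empirical H(Y|X), i.e. the bin-occupancy-weighted binary entropy of the same/not label within each bin (which the edit below identifies with the earlier measure):

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_entropy(p):
    """Entropy -p*log(p) - (1-p)*log(1-p) of a Bernoulli(p) variable, in nats."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def cond_entropy(x, same, edges):
    """Empirical H(Y|X): occupancy-weighted binary entropy of 'same' within each bin."""
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(edges) - 2)
    return sum(
        (idx == b).mean() * binary_entropy(same[idx == b].mean())
        for b in range(len(edges) - 1)
        if (idx == b).any()
    )

# The 1-d test problem: x drawn uniformly from [0, 1], each point counted as
# "the same" with probability p(x) = x.
x = rng.random(100_000)
same = rng.random(100_000) < x

# Compare equal-width bins with a binning that piles most samples into one bin.
equal = np.linspace(0.0, 1.0, 6)
lopsided = np.array([0.0, 0.9, 0.925, 0.95, 0.975, 1.0])
print(cond_entropy(x, same, equal), cond_entropy(x, same, lopsided))
```

The two binnings compared are arbitrary, just to show how concentrating mass in one bin trades off against how skewed the per-bin probabilities are.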

EDIT: I was hoping that a better measure, such as the mutual information I(X; Y) between the bin (X) and the probability of being the same function (Y), would work. But this boils down to the measure I suggested last time: I(X; Y) = H(Y) - H(Y|X). H(Y) is fixed just by the overall distribution of same vs. not. H(Y|X) = - sum_x p(x) ∫ p(y|x) log p(y|x) dy, so maximizing the mutual information is the same as minimizing the measure I suggested last time.
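
Continuing the sketch above (reusing binary_entropy, cond_entropy, x, same, and the two binnings), and treating y as the binary same/not label so the integral collapses to a two-term sum, the identity can be checked directly:

```python
def mutual_information(x, same, edges):
    """Empirical I(X;Y) = H(Y) - H(Y|X) for the bin index X and same/not label Y."""
    h_y = binary_entropy(same.mean())  # fixed by the overall same vs. not split
    return h_y - cond_entropy(x, same, edges)

# The binning only enters through H(Y|X), so maximizing I(X;Y) over binnings
# is the same as minimizing H(Y|X).
print(mutual_information(x, same, equal), mutual_information(x, same, lopsided))
```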

Comment author: PhilGoetz 21 March 2010 08:43:19PM 0 points

What's the similar problem? "1-d, with p(x) = x" doesn't mean much to me. It sounds like you're looking for bins on the region 1 to d, with p(x) = x. I think that if you used the -p log p - q log q entropy, it would work fine.

Comment author: wnoise 22 March 2010 12:29:48AM 0 points

"1-d", meaning one-dimensional. n bins between 0 and 1, samples drawn uniformly in the space X = [0,1], with probability p(x) = x of being considered the same.