how much information (in bits) we gain by finding out the exact value of X given some prior beliefs described by p(x).
Just to clarify, this is the expected/average information, right?
If we observe X = x for some fixed x, we get exactly -log_2 p(x) bits of information, and entropy is the average of this information gain taken over the distribution p(x). For example, suppose X ~ Bern(0.5). Then we have a 50% chance of getting a 0, and thus gaining -log_2(0.5) = 1 bit of information, and a 50% chance of getting a 1, and thus also gaining -log_2(0.5) = 1 bit of information, meaning we will necessarily gain 1 bit of information upon finding the value of X.
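A minimal Python sketch of this calculation (the function names and the biased-coin example are illustrative, not from the original):

```python
import math

def self_information(p: float) -> float:
    """Bits gained from observing an outcome that had probability p."""
    return -math.log2(p)

def entropy(probs: list[float]) -> float:
    """Entropy in bits: the expected self-information under the distribution."""
    return sum(p * self_information(p) for p in probs if p > 0)

# Bern(0.5): each outcome carries 1 bit, so the average is also 1 bit.
print(self_information(0.5))   # 1.0
print(entropy([0.5, 0.5]))     # 1.0

# A biased coin, e.g. Bern(0.9), has lower entropy: on average we gain
# less than 1 bit by observing it, since the outcome is more predictable.
print(entropy([0.9, 0.1]))     # ~0.469
```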