Intuitive explanation of why entropy maximizes in a uniform distribution?

roland

What is the best mathematical, intuitive explanation of why entropy maximizes in a uniform distribution? I'm looking for a short proof using the most elementary mathematics possible.

Please no explanation like "because entropy was designed in this way", etc...

https://en.wikipedia.org/wiki/Entropy_%28information_theory%29#Definition

What is the best mathematical, intuitive explanation of why entropy maximizes in a uniform distribution? I'm looking for a short proof using the most elementary mathematics possible.

Please no explanation like "because entropy was designed in this way", etc...

https://en.wikipedia.org/wiki/Entropy_%28information_theory%29#Definition

Imagine you have two unknown bits, generated uniformly at random. What's the fastest way to learn them by asking yes or no questions?

1) Ask whether the first bit is 0, then do the same for the second bit. This way you always spend two questions.

2) First ask whether the bits are 00. If yes, you win. Otherwise ask the remaining questions in any way you like. This works out to 9/4 questions on average, which is worse than the first method.

Moral of the story: if you want to learn information as fast as possible, you must split the search space in equal parts. That's the same as saying that a uniform distribution has maximum self-information, i.e. maximum entropy.