Portions of this are taken directly from Three Things I've Learned About Bayes' Rule.
One time, someone asked me what my name was. I said, “Mark Xu.” Afterward, they probably believed my name was “Mark Xu.” I’m guessing they would have happily accepted a bet at 20:1 odds that my driver’s license would say “Mark Xu” on it.
The prior odds that someone’s name is “Mark Xu” are generously 1:1,000,000. Posterior odds of 20:1 imply that the odds ratio of me saying “Mark Xu” is 20,000,000:1, or roughly 24 bits of evidence. That’s a lot of evidence.
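As a sanity check on that arithmetic, here is a minimal Python sketch (mine, not from the original post) that converts prior and posterior odds into the implied Bayes factor and bits of evidence:

```python
import math

def bits_of_evidence(prior_odds, posterior_odds):
    """Implied Bayes factor and bits of evidence, given prior and posterior odds (for:against)."""
    bayes_factor = posterior_odds / prior_odds
    return bayes_factor, math.log2(bayes_factor)

# Prior odds that a random person is named "Mark Xu": 1:1,000,000.
# Posterior odds after hearing "Mark Xu": 20:1.
factor, bits = bits_of_evidence(prior_odds=1 / 1_000_000, posterior_odds=20 / 1)
print(f"Bayes factor ~ {factor:,.0f}:1, ~{bits:.1f} bits")  # ~20,000,000:1, ~24.3 bits
```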
Seeing a Wikipedia page say “X is the capital of Y” is tremendous evidence that X is the capital of Y. Someone telling you “I can juggle” is massive evidence that they can juggle. Putting an expression into Mathematica and getting Z is enormous evidence that the expression evaluates to Z. Vast odds ratios lurk behind many encounters.
One implication of the Efficient Market Hypothesis (EMH) is that it is difficult to make money on the stock market. Generously, maybe only the top 1% of traders will be profitable. How difficult is it to get into the top 1% of traders? To be 50% sure you're in the top 1%, you only need 200:1 evidence. This seemingly large odds ratio might be easy to get.
On average, people are overconfident, but 12% aren't. It only takes 50:1 evidence to conclude you are much less overconfident than average. An hour or so of calibration training and the resulting calibration plots might be enough.
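To make the arithmetic in the last two paragraphs concrete, here is a small sketch (again mine, not from the post) that applies a likelihood ratio to prior odds and reports the posterior probability:

```python
def update(prior_odds, likelihood_ratio):
    """Posterior probability after multiplying prior odds by a likelihood ratio."""
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Top 1% of traders: prior odds 1:99, plus 200:1 evidence.
print(f"{update(1 / 99, 200):.0%}")   # ~67%, comfortably past 50%

# Being in the well-calibrated 12%: prior odds 12:88, plus 50:1 evidence.
print(f"{update(12 / 88, 50):.0%}")   # ~87%
```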
Running through Bayes’ Rule explicitly might produce a bias towards middling values. Extraordinary claims require extraordinary evidence, but extraordinary evidence might be more common than you think.
Isn't that the information density for sentences? With all the conjunctions, and with the limited number of different words that can appear in different places in a sentence, it's not that surprising we only get about 1.1 bits per letter. But names should be more information dense: maybe not the full 4.7 bits per letter of uniformly random letters (because some names just don't make sense), but at least 2 bits per letter, maybe even 3?
I don't know where to find (or how to handle) a big list of full names, so I'm settling for the (probably partial) lists of first names from https://www.galbithink.org/names/us200.htm (picked because the plaintext format is easy to process). I wrote a small script: https://gist.github.com/idanarye/fb75e5f813ddbff7d664204607c20321
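I won't reproduce the gist here, but the sketch below is a hypothetical reconstruction of the idea, assuming a plaintext file of whitespace-separated name/count pairs like the galbithink.org lists: it computes the Shannon entropy of the name distribution and divides by the average name length.

```python
import math
import sys

def bits_per_letter(path):
    """Shannon entropy of the name distribution divided by the average name length."""
    weights = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue
            try:
                weight = float(parts[1])
            except ValueError:
                continue  # skip headers and other non-data lines
            weights[parts[0]] = weights.get(parts[0], 0.0) + weight

    total = sum(weights.values())
    entropy = -sum((w / total) * math.log2(w / total) for w in weights.values())
    avg_len = sum(len(name) * w for name, w in weights.items()) / total
    return entropy / avg_len

if __name__ == "__main__":
    print(f"{bits_per_letter(sys.argv[1]):.2f} bits per letter")
```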
When I run it on the lists of female and male names from the 1990s, the information density comes out to about 1.3 bits per letter. Higher than 1.1, but not nearly as high as I expected. But the rarest names in these lists are about 1:14k, not 1:1M like the OP's estimate. Then again, I'm only looking at given names, and surnames tend to be more diverse. But that would also give them higher entropy, so instead of figuring out how to scale everything, let's just go with the given names, which I have numbers for (for simplicity, assume the lists I found are complete).
So the rare names are about half as long as the number of letters it would take to represent them at that density, and the frequent names are anywhere between that count and twice it. I guess that is to be expected; names are not optimized to be an ideal encoding, after all. But my point is that the amount of evidence needed here is not orders of magnitude bigger than the amount of information you gain from hearing the name.
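To make that concrete, a small worked sketch with my own illustrative numbers (the comment only gives the 1:14k figure; the ~1:30 frequency for a common name and the name lengths are assumptions):

```python
import math

def equivalent_letters(frequency, bits_per_letter=1.3):
    """Letters needed, at the measured density, to carry a name's surprisal."""
    return math.log2(1 / frequency) / bits_per_letter

print(f"{equivalent_letters(1 / 14_000):.1f}")  # ~10.6 letters of information, vs. a ~6-letter rare name
print(f"{equivalent_letters(1 / 30):.1f}")      # ~3.8 letters of information, vs. a ~7-letter common name
```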
Actually, given what entropy represents, on average the amount of evidence needed is exactly the amount of information contained in the name.