Vladimir_Nesov comments on What's In A Name? - Less Wrong
That is all quite fascinating, in a "fancy that!" fashion, but whenever I see correlational data reported I wonder about the magnitude of the effect, ideally measured in bits of information. The first result they report is that if there were no influence between name and state of residence, the proportion of coincidences would be 0.1664, while the observed level is 0.1986. How large an influence does this represent?
I am not quite sure what the correct calculation to make is -- perhaps someone more versed in these matters can say -- but when I calculate the Kullback-Leibler divergence between two binary distributions, one with p=0.1664 and the other with p=0.1986, I get about 0.005 bits. When I estimate the mutual information between name and state, making various assumptions about the data I'd need for a precise calculation, I get a similar figure.
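For anyone who wants to check the arithmetic, here is a minimal Python sketch of the Kullback-Leibler calculation for two binary distributions. The function name `binary_kl_bits` and the variable names are my own, and I am assuming the divergence is taken in base 2 so that it comes out in bits. The comment does not say which direction the divergence was taken in, so the sketch computes both.

```python
import math


def binary_kl_bits(p: float, q: float) -> float:
    """KL divergence D(p || q), in bits, between Bernoulli(p) and Bernoulli(q).

    Assumes 0 < p < 1 and 0 < q < 1, so no logarithm of zero arises.
    """
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))


# Proportions quoted from the paper.
expected = 0.1664  # coincidence rate if name and state were independent
observed = 0.1986  # coincidence rate actually observed

# The divergence is asymmetric, so compute both directions.
kl_forward = binary_kl_bits(observed, expected)
kl_reverse = binary_kl_bits(expected, observed)

print(f"D(observed || expected) = {kl_forward:.4f} bits")
print(f"D(expected || observed) = {kl_reverse:.4f} bits")
```

Either direction gives roughly 0.005 bits, so the conclusion does not depend on which distribution is treated as the reference.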
In short, if you want to predict someone's name from their state, or vice versa, the result is completely useless. Of course, making such a prediction was not the authors' purpose. But then, what was? What can you do with less than a hundredth of a bit?
How justifiable is it to report the finding in these words (quotes from the paper):
and
I have just found where Andrew Gelman has blogged about this (search his blog for "Pelham"). I don't have time to read what he says at the moment, but his headlines indicate he doesn't rate it.
Blog posts by Andrew Gelman:
Why it's not so weird that so many dentists are named Dennis: a story of conditional probability
How many people choose careers based on their names?
Is there a reason NOT to link to the posts directly, rather than have readers repeat the search?