
Qiaochu_Yuan comments on Open thread, January 25- February 1 - Less Wrong Discussion

8 Post author: NancyLebovitz 25 January 2014 02:52PM




Comment author: Qiaochu_Yuan 27 January 2014 07:33:13PM 4 points

I don't think that's really what means are. That intuition might fit the median better. One reason means are nice is that they have really nice properties, e.g. they're linear: the mean of a sum of random variables is the sum of their means, whether or not the variables are independent. That makes them particularly easy to compute with and/or prove theorems about. Another reason means are nice is related to betting and the interpretation of a mean as an expected value; the theorem justifying this interpretation is the law of large numbers.
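To make the linearity point concrete, here is a minimal sketch using Python's standard `statistics` module. The numbers are made up for illustration, and the median comparison is my own addition to show the contrast: the mean of a sum matches the sum of the means, while the median has no such guarantee.

```python
import math
from statistics import mean, median

# Made-up paired observations of two quantities X and Y (illustrative only).
xs = [1.0, 2.0, 6.0]
ys = [60.0, 10.0, 20.0]

# Observation-by-observation sums X + Y.
sums = [x + y for x, y in zip(xs, ys)]

# Linearity: mean(X + Y) equals mean(X) + mean(Y).
mean_of_sum = mean(sums)
sum_of_means = mean(xs) + mean(ys)
print(math.isclose(mean_of_sum, sum_of_means))

# The median is not additive in general: here the two values differ.
median_of_sum = median(sums)
sum_of_medians = median(xs) + median(ys)
print(median_of_sum, sum_of_medians)
```

Note that the pairing of the `ys` with the `xs` matters for the median but not for the mean, which is part of why means are so much easier to compute with.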

Nevertheless, in many situations the mean of a random variable is a very bad description of it (e.g. mean income is a terrible description of the income distribution, and the median would be much more appropriate).
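A toy illustration of the income example, with entirely hypothetical figures: a single very large income drags the mean far away from what a typical person earns, while the median barely notices it.

```python
from statistics import mean, median

# Hypothetical incomes: four ordinary earners and one extreme outlier.
incomes = [30_000, 35_000, 40_000, 45_000, 10_000_000]

mean_income = mean(incomes)      # pulled up by the single outlier
median_income = median(incomes)  # the middle earner

print(mean_income, median_income)
```

In this made-up sample the mean is larger than four of the five incomes, so it describes almost nobody in the sample.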

Edit: Here's one very undesirable property of means: they're not "covariant under increasing changes of coordinates," whereas medians are. What I mean is the following: suppose you decide to compute the mean population of all cities in the US, but later decide this is a bad idea because there are some really big cities. If you suspect that city populations grow multiplicatively rather than additively (e.g. the presence of good thing X causes a city to be 1.2x bigger than it otherwise would be, as opposed to 200 people bigger), you might decide that instead of looking at population you should look at log population. But the mean of log population is not the log of mean population!

By contrast, because log is an increasing function, the median of log population is still the log of median population. So taking medians is in some sense insensitive to these sorts of decisions, which is nice.
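Here is a small sketch of the city example, using made-up populations and base-10 logs (any increasing function would do). I deliberately use an odd number of cities so that the median is one of the data points; if the median is taken as the average of the two middle values, as with an even count, the equality need not hold exactly.

```python
import math
from statistics import mean, median

# Made-up city populations (odd count, so the median is an actual city).
populations = [1_000, 10_000, 100_000, 1_000_000, 10_000_000]
log_populations = [math.log10(p) for p in populations]

# Means do not commute with the change of coordinates.
mean_of_log = mean(log_populations)
log_of_mean = math.log10(mean(populations))
print(mean_of_log, log_of_mean)

# Medians do: log is increasing, so it preserves which city is in the middle.
median_of_log = median(log_populations)
log_of_median = math.log10(median(populations))
print(median_of_log, log_of_median)
```

The mean of the populations is dominated by the largest city, so its log sits well above the mean of the logs, while both median computations pick out the same middle city.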