gwern comments on Stupid Questions December 2014 - Less Wrong Discussion

16 Post author: Gondolinian 08 December 2014 03:39PM

Comments (341)

Comment author: Grothor 10 December 2014 05:31:19AM 16 points [-]

It seems like we suck at using scales "from one to ten". Video game reviews nearly always give a 7-10 rating. Competitions with scores from judges seem to always give numbers between eight and ten, unless you crash or fall, and get a five or six. If I tell someone my mood is a 5/10, they seem to think I'm having a bad day. That is, we seem to compress things into the last few numbers of the scale. Does anybody know why this happens? Possible explanations that come to mind include:

  • People are scoring with reference to the high end, where "nothing is wrong", and they do not want to label things as more than two or three points worse than perfect

  • People are thinking in terms of grades, where 75% is a C. People think most things are not worse than a C grade (or maybe this is just another example of the pattern I'm seeing)

  • I'm succumbing to confirmation bias and this isn't a real pattern

Comment author: wadavis 10 December 2014 10:18:02PM 3 points [-]

I tried to change out the 10 rating for a z-score rating in my own conversations. It failed due to my social circles not being familiar with the normal bell curve.

Comment author: gwern 11 December 2014 12:00:11AM 4 points [-]

If you wanted to maximize the informational content of your ratings, wouldn't you try to mimic a uniform distribution?
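
A minimal sketch of the information argument, assuming "informational content" is read as Shannon entropy of the rating distribution. The two distributions below are made-up illustrations, not data from any real set of reviews: one spreads ratings evenly over 1-10, the other compresses them toward the top of the scale as described earlier in the thread.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical probabilities for the ratings 1..10 (illustrative numbers only).
uniform = [0.1] * 10
compressed = [0, 0, 0, 0, 0.05, 0.05, 0.2, 0.3, 0.3, 0.1]

print("uniform:   ", round(entropy_bits(uniform), 2), "bits per rating")
print("compressed:", round(entropy_bits(compressed), 2), "bits per rating")
```

Under this reading, the uniform spread reaches the maximum possible for ten options, log2(10) bits, and any distribution that bunches ratings into a few values carries less.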

Comment author: wadavis 12 December 2014 03:54:32PM 1 point [-]

The intent was to communicate one piece of information without confusion: where on the measurement spectrum the item fits relative to others in its group. As opposed to delivering as much information as possible, for which there are more nuanced systems.

Most things I am rating do not have a uniform distribution; I tried to follow a normal distribution because it would fit the great majority of cases. We lose information and make assumptions when we measure data on the wrong distribution: did you fit to uniform by volume or by value? It was another source of confusion.

As mentioned, this method did fail. I changed my methods to saying 'better than 90% of the items in its grouping' and had moderate success. While this solves the uniform/normal/Chi-squared distribution problem, it is still too long-winded for my tastes.

Comment author: Lumifer 12 December 2014 04:00:23PM 2 points [-]

"Most things I am rating do not have a uniform distribution"

The distribution of your ratings does not need to follow the distribution of what you are rating. For maximum information your (integer) rating should point to a quantile -- e.g. if you're rating on a 1-10 scale your rating should match the decile into which the thing being rated falls. And if your ratings correspond to quantiles, the ratings themselves are uniformly distributed.
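
A rough sketch of the quantile idea, under stated assumptions: the underlying "quality" scores are simulated from a normal distribution purely for illustration, and `decile_rating` is a hypothetical helper, not anything used in the thread. It shows that even when the things being rated are bell-shaped, decile-based ratings come out close to uniformly distributed.

```python
import random
from bisect import bisect_left
from collections import Counter

def decile_rating(score, sorted_reference):
    """Map a raw score to a 1-10 rating by its decile within sorted_reference."""
    # Number of reference items strictly below this score.
    rank = bisect_left(sorted_reference, score)
    # Integer arithmetic avoids floating-point edge cases at bucket boundaries.
    return min(10, (rank * 10) // len(sorted_reference) + 1)

random.seed(0)
# Hypothetical bell-shaped quality scores (an assumption for the demo).
quality = sorted(random.gauss(0, 1) for _ in range(10_000))

ratings = Counter(decile_rating(q, quality) for q in quality)
for rating in range(1, 11):
    print(rating, ratings[rating])
```

Each rating from 1 to 10 should be used for roughly a tenth of the items, regardless of the shape of the underlying scores.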

Comment author: wadavis 12 December 2014 04:30:35PM 1 point [-]

We have different goals. I want my rating to reflect the item's relative position in its group; you want a rating to reflect the item's value independent of the group.

Is this accurate?

Comment author: Lumifer 12 December 2014 04:56:48PM *  2 points [-]

Doesn't seem so. If you rate by quintiles your rating effectively indicates the rank of the bucket to which the thing-being-rated belongs. This reflects "the item's relative position in its group".

If you want your rating to reflect not a rank but something external, you can set up a variety of systems, but I would expect that for max information your rating would have to point to a quintile of that external measure of the "value independent of the group".

Comment author: wadavis 12 December 2014 06:47:30PM 0 points [-]

Trying to stab at the heart of the issue: I want the distribution of the ratings to follow the distribution of the rated because when looking at the group this provides an additional piece of information.

Comment author: Lumifer 12 December 2014 08:31:51PM 4 points [-]

Well, at this point the issue becomes who's looking at your rating. This "additional piece of information" exists only for people who have a sufficiently large sample of your previous ratings so they understand where the latest rating fits in the overall shape of all your ratings.

Consider this example: I come up to you and ask "So, how was the movie?". You answer "I give it a 6 out of 10". Fine. I have some vague idea of what you mean. Now we wave a magic wand and bifurcate reality.

In branch 1 you then add "The distribution of my ratings follows the distribution of movie quality, savvy?" and let's say I'm sufficiently statistically savvy to understand that. But... does it help me? I don't know the distribution of movie quality. It's probably bell-shaped, maybe, but not quite normal if only because it has to be bounded, I have no idea if it's skewed, etc.

In branch 2 you then add "The rating of 6 means I rate the movie to be in the sixth decile". Ah, that's much better. I now know that out of 10 movies that you've seen five were probably worse and four were probably better. That, to me, is a more useful piece of information.

Comment author: wadavis 15 December 2014 03:35:13PM 0 points [-]

I understand and concede to the better logic. This provides greater insight on why the original attempt to use these ratings failed.

Comment author: ChristianKl 12 December 2014 04:42:22PM 0 points [-]

Quite often the differences among people in the top 10 percent are larger than the differences among the people between 45% and 55%.

IQ scales have more people in the middle than on the edges.

Comment author: Lumifer 12 December 2014 04:59:25PM 2 points [-]

As far as I remember, IQs are normalized ranks, so to answer the question of which 10% is "wider" you need to define by which measure.
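
A small sketch of why the measure matters, assuming the conventional IQ scaling (mean 100, standard deviation 15) and that scores are obtained by mapping percentile ranks through a normal curve. The helper `iq_at` is hypothetical, and the top band is cut off at the 99th percentile because the normal curve has no finite upper end.

```python
from statistics import NormalDist

# Assumed IQ scale: normal with mean 100 and standard deviation 15.
iq_scale = NormalDist(mu=100, sigma=15)

def iq_at(percentile):
    """IQ score at a given percentile rank (0 < percentile < 1)."""
    return iq_scale.inv_cdf(percentile)

# Width in IQ points of two bands that are about equally wide in rank terms.
middle_width = iq_at(0.55) - iq_at(0.45)  # 10 percentile points around the median
top_width = iq_at(0.99) - iq_at(0.90)     # 9 percentile points near the top

print("45%-55% band:", round(middle_width, 1), "IQ points")
print("90%-99% band:", round(top_width, 1), "IQ points")
```

Measured by rank, the two bands cover a similar share of people; measured in IQ points, the band near the top is several times wider than the one in the middle.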