someonewrongonthenet comments on Stupid Questions December 2014 - Less Wrong
It seems like we suck at using scales "from one to ten". Video game reviews nearly always give a 7-10 rating. Competitions with scores from judges seem to always give numbers between eight and ten, unless you crash or fall, and get a five or six. If I tell someone my mood is a 5/10, they seem to think I'm having a bad day. That is, we seem to compress things into the last few numbers of the scale. Does anybody know why this happens? Possible explanations that come to mind include:
People are scoring with reference to the high end, where "nothing is wrong", and they do not want to label things as more than two or three points worse than perfect
People are thinking in terms of grades, where 75% is a C. People think most things are not worse than a C grade (or maybe this is just another example of the pattern I'm seeing)
I'm succumbing to confirmation bias and this isn't a real pattern
That's not an explanation, just a symptom of the problem. People of mediocre talent and high talent both get an A - that's part of the reason why we have to use standardized tests with a higher ceiling.
My intuition is that the top few notches are satisficing, whereas all lower ratings are varying degrees of non-satisficing. The degree to which everything tends to cluster at the top represents the degree to which everything is satisfactory for practical purposes. In situations where the majority of the rated things are not satisfactory (like the Putnam - nothing less than a correct proof is truly satisfactory), the ratings will cluster near the bottom.
For example, compare motels to hotels. Motels always have fewer stars, because motels in general are worse. Whereas, say, video games will tend to cluster at the top because video games in general are satisfactorily fun.
Or, think Humanities vs. Engineering grades. Humanities students in general satisfy the requirements to be historians and writers or liberal-arts-educated-white-collar workers more than Engineering students satisfy the requirements to be engineers.
This is what I was trying to convey when I said it might be another example of the problem.
I think it's reasonable, in many contexts, to say that achieving 75% of the highest possible score on an exam should earn you what most people think of as a C grade (that is, good enough to proceed with the next part of your education, but not good enough to be competitive).
I would say that games are different. There is not, as far as I know, a quantitative rubric for scoring a game. A 6/10 rating on a game does not indicate that the game meets 60% of the requirements for a perfect game. It really just means that it's similar in quality to other games that have received the same score, and usually a 6/10 game is pretty lousy. I found a histogram of scores on Metacritic:
http://www.giantbomb.com/profile/dry_carton/blog/metacritic-score-distribution-graphs/82409/
The peak of the distribution seems to be around 80%, while I'd eyeball the median at around 70-75%. There is a long tail of bad games. You may be right that this distribution does, in some sense, reflect the actual distribution of game quality. My complaint is that this scoring system is good at distinguishing bad games from truly awful games from comically terrible games, but bad at distinguishing a good game from a mediocre one.
What I think it should be is a percentile-based score, like the one Lumifer describes.
Then again, maybe it's difficult to discern a difference in quality between a 60th percentile game and an 80th percentile game.
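For what it's worth, the percentile-based score described above is easy to sketch: rank each game against the whole population and report what fraction of games score at or below it. The raw scores below are made up for illustration, not real Metacritic data, but they show how a top-compressed distribution spreads back out over the full 0-100 range:

```python
from bisect import bisect_right

def percentile_scores(raw_scores):
    """Map each raw score to the percentage of scores at or below it."""
    ranked = sorted(raw_scores)
    n = len(ranked)
    # bisect_right counts how many scores are <= s
    return [100 * bisect_right(ranked, s) / n for s in raw_scores]

# Hypothetical raw review scores, clustered near the top of the scale:
raw = [55, 70, 72, 75, 78, 80, 82, 85, 88, 95]
print(percentile_scores(raw))
# → [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
```

Note how the 80 and the 85, only five raw points apart, land twenty percentile points apart - the transformation restores resolution exactly where the raw scale compresses it.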
Oh right, I didn't read carefully. Sorry.