komponisto comments on Goals for which Less Wrong does (and doesn't) help - Less Wrong
It is easy for a math-literate person to overestimate how obvious certain jargon is to people. Take 'sum of squared differences', for example. Squared differences are just what is involved when you are calculating things like standard deviation. It's what you use when looking at, say, a group of people and deciding whether they all have about the same height or whether some are really tall and others are really short: how different they are.
For those who have never had to manually calculate the standard deviation and similar statistics, the term would just be meaningless. (Which makes your example a good demonstration of your point!)
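For anyone who has never done the calculation by hand, here is a minimal sketch of it in Python. The function names and the heights are made up for illustration, and it uses the population form of the standard deviation (dividing by n rather than n - 1).

```python
import math

def sum_of_squared_differences(values):
    """Add up the squared difference between each value and the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def population_std_dev(values):
    """Square root of the average squared difference from the mean."""
    return math.sqrt(sum_of_squared_differences(values) / len(values))

# Made-up heights in centimetres, purely illustrative.
similar_heights = [170, 171, 169, 170]  # everyone about the same height
mixed_heights = [150, 190, 150, 190]    # some short, some tall

print(population_std_dev(similar_heights))
print(population_std_dev(mixed_heights))
```

Both groups have the same mean height, but the second group's standard deviation is far larger, which is the "how different they are" that the number is meant to capture.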
Never mind that; just parse the damn phrase! All you need to know is what a "difference" is, and what "to square" means.
Why, I wonder, do people assume that words lose their individual meanings when combined, so that something like "squared differences" registers as "[unknown vocabulary item]" rather than "differences that have been squared"?
Because quite often sophisticated people will punish you socially if you don't take special care to pay homage to whatever extra meaning the combined phrase has taken on. Caution in such cases is a practical social move.
Good observation; I had been subliminally aware of it but nobody had ever pointed it out to me explicitly.
It's also very helpful to know things like why someone might go around squaring differences and then summing them, and what kinds of situations that makes sense in. That way you can tell when you make errors of interpretation. For example, "differences pertaining to the squared" is a plausible but less likely interpretation of "squared differences", but knowing that people commonly square differences and then sum them in order to calculate a squared L₂ norm, often because they are going to take the derivative of the result so as to solve for a local minimum, makes that a much less plausible interpretation.
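To make the "take the derivative and solve for a minimum" motivation concrete, here is a small Python sketch under simple assumptions: a single candidate value c is compared against a made-up list of numbers. The names ssd and ssd_derivative are hypothetical. Setting the derivative of the sum of (x - c)² with respect to c to zero gives c equal to the mean.

```python
def ssd(values, c):
    """Sum of squared differences between each value and a candidate c."""
    return sum((v - c) ** 2 for v in values)

def ssd_derivative(values, c):
    """Derivative of ssd with respect to c: -2 * sum(v - c)."""
    return -2 * sum(v - c for v in values)

# Made-up data, purely illustrative.
values = [2.0, 4.0, 9.0]
mean = sum(values) / len(values)

# The derivative vanishes at the mean, and nudging c either way
# makes the sum of squared differences larger.
print(ssd_derivative(values, mean))
print(ssd(values, mean - 0.1), ssd(values, mean), ssd(values, mean + 0.1))
```

Squaring (rather than, say, taking absolute values) is what makes the expression differentiable everywhere, which is one reason the construction shows up so often.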
And for a Bayesian to be rational in the colloquial sense, they must always remember to assign some substantial probability weight to "other". For example, you can't simply assume that words like "sum" and "differences" are being used with one of the meanings you're familiar with; you must remember that there's always the possibility that you're encountering a new sense of the word.