Priors as Mathematical Objects
Followup to: "Inductive Bias"
What exactly is a "prior", as a mathematical object? Suppose you're looking at an urn filled with red and white balls. When you draw the very first ball, you haven't yet had a chance to gather much evidence, so you start out with a rather vague and fuzzy expectation of what might happen - you might say "fifty/fifty, even odds" for the chance of getting a red or white ball. But you're ready to revise that estimate for future balls as soon as you've drawn a few samples. So then this initial probability estimate, 0.5, is not repeat not a "prior".
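The way that 0.5 estimate gets revised can be made concrete with Laplace's rule of succession, which corresponds to one particular choice of prior information (a uniform prior over the urn's red-ball fraction - an illustrative assumption, not the only possible prior):

```python
from fractions import Fraction

def laplace_estimate(reds, total):
    """Laplace's rule of succession: P(next ball is red) = (reds + 1) / (total + 2).

    This follows from a uniform prior over the urn's red-ball fraction;
    a different prior would give a different update rule.
    """
    return Fraction(reds + 1, total + 2)

# Before any draws: even odds.
print(laplace_estimate(0, 0))   # 1/2
# After drawing 4 reds out of 5 balls, the estimate shifts.
print(laplace_estimate(4, 5))   # 5/7
```

Note that the starting estimate of 0.5 and the rule for revising it are both consequences of the prior; the number 0.5 alone does not determine how you update.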
An introduction to Bayes's Rule for confused students might refer to the population frequency of breast cancer as the "prior probability of breast cancer", and the revised probability after a mammography as the "posterior probability". But in the scriptures of Deep Bayesianism, such as Probability Theory: The Logic of Science, one finds a quite different concept - that of prior information, which includes e.g. our beliefs about the sensitivity and specificity of mammography exams. Our belief about the population frequency of breast cancer is only one small element of our prior information.
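The mammography example shows how all three pieces of prior information - population frequency, sensitivity, and specificity - enter the calculation. A minimal sketch (the numbers below are illustrative placeholders, not real clinical figures):

```python
def posterior(prior, sensitivity, specificity):
    """Bayes's Rule: P(cancer | positive mammography).

    prior        -- population frequency of breast cancer
    sensitivity  -- P(positive test | cancer)
    specificity  -- P(negative test | no cancer)
    """
    p_pos_given_cancer = sensitivity
    p_pos_given_healthy = 1 - specificity
    numerator = prior * p_pos_given_cancer
    return numerator / (numerator + (1 - prior) * p_pos_given_healthy)

# Illustrative numbers only: 1% population frequency,
# 80% sensitivity, 90.4% specificity.
print(round(posterior(0.01, 0.80, 0.904), 3))  # 0.078
```

Change any one of the three inputs and the posterior changes - which is why the population frequency is only one small element of the prior information.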
"Inductive Bias"
(Part two in a series on "statistical bias", "inductive bias", and "cognitive bias".)
Suppose that you see a swan for the first time, and it is white. It does not follow logically that the next swan you see must be white, but white seems like a better guess than any other color. A machine learning algorithm of the more rigid sort, if it sees a single white swan, may thereafter predict that any swan seen will be white. But this, of course, does not follow logically - though AIs of this sort are often misnamed "logical". For a purely logical reasoner to label the next swan white as a deductive conclusion, it would need an additional assumption: "All swans are the same color." This is a wonderful assumption to make if all swans are, in reality, the same color; otherwise, not so good. Tom Mitchell's Machine Learning defines the inductive bias of a machine learning algorithm as the assumptions that must be added to the observed data to transform the algorithm's outputs into logical deductions.
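The role of the added assumption can be shown in a toy sketch (a hypothetical learner for illustration, not Mitchell's formalism):

```python
def predict_next_swan(observed_colors, assume_uniform_color=True):
    """A rigid learner. With the added assumption 'all swans are the
    same color', a single observation licenses a deductive prediction.
    Without that assumption, nothing follows logically (returns None).
    """
    if assume_uniform_color and observed_colors:
        return observed_colors[0]
    return None

print(predict_next_swan(["white"]))                               # 'white'
print(predict_next_swan(["white"], assume_uniform_color=False))   # None
```

The learner's output is only a "deduction" relative to the assumption baked into it - which is exactly what makes that assumption its inductive bias.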
A more general view of inductive bias would identify it with a Bayesian's prior over sequences of observations...
Useful Statistical Biases
Friday's post on statistical bias and the bias-variance decomposition discussed how the expected squared error of an estimator equals the squared directional error (bias) of the estimator plus the variance of the estimator. All else being equal, bias is bad - you want to get rid of it. But all else is not always equal. Sometimes, by accepting a small amount of bias in your estimator, you can eliminate a large amount of variance. This is known as the "bias-variance tradeoff".
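A quick Monte Carlo sketch of the tradeoff (the shrinkage factor and distribution parameters are illustrative assumptions): shrinking the sample mean toward zero introduces bias, but with a small sample the variance reduction more than pays for it.

```python
import random

random.seed(0)

def simulate_mse(estimator, true_mean=1.0, n=5, trials=20000):
    """Monte Carlo estimate of the mean squared error of an estimator."""
    sq_err = 0.0
    for _ in range(trials):
        sample = [random.gauss(true_mean, 2.0) for _ in range(n)]
        sq_err += (estimator(sample) - true_mean) ** 2
    return sq_err / trials

sample_mean = lambda xs: sum(xs) / len(xs)
# Biased: shrink toward zero. Bias^2 = 0.09, but variance drops by half.
shrunk_mean = lambda xs: 0.7 * (sum(xs) / len(xs))

print(simulate_mse(sample_mean))   # ~0.8 (all variance, no bias)
print(simulate_mse(shrunk_mean))   # ~0.48 (some bias, much less variance)
```

Here the unbiased estimator's MSE is pure variance (4/5 = 0.8), while the shrunken estimator trades a squared bias of 0.09 for a variance of about 0.39.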
"Statistical Bias"
(Part one in a series on "statistical bias", "inductive bias", and "cognitive bias".)
"Bias" as used in the field of statistics refers to directional error in an estimator. Statistical bias is error you cannot correct by repeating the experiment many times and averaging together the results.
The famous bias-variance decomposition states that the expected squared error is equal to the squared directional error, or bias, plus the squared random error, or variance. The law of large numbers says that you can reduce variance, not bias, by repeating the experiment many times and averaging the results.
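A short simulation makes the asymmetry vivid (the miscalibration and noise figures are illustrative assumptions): averaging many repetitions drives the random error toward zero, but the systematic error survives untouched.

```python
import random

random.seed(0)

true_value = 10.0
bias = 0.5   # a systematically miscalibrated instrument (illustrative)

def one_measurement():
    """Each measurement has directional error (bias) and random error."""
    return random.gauss(true_value + bias, 3.0)

# Averaging 100,000 repetitions shrinks the variance dramatically...
avg = sum(one_measurement() for _ in range(100_000)) / 100_000
print(round(avg, 2))  # close to 10.5, not 10.0: the bias remains
```

No amount of repetition will move that average toward 10.0; only fixing the instrument (the estimator) can remove the bias.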
You Are Not Hiring the Top 1%
Today's statistical fallacy (slightly redacted by editor) comes from Joel on Software:
Everyone thinks they're hiring the top 1%. Martin Fowler said, "We are still working hard to hire only the very top fraction of software developers (the target is around the top 0.5 to 1%)." I hear this from almost every software company. "We hire the top 1% or less," they all say. Could they all be hiring the top 1%? Where are all the other 99%? General Motors?
When you get 200 resumes, and hire the best person, does that mean you're hiring the top 0.5%? Think about what happens to the other 199 that you didn't hire. They go look for another job.
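The selection effect can be sketched in a toy simulation (every number here is an illustrative assumption, not data about real hiring): strong candidates get hired early and stop sending resumes, so the cumulative resume stream skews weaker than the candidates actually hired.

```python
import random

random.seed(0)

# Toy model: 200 programmers with skill uniform on (0, 1). Each round,
# everyone still unemployed sends a resume; each company screens a batch
# of 20 and hires the strongest applicant, who then stops applying.
unemployed = [random.random() for _ in range(200)]
resume_stream, hired = [], []
for _ in range(10):
    resume_stream.extend(unemployed)          # everyone still looking applies
    random.shuffle(unemployed)
    batches = [unemployed[i:i + 20] for i in range(0, len(unemployed), 20)]
    winners = {max(batch) for batch in batches if batch}
    hired.extend(winners)
    unemployed = [p for p in unemployed if p not in winners]

avg_resume = sum(resume_stream) / len(resume_stream)
avg_hired = sum(hired) / len(hired)
print(round(avg_resume, 2), round(avg_hired, 2))
```

The average skill in the resume stream ends up below the population average, while the hired are well above it - hiring the best of 200 resumes is not the same as hiring the top 0.5% of programmers.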