Oscar_Cunningham comments on A Fervent Defense of Frequentist Statistics - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (125)
Suppose there are two experts, and two horses. Expert 1 always predicts horse A, expert 2 always predicts horse B, the truth is that the winning horse cycles ABABABABABA... The frequentist randomizes choice of expert according to weights; the Bayesian always chooses the expert who currently has more successes, and flips a coin when the experts are tied. (Disclaimer: I am not saying that this is the only possible strategies consistent with these philosophies, I am just saying that that these seem like the simplest possible instantiations of "when I actually choose which person to follow on a given round, I randomize according to my weights, whereas a Bayesian would always want to deterministically pick the person with the highest expected value.")
If the frequentist starts with weights (1,1), then we will have weights (1/2^k, 1/2^k) half the time and (1/2^k, 1/2^{k+1}) half the time. In the former case, we will guess A and B with equal probability and have a success rate of 50%; in the latter case, we will guess A (the most recent winner) with probability 2/3 and B with probability 1/3, for a success rate of 33%. On average, we achieve 5/12 = 41.7% success. Note that 0.583/0.5 = 1.166... < 1.39, as predicted.
On half the other horses, expert 1 has one more correct guess than expert 2, so the Bayesian will lose everyone of these bets. In addition, the Bayesian is guessing at random on the other horses, so his or her total success rate is 25%. Note that the experts are getting 50% success rates. Note that 0.75/0.5 = 1.5 < 2.41, as we expect.
As is usually the case, reducing the penalty from 1/2 to (for example) 0.9, would yield to slower convergence but better ultimate behavior. In the limit where the penalty goes to 1, the frequentist is achieving the same 50% that the "experts" are, while the Bayesian is still stuck at 25%.
Now, of course, one would hope that no algorithm would be so Sphexish as to ignore pure alternation like this. But the point of the randomized weighted majority guarantee is that it holds (up to the correctness of the random number generator) regardless of how much more complicated reality may be than the experts' models.
A monkey who picked randomly between the experts would do better than both the "frequentist" and the "bayesian". Maybe that should worry us...
Well, I was trying to make the simplest possible example. We can of course add the monkey to our pool of experts. But part of the problem of machine learning is figuring out how long we need to watch an expert fail before we go to the monkey.