I recently started watching an interesting lecture by Michael Jordan on Bayesians and frequentists; he's a successful machine learning researcher who works in both traditions. You can watch it here: http://videolectures.net/mlss09uk_jordan_bfway/. I found it interesting because his portrayal of frequentism is quite different from the standard portrayal on lesswrong. The disagreement isn't about whether probabilities are frequencies or degrees of belief; it's about trying to build a good model of a particular situation versus trying to get rigorous guarantees of performance across a class of scenarios. So I wonder why the meme on lesswrong is that frequentists think probabilities are frequencies; in practice the distinction seems to be more about how you approach a given problem. In fact, frequentists seem more "rational" in one sense: they're willing to use any tool that solves the problem rather than constraining themselves to methods that obey Bayes' rule.
In practice, it seems that while Bayes is the main tool for epistemic rationality, instrumental rationality should often be frequentist at the top level (with epistemic rationality, guided by Bayes, in turn guiding the specific application of a frequentist algorithm).
For instance, in many cases, once I have a sufficiently constrained search space, I should be willing to try different things until one of them works, without worrying about understanding why the specific thing I did worked (think shooting a basketball, or riffle shuffling a deck of cards). In practice, it seems like epistemic rationality is important for constraining a search space, and after that some sort of online learning algorithm can be applied to find the optimal action within that space. Of course, this isn't true when you only get one chance to do something, or when extreme precision is required, but neither condition holds often in everyday life.
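To make the "try things until one works" idea concrete, here is a minimal sketch of one standard online learning algorithm, epsilon-greedy bandit selection, over a hypothetical pre-constrained action set. The action names and reward function are illustrative assumptions, not anything from the lecture:

```python
import random

def epsilon_greedy(actions, reward_fn, trials=1000, epsilon=0.1, seed=0):
    """Online learning over a pre-constrained set of actions:
    mostly exploit the best action so far, occasionally explore."""
    rng = random.Random(seed)
    totals = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for _ in range(trials):
        if rng.random() < epsilon:
            a = rng.choice(actions)  # explore a random action
        else:
            # exploit: pick the highest empirical mean (untried actions count as infinite)
            a = max(actions, key=lambda act: totals[act] / counts[act]
                    if counts[act] else float("inf"))
        totals[a] += reward_fn(a, rng)
        counts[a] += 1
    return max(actions, key=lambda act: totals[act] / max(counts[act], 1))
```

The point of the sketch is the division of labor: the modeling work (Bayesian or otherwise) goes into choosing `actions`, while the loop itself needs no model of *why* the winning action works.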
The main point of this thread is to raise awareness of the actual distinction between Bayesians and frequentists, and of why it can be reasonable to be both, since lesswrong is strongly Bayesian and there isn't even a good discussion of the fact that other methods exist.
I think the fundamental insight of Bayesianism is that Bayes' Theorem is the law of inference, not (just) a normative law but a descriptive law — that frequentist methods and other statistical algorithms that make no mention of Bayes aren't cleverly circumventing it, they're implicitly using it. Any time you use some data to generate a belief about some proposition, if you use a method whose output is systematically correlated with reality at all, then you are using Bayes, just with certain assumptions and simplifications mixed in.
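For reference, the law being appealed to here is just Bayes' theorem applied over a hypothesis space: posterior ∝ prior × likelihood. A minimal discrete implementation (the coin-bias hypotheses in the test are my own illustrative example, not anything from the comment above):

```python
def posterior(priors, likelihood, data):
    """Bayes' theorem over a discrete hypothesis space.
    priors: {hypothesis: P(h)}; likelihood(h, x) gives P(x | h)."""
    unnorm = {}
    for h, p in priors.items():
        weight = p
        for x in data:
            weight *= likelihood(h, x)  # accumulate P(data | h) * P(h)
        unnorm[h] = weight
    z = sum(unnorm.values())            # normalizing constant P(data)
    return {h: w / z for h, w in unnorm.items()}
```

On this view, any procedure that maps data to beliefs and correlates with reality is doing some (possibly degenerate) version of this computation, with the prior and likelihood left implicit.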
The failing of frequentism is not in the specific methods it uses — it is perfectly true that we need simplified methods in order to do much useful inference — but in its claim of "objectivity," which really consists of treating its assumptions and simplifications as though they don't exist, and in its reliance on experimenters' intuition in deciding which methods to use (considering that different methods make different assumptions that lead to different results). Frequentist methods aren't (all) bad; frequentist epistemology is.
If I remember correctly, it is perfectly possible to create Bayesian formulations of most frequentist methods; of course, they will often still talk about things that Bayesians don't usually care about, like P-values, but they will nevertheless reveal the deductively-valid Bayes-structure of the path from your data to that result. Revealing frequentist methods' hidden structure is important because it lets us understand why they work — when they do work — and it lets us predict when they won't be as useful.
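One standard example of this hidden structure (my illustration, not the commenter's): the frequentist sample mean is the Bayesian posterior mean for a Normal model in the limit of a flat prior. A sketch, assuming a conjugate Normal prior with precision `tau0` and known noise precision `tau`:

```python
def posterior_mean(data, mu0=0.0, tau0=0.0, tau=1.0):
    """Conjugate Normal update for an unknown mean with known noise precision.
    Setting tau0 = 0 (a flat prior) recovers the frequentist sample mean."""
    n = len(data)
    xbar = sum(data) / n
    # precision-weighted average of the prior mean and the sample mean
    return (tau0 * mu0 + n * tau * xbar) / (tau0 + n * tau)
```

Seen this way, the "objective" frequentist estimate is just a Bayesian estimate with a particular (extreme) prior baked in, which is exactly the kind of prediction about when the method will and won't work that the comment above is pointing at.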
From what I understand, in order to apply Bayesian approaches in practical situations it is necessary to make assumptions which have no formal justification, such as the prior distribution or the local similarity of analogue measures (so that similar but not exact predictions can be informative). This changes the problem without necessarily solving it. In addition, it doesn't address AI problems not based on repeated experience, e.g. automated theorem proving. The advantage of statistical approaches such as SVMs is that they produce practi...