Interesting talk on Bayesians and frequentists

jsteinhardt

I recently started watching an interesting lecture by Michael Jordan on Bayesians and frequentists; he's a pretty successful machine learning expert who takes both views in his work. You can watch it here: http://videolectures.net/mlss09uk_jordan_bfway/. I found it interesting because his portrayal of frequentism is quite different from the standard portrayal on lesswrong. It isn't about whether probabilities are frequencies or beliefs; it's about trying to get a good model versus trying to get rigorous guarantees of performance in a class of scenarios. So I wonder why the meme on lesswrong is that frequentists think probabilities are frequencies; in practice it seems to be more about how you approach a given problem. In fact, frequentists seem more "rational", as they're willing to use any tool that solves a problem instead of constraining themselves to methods that obey Bayes' rule.

In practice, it seems that while Bayes is the main tool for epistemic rationality, instrumental rationality should oftentimes be frequentist at the top level (with epistemic rationality, guided by Bayes, in turn guiding the specific application of a frequentist algorithm).

For instance, in many cases I should be willing to, once I have a sufficiently constrained search space, try different things until one of them works, without worrying about understanding why the specific thing I did worked (think shooting a basketball, or riffle shuffling a deck of cards). In practice, it seems like epistemic rationality is important for constraining a search space, and after that some sort of online learning algorithm can be applied to find the optimal action from within that search space. Of course, this doesn't apply when you only get one chance to do something, or when extreme precision is required, but such cases are uncommon in everyday life.
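To make the "try things until one works" idea concrete, here is a minimal sketch of one possible online learning algorithm, epsilon-greedy, run over a small, already-constrained set of candidate actions. This is my own illustration rather than anything from the lecture; the function name, the success probabilities, and the parameter values are all made up for the example.

```python
import random

def epsilon_greedy(success_probs, rounds=5000, epsilon=0.1, seed=0):
    """Repeatedly pick among a fixed set of actions, learning only from outcomes.

    success_probs is a hypothetical list of hidden success rates, one per
    candidate action. The learner never sees these values directly.
    """
    rng = random.Random(seed)
    n = len(success_probs)
    counts = [0] * n    # how often each action has been tried
    means = [0.0] * n   # running average reward of each action
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n)
        else:
            # Exploit: use the action that has worked best so far.
            action = max(range(n), key=lambda i: means[i])
        reward = 1.0 if rng.random() < success_probs[action] else 0.0
        counts[action] += 1
        # Incremental update of the running average.
        means[action] += (reward - means[action]) / counts[action]
    return counts, means

counts, means = epsilon_greedy([0.2, 0.5, 0.8])
print(counts, means)
```

Note that the learner never builds a model of why an action works; it only tracks how often each one has paid off, which is the sense in which the top level here is not Bayesian.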

The main point of this thread is to raise awareness of the actual distinction between Bayesians and frequentists, and why it's actually reasonable to be both, since it seems like lesswrong is strongly Bayesian and there isn't even a good discussion of the fact that there are other methods out there.

Model selection is definitely one of the biggest conceptual problems in GAI right now (I would say that planning once you have a model is of comparable importance / difficulty). I think the way to solve this sort of problem is by having humans carefully pick a really good model (flexible enough to capture even unexpected situations while still structured enough to make useful predictions). Even with SVMs you are implicitly assuming some sort of structure on the data, because you usually transform your inputs into some higher-dimensional space consisting of what you see as useful features in the data.

Even though picking the model is the hard part, using Bayes by default seems like a good idea because it is the only general method I know of for combining all of my assumptions without having to make additional arbitrary choices about how everything should fit together. If there are other methods, I would be interested in learning about them.

What would the "really good model" for a GAI look like? Ideally it should capture our intuitive notions of what sorts of things go on in the world without imposing constraints that we don't want. Examples of these intuitions: superficially similar objects tend to come from the same generative process (so if A and B are similar in ways X and Y, and C is similar to both A and B in way X, then we would expect C to be similar to A and B in way Y, as well); temporal locality and spatial locality underlie many types of causality (so if we are trying to infer an input-output relationship, it should be highly correlated over inputs that are close in space/time); and as a more concrete example, linear momentum tends to persist over short time scales. A lot of work has been done in the past decade on formalizing such intuitions, leading to nonparametric models such as Dirichlet processes and Gaussian processes. See for instance David Blei's class on Bayesian nonparametrics (http://www.cs.princeton.edu/courses/archive/fall07/cos597C/index.html) or Michael Jordan's tutorial on Dirichlet processes (http://www.cs.berkeley.edu/~jordan/papers/pearl-festschrift.pdf).
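As a rough illustration of the nonparametric flavor, here is a small sketch of the Chinese restaurant process, the standard sequential description of the clustering that a Dirichlet process induces. Each new item joins an existing cluster with probability proportional to that cluster's size, or starts a new cluster with probability proportional to a concentration parameter alpha, so the number of clusters is not fixed in advance. The function name and default values are my own choices for the example.

```python
import random

def chinese_restaurant_process(n_customers, alpha=1.0, seed=0):
    """Sample a random partition of n_customers items into clusters ("tables")."""
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of items in cluster k
    assignments = []   # assignments[i] = cluster index of item i
    for _ in range(n_customers):
        # Existing tables are weighted by their size; a new table by alpha.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)      # open a new cluster
        else:
            table_sizes[table] += 1    # join an existing cluster
        assignments.append(table)
    return assignments, table_sizes

assignments, table_sizes = chinese_restaurant_process(20, alpha=1.0)
print(assignments)
print(table_sizes)
```

Larger values of alpha tend to produce more clusters; the "rich get richer" weighting is one way of encoding the intuition that new objects usually resemble ones already seen.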

I'm beginning to think that a top-level post on how Bayes is actually used in machine learning would be helpful. Perhaps I will make one when I have a bit more time. Also, does anyone happen to know how to collapse URLs in posts (e.g. the equivalent of test in HTML)?

Is model selection really a big problem? I thought that there was a conceptually simple way to incorporate this into a model (just add a model index parameter), though it might be computationally tricky sometimes. As JohnDavidBustard points out below, the real difficulty seems to be model creation. Though I suppose you can frame this as model selection if you have some prior over a broad enough category of models (say, all Turing machines).
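For what it's worth, the "model index parameter" idea can be sketched on a toy problem. Below, the index picks between two hypothetical models of a coin: M0 says the coin is fair, and M1 says the bias is unknown with a uniform prior. The posterior over the index is just Bayes' rule applied to each model's marginal likelihood. The choice of models, the function name, and the default prior are assumptions made purely for illustration.

```python
from math import factorial

def model_posterior(heads, tails, prior_m1=0.5):
    """Posterior probability of each model index given an observed flip sequence.

    M0: fair coin, so P(data | M0) = 0.5 ** n.
    M1: bias p ~ Uniform(0, 1), so P(data | M1) is the integral of
        p**heads * (1 - p)**tails over p, which equals
        heads! * tails! / (n + 1)!.
    """
    n = heads + tails
    evidence_m0 = 0.5 ** n
    evidence_m1 = factorial(heads) * factorial(tails) / factorial(n + 1)
    joint_m0 = (1.0 - prior_m1) * evidence_m0
    joint_m1 = prior_m1 * evidence_m1
    total = joint_m0 + joint_m1
    return joint_m0 / total, joint_m1 / total

print(model_posterior(5, 5))    # balanced data favors the fair-coin model
print(model_posterior(10, 0))   # lopsided data favors the unknown-bias model
```

With only two models the sum over the index is trivial; the computational trouble mentioned above shows up when the set of candidate models is large or each marginal likelihood has no closed form.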

JohnDavidBustard

A high-level post on its use would be very interesting. I think my main criticism of the Bayes approach is that it leads to the kind of work you are suggesting, i.e., have a person construct a model and then have a machine calculate its parameters. I think that much of what we value in intelligent people is their ability to form the model themselves. By focusing on parameter updating we aren't developing the AI techniques necessary for intelligent behavior. In addition, because correct updating does not guarantee good performance (because the model properties dominate), we will always have to judge methods based on experimental results. Because we always come back to experimental results, whatever general AI strategy we develop, its structure is more likely to be one that searches for new ways to learn (with Bayesian model updating and SVMs as examples) and validates these strategies using experimental data (replicating the behaviour of the AI field as a whole). I find it useful to think about how people solve problems and examine the huge gulf between specific learning techniques and these approaches. For example, to replicate a Bayesian AI researcher, an AI needs to take a small amount of data and an incomplete informal model of the process that generates it (e.g. based on informal metaphors of physical processes the author is familiar with), and then find a way of formalising this informal model (so that its behaviour under all conditions can be calculated), possibly doing some theorem proving to investigate properties of the model. They then apply potentially standard techniques to determine the model's parameters and judge its worth based on experiment (potentially repeating the whole process if it doesn't work). By focusing on Bayesian approaches we aren't developing techniques that can replicate these kinds of lateral and creative thinking behaviour. Saying there is only one valid form of inference is absurd because it doesn't address these problems. I fe

CronoDAS

Click the "Help" link that appears to the right of the "comment" and "Cancel" buttons for directions.
