Leslie Valiant discusses the the "probably approximately correct," or PAC, model of machine learning.

InquilineKea

7 Leslie Valiant discusses the the "probably approximately correct," or PAC, model of machine learning.

by InquilineKea

25th Jun 2011

2 min read

7

http://cacm.acm.org/magazines/2011/6/108655-qa-a-lifelong-learner/fulltext

Wow, this is quite interesting. What are your thoughts?

You made a number of important contributions to parallel computing in the 1980s. Can you tell me about your more recent work in that arena?

The root problem with parallel machines has not been that any one is inherently bad, but more that it is difficult to make sense of them as a totality. The most striking fact about sequential machines is that they are all the same—they emulate the von Neumann abstraction. This sameness has been vital to the growth of computing since it has enabled us to write portable software for this abstraction and make machines to emulate it. Thus the von Neumann model has served as the vital "bridging model" between the software and hardware.

I have been long interested in the question of whether a similarly effective bridging model exists for parallel machines and what that might be. Parallelism has now arrived with multiple cores in our laptops. My view is that the irresistible driving forces that have brought this about have been the imperatives of physics. It follows, I think, that we have to look to the driving physical limitations to see what is most fundamental about parallel computing. Computation, communication, synchronization, and memory all need various resources, which in turn are limited by the physical arrangements of the devices. The sought-after bridging model has to account for these costs. We have been spoiled over the last 60-odd years by having an abstraction as simple as the von Neumann model suffice. Life will become a little more complicated now. The absence of an accepted bridging model is evident even at the theoretical level of algorithms.

My most recent work addresses these issues via a particular proposed bridging model, called the Multi-BSP, which tries to capture the physical limitations of machines as simply as possible. It uses barrier synchronization in a hierarchical manner and idealizes a machine as a set of numerical parameters that specify a point in the space of possible machines. Mention of the many architectural details of current machines that are not inevitably suggested by physics is deliberately omitted.

Let's talk about your most recent work in computational neuroscience. Can you explain the "neuroidal model" you developed in your book Circuits of the Mind?

The neuroidal model is designed to describe how basic computational tasks are carried out in the brain at the level of neurons. We do not know what changes the brain undergoes when it memorizes a new word or makes a new association. We need some language for describing the alternative algorithms that a network of neurons may be implementing. Memorizing a new word is easy enough for a computer, but it is still mysterious how the brain does it. Devising algorithms that do not obviously contradict biology is not easy, because biology appears very constrained. Each neuron is connected to only a small fraction of the others, and its influence on each of those is believed to be weak on the average.

You also developed a formal system called robust logics that seeks to reconcile what you've described as "the apparent logical nature of reasoning and the statistical nature of learning."

Robust logic is aimed at tasks which we do not understand how to do on any computational device, let alone the brain. The tasks in question are those of reasoning using information that is learned from experience and therefore uncertain. Humans manage to make effective decisions even in an uncertain world and even with only partial knowledge. This skill is close to the heart of intelligence, I believe, and understanding it better raises some fundamental scientific questions.

Personal Blog

7

New Comment

Rendering 0/9 comments, sorted by

top scoring

(show more) Click to highlight new comments since: Today at 7:21 PM

Moderation Log

7 Leslie Valiant discusses the the "probably approximately correct," or PAC, model of machine learning.

by InquilineKea

25th Jun 2011

2 min read

7

http://cacm.acm.org/magazines/2011/6/108655-qa-a-lifelong-learner/fulltext

Wow, this is quite interesting. What are your thoughts?

You made a number of important contributions to parallel computing in the 1980s. Can you tell me about your more recent work in that arena?

The root problem with parallel machines has not been that any one is inherently bad, but more that it is difficult to make sense of them as a totality. The most striking fact about sequential machines is that they are all the same—they emulate the von Neumann abstraction. This sameness has been vital to the growth of computing since it has enabled us to write portable software for this abstraction and make machines to emulate it. Thus the von Neumann model has served as the vital "bridging model" between the software and hardware.

I have been long interested in the question of whether a similarly effective bridging model exists for parallel machines and what that might be. Parallelism has now arrived with multiple cores in our laptops. My view is that the irresistible driving forces that have brought this about have been the imperatives of physics. It follows, I think, that we have to look to the driving physical limitations to see what is most fundamental about parallel computing. Computation, communication, synchronization, and memory all need various resources, which in turn are limited by the physical arrangements of the devices. The sought-after bridging model has to account for these costs. We have been spoiled over the last 60-odd years by having an abstraction as simple as the von Neumann model suffice. Life will become a little more complicated now. The absence of an accepted bridging model is evident even at the theoretical level of algorithms.

My most recent work addresses these issues via a particular proposed bridging model, called the Multi-BSP, which tries to capture the physical limitations of machines as simply as possible. It uses barrier synchronization in a hierarchical manner and idealizes a machine as a set of numerical parameters that specify a point in the space of possible machines. Mention of the many architectural details of current machines that are not inevitably suggested by physics is deliberately omitted.

Let's talk about your most recent work in computational neuroscience. Can you explain the "neuroidal model" you developed in your book Circuits of the Mind?

The neuroidal model is designed to describe how basic computational tasks are carried out in the brain at the level of neurons. We do not know what changes the brain undergoes when it memorizes a new word or makes a new association. We need some language for describing the alternative algorithms that a network of neurons may be implementing. Memorizing a new word is easy enough for a computer, but it is still mysterious how the brain does it. Devising algorithms that do not obviously contradict biology is not easy, because biology appears very constrained. Each neuron is connected to only a small fraction of the others, and its influence on each of those is believed to be weak on the average.

You also developed a formal system called robust logics that seeks to reconcile what you've described as "the apparent logical nature of reasoning and the statistical nature of learning."

Robust logic is aimed at tasks which we do not understand how to do on any computational device, let alone the brain. The tasks in question are those of reasoning using information that is learned from experience and therefore uncertain. Humans manage to make effective decisions even in an uncertain world and even with only partial knowledge. This skill is close to the heart of intelligence, I believe, and understanding it better raises some fundamental scientific questions.

Personal Blog

7

New Comment

Rendering 0/9 comments, sorted by

top scoring

(show more) Click to highlight new comments since: Today at 7:21 PM

Moderation Log

More from InquilineKea

Curated and popular this week

9Comments

Comment Permalink

DanB15y60

It's interesting how terms lose their original relevance. Nowadays it would seem silly to name a subfield of machine learning "probably approximately correct", since that describes all of machine learning.

From the article:

Successive contributions by Kearns, Schapire, and Freund not only resolved this question in the very unexpected positive direction, but yielded the boosting technique, perhaps the most powerful general learning method now known.

Valiant is talking specifically about the AdaBoost algorithm. I'm not sure boosting is the most powerful general learning method now known - the support vector machine seems more powerful - but it certainly has the best ratio of power:simplicity. Anyone with a passing interest in machine learning should implement AdaBoost and play around with it a bit.

Fred Guth7y10

I disagree with you. PAC is about how to analyse learning algorithms in terms of sample complexity. It is not about AdaBoost, nor Neural networks, or whatever. Actually, the learning algorithm is a black box in the PAC model (or metamodel if you prefer).

I think the name is perfect and evergreen. The same way we want to analyse the correctness of algorithms and know its time (or resource) complexity, for learning algorithms you also have another dimension which is how much data you need (sample complexity).

PAC is trying to answer what is the bound on the size of training dataset needed to achieve a certain prediction correctness with a certain confidence (probability).

3[anonymous]15y

Valiant is not talking specifically about AdaBoost, although AdaBoost was the first of these algorithms and is well known due to wide proliferation. See this which succinctly describes the differences of some of the different boosters out there. In particular the linked paper by Philip Long at Google is really nice for showing the limitations of boosters and also understanding the fact that boosters are really nothing more than a specialized gradient descent if you recast them in the right way. [...] This is not what is meant here. SVM is a classification rule itself, whereas boosting is a metarule that operates on classification rules themselves and attempts to make coherent use of multiple decision rules each with different degrees of confidence and error. It makes no sense to compare the usefulness of SVMs to the usefulness of boosting, boosting operates on SVMs. To boot, generalized kernel learning methods, sparse dictionary coding, bag-of-words, and Reproducing Kernel Hilbert Space methods all have many cases where they are vastly superior to SVM. For that matter, even simpler methods like Fisher Linear Discriminant can outperform SVM in a lot of practical cases. And SVM lacks much extension to fully unsupervised learning. I think Valiant, whose office sits down the hall from my adviser's and who I have frequent conversations with, is on the money with this stuff.

2jsteinhardt15y

I think PAC learning refers specifically to algorithms with rigorous bounds on their probability of error. There are theorems saying that Bayesian updaters get this automatically, but as far as I know they assume that (1) nature's distribution coincides with your subjective distribution and (2) you are implementing inference exactly. Since both of these are pretty dubious, I don't think most statistical machine learning is considered PAC. The examples that would come to mind in my view are things like boosting, multi-armed bandit, and sparse recovery, but I'm not really an expert in this area. ETA: There are also still tons of people adopting the symbolic approach to AI, which is another reason why I'd be hesitant to label all of machine learning as PAC (even if the symbolic approach seems likely to be a dead end, I'd rather not pretend that it's not still in use).

See in context