Marvin_Minsky - LessWrong

Umm, it looks like he did not read the book "Perceptrons," because he repeats a lot of misinformation from others who also did not read it.

First, none of the theorems in that book are changed or 'refuted' by the use of back-propagation. This is because almost all the book is about whether, in various kinds of connectionist networks, there exist any sets of coefficients to enable the net to recognize various kinds of patterns.
Anyway, because BP is essentially a gradient climbing process, it has all the consequent problems -- such as getting stuck on local peaks.
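As a minimal sketch of the local-peak problem, here is plain gradient climbing on a one-dimensional function with two peaks. The function, step size, and starting points are illustrative assumptions, not anything taken from the book or from a real network.

```python
import math

def f(x):
    # Illustrative landscape: a low peak near x = -1 and a higher peak near x = 2.
    return math.exp(-(x + 1) ** 2) + 2 * math.exp(-(x - 2) ** 2)

def grad_f(x):
    # Analytic derivative of f.
    return (-2 * (x + 1) * math.exp(-(x + 1) ** 2)
            - 4 * (x - 2) * math.exp(-(x - 2) ** 2))

def climb(x, step=0.1, iters=1000):
    # Plain gradient ascent: always move uphill from the current point.
    for _ in range(iters):
        x += step * grad_f(x)
    return x

x_stuck = climb(-1.5)  # starts in the basin of the lower peak
x_best = climb(1.0)    # starts in the basin of the higher peak
print(x_stuck, f(x_stuck))
print(x_best, f(x_best))
```

The climber that starts at -1.5 settles on the lower peak and never finds the higher one, because every small step away from it goes downhill.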
Those who read the book will see (on page 56) that we did not simply show that the "parity function" is not linearly separable. What we showed is that for a perceptron (with one layer of weighted-threshold neurons that are all connected to a single weighted-threshold output cell), there must be many neurons, each of which has inputs from every point in the retina!
That result is fairly trivial. However, chapter 9 proves a much deeper limitation: such networks cannot recognize any topological features of a pattern unless either there is one all-seeing neuron that does it, or exponentially many cells with smaller input sets.
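To make the shape of the parity result concrete, here is a small sketch of one well-known way to build a one-hidden-layer threshold network for parity. It is an illustration under stated assumptions, not the book's proof: the function names are hypothetical, and the code only shows a construction in which every hidden unit reads every input; it does not demonstrate that such units are necessary.

```python
from itertools import product

def step(z):
    # Weighted-threshold unit: fires (1) when its net input is positive.
    return 1 if z > 0 else 0

def parity_perceptron(bits):
    n = len(bits)
    # Every hidden unit has weight 1 on ALL n inputs, so each one "sees" the whole retina.
    total = sum(bits)
    # Hidden unit k fires when at least k inputs are on (threshold k - 0.5).
    hidden = [step(total - (k - 0.5)) for k in range(1, n + 1)]
    # Output cell: alternating weights +1, -1, +1, ... and threshold 0.5.
    out = sum((-1) ** (k + 1) * h for k, h in zip(range(1, n + 1), hidden))
    return step(out - 0.5)

# Check against the true parity on every input pattern for small retinas.
for n in range(1, 7):
    for bits in product([0, 1], repeat=n):
        assert parity_perceptron(bits) == sum(bits) % 2
```

When exactly s inputs are on, hidden units 1 through s fire, and the alternating sum is 1 for odd s and 0 for even s, which is why the single output threshold suffices.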

A good example of this is: try to make a neural network that looks at a large two-dimensional retina, and decides whether the image contains more than one connected set. That is, whether it is seeing just one object, or more than one object. I don't yet have a decent proof of this (and I'd very much like to see one) but it is clear from the methods in the book that even a multilayer neural network cannot recognize such patterns--unless the number of layers is of the order of the number of points in the retina!!!! This is because a loop-free network cannot do the needed recursion.
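For contrast, here is how an ordinary serial program decides the same question. This is a sketch of a standard flood-fill count, not a neural network and not something from the book; the 4-neighbor definition of "connected" and the function name are assumptions made for illustration. Note that the inner loop keeps running until a region stops growing, which is the kind of iteration the paragraph above calls "the needed recursion".

```python
def count_components(image):
    # image: list of rows, each a list of 0/1 pixels (the "retina").
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and (r, c) not in seen:
                # Found an unvisited "on" pixel: it starts a new object.
                count += 1
                stack = [(r, c)]
                seen.add((r, c))
                # Spread to neighbors until the region stops growing.
                while stack:
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return count

one_object = [[1, 1, 0],
              [0, 1, 0],
              [0, 1, 1]]
two_objects = [[1, 0, 1],
               [1, 0, 1],
               [0, 0, 1]]
print(count_components(one_object) > 1)   # is there more than one object?
print(count_components(two_objects) > 1)
```

The number of spreading steps depends on the size and shape of the object, so it cannot be fixed in advance independently of the retina.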

The popular rumor is that these limitations are overcome by making networks with more layers. And in fact, networks with more layers can recognize more patterns, but at an exponentially high price in complexity. (One can make networks with loops that can compute some topological features. However, there is no reason to suspect that back-propagation will work on such networks.)

The writer has been sucked in by propaganda. Yes, neural nets with back-propagation can recognize many useful patterns, indeed, but cannot learn to recognize many other important ones—such as whether two different things in a picture share various common features, etc.

Now, you readers should ask about why you have not heard about such problems! Here is the simple, incredible answer: In Physics, if you show that a certain popular theory cannot explain an important phenomenon, you're likely to win a Nobel Prize, as when Yang and Lee showed that the standard theory could not explain a certain violation of Parity. Whereas, in the Connectionist Community, if your network cannot recognize a certain type of pattern, you'll simply refrain from announcing this fact, and pretend that nothing has happened--perhaps because you fear that your investors will withdraw their support. So yes, you can indeed make connectionist networks that learn which movies a citizen is likely to like, and yes, that can make you some money. And if your connectionist robot can't count, so what! Just find a different customer!
