Artificial Neural Networks (ANNs) are built around the backpropagation algorithm, which lets you perform gradient descent on a network of neurons. When we feed training data through an ANN, we use the backpropagation algorithm to tell us how the weights should change.
ANNs are good at inference problems. Biological Neural Networks (BNNs) are good at inference too. ANNs are built out of neurons. BNNs are built out of neurons too. It makes intuitive sense that ANNs and BNNs might be running similar algorithms.
There is just one problem: BNNs are physically incapable of running the backpropagation algorithm.
We do not know quite enough about biology to say it is impossible for BNNs to run the backpropagation algorithm. However, "a consensus has emerged that the brain cannot directly implement backprop, since to do so would require biologically implausible connection rules"[1].
The backpropagation algorithm has three steps (a code sketch follows the list).
- Flow information forward through a network to compute a prediction.
- Compute an error by comparing the prediction to a target value.
- Flow the error backward through the network to update the weights.
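Here is a minimal sketch of those three steps, for a tiny two-layer network with a squared-error loss. The shapes, the tanh activation, and all the names are illustrative assumptions, not anything prescribed by the paper cited below.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3)) * 0.5   # input -> hidden weights
W2 = rng.normal(size=(2, 4)) * 0.5   # hidden -> output weights

def backprop(x, target):
    # Step 1: flow information forward to compute a prediction.
    h = np.tanh(W1 @ x)              # hidden activations
    y = W2 @ h                       # linear output: the prediction
    # Step 2: compute an error by comparing the prediction to the target.
    e = y - target                   # gradient of 0.5 * ||y - target||^2
    # Step 3: flow the error backward to get the weight updates.
    dW2 = np.outer(e, h)
    delta = (W2.T @ e) * (1 - h**2)  # error pushed back through the tanh
    dW1 = np.outer(delta, x)
    return dW1, dW2

x = rng.normal(size=3)
target = np.array([1.0, -1.0])
dW1, dW2 = backprop(x, target)       # descend: W1 -= lr * dW1, etc.
```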
The backpropagation algorithm requires information to flow forward and backward along the network. But biological neurons are one-directional: an action potential travels from the cell body down the axon to the axon terminals, and on to another cell's dendrites. An action potential never travels backward from a cell's axon terminals to its cell body.
Hebbian theory
Predictive coding is the idea that BNNs generate a mental model of their environment and then transmit only the information that deviates from this model. Predictive coding treats error and surprise as the same thing. Hebbian theory provides a specific mathematical formulation of predictive coding.
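The classic Hebbian rule says a synapse strengthens in proportion to the product of presynaptic and postsynaptic activity ("cells that fire together wire together"); in predictive-coding formulations, the postsynaptic factor is a local error signal rather than raw activity. A one-line sketch for a single synapse, with an assumed learning rate eta:

```python
def hebbian_update(w, pre, post, eta=0.01):
    # The weight change is proportional to the product of the activity
    # on the two sides of the synapse; everything here is local.
    return w + eta * pre * post
```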
Predictive coding is biologically plausible. It operates locally. There are no separate prediction and training phases which must be synchronized. Most importantly, it lets you train a neural network without sending action potentials backward.
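To make "it operates locally" concrete, here is a hedged sketch of one common predictive-coding scheme on the same two-layer network as above. Each layer keeps value nodes and error nodes, and every update touches only neighboring quantities; the predictions are held at their feedforward values during relaxation (a standard simplification in this literature), and the step size and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3)) * 0.5
W2 = rng.normal(size=(2, 4)) * 0.5

def pc_update(x0, target, n_iters=200, lr_x=0.2):
    mu1 = np.tanh(W1 @ x0)   # layer-1 prediction, held fixed below
    mu2 = W2 @ mu1           # output prediction, held fixed below
    x1 = mu1.copy()          # hidden value nodes start at their predictions
    eps2 = target - mu2      # output error: clamped target minus prediction
    for _ in range(n_iters):
        eps1 = x1 - mu1                     # local error at the hidden layer
        x1 += lr_x * (-eps1 + W2.T @ eps2)  # relax value nodes toward lower energy
    eps1 = x1 - mu1
    # Hebbian-flavored updates: presynaptic activity times postsynaptic error.
    dW1 = np.outer(eps1 * (1 - mu1**2), x0)
    dW2 = np.outer(eps2, mu1)
    return dW1, dW2
```

Notice that no error is ever sent backward down an axon: each layer's update reads only its own error nodes, its neighbors' error nodes, and the adjoining weights.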
Predictive coding is also easier to implement in hardware. It is locally defined; it parallelizes better than backpropagation; it keeps functioning when you cut its substrate in half. (Corpus callosotomy, which severs the connection between the brain's two hemispheres, is a real treatment for epilepsy.) Digital computers break when you cut them in half. Predictive coding is something evolution could plausibly invent.
Unification
The paper *Predictive Coding Approximates Backprop Along Arbitrary Computation Graphs*[1:1] "demonstrate[s] that predictive coding converges asymptotically (and in practice rapidly) to exact backprop gradients on arbitrary computation graphs using only local learning rules." The authors have unified predictive coding and backpropagation into a single theory of neural networks: predictive coding and backpropagation are separate hardware implementations of what is ultimately the same algorithm.
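One can check the claimed correspondence numerically by reusing the `backprop` and `pc_update` sketches from above on identical weights and data. Predictive coding's local updates converge to the negated backprop gradients; the sign flip is only because its errors are defined as target minus prediction rather than the reverse.

```python
# Assumes backprop() and pc_update() from the sketches above are in scope,
# along with their shared rng, W1, and W2.
x = rng.normal(size=3)
target = np.array([1.0, -1.0])
dW1_bp, dW2_bp = backprop(x, target)
dW1_pc, dW2_pc = pc_update(x, target)
print(np.max(np.abs(dW1_bp + dW1_pc)))  # ~0, up to relaxation precision
print(np.max(np.abs(dW2_bp + dW2_pc)))  # ~0
```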
There are two big implications of this.
- This paper permanently fuses artificial intelligence and neuroscience into a single mathematical field.
- This paper opens up possibilities for neuromorphic computing hardware.
Source is available on arXiv. ↩︎ ↩︎
I have not thought about these issues too much in the intervening time. Re-reading the discussion, it sounds plausible to me that the evidence is compatible with roughly brain-sized NNs being roughly as data-efficient as humans. Daniel claims:
> I think the human observation-reaction loop is closer to ten times that fast, which results in a 3 OOM difference.

This sounds like a big gap, but one that could potentially be explained by architectural differences or other factors, thus preserving a possibility like "human learning is more-or-less gradient descent". Without articulating the various hypotheses in more detail, this doesn't seem like strong evidence in any direction.
Not before now. I think the comment had a relatively high probability in my world, where we still have a poor idea of what algorithm the brain is running, and a low probability in Daniel's world, where evidence is zooming in on predictive coding as the correct hypothesis. Some quotes which I think support my hypothesis better than Daniel's:
This illustrates how we haven't pinned down the mechanical parts of the brain's algorithms. Speculation about the algorithm of the brain isn't yet causally grounded: it's not as if we've been looking at what's going on and can build up a firm abstract picture of the algorithm from there, the way you might successfully infer the rules of traffic by watching a bunch of cars. Instead, we have a bunch of different kinds of information at different resolutions, which we are still trying to stitch together into a coherent picture.
This directly addresses the question of how clear-cut things are right now, while also pointing to many concrete problems the predictive coding hypothesis faces. The comment continues on that subject for several more paragraphs.
This paragraph supports my picture that hypotheses about what the brain is doing are still largely being pulled from ML, which speaks against the hypothesis of a growing consensus about what the brain is doing, and also illustrates the lack of direct looking-at-the-brain-and-reporting-what-we-see.
On the other hand, it seems quite plausible that this particular person is especially enthusiastic about analogizing ML algorithms and the brain, since that is what they work on; in which case, this might not be so much evidence about the state of neuroscience as a whole. Some neuroscientist could come in and tell us why all of this stuff is bunk, or perhaps why predictive coding is right and all of the other ideas are wrong, or perhaps why the MCMC thing is right and everything else is wrong, and so on.
But I take it that Daniel isn't trying to claim that there is a consensus in the field of neuroscience; rather, he's probably trying to claim that the actual evidence is piling up in favor of predictive coding. I don't know. Maybe it is. But this particular domain expert doesn't seem to think so, based on the SSC comment.