Artificial Neural Networks (ANNs) are built around the backpropagation algorithm. Backpropagation lets you perform gradient descent on a network of neurons. When we feed training data through an ANN, we use backpropagation to tell us how the weights should change.
ANNs are good at inference problems. Biological Neural Networks (BNNs) are good at inference too. ANNs are built out of neurons. BNNs are built out of neurons too. It makes intuitive sense that ANNs and BNNs might be running similar algorithms.
There is just one problem: BNNs are physically incapable of running the backpropagation algorithm.
We do not know quite enough about biology to say it is impossible for BNNs to run the backpropagation algorithm. However, "a consensus has emerged that the brain cannot directly implement backprop, since to do so would require biologically implausible connection rules"[1].
The backpropagation algorithm has three steps.
- Flow information forward through a network to compute a prediction.
- Compute an error by comparing the prediction to a target value.
- Flow the error backward through the network to update the weights.
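Concretely, here is a minimal sketch of those three steps on a toy one-hidden-layer network. The network shape, data, learning rate, and variable names are illustrative choices of mine, not anything taken from the paper.

```python
# A minimal sketch of the three backprop steps on a one-hidden-layer network.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))        # input
t = rng.normal(size=(2, 1))        # target value
W1 = rng.normal(size=(3, 4))       # input -> hidden weights
W2 = rng.normal(size=(2, 3))       # hidden -> output weights

# 1. Flow information forward through the network to compute a prediction.
h = np.tanh(W1 @ x)
y = W2 @ h

# 2. Compute an error by comparing the prediction to the target value.
error = y - t                      # gradient of the loss 0.5 * ||y - t||^2

# 3. Flow the error backward through the network to update the weights.
dW2 = error @ h.T
dh = W2.T @ error                  # the error travels backward through W2
dW1 = (dh * (1 - h ** 2)) @ x.T    # and backward through the tanh nonlinearity
lr = 0.1
W2 -= lr * dW2
W1 -= lr * dW1
```

Step 3 is where biology gets into trouble: computing `dh` requires pushing the output error backward through the same weights that carried the signal forward.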
The backpropagation algorithm requires information to flow forward and backward along the network. But biological neurons are one-directional. An action potential goes from the cell body, down the axon, to the axon terminals, and on to another cell's dendrites. An action potential never travels backward from a cell's terminals to its body.
Hebbian theory
Predictive coding is the idea that BNNs generate a mental model of their environment and then transmit only the information that deviates from this model. Predictive coding considers error and surprise to be the same thing. Hebbian theory is a specific mathematical formulation of predictive coding.
Predictive coding is biologically plausible. It operates locally. There are no separate prediction and training phases which must be synchronized. Most importantly, it lets you train a neural network without sending action potentials backward.
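Here is a minimal sketch of those local rules on a toy two-layer network. The layer sizes, step sizes, and variable names are illustrative choices of mine; the structure is the standard value-node/error-node recipe, in which every update uses only quantities available at adjacent layers.

```python
# A minimal predictive-coding sketch: value nodes, error nodes, local updates.
import numpy as np

rng = np.random.default_rng(0)
phi = np.tanh                       # activation function
x0 = rng.normal(size=(4, 1))        # input layer, clamped to the data
t  = rng.normal(size=(2, 1))        # output layer, clamped to the target
W1 = rng.normal(size=(3, 4)) * 0.5  # predicts layer 1 from layer 0
W2 = rng.normal(size=(2, 3)) * 0.5  # predicts layer 2 from layer 1

v1 = W1 @ phi(x0)                   # free value nodes in the hidden layer

# Inference: relax the hidden values to reduce the local prediction errors.
for _ in range(100):
    e1 = v1 - W1 @ phi(x0)          # error node for layer 1
    e2 = t - W2 @ phi(v1)           # error node for layer 2 (the "surprise")
    # Each value node moves using only the errors on its own layer
    # and the layer directly above it.
    v1 += 0.1 * (-e1 + (1 - phi(v1) ** 2) * (W2.T @ e2))

# Learning: each weight changes using only its own layer's error and the
# activity of the layer below -- a local, Hebbian-style rule.
lr = 0.05
W1 += lr * e1 @ phi(x0).T
W2 += lr * e2 @ phi(v1).T
```

Nothing in the weight update reaches across the network; each synapse only needs the error and activity at its own two ends.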
Predictive coding is easier to implement in hardware. It is locally defined; it parallelizes better than backpropagation; it continues to function when you cut its substrate in half. (Corpus callosotomy, which severs the connection between the brain's hemispheres, is used to treat epilepsy.) Digital computers break when you cut them in half. Predictive coding is something evolution could plausibly invent.
Unification
The paper Predictive Coding Approximates Backprop Along Arbitrary Computation Graphs[1] "demonstrate[s] that predictive coding converges asymptotically (and in practice rapidly) to exact backprop gradients on arbitrary computation graphs using only local learning rules." The authors have unified predictive coding and backpropagation into a single theory of neural networks. Predictive coding and backpropagation are separate hardware implementations of what is ultimately the same algorithm.
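To make that claim concrete, here is a rough numerical sketch on a toy two-layer network, assuming the error nodes relax while the rest of the network's activity is held at its feedforward values. The network, data, and step sizes are illustrative choices of mine rather than the paper's own code.

```python
# Sketch: predictive-coding error nodes relax to the backprop gradients.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 1))
t = rng.normal(size=(2, 1))
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(2, 3))

# Feedforward pass.
h = np.tanh(W1 @ x)
y = W2 @ h

# Exact backprop gradients for the loss 0.5 * ||y - t||^2.
delta2 = y - t
delta1 = (W2.T @ delta2) * (1 - h ** 2)
grad_W1 = delta1 @ x.T
grad_W2 = delta2 @ h.T

# Predictive coding: the error nodes settle under purely local dynamics.
e2 = y - t                          # output error, clamped to the loss gradient
e1 = np.zeros_like(h)               # hidden error node starts at zero
for _ in range(200):
    e1 += 0.1 * (-e1 + (W2.T @ e2) * (1 - h ** 2))

# After relaxation, the local weight updates match the backprop gradients.
print(np.allclose(e1 @ x.T, grad_W1))   # True
print(np.allclose(e2 @ h.T, grad_W2))   # True (same expression by construction)
```

Each step of the relaxation loop is local, yet the equilibrium it settles into is the quantity backprop computes with an explicit backward pass.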
There are two big implications of this.
- This paper permanently fuses artificial intelligence and neuroscience into a single mathematical field.
- This paper opens up possibilities for neuromorphic computing hardware.
[1] Source is available on arXiv.
[2] Daniel Kokotajlo was the person who originally pointed me to this article. Thank you!
There is no question that human brains have tons of instincts built-in. But there is a hard limit on how much information a single species' instincts can contain. It is implausible that human beings' cognitive instincts contain significantly more information than the human genome (roughly 3 billion base pairs at 2 bits each, or about 750 megabytes). I expect our instincts contain much less.
Human brains definitely have special architectures too, like the hippocampus. The critical question is how important these special architectures are. Are our special architectures critical to general intelligence or are they just speed hacks? If they are speed hacks then we can outrace them by building a bigger computer or writing more efficient algorithms.
There is no doubt that humans transmit more cultural knowledge than other animals. This has to do with language. (More specifically, I think the biology underpinning our language hit a critical point around 50,000 years ago.) Complex grammar is not present in any non-human animal. Wernicke's area is involved in language comprehension; it could be a special architecture.
How important are the above human advantages? I believe that taking a popular ANN architecture and merely scaling it up will not enable a neural network to compete with humans at StarCraft with equal quantities of training data. If, in addition, the ANN is not allowed to utilize transfer learning then I am willing to publicly bet money on this prediction. (The ANN must be restricted to a human rate of actions-per-second. The ANN does not get to play via an API or similar hand-coded preprocessor. If the ANN watches videos of other players then that counts towards its training data.)