V_V comments on The Brain as a Universal Learning Machine - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Thanks, I was waiting for at least one somewhat critical reply :)
The ferret rewiring experiments, the tongue based vision stuff, the visual regions learning to perform echolocation computations in the blind, this evidence together is decisive against the evolved modularity hypothesis as I've defined that hypothesis, at least for the cortex. The EMH posits that the specific cortical regions rely on complex innate circuitry specialized for specific tasks. The evidence disproves that hypothesis.
Sure. Once you have software loaded/learned into hardware, damage to the hardware is damage to the software. This doesn't differentiate the two hypotheses.
Yes - and I described what is known about that basic architecture. The extent to which a particular brain relies on learning vs innate behaviour depends on various tradeoffs such as organism lifetime and brain size. Small-brained and short-lived animals have much less to gain from learning (less time to acquire data, less hardware power), so they rely more on innate circuitry, much of which is encoded in the oldbrain and the brainstem. This is all very much evidence for the ULH. The generic learning structures - the cortex and cerebellum - generally grow in size with larger organisms and longer lifespans.
This has also been tested via decortication experiments and confirms the general ULH - rabbits rely much less on their cortex for motor behavior, larger primates rely on it almost exclusively, cats and dogs are somewhere in between, etc.
This evidence shows that the cortex is general purpose, and acquires complex circuitry through learning. Recent machine learning systems provide further evidence in the form of - this is how it could work.
As I mentioned in the article, backprop is not really biologically plausible. Targetprop is, and there are good reasons to suspect the brain is using something like targetprop - as that theory is the latest result in a long line of work attempting to understand how the brain could be doing long range learning. Investigating and testing the targetprop theory and really confirming it could take a while - even decades. On the other hand, if targetprop or some variant is proven to work in a brain-like AGI, that is something of a working theory that could then help accelerate neuroscience confirmation.
I did not say deep learning is "how the brain works". I said instead the brain is - roughly - a specific biological implementation of a ULH, which itself is a very general model which also will include any practical AGIs.
I said that DL helps indirectly confirm the ULH of the brain, specifically by showing how the complex task specific circuitry of the cortex could arise through a simple universal learning algorithm.
Computational modeling is key - if you can't build something, you don't understand it. To the extent that any AI model can functionally replicate specific brain circuits, it is useful to neuroscience. Period. Far more useful than psychological theorizing not grounded in circuit reality. So computational neuroscience and deep learning (which really is just the neuroscience inspired branch of machine learning) naturally have deep connections.
Biological plausibility was one of the heavily discussed aspects of ReLUs.
From the abstract:
"While logistic sigmoid neurons are more biologically plausible than hyperbolic tangent neurons, the latter work better for training multi-layer neural networks. This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of . . "
Weight sharing is unbiological: true. It is also an important advantage that von Neumann (time-multiplexed) systems have over biological (non-multiplexed) ones. The neuromorphic hardware approaches largely cannot handle weight sharing. Of course convnets still work without weight sharing - it just may require more data and/or better training and regularization. It is interesting to speculate how the brain deals with that, as is comparing the details of convnet learning capability vs bio-vision. I don't have time to get into that at the moment, but I did link to at least one article comparing convnets to bio-vision in the OP.
Sure - so just taboo it then. When I use the term "deep learning", it means something like "the branch of machine learning which is more related to neuroscience" (while still focused on end results rather than emulation).
Comparing two learning systems trained on completely different datasets with very different objective functions is complicated.
In general though, CNNs are a good model of fast feedforward vision - the first 150 ms of the ventral stream. In that domain they are comparable to biovision, with the important caveat that biovision computes a larger and richer output parameter map than almost any CNN. Most CNNs (there are many different types) are more narrowly focused, but also probably learn faster because of advantages like weight sharing. The amount of data required to train a CNN up to superhuman performance on narrow tasks is comparable to or less than that required to train a human visual system up to high performance (but again the cortex is doing something more like transfer learning, which is harder).
Past 150 ms or so, humans start making multiple saccades and also start to integrate information from a larger number of brain regions, including frontal and temporal cortical regions. At that point the two systems aren't even comparable: humans are using more complex 'mental programs' over multiple saccades to make visual judgements.
Of course, eventually we will have AGI systems that also integrate those capabilities.
That's actually extremely impressive - superhuman learning speed.
In that case, I would say you may want to read up more on the field. If you haven't yet, check out the original sparse coding paper (over 3000 citations), to get an idea of how crucial new computational models have been for advancing our understanding of cortex.
But none of these works as well as using the original task-specific regions, and anyway in all these experiments the original task-specific regions are still present and functional, therefore maybe the brain can partially use these regions by learning how to route the signals to them.
But then why doesn't universal learning just co-opt some other brain region to perform the task of the damaged one? In the cases where there is a congenital malformation, that makes the usual task-specific region missing or dysfunctional, why isn't the task allocated to some other region?
And anyway why is the specialization pattern consistent across individuals and even species? If you train an artificial neural network multiple times on the same dataset from different random initializations, each time the hidden nodes will specialize in a different way: at the very least, ANNs have permutation symmetry between nodes in the same layer, and as long as nodes operate in the linear region of the activation function, there is also redundancy between layers. This means that many sets of weights specify the same or similar function, and the training process chooses one of them randomly depending on the initialization (and minibatch sampling, dropout, etc.).
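The permutation symmetry mentioned above can be checked directly. The following is a minimal sketch, assuming a toy one-hidden-layer tanh network with made-up random weights (the sizes, the seed and the helper name `mlp` are all illustrative, not taken from any system discussed in this thread): relabeling the hidden units, while moving their incoming and outgoing weights together, leaves the computed function unchanged.

```python
import math
import random

def mlp(x, W1, b1, W2, b2):
    """Toy one-hidden-layer network: y = W2 . tanh(W1 x + b1) + b2."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2

# Hypothetical random weights; any values would do.
rng = random.Random(0)
n_in, n_hidden = 3, 5
W1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.gauss(0, 1) for _ in range(n_hidden)]
W2 = [rng.gauss(0, 1) for _ in range(n_hidden)]
b2 = rng.gauss(0, 1)

# Relabel the hidden units: the rows of W1 and the matching entries of
# b1 and W2 are reordered together.
perm = [2, 0, 4, 1, 3]
W1p = [W1[i] for i in perm]
b1p = [b1[i] for i in perm]
W2p = [W2[i] for i in perm]

x = [0.5, -1.2, 2.0]
y_original = mlp(x, W1, b1, W2, b2)
y_permuted = mlp(x, W1p, b1p, W2p, b2)

# The two weight sets differ, but the outputs agree up to rounding
# (the summation order changes, hence the tolerance).
print(abs(y_original - y_permuted) < 1e-9)
```

With 5 hidden units there are 5! = 120 such relabelings, so at least that many distinct weight settings compute exactly the same function.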
If, as you claim, the basal ganglia and the cortex in the brain make up a sort of CPU-memory system, then there should be substantial permutation symmetry. After all, in a computer you can swap blocks or pages of memory around and as long as pointers (or page tables) are updated the behavior does not change, up to some performance issues due to cache misses. If the brain worked that way we should expect cortical regions to be allocated to different tasks in a more or less random pattern varying between individuals.
Instead we observe substantial consistency, even in the left-right specialization patterns which is remarkable since at macroscopic level the brain has substantial lateral symmetry.
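To make the memory-swapping analogy concrete, here is a small sketch of a paged memory, assuming a toy design in which a page table maps virtual pages to physical frames. The class `PagedMemory` and its methods are hypothetical names invented for this illustration, and it says nothing about how the brain actually works. Swapping two physical frames and patching the page table changes where the data lives but not what any address reads back.

```python
class PagedMemory:
    """Toy paged memory: virtual addresses are translated via a page table."""

    def __init__(self, num_pages, page_size):
        self.page_size = page_size
        self.frames = [[0] * page_size for _ in range(num_pages)]
        # page_table[virtual_page] -> physical frame index
        self.page_table = list(range(num_pages))

    def _locate(self, addr):
        page, offset = divmod(addr, self.page_size)
        return self.page_table[page], offset

    def read(self, addr):
        frame, offset = self._locate(addr)
        return self.frames[frame][offset]

    def write(self, addr, value):
        frame, offset = self._locate(addr)
        self.frames[frame][offset] = value

    def swap_frames(self, a, b):
        """Swap physical frames a and b and update the page table to match."""
        self.frames[a], self.frames[b] = self.frames[b], self.frames[a]
        for page, frame in enumerate(self.page_table):
            if frame == a:
                self.page_table[page] = b
            elif frame == b:
                self.page_table[page] = a


mem = PagedMemory(num_pages=4, page_size=4)
for addr in range(16):
    mem.write(addr, addr * 10)

before = [mem.read(addr) for addr in range(16)]
mem.swap_frames(0, 3)  # physical layout changes
after = [mem.read(addr) for addr in range(16)]

print(before == after)  # behavior seen through addresses is unchanged
```

The sketch deliberately ignores access cost: every frame is equally cheap to reach, which is exactly the abstraction whose applicability to the brain is disputed in the replies.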
Decortication experiments only show that certain species rely on the cortex more than others, they don't show that that cortex is general purpose and acquires complex circuitry through learning.
Horses, for instance, are large animals with a long lifespan and a large brain (encephalization quotient similar to that of cats and dogs), and yet a newborn horse is able to walk, run and follow its mother within a few hours of birth.
Targetprop is still highly speculative. It has not been shown to work well in artificial neural networks and the evidence of biological plausibility is handwavy.
Ok.
In principle yes, but trivially so as they are universal approximators. In practice, weight sharing enables these systems to easily learn translational invariance.
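As a rough illustration of why weight sharing helps here, the sketch below applies one shared kernel at every position of a 1-D signal. It assumes circular (wrap-around) boundaries so that the property holds exactly; real convnets use padding and strides, so there it holds only approximately. The function names and the example signal are made up for the illustration. Shifting the input shifts the feature map by the same amount, and a global max-pool on top gives a shift-invariant response.

```python
def conv1d_circular(signal, kernel):
    """Apply the same kernel (shared weights) at every position, wrapping around."""
    n = len(signal)
    return [sum(k * signal[(i + j) % n] for j, k in enumerate(kernel))
            for i in range(n)]

def shift(signal, s):
    """Circularly shift a list to the right by s positions."""
    s %= len(signal)
    return signal[-s:] + signal[:-s] if s else list(signal)

# Made-up example data: a small bump and a second-difference kernel.
signal = [0, 0, 1, 3, 1, 0, 0, 0]
kernel = [1, -2, 1]

features = conv1d_circular(signal, kernel)
features_of_shifted = conv1d_circular(shift(signal, 3), kernel)

# Equivariance: convolving a shifted input equals shifting the feature map.
print(features_of_shifted == shift(features, 3))

# Invariance after global max-pooling: the strongest response is the same
# wherever the pattern appears.
print(max(features_of_shifted) == max(features))
```

A layer without weight sharing would have a separate kernel per position, so nothing would force the response at position 5 to match the response at position 2; that equivalence would have to be learned from data at every location.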
Humans get tired after continuously playing for a few hours, but in terms of overall playtime they learn faster.
No - these studies involve direct measurements (electrodes for the ferret rewiring, MRI for echolocation). They know the rewired auditory cortex is doing vision, etc.
It can, and this does happen all the time. Humans can recover from serious brain damage (stroke, injury, etc). It takes time to retrain and reroute circuitry - similar to relearning everything that was lost all over again.
Current ANNs assume a fixed module layout, so they aren't really comparable in module-task assignment.
Much of the specialization pattern could just be geography - V1 becomes visual because it is closest to the visual input. A1 becomes auditory because it is closest to the auditory input. etc.
This should be the default hypothesis, but there also could be some element of prior loading, perhaps from pattern generators in the brainstem. (I have read a theory that there is a pattern generator for faces that pretrains the visual cortex a little bit in the womb, so that it starts with a vague primitive face detector).
I said the BG is kind-of-like the CPU, the cortex is kind-of-like a big FPGA, but that is an analogy. There are huge differences between slow bio-circuitry and fast von Neumann machines.
Firstly the brain doesn't really have a concept of 'swapping memory'. The closest thing to that is retraining, where the hippocampus can train info into the cortex. It's a slow complex process that is nothing like swapping memory.
Finally the brain is much more optimized at the wiring/latency level. Functionality goes in certain places because that is where it is best for that functionality - it isn't permutation symmetric in the slightest. Every location has latency/wiring tradeoffs. In a von Neumann memory we just abstract that all away. Not in the brain. There is an actual optimal location for every concept/function etc.
That is fast for mammals - I know first hand that it can take days for deer. Nonetheless, as we discussed, the brainstem provides a library of innate complex motor circuitry in particular, which various mammals can rely on to varying degrees, depending on how important complex early motor behavior is.
I agree that there is still more work to be done understanding the brain's learning machinery. Targetprop is useful/exciting in ML, but it isn't the full picture yet.
Not at all. The Atari agent becomes semi-superhuman by day 3 of its life. When humans start playing Atari, they already have trained vision and motor systems, and Atari is designed for these systems. Even then your statement is wrong - I don't think any children achieve playtester levels of skill in just a few days.
Well, the eyes are at the front of the head, but the optic nerves connect to the brain at the back, and they also cross at the optic chiasm. Axons also cross contralaterally in the spinal cord and if I recall correctly there are various nerves that also don't take the shortest path.
This seems to me to be evidence that the nervous system is not strongly optimized for latency.
This is a total misconception, and it is a good example of the naive engineer fallacy (jumping to the conclusion that a system is poorly designed when you don't understand how the system actually works and why).
Remember the distributed software modules - including V1 - have components in multiple physical modules (cortex, cerebellum, thalamus, BG). Not every DSM has components in all subsystems, but V1 definitely has a thalamic relay component (the LGN).
The thalamus/BG is in the center of the brain, which makes sense from a wiring-minimization standpoint once you understand the DSM system. Low-frequency/compressed versions of the cortical map computations can interact at higher speeds inside the small compact volume of the BG/thalamus. The BG/thalamus basically contains a microcosm model of the cortex within itself.
The thalamic relay comes first in sequential processing order, so moving cortical V1 closer to the eyes wouldn't help in the slightest. (Draw this out if it doesn't make sense)