Agreed - see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4236403/ and my writeup at https://slatestarcodex.com/2018/10/22/cognitive-enhancers-mechanisms-and-tradeoffs/ .
Relatedly, acetylcholine has been hypothesized to signal to the rest of the brain that unfamiliar/uncertain things are about to happen:
https://www.sciencedirect.com/science/article/pii/S0896627305003624
http://www.gatsby.ucl.ac.uk/~dayan/papers/yud2002.pdf
Thanks! Yeah, that would seem to be consistent with "This is a good time to set your learning algorithm to have a higher-than-usual learning rate", not to mention being alert and paying attention to that part of your sensory input.
What is “learning rate”, and why should we expect a learning-rate-modulation mechanism in the brain?
What is “learning rate”? As many readers know, learning algorithms involve a large number of parameters (“weights” in ML, or synapse locations and strengths in brains) that get changed while the learning algorithm runs. These parameters store the information that the algorithm has learned. “Learning rate” is a multiplier on this change process—the higher the learning rate, the more aggressively you change the parameters each step.
So if the learning rate is zero, you don’t change the parameters at all (and it’s no longer a learning algorithm!). If the learning rate is really high, you remember information more reliably with fewer repetitions. But it’s not all good: if the learning rate is too high, you get problems like (depending on the algorithm) instability, more tendency to overwrite old memories, more tendency to overfit (i.e., to “learn” things that are just random noise rather than robust patterns in the environment), and so on.
(Learning rate is an exact synonym of “plasticity”, as far as I can tell.)
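To make the "multiplier on the change process" idea concrete, here is a minimal sketch using plain gradient descent on a one-parameter toy problem. The loss function, the function name `train`, and the specific learning rates are all my own illustrative assumptions, and this toy only shows the "no learning" and "instability" ends of the tradeoff, not overfitting or overwriting of old memories.

```python
def train(learning_rate, steps=20, w0=0.0, target=1.0):
    """Run plain gradient descent on the toy loss 0.5 * (w - target)**2."""
    w = w0
    for _ in range(steps):
        gradient = w - target          # derivative of the toy loss with respect to w
        w -= learning_rate * gradient  # the learning rate scales every single update
    return w


# Zero: nothing is learned. Small: slow progress. Moderate: fast progress.
# Too large: each step overshoots by more than it corrects, so w blows up.
for lr in (0.0, 0.1, 0.5, 2.5):
    final_w = train(lr)
    print(f"learning_rate={lr}: final w = {final_w:.4g}, error = {abs(final_w - 1.0):.4g}")
```

In this toy, the error shrinks by a factor of (1 - learning_rate) per step, so any learning rate above 2 makes the error grow instead of shrink.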
Why might an organism benefit from using different learning rates in different situations? Well, as above, the learning rate involves a tradeoff between different considerations, and it would be awfully surprising if the all-things-considered best learning rate winds up being exactly the same for someone sitting by a campfire vs someone fighting a lion. For example, intuitively, when fighting a lion, you’re in a situation where you’re making lots of life-or-death decisions. You probably want an unusually high learning rate here, so that next time you’re fighting a lion you’ll do a much better job of understanding what’s going on. By contrast, sitting by the campfire, maybe it’s not so important that you remember everything, and the balance of considerations pushes towards a low learning rate, which again has better properties in avoiding overwriting old memories, avoiding overfitting, etc.
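Here is one toy way to see that the best learning rate can depend on the situation. It is a sketch under assumptions of my own: a single running estimate updated by a simple delta rule, Gaussian observation noise, and two made-up "worlds" (`stable_world`, where nothing changes, and `volatile_world`, where the true value flips every 10 steps). These are loose stand-ins, not a model of campfires or lions, and the names and numbers are hypothetical.

```python
import random


def tracking_error(learning_rate, true_values, noise_std=1.0, seed=0):
    """Mean squared error of a running estimate updated by a simple delta rule."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    estimate = true_values[0]
    total = 0.0
    for true_value in true_values:
        observation = true_value + rng.gauss(0.0, noise_std)
        estimate += learning_rate * (observation - estimate)
        total += (estimate - true_value) ** 2
    return total / len(true_values)


# Stable world: the true value never changes, so observations differ only by noise.
stable_world = [0.0] * 2000
# Volatile world: the true value flips between 0 and 10 every 10 steps.
volatile_world = [10.0 * ((t // 10) % 2) for t in range(2000)]

for name, world in (("stable", stable_world), ("volatile", volatile_world)):
    for lr in (0.05, 0.8):
        error = tracking_error(lr, world)
        print(f"{name:8s} learning_rate={lr}: mean squared error = {error:.3f}")
```

In the stable world the low learning rate should come out ahead, because it averages away noise instead of chasing it. In the volatile world the high learning rate should come out ahead, because the low one lags too far behind the changes.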
(I think there’s a connection to pedagogy here. Everyone knows that students retain information better when doing something arousing, like arguing with someone, vs when they’re bored and inactive. I bet the brain sets its learning rate to a higher setting in the first case! But I think there are other things going on here too—for example, emotional memories will get replayed more often, and “more replays” is functionally similar to “higher learning rate”.)
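The claim that "more replays" is functionally similar to "higher learning rate" can be checked in a toy setting. This is a minimal sketch assuming a one-parameter delta-rule learner; the function `delta_rule` and the numbers are hypothetical, and the approximation only holds when the base learning rate is small.

```python
def delta_rule(weight, target, learning_rate, repeats=1):
    """Apply the update weight += learning_rate * (target - weight) `repeats` times."""
    for _ in range(repeats):
        weight += learning_rate * (target - weight)
    return weight


base_rate = 0.01
replayed = delta_rule(0.0, 1.0, base_rate, repeats=5)  # five replays at the base rate
boosted = delta_rule(0.0, 1.0, 5 * base_rate)          # one pass at five times the rate
print(f"five replays: {replayed:.5f}, one boosted pass: {boosted:.5f}")
```

After k replays the remaining error is scaled by (1 - rate)^k, which is roughly 1 - k * rate when the rate is small, so the two results here differ by only about 0.001. With a large base rate the two stop matching, which is one reason I say "similar" rather than "identical".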
Doesn’t dopamine (reward prediction error) control learning rate / induce plasticity? Well, yes and no. Here’s an algorithm:
If that’s the algorithm (and I do think the neocortex has a mechanism like this—see my later post Big Picture Of Phasic Dopamine), and if dopamine is the reward prediction error signal, then more learning will happen when there’s more dopamine. But the dopamine is not the “learning rate” here, it’s just a different input into the learning algorithm. For example, phasic dopamine has a baseline level from which it can swing positive or negative, whereas a learning rate ought to be always nonnegative. As another example, predictive learning (a.k.a. self-supervised learning) has a learning rate, but does not have a reward prediction error.
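To illustrate the distinction between a reward prediction error and a learning rate, here is a generic sketch of my own. It is not a reconstruction of any particular brain algorithm, and the function names, the `activity` term, and the numbers are all hypothetical. The point is only that the signed error and the nonnegative rate enter the update as separate factors.

```python
def reinforcement_update(weight, activity, reward_prediction_error, learning_rate):
    """The signed error picks the direction of change; the learning rate scales its size."""
    if learning_rate < 0:
        raise ValueError("learning_rate must be nonnegative")
    return weight + learning_rate * reward_prediction_error * activity


def predictive_update(weight, observed, learning_rate):
    """Self-supervised update: it has a learning rate but no reward signal at all."""
    if learning_rate < 0:
        raise ValueError("learning_rate must be nonnegative")
    prediction_error = observed - weight
    return weight + learning_rate * prediction_error


w = 0.5
print(reinforcement_update(w, 1.0, +1.0, 0.1))  # better than expected: weight goes up
print(reinforcement_update(w, 1.0, -1.0, 0.1))  # worse than expected: weight goes down
print(reinforcement_update(w, 1.0, +1.0, 0.0))  # zero learning rate: no change, whatever the error
print(predictive_update(w, 1.0, 0.1))           # learning with no reward prediction error involved
```

In this framing, a bigger error does mean a bigger update, but the error can flip the sign of the update, which a learning rate never does.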
Biological evidence that acetylcholine sets learning rate
Acetylcholine (abbreviation: “ACh”; adjective form: "cholinergic") is a neurotransmitter. I am by no means an acetylcholine expert, and don’t have time to become one, but from a quick skim of the literature, the glove seems to fit:
(Again, I’m not an ACh expert. I tried not to cherry-pick in the above list, but I dunno.)
Does acetylcholine do other things too?
Yes! For one thing, there are tons of little subcortical structures in the brain, and they have specialized mechanisms to do specialized things, and I would not be surprised in the slightest if some of those things involved acetylcholine in a role that has nothing to do with learning rate.
More significantly, in the cortex, I just think evolution is not likely to set up a signaling mechanism which causes one and only one thing to happen. Signals communicate information, and whatever that information is, there are probably multiple processes that "care about" that information, and can thus be improved by having some response to that signaling mechanism. (And then neuroscientists would do experiments and announce that the signal "modulates" whatever that process is.)
Or in this case: if ACh is a signal for how high to set the learning rate, and there’s some other function F such that evolution tends to want high F at more-or-less the same times and same places that evolution tends to want high learning rate, then we should expect ACh to control F too!
The obvious candidates for that other function F are “whatever neuron or network changes are appropriate under conditions of attention and arousal”, since that’s presumably the main condition where you want a higher learning rate. I’m not sure exactly what those changes are. One possible example: This paper (already mentioned above) found that subjects on an ACh-blocking drug did worse on reaction time. Makes sense to me! You can probably save energy by having a slower reaction time most of the time, but you want to speed it up under conditions of attention and arousal. There also seem to be other network-level changes that happen under conditions of attention and arousal (e.g. see here).
You might ask: “Is ACh fundamentally an attention-and-arousal mechanism, and learning-rate-change is piggybacking on that signal? Or is it fundamentally a learning-rate-change mechanism, and other attention-and-arousal-related things are piggybacking on that?” My answer is: I’m not sure that question even has an answer, and if it does, I don’t think it matters.
(Thanks Adam Marblestone for comments on a draft.)