The idea that the cortex or cerebellum, for example, can be described as "general purpose re-programmable hardware" is lacking in both clarity and support.
"General purpose learning hardware" is perhaps better. I used "re-programmable" as an analogy to an FPGA.
However, in a literal sense the brain can learn to use simple paper-and-pencil tools as an extended memory, and can learn to emulate a Turing machine. Given huge amounts of time, the brain could literally run Windows.
And more to the point, programmers ultimately rely on the ability of our brains to simulate/run little sections of code. So in a more practical, literal sense, all of the code of Windows first ran on human brains.
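To make the Turing-machine point concrete, here is a minimal sketch (Python is my choice here; the machine and code are illustrative, not from the article) of exactly the kind of table-lookup procedure a person could execute by hand, with paper as the tape:

```python
# Minimal Turing machine simulator: the same table-lookup steps a person
# could carry out by hand with paper (tape) and pencil (head + state).
# The example machine appends a '1' to a unary string, i.e. computes n+1.

def run_tm(tape, rules, state="start", head=0, max_steps=1000):
    tape = dict(enumerate(tape))            # sparse tape, blank = "_"
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(head, "_")
        write, move, state = rules[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    return "".join(tape[i] for i in sorted(tape))

# (state, read) -> (write, move, next_state)
rules = {
    ("start", "1"): ("1", "R", "start"),   # scan right over the 1s
    ("start", "_"): ("1", "R", "halt"),    # write one more 1, then halt
}

print(run_tm("111", rules))  # -> "1111"  (unary 3 + 1 = 4)
```

Each step is a single table lookup and a pencil mark, which is why the claim is about time and patience rather than about any missing capability.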
You seem to be saying that the cortex is a universal reinforcement learning machine
You seem to be hung up on reinforcement learning. I use some of that terminology to define a ULM because it is simply the most general framework - utility/value functions, etc. Also, there is some pretty strong evidence for RL in the brain, but the brain's learning mechanisms are complex - more so than any current ML system. I hope I conveyed that in the article.
Learning in the lower sensory cortices in particular can also be modeled well by unsupervised learning, and I linked to some articles showing how UL models can reproduce sensory cortex features. UL can be viewed as a potentially reasonable way to approximate the ideal target update, especially for lower sensory cortex, which is far (in a network-depth sense) from any top-down signals from the reward system. The papers I linked about approximate Bayesian learning and target propagation in particular can help put it all into perspective.
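As a toy illustration of the flavor of model involved (my own minimal example, not one of the linked papers' models): Oja's Hebbian rule, a purely local unsupervised update that drives a neuron's weights toward the leading principal component of its input statistics, with no reward signal anywhere in the loop:

```python
import numpy as np

# Toy unsupervised learning in the spirit of the linked models (not any of
# them specifically): Oja's rule, a local Hebbian update that pulls a
# neuron's weight vector toward the top principal component of its inputs -
# structure extracted from raw input statistics, with no labels or reward.

rng = np.random.default_rng(0)
# Correlated 2-D "sensory" input: most variance along the (1, 1) direction.
cov = np.array([[3.0, 2.0], [2.0, 3.0]])
x = rng.multivariate_normal([0, 0], cov, size=5000)

w = rng.normal(size=2)
lr = 0.01
for xi in x:
    y = w @ xi                       # neuron's response
    w += lr * y * (xi - y * w)       # Oja's rule: Hebb term + weight decay

print(w / np.linalg.norm(w))         # ~ +/- [0.707, 0.707], the top eigenvector
```

Scaled up (sparse coding, autoencoders), this family of models produces the Gabor-like receptive fields observed in V1, which is the sense in which UL "reproduces sensory cortex features."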
clear evidence that we have already found a reinforcement learning machine in the brain.
Well, the article summarizes the considerable evidence that the brain is some sort of approximate universal learning machine. I suspect that you have a particular idea of RL that is less than fully general.
Upon consideration, I changed my own usage of "Universal Reinforcement Learning Machine" to "Universal Learning Machine".
The several remaining uses of "reinforcement learning" are now confined to the context of the BG and the reward circuitry.
Again, we are probably talking about very different conceptions of RL. So to be clear: I summarized my general viewpoint of a ULM. I believe it is an extremely general model, one that covers essentially any kind of universal learning agent. The agent optimizes/steers the future according to some sort of utility function (which is extremely general), and self-optimization emerges naturally just by including the agent itself as part of the system to be optimized.
Do you have a conception of a learning agent which does not fit into that framework?
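Concretely, the loop I have in mind is roughly the following (a deliberately tiny toy of my own, not code from the article; the self-modeling part is omitted, only the shape of the loop matters):

```python
import random

# A minimal concrete instance of the framework described above (my own toy):
# the agent's "model" is a table of value estimates, a utility function
# scores predicted outcomes, and acting + updating the model is the whole
# agent loop. Any learner that senses, updates an internal model, and steers
# by some preference ordering fits this schema.

random.seed(0)
true_payoff = {"left": 0.3, "right": 0.7}    # hidden world

value = {"left": 0.5, "right": 0.5}          # agent's learned model

def utility(predicted_reward):               # here: just the reward itself
    return predicted_reward

for t in range(1000):
    # Steer the future: pick the action with the best utility under the
    # model (with a little exploration so the model keeps improving).
    if random.random() < 0.1:
        action = random.choice(["left", "right"])
    else:
        action = max(value, key=lambda a: utility(value[a]))
    reward = 1.0 if random.random() < true_payoff[action] else 0.0
    value[action] += 0.05 * (reward - value[action])   # update the model

print(value)   # value["right"] ends up near 0.7, and the agent prefers it
```

Nothing in the schema cares whether the utility is an external reward, a prediction error, or a function of the agent's own internals.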
The evidence for RL in the brain - of the extremely general form I described - is indeed very strong, simply because any type of learning is just a special case of universal learning. Taboo 'reinforcement' if you want, and just replace it with "utility-driven learning".
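For example, here is that reduction in miniature (my own toy example): ordinary supervised learning is already "utility-driven learning" once you set utility = negative loss:

```python
# Supervised learning recast as utility-driven learning: fitting y = w*x by
# gradient ascent on utility = -loss is exactly gradient descent on the
# squared error. Data and constants are made up for illustration.

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]   # (x, y) pairs, y ~ 2x

def utility(w):
    return -sum((w * x - y) ** 2 for x, y in data)   # negative loss

w = 0.0
for _ in range(200):
    eps = 1e-5
    # numeric gradient ascent on utility (deliberately naive)
    grad = (utility(w + eps) - utility(w - eps)) / (2 * eps)
    w += 0.01 * grad

print(w)   # ~2.0: maximizing utility == minimizing loss
```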
AIXI specifically has a special reward channel, and perhaps you are thinking of that specific type of RL, which is much more specific than universal learning. I should perhaps clarify and/or remove the mention of AIXI.
A ULM - as I described it - does not have a reward channel like AIXI. It conceptually has a value and/or utility function, initially defined by some arbitrary function that takes the whole brain/model as input. In the case of the brain, the utility function is conceptual; in practice it is more directly encoded as a value function.
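The structural difference, in sketch form (illustrative signatures of my own devising, not Hutter's formalism or anything from the article):

```python
# AIXI-style: reward arrives on a distinguished scalar channel in each
# percept, and the agent maximizes the sum over that channel.
def aixi_return(percepts):
    # percepts: list of (observation, reward) pairs
    return sum(reward for _, reward in percepts)

# ULM-style: no special channel. Utility is an arbitrary function of the
# whole model/state; the fields below are stand-ins for illustration.
def ulm_utility(state):
    return state.get("competence", 0.0) - state.get("pain", 0.0)

print(aixi_return([("see food", 1.0), ("see wall", 0.0)]))   # 1.0
print(ulm_utility({"competence": 0.8, "pain": 0.1}))         # 0.7
```

In the ULM version there is no privileged input; the "reward system" is just part of the state the utility function reads.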
About the universality or otherwise of RL. Big topic.
There's no need to taboo "RL" because switching to utility-based learning does not solve the issue (and the issue I have in mind covers both).
See, this is the problem. It is hard for me to fight the idea that RL (or utility-driven learning) works, because I am forced to fight a negative: a space where something should be, but which is empty... namely, the empirical fact that Reinforcement Learning has never been made to work in the absence of some surrounding machinery that prepares or simplifies the ground for the RL mechanism.
It is a naked fact about traditional AI that it puts such an emphasis on the concept of expected utility calculations without any guarantee that a utility function can be laid on the world in such a way that all and only the intelligent actions in that world are captured by a maximization of that quantity. It is a scandalously unjustified assumption, made very hard to attack by the fact that it is repeated so frequently that everyone believes it to be true just because everyone else believes it.
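Written out, the assumption I am refusing to grant is roughly this (my own formalization, offered without proof, because the absence of a proof is exactly the point):

$$\exists\, U : \mathcal{H} \to \mathbb{R} \ \text{ such that }\ \pi \in \arg\max_{\pi'} \mathbb{E}_{\pi'}\!\left[ U(h) \right] \iff \pi \text{ is intelligent}$$

where $\mathcal{H}$ is the space of world-histories and $\pi$ ranges over policies. The "all and only" above is the $\iff$, and that biconditional is the part for which nobody has ever exhibited a $U$.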
If anyone ever produced a proof of why it should work, there would be a there there, and I could undermine it. But... not so much!
About AIXI and my conversation with Marcus: that was actually about the general concept of RL and utility-driven systems, not anything specific to AIXI. We circled around until we reached the final crux of the matter, and his last stand (before we went to the conference banquet) was: "Yes, it all comes down to whether you believe in the intrinsic reasonableness of the idea that there exists a utility function which, when maximized, yields intelligent behavior... but that IS reasonable... isn't it?"
My response was "So you do agree that that is where the buck stops: I have to buy the reasonableness of that idea, and there is no proof on the table for why I SHOULD buy it, no?"
Hutter: "Yes."
Me: "No matter how reasonable it seems, I don't buy it"
His answer was to laugh and spread his arms wide. And at that point we went to the dinner and changed to small talk. :-)