eli_sennesh comments on The Brain as a Universal Learning Machine - Less Wrong

82 Post author: jacob_cannell 24 June 2015 09:45PM




Comment author: jacob_cannell 22 June 2015 08:34:19PM 2 points

I am not sure why you say I am hung up on RL: you quoted that as the only mechanism to be discussed in the context, so I went with that.

Upon consideration, I changed my own usage of "Universal Reinforcement Learning Machine" to "Universal Learning Machine".

The several remaining uses of "reinforcement learning" are now confined to the context of the BG (basal ganglia) and the reward circuitry.

And you are (like many people) not correct to say that RL is the most general framework,

Again, we are probably talking about very different conceptions of RL. So to be clear, I summarized my general viewpoint of a ULM. I believe it is an extremely general model, one that basically covers any kind of universal learning agent. The agent optimizes/steers the future according to some sort of utility function (which is extremely general), and self-optimization emerges naturally just by including the agent itself as part of the system to be optimized.
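The framing above can be made concrete with a toy sketch. All names, the dynamics, and the particular utility function here are invented for illustration; the ULM claim is only that *some* arbitrary utility function over (predicted) futures drives the learning/acting loop:

```python
# Toy sketch of the ULM framing: an agent does one-step lookahead
# through a model of the world and picks the action whose predicted
# outcome scores highest under an arbitrary utility function.
# Everything here (the world, the utility) is a made-up example.

def utility(state):
    # Any function of the predicted world state will do; the framing
    # places no constraint on its form.
    return -abs(state - 7)  # "prefer being near position 7"

def step(state, action):
    # Trivial world dynamics: position moves by the chosen amount.
    return state + action

def choose_action(state, actions, model):
    # Optimize/steer the predicted future according to the utility.
    return max(actions, key=lambda a: utility(model(state, a)))

state, actions = 0, [-1, 0, 1]
for _ in range(10):
    a = choose_action(state, actions, step)
    state = step(state, a)

print(state)  # the agent has steered the world to the high-utility state, 7
```

Note there is no special "reward channel" anywhere: the utility function simply scores states, and in principle the scored state could include the agent's own machinery, which is where self-optimization comes in.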

Do you have a conception of a learning agent which does not fit into that framework?

or that there is good evidence for RL in the brain. That is a myth: the evidence is very poor indeed.

The evidence for RL in the brain - of the extremely general form I described - is indeed very strong, simply because any type of learning is just a special case of universal learning. Taboo 'reinforcement' if you want, and just replace with "utility driven learning".

AIXI specifically has a special reward channel, and perhaps you are thinking of that specific type of RL, which is much more specific than universal learning. I should perhaps clarify and/or remove the mention of AIXI.

A ULM - as I described - does not have a reward channel like AIXI. It just conceptually has a value and/or utility function, initially defined by some arbitrary function that conceptually takes the whole brain/model as input. In the case of the brain, the utility function is conceptual; in practice it is more directly encoded as a value function.

Comment author: Richard_Loosemore 23 June 2015 02:41:54AM 5 points

About the universality or otherwise of RL. Big topic.

There's no need to taboo "RL" because switching to utility-based learning does not solve the issue (and the issue I have in mind covers both).

See, this is the problem. It is hard for me to fight the idea that RL (or utility-driven learning) works, because I am forced to argue against a negative: a space where something should be, but which is empty ... namely, the empirical fact that Reinforcement Learning has never been made to work in the absence of some surrounding machinery that prepares or simplifies the ground for the RL mechanism.

It is a naked fact about traditional AI that it puts such an emphasis on the concept of expected utility calculations without any guarantee that a utility function can be laid on the world in such a way that all and only the intelligent actions in that world are captured by a maximization of that quantity. It is a scandalously unjustified assumption, made very hard to attack by the fact that it is repeated so frequently that everyone believes it to be true just because everyone else believes it.

If anyone ever produced a proof why it should work, there would be a there there, and I could undermine it. But .... not so much!

About AIXI and my conversation with Marcus: that was actually about the general concept of RL and utility-driven systems, not anything specific to AIXI. We circled around until we reached the final crux of the matter, and his last stand (before we went to the conference banquet) was "Yes, it all comes down to whether you believe in the intrinsic reasonableness of the idea that there exists a utility function which, when maximized, yields intelligent behavior ... but that IS reasonable, ... isn't it?"

My response was "So you do agree that that is where the buck stops: I have to buy the reasonableness of that idea, and there is no proof on the table for why I SHOULD buy it, no?"

Hutter: "Yes."

Me: "No matter how reasonable it seems, I don't buy it."

His answer was to laugh and spread his arms wide. And at that point we went to the dinner and changed to small talk. :-)

Comment author: [deleted] 27 June 2015 12:06:21AM 1 point

Wait wait wait. You didn't head to the dinner, drink some fine wine, and start raucously debating the same issue over again?

Bah, humbug!

Also, how do I get invited to these conferences again ;-)?

It is a scandalously unjustified assumption, made very hard to attack by the fact that it is repeated so frequently that everyone believes it to be true just because everyone else believes it.

Very true, at least regarding AI. Personally, my theory is that the brain does do reinforcement learning, but the "reward function" isn't a VNM-rational utility function, it's just something the body signals to the brain to say, "Hey, that world-state was great!" I can't imagine that Nature used something "mathematically coherent", but I can imagine it used something flagrantly incoherent but really dead simple to implement. Like, for instance, the amount of some chemical or another coming in from the body, to indicate satiety, or to relax after physical exertion, or to indicate orgasm, or something like that.
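A "flagrantly incoherent but dead simple" reward signal of that sort is easy to sketch. The variable names and weights below are invented; the point is only that a crude weighted sum of body signals is trivial to implement, yet fails to be a VNM utility over world-states, because the same external event scores differently depending on the body's current internal state:

```python
# Toy sketch (invented names and weights) of a bodily reward signal:
# a crude weighted sum of chemical-like variables. Easy to implement,
# but not a coherent utility function over world-states -- the value
# of the same event depends on current internal state.

def reward(satiety, exertion_relief, hunger_level):
    # Food is strongly rewarding when hungry, nearly neutral when sated.
    return satiety * hunger_level + 0.5 * exertion_relief

# The very same meal, evaluated under two different internal states:
meal_when_hungry = reward(satiety=1.0, exertion_relief=0.0, hunger_level=0.9)
meal_when_full   = reward(satiety=1.0, exertion_relief=0.0, hunger_level=0.1)

print(meal_when_hungry > meal_when_full)  # True: same event, different value
```

Any agent trained on such a signal would show "preference reversals" over identical world-states, which is exactly the kind of mathematical incoherence Nature plausibly never cared about.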

Comment author: Richard_Loosemore 30 June 2015 05:53:21PM 1 point

Hey, ya pays yer money and walk in the front door :-) AGI conferences run about $400 a ticket I think. Plus the airfare to Berlin (there's one happening in a couple of weeks, so get your skates on).

Re the possibility that the human system does do reinforcement learning .... fact is, if one frames the meaning of RL in a sufficiently loose way, the human cogsys absolutely DOES do RL, no doubt about it. Just as you described above.

But if you sit down and analyze what it means to make the claim that a system uses RL, it turns out that there is a world of difference between the two positions:

The system CAN BE DESCRIBED in such a way that there is reinforcement of actions/internal constructs that lead to positive outcomes in some way,

and

The system is controlled by a mechanism that explicitly represents (A) actions/internal constructs, (B) outcomes or expected outcomes, and (C) scalar linkages between the A and B entities ... and behavior is completely dominated by a mechanism that browses the A, B, and C in such a way as to modify one of the C linkages according to the co-occurrence of a B with an A.

The difference is that the second case turns the descriptive mechanism into an explicit mechanism.
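To make the second, explicit position concrete, here is a minimal sketch (all names invented) of a system that literally stores actions (A), outcomes (B), and scalar linkages (C), where learning is nothing but nudging a C value whenever a B co-occurs with an A, and behavior is nothing but browsing the stored linkages:

```python
from collections import defaultdict

# C: explicit scalar linkages between (action, outcome) pairs.
linkages = defaultdict(float)

def observe(action, outcome, good, lr=0.1):
    # The only learning rule: strengthen or weaken the A-B linkage
    # on each co-occurrence of an outcome with an action.
    linkages[(action, outcome)] += lr * (1.0 if good else -1.0)

def choose(actions, outcome_wanted):
    # Behavior is completely dominated by browsing the stored A, B, C.
    return max(actions, key=lambda a: linkages[(a, outcome_wanted)])

# Invented training episode: lever-pressing yields food, chain-pulling doesn't.
for _ in range(20):
    observe("press_lever", "food", good=True)
    observe("pull_chain", "food", good=False)

print(choose(["press_lever", "pull_chain"], "food"))  # press_lever
```

The descriptive position, by contrast, only claims that some system's behavior *can be summarized* as if such a table existed; it makes no commitment that anything like `linkages` is explicitly represented inside the mechanism.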

It's like Ptolemy's Epicycle model of the solar system. Was Ptolemy's fancy little wheels-within-wheels model a good descriptive model of planetary motion? You bet ya! Would it have been appropriate to elevate that model and say that the planets actually DID move on top of some epicycle-like mechanism? Heck no! As a functional model it was garbage, and it held back a scientific understanding of what was really going on for over a thousand years.

Same deal with RL. Our difficulty right now is that so many people slip back and forth between arguing for RL as a descriptive model (which is fine) and arguing for it as a functional model (which is disastrous, because that was tried in psychology for 30 years, and it never worked).