Johnicholas comments on Reinforcement Learning: A Non-Standard Introduction (Part 2) - Less Wrong

9 points | Post author: royf 02 August 2012 08:17AM


Comment author: Johnicholas 02 August 2012 03:25:12PM 6 points

It might be valuable to point out that nothing about this is reinforcement learning yet.

Comment author: royf 03 August 2012 10:02:29PM * 1 point

I'm not sure why you say this.

Please remember that this introduction is non-standard, so you may need to be an expert on standard RL to see the connection. And while some parts are not in place yet, this post does introduce what I consider to be the most important part of the setting of RL.

So I hope we're not arguing over definitions here. If you expand on your meaning of the term, I may be able to help you see the connection. Or we may find that we're using the same term for different things altogether.

I should also explain why I'm giving a non-standard introduction, where a standard one would be more helpful in communicating with others who may know it. The main reason is that this will hopefully allow me to describe some non-standard and very interesting conclusions.

Comment author: RichardKennaway 04 August 2012 07:35:48AM * 1 point

> Please remember that this introduction is non-standard, so you may need to be an expert on standard RL to see the connection.

But since we are not, we cannot.

> And while some parts are not in place yet, this post does introduce what I consider to be the most important part of the setting of RL.

Well, there you are. The setting. Not actual RL. So that's two purely preliminary posts so far. When does the main act come on -- the R and the L?

Comment author: Johnicholas 03 August 2012 10:19:20PM 0 points

As I understand it, you're dividing the agent from the world; once you introduce a reward signal, you'll be able to call it reinforcement learning. However, until you introduce a reward signal, you're not doing specifically reinforcement learning - everything applies just as well to any other kind of agent, such as a classical planner.
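The division being described can be sketched as the generic agent-environment interaction loop. This is a hypothetical illustration (all names are made up, not from the post): the loop itself describes any agent at all; the reward term is the one ingredient that makes the setting specifically RL.

```python
def interact(env_step, policy, initial_obs, horizon):
    """Generic agent-environment loop. Rewards are collected, but
    nothing here learns yet -- drop the reward and the same loop
    describes a classical planner or any other agent."""
    obs, total_reward = initial_obs, 0.0
    for _ in range(horizon):
        action = policy(obs)
        obs, reward = env_step(action)  # reward: the RL-specific signal
        total_reward += reward
    return total_reward

def make_chain_env(start=0, goal=3):
    """Toy environment on the integer line (hypothetical example):
    actions are +1/-1 steps, reward 1.0 on visiting the goal."""
    state = {"pos": start}
    def step(action):
        state["pos"] += action
        reward = 1.0 if state["pos"] == goal else 0.0
        return state["pos"], reward
    return step

total = interact(make_chain_env(), policy=lambda obs: +1,
                 initial_obs=0, horizon=5)
# total == 1.0: the reward is collected once, when pos first hits the goal
```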

Comment author: royf 04 August 2012 01:20:33AM * 0 points

That's an excellent point. Of course one cannot introduce RL without talking about the reward signal, and I never intended to skip it.

To me, however, the defining feature of RL is the structure of the solution space, described in this post. To you, it's the existence of a reward signal. I'm not sure that debating this difference of opinion is the best use of our time at this point. I do hope to share my reasons in future posts, if only because they should be interesting in themselves.

As for your last point: RL is indeed a very general setting, and classical planning can easily be formulated in RL terms.
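To illustrate that last claim (a hypothetical sketch, not from the post): a deterministic shortest-path planning problem becomes an MDP by assigning reward -1 per step and making the goal absorbing; the optimal value of a state is then minus its distance to the goal, and acting greedily on those values reproduces the plan.

```python
# Hypothetical sketch: classical planning recast in RL terms.
# States and transitions form a deterministic graph; reward is -1 per
# step, 0 once the absorbing goal is reached. Value iteration then
# recovers V*(s) = -(shortest plan length from s).

def plan_values(graph, goal, sweeps=50):
    """graph: state -> {action: next_state}; deterministic transitions."""
    V = {s: 0.0 for s in graph}
    for _ in range(sweeps):
        for s in graph:
            if s == goal:
                continue  # absorbing goal: value stays 0
            V[s] = max(-1.0 + V[nxt] for nxt in graph[s].values())
    return V

# Tiny line graph: A -> B -> G (goal), with a back edge B -> A.
graph = {
    "A": {"right": "B"},
    "B": {"right": "G", "left": "A"},
    "G": {"stay": "G"},
}
V = plan_values(graph, goal="G")
# V["A"] == -2.0 and V["B"] == -1.0: minus the shortest plan lengths
```

The greedy action at each state (the one maximizing -1 + V[next]) is exactly the next step of the classical plan, which is the sense in which planning is a special case of the RL setting.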