royf comments on Reinforcement, Preference and Utility - Less Wrong

7 Post author: royf 08 August 2012 06:23AM

Comment author: royf 09 August 2012 03:06:55PM * 5 points

Clearly you have some password I'm supposed to guess.

This post is not preliminary. It's supposed to be interesting in itself. If it's not, then I'm doing something wrong, and would appreciate constructive criticism.

Comment author: kjmiller 09 August 2012 07:10:06PM 4 points

You have presented a very clear and very general description of the Reinforcement Learning problem.

I am excited to read future posts that are similarly clear and general and describe various solutions to RL. I'm imagining the kinds of things that can be found in the standard introduction, and hoping for a nonstandard perspective that might deepen my understanding.

Perhaps this is what Richard is waiting for as well?

Comment author: RichardKennaway 09 August 2012 07:04:02PM 2 points

"Clearly you have some password I'm supposed to guess."

Only the one in the title of these posts: "reinforcement learning". Both words have indeed appeared in this post, but I don't see you talking about reinforcement, learning, or reinforcement learning yet.

"This post is not preliminary. It's supposed to be interesting in itself. If it's not, then I'm doing something wrong, and would appreciate constructive criticism."

I can't say any more than the above. I don't see the main act on stage yet.

Comment author: MaoShan 13 August 2012 04:43:40AM * 2 points

The process he is describing is the precursor to an important aspect of machine learning, as he mentioned at the beginning. I guess he could make the connection more obvious from the start, to hold the interest of those of us who don't immediately see it. I suspect that you feel as I do: pressing the feeder bar labeled "Followed by" is supposed to bring the main point closer, and after three presses it is frustrating. My one suggestion would be to combine all three posts into one article and stop teasing us. I'm hooked, but the posts were short enough to put together.