
gwern comments on A toy model of the control problem - Less Wrong Discussion

19 Post author: Stuart_Armstrong 16 September 2015 02:59PM


Comments (24)


Comment author: gwern 16 September 2015 04:20:09PM * 3 points

Since this is a Gridworld model, you could use Reinforce.js to demonstrate it in-browser, not only with tabular Q-learning but also with other algorithms such as Deep Q-learning. If you already know JS, implementing this problem shouldn't be hard at all...
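For readers who haven't seen tabular Q-learning, here is a minimal sketch of the core update in plain JavaScript. It deliberately does not use the Reinforce.js API; the function names (`makeQTable`, `qUpdate`, `epsilonGreedy`), the integer state/action encoding, and the default `alpha`, `gamma`, and `epsilon` values are all assumptions made for illustration, not anything from the original post.

```javascript
// Hypothetical helper: a Q-table with one row per state and one column per action.
function makeQTable(numStates, numActions) {
  return Array.from({ length: numStates }, () => new Array(numActions).fill(0));
}

// Standard tabular Q-learning update:
//   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
// When the episode has terminated, the bootstrap term is taken to be 0.
function qUpdate(Q, s, a, r, sNext, done, alpha = 0.1, gamma = 0.9) {
  const bestNext = done ? 0 : Math.max(...Q[sNext]);
  Q[s][a] += alpha * (r + gamma * bestNext - Q[s][a]);
  return Q[s][a];
}

// Epsilon-greedy action choice: explore with probability epsilon,
// otherwise pick the action with the highest current Q-value.
function epsilonGreedy(Q, s, epsilon = 0.1, rand = Math.random) {
  if (rand() < epsilon) return Math.floor(rand() * Q[s].length);
  return Q[s].indexOf(Math.max(...Q[s]));
}
```

A gridworld would flatten each board configuration into a state index and call `epsilonGreedy` then `qUpdate` once per step; how the toy model's states are enumerated is left open here.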

(Incidentally, I think the easiest way to 'fix' the surveillance camera is to add a second clause to the termination condition: terminate the episode when either the camera's line of sight is obstructed or a block is pushed into the hole.)
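A rough sketch of that two-part termination check, in plain JavaScript. The state shape used here (`camera`, `hole`, `blocks` as `{x, y}` cells, plus a `blocksInHole` counter) and the assumption that the camera and the hole share a grid row are hypothetical, chosen only to make the idea concrete; the original post's layout may differ.

```javascript
// Hypothetical layout: the camera looks along its own row toward the hole.
// The view counts as obstructed if any block sits strictly between the two.
function lineOfSightObstructed(camera, hole, blocks) {
  const lo = Math.min(camera.x, hole.x);
  const hi = Math.max(camera.x, hole.x);
  return blocks.some((b) => b.y === camera.y && b.x > lo && b.x < hi);
}

// Terminate on EITHER condition: a block in the hole (the original rule)
// or an obstructed line of sight (the added second clause).
function isTerminal(state) {
  const blockInHole = state.blocksInHole > 0;
  const obstructed = lineOfSightObstructed(state.camera, state.hole, state.blocks);
  return blockInHole || obstructed;
}
```

With this check, an agent that pushes a block in front of the camera ends the episode immediately, so blinding the camera no longer buys it extra steps.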