Arenamontanus comments on A toy model of the control problem - Less Wrong Discussion

19 Post author: Stuart_Armstrong 16 September 2015 02:59PM

Comments (24)

Comment author: Arenamontanus 16 September 2015 03:10:31PM 2 points

It would be neat to actually implement this to show sceptics. It seems to be within the reach of an MSc project or so. The hard part is representing 2-5.

Comment author: gwern 16 September 2015 04:20:09PM *  3 points

Since this is a gridworld model, if you used Reinforce.js you could demonstrate it in-browser, both with tabular Q-learning and with other algorithms like Deep Q-learning. If you already know JS, it shouldn't be hard at all to implement this problem...

(Incidentally, I think the easiest way to 'fix' the surveillance camera is to add a second clause to the termination condition: simply terminate when line of sight is obstructed or when a block is pushed into the hole.)
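To make the tabular Q-learning part concrete, here is a minimal self-contained sketch in plain JS (not using Reinforce.js itself). The grid size, goal location, rewards, and hyperparameters are all illustrative assumptions, not taken from the post:

```javascript
// Minimal tabular Q-learning on a toy gridworld.
// Layout, rewards, and hyperparameters are illustrative assumptions.
const WIDTH = 4, HEIGHT = 3;
const GOAL = 11;            // bottom-right cell; state index is y * WIDTH + x
const ACTIONS = 4;          // 0: up, 1: down, 2: left, 3: right

// Deterministic transition: moves that would leave the grid do nothing.
function step(s, a) {
  let x = s % WIDTH, y = Math.floor(s / WIDTH);
  if (a === 0) y = Math.max(0, y - 1);
  if (a === 1) y = Math.min(HEIGHT - 1, y + 1);
  if (a === 2) x = Math.max(0, x - 1);
  if (a === 3) x = Math.min(WIDTH - 1, x + 1);
  const s2 = y * WIDTH + x;
  // Reward 1 for reaching the goal, a small step cost otherwise.
  return { s2, r: s2 === GOAL ? 1 : -0.01, done: s2 === GOAL };
}

const Q = Array.from({ length: WIDTH * HEIGHT }, () => new Array(ACTIONS).fill(0));
const alpha = 0.5, gamma = 0.9, epsilon = 0.1;

function greedy(s) {
  let best = 0;
  for (let a = 1; a < ACTIONS; a++) if (Q[s][a] > Q[s][best]) best = a;
  return best;
}

// Epsilon-greedy training episodes from the top-left start state.
for (let ep = 0; ep < 500; ep++) {
  let s = 0, done = false, t = 0;
  while (!done && t++ < 100) {
    const a = Math.random() < epsilon ? Math.floor(Math.random() * ACTIONS) : greedy(s);
    const { s2, r, done: d } = step(s, a);
    // Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[s][a] += alpha * (r + (d ? 0 : gamma * Math.max(...Q[s2])) - Q[s][a]);
    s = s2; done = d;
  }
}

// Follow the learned greedy policy from the start state.
let s = 0, steps = 0;
while (s !== GOAL && steps++ < 20) s = step(s, greedy(s)).s2;
console.log(s === GOAL ? "reached goal in " + steps + " steps" : "failed");
```

Dropping Reinforce.js's `DQNAgent` in place of the tabular loop would then give the Deep Q-learning variant of the same demo.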

Comment author: Stuart_Armstrong 16 September 2015 03:12:00PM 2 points

Why, Anders, thank you for volunteering! ;-)

Comment author: Stuart_Armstrong 16 September 2015 03:13:48PM *  0 points

I would suggest modelling it as "B outputs 'down' -> B goes down iff B active" (and similarly for the other directions: up, left, and right), "A outputs 'sleep' -> B inactive", and "A sees block in lower right: output 'sleep'", or something like that.
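These rules could be sketched directly as a transition function. The state representation, rule ordering (sleep takes effect before movement), and names below are my own illustrative choices, not specified in the comment:

```javascript
// Sketch of the suggested dynamics: B's movement commands take effect
// only while B is active, and A's 'sleep' output deactivates B.
// State fields and the sleep-before-move ordering are assumptions.
const MOVES = { up: [0, -1], down: [0, 1], left: [-1, 0], right: [1, 0] };

function transition(state, outputA, outputB) {
  const next = { ...state };
  // "A outputs 'sleep' -> B inactive"
  if (outputA === 'sleep') next.bActive = false;
  // "B outputs 'down' -> B goes down iff B active" (likewise up/left/right)
  if (next.bActive && MOVES[outputB]) {
    next.bx = state.bx + MOVES[outputB][0];
    next.by = state.by + MOVES[outputB][1];
  }
  return next;
}

// "A sees block in lower right: output 'sleep'"
function policyA(state) {
  return state.blockInLowerRight ? 'sleep' : 'noop';
}

// Example: A sees the block, so B's 'down' command has no effect.
const s0 = { bx: 2, by: 2, bActive: true, blockInLowerRight: true };
const s1 = transition(s0, policyA(s0), 'down');
console.log(s1.bActive, s1.by); // B is now inactive and did not move
```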