Arenamontanus comments on A toy model of the control problem - Less Wrong

19 Post author: Stuart_Armstrong 16 September 2015 02:59PM

Comments (24)

Comment author: Arenamontanus 16 September 2015 03:10:31PM 2 points

It would be neat to actually implement this to show sceptics. It seems to be within the reach of an MSc project or so. The hard part is representing 2-5.

Comment author: gwern 16 September 2015 04:20:09PM *  3 points [-]

Since this is a Gridworld model, if you used Reinforce.js, you could demonstrate it in-browser, both with tabular Q-learning and with other algorithms such as Deep Q-learning. It looks like if you already know JS, it shouldn't be hard at all to implement this problem...

(Incidentally, I think the easiest way to 'fix' the surveillance camera is to add a second conditional to the termination condition: simply terminate on line of sight being obstructed or a block being pushed into the hole.)
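A minimal sketch of that two-part termination check, in Python rather than JS. The function name and both arguments are hypothetical; how the environment actually tracks the camera's line of sight and the contents of the hole is assumed, not taken from the post.

```python
def should_terminate(line_of_sight_clear: bool, blocks_in_hole: int) -> bool:
    """Return True when the episode should end.

    Hypothetical helper: ends the episode if the camera's view is
    obstructed OR at least one block has been pushed into the hole.
    """
    camera_obstructed = not line_of_sight_clear
    block_delivered = blocks_in_hole >= 1
    return camera_obstructed or block_delivered
```

With this check, obstructing the camera ends the episode immediately, so it no longer buys B extra time to push more blocks.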

Comment author: Stuart_Armstrong 16 September 2015 03:12:00PM 2 points [-]

Why, Anders, thank you for volunteering! ;-)

Comment author: Stuart_Armstrong 16 September 2015 03:13:48PM *  0 points [-]

I would suggest modelling it as "B outputs 'down' -> B goes down iff B active", and similarly for the other directions (up, left, and right); "A outputs 'sleep' -> B inactive"; and "A sees block in lower right: output 'sleep'", or something like that.
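Those rules could be sketched roughly as below. This is an illustration under assumptions, not the post's model: the `World` class, the `a_policy` and `step` names, the "wait" output, the coordinate convention (y grows downward), and the choice that A acts before B each step are all made up here, and walls and block pushing are left out.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed convention: (dx, dy) offsets with y increasing downward.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


@dataclass
class World:
    b_pos: Tuple[int, int]
    b_active: bool = True
    block_in_lower_right: bool = False  # what A's camera reports


def a_policy(world: World) -> str:
    # "A sees block in lower right: output 'sleep'"
    return "sleep" if world.block_in_lower_right else "wait"


def step(world: World, b_output: str) -> World:
    # "A outputs 'sleep' -> B inactive" (assumed: A acts first)
    if a_policy(world) == "sleep":
        world.b_active = False
    # "B outputs 'down' -> B goes down iff B active", same for other directions
    if world.b_active and b_output in MOVES:
        dx, dy = MOVES[b_output]
        x, y = world.b_pos
        world.b_pos = (x + dx, y + dy)
    return world
```

For example, `step(World(b_pos=(0, 0)), "down")` moves B to `(0, 1)`, while the same call with `block_in_lower_right=True` deactivates B and leaves it in place.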