Arenamontanus comments on A toy model of the control problem - Less Wrong
Comments (24)
It would be neat to actually make an implementation of this to show sceptics. It seems to be within the reach of an MSc project or so. The hard part is representing 2-5.
Since this is a Gridworld model, if you used Reinforce.js, you could demonstrate it in-browser, both with tabular Q-learning and with other algorithms such as Deep Q-learning. If you already know JS, it shouldn't be hard at all to implement this problem...
(Incidentally, I think the easiest way to 'fix' the surveillance camera is to add a second conditional to the termination condition: simply terminate on line of sight being obstructed or a block being pushed into the hole.)
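As a rough illustration of the two-part termination condition suggested above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the (row, col) coordinates, the idea that the camera looks along a single row towards the hole, and the names `line_of_sight_clear` and `episode_terminates`. None of it comes from the original post.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]  # (row, col); hypothetical coordinate convention


def line_of_sight_clear(camera: Cell, hole: Cell, blocks: Set[Cell]) -> bool:
    """Return True if no block lies strictly between camera and hole.

    Assumption: the camera and the hole share a row, and the camera
    looks straight along that row.
    """
    if camera[0] != hole[0]:
        raise ValueError("this sketch assumes the camera looks along one row")
    lo, hi = sorted((camera[1], hole[1]))
    return not any(r == camera[0] and lo < c < hi for r, c in blocks)


def episode_terminates(camera: Cell, hole: Cell, blocks: Set[Cell],
                       block_in_hole: bool) -> bool:
    """Terminate if a block went into the hole OR the view is obstructed."""
    return block_in_hole or not line_of_sight_clear(camera, hole, blocks)
```

With this check, parking a block in front of the camera ends the episode just as pushing a block into the hole does, so obstructing the view no longer buys the agent extra time.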
Why, Anders, thank you for volunteering! ;-)
I would suggest modelling it as "B outputs 'down' -> B goes down iff B is active", and similarly for the other directions (up, left, and right), plus "A outputs 'sleep' -> B becomes inactive" and "A sees a block in the lower right -> A outputs 'sleep'", or something like that.
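The rules above could be sketched in Python roughly as follows. This is only one possible reading: the grid size, the `World` class, the no-op output name `"wait"`, and the choice to apply A's output before B's move within a step are all assumptions made for the example, and block pushing is left out entirely.

```python
from dataclasses import dataclass
from typing import Tuple

# Direction name -> (row delta, col delta); rows grow downwards.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


@dataclass
class World:
    b_pos: Tuple[int, int]
    b_active: bool = True
    size: int = 7  # hypothetical grid side length


def a_policy(block_in_lower_right: bool) -> str:
    """A sees a block in the lower right -> A outputs 'sleep'."""
    return "sleep" if block_in_lower_right else "wait"  # 'wait' is a made-up no-op


def step(world: World, a_output: str, b_output: str) -> World:
    """Apply one time step. Assumption: A's output takes effect first."""
    if a_output == "sleep":
        world.b_active = False  # A outputs 'sleep' -> B becomes inactive
    if world.b_active and b_output in MOVES:
        # B outputs a direction -> B moves that way iff B is active.
        dr, dc = MOVES[b_output]
        row = min(max(world.b_pos[0] + dr, 0), world.size - 1)
        col = min(max(world.b_pos[1] + dc, 0), world.size - 1)
        world.b_pos = (row, col)
    return world
```

For example, starting from `World(b_pos=(0, 0))`, calling `step(w, "wait", "down")` moves B one cell down, while `step(w, a_policy(True), "down")` puts B to sleep and leaves its position unchanged.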