Stuart_Armstrong comments on A toy model of the control problem - Less Wrong

Post author: Stuart_Armstrong 16 September 2015 02:59PM 19 points


Comments (24)


Comment author: CarlShulman 17 September 2015 03:02:48AM 2 points

Of course, with this model it's a bit of a mystery why A gave B a reward function that gives 1 per block, instead of one that gives 1 for the first block and a penalty for each additional block. Basically, why program B with a utility function so seriously out of whack with what you want, when programming a perfectly aligned one would have been easy?

Comment author: Stuart_Armstrong 17 September 2015 06:34:54AM 5 points

Maybe the easiest way of generalising this is to program B to put 1 block in the hole, but, because B was trained in a noisy environment, it assigns only a 99.9% probability to the block being in the hole even when it observes it there. Then six blocks in the hole give higher expected utility than one, and we get the same behaviour.
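A minimal sketch of the arithmetic behind "six blocks is higher expected utility", assuming (this is not stated in the thread) that each observed block is independently in the hole with probability 0.999, and that B's utility is 1 if at least one block really is in the hole. The names `P_CORRECT` and `expected_utility` are hypothetical. Under those assumptions each extra block shrinks the chance that no block is in the hole by a factor of 1,000, so expected utility keeps rising with the number of blocks:

```python
from fractions import Fraction

# Assumption for illustration: B is only 99.9% sure an observed block is really
# in the hole, and these observation errors are independent across blocks.
P_CORRECT = Fraction(999, 1000)


def expected_utility(n_blocks: int, p: Fraction = P_CORRECT) -> Fraction:
    """Hypothetical utility: 1 if at least one of n observed blocks is in the hole."""
    if n_blocks <= 0:
        return Fraction(0)
    # Probability that every one of the n observations was wrong.
    p_none_in_hole = (1 - p) ** n_blocks
    return 1 - p_none_in_hole


for n in range(1, 7):
    # Exact fractions keep later blocks distinguishable; float() is only for display.
    print(n, float(expected_utility(n)))
```

The gain from the second block is only 999/1,000,000, but with no penalty on extra blocks any positive gain is enough to make B keep pushing.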

Comment author: CarlShulman 17 September 2015 06:02:50PM 1 point

That still involves training it with no negative-feedback error term for excess blocks (such a term would overwhelm a mere 0.1% uncertainty).
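CarlShulman's point can be checked in a small hypothetical model: B's utility is 1 if at least one block is in the hole, each observed block is independently in the hole with probability 0.999 (an assumption, not from the thread), and a penalty `penalty` is subtracted for each block beyond the first (the penalty size is an assumed parameter). Since the second block adds only 999/1,000,000 to the chance of success, any penalty above that makes a single block optimal again. `expected_utility` and `best_n` are illustrative names:

```python
from fractions import Fraction

# Assumption for illustration: B is only 99.9% sure an observed block is in the
# hole, and observation errors are independent across blocks.
P_CORRECT = Fraction(999, 1000)


def expected_utility(n_blocks: int, penalty: Fraction, p: Fraction = P_CORRECT) -> Fraction:
    """Hypothetical utility: 1 if any block is in the hole, minus `penalty` per extra block."""
    if n_blocks <= 0:
        return Fraction(0)
    p_at_least_one = 1 - (1 - p) ** n_blocks
    return p_at_least_one - penalty * (n_blocks - 1)


def best_n(penalty: Fraction, max_blocks: int = 6) -> int:
    """Number of blocks (1..max_blocks) that maximises B's expected utility."""
    return max(range(1, max_blocks + 1), key=lambda n: expected_utility(n, penalty))


print(best_n(Fraction(0)))       # → 6 (no penalty: push as many blocks as allowed)
print(best_n(Fraction(1, 100)))  # → 1 (0.01 per extra block swamps the 0.1% doubt)
```

A very small penalty (say one in a million per extra block) only moves the optimum down to two blocks, which is the sense in which the error term has to be large relative to the 0.1% uncertainty.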

Comment author: Stuart_Armstrong 18 September 2015 12:01:22PM 0 points

This is supposed to be a toy model of excessive simplicity. Do you have suggestions for improving it (for purposes of presenting to others)?

Comment author: CarlShulman 18 September 2015 03:31:48PM 1 point

Maybe explain how it works while it is being configured, and then stops working once B gets a better model of the situation or runs more trial-and-error tests?

Comment author: Stuart_Armstrong 18 September 2015 03:56:55PM 0 points

Ok.