CarlShulman comments on A toy model of the control problem - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (24)
Of course, with this model it's a bit of a mystery why A gave B a reward function that gives 1 per block, instead of one that gives 1 for the first block and a penalty for additional blocks. Basically, why program B with a utility function so seriously out of whack with what you want when programming one perfectly aligned would have been easy?
Maybe the easiest way of generalising this is programming B to put 1 block in the hole, but, because B was trained in a noisy environment, it assigns only a 99.9% probability to the block actually being in the hole when it observes that it is. Then putting six blocks in the hole has higher expected utility, and we get the same behaviour.
That still involves training it with no negative feedback error term for excess blocks (which would overwhelm a mere 0.1% uncertainty).
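To make the arithmetic in this exchange concrete, here is a rough sketch. It rests on assumptions that are not in the comments themselves: B gets utility 1 if at least one block is really in the hole, each "block is in the hole" observation is independently correct with probability 99.9%, and the optional penalty is a fixed cost per block beyond the first. The names `expected_utility` and `best_block_count` are made up for illustration.

```python
from fractions import Fraction

# 99.9% figure from the thread; exact fractions avoid float rounding near 1.
P_OBS = Fraction(999, 1000)


def expected_utility(n_blocks, penalty=Fraction(0)):
    """Expected utility of pushing n_blocks blocks into the hole.

    Assumption: utility is 1 if at least one block is truly in the hole,
    and each observation is independently wrong with probability 1 - P_OBS.
    `penalty` is a hypothetical cost charged per block beyond the first.
    """
    p_at_least_one = 1 - (1 - P_OBS) ** n_blocks
    return p_at_least_one - penalty * (n_blocks - 1)


def best_block_count(max_blocks=6, penalty=Fraction(0)):
    """Block count in 1..max_blocks with the highest expected utility."""
    return max(
        range(1, max_blocks + 1),
        key=lambda n: expected_utility(n, penalty),
    )


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, float(expected_utility(n)), float(expected_utility(n, Fraction(1, 100))))
    print("no penalty:", best_block_count())
    print("penalty 0.01 per extra block:", best_block_count(penalty=Fraction(1, 100)))
```

Under these assumptions, with no penalty every extra block adds a little expected utility, so B pushes all six. The second block only adds 0.000999, so any per-block penalty larger than that already makes a single block the best choice, which is the sense in which an error term for excess blocks would overwhelm the 0.1% uncertainty.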
This is supposed to be a toy model of excessive simplicity. Do you have suggestions for improving it (for purposes of presenting to others)?
Maybe explain how it works when being configured, and then stops working when B gets a better model of the situation/runs more trial-and-error trials?
Ok.