You're looking at Less Wrong's discussion board. This includes all posts, including those that haven't been promoted to the front page yet. For more information, see About Less Wrong.

Kyre comments on A toy model of the control problem - Less Wrong Discussion

19 Post author: Stuart_Armstrong 16 September 2015 02:59PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (24)

You are viewing a single comment's thread. Show more comments above.

Comment author: CarlShulman 17 September 2015 03:02:48AM 2 points [-]

Of course, with this model it's a bit of a mystery why A gave B a reward function that gives 1 per block, instead of one that gives 1 for the first block and a penalty for additional blocks. Basically, why program B with a utility function so seriously out of whack with what you want when programming one perfectly aligned would have been easy?

Comment author: Kyre 17 September 2015 05:42:58AM 7 points [-]

It's a trade-off. The example is simple enough that the alignment problem is really easy to see, but it also means that it is easy to shrug it off and say "duh, just the use obvious correct utility function for B".

Perhaps you could follow it up with an example with more complex mechanics (and or more complex goal for A) where the bad strategy for B is not so obvious. You then invite the reader to contemplate the difficulty of the alignment problem as the complexity approaches that of the real world.