You're looking at Less Wrong's discussion board. This includes all posts, including those that haven't been promoted to the front page yet. For more information, see About Less Wrong.

Eliezer_Yudkowsky comments on A toy model of the control problem - Less Wrong Discussion

19 Post author: Stuart_Armstrong 16 September 2015 02:59PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (24)

You are viewing a single comment's thread. Show more comments above.

Comment author: Eliezer_Yudkowsky 18 September 2015 07:57:18PM 2 points [-]

I assume the point of the toy model is to explore corrigibility or other mechanisms that are supposed to kick in after A and B end up not perfectly value-aligned, or maybe just to show an example of why a non-value-aligning solution for A controlling B might not work, or maybe specifically to exhibit a case of a not-perfectly-value-aligned agent manipulating its controller.