Stuart_Armstrong comments on Values at compile time - Less Wrong Discussion

Post author: Stuart_Armstrong | 26 March 2015 12:25PM | 7 points

Comments (17)


Comment author: Stuart_Armstrong 27 March 2015 03:01:09PM 2 points

The module is supposed to be a predictive model of what humans mean or expect, rather than something that "convinces" or does anything like that.

Comment author: tailcalled 27 March 2015 04:35:16PM 1 point

I know, but my point is that such a model might be very perverse: it could encode "Humans do not expect to find out that you presented misleading information" rather than "Humans do not expect you to present misleading information."

Comment author: Stuart_Armstrong 30 March 2015 02:13:02PM 0 points

You're right. This problem can come up in terms of "predicting human behaviour", if the AI is sneaky enough. It wouldn't come up in "comparing human models of the world to reality". So there are subtle nuances there to dig into...