jacob_cannell comments on Concept Safety: Producing similar AI-human concept spaces - Less Wrong Discussion

31 Post author: Kaj_Sotala 14 April 2015 08:39PM

Comment author: jacob_cannell 17 April 2015 05:36:06PM

This is not how IRL works at all. The utility function does not come from a special reward channel controlled by a human. There is no button.

To reiterate my description earlier, IRL is based on inferring the unknown utility function of an agent given examples of the agent's behaviour in terms of observations and actions. The utility function is entirely an internal component of the model.
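To make that concrete, here is a toy sketch of the idea, not any particular published IRL algorithm. Everything in it is an assumption made for illustration: the five-state corridor, the two candidate utility functions, the Boltzmann-rational agent model, and the `GAMMA` and `BETA` constants. The point is only that the utility function appears as a hypothesis inside the model and is scored against observed (state, action) pairs; nothing in the code receives a reward signal from outside.

```python
import math

# Hypothetical toy world: a corridor of 5 states, the agent steps left or right.
N_STATES = 5
ACTIONS = (-1, +1)   # -1 = left, +1 = right
GAMMA = 0.9          # discount factor (assumed)
BETA = 5.0           # how close to optimal the observed agent is assumed to be


def step(state, action):
    """Deterministic transition, clipped at the ends of the corridor."""
    return min(max(state + action, 0), N_STATES - 1)


def q_values(utility, iters=100):
    """Q-values the agent would have if `utility` were its utility function."""
    v = [0.0] * N_STATES
    for _ in range(iters):  # plain value iteration
        v = [
            max(utility[step(s, a)] + GAMMA * v[step(s, a)] for a in ACTIONS)
            for s in range(N_STATES)
        ]
    return {
        (s, a): utility[step(s, a)] + GAMMA * v[step(s, a)]
        for s in range(N_STATES)
        for a in ACTIONS
    }


def log_likelihood(demos, utility):
    """Log-probability of the observed behaviour under a softmax policy."""
    q = q_values(utility)
    total = 0.0
    for s, a in demos:
        log_z = math.log(sum(math.exp(BETA * q[(s, b)]) for b in ACTIONS))
        total += BETA * q[(s, a)] - log_z
    return total


def posterior(demos, candidates):
    """Posterior over candidate utility functions, with a uniform prior."""
    logs = {name: log_likelihood(demos, u) for name, u in candidates.items()}
    top = max(logs.values())
    weights = {name: math.exp(l - top) for name, l in logs.items()}
    z = sum(weights.values())
    return {name: w / z for name, w in weights.items()}


# Two hypotheses about what the agent values (both made up for this example).
candidates = {
    "likes_left_end": [1.0, 0.0, 0.0, 0.0, 0.0],
    "likes_right_end": [0.0, 0.0, 0.0, 0.0, 1.0],
}

# Observed behaviour: (state, action) pairs only. No reward is ever observed.
demos = [(1, +1), (2, +1), (3, +1)]

post = posterior(demos, candidates)
print(post)
```

An agent that keeps walking right is far better explained by the hypothesis that it values the right end, so nearly all of the posterior mass lands on `likes_right_end`. Real IRL methods search a much richer space of utility functions, but the structure is the same: behaviour in, inferred utility out.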