
jacob_cannell comments on How do humans assign utilities to world states? - Less Wrong Discussion

Post author: Dorikka | 31 May 2015 08:40PM | 2 points




Comment author: RichardKennaway 01 June 2015 09:41:51AM 1 point

What does this method produce if there is no utility function that accurately models the agent's decisions?

Comment author: jacob_cannell 01 June 2015 05:06:12PM 2 points

I'm not sure, but I'd guess it wouldn't produce much. If the agent is just making random decisions, for example, there is nothing coherent to learn from.

The IRL research so far has used training data provided by humans, and it can infer goal-shaped human utility functions, at least for the fairly simple problem domains tested to date. Most of this work was done almost a decade ago, and the area hasn't been as active recently. In particular, I'd bet that if you scaled it up with modern techniques, IRL could learn the score function of an Atari game just from watching human play, for example.
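To make the idea concrete, here is a minimal sketch of one standard IRL variant, maximum-entropy IRL, on a toy 5-state chain MDP. This is an illustration under my own assumptions, not code from the research the comment refers to: the environment, features (one-hot state indicators), horizon, and all function names are invented for the example. The key move is the one the comment describes: given only expert demonstrations, recover a reward function whose induced behavior matches the expert's.

```python
import numpy as np

# Toy chain MDP: states 0..4, actions move left/right (clipped at the ends).
# The "expert" secretly wants state 4; IRL must recover that from trajectories.
N_STATES = 5
ACTIONS = [-1, +1]
HORIZON = 8

def step(s, a):
    return min(max(s + a, 0), N_STATES - 1)

def expert_demos(n=20):
    # Expert always moves right toward the hidden goal, state 4.
    demos = []
    for _ in range(n):
        s, traj = 0, []
        for _ in range(HORIZON):
            traj.append(s)
            s = step(s, +1)
        demos.append(traj)
    return demos

def soft_policy(reward):
    # Soft (max-entropy) value iteration over the finite horizon;
    # returns a stochastic policy pi[s, a] proportional to exp(Q[s, a]).
    V = np.zeros(N_STATES)
    for _ in range(HORIZON):
        Q = np.zeros((N_STATES, len(ACTIONS)))
        for s in range(N_STATES):
            for ai, a in enumerate(ACTIONS):
                Q[s, ai] = reward[s] + V[step(s, a)]
        V = np.log(np.exp(Q).sum(axis=1))
    return np.exp(Q - V[:, None])

def state_visitation(policy):
    # Expected state-visitation counts under the policy, starting at state 0.
    mu = np.zeros(N_STATES)
    d = np.zeros(N_STATES); d[0] = 1.0
    for _ in range(HORIZON):
        mu += d
        d_next = np.zeros(N_STATES)
        for s in range(N_STATES):
            for ai, a in enumerate(ACTIONS):
                d_next[step(s, a)] += d[s] * policy[s, ai]
        d = d_next
    return mu

def maxent_irl(demos, lr=0.1, iters=200):
    # Empirical state counts from the expert (one-hot state features).
    expert_mu = np.zeros(N_STATES)
    for traj in demos:
        for s in traj:
            expert_mu[s] += 1
    expert_mu /= len(demos)

    reward = np.zeros(N_STATES)
    for _ in range(iters):
        mu = state_visitation(soft_policy(reward))
        # Gradient of the max-ent likelihood: expert counts minus model counts.
        reward += lr * (expert_mu - mu)
    return reward

reward = maxent_irl(expert_demos())
print(np.argmax(reward))  # the inferred reward should peak at the goal state
```

The point of the exercise mirrors the comment: the learner never sees the reward, only behavior, yet the recovered reward function reproduces the expert's preferences. This also shows why a purely random "expert" is uninformative: its visitation counts already match a reward of all zeros, so the gradient gives the learner nothing to update on.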