
Manfred comments on Approval-directed agents - Less Wrong Discussion

9 Post author: paulfchristiano 12 December 2014 10:38PM




Comment author: Manfred 13 December 2014 07:53:38AM 2 points

Commenting with Medium feels like it would be reverse anonymity - if you merely see my real name and Facebook profile, you won't know who I am :P

It's tempting to drag in utility functions over actions. So I will. VNM proved that VNM-rational agents have them, after all. Rather than trying to learn my utility function over outcomes, you seem to be saying, why not try to learn my utility function over actions?

These seem somewhat equivalent - one should be a transform of the other. And what seems odd is that you're arguing (reasonably) that using limited resources to learn the utility function over actions performs better than using those resources to learn the utility function over outcomes - even according to the utility function over outcomes!

I wonder if there's a theorem here.
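One way to make the "transform" concrete, as a sketch rather than a claim about what the post intends: assume each action a induces a probability distribution P(o | a) over outcomes o, and write U_O for the utility function over outcomes and U_A for the one over actions. All of these symbols are my own notation, not the post's.

```latex
% Assumed notation: U_O = utility over outcomes, U_A = utility over actions,
% P(o | a) = probability of outcome o given action a.
U_A(a) \;=\; \mathbb{E}\left[\, U_O(o) \mid a \,\right] \;=\; \sum_{o} P(o \mid a)\, U_O(o)
```

Under this reading, learning U_A directly folds the outcome model P(o | a) into the thing being learned, while learning U_O leaves it as a separate problem.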

Comment author: paulfchristiano 20 December 2014 03:08:05AM 0 points

Note that the agent is never faced with a gamble over actions---it can choose to deterministically take whatever action it desires. So while VNM gives you a utility function over actions, it is probably uninteresting.

The broader point, that we are learning some transform of preferences rather than learning preferences directly, seems true. I think this is an issue that people in AI have had some (limited) contact with. Some algorithms learn "what a human would do" (e.g. learning to play Go by predicting human Go moves and doing what you think a human would do). Other algorithms (inverse reinforcement learning) learn what values explain what a human would do, and then pursue those. I think the conventional view is that inverse reinforcement learning is harder, but can yield more robust policies that generalize better. Our situation seems to be somewhat different, and it might be interesting to understand why and to explore the comparison more thoroughly.
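To illustrate the contrast between the two kinds of algorithm, here is a toy sketch in Python. Everything in it is hypothetical: the worlds, the demonstration data, and the function names are made up for illustration, and the frequency-based `infer_outcome_values` is only a crude stand-in for inverse reinforcement learning, not a real IRL algorithm.

```python
from collections import Counter

# Hypothetical toy world: each action leads deterministically to one outcome.
TRAIN_WORLD = {"left": "tea", "right": "coffee", "stay": "nothing"}
# Same actions, but the outcomes they lead to have been shuffled.
TEST_WORLD = {"left": "nothing", "right": "tea", "stay": "coffee"}

# Made-up record of what the human chose in the training world.
DEMONSTRATIONS = ["left", "left", "right", "left", "left"]


def imitation_policy(demos):
    """Learn 'what a human would do': return the most common demonstrated action."""
    return Counter(demos).most_common(1)[0][0]


def infer_outcome_values(demos, world):
    """Crude stand-in for IRL: score each outcome by how often the human
    chose an action that led to it in the given world."""
    counts = Counter(world[action] for action in demos)
    return {outcome: counts[outcome] / len(demos) for outcome in set(world.values())}


def value_policy(values, world):
    """Pursue the inferred values: pick the action whose outcome scores highest."""
    return max(world, key=lambda action: values[world[action]])


if __name__ == "__main__":
    copied_action = imitation_policy(DEMONSTRATIONS)
    values = infer_outcome_values(DEMONSTRATIONS, TRAIN_WORLD)
    chosen_action = value_policy(values, TEST_WORLD)
    print("imitation in test world:", copied_action, "->", TEST_WORLD[copied_action])
    print("value-based in test world:", chosen_action, "->", TEST_WORLD[chosen_action])
```

In this toy setup the imitator keeps pressing the button the human used to press, while the value-based policy tracks the outcome the human seemed to want once the action-to-outcome map changes. That is the sense of "generalizes better" in the conventional view; it says nothing about which approach is easier to learn from limited data.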