paulfchristiano comments on Approval-directed agents - Less Wrong Discussion

9 Post author: paulfchristiano 12 December 2014 10:38PM

Comment author: paulfchristiano 15 December 2014 05:41:08AM  2 points

I wrote a follow-up partly addressing the issue of actions vs. outcomes. (Or at least, covering one technical issue I omitted from the original post for want of space.)

I agree that Hugh must reason about how well different actions satisfy Hugh's goals, and the AI must reason about (or make implicit generalizations about) these judgments. Where am I moving the value-complexity problem? The point was to move it into the AI's predictions about what actions Hugh would approve of.
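The selection rule being described, choosing actions by predicted approval rather than by predicted outcomes, can be sketched in a few lines. This is a toy illustration, not the post's implementation; `predicted_approval` and its values are entirely hypothetical stand-ins for a learned model of Hugh's judgments.

```python
def predicted_approval(action: str) -> float:
    """Stand-in for the AI's model of how Hugh would rate this action.

    In the actual proposal this would be a learned predictor of the
    overseer's judgment; here it is a toy lookup so the sketch runs.
    """
    toy_model = {
        "answer honestly": 0.9,
        "deceive Hugh": 0.1,
        "resist correction": 0.0,
    }
    return toy_model.get(action, 0.5)


def choose_action(available_actions):
    # The agent optimizes predicted approval of the action itself,
    # not the expected value of the action's downstream outcomes.
    return max(available_actions, key=predicted_approval)


print(choose_action(["answer honestly", "deceive Hugh", "resist correction"]))
```

The key design point the sketch illustrates: the complexity of Hugh's values never appears as an explicit objective; it shows up only inside the predictor of Hugh's approval.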

What part of the argument in particular do you think I am being imprecise about? There are particular failure modes, like "deceiving Hugh" or especially "resisting correction," which I would expect this procedure to avoid. I see no reason why the system would resist correction, for example. I don't see how this is due to confusion about outcomes vs. actions.