
Toggle comments on AI-created pseudo-deontology - Less Wrong Discussion

Post author: Stuart_Armstrong 12 February 2015 09:11PM




Comment author: Toggle 16 February 2015 08:23:24PM 0 points [-]

In the more frequently considered case of a non-stable utility function, my understanding is that the agent will not try to identify the terminal attractor and then act according to that; it doesn't care about what 'it' will value in the future, except instrumentally. Rather, it will attempt to maximize its current utility function, given a future agent/self acting according to a different function. Metaphorically, it gets one move in a chess game against its future selves.

I don't see any reason for a temporarily uncertain agent to act any differently. If there is no function that is, right now, motivating it to maximize paperclips, why should it care that it will be so motivated in the future? That would seem to require a kind of recursive utility function, one in which it gains utility from maximizing its utility function in the abstract.
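The "one move in a chess game against its future selves" picture can be sketched as a tiny two-step game. This is purely illustrative: the moves, outcomes, and payoff numbers are invented, not taken from the post. The current agent predicts its future self's reply (which maximizes the *future* utility function) but scores the resulting outcome only by its *current* utility function.

```python
def current_utility(outcome):
    # what the agent values now (hypothetical payoffs)
    return {"A": 3, "B": 2, "C": 0, "D": 1}[outcome]

def future_utility(outcome):
    # what the changed future self will value instead (hypothetical)
    return {"A": 0, "B": 1, "C": 3, "D": 2}[outcome]

# outcome determined by (current move, future move)
outcomes = {
    ("left", "left"): "A", ("left", "right"): "B",
    ("right", "left"): "C", ("right", "right"): "D",
}

def best_current_move():
    best, best_val = None, float("-inf")
    for move in ("left", "right"):
        # predict the future self's reply: it maximizes ITS utility...
        reply = max(("left", "right"),
                    key=lambda r: future_utility(outcomes[(move, r)]))
        # ...but score the reachable outcome by the CURRENT utility
        val = current_utility(outcomes[(move, reply)])
        if val > best_val:
            best, best_val = move, val
    return best
```

Here the current agent plays "left": outcome A (its favorite) is unreachable because the future self would deviate, so it settles for the best outcome its successor will actually allow.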

Comment author: Stuart_Armstrong 17 February 2015 02:52:14PM 1 point [-]

In this case, the AI has a stable utility function - it just doesn't know yet what it is.

For instance, it could be "in worlds where a certain coin was heads, maximise paperclips; in other worlds, minimise them", and it has no info yet on the coin flip. That's a perfectly consistent and stable utility function.
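The coin-flip example above can be written down directly. This is a minimal sketch with illustrative numbers: one fixed, consistent utility function over (coin, outcome) pairs, with the agent's uncertainty living entirely in its credence about the coin, not in the function itself.

```python
def utility(coin, paperclips):
    # stable utility function: heads-worlds reward paperclips,
    # tails-worlds penalize them
    return paperclips if coin == "heads" else -paperclips

def expected_utility(paperclips, p_heads=0.5):
    # the agent maximizes expectation over its credence in the coin
    return (p_heads * utility("heads", paperclips)
            + (1 - p_heads) * utility("tails", paperclips))
```

With no information about the coin (p_heads = 0.5), every paperclip count has expected utility zero, so the agent is indifferent; any evidence about the flip shifts the expectation and breaks the tie. Nothing here requires a "recursive" utility function, only expected-utility maximization under a known, stable one.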