Stuart_Armstrong comments on Heroin model: AI "manipulates" "unmanipulatable" reward

Post author: Stuart_Armstrong 22 September 2016 10:27AM


Comment author: Stuart_Armstrong 23 September 2016 09:48:04AM 2 points
  1. Don't worry, I'm going to be adding depth to the model. But note that the AI's predictive accuracy is never in doubt. This is a sort of reverse "can't derive an ought from an is": here, you can't derive a "wants" from a "did". The learning agent will only get the correct human motivation (if such a thing exists) if it has the correct model of what counts as desires for a human, or some way of learning this model, which is what I'm looking at. (Again, there's a distinction between learning a model that gives correct predictions of human actions and learning a model that gives what we would call a correct model of human motivation; the sketch after point 2 illustrates this.)

  2. According to its model, the AI is not being manipulative here; it is simply doing what the human's desires indicate it should.
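
A minimal sketch of the distinction in point 1 (the rewards, rationality parameters, and Boltzmann policy form below are hypothetical, chosen purely for illustration): two different (rationality, reward) hypotheses assign identical probabilities to the human's observed actions, yet imply opposite desires, so predictive accuracy alone cannot pick out the correct model of motivation.

```python
import math

actions = ["take_heroin", "refuse_heroin"]

# Hypothesis A: the human is highly rational and genuinely wants heroin.
reward_A = {"take_heroin": 1.0, "refuse_heroin": 0.0}
beta_A = 5.0   # high positive rationality

# Hypothesis B: the human wants to refuse, but addiction makes them act
# systematically *against* that desire.
reward_B = {"take_heroin": 0.0, "refuse_heroin": 1.0}
beta_B = -5.0  # negative "rationality": acts contrary to the modelled desire

def action_probs(reward, beta):
    """Boltzmann policy: P(a) proportional to exp(beta * reward[a])."""
    weights = {a: math.exp(beta * reward[a]) for a in actions}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}

probs_A = action_probs(reward_A, beta_A)
probs_B = action_probs(reward_B, beta_B)

# Both hypotheses predict the observed behaviour identically...
for a in actions:
    assert abs(probs_A[a] - probs_B[a]) < 1e-12

# ...yet they disagree completely about what the human wants, and hence
# about what an AI acting on the human's behalf "should" do.
print("P(take_heroin) under either hypothesis:", round(probs_A["take_heroin"], 4))
print("hypothesis A says the human wants:", max(reward_A, key=reward_A.get))
print("hypothesis B says the human wants:", max(reward_B, key=reward_B.get))
```

The same sketch shows why, as in point 2, the AI sees itself as non-manipulative: under whichever desire model it has adopted, serving the modelled desires is exactly what the human "wants".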