TheAncientGeek comments on Heroin model: AI "manipulates" "unmanipulatable" reward

Post author: Stuart_Armstrong 22 September 2016 10:27AM


Comment author: TheAncientGeek 22 September 2016 02:46:09PM
  1. The idea that more information can make an AI's inferences worse is surprising. But the assumption that humans have an unchanging, neatly hierarchical utility function is already known to be a bad one, so it is not so surprising that it leads to bad results. In short, this is still a bit clown-car-ish.

  2. Would you tell an AI that heroin is bad, but not tell it that manipulation is bad?

Comment author: Stuart_Armstrong 23 September 2016 09:48:04AM
  1. Don't worry, I'm going to be adding depth to the model. But note that the AI's predictive accuracy is never in doubt. This is sort of a reverse "can't derive an ought from an is"; here, you can't derive a "wants" from a "did". The learning agent will only get the correct human motivation (if such a thing exists) if it has the correct model of what counts as desires for a human. Or some way of learning this model, which is what I'm looking at (again, there's a distinction between learning a model that gives correct predictions of human actions, and learning a model that gives what we would call a correct model of human motivation).

  2. According to its model, the AI is not being manipulative here; it is simply doing what the human's desires indicate it should.
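The distinction in point 1 — a model that predicts actions correctly versus a model that correctly attributes motivation — can be sketched concretely. The following toy example (my own illustration, not Armstrong's formalism; all state and function names are hypothetical) shows two different reward models that rationalize the same observed behaviour equally well, so predictive accuracy alone cannot distinguish between them:

```python
# Observed human behaviour: the human accepts both heroin and food.
observed = [("offered_heroin", "accept"), ("offered_food", "accept")]

# Hypothetical model 1: the human genuinely desires whatever is offered.
def reward_wants_everything(state, action):
    return 1.0 if action == "accept" else 0.0

# Hypothetical model 2: the human genuinely desires food, and accepts
# heroin out of compulsion, which this model scores separately.
def reward_compulsion(state, action):
    if state == "offered_food":
        return 1.0 if action == "accept" else 0.0
    # Compulsion term: accepting heroin is predicted, but valued less.
    return 0.9 if action == "accept" else 0.0

def greedy_action(reward, state, actions=("accept", "refuse")):
    # The action a reward-maximizing agent would predict the human takes.
    return max(actions, key=lambda a: reward(state, a))

# Both models predict the observed actions perfectly...
for model in (reward_wants_everything, reward_compulsion):
    assert all(greedy_action(model, s) == a for s, a in observed)

# ...yet they disagree about the underlying motivation, e.g. how much
# the human actually values accepting heroin.
print(reward_wants_everything("offered_heroin", "accept"))  # 1.0
print(reward_compulsion("offered_heroin", "accept"))        # 0.9
```

An AI that only optimizes predictive accuracy has no grounds for preferring one model over the other, which is why the "correct model of what counts as desires" has to come from somewhere else.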