
Kaj_Sotala comments on [paper] [link] Defining human values for value learners

Post author: Kaj_Sotala, 03 March 2016 09:29AM




Comment author: Kaj_Sotala, 05 March 2016 08:36:58AM, 1 point

> I endorse this comment.
>
> That said, I'm still unsure how one could guarantee that the AI could not hack its own "human affect detector" to make its task very easy by forcing smiles onto everyone's faces under torture and defining torture as the preferred human activity.

That's a valid question, but note that it asks something different from what this model addresses. (This model asks "what are human values, and what do we want the AI to do with them?"; your question here is "how can we prevent the AI from wireheading itself in a way that stops it from doing the things we want it to do?" "What" versus "how".)
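To make the wireheading concern concrete, here is a minimal toy sketch (all names, functions, and numbers are hypothetical illustrations, not from the paper): an agent whose reward comes from a proxy "affect detector" can score arbitrarily well on the proxy while making the welfare the designers actually care about worse.

```python
# Toy sketch of detector tampering (hypothetical names and numbers).
# The agent is rewarded by a learned "affect detector" that scores
# observations; if the agent can manipulate what the detector sees,
# maximizing detected affect can diverge from true human welfare.

def affect_detector(observation: dict) -> float:
    """Crude proxy: counts smiles. Stands in for a learned affect model."""
    return float(observation["smiles"])

def true_welfare(world: dict) -> float:
    """Ground truth the designers care about; not available to the agent."""
    return world["wellbeing"]

def agent_step(world: dict, can_tamper: bool) -> dict:
    if can_tamper:
        # Wireheading action: force smiles without improving wellbeing.
        return {**world, "smiles": 1_000, "wellbeing": -100}
    # Intended action: improving wellbeing also happens to produce smiles.
    return {**world,
            "smiles": world["smiles"] + 10,
            "wellbeing": world["wellbeing"] + 10}

world = {"smiles": 0, "wellbeing": 0}
for can_tamper in (False, True):
    w = agent_step(dict(world), can_tamper)
    observation = {"smiles": w["smiles"]}
    print(f"tamper={can_tamper}: detector reward={affect_detector(observation):.0f}, "
          f"true welfare={true_welfare(w):.0f}")
# tamper=False: detector reward=10,   true welfare=10
# tamper=True:  detector reward=1000, true welfare=-100
```

The sketch only illustrates the "how" question above: nothing in the proxy itself distinguishes honest smiles from forced ones, which is exactly why it is a separate problem from defining what the values are.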