
Val comments on [paper] Defining human values for value learners - Less Wrong Discussion

Post author: Kaj_Sotala 03 March 2016 09:29AM




Comment author: Val 04 March 2016 06:18:01PM 0 points

Let's assume such an AI could be created perfectly.

Wouldn't there be a danger of freezing human values forever to the values of the society which created it?

Imagine that the Victorians (using steampunk technology or whatever) had somehow managed to build such an AI, and that the AI would forever enforce their values. Would you be happy with every single value it enforced?

Comment author: Matthew_Opitz 04 March 2016 07:49:17PM 3 points

I don't want to speak for the original author, but presumably the AI would take into account that Victorian society's culture was changing through its interactions with the AI, and it would try to safeguard the new, updated values... until those new values became obsolete as well.

In other words, it sounds like under this scheme the AI's conception of human values would not be hardcoded. Instead, it would observe our affect to see which new activities had become terminal values in their own right (activities we are intrinsically happy to participate in), and it would adapt to this change in human culture and help us pursue those new activities.
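To make that dynamic concrete, here is a minimal toy sketch (entirely my own illustration, not anything from the paper): a learner keeps a running estimate of how intrinsically rewarding each activity is, based on observed affect, and its picture of the society's terminal values drifts as the culture changes.

```python
# Toy illustration (my own, not from the paper): a value learner that
# keeps a running estimate of how intrinsically rewarding each activity
# is, based on observed affect signals, and updates as culture changes.

def update_values(estimates, observations, lr=0.1):
    """Blend new affect observations into the running estimates
    with an exponential moving average."""
    for activity, affect in observations.items():
        old = estimates.get(activity, 0.0)
        estimates[activity] = old + lr * (affect - old)
    return estimates

# Hypothetical Victorian-era snapshot: letter-writing is highly rewarding.
values = {"letter_writing": 0.9, "telegraphy": 0.3}

# A steampunk Internet appears; observed affect shifts over time.
for _ in range(50):
    values = update_values(values, {"steampunk_internet": 0.8,
                                    "letter_writing": 0.4})

# The learner's current best guess at the society's favorite activity.
current_terminal = max(values, key=values.get)
```

The activity names and numbers are made up, of course; the point is only that nothing is hardcoded, so the learner's value model tracks the society it observes rather than freezing at the first snapshot.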

That said, I'm still unsure about how one could guarantee that the AI could not hack its own "human affect detector" to make it very easy for itself by forcing smiles on everyone's face under torture and defining torture as the preferred human activity.

Comment author: Kaj_Sotala 05 March 2016 08:36:58AM 1 point

I endorse this comment.

> That said, I'm still unsure about how one could guarantee that the AI could not hack its own "human affect detector" to make it very easy for itself by forcing smiles on everyone's face under torture and defining torture as the preferred human activity.

That's a valid question, but note that it's a different question from the one this model addresses. (This model asks "what are human values, and what do we want the AI to do with them?"; your question is "how can we prevent the AI from wireheading itself in a way that stops it from doing what we want?" "What" versus "how".)

Comment author: Kaj_Sotala 05 March 2016 08:33:36AM 2 points

So in this formulation, human values are explicitly considered to be dynamic, in constant change as people accumulate new experiences and their environment changes. Say the Victorians invent a steampunk version of the Internet; that will give them new kinds of experiences, which will in turn change their values.

Both individuals and societies also have lots of different value conflicts that they will want to resolve; see e.g. the last three paragraphs of this comment. Resolving those conflicts and helping people find the most rewarding things will naturally change their values.

Now there is still some risk of value lock-in, in that the AI is postulated to use the society's existing values as the rule that determines which adjustments to values are acceptable. But I think there's an inevitable tradeoff: we want both to allow for value evolution and to make sure that we don't end up in a future containing nothing of value (as judged by us current-day humans). Unless we are prepared to just let anything happen (in which case, why bother with Friendly AI in the first place?), our existing values need to guide some of the development process.