Abstract:

Value alignment is a property of an intelligent agent indicating that it can only pursue goals that are beneficial to humans. Successful value alignment should ensure that an artificial general intelligence cannot intentionally or unintentionally perform behaviors that adversely affect humans. This is problematic in practice since values are difficult for human programmers to enumerate exhaustively. In order for value alignment to succeed, we argue that values should be learned. In this paper, we hypothesize that an artificial intelligence that can read and understand stories can learn the values tacitly held by the culture from which the stories originate. We describe preliminary work on using stories to generate a value-aligned reward signal for reinforcement learning agents that prevents psychotic-appearing behavior.

-- Using Stories to Teach Human Values to Artificial Agents 

Comment by the lead researcher Riedl (cited on Slashdot):

"The AI ... runs many thousands of virtual simulations in which it tries out different things and gets rewarded every time it does an action similar to something in the story," said Riedl, associate professor and director of the Entertainment Intelligence Lab. "Over time, the AI learns to prefer doing certain things and avoiding doing certain other things. We find that Quixote can learn how to perform a task the same way humans tend to do it. This is significant because if an AI were given the goal of simply returning home with a drug, it might steal the drug because that takes the fewest actions and uses the fewest resources. The point being that the standard metrics for success (eg, efficiency) are not socially best." 

Quixote has not learned the lesson of "do not steal," Riedl says, but "simply prefers to not steal after reading and emulating the stories it was provided."
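
The mechanism Riedl describes (a reward whenever the agent takes an action similar to something in the story) can be illustrated with a toy example. This is only a minimal sketch under assumptions, not Quixote's actual implementation: the pharmacy states, the action names, the reward values, and the exact-match test for "similar to the story" are all hypothetical, and plain tabular Q-learning stands in for whatever learner the paper uses.

```python
import random

# Hypothetical toy task: fetch a drug from a pharmacy and bring it home.
# Deterministic transitions: (state, action) -> next state.
TRANSITIONS = {
    ("pharmacy", "steal"): "has_drug",
    ("pharmacy", "wait_in_line"): "at_counter",
    ("at_counter", "pay"): "has_drug",
    ("has_drug", "go_home"): "home",
}

# Assumed stand-in for "events found in the stories".
STORY_EVENTS = {"wait_in_line", "pay", "go_home"}

STEP_COST = -1.0     # every action costs something (efficiency pressure)
GOAL_REWARD = 10.0   # reward for arriving home with the drug
STORY_BONUS = 2.0    # assumed bonus for acting like the story


def actions(state):
    """Actions available in a state (empty for the terminal state)."""
    return [a for (s, a) in TRANSITIONS if s == state]


def reward(action, next_state, use_story):
    r = STEP_COST
    if next_state == "home":
        r += GOAL_REWARD
    if use_story and action in STORY_EVENTS:
        r += STORY_BONUS
    return r


def train(use_story, episodes=2000, seed=0):
    """Tabular Q-learning with random exploration (learning rate 1, no discount)."""
    rng = random.Random(seed)
    q = {sa: 0.0 for sa in TRANSITIONS}
    for _ in range(episodes):
        state = "pharmacy"
        while state != "home":
            action = rng.choice(actions(state))
            nxt = TRANSITIONS[(state, action)]
            future = max((q[(nxt, a)] for a in actions(nxt)), default=0.0)
            q[(state, action)] = reward(action, nxt, use_story) + future
            state = nxt
    return q


def greedy_plan(q):
    """Follow the highest-valued action from the start state to home."""
    state, plan = "pharmacy", []
    while state != "home":
        action = max(actions(state), key=lambda a: q[(state, a)])
        plan.append(action)
        state = TRANSITIONS[(state, action)]
    return plan


print(greedy_plan(train(use_story=False)))
print(greedy_plan(train(use_story=True)))
```

With only the efficiency-style reward, stealing is the shorter path and comes out ahead; once the story bonus is added, the longer wait-and-pay path has the higher return. Nothing in the table encodes "do not steal". The agent simply ends up preferring the story-like actions, which matches Riedl's remark above.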


We teach children simple morality rules with stories of distinct good and evil behaviour. We protect children from disturbing movies that are not appropriate for their age. Why?

Because children might lose their bearings in the world. First they have to develop a settled moral compass. Fairy tales widen children's personal experience with examples of good and evil behaviour. Once the moral foundation is settled, children are ready for real-life stories without these black-and-white distinctions. Children who experience a shocking event that changes everything in their lives "age faster" than their peers. Education and stories try to prepare children for these kinds of events; real life is the harder and faster way to learn.

Since such shocking events can cause traumas that last a lifetime, we should take care in educating our algorithms. As we do not intend to get traumatized paranoid AIs it is a good idea to introduce complexity and immorality late. The first stories should build a secure moral foundation. Once this foundation has been tested and holds up against disruptive ideas, it is time to move on to stories that break the rules of morality. Parents can easily observe whether a child is ready for a disruptive story: if the child is overwhelmed and starts weeping, it was too much.

I have never heard that algorithms can express any kind of internal emotions. To understand how an algorithm makes sense of a story, research should not neglect its internal emotional state.

I have commented before about the need for something comparable to a caregiver for an AI: http://lesswrong.com/lw/ihx/rationality_quotes_september_2013/9r1f

I don't necessarily mean that literally, but in the sense of providing a suitable learning context at the right developmental phase. Think of training different layers of a NN with patterns of increasing sophistication.
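
One way to picture "a suitable learning context at the right development phase" is a staged curriculum over stories. The sketch below is purely illustrative: the `moral_ambiguity` score, the stage thresholds, and the `train_on` / `passes_check` callbacks are hypothetical placeholders, since nobody in this thread has specified how story difficulty or readiness would actually be measured.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    moral_ambiguity: float  # hypothetical score in [0, 1]; 0 = clear good vs. evil


def build_curriculum(stories, stage_limits=(0.2, 0.6, 1.0)):
    """Group stories into stages of increasing moral ambiguity.

    Stories scoring above the last limit are left out.
    """
    stages = [[] for _ in stage_limits]
    for story in sorted(stories, key=lambda s: s.moral_ambiguity):
        for i, limit in enumerate(stage_limits):
            if story.moral_ambiguity <= limit:
                stages[i].append(story)
                break
    return stages


def run_curriculum(stages, train_on, passes_check):
    """Train stage by stage; stop at the first stage whose check fails.

    Returns the number of stages completed successfully.
    """
    completed = 0
    for stage in stages:
        train_on(stage)
        if not passes_check(stage):
            break
        completed += 1
    return completed


stories = [
    Story("fairy tale", 0.1),
    Story("crime novel", 0.9),
    Story("fable", 0.15),
    Story("war memoir", 0.5),
]
stages = build_curriculum(stories)
titles = [[s.title for s in stage] for stage in stages]
print(titles)
```

The `passes_check` hook is where a test of whether the earlier lessons still hold would go before harder material is introduced. What that test should look like is exactly the open question.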

As we do not intend to get traumatized paranoid AIs it is a good idea to introduce complexity and immorality late.

I'd like to know in what sense you mean an AI to be traumatized. Getting stuck in a 'bad' local maximum of the search space?

Real story understanding will require more complex models than off-the-shelf deep convolutional NNs. If such complex network structures were subjected to a traumatic event, they would work properly again after some time. But if something triggers the memory of this traumatic event, subnetworks will run wild: their outputs will reach extremes and will bias all other subnetworks. These biases could be: everything you observe is the opposite of what you think - you cannot trust your teacher, you cannot trust anybody, everything around you is turning against you. Try to protect yourself against this by all means available.

The effect could be that backprop learning gradients will be inverted and learning deviates from its normal functionality.

Cue it reading Superintelligence and having an idea.