I’m having a problem understanding why Stuart Russell thinks that AI learning human preferences is a good idea. I think it’s a bad idea. I assume I am wrong because I don’t understand something. So, help me out here, please. I’m not looking for an argument but rather to understand. Let me explain.
I have watched Stuart’s four-hour series of Reith Lectures on the BBC. Highly recommended. I have watched several other videos featuring him as well. I am currently reading his book, Human Compatible. I am not an academic. I am now retired and write science fiction about advanced social robots as a hobby.
Reading the chapter “AI: A Different Approach” in Stuart’s book, I am still bothered by something about the preferences issue. My understanding of Stuart’s “new model for AI” is that it would learn what our preferences are by observing our behavior. I understand why he thinks “preferences” is a better word than “values” for what is being learned but, at the risk of confusing things, let me use the word values to explain my confusion.
As I understand it, humans have different kinds of values:
1) Those that are evolved and which we all share as a species, like why sugar tastes good or why glossy hair is attractive.
2) Those that reflect our own individuality, which make each of us unique, including the kind that twin studies reveal.
3) Those our culture, family, society or what have you impose on us.
I believe the first two kinds are genetic and the third kind learned. Let me classify the first two as biological values and the third kind as social values. It would appear that the third category accounts for the majority of the recent evolution of our physical brains.
Let’s consider three values of each type, just as simple examples. Biological values might be greed, selfishness, and competition, while social values might be trust, altruism, and cooperation. Humans are a blend of all six of these values and will exhibit preferences based on them in different situations. A lot of the time they are going to choose behaviors based on biological values, as the nightly news makes clear.
If AI learns our preferences based on our behaviors, it’s going to learn a lot of “bad” things like lying, stealing, and cheating, and other much worse things. From a biological point of view, these behaviors are “good” because they maximize the return on calories invested by getting others to do the work while we reap the benefits. Think of parasites and cuckoo birds, for example.
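To make my worry concrete, here is a toy sketch of the naive version of “learn preferences from behavior.” This is purely my own illustration, not Stuart’s actual proposal, and the behaviors and counts are made up: the point is only that if the learner treats whatever people do as evidence of what they prefer, dishonest behaviors in the observations get inferred as preferences right alongside honest ones.

```python
# Toy sketch of naive preference inference from observed behavior.
# My own illustration of the worry, NOT Stuart Russell's actual method
# (his proposal involves uncertainty about preferences); the observed
# actions below are invented for the example.

from collections import Counter

# Hypothetical log of observed human actions in some environment.
observed_actions = [
    "cooperate", "share", "cooperate", "lie", "cheat",
    "share", "lie", "cooperate", "steal", "cooperate",
]

def infer_preferences(actions):
    """Naively assume the frequency of a behavior reflects how strongly
    the human prefers it. Real preference-learning methods are more
    sophisticated, but the data problem is the same: the learner only
    sees what people actually do."""
    counts = Counter(actions)
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}

preferences = infer_preferences(observed_actions)
for action, weight in sorted(preferences.items(), key=lambda kv: -kv[1]):
    print(f"{action:10s} inferred preference weight = {weight:.2f}")

# "lie", "cheat", and "steal" all acquire positive weight simply because
# they appear in the data -- this naive learner has no way of knowing
# they are behaviors we wish we didn't exhibit.
```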
In his Reith Lectures, Stuart states that an AI trained on preferences will not turn out evil, but he never explains why not. There is no mention (so far) in his book of the issue of human preferences and anything we would consider negative, bad, or evil. I simply don’t understand how an AI observing our behavior is going to end up being exclusively benevolent or “provably beneficial,” to use Stuart’s term.
I think an AI learning from our preferences would be a terrible idea. What am I not understanding?
I believe that this "biological = selfish, social = cooperative" dichotomy is wrong. It is a popular mistake to make, because it provides legitimacy to all kinds of political regimes, by allowing them to take credit for everything good that humans living in them do. It also allows one to express "edgy" opinions about human nature.
But if homo sapiens actually had no biological foundations for trust, altruism, and cooperation, then... it would be extremely difficult for our societies to instill such values in humans; and most likely we wouldn't even try, because we simply wouldn't think about such things as desirable. The very idea that fundamentally uncooperative humans somehow decided to cooperate at creating a society that brainwashes humans into being capable of cooperation is... somewhat self-contradictory.
(The usual argument is that it would make sense, even for a perfectly selfish asshole, to brainwash other people into becoming cooperative altruists. The problem with this argument is that the hypothetical perfectly selfish wannabe social engineer couldn't accomplish such a project alone. And when many people start cooperating on brainwashing the next generation, what makes even more sense for a perfectly selfish asshole is to... shirk their duty at creating the utopia, or even try to somehow profit from undermining this communal effort.)
Instead, I think we have dozens of instincts which, under certain circumstances, nudge us towards more or less cooperation. Perhaps in the ancient environment they were balanced in a way that maximized survival (sometimes by cooperating with others, sometimes at their expense), but currently the environment changes too fast for humans to adapt...
I agree with your main point: it is not obvious how training an AI on human preferences (which are sometimes "good" and sometimes "evil") would help us achieve the goal (separating the "good" from "evil" from "neutral").
Thanks for responding, Viliam. I totally agree with you that “if homo sapiens actually had no biological foundations for trust, altruism, and cooperation, then... it would be extremely difficult for our societies to instill such values”.
As you say, we have a blend of values that shift as required by our environment. I appreciate your agreement that it’s not really clear how training an AI on human preferences solves the issue raised here.
Of all the things I have ever discussed, in person or online, values are the most challenging. I’ve been interested in human...