Houshalter comments on Open thread, Oct. 10 - Oct. 16, 2016 - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
We don't know what an AI that maximizes human values would be, because we don't know what human values are at the necessary level of precision. Not to mention the assumptions that the AI will be a maximizer and that values can be maximized.
Who says we need to hardcode human values, though? Any reasonable solution will involve an AI that learns what human values are. Or some other approach to the control problem that produces AIs that don't want to harm or defy their creators.
But if you don't know what human values are, how can you be sure that the AI will learn them correctly?
So you make an AI and tell it: "Go forth and learn human values!" It goes and in a while comes back and says "Behold, I have learned them". How do you know this is true?
If I train a neural network to recognize dogs, I have no way of inspecting whether it learned correctly. I can't look at the weights and tell whether they are correct dog-recognizing weights rather than something else. But I can trust the process of training and validation to show that the network has learned what dogs look like.
It's a similar principle with learning human values. Of course it's more complicated than just feeding it images of dogs, but the principle of letting AIs learn models from real world data is the important part.
Of course you do. You test it. You show it a lot of images (that it hasn't seen before) of dogs and not-dogs and check how good it is at differentiating them.
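The testing procedure described here can be made concrete. Below is a minimal sketch of held-out evaluation: we never inspect the model's internals, we only measure how often it agrees with the labels on examples it didn't train on. The `toy_model` and its "has_fur" feature are hypothetical stand-ins for a real, opaque dog classifier.

```python
def accuracy(model, heldout):
    """Fraction of held-out (input, label) pairs the model gets right."""
    correct = sum(1 for x, label in heldout if model(x) == label)
    return correct / len(heldout)

# Hypothetical stand-in classifier: calls anything furry a dog.
# A real trained network would be opaque, which is the point --
# we trust the validation score, not an inspection of the weights.
toy_model = lambda x: "dog" if x.get("has_fur") else "not-dog"

heldout_set = [
    ({"has_fur": True}, "dog"),
    ({"has_fur": False}, "not-dog"),
    ({"has_fur": True}, "not-dog"),  # a furry cat: the model errs here
]

print(accuracy(toy_model, heldout_set))  # → 0.666...
```

The disagreement in the thread is whether any analogue of `heldout_set` exists for "human values": for dog images we can cheaply collect fresh labeled examples, but it is not obvious what a held-out test case for Friendliness would even look like.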
How would that process work for an AI and human values?
Right, human values: “A man's greatest pleasure is to defeat his enemies, to drive them before him, to take from them that which they possessed, to see those whom they cherished in tears, to ride their horses, and to hold their wives and daughters in his arms.”
Do you expect me to give you the complete solution to AI right here, right now? What are you even trying to say? You seem to be arguing that FAI is impossible. How can you possibly know that? Just because you can't immediately see a solution to the problem, doesn't mean a solution doesn't exist.
I think an AI will easily be able to learn human values from observation. It will be able to build a model of humans and predict what we will do and say. It certainly won't base all its understanding on a stupid movie quote. The AI will know what you want.
I'm saying that if you can't recognize Friendliness (and I don't think you can), trying to build a FAI is pointless as you will not be able to answer "Is it Friendly?" even when looking at it.
So if you can't build a supervised model, you think going to unsupervised learning will solve your problems? The quote I gave you is part of human values -- humans do value triumph over their enemies. Evolution taught humans to eliminate competition, it taught them to be aggressive and greedy -- all human values. Why do you think your values will be preferred by the AI to values of, say, ISIS or third-world Maoist guerrillas? They're human, too.
Why do I need to recognize Friendliness to build an FAI? I only need to know that the process used to construct it results in a friendly AI. Trying to inspect the weights of a complex neural network (or whatever) is pointless, as I stated earlier. We haven't the slightest idea how AlphaGo's network really works, but we can trust it to beat the best Go players.
Evolution also taught humans to be cooperative, empathetic, and kind.
Really your objection seems to be the whole point of CEV. A CEV wouldn't just include the values of ISIS members, but also their victims. And it would be extrapolated, to not just be their current opinions on things, but what their opinions would be if they knew more. Their values if they had more time to think about and consider issues. With those two conditions, the negative parts of human values are entirely eliminated.
You are still facing the same problem. Given that you can't recognize friendliness, how will you create or choose a process which will build a FAI? Would you be able to answer "Will it be friendly?" by looking at the process?
That doesn't make much sense. What do you mean by "negative" and from which point of view? If from the point of view of the AI, that's just a trivial tautology. If from the point of view of (at least some) humans, this seems to be not so.
In general, do you treat morals/values as subjective or objective? If objective, the whole "if they knew more" part is entirely unnecessary: you're discovering empirical reality, not consulting with people on what do they like. And subjectivism here, of course, makes the whole idea of CEV meaningless.
Also, I see no evidence to support the view that as people know more, their morals improve, for pretty much any value of "improve".
This amounts to saying "because I'm right and once everyone gets to know reality better, they'll figure out I'm right."
In reality they will also figure out the places where you are wrong, and there will be many of them.
I'm not claiming that at all. I may be wrong about many things. It's irrelevant.