
Houshalter comments on Open thread, Oct. 10 - Oct. 16, 2016 - Less Wrong Discussion

3 Post author: MrMind 10 October 2016 07:00AM




Comment author: Lumifer 14 October 2016 02:26:41PM 0 points [-]

What are you even trying to say?

I'm saying that if you can't recognize Friendliness (and I don't think you can), trying to build a FAI is pointless as you will not be able to answer "Is it Friendly?" even when looking at it.

I think an AI will easily be able to learn human values from observations.

So if you can't build a supervised model, you think going to unsupervised learning will solve your problems? The quote I gave you is part of human values -- humans do value triumph over their enemies. Evolution taught humans to eliminate competition, it taught them to be aggressive and greedy -- all human values. Why do you think your values will be preferred by the AI to values of, say, ISIS or third-world Maoist guerrillas? They're human, too.

Comment author: Houshalter 15 October 2016 01:44:37AM 0 points [-]

Why do I need to recognize Friendliness to build an FAI? I only need to know that the process used to construct it results in a friendly AI. Trying to inspect the weights of a complex neural network (or whatever) is pointless, as I stated earlier. We haven't the slightest idea how AlphaGo's net really works, but we can trust it to beat the best Go champions.

Evolution taught humans to eliminate competition, it taught them to be aggressive and greedy -- all human values.

Evolution also taught humans to be cooperative, empathetic, and kind.

Really, your objection is exactly what CEV is meant to address. A CEV wouldn't just include the values of ISIS members, but also those of their victims. And it would be extrapolated: not just people's current opinions, but what their opinions would be if they knew more -- their values if they had more time to think about and consider issues. With those two conditions, the negative parts of human values are entirely eliminated.

Comment author: entirelyuseless 15 October 2016 08:39:19PM 0 points [-]

This amounts to saying "because I'm right and once everyone gets to know reality better, they'll figure out I'm right."

In reality they will also figure out the places where you are wrong, and there will be many of them.

Comment author: Houshalter 16 October 2016 06:47:29AM 0 points [-]

I'm not claiming that at all. I may be wrong about many things. It's irrelevant.

Comment author: entirelyuseless 16 October 2016 08:21:22PM 0 points [-]

It is not irrelevant. You said, "With those two conditions, the negative parts of human values are entirely eliminated." That certainly meant that things like ISIS opinions would be eliminated. I agree in that particular case, but there are many other things that you would consider negative which will not be eliminated. I can probably guess some of them, although I won't do that here.

Comment author: Houshalter 20 October 2016 08:29:43PM 0 points [-]

See my other comment for more clarification on how CEV would eliminate negative values.

Comment author: entirelyuseless 21 October 2016 04:42:26AM 0 points [-]

I read that. You say there, "Your stated example was ISIS. ISIS is so bad because they incorrectly believe... If they knew all the arguments for and against religion, then their values would be more like ours." As I said, I agree with you in that case. But you are indeed saying, "it is because I am right and when they know better they will know I was right." And that will not always be true, even if it is true in that case.

Comment author: Houshalter 21 October 2016 04:47:37AM 0 points [-]

I never claimed I am right about everything. I don't need to be right about everything. I would love to have an AI show me what I am wrong about and show me the perfect set of values.

And most importantly, I'm saying that this process would result in the optimal set of values for everyone. Do you disagree?

Comment author: entirelyuseless 21 October 2016 12:10:22PM 0 points [-]

Yes, I disagree. I think that "babyeater values are different from human values" differs only in degree from "my values are different from your values." I do not think there is a reasonable chance that I will turn out to be wrong about this, just like there is no reasonable chance that if we measure our heights with sufficient accuracy, we will turn out to have different heights. This is still another reason why we should speak of "babyeater morality" and "human morality," namely because if morality is inconsistent with variety, then morality does not exist.

That said, I already said that I would not be willing to wipe out non-human values from the cosmos, and likewise I have no interest in imposing my personal values on everything else. I think these are really the same thing, and in that sense wanting to impose a CEV on the universe is being a "racist" in relation to human beings vs other intelligent beings.

Comment author: Houshalter 25 October 2016 06:37:30AM *  0 points [-]

People may have different values (although I think deep down we are very similar, humans sharing the same brains and not having that much diversity). Regardless, CEV should find the best possible compromise between our different values. That's literally the whole point.

If there is a difference in our values, the AI will find the compromise that satisfies us the most (or dissatisfies us the least). There is no alternative, besides not compromising at all and just taking the values of a single random person. From behind the veil of ignorance, the first is definitely preferable.

I don't think this will be so bad. Because I don't think our values diverge so much, or that decent compromises are impossible between most values. I imagine that in the worst case, the compromise will be that two groups with different values will have to go their separate ways. Live on opposite sides of the world, never interact, and do their own thing. That's not so bad, and a post-singularity future will have more than enough resources to support it.

That said, I already said that I would not be willing to wipe out non-human values from the cosmos

No one is suggesting we wipe out non-human values. But we have yet to meet any intelligent aliens with different values. Once we do so, we may very well just apply CEV to them and get the best compromise of our values again. Or we may keep our own values, but still allow them to live separately and do their own thing, because we value their existence.

This reminds me a lot of the post Value is Fragile. It's ok to want a future that has different beings in it, that are totally different than humans. That doesn't violate my values at all. But I don't want a future where beings die or suffer involuntarily. I don't think it's "value racist" to want to stop beings that do value that.

Comment author: Lumifer 17 October 2016 02:31:10PM *  0 points [-]

I only need to know that the process used to construct it results in a friendly AI.

You are still facing the same problem. Given that you can't recognize friendliness, how will you create or choose a process which will build a FAI? Would you be able to answer "Will it be friendly?" by looking at the process?

the negative parts of human values are entirely eliminated.

That doesn't make much sense. What do you mean by "negative" and from which point of view? If from the point of view of the AI, that's just a trivial tautology. If from the point of view of (at least some) humans, this seems to be not so.

In general, do you treat morals/values as subjective or objective? If objective, the whole "if they knew more" part is entirely unnecessary: you're discovering empirical reality, not consulting with people on what do they like. And subjectivism here, of course, makes the whole idea of CEV meaningless.

Also, I see no evidence to support the view that as people know more, their morals improve, for pretty much any value of "improve".

Comment author: Houshalter 20 October 2016 08:27:59PM 1 point [-]

how will you create or choose a process which will build a FAI?

You are literally asking me to solve the FAI problem right here and now. I understand that FAI is a very hard problem and I don't expect to solve it instantly. Just because a problem is hard, doesn't mean it can't have a solution.

First of all let me adopt some terminology from Superintelligence. I think FAI requires solving two somewhat different problems. Value Learning and Value Loading.

You seem to think Value Learning is the hard problem, getting an AI to learn what humans actually want. I think that's the easy problem, and any intelligent AI will form a model of humans and understand what we want. Getting it to care about what we want seems like the hard problem to me.

But I do see some promising ideas to approach the problem. For instance, have AIs that predict what choices a human would make in each situation. So you basically get an AI which is just a human, but sped up a lot. Or have an AI which presents arguments for and against each choice, so that humans can make more informed choices. Then it could predict what choice a human would make after hearing all the arguments, and do that.

More complicated ideas were mentioned in Superintelligence. I like the idea of "motivational scaffolding": somehow train an AI that can learn how the world works and generate an "interpretable model" -- e.g. one that can understand English sentences and translate their meanings into representations the AI can use. Then you can explicitly program a utility function into the AI using its learned model.

That doesn't make much sense. What do you mean by "negative" and from which point of view?

From your point of view. You gave me examples of values which you consider bad, as an argument against FAI. I'm showing you that CEV would eliminate these things.

Also, I see no evidence to support the view that as people know more, their morals improve, for pretty much any value of "improve".

Your stated example was ISIS. ISIS is so bad because they incorrectly believe that God is on their side and wants them to do the things they do. That the people that die will go to heaven, so loss of life isn't so bad. If they were more intelligent, informed, and rational... If they knew all the arguments for and against religion, then their values would be more like ours. They would see how bad killing people is, and that their religion is wrong.

The second thing CEV does is average everyone's values together. So even if ISIS really does value killing people, their victims value not being killed even more. So a CEV of all of humanity would still value life, even if evil people's values are included. Even if everyone was a sociopath, their CEV would still be the best compromise possible, between everyone's values.
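To make "best compromise" concrete, here's a toy sketch. All outcomes, agents, and utility numbers are invented for illustration (a real CEV would operate over enormously richer spaces); it just contrasts maximizing total satisfaction with maximizing the worst-off person's satisfaction:

```python
# Toy sketch of "best compromise" between conflicting values.
# Outcomes, agents, and utilities are all made up for illustration.

outcomes = ["spare_everyone", "kill_enemies", "separate_communities"]

# Each agent scores each outcome in [0, 1]: how satisfied they'd be with it.
utilities = {
    "victim":    {"spare_everyone": 1.0, "kill_enemies": 0.0, "separate_communities": 0.8},
    "attacker":  {"spare_everyone": 0.3, "kill_enemies": 1.0, "separate_communities": 0.6},
    "bystander": {"spare_everyone": 0.9, "kill_enemies": 0.1, "separate_communities": 0.7},
}

def utilitarian(outcome):
    """Total satisfaction across all agents ("satisfy the most people the most")."""
    return sum(u[outcome] for u in utilities.values())

def leximin(outcome):
    """Satisfaction of the worst-off agent ("dissatisfy the fewest the least")."""
    return min(u[outcome] for u in utilities.values())

best_total = max(outcomes, key=utilitarian)   # "spare_everyone"
best_floor = max(outcomes, key=leximin)       # "separate_communities"
```

Under either rule, "kill_enemies" loses: the victims' objection to being killed outweighs the attacker's value of killing, which is the point of the paragraph above.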

Comment author: Lumifer 21 October 2016 02:54:55PM 2 points [-]

You are literally asking me to solve the FAI problem right here and now.

No, I'm asking you to specify it. My point is that you can't build X if you can't even recognize X.

You seem to think Value Learning is the hard problem, getting an AI to learn what humans actually want.

Learning what humans want is pretty easy. However it's an inconsistent mess which involves many things contemporary people find unsavory. Making it all coherent and formulating a (single) policy on the basis of this mess is the hard part.

From your point of view. You gave me examples of values which you consider bad, as an argument against FAI. I'm showing you that CEV would eliminate these things.

Why would CEV eliminate things I find negative? This is just a projected typical mind fallacy. Things I consider positive and negative are not (necessarily) things many or most people consider positive and negative. Since I don't expect to find myself in a privileged position, I should expect CEV to eliminate some things I believe are positive and impose some things I believe are negative.

Later you say that CEV will average values. I don't have average values.

If they knew all the arguments for and against religion, then their values would be more like ours. They would see how bad killing people is, and that their religion is wrong.

I see no evidence to believe this is true and lots of evidence to believe this is false.

You are essentially saying that religious people are idiots and if only you could sit them down and explain things to them, the scales would fall from their eyes and they would become atheists. This is a popular idea, but it fails real-life testing very, very hard.

Comment author: Houshalter 25 October 2016 06:11:58AM *  0 points [-]

No, I'm asking you to specify it. My point is that you can't build X if you can't even recognize X.

And I don't agree with that. I've presented some ideas on how an FAI could be built, and how CEV would work. None of them require "recognizing" FAI. What would it even mean to "recognize" FAI, except to see that it values the kinds of things we value and makes the world better for us?

Learning what humans want is pretty easy. However it's an inconsistent mess which involves many things contemporary people find unsavory. Making it all coherent and formulating a (single) policy on the basis of this mess is the hard part.

I've written about one method to accomplish this, though there may be better methods.

Why would CEV eliminate things I find negative? This is just a projected typical mind fallacy. Things I consider positive and negatve are not (necessarily) things many or most people consider positive and negative.

Humans are 99.999% identical. We have the same genetics, the same brain structures, and mostly the same environments. The only reason this isn't obvious, is because we spend almost all our time focusing on the differences between people, because that's what's useful in everyday life.

I should expect CEV to eliminate some things I believe are positive and impose some things I believe are negative.

That may be the case, but that's still not a bad outcome. In the example I used, the values dropped from ISIS members were dropped for two reasons: they were based on false beliefs, or they hurt other people. If you have values based on false beliefs, you should want them to be eliminated. If you have values that hurt other people, then it's only fair that they be eliminated. Or else you risk being subject to the values of people who want to hurt you.

Later you say that CEV will average values. I don't have average values.

Well, I think it's accurate, but it's somewhat nonspecific. Specifically, CEV will find the optimal compromise of values: the values that satisfy the most people the most amount, or at least dissatisfy the fewest people the least. See the post I just linked for more details on one example of how that could be implemented. That's not necessarily "average values".

In the worst case, people with totally incompatible values will just be allowed to go their separate ways, or whatever the most satisfying compromise is. Muslims live on one side of the Dyson sphere, Christians on the other, and they never have to interact and can do their own thing.

You are essentially saying that religious people are idiots and if only you could sit them down and explain things to them, the scales would fall from their eyes and they would become atheists. This is a popular idea, but it fails real-life testing very, very hard.

My exact words were "If they were more intelligent, informed, and rational... If they knew all the arguments for and against..." Real-world problems of persuading people don't apply. Most people don't research all the arguments against their beliefs, and most people aren't rational and don't seriously consider the hypothesis that they are wrong.

For what it's worth, I was deconverted like this. Not overnight by any means. But over time I found that the arguments against my beliefs were correct and I updated my belief.

Changing world views is really really hard. There's no one piece of evidence or one argument to dispute. Religious people believe that there is tons of evidence of God. To them it just seems obviously true. From miracles, to recorded stories, to their own personal experiences, etc. It takes a lot of time to get at every single pillar of the belief and show its flaws. But it is possible. It's not like Muslims were born believing in Islam. Islam is not encoded in genetics. People deconvert from religions all the time, entire societies have even done it.

In any case, my proposal does not require literally doing this. It's just a thought experiment. To show that the ideal set of values is what you choose if you had all the correct beliefs.

Comment author: Lumifer 25 October 2016 03:01:48PM *  0 points [-]

What would it even mean to "recognize" FAI

It means that when you look at an AI system, you can tell whether it's FAI or not.

If you can't tell, you may be able to build an AI system, but you still won't know whether it's FAI or not.

I've written about one method to accomplish this

I don't see what voting systems have to do with CEV. The "E" part means you don't trust what the real, current humans say, so making them vote on anything is pointless.

Humans are 99.999% identical.

That's a meaningless expression without a context. Notably, we don't have the same genes or the same brain structures. I don't know about you, but it is really obvious to me that humans are not identical.

...false beliefs ... it's only fair ...

How do you know what's false? You are a mere human, you might well be mistaken. How do you know what's fair? Is it an objective thing, something that exists in the territory?

The values that satisfy the most people the most amount.

Right, so the fat man gets thrown under the train... X-)

Muslims live on one side of the dyson sphere, Christians on the other

Hey, I want to live on the inside. The outside is going to be pretty gloomy and cold :-/

Real world problems of persuading people don't apply.

LOL. You're just handwaving then. "And here, in the difficult part, insert magic and everything works great!"

Comment author: Houshalter 25 October 2016 08:42:59PM 0 points [-]

It means that when you look an an AI system, you can tell whether it's FAI or not.

Look at it how? Look at its source code? I argued that we can write source code that will result in FAI, and you could recognize that. Look at the weights of its "brain"? Probably not, any more than we can look at human brains and recognize what they do. Look at its actions? Definitely: FAI is an AI that doesn't destroy the world, etc.

I don't see what voting systems have to do with CEV. The "E" part means you don't trust what the real, current humans say, so to making them vote on anything is pointless.

The voting doesn't have to actually happen. The AI can predict what we would vote for, if we had plenty of time to debate it. And you can get even more abstract than that and have the FAI just figure out the details of E itself.

The point is to solve the "coherent" part: that you can find a set of coherent values from a bunch of different agents or messy human brains. And to show that mathematicians have actually extensively studied a special case of this problem: voting systems.
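As a concrete instance of what voting theory offers here, below is a minimal Borda count -- one classic aggregation rule, chosen purely as an example (CEV is not committed to any particular rule, and the ballots are invented):

```python
# Minimal Borda count: one classic way voting theory aggregates
# conflicting rankings into a single social choice.
from collections import defaultdict

def borda(ballots):
    """Each ballot ranks candidates best-first; a candidate at position i
    on a ballot of n candidates earns (n - i - 1) points. Highest total wins."""
    scores = defaultdict(int)
    for ballot in ballots:
        n = len(ballot)
        for position, candidate in enumerate(ballot):
            scores[candidate] += n - position - 1
    return max(scores, key=scores.get)

# Invented ballots: two polarized factions plus one broadly acceptable option.
ballots = [
    ["compromise", "faction_a", "faction_b"],
    ["faction_a", "compromise", "faction_b"],
    ["faction_b", "compromise", "faction_a"],
]
winner = borda(ballots)  # "compromise"
```

Here the "compromise" option, ranked first by only one voter but acceptable to everyone, beats both factions' favorites -- a miniature of finding coherent values among disagreeing agents.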

That's a meaningless expression without a context. Notably, we don't have the same genes or the same brain structures. I don't know about you, but it is really obvious to me that humans are not identical.

Compared to other animals, compared to aliens, yes we are incredibly similar. We do have 99.99% identical DNA, and our brains all have the same structure with minor variations.

How do you know what's false?

Did I claim that I did?

How do you know what's fair? Is it an objective thing, something that exists in the territory?

I gave a precise algorithm for doing that actually.

Right, so the fat man gets thrown under the train... X-)

Which is the best possible outcome, versus letting the trolley kill five other people. But I don't think these kinds of scenarios are realistic once we have incredibly powerful AI.

LOL. You're just handwaving then. "And here, in the difficult part, insert magic and everything works great!"

I'm not handwaving anything... There is no magic involved at all. The whole scenario of persuading people is counterfactual and doesn't need to actually be done. The point is to define more exactly what CEV is. It's the values you would want if you had the correct beliefs. You don't need to actually have the correct beliefs, to give your CEV.

Comment author: Lumifer 26 October 2016 02:28:40PM 0 points [-]

<shrug> I think we have, um, irreconcilable differences and are just spinning wheels here. I'm happy to agree to disagree.

Comment author: hairyfigment 22 October 2016 11:12:46PM 0 points [-]

We typically imagine CEV asking what people would do if they 'knew what the AI knew' - let's say the AI tries to estimate expected value of a given action, with utility defined by extrapolated versions of us who know the truth, and probabilities taken from the AI's own distribution. I am absolutely saying that theism fails under any credible epistemology, and any well-programmed FAI would expect 'more knowledgeable versions of us' to become atheists on general principles. Whether or not this means they would change "if they knew all the arguments for and against religion," depends on whether or not they can accept some extremely basic premise.
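A sketch of that decision rule, with all probabilities and utilities invented: the AI weighs outcomes by its own probability distribution, but scores them with utilities attributed to extrapolated versions of us:

```python
# Sketch of the decision rule described above:
# EV(a) = sum over outcomes s of P(s | a) * U_extrapolated(s),
# where P comes from the AI's own beliefs and U from extrapolated humans.
# All numbers are invented for illustration.

def expected_value(action, p_outcome, extrapolated_utility):
    """Expected value of an action under the AI's probabilities and
    the extrapolated utilities."""
    return sum(p * extrapolated_utility[s]
               for s, p in p_outcome[action].items())

p_outcome = {                              # AI's P(outcome | action)
    "act_now": {"good": 0.6, "bad": 0.4},
    "wait":    {"good": 0.9, "bad": 0.1},
}
extrapolated_utility = {"good": 1.0, "bad": -2.0}   # U from extrapolated us

best = max(p_outcome,
           key=lambda a: expected_value(a, p_outcome, extrapolated_utility))
# best == "wait": EV(wait) = 0.7 beats EV(act_now) = -0.2
```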

(Note that nobody comes into the world with anything even vaguely resembling a prior that favors a major religion. We might start with a bias in favor of animism, but nearly everyone would verbally agree this anthropomorphism is false.)

It seems much less clear if CEV would make psychopathy irrelevant. But potential victims must object to their own suffering at least as much as real-world psychopaths want to hurt them. So the most obvious worst-case scenario, under implausibly cynical premises, looks more like Omelas than it does a Mongol invasion. (Here I'm completely ignoring the clause meant to address such scenarios, "had grown up farther together".)

Comment author: Lumifer 24 October 2016 03:15:05PM 1 point [-]

We typically imagine CEV asking what people would do if they 'knew what the AI knew'

No, we don't, because this would be a stupid question. CEV doesn't ask people, CEV tells people what they want.

any well-programmed FAI would expect 'more knowledgeable versions of us' to become atheists on general principles.

I see little evidence to support this point of view. You might think that atheism is obvious, but a great deal of people, many of them smarter than you, disagree.