wuwei comments on Only humans can have human values - Less Wrong

Post author: PhilGoetz 26 April 2010 06:57PM

Comment author: wuwei 27 April 2010 12:40:16AM, 16 points

I suppose I might count as someone who favors "organismal" preferences, rather than conflating the metaphorical "preferences" of our genes with those of the individual. I think your argument against this position is pretty weak.

You claim that favoring the "organismal" over the "evolutionary" fails to accurately identify our values in four cases, but I fail to see any problem with these cases.

  • I find no problem with upholding the human preference for foods which taste fatty, sugary and salty. (Note that, consistently applied, the "organismal" preference would be for the fatty, sugary and salty taste, not for foods that are actually fatty, sugary and salty. E.g., we like drinking Diet Pepsi with Splenda almost as much as Pepsi, roughly in proportion to how well Splenda mimics the taste of sugar. We could even go one step further and drop the actual food part, valuing just the experience of [seemingly] eating fatty, sugary and salty foods.) This doesn't necessarily commit me to valuing an unhealthy diet all things considered, because we also have many other preferences, e.g. for our health, which may outweigh this true human value.
  • The next two cases (fear of snakes and enjoying violence) can be dealt with similarly.
  • The last one is a little trickier, but I think it can be addressed by a similar principle in which one value gets outweighed by a different value. In this case, it would be some higher-order value such as treating like cases alike. The difference here is that rather than being a competing value that outweighs the initial value, it is more like a constitutive value which nullifies the initial value. (Technically, I would prefer to talk here of principles which govern our values rather than necessarily of higher-order values.)

I thought your arguments throughout this post were similarly shallow and uncharitable to the side you were arguing against. For instance, you go on at length about how disagreements about value exist and how intuitions are inconsistent across cultures and history, but I don't see how this is supposed to be any more convincing than pointing out how many people in history have believed the earth is flat.

Okay, you've defeated the view that ethics is about the values all humans throughout history unanimously agree on. Now what about views that extrapolate not from perfectly consistent, unanimous and foundational intuitions or preferences, but from dynamics in human psychology that tend to shape initially inconsistent and incoherent intuitions to be more consistent and coherent -- dynamics, the end result of which can be hard to predict when iteratively applied and which can be misapplied in any given instance in a way analogous to applications of the dynamic over beliefs of favoring the simplest hypothesis consistent with the evidence?

By the way, I don't mean to claim that your conclusion is obviously wrong. I think someone favoring my type of view about ethics has a heavy burden of proof that you hint at, perhaps even one that has been underappreciated here. I just don't think your arguments here provide any support for your conclusion.

It seems to me that when you try to provide illustrative examples of how opposing views fail, you end up merely attacking straw men. Perhaps you'd do better if you tried to establish that any opposing views must have some property in common and that such a property dooms those views to failure. Or that opposing views must go one of two mutually exclusive and exhaustive routes in response to some central dilemma and both routes doom them to failure.

I really would like to see the most precise and cogent version of your argument here as I think it could prompt some important progress in filling in the gaps present in the sort of ethical view I favor.

Comment author: PhilGoetz 27 April 2010 02:35:10AM, 3 points

Voted up for thought and effort. BTW, when I started writing this last week, I thought I always preferred organismal preferences.

the "organismal" preference would be for the fatty, sugary and salty taste and not foods that are actually fatty, sugary and salty.

That's a good point. But in the context of designing a Friendly AI that implements human values, it means we have to design the AI to like fatty, sugary, and salty tastes. Doesn't that seem odd to you? Maybe not the sort of thing we should be fighting to preserve?

The next two cases (fear of snakes and enjoying violence) can be dealt with similarly.

I don't see how. Are you going to kill the snakes, or not? Do you mean that you can use technology to let people experience simulated violence without actually hurting anybody? Doesn't that seem like building an inconsistency into your utopia? Wouldn't having a large number of such inconsistencies make utopia unstable, or lacking in integrity?

The last one is a little trickier but I think it can be addressed by a similar principle in which one value gets outweighed by a different value.

That's how I said we resolve all of these cases. Only it doesn't get outweighed by a single different value (the Prime Mover model); it gets outweighed by an entire, consistent, locally-optimal energy-minimizing set of values.
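For concreteness, here is a minimal toy sketch of what an "energy-minimizing set of values" could mean computationally (the value names, weights, and Hopfield-style energy function are invented for illustration, not taken from the post): each value is adopted (+1) or rejected (-1), pairwise weights encode compatibility, and greedy descent settles into a locally-optimal, mutually consistent set.

```python
# Toy illustration (all names and numbers invented): values as +1/-1
# adopt/reject states with pairwise compatibility weights; greedy descent
# finds a locally-optimal, low-energy (internally consistent) set.

values = ["enjoy_fatty_food", "stay_healthy", "fear_snakes", "keep_pets_safe"]

# compatibility[(a, b)] > 0: values reinforce each other; < 0: they conflict.
compatibility = {
    ("enjoy_fatty_food", "stay_healthy"): -1.0,
    ("fear_snakes", "keep_pets_safe"): 0.5,
    ("stay_healthy", "keep_pets_safe"): 0.3,
}

def energy(state):
    """Lower energy = more internally consistent set of adopted values."""
    return -sum(w * state[a] * state[b] for (a, b), w in compatibility.items())

state = {v: 1 for v in values}           # start by adopting every value
improved = True
while improved:                          # greedy single-flip descent
    improved = False
    for v in values:
        state[v] *= -1                   # tentatively flip adopt/reject
        if energy(state) < energy({**state, v: -state[v]}):
            improved = True              # the flip lowered the energy: keep it
        else:
            state[v] *= -1               # otherwise revert

print({v: ("adopt" if s > 0 else "reject") for v, s in state.items()})
```

In this toy run, the taste preference is the one rejected: it is outweighed not by any single competing value but by the rest of the mutually reinforcing set.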

... but from dynamics, the end result of which can be hard to predict when iteratively applied and which can be misapplied in any given instance in a way analogous to applications of the dynamic over beliefs of favoring the simplest hypothesis consistent with the evidence.

This seems to be at the core of your comment, but I can't parse that sentence.

Perhaps you'd do better if you tried to establish that any opposing views must have some property in common and that such a property dooms those views to failure.

My emphasis is not on defeating opposing views (except the initial "preferences are propositions" / ethics-as-geometry view), but on setting out my view, and overcoming the objections to it that I came up with. For instance, when I talked about the intuitions of humans over time not being consistent, I wasn't attacking the view that human values are universal. I was overcoming the objection that we must have an algorithm for choosing evolutionary or organismal preferences, if we seem to agree on the right conclusion in most cases.

I just don't think your arguments here provide any support for your conclusion.

Which conclusion did you have in mind? The key conclusion is that value can't be unambiguously analyzed at a finer level of detail than the behavior, in the way that communication can't be unambiguously analyzed at a finer level of detail than the proposition. You haven't said anything about that.

(I just realized this makes me a structuralist above some level of detail, but a post-structuralist below it. Damn.)

I really would like to see the most precise and cogent version of your argument here as I think it could prompt some important progress in filling in the gaps present in the sort of ethical view I favor.

I don't think I will be any more precise or cogent (at least not as long as I'm not getting paid for it), nor do I think most readers would have preferred an even longer post. It took me two days to write this. If you don't think my arguments provide any support for my conclusions, the gap between us is too wide for further elaboration to be worthwhile.

What is the ethical view you favor?

Comment author: MichaelVassar 27 April 2010 04:51:49AM, 4 points

The FAI shouldn't like sugary tastes, sex, violence, bad arguments, whatever. It should like us to experience sugary tastes, sex, violence, bad arguments, whatever.

"I don't see how. Are you going to kill the snakes, or not?"

Presumably you act out a weighted balance of the voting power of possible human preferences extrapolated over different possible environments which they might create for themselves.

" Do you mean that you can use technology to let people experience simulated violence without actually hurting anybody? Doesn't that seem like building an inconsistency into your utopia? Wouldn't having a large number of such inconsistencies make utopia unstable, or lacking in integrity?"

I don't understand the problem here. I don't mean that this is the correct solution (though it is the obvious one), but rather that I don't see what the problem is. The ancients, who endorsed violence, generally didn't understand or believe in personal death anyway.

Comment author: PhilGoetz 27 April 2010 04:01:04PM, 1 point

The FAI shouldn't like sugary tastes, sex, violence, bad arguments, whatever. It should like us to experience sugary tastes, sex, violence, bad arguments, whatever.

You're going back to Eliezer's plan to build a single OS FAI. I should have clarified that I'm speaking of a plan to make AIs that have human values, for the sake of simplicity. (Which IMHO is a much, much better and safer plan.) Yes, if your goal is to build an OS FAI, that's correct. It doesn't get around the problem. Why should we design an AI to ensure that everyone for the rest of history is so much like us, and enjoys fat, sugar, salt, and the other things we do? That's a tragic waste of a universe.

Presumably you act out a weighted balance of the voting power of possible human preferences extrapolated over different possible environments which they might create for themselves.

Why extrapolate over different possible environments to make a decision in this environment? What does that buy you? Do you do that today?

EDIT: I think I see what you mean. You mean construct a distribution of possible extensions of existing preferences into different environments, and weigh each one according to some function. Such as internal consistency / energy minimization. Which, I would guess, is a preferred Bayesian method of doing CEV.
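For concreteness, a minimal sketch of the scheme as I read it (the preference names and the consistency score are hypothetical stand-ins, not part of CEV itself): enumerate candidate extensions of current preferences into a new environment, score each for internal consistency, and normalize the scores into voting weights.

```python
# Toy sketch (preference names and scoring invented): weight candidate
# extensions of current preferences into a new environment by an
# internal-consistency score, then normalize into "voting" weights.

def consistency_score(extension):
    """Penalize each pair of directly conflicting choices in an extension.
    A real system would need a principled consistency/energy measure."""
    conflicts = {("kill_snakes", "preserve_species"),
                 ("enjoy_violence", "no_one_gets_hurt")}
    choices = set(extension)
    penalty = sum(1 for a, b in conflicts if a in choices and b in choices)
    return 1.0 / (1.0 + penalty)

# Candidate extrapolations of the fear-of-snakes preference into an
# environment where snakes pose no real danger (purely illustrative):
extensions = [
    ("kill_snakes", "preserve_species"),       # internally inconsistent
    ("avoid_snakes_only", "preserve_species"),
    ("drop_fear_response", "preserve_species"),
]

scores = {ext: consistency_score(ext) for ext in extensions}
total = sum(scores.values())
for ext, s in scores.items():
    print(ext, "voting weight:", round(s / total, 2))
```

The weak point is exactly the scoring step, as the next paragraph argues.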

My intuition is that this won't work, because what you need to make it work is prior odds over events that have never been observed. I think we need to figure out a way to do the math to settle this.

I don't understand the problem here.

It seems irrational, and wasteful, to deliberately construct a utopia where you give people impulses, and work to ensure that the mental and physical effort consumed by acting on those impulses is wasted. It also seems like a recipe for unrest. And, from an engineering perspective, it's an ugly design. It's like building a car with extra controls that don't do anything.

Comment author: RobinHanson 28 April 2010 06:00:44PM, 7 points

Why should we design an AI to ensure that everyone for the rest of history is so much like us, and enjoys fat, sugar, salt, and the other things we do? That's a tragic waste of a universe.

Well, a key hard problem is: which features of ourselves that we like should we try to ensure endure into the future? Yes, some features seem hopelessly provincial, while others seem more universally good, but how can we systematically judge this?

Comment author: Gavin 27 April 2010 11:41:10PM, 6 points

It seems irrational, and wasteful, to deliberately construct a utopia where you give people impulses, and work to ensure that the mental and physical effort consumed by acting on those impulses is wasted.

I think you're dancing around a bigger problem: once we have a sufficiently powerful AI, you and I are just a bunch of extra meat and buggy programming. Our physical and mental effort is just not needed or relevant. The purpose of FAI is to make sure that we get put out to pasture in a Friendly way. Or, depending on your mood, you could phrase it as living on in true immortality to watch the glory that we have created unfold.

It's like building a car with extra controls that don't do anything.

I think the more important question is what, in this analogy, does the car do?

Comment author: PhilGoetz 28 April 2010 02:08:33AM, -1 points

I get the impression that's part of the SIAI plan, but it seems to me that the plan entails that that's all there is, from then on, for the universe. The FAI needs control of all resources to prevent other AIs from being made; and the FAI has no other goals than its human-value-fulfilling goals; so it turns the universe into a rest home for humans.

That's just another variety of paperclipper.

If I'm wrong, and SIAI wants to allocate some resources to the human preserve, while letting the rest of the universe develop in interesting ways, please correct me, and explain how this is possible.

Comment author: LucasSloan 29 April 2010 05:23:45AM, 2 points

If you think the future would be less than it could be if the universe were tiled with "rest homes for humans", why do you expect that an AI which was maximizing human utility would do that?

Comment author: PhilGoetz 29 April 2010 06:14:02PM, 1 point

It depends how far meta you want to go when you say "human utility". Does that mean sex and chocolate, or complexity and continual novelty?

That's an ambiguity in CEV - the AI extrapolates human volition, but what's happening to the humans in the meantime? Do they stay the way they are now? Are they continuing to develop? If we suppose that human volition is incompatible with trilobite volition, then we should expect the humans to evolve/develop new values that are incompatible with the AI's values extrapolated from humans.

Comment author: LucasSloan 29 April 2010 11:25:53PM, 4 points

If for some reason humans who liked to torture toddlers became very fit, future humans would evolve to possess values that resulted in many toddlers being tortured. I don't want that to happen, and am perfectly happy constraining future intelligences (even if they "evolve" from humans or even me) so they don't. And as always, if you think that you want the future to contain some value shifting, why don't you believe that an AI designed to fulfill the desires of humanity will cause/let that happen?

Comment author: Peter_de_Blanc 29 April 2010 02:28:09AM, 3 points

If I'm wrong, and SIAI wants to allocate some resources to the human preserve, while letting the rest of the universe develop in interesting ways

If you want the universe to develop in interesting ways, then why not explicitly optimize it for interestingness, however you define that?

Comment author: PhilGoetz 29 April 2010 04:10:36AM, -1 points

I'm not talking about what I want to do, I'm talking about what SIAI wants to do. What I want to do is incompatible with constructing a singleton and telling it to extrapolate human values and run the universe according to them; as I have explained before.

Comment author: Gavin 28 April 2010 05:01:05AM, 0 points

I think your article successfully argued that we're not going to find some "ultimate" set of values that is correct or can be proven. In the end, the programmers of an FAI are going to choose a set of values that they like.

The good news is that human values can include things like generosity, non-interference, personal development, and exploration. "Human values" could even include tolerance of existential risk in return for not destroying other species. Any way that you want an FAI to be is a human value. We can program an FAI with ambitions and curiosity of its own, but they will be rooted in our own values and anthropomorphism.

But no matter how noble and farsighted the programmers are, to those who don't share the programmers' values, the FAI will be a paperclipper.

We're all paperclippers, and in the true prisoners' dilemma, we always defect.

Comment author: PhilGoetz 28 April 2010 01:40:10PM, 3 points

Upvoted, but -

We can program an FAI with ambitions and curiosity of its own, they will be rooted in our own values and anthropomorphism.

Eliezer needs to say whether he wants to do this, or to save humans. I don't think you can have it both ways. The OS FAI does not have ambitions or curiosity of its own.

But no matter how noble and farsighted the programmers are, to those who don't share the programmers' values, the FAI will be a paperclipper.

I dispute this. The SIAI FAI is specifically designed to have control of the universe as one of its goals. This is not logically necessary for an AI. Nor is the plan to build a singleton, rather than an ecology of AI, the only possible plan.

I notice that some of my comment wars with other people arise because they automatically assume that whenever we're talking about a superintelligence, there's only one of them. This is in danger of becoming a LW communal assumption. It's not even likely. (More generally, there's a strong tendency for people on LW to attribute very high likelihoods to scenarios that EY spends a lot of time talking about - even if he doesn't insist that they are likely.)

Comment author: Nick_Tarleton 29 April 2010 01:24:39AM, 7 points

I dispute this. The SIAI FAI is specifically designed to have control of the universe as one of its goals.

It is widely expected that this will arise as an important instrumental goal; nothing more than that. I can't tell if this is what you mean. (When you point out that "trying to take over the universe isn't utility-maximizing under many circumstances", it sounds like you're thinking of taking over the universe as a separate terminal goal, which would indeed be terrible design; an AI without that terminal goal, that can reason the same way you can, can decide not to try to take over the universe if that looks best.)

I notice that some of my comment wars with other people arise because they automatically assume that whenever we're talking about a superintelligence, there's only one of them. This is in danger of becoming a LW communal assumption. It's not even likely.

I probably missed it in some other comment, but which of these do you not buy: (a) huge first-mover advantages from self-improvement; (b) preventing other superintelligences as a convergent subgoal; (c) that the conjunction of these implies that a singleton superintelligence is likely?

(More generally, there's a strong tendency for people on LW to attribute very high likelihoods to scenarios that EY spends a lot of time talking about - even if he doesn't insist that they are likely.)

This sounds plausible and bad. Can you think of some other examples?

Comment author: Matt_Simpson 28 April 2010 10:49:55PM, 7 points

(More generally, there's a strong tendency for people on LW to attribute very high likelihoods to scenarios that EY spends a lot of time talking about - even if he doesn't insist that they are likely.)

This is probably just availability bias. These scenarios are easy to recall because we've read about them, and we're psychologically primed for them just by coming to this website.

Comment author: thomblake 28 April 2010 01:53:07PM, 4 points

Eliezer needs to say whether he wants to do this

He did. FAI should not be a person - it's just an optimization process.

ETA: link

Comment author: PhilGoetz 29 April 2010 01:23:48AM, -1 points

Thanks! I'll take that as definitive.

Comment author: Gavin 28 April 2010 10:32:42PM, 2 points

The assumption of a single AI comes from an assumption that an AI will have zero risk tolerance. It follows from that assumption that the most powerful AI will destroy or limit all other sentient beings within reach.

There's no reason that an AI couldn't be programmed to have tolerance for risk. Pursuing a lot of the more noble human values may require it.

I make no claim that Eliezer and/or the SIAI have anything like this in mind. It seems that they would like to build an absolutist AI. I find that very troubling.

Comment author: mattnewport 28 April 2010 11:04:36PM, -1 points

I make no claim that Eliezer and/or the SIAI have anything like this in mind. It seems that they would like to build an absolutist AI. I find that very troubling.

If I thought they had settled on this, and that they were likely to succeed, I would probably feel it was very important to work to destroy them. I'm currently not sure about the first, and I think the second is highly unlikely, so it is not a pressing concern.

Comment author: thomblake 28 April 2010 02:19:48PM, 1 point

I dispute this. The SIAI FAI is specifically designed to have control of the universe as one of its goals. This is not logically necessary for an AI. Nor is the plan to build a singleton, rather than an ecology of AI, the only possible plan.

It is, however, necessary for an AI to do something of the sort if it's trying to maximize any sort of utility. Otherwise, risk / waste / competition will cause the universe to be less than optimal.

Comment author: PhilGoetz 29 April 2010 01:19:47AM, 0 points

Trying to take over the universe isn't utility-maximizing under many circumstances: if you have a small chance of succeeding, or if the battle to do so will destroy most of the resources, or if you discount the future at all (remember, computation speed increases while the speed of light stays constant), or if your values require other independent agents.

By your logic, it is necessary for SIAI to try to take over the world. Is that true? The US probably has enough military strength to take over the world - is it purely stupidity that it doesn't?

The modern world is more peaceful, more enjoyable, and richer because we've learned that utility is better maximized by cooperation than by everyone trying to rule the world. Why does this lesson not apply to AIs?

Comment author: PhilGoetz 28 April 2010 01:33:48PM, -1 points

I'm getting lost in my own argument.

If Michael was responding to the problem that human preference systems can't be unambiguously extended into new environments, then my chronologically first response applies, but needs more thought; and I'm embarrassed that I didn't anticipate that particular response.

If he was responding to the problem that human preferences as described by their actions, and as described by their beliefs, are not the same, then my second response applies.

Comment author: PhilGoetz 28 April 2010 02:37:18AM, -1 points

Presumably you act out a weighted balance of the voting power of possible human preferences extrapolated over different possible environments which they might create for themselves.

If a person could label each preference system "evolutionary" or "organismal", meaning which value they preferred, then you could use that to help you extrapolate their values into novel environments.

The problem is that the person is reasoning only over the propositional part of their values. They don't know what their values are; they know only the contribution from the propositional part. That's one of the main points of my post. The values they come up with will not always be the values they actually implement.

If you define a person's values as being what they believe their values are, then, sure, most of what I posted will not be a problem. I think you're missing the point of the post, and are using the geometry-based definition of identity.

If you can't say whether the right value to choose in each case is evolutionary or organismal, then extrapolating into future environments isn't going to help. You can't gain information to make a decision in your current environment by hypothesizing an extension to your environment, making observations in that imagined environment, and using them to refine your current-environment estimates. That's like trying to refine your estimate of an asteroid's current position by simulating its movement into the future, and then tracking backwards along that projected trajectory to the present. It's trying to get information for free. You can't do that.
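A toy demonstration of that point (my own construction, with invented dynamics): push a Monte Carlo estimate forward through deterministic dynamics, then invert the same dynamics; the recovered estimate is the one you started with, so the round trip adds no information.

```python
# Toy demonstration (invented dynamics): simulating forward and then
# tracking back along the projection recovers the original estimate.

import random

def forward(x):
    """Deterministic dynamics: project the state into the 'future'."""
    return 2.0 * x + 1.0

def backward(y):
    """Exact inverse of the dynamics: track back to the 'present'."""
    return (y - 1.0) / 2.0

# Monte Carlo samples representing our uncertainty about the asteroid's
# current position (mean 10, standard deviation 3 -- illustrative numbers).
samples = [random.gauss(10.0, 3.0) for _ in range(100_000)]

# Simulate into the future, then track backwards along the projection.
round_trip = [backward(forward(x)) for x in samples]

mean_before = sum(samples) / len(samples)
mean_after = sum(round_trip) / len(round_trip)
print(mean_before, mean_after)  # identical up to floating-point rounding
```

Only a new observation made in the future environment could change the estimate, and by hypothesis there is none; the same holds for preferences extrapolated into environments no one has ever been in.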

(I think what I said under "Fuzzy values and fancy math don't help" is also relevant.)