MichaelVassar comments on Only humans can have human values - Less Wrong

34 points | Post author: PhilGoetz | 26 April 2010 06:57PM

Comment author: MichaelVassar 27 April 2010 04:51:49AM 4 points [-]

The FAI shouldn't like sugary tastes, sex, violence, bad arguments, whatever. It should like us to experience sugary tastes, sex, violence, bad arguments, whatever.

"I don't see how. Are you going to kill the snakes, or not?"

Presumably you act out a weighted balance of the voting power of possible human preferences extrapolated over different possible environments which they might create for themselves.

" Do you mean that you can use technology to let people experience simulated violence without actually hurting anybody? Doesn't that seem like building an inconsistency into your utopia? Wouldn't having a large number of such inconsistencies make utopia unstable, or lacking in integrity?"

I don't understand the problem here. I don't mean that this is the correct solution, though it is the obvious solution, but rather that I don't see what the problem is. Ancients, who endorsed violence, generally didn't understand or believe in personal death anyway.

Comment author: PhilGoetz 27 April 2010 04:01:04PM *  1 point [-]

The FAI shouldn't like sugary tastes, sex, violence, bad arguments, whatever. It should like us to experience sugary tastes, sex, violence, bad arguments, whatever.

You're going back to Eliezer's plan to build a single OS FAI. I should have clarified that I'm speaking of a plan to make AIs that have human values, for the sake of simplicity. (Which IMHO is a much, much better and safer plan.) Yes, if your goal is to build an OS FAI, that's correct. It doesn't get around the problem. Why should we design an AI to ensure that everyone for the rest of history is so much like us, and enjoys fat, sugar, salt, and the other things we do? That's a tragic waste of a universe.

Presumably you act out a weighted balance of the voting power of possible human preferences extrapolated over different possible environments which they might create for themselves.

Why extrapolate over different possible environments to make a decision in this environment? What does that buy you? Do you do that today?

EDIT: I think I see what you mean. You mean construct a distribution of possible extensions of existing preferences into different environments, and weigh each one according to some function, such as internal consistency or energy minimization. That, I would guess, is a preferred Bayesian way of doing CEV.

My intuition is that this won't work, because what you need to make it work is prior odds over events that have never been observed. I think we need to figure out a way to do the math to settle this.
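The extrapolate-and-weight scheme being discussed can be sketched as toy code. This is only an illustration of the voting arithmetic, not anyone's actual proposal; the candidate extrapolations, the weights, and the action names are all invented for the example.

```python
# Toy sketch of a "weighted balance of voting power": each candidate
# extrapolation of human preferences gets a weight (here standing in for an
# internal-consistency score) and a preference strength for each action.
# All numbers and names below are made up for illustration.

candidates = [
    # (weight, {action: preference strength in [0, 1]})
    (0.6, {"simulate_violence": 0.2, "remove_impulse": 0.8}),
    (0.3, {"simulate_violence": 0.9, "remove_impulse": 0.1}),
    (0.1, {"simulate_violence": 0.5, "remove_impulse": 0.5}),
]

def weighted_choice(candidates):
    """Pick the action with the highest weight-averaged preference."""
    actions = candidates[0][1].keys()
    scores = {a: sum(w * prefs[a] for w, prefs in candidates) for a in actions}
    return max(scores, key=scores.get)
```

With these made-up weights, `weighted_choice(candidates)` picks `"remove_impulse"` (0.56 vs. 0.44); the open question in the thread is where the weights could legitimately come from.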

I don't understand the problem here.

It seems irrational, and wasteful, to deliberately construct a utopia where you give people impulses, and work to ensure that the mental and physical effort consumed by acting on those impulses is wasted. It also seems like a recipe for unrest. And, from an engineering perspective, it's an ugly design. It's like building a car with extra controls that don't do anything.

Comment author: RobinHanson 28 April 2010 06:00:44PM *  7 points [-]

Why should we design an AI to ensure that everyone for the rest of history is so much like us, and enjoys fat, sugar, salt, and the other things we do? That's a tragic waste of a universe.

Well a key hard problem is: what features about ourselves that we like should we try to ensure endure into the future? Yes some features seem hopelessly provincial, while others seem more universally good, but how can we systematically judge this?

Comment author: Gavin 27 April 2010 11:41:10PM *  6 points [-]

It seems irrational, and wasteful, to deliberately construct a utopia where you give people impulses, and work to ensure that the mental and physical effort consumed by acting on those impulses is wasted.

I think you're dancing around a bigger problem: once we have a sufficiently powerful AI, you and I are just a bunch of extra meat and buggy programming. Our physical and mental effort is just not needed or relevant. The purpose of FAI is to make sure that we get put out to pasture in a Friendly way. Or, depending on your mood, you could phrase it as living on in true immortality to watch the glory that we have created unfold.

It's like building a car with extra controls that don't do anything.

I think the more important question is what, in this analogy, does the car do?

Comment author: PhilGoetz 28 April 2010 02:08:33AM *  -1 points [-]

I get the impression that's part of the SIAI plan, but it seems to me that the plan entails that that's all there is, from then on, for the universe. The FAI needs control of all resources to prevent other AIs from being made; and the FAI has no other goals than its human-value-fulfilling goals; so it turns the universe into a rest home for humans.

That's just another variety of paperclipper.

If I'm wrong, and SIAI wants to allocate some resources to the human preserve, while letting the rest of the universe develop in interesting ways, please correct me, and explain how this is possible.

Comment author: LucasSloan 29 April 2010 05:23:45AM *  2 points [-]

If you think the future would be less than it could be if the universe was tiled with "rest homes for humans", why do you expect that an AI which was maximizing human utility would do that?

Comment author: PhilGoetz 29 April 2010 06:14:02PM *  1 point [-]

It depends how far meta you want to go when you say "human utility". Does that mean sex and chocolate, or complexity and continual novelty?

That's an ambiguity in CEV - the AI extrapolates human volition, but what's happening to the humans in the meanwhile? Do they stay the way they are now? Are they continuing to develop? If we suppose that human volition is incompatible with trilobite volition, that means we should expect the humans to evolve/develop new values that are incompatible with the AI's values extrapolated from humans.

Comment author: LucasSloan 29 April 2010 11:25:53PM 4 points [-]

If for some reason humans who liked to torture toddlers became very fit, future humans would evolve to possess values that resulted in many toddlers being tortured. I don't want that to happen, and am perfectly happy constraining future intelligences (even if they "evolve" from humans or even me) so they don't. And as always, if you think that you want the future to contain some value shifting, why don't you believe that an AI designed to fulfill the desires of humanity will cause/let that happen?

Comment author: Peter_de_Blanc 29 April 2010 02:28:09AM 3 points [-]

If I'm wrong, and SIAI wants to allocate some resources to the human preserve, while letting the rest of the universe develop in interesting ways

If you want the universe to develop in interesting ways, then why not explicitly optimize it for interestingness, however you define that?

Comment author: PhilGoetz 29 April 2010 04:10:36AM -1 points [-]

I'm not talking about what I want to do, I'm talking about what SIAI wants to do. What I want to do is incompatible with constructing a singleton and telling it to extrapolate human values and run the universe according to them; as I have explained before.

Comment author: Gavin 28 April 2010 05:01:05AM 0 points [-]

I think your article successfully argued that we're not going to find some "ultimate" set of values that is correct or can be proven. In the end, the programmers of an FAI are going to choose a set of values that they like.

The good news is that human values can include things like generosity, non-interference, personal development, and exploration. "Human values" could even include tolerance of existential risk in return for not destroying other species. Any way that you want an FAI to be is a human value. We can program an FAI with ambitions and curiosity of its own, but they will be rooted in our own values and anthropomorphism.

But no matter how noble and farsighted the programmers are, to those who don't share the programmers' values, the FAI will be a paperclipper.

We're all paperclippers, and in the true prisoners' dilemma, we always defect.

Comment author: PhilGoetz 28 April 2010 01:40:10PM *  3 points [-]

Upvoted, but -

We can program an FAI with ambitions and curiosity of its own, but they will be rooted in our own values and anthropomorphism.

Eliezer needs to say whether he wants to do this, or to save humans. I don't think you can have it both ways. The OS FAI does not have ambitions or curiosity of its own.

But no matter how noble and farsighted the programmers are, to those who don't share the programmers' values, the FAI will be a paperclipper.

I dispute this. The SIAI FAI is specifically designed to have control of the universe as one of its goals. This is not logically necessary for an AI. Nor is the plan to build a singleton, rather than an ecology of AI, the only possible plan.

I notice that some of my comment wars with other people arise because they automatically assume that whenever we're talking about a superintelligence, there's only one of them. This is in danger of becoming a LW communal assumption. It's not even likely. (More generally, there's a strong tendency for people on LW to attribute very high likelihoods to scenarios that EY spends a lot of time talking about - even if he doesn't insist that they are likely.)

Comment author: Nick_Tarleton 29 April 2010 01:24:39AM *  7 points [-]

I dispute this. The SIAI FAI is specifically designed to have control of the universe as one of its goals.

It is widely expected that this will arise as an important instrumental goal; nothing more than that. I can't tell if this is what you mean. (When you point out that "trying to take over the universe isn't utility-maximizing under many circumstances", it sounds like you're thinking of taking over the universe as a separate terminal goal, which would indeed be terrible design; an AI without that terminal goal, that can reason the same way you can, can decide not to try to take over the universe if that looks best.)

I notice that some of my comment wars with other people arise because they automatically assume that whenever we're talking about a superintelligence, there's only one of them. This is in danger of becoming a LW communal assumption. It's not even likely.

I probably missed it in some other comment, but which of these do you not buy: (a) huge first-mover advantages from self-improvement; (b) preventing other superintelligences as a convergent subgoal; (c) that the conjunction of these implies that a singleton superintelligence is likely?

(More generally, there's a strong tendency for people on LW to attribute very high likelihoods to scenarios that EY spends a lot of time talking about - even if he doesn't insist that they are likely.)

This sounds plausible and bad. Can you think of some other examples?

Comment author: Matt_Simpson 28 April 2010 10:49:55PM 7 points [-]

(More generally, there's a strong tendency for people on LW to attribute very high likelihoods to scenarios that EY spends a lot of time talking about - even if he doesn't insist that they are likely.)

This is probably just availability bias. These scenarios are easy to recall because we've read about them, and we're psychologically primed for them just by coming to this website.

Comment author: thomblake 28 April 2010 01:53:07PM *  4 points [-]

Eliezer needs to say whether he wants to do this

He did. FAI should not be a person - it's just an optimization process.

ETA: link

Comment author: PhilGoetz 29 April 2010 01:23:48AM -1 points [-]

Thanks! I'll take that as definitive.

Comment author: Gavin 28 April 2010 10:32:42PM *  2 points [-]

The assumption of a single AI comes from an assumption that an AI will have zero risk tolerance. It follows from that assumption that the most powerful AI will destroy or limit all other sentient beings within reach.

There's no reason that an AI couldn't be programmed to have tolerance for risk. Pursuing a lot of the more noble human values may require it.

I make no claim that Eliezer and/or the SIAI have anything like this in mind. It seems that they would like to build an absolutist AI. I find that very troubling.

Comment author: mattnewport 28 April 2010 11:04:36PM *  -1 points [-]

I make no claim that Eliezer and/or the SIAI have anything like this in mind. It seems that they would like to build an absolutist AI. I find that very troubling.

If I thought they had settled on this and that they were likely to succeed I would probably feel it was very important to work to destroy them. I'm currently not sure about the first and think the second is highly unlikely so it is not a pressing concern.

Comment author: thomblake 28 April 2010 02:19:48PM 1 point [-]

I dispute this. The SIAI FAI is specifically designed to have control of the universe as one of its goals. This is not logically necessary for an AI. Nor is the plan to build a singleton, rather than an ecology of AI, the only possible plan.

It is, however, necessary for an AI to do something of the sort if it's trying to maximize any sort of utility. Otherwise, risk / waste / competition will cause the universe to be less than optimal.

Comment author: PhilGoetz 29 April 2010 01:19:47AM *  0 points [-]

Trying to take over the universe isn't utility-maximizing under many circumstances: if you have a small chance of succeeding, or if the battle to do so will destroy most of the resources, or if you discount the future at all (remember, computation speed increases as speed of light stays constant), or if your values require other independent agents.

By your logic, it is necessary for SIAI to try to take over the world. Is that true? The US probably has enough military strength to take over the world - is it purely stupidity that it doesn't?

The modern world is more peaceful, more enjoyable, and richer because we've learned that utility is better maximized by cooperation than by everyone trying to rule the world. Why does this lesson not apply to AIs?

Comment author: Vladimir_Nesov 29 April 2010 05:38:37PM 4 points [-]

Just what do you think "controlling the universe" means? My cat controls the universe. It probably doesn't exert this control anywhere near optimally for most sensible preferences, but it does have an impact on everything. How do we decide that a superintelligence "controls the universe", while my cat "doesn't"? The only difference is in what kind of universe we have, which preference it is optimized for. Whatever you truly want roughly means preferring some states of the universe to other states, and making the universe better for you means controlling it towards your preference. The better the universe, the more specifically its state is specified, the stronger the control. These concepts are just different aspects of the same phenomenon.

Comment author: MugaSofer 30 December 2012 09:15:35PM 1 point [-]

Trying to take over the universe isn't utility-maximizing under many circumstances: if you have a small chance of succeeding, or if the battle to do so will destroy most of the resources

Obviously, if you can't take over the world, then trying is stupid. If you can (for example, if you're the first SAI to go foom) then it's a different story.

or if you discount the future at all (remember, computation speed increases as speed of light stays constant), or if your values require other independent agents.

Taking over the world does not require you to destroy all other life if that is contrary to your utility function. I'm not sure what you mean regarding future-discounting; if reorganizing the whole damn universe isn't worth it, then I doubt anything else will be in any case.

Comment author: JoshuaZ 29 April 2010 01:29:21AM *  1 point [-]

It should apply to AIs if you think that there will be multiple AIs that are at roughly the same capability level. A common assumption here is that as soon as there is a single general AI it will quickly improve to the point where it is so far beyond everything else in capability that their capabilities won't matter. Frankly, I find this assumption to be highly questionable and very optimistic about potential fooming rates among other problems, but if one accepts the idea it makes some sense. The analogy might be to the hypothetical situation of the US not only having the strongest military but also having monopolies on cheap fusion power, an immortality pill, and a bunch of superheroes on their side. The distinction between the US controlling everything and the US having direct military control might quickly become irrelevant.

Edit: Thinking about the rate of fooming issue. I'd be really interested if a fast-foom proponent would be willing to put together a top-level post outlining why fooming will happen so quickly.

Comment author: PhilGoetz 29 April 2010 04:17:12AM 1 point [-]

Eliezer and Robin had a lengthy debate on this perhaps a year ago. I don't remember if it's on OB or LW. Robin believes in no foom, using economic arguments.

The people who design the first AI could build a large number of AIs in different locations and turn them on at the same time. This plan would have a high probability of leading to disaster; but so do all the other plans that I've heard.

Comment author: CronoDAS 29 April 2010 01:40:04AM 1 point [-]

By your logic, it is necessary for SIAI to try to take over the world. Is that true? The US probably has enough military strength to take over the world - is it purely stupidity that it doesn't?

For one, the U.S. doesn't have the military strength. Russia still has enough nuclear warheads and ICBMs to prevent that. (And we suck at being occupying forces.)

Comment author: PhilGoetz 29 April 2010 04:13:03AM -2 points [-]

I think the situation of the US is similar to that of a hypothesized AI. Sure, Russia could kill a lot of Americans. But we would probably "win" in the end. By all the logic I've heard in this thread, and in others lately about paperclippers, the US should rationally do whatever it takes to be the last man standing.

Comment author: PhilGoetz 28 April 2010 01:33:48PM -1 points [-]

I'm getting lost in my own argument.

If Michael was responding to the problem that human preference systems can't be unambiguously extended into new environments, then my chronologically first response applies, but needs more thought; and I'm embarrassed that I didn't anticipate that particular response.

If he was responding to the problem that human preferences as described by their actions, and as described by their beliefs, are not the same, then my second response applies.

Comment author: PhilGoetz 28 April 2010 02:37:18AM *  -1 points [-]

Presumably you act out a weighted balance of the voting power of possible human preferences extrapolated over different possible environments which they might create for themselves.

If a person could label each preference system "evolutionary" or "organismal", meaning which value they preferred, then you could use that to help you extrapolate their values into novel environments.

The problem is that the person is reasoning only over the propositional part of their values. They don't know what their values are; they know only what the contribution within the propositional part is. That's one of the main points of my post. The values they come up with will not always be the values they actually implement.

If you define a person's values as being what they believe their values are, then, sure, most of what I posted will not be a problem. I think you're missing the point of the post, and are using the geometry-based definition of identity.

If you can't say whether the right value to choose in each case is evolutionary or organismal, then extrapolating into future environments isn't going to help. You can't gain information to make a decision in your current environment by hypothesizing an extension to your environment, making observations in that imagined environment, and using them to refine your current-environment estimates. That's like trying to refine your estimate of an asteroid's current position by simulating its movement into the future, and then tracking backwards along that projected trajectory to the present. It's trying to get information for free. You can't do that.
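The asteroid point can be checked with a toy model (a 1-D constant-velocity "asteroid"; the numbers are arbitrary): projecting a deterministic estimate forward and then tracking it back returns exactly the estimate you started with, so the round trip adds no information about the present.

```python
def forward(pos, vel, t):
    """Project a position estimate t time units into the future."""
    return pos + vel * t

def backward(pos, vel, t):
    """Track a projected position back to the present."""
    return pos - vel * t

estimate = 100.0                            # current position estimate
projected = forward(estimate, 3.0, 50.0)    # simulate into the future
recovered = backward(projected, 3.0, 50.0)  # track back to the present

assert recovered == estimate  # no new information about the present
```

The same circularity is what the comment claims about refining current-environment value estimates by extrapolating into imagined future environments.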

(I think what I said under "Fuzzy values and fancy math don't help" is also relevant.)