PhilGoetz comments on Only humans can have human values - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You're going back to Eliezer's plan to build a single OS FAI. I should have clarified that I'm speaking of a plan to make AIs that have human values, for the sake of simplicity. (Which IMHO is a much, much better and safer plan.) Yes, if your goal is to build an OS FAI, that's correct. It doesn't get around the problem. Why should we design an AI to ensure that everyone for the rest of history is so much like us, and enjoys fat, sugar, salt, and the other things we do? That's a tragic waste of a universe.
Why extrapolate over different possible environments to make a decision in this environment? What does that buy you? Do you do that today?
EDIT: I think I see what you mean. You mean construct a distribution of possible extensions of existing preferences into different environments, and weigh each one according to some function. Such as internal consistency / energy minimization. Which, I would guess, is a preferred Bayesian method of doing CEV.
My intuition is that this won't work, because what you need to make it work is prior odds over events that have never been observed. I think we need to figure out a way to do the math to settle this.
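To make the disagreement concrete, here is a minimal toy sketch of the procedure described above: enumerate candidate extensions of current preferences into a new environment, score each for internal consistency, and turn those scores into a distribution via Boltzmann weighting (treating consistency as negative energy). Everything here is an illustrative assumption: the candidate labels, the scores, and the choice of softmax are made up for the example, not drawn from any actual CEV proposal. The open problem flagged above, where the prior scores for never-observed environments come from, is exactly the part this sketch hand-waves with a hard-coded dict.

```python
import math

# Toy sketch: weigh candidate extensions of existing preferences into a new
# environment by an "internal consistency" score, normalized with a softmax
# (Boltzmann weights, consistency acting as negative energy).
# All candidate names and scores below are hypothetical placeholders.

def weigh_extensions(extensions, consistency, temperature=1.0):
    """extensions: list of candidate preference-set labels.
    consistency: dict mapping each candidate to a consistency score
    (higher = more internally consistent).
    Returns a dict of normalized weights summing to 1."""
    scores = [consistency[e] / temperature for e in extensions]
    z = sum(math.exp(s) for s in scores)
    return {e: math.exp(s) / z for e, s in zip(extensions, scores)}

weights = weigh_extensions(
    ["keep-current-values", "novelty-seeking", "pure-hedonism"],
    {"keep-current-values": 2.0, "novelty-seeking": 1.5, "pure-hedonism": 0.5},
)
```

The entire argument is smuggled into the `consistency` dict: whoever assigns those scores has already decided which extrapolations count, which is the prior-odds problem in a different costume.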
It seems irrational, and wasteful, to deliberately construct a utopia where you give people impulses, and work to ensure that the mental and physical effort consumed by acting on those impulses is wasted. It also seems like a recipe for unrest. And, from an engineering perspective, it's an ugly design. It's like building a car with extra controls that don't do anything.
Well a key hard problem is: what features about ourselves that we like should we try to ensure endure into the future? Yes some features seem hopelessly provincial, while others seem more universally good, but how can we systematically judge this?
It seems irrational, and wasteful, to deliberately construct a utopia where you give people impulses, and work to ensure that the mental and physical effort consumed by acting on those impulses is wasted.
I think you're dancing around a bigger problem: once we have a sufficiently powerful AI, you and I are just a bunch of extra meat and buggy programming. Our physical and mental effort is just not needed or relevant. The purpose of FAI is to make sure that we get put out to pasture in a Friendly way. Or, depending on your mood, you could phrase it as living on in true immortality to watch the glory that we have created unfold.
It's like building a car with extra controls that don't do anything.
I think the more important question is what, in this analogy, does the car do?
I get the impression that's part of the SIAI plan, but it seems to me that the plan entails that that's all there is, from then on, for the universe. The FAI needs control of all resources to prevent other AIs from being made; and the FAI has no other goals than its human-value-fulfilling goals; so it turns the universe into a rest home for humans.
That's just another variety of paperclipper.
If I'm wrong, and SIAI wants to allocate some resources to the human preserve, while letting the rest of the universe develop in interesting ways, please correct me, and explain how this is possible.
If you think the future would be less than it could be if the universe was tiled with "rest homes for humans", why do you expect that an AI which was maximizing human utility would do that?
It depends how far meta you want to go when you say "human utility". Does that mean sex and chocolate, or complexity and continual novelty?
That's an ambiguity in CEV - the AI extrapolates human volition, but what's happening to the humans in the meanwhile? Do they stay the way they are now? Are they continuing to develop? If we suppose that human volition is incompatible with trilobite volition, that means we should expect the humans to evolve/develop new values that are incompatible with the AI's values extrapolated from humans.
If for some reason humans who liked to torture toddlers became very fit, future humans would evolve to possess values that resulted in many toddlers being tortured. I don't want that to happen, and am perfectly happy constraining future intelligences (even if they "evolve" from humans or even me) so they don't. And as always, if you think that you want the future to contain some value shifting, why don't you believe that an AI designed to fulfill the desires of humanity will cause/let that happen?
If you want the universe to develop in interesting ways, then why not explicitly optimize it for interestingness, however you define that?
I'm not talking about what I want to do, I'm talking about what SIAI wants to do. What I want to do is incompatible with constructing a singleton and telling it to extrapolate human values and run the universe according to them; as I have explained before.
I think your article successfully argued that we're not going to find some "ultimate" set of values that is correct or can be proven. In the end, the programmers of an FAI are going to choose a set of values that they like.
The good news is that human values can include things like generosity, non-interference, personal development, and exploration. "Human values" could even include tolerance of existential risk in return for not destroying other species. Any way that you want an FAI to be is a human value. We can program an FAI with ambitions and curiosity of its own; they will be rooted in our own values and anthropomorphism.
But no matter how noble and farsighted the programmers are, to those who don't share the programmers' values, the FAI will be a paperclipper.
We're all paperclippers, and in the true prisoners' dilemma, we always defect.
Upvoted, but -
Eliezer needs to say whether he wants to do this, or to save humans. I don't think you can have it both ways. The OS FAI does not have ambitions or curiosity of its own.
I dispute this. The SIAI FAI is specifically designed to have control of the universe as one of its goals. This is not logically necessary for an AI. Nor is the plan to build a singleton, rather than an ecology of AI, the only possible plan.
I notice that some of my comment wars with other people arise because they automatically assume that whenever we're talking about a superintelligence, there's only one of them. This is in danger of becoming a LW communal assumption. It's not even likely. (More generally, there's a strong tendency for people on LW to attribute very high likelihoods to scenarios that EY spends a lot of time talking about - even if he doesn't insist that they are likely.)
It is widely expected that this will arise as an important instrumental goal; nothing more than that. I can't tell if this is what you mean. (When you point out that "trying to take over the universe isn't utility-maximizing under many circumstances", it sounds like you're thinking of taking over the universe as a separate terminal goal, which would indeed be terrible design; an AI without that terminal goal, that can reason the same way you can, can decide not to try to take over the universe if that looks best.)
I probably missed it in some other comment, but which of these do you not buy: (a) huge first-mover advantages from self-improvement (b) preventing other superintelligences as a convergent subgoal (c) that the conjunction of these implies that a singleton superintelligence is likely?
This sounds plausible and bad. Can you think of some other examples?
This is probably just availability bias. These scenarios are easy to recall because we've read about them, and we're psychologically primed for them just by coming to this website.
He did. FAI should not be a person - it's just an optimization process.
ETA: link
Thanks! I'll take that as definitive.
The assumption of a single AI comes from an assumption that an AI will have zero risk tolerance. It follows from that assumption that the most powerful AI will destroy or limit all other sentient beings within reach.
There's no reason that an AI couldn't be programmed to have tolerance for risk. Pursuing a lot of the more noble human values may require it.
I make no claim that Eliezer and/or the SIAI have anything like this in mind. It seems that they would like to build an absolutist AI. I find that very troubling.
If I thought they had settled on this and that they were likely to succeed I would probably feel it was very important to work to destroy them. I'm currently not sure about the first and think the second is highly unlikely so it is not a pressing concern.
It is, however, necessary for an AI to do something of the sort if it's trying to maximize any sort of utility. Otherwise, risk / waste / competition will cause the universe to be less than optimal.
Trying to take over the universe isn't utility-maximizing under many circumstances: if you have a small chance of succeeding, or if the battle to do so will destroy most of the resources, or if you discount the future at all (remember, computation speed increases as speed of light stays constant), or if your values require other independent agents.
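The conditions listed above can be put into a small expected-utility model. This is a hypothetical toy calculation, not anything from SIAI's actual analysis: the win probability, the fraction of resources destroyed in the fight, the discount factor, and the fallback "cooperative share" are all made-up illustrative parameters.

```python
# Toy model of when attempting takeover maximizes expected utility.
# p_win: probability the takeover attempt succeeds.
# frac_destroyed: fraction of the universe's resources destroyed by the battle.
# discount: per-year discount factor (< 1 means the future is discounted).
# years_to_win: how long the takeover takes.
# All parameter values in the examples are illustrative assumptions.

def expected_takeover_value(p_win, frac_destroyed, discount, years_to_win,
                            value_all=1.0):
    # If you win, you get the surviving resources, discounted by the delay.
    # Assume (for simplicity) you get nothing if you lose.
    return p_win * (1 - frac_destroyed) * value_all * discount ** years_to_win

def should_attempt(p_win, frac_destroyed, discount, years_to_win,
                   value_share=0.1):
    # value_share: the utility of keeping a modest cooperative share instead.
    return expected_takeover_value(p_win, frac_destroyed, discount,
                                   years_to_win) > value_share

# High odds, little destruction, mild discounting: takeover looks attractive.
attack = should_attempt(0.9, 0.1, 0.99, 5)      # True
# Low odds, ruinous battle, heavy discounting: cooperation does better.
hold_back = should_attempt(0.05, 0.9, 0.8, 20)  # False
```

The model also shows what it leaves out: if your values require other independent agents, "win" itself is worth less than `value_all`, which pushes the answer further toward cooperation.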
By your logic, it is necessary for SIAI to try to take over the world. Is that true? The US probably has enough military strength to take over the world - is it purely stupidity that it doesn't?
The modern world is more peaceful, more enjoyable, and richer because we've learned that utility is better maximized by cooperation than by everyone trying to rule the world. Why does this lesson not apply to AIs?
Just what do you think "controlling the universe" means? My cat controls the universe. It probably doesn't exert this control in a way anywhere near optimal for most sensible preferences, but it does have an impact on everything. How do we decide that a superintelligence "controls the universe", while my cat "doesn't"? The only difference is in what kind of universe we have, which preference it is optimized for. Whatever you truly want roughly means preferring some states of the universe to other states, and making the universe better for you means controlling it towards your preference. The better the universe, the more specifically its state is specified, the stronger the control. These concepts are just different aspects of the same phenomenon.
Obviously, if you can't take over the world, then trying is stupid. If you can (for example, if you're the first SAI to go foom) then it's a different story.
Taking over the world does not require you to destroy all other life if that is contrary to your utility function. I'm not sure what you mean regarding future-discounting; if reorganizing the whole damn universe isn't worth it, then I doubt anything else will be in any case.
It should apply to AIs if you think that there will be multiple AIs at roughly the same capability level. A common assumption here is that as soon as there is a single general AI it will quickly improve to the point where it is so far beyond everything else that their capabilities won't matter. Frankly, I find this assumption highly questionable and very optimistic about potential fooming rates, among other problems, but if one accepts the idea it makes some sense. The analogy might be to a hypothetical situation where the US not only had the strongest military but also held monopolies on cheap fusion power and an immortality pill, and had a bunch of superheroes on its side. The distinction between the US controlling everything and the US having direct military control might quickly become irrelevant.
Edit: Thinking about the rate of fooming issue. I'd be really interested if a fast-foom proponent would be willing to put together a top-level post outlining why fooming will happen so quickly.
Eliezer and Robin had a lengthy debate on this perhaps a year ago. I don't remember if it's on OB or LW. Robin believes in no foom, using economic arguments.
The people who design the first AI could build a large number of AIs in different locations and turn them on at the same time. This plan would have a high probability of leading to disaster; but so do all the other plans that I've heard.
http://wiki.lesswrong.com/wiki/The_Hanson-Yudkowsky_AI-Foom_Debate
For one, the U.S. doesn't have the military strength. Russia still has enough nuclear warheads and ICBMs to prevent that. (And we suck at being occupying forces.)
I think the situation of the US is similar to a hypothesized AI. Sure, Russia could kill a lot of Americans. But we would probably "win" in the end. By all the logic I've heard in this thread, and in others lately about paperclippers, the US should rationally do whatever it takes to be the last man standing.
Well, also the US isn't a single entity that agrees on all its goals. Some of us for example place a high value on human life. And we vote. Even if the leadership of the United States wanted to wipe out the rest of the planet, there would be limits to how much they could do before others would step in.
Also, most forms of modern human morality strongly disfavor large-scale wars fought simply to impose one's views. If our AI doesn't have that sort of belief, then that's not an issue. And if we restrict ourselves to just the issue of other AIs, I'm not sure that a smart AI given my morals and preferences would necessarily see anything wrong with making sure that no other general smart AIs were created.