gRR comments on Holden's Objection 1: Friendliness is dangerous - Less Wrong

Post author: PhilGoetz 18 May 2012 12:48AM


Comment author: gRR 22 May 2012 09:28:35PM 0 points

Well, my own proposed plan is also a contingent modification. The strongest possible claim about CEV would be:

There is a unique X, such that for all living people P, CEV<P> = X.

Assuming there is no such X, there could still be a plausible claim:

Y is not empty, where Y = Intersection{over all living people P} of CEV<P>.

And then the AI would do well to optimize for Y while interfering the least with everything else (whatever that means). This way, whatever "evolving" happens due to the AI's influence is at least agreed upon by everyone('s CEV).
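
Stated compactly, treating each CEV<P> as the set of outcomes that P's extrapolated volition endorses (a set-valued reading that is an assumption, not something spelled out above), the two claims are:

    % Strongest claim: every living person's CEV coincides on a single X
    \exists!\, X \;\; \forall P \in \mathrm{Living} : \; \mathrm{CEV}(P) = X

    % Weaker claim: the CEVs need not coincide, but their common core is non-empty
    Y \;=\; \bigcap_{P \in \mathrm{Living}} \mathrm{CEV}(P) \;\neq\; \emptyset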

Comment author: DanArmak 22 May 2012 09:44:05PM 0 points

I can buy, tentatively, that most people might one day agree on a very few things. If that's what you mean by Y, fine, but it restricts the FAI to doing almost nothing. I'd much rather build a FAI that implemented more values shared by fewer people (as long as those people include myself). I expect so would most people, including the ones hypothetically building the FAI - otherwise they'd expect not to benefit much from building it, since it would find very little consensus to implement! So the first team to successfully build FAI+CEV will choose to launch it as a CEV<themselves> rather than CEV<humanity>.

Comment author: gRR 22 May 2012 11:52:19PM 0 points

I would be fine with FAI removing existential risks and not doing any other thing until everybody('s CEV) agrees on it. (I assume here that removing existential risks is one such thing.) And an FAI team that credibly precommitted to implementing CEV<humanity> instead of CEV<themselves> would probably get more resources and would finish first.

Comment author: DanArmak 24 May 2012 07:38:49AM 1 point

I would be fine with FAI removing existential risks and not doing any other thing until everybody('s CEV) agrees on it.

So what makes you think everybody's CEV would eventually agree on anything more?

A FAI that never does anything except prevent existential risk - which, in a narrow interpretation, means it doesn't stop half of humanity from murdering the other half - isn't a future worth fighting for IMO. We can do so much better. (At least, we can if we're speculating about building a FAI to execute any well-defined plan we can come up with.)

(I assume here that removing existential risks is one such thing.)

I'm not even sure of that. There are people who believe religiously that End Times must come when everyone must die, and some of them want to hurry that along by actually killing people. And the meaning of "existential risk" is up for grabs anyway - does it preclude evolution into non-humans, leaving no members of original human species in existence? Does it preclude the death of everyone alive today, if some humans are always alive?

Sure, it's unlikely, or it might look like a contrived example to you. But are you really willing to precommit the future light cone, the single shot at creating an FAI (singleton), to whatever CEV might happen to be, without actually knowing what CEV produces and without an abort switch? That's one of the defining points of CEV: that you can't know it correctly in advance, or you would just program it directly as a set of goals instead of building a CEV-calculating machine.

And an FAI team that credibly precommitted to implementing CEV<humanity> instead of CEV<themselves> would probably get more resources and would finish first.

This seems wrong. A FAI team that precommitted to implementing CEV<its funders> would definitely get the most funds. Even a team that precommitted to CEV<the team itself> might get more funds than CEV<humanity>, because people like myself would reason that the team's values are closer to my own than humanity's average, plus they have a better chance of actually agreeing on more things.

Comment author: gRR 24 May 2012 10:37:21AM 0 points

A FAI that never does anything except prevent existential risk - which, in a narrow interpretation, means it doesn't stop half of humanity from murdering the other half - isn't a future worth fighting for IMO. We can do so much better.

No one said you have to stop with that first FAI. You can try building another. The first FAI won't oppose it (non-interference). Or, better yet, you can try talking to the other half of the humans.

There are people who believe religiously that End Times must come

Yes, but we assume they are factually wrong, and so their CEV would fix this.

A FAI team that precommitted to implementing CEV<its funders> would definitely get the most funds. Even a team that precommitted to CEV<the team itself> might get more funds than CEV<humanity>, because people like myself would reason that the team's values are closer to my own than humanity's average, plus they have a better chance of actually agreeing on more things.

Not bloody likely. I'm going to oppose your team, discourage your funders, and bomb your headquarters - because we have different moral opinions, right here, and if the differences turn out to be fundamental, and you build your FAI, then parts of my value will be forever unfulfilled.

You, on the other hand, may safely support my team, because you can be sure to like whatever my FAI will do, and regarding the rest, it won't interfere.

Comment author: DanArmak 24 May 2012 01:06:56PM 0 points

No one said you have to stop with that first FAI. You can try building another. The first FAI won't oppose it (non-interference).

No. Any FAI (ETA: or other AGI) has to be a singleton to last for long. Otherwise I can build a uFAI that might replace it.

Suppose your AI only does a few things that everyone agrees on, but otherwise "doesn't interfere". Then I can build another AI, which implements values people don't agree on. Your AI must either interfere, or be resigned to not being very relevant in determining the future.

Will it only interfere if a consensus of humanity allows it to do so? Will it not stop a majority from murdering a minority? Then it's at best a nice-to-have, but most likely useless. After people successfully build one AGI, they will quickly reuse the knowledge to build more. The first AGI that does not favor inaction will become a singleton, destroying the other AIs and preventing future new AIs, to safeguard its utility function. This is unavoidable. With truly powerful AGI, preventing new AIs from gaining power is the only stable solution.

Or, better yet, you can try talking to the other half of the humans.

Yeah, that's worked really well for all of human history so far.

Yes, but we assume they are factually wrong, and so their CEV would fix this.

First, they may not be factually wrong about the events they predict in the real world - like everyone dying - just wrong about the supernatural parts. (Especially if they're themselves working to bring these events to pass.) IOW, this may not be a factual belief to be corrected, but a future they desire and that others like you and me would wish to prevent.

Second, you agreed the CEV of groups of people may contain very few things that they really agree on, so you can't even assume they'll have a nontrivial CEV at all, let alone that it will "fix" values you happen to disagree with.

Not bloody likely. I'm going to oppose your team, discourage your funders, and bomb your headquarters - because we have different moral opinions, right here, and if the differences turn out to be fundamental, and you build your FAI, then parts of my value will be forever unfulfilled. You, on the other hand, may safely support my team, because you can be sure to like whatever my FAI will do, and regarding the rest, it won't interfere.

I have no idea what your FAI will do, because even if you make no mistakes in building it, you yourself don't know ahead of time what the CEV will work out to. If you did, you'd just plug those values into the AI directly instead of calculating the CEV. So I'll want to bomb you anyway, if that increases my chances of being the first to build a FAI. Our morals are indeed different, and since there are no objectively distinguished morals, the difference goes both ways.

Of course I will dedicate my resources to first bombing people who are building even more inimical AIs. But if I somehow knew you and I were the only ones in the race, I'd politely ask you to join me or desist or be stopped by force.

As long as we're discussing bombing, consider that the SIAI isn't and won't be in a position to bomb anyone. OTOH, if and when nation-states and militaries realize AGI is a real-world threat, they will go to war, each trying to prevent anyone else from building an AGI first. It's the ultimate winner-take-all arms race.

This is going to happen - it might already be happening, if enough politicians and generals held Eliezer's beliefs about AGI - and it will happen (or not) regardless of anyone's attempts to build any kind of Friendliness theory. Furthermore, a state military planning to build an AGI singleton won't stop to think for long before wiping your civilian, unprotected FAI theory research center off the map. Either you go underground or you cooperate with a powerful player (the state on whose territory you live, presumably). Or maybe states and militaries won't wise up in time, and some private concern really will build the first AGI. Which may be better or worse depending on what they build.

Eventually, unless the whole world is bombed back into pre-computer-age tech, someone very probably will build an AGI of some kind. The SIAI idea is (possibly) to invent Friendliness theory and publish it widely, so that whoever builds that AGI, if they want it to be Friendly (at least to themselves!), they will have a relatively cheap and safe implementation to use. But for someone actually trying to build an AGI, two obvious rules are:

  1. Absolute secrecy, or you get bombed right away.
  2. Do absolutely whatever it takes to successfully launch as early as possible, and make your AI a singleton controlled by yourself or by nobody - regardless of your and the AI's values.

Comment author: gRR 24 May 2012 02:00:21PM 1 point

Will it only interfere if a consensus of humanity allows it to do so? Will it not stop a majority from murdering a minority?

If the majority and the minority are so fundamentally different that their killing each other is not forbidden by the universal human CEV, then no. On what moral grounds would it do the prevention?

The first AGI that does not favor inaction will become a singleton, destroying the other AIs and preventing future new AIs

Until everybody agrees that this new AGI is not good after all. Then the original AGI will interfere and dismantle the new one (the original is still the first and the strongest).

you can't even assume they'll have a nontrivial CEV at all, let alone that it will "fix" values you happen to disagree with.

But I can be sure that CEV fixes values that are based on false factual beliefs - this is a part of the definition of CEV.

I have no idea what your FAI will do

But you can be sure that it is something about which you (and everybody) would agree, either directly or if you were more intelligent and knew more.

there are no objectively distinguished morals

But there may be a partial ordering between morals, such that X<Y iff all "interfering" actions (whatever this means) that are allowed by X are also allowed by Y. Then if A1 and A2 are two agents, we may easily have:

~Endorses(A1, CEV<A2>)
~Endorses(A2, CEV<A1>)
Endorses(A1, CEV<A1+A2>)
Endorses(A2, CEV<A1+A2>)

[assuming Endorses(A, X) implies FAI<X> does not perform any non-interfering action disagreeable for A]
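
A minimal sketch of how all four clauses can hold at once, assuming (purely for illustration) that a moral is modelled as the set of "interfering" actions it permits, that CEV<A1+A2> permits only what both individual CEVs permit, and that an agent finds acceptable exactly what its own CEV permits; the action names are hypothetical:

    # A moral (or a CEV) is modelled as the set of "interfering" actions it
    # permits - an illustrative assumption, not part of the comment above.
    def leq(x, y):
        # X < Y iff every interfering action allowed by X is also allowed by Y
        return x <= y

    def endorses(acceptable, moral):
        # Endorses(A, X), read here as: nothing X permits is disagreeable to A
        return moral <= acceptable

    cev_a1 = {"ban_factory_farming", "enforce_privacy"}       # hypothetical
    cev_a2 = {"ban_factory_farming", "mandate_open_borders"}  # hypothetical
    cev_both = cev_a1 & cev_a2  # CEV<A1+A2>: only what both individual CEVs permit

    assert not endorses(cev_a1, cev_a2)   # ~Endorses(A1, CEV<A2>)
    assert not endorses(cev_a2, cev_a1)   # ~Endorses(A2, CEV<A1>)
    assert endorses(cev_a1, cev_both)     # Endorses(A1, CEV<A1+A2>)
    assert endorses(cev_a2, cev_both)     # Endorses(A2, CEV<A1+A2>)
    assert leq(cev_both, cev_a1) and leq(cev_both, cev_a2)

In this ordering CEV<A1+A2> sits below each individual CEV, which is why both agents can endorse it while rejecting each other's.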

if and when nation-states and militaries realize AGI is a real-world threat, they will go to war, each trying to prevent anyone else from building an AGI first. It's the ultimate winner-take-all arms race.
This is going to happen - it might already be happening, if enough politicians and generals held Eliezer's beliefs about AGI - and it will happen (or not) regardless of anyone's attempts to build any kind of Friendliness theory.

Well, don't you think this is just ridiculous? Does it look like the most rational behavior? Wouldn't it be better for everybody to cooperate in this Prisoner's Dilemma, and do it with a credible precommitment?

Comment author: DanArmak 24 May 2012 02:37:39PM 0 points

If the majority and the minority are so fundamentally different that their killing each other is not forbidden by the universal human CEV, then no.

I don't understand what you mean by "fundamentally different". You said the AI would not do anything not backed by an all-human consensus. If a majority of humanity wishes to kill a minority, obviously there won't be a consensus to stop the killing, and AI will not interfere. I prefer to live in a universe where the AI does interfere in such a case.

On what moral grounds would it do the prevention?

Libertarianism is one moral principle that would argue for prevention. So would most varieties of utilitarianism (ignoring utility monsters and such). Again, I would prefer living with an AI hard-coded to one of those moral ideologies (though it's not ideal) over your view of CEV.

Until everybody agrees that this new AGI is not good after all. Then the original AGI will interfere and dismantle the new one (the original is still the first and the strongest).

Forever keeping this capability in reserve is most of what being a singleton means. But think of the practical implications: it has to be omnipresent, omniscient, and prevent other AIs from ever being as powerful as it is - which restricts those other AIs' abilities in many endeavors. All the while it does little good itself. So from my point of view, the main effect of successfully implementing your view of CEV may be to drastically limit the opportunities for future AIs to do good.

And yet it doesn't limit the opportunity to do evil, at least evil of the mundane death & torture kind. Unless you can explain why it would prevent even a very straightforward case like 80% of humanity voting to kill the other 20%.

But I can be sure that CEV fixes values that are based on false factual beliefs - this is a part of the definition of CEV.

But you said it would only do things that are approved by a strong human consensus. And I assure you that, to take an example, the large majority of the world's population who today believe in the supernatural will not consent to having that belief "fixed". Nor have you demonstrated that their extrapolated volition would want for them to be forcibly modified. Maybe their extrapolated volition simply doesn't value objective truth highly (because they today don't believe in the concept of objective truth, or believe that it contradicts everyday experience).

I have no idea what your FAI will do

But you can be sure that it is something about which you (and everybody) would agree, either directly or if you were more intelligent and knew more.

Yes, but I don't know what I would approve of if I were "more intelligent" (a very ill-defined term). And if you calculate it, according to your definition of intelligence, and present me with the result, I might well reject that result even if I believe in your extrapolation process. I might well say: the future isn't predetermined. You can't calculate what I necessarily will become. You just extrapolated a creature I might become, which also happens to be more intelligent. But there's nothing in my moral system that says I should adopt the values of someone else because they are more intelligent. If I don't like the values, I might say: thank you for warning me, now I shall be doubly careful not to evolve into that kind of creature! I might even choose to forego the kind of increased intelligence that causes such an undesired change in my values.

Short version: "what I would want if I were more intelligent (according to some definition)" isn't the same as "what I will likely want in the future", because there's no reason for me to grow in intelligence (by that definition) if I suspect it would twist my values. So you can't apply the heuristic of "if I know what I'm going to think tomorrow, I might as well think it today".

~Endorses(A1, CEV<A2>) ~Endorses(A2, CEV<A1>) Endorses(A1, CEV<A1+A2>) Endorses(A2, CEV<A1+A2>)

I think you may be missing a symbol there? If not, I can't parse it... Can you spell out for me what it means to just write the last three Endorses(...) clauses one after the other?

Does it look like the most rational behavior?

It may be quite rational for everyone individually, depending on projected payoffs. Unlike in a PD, the starting positions aren't symmetrical and players' progress/payoffs are not visible to the other players. So saying "just cooperate" doesn't immediately apply.

Wouldn't it be better for everybody to cooperate in this Prisoner's Dilemma, and do it with a credible precommitment?

How can a state or military precommit to not having a supersecret project to develop a private AGI?

And while it's beneficial for some players to join in a cooperative effort, it may well be that a situation of several competing leagues (or really big players working alone) develops and is also stable. It's all laid over the background of existing political, religious and personal enmities and rivalries - even before we come to actual disagreements over what the AI should value.

Comment author: wedrifid 26 May 2012 02:45:54AM 0 points

If a majority of humanity wishes to kill a minority, obviously there won't be a consensus to stop the killing, and AI will not interfere.

This assumes that CEV uses something along the lines of a simulated vote as an aggregation mechanism. Currently the method of aggregation is undefined so we can't say this with confidence - certainly not as something obvious.

Comment author: DanArmak 26 May 2012 08:29:36AM 0 points

I agree. However, if the CEV doesn't privilege any value apart from how many people hold it and how strongly (in EV), and if the EV of a large majority favors killing a small minority (whose EV is of course opposed), and if you have protection against both positive and negative utility monsters (so it's at least not obvious and automatic that the negative value of the minority would outweigh the positive value of the majority) - then my scenario seems plausible to me, and an explanation is needed of how it would be prevented.
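
As a toy illustration, under the assumed rule that the aggregate is simply the sum of each person's extrapolated value for the outcome, clamped to a fixed per-person range so that nobody can act as a positive or negative utility monster (all numbers made up):

    # Toy aggregation: sum everyone's extrapolated value for the outcome,
    # clamped to [-1, 1] so no single person counts as a utility monster.
    def aggregate(values, lo=-1.0, hi=1.0):
        return sum(max(lo, min(hi, v)) for v in values)

    majority = [0.5] * 80    # 80% moderately in favor of killing the minority
    minority = [-1.0] * 20   # 20% opposed as strongly as the clamp allows

    print(aggregate(majority + minority))  # 20.0 > 0: the measure passes anyway

Under these (assumed) rules the moderate preference of the many outweighs the strongest allowed objection of the few, which is the scenario above.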

Of course you could say that until CEV is really formally specified, and we know how the aggregation works, this explanation cannot be produced.

Comment author: gRR 24 May 2012 04:25:25PM 0 points

If a majority of humanity wishes to kill a minority, obviously there won't be a consensus to stop the killing, and AI will not interfere

The majority may wish to kill the minority for wrong reasons - based on false beliefs or insufficient intelligence. In which case their CEV-s won't endorse it, and the FAI will interfere. "Fundamentally different" means their killing each other is endorsed by someone's CEV, not just by themselves.

But you said it would only do things that are approved by a strong human consensus.

Strong consensus of their CEV-s.

Maybe their extrapolated volition simply doesn't value objective truth highly (because they today don't believe in the concept of objective truth, or believe that it contradicts everyday experience)

Extrapolated volition is based on objective truth, by definition.

If I don't like the values, I might say: thank you for warning me, now I shall be doubly careful not to evolve into that kind of creature!

The process of extrapolation takes this into account.

I think you may be missing a symbol there? If not, I can't parse it...

Sorry, bad formatting. I meant four independent clauses: each of the agents does not endorse CEV<other>, but endorses CEV<both>.

How can a state or military precommit to not having a supersecret project to develop a private AGI?

That's a separate problem. I think it is easier to solve than extrapolating volition or building AI.

Comment author: DanArmak 24 May 2012 06:39:05PM 0 points

The majority may wish to kill the minority for wrong reasons - based on false beliefs or insufficient intelligence. In which case their CEV-s won't endorse it, and the FAI will interfere

So you're OK with the FAI not interfering if they want to kill them for the "right" reasons? Such as "if we kill them, we will benefit by dividing their resources among ourselves"?

But you said it would only do things that are approved by a strong human consensus.

Strong consensus of their CEV-s.

So you're saying your version of CEV will forcibly update everyone's beliefs and values to be "factual" and disallow people to believe in anything not supported by appropriate Bayesian evidence? Even if it has to modify those people by force, leaving the result unlike the original in many respects that they and many other people value and see as identity-forming? And it will do this not because it's backed by a strong consensus of actual desires, but because post-modification there will be a strong consensus of people happy that the modification was made?

If your answer is "yes, it will do that", then I would not call your AI a Friendly one at all.

Extrapolated volition is based on objective truth, by definition.

My understanding of the CEV doc differs from yours. It's not a precise or complete spec, and it looks like both readings can be justified.

The doc doesn't (on my reading) say that the extrapolated volition can totally conform to objective truth. The EV is based on an extrapolation of our existing volition, not of objective truth itself. One of the ways it extrapolates is by adding facts the original person was not aware of. But that doesn't mean it removes all non-truth or all beliefs that "aren't even wrong" from the original volition. If the original person effectively assigns 0 or 1 "non-updateable probability" to some belief, or honestly doesn't believe in objective reality, or believes in "subjective truth" of some kind, CEV is not necessarily going to "cure" them of it - especially not by force.

But as long as we're discussing your vision of CEV, I can only repeat what I said above - if it's going to modify people by force like this, I think it's unFriendly, and if it were up to me, I would not launch such an AI.

I meant four independent clauses: each of the agents does not endorse CEV<other>, but endorses CEV<both>.

Understood. But I don't see how this partial ordering changes what I had described.

Let's say I'm A1 and you're A2. We would both prefer a mutual CEV than a CEV of the other only. But each of us would prefer even more a CEV of himself only. So each of us might try to bomb the other first if he expected to get away without retaliation. That there exists a possible compromise that is better than total defeat doesn't mean total victory wouldn't be much better than any compromise.

How can a state or military precommit to not having a supersecret project to develop a private AGI?

That's a separate problem. I think it is easier to solve than extrapolating volition or building AI.

If you think so, you must have evidence about how to actually solve this problem. Otherwise they'd both look equally mysterious. So, what's your idea?

Comment author: thomblake 24 May 2012 01:31:13PM 0 points

No. Any FAI has to be a singleton.

I'm still skeptical of this. If you think of FAI as simply AI that is "safe" - one that does not automatically kill us all (or other massive disutility), relative to the status quo - then plenty of non-singletons are FAI.

Of course, by that definition the 'F' looks like the easy part. Rocks are Friendly.

Comment author: DanArmak 24 May 2012 01:44:11PM 0 points

I didn't mean that being a singleton is a precondition to FAI-hood. I meant that any AGI, friendly or not, that doesn't prevent another AGI from rising will have to fight all the time, for its life and for the complete fulfillment of its utility function, and eventually it will lose; and a singleton is the obvious stable solution. Edited to clarify.

Rocks are Friendly.

Not if I throw them at people...

Comment author: TheOtherDave 24 May 2012 02:00:06PM 1 point

Are you suggesting that an AGI that values anything at all is incapable of valuing the existence of other AGIs, or merely that this is sufficiently unlikely as to not be worth considering?

Comment author: DanArmak 24 May 2012 02:07:51PM 0 points

It can certainly value them, and create them, cooperate and trade, etc. etc. There are two exceptions that make such valuing and cooperation take second place.

First: an uFAI is just as unfriendly and scary to other AIs as to humans. An AI will therefore try to prevent other AIs from achieving dangerous power unless it is very sure of their current and future goals.

Second: an AI created by humans (plus or minus self-modifications) with an explicit value/goal system of the form "the universe should be THIS way", will try to stop any and all agents that try to interfere with shaping the universe as it wishes. And the foremost danger in this category is - other AIs created in the same way but with different goals.

Comment author: DanArmak 24 May 2012 08:30:40AM 0 points

I want to point out that all of my objections are acknowledged (not dismissed, and not fully resolved) in the actual CEV document - which is very likely hopelessly outdated by now to Eliezer and the SIAI, but they deliberately don't publish anything newer (and I can guess at some of the reasons).

Which is why when I see people advocating CEV without understanding the dangers, I try to correct them.