gRR comments on Holden's Objection 1: Friendliness is dangerous - Less Wrong

11 points. Post author: PhilGoetz 18 May 2012 12:48AM




Comment author: gRR 24 May 2012 02:00:21PM 1 point [-]

Will it only interfere if a consensus of humanity allows it to do so? Will it not stop a majority from murdering a minority?

If the majority and the minority are so fundamentally different that their killing each other is not forbidden by the universal human CEV, then no. On what moral grounds would it prevent the killing?

The first AGI that does not favor inaction will become a singleton, destroying the other AIs and preventing future new AIs

Until everybody agrees that this new AGI is not good after all. Then the original AGI will interfere and dismantle the new one (the original is still the first and the strongest).

you can't even assume they'll have a nontrivial CEV at all, let alone that it will "fix" values you happen to disagree with.

But I can be sure that CEV fixes values that are based on false factual beliefs - this is a part of the definition of CEV.

I have no idea what your FAI will do

But you can be sure that it is something about which you (and everybody) would agree, either directly or if you were more intelligent and knew more.

there are no objectively distinguished morals

But there may be a partial ordering between morals, such that X<Y iff all "interfering" actions (whatever this means) that are allowed by X are also allowed by Y. Then if A1 and A2 are two agents, we may easily have:

~Endorses(A1, CEV<A2>) ~Endorses(A2, CEV<A1>) Endorses(A1, CEV<A1+A2>)
Endorses(A2, CEV<A1+A2>)

[assuming Endorses(A, X) implies FAI<X> does not perform any non-interfering action disagreeable for A]

if and when nation-states and militaries realize AGI is a real-world threat, they will go to war with each trying to prevent anyone else from building an AGI first. It's the ultimate winner-take-all arms race.
This is going to happen - it would already be happening if enough politicians and generals shared Eliezer's beliefs about AGI - and it will happen (or not) regardless of anyone's attempts to build any kind of Friendliness theory.

Well, don't you think this is just ridiculous? Does it look like the most rational behavior? Wouldn't it be better for everybody to cooperate in this Prisoner's Dilemma, and do it with a credible precommitment?

Comment author: DanArmak 24 May 2012 02:37:39PM 0 points [-]

If the majority and the minority are so fundamentally different that their killing each other is not forbidden by the universal human CEV, then no.

I don't understand what you mean by "fundamentally different". You said the AI would not do anything not backed by an all-human consensus. If a majority of humanity wishes to kill a minority, obviously there won't be a consensus to stop the killing, and the AI will not interfere. I prefer to live in a universe whose ruling AI does interfere in such a case.

On what moral grounds would it prevent the killing?

Libertarianism is one moral principle that would argue for prevention. So would most varieties of utilitarianism (ignoring utility monsters and such). Again, I would prefer living with an AI hard-coded to one of those moral ideologies (though it's not ideal) over your view of CEV.

Until everybody agrees that this new AGI is not good after all. Then the original AGI will interfere and dismantle the new one (the original is still the first and the strongest).

Forever keeping this capability in reserve is most of what being a singleton means. But think of the practical implications: it has to be omnipresent, omniscient, and prevent other AIs from ever being as powerful as it is - which restricts those other AIs' abilities in many endeavors. All the while it does little good itself. So from my point of view, the main effect of successfully implementing your view of CEV may be to drastically limit the opportunities for future AIs to do good.

And yet it doesn't limit the opportunity to do evil, at least evil of the mundane death & torture kind. Unless you can explain why it would prevent even a very straightforward case like 80% of humanity voting to kill the other 20%.

But I can be sure that CEV fixes values that are based on false factual beliefs - this is a part of the definition of CEV.

But you said it would only do things that are approved by a strong human consensus. And I assure you that, to take an example, the large majority of the world's population who today believe in the supernatural will not consent to having that belief "fixed". Nor have you demonstrated that their extrapolated volition would want them to be forcibly modified. Maybe their extrapolated volition simply doesn't value objective truth highly (because they today don't believe in the concept of objective truth, or believe that it contradicts everyday experience).

I have no idea what your FAI will do

But you can be sure that it is something about which you (and everybody) would agree, either directly or if you were more intelligent and knew more.

Yes, but I don't know what I would approve of if I were "more intelligent" (a very ill-defined term). And if you calculate it somehow, according to your definition of intelligence, and present me with the result, I might well reject that result even if I believe in your extrapolation process. I might well say: the future isn't predetermined. You can't calculate what I necessarily will become. You just extrapolated a creature I might become, which also happens to be more intelligent. But there's nothing in my moral system that says I should adopt the values of someone else because they are more intelligent. If I don't like the values, I might say: thank you for warning me, now I shall be doubly careful not to evolve into that kind of creature! I might even choose to forego the kind of increased intelligence that causes such an undesired change in my values.

Short version: "what I would want if I were more intelligent (according to some definition)" isn't the same as "what I will likely want in the future", because there's no reason for me to grow in intelligence (by that definition) if I suspect it would twist my values. So you can't apply the heuristic of "if I know what I'm going to think tomorrow, I might as well think it today".

~Endorses(A1, CEV<A2>) ~Endorses(A2, CEV<A1>) Endorses(A1, CEV<A1+A2>) Endorses(A2, CEV<A1+A2>)

I think you may be missing a symbol there? If not, I can't parse it... Can you spell out for me what it means to just write the last three Endorses(...) clauses one after the other?

Does it look like the most rational behavior?

It may be quite rational for everyone individually, depending on projected payoffs. Unlike in a PD, the starting positions aren't symmetrical, and players' progress and payoffs are not visible to the other players. So saying "just cooperate" doesn't immediately apply.

Wouldn't it be better for everybody to cooperate in this Prisoner's Dilemma, and do it with a credible precommitment?

How can a state or military precommit to not having a supersecret project to develop a private AGI?

And while it's beneficial for some players to join in a cooperative effort, it may well be that a situation of several competing leagues (or really big players working alone) develops and is also stable. It's all laid over the background of existing political, religious and personal enmities and rivalries - even before we come to actual disagreements over what the AI should value.

Comment author: gRR 24 May 2012 04:25:25PM *  0 points [-]

If a majority of humanity wishes to kill a minority, obviously there won't be a consensus to stop the killing, and AI will not interfere

The majority may wish to kill the minority for wrong reasons - based on false beliefs or insufficient intelligence. In which case their CEV-s won't endorse it, and the FAI will interfere. "Fundamentally different" means their killing each other is endorsed by someone's CEV, not just by themselves.

But you said it would only do things that are approved by a strong human consensus.

Strong consensus of their CEV-s.

Maybe their extrapolated volition simply doesn't value objective truth highly (because they today don't believe in the concept of objective truth, or believe that it contradicts everyday experience)

Extrapolated volition is based on objective truth, by definition.

If I don't like the values I might say, thank-you for warning me, now I shall be doubly careful not to evolve into that kind of creature!

The process of extrapolation takes this into account.

I think you may be missing a symbol there? If not, I can't parse it...

Sorry, bad formatting. I meant four independent clauses: each of the agents does not endorse CEV<other>, but endorses CEV<both>.
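The intended pattern can be sketched as a toy model. Here a "moral" is represented simply as the set of interfering actions it permits (so X<Y iff X's set is contained in Y's), and an agent endorses a CEV when that CEV permits no interfering action the agent itself forbids. All the action names and sets below are invented purely for illustration:

```python
# Toy model of the partial ordering on morals described above:
# X < Y iff every "interfering" action allowed by X is also allowed by Y.
# A "moral" is modeled as the set of interfering actions it permits.

def leq(x, y):
    """X <= Y iff all interfering actions allowed by X are allowed by Y."""
    return x <= y  # set inclusion

def endorses(agent_allowed, cev_allowed):
    """A endorses CEV<X> iff CEV<X> permits only actions A's morals allow."""
    return leq(cev_allowed, agent_allowed)

# Hypothetical agents and CEVs (invented for illustration):
a1 = {"stop_murder", "redistribute"}       # actions A1's morals allow
a2 = {"stop_murder", "enforce_contracts"}  # actions A2's morals allow

cev_a1 = {"redistribute"}       # CEV<A1> might permit this
cev_a2 = {"enforce_contracts"}  # CEV<A2> might permit this
cev_both = {"stop_murder"}      # CEV<A1+A2>: only the common ground

assert not endorses(a1, cev_a2)  # ~Endorses(A1, CEV<A2>)
assert not endorses(a2, cev_a1)  # ~Endorses(A2, CEV<A1>)
assert endorses(a1, cev_both)    # Endorses(A1, CEV<A1+A2>)
assert endorses(a2, cev_both)    # Endorses(A2, CEV<A1+A2>)
```

So under this reading, neither agent endorses the other's CEV, yet both endorse the joint one, exactly the four clauses above.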

How can a state or military precommit to not having a supersecret project to develop a private AGI?

That's a separate problem. I think it is easier to solve than extrapolating volition or building AI.

Comment author: DanArmak 24 May 2012 06:39:05PM *  0 points [-]

The majority may wish to kill the minority for wrong reasons - based on false beliefs or insufficient intelligence. In which case their CEV-s won't endorse it, and the FAI will interfere

So you're OK with the FAI not interfering if they want to kill them for the "right" reasons? Such as "if we kill them, we will benefit by dividing their resources among ourselves"?

But you said it would only do things that are approved by a strong human consensus.

Strong consensus of their CEV-s.

So you're saying your version of CEV will forcibly update everyone's beliefs and values to be "factual" and disallow people from believing in anything not supported by appropriate Bayesian evidence? Even though it has to modify those people by force, and the result is unlike the original in many respects that they and many other people value and see as identity-forming? And it will do this not because it's backed by a strong consensus of actual desires, but because post-modification there will be a strong consensus of people happy that the modification was made?

If your answer is "yes, it will do that", then I would not call your AI a Friendly one at all.

Extrapolated volition is based on objective truth, by definition.

My understanding of the CEV doc differs from yours. It's not a precise or complete spec, and it looks like both readings can be justified.

The doc doesn't (on my reading) say that the extrapolated volition can totally conform to objective truth. The EV is based on an extrapolation of our existing volition, not of objective truth itself. One of the ways it extrapolates is by adding facts the original person was not aware of. But that doesn't mean it removes all non-truth or all beliefs that "aren't even wrong" from the original volition. If the original person effectively assigns 0 or 1 "non-updateable probability" to some belief, or honestly doesn't believe in objective reality, or believes in "subjective truth" of some kind, CEV is not necessarily going to "cure" them of it - especially not by force.

But as long as we're discussing your vision of CEV, I can only repeat what I said above - if it's going to modify people by force like this, I think it's unFriendly and if it were up to me, would not launch such an AI.

I meant four independent clauses: each of the agents does not endorse CEV<other>, but endorses CEV<both>.

Understood. But I don't see how this partial ordering changes what I had described.

Let's say I'm A1 and you're A2. We would both prefer a mutual CEV to a CEV of the other only. But each of us would prefer even more a CEV of himself only. So each of us might try to bomb the other first if he expected to get away without retaliation. That there exists a possible compromise that is better than total defeat doesn't mean total victory wouldn't be much better than any compromise.

How can a state or military precommit to not having a supersecret project to develop a private AGI?

That's a separate problem. I think it is easier to solve than extrapolating volition or building AI.

If you think so you must have evidence relating to how to actually solve this problem. Otherwise they'd both look equally mysterious. So, what's your idea?

Comment author: gRR 24 May 2012 07:35:19PM 0 points [-]

So you're OK with the FAI not interfering if they want to kill them for the "right" reasons?

I wouldn't like it. But if the alternative is, for example, to have FAI directly enforce the values of the minority on the majority (or vice versa) - the values that would make them kill in order to satisfy/prevent - then I prefer FAI not interfering.

"if we kill them, we will benefit by dividing their resources among ourselves"

If the resources are so scarce, and dividing them so important, that even the CEV-s agree on the necessity of killing, then again, I prefer humans to decide who gets them.

So you're saying your version of CEV will forcibly update everyone's beliefs

No. CEV does not update anyone's beliefs. It is calculated by extrapolating values in the presence of full knowledge and sufficient intelligence.

If the original person effectively assigns 0 or 1 "non-updateable probability" to some belief, or honestly doesn't believe in objective reality, or believes in "subjective truth" of some kind, CEV is not necessarily going to "cure" them of it - especially not by force.

As I said elsewhere, if a person's beliefs are THAT incompatible with truth, I'm ok with ignoring their volition. Note that their CEV is undefined in this case. But I don't believe there exist such people (excluding the totally insane).

That there exists a possible compromise that is better than total defeat doesn't mean total victory wouldn't be much better than any compromise.

But the total loss would be correspondingly worse. PD reasoning says you should cooperate (assuming cooperation is precommittable).

If you think so you must have evidence relating to how to actually solve this problem. Otherwise they'd both look equally mysterious. So, what's your idea?

Off the top of my head, adoption of total transparency for everybody of all governmental and military matters.

Comment author: DanArmak 24 May 2012 07:59:15PM 0 points [-]

If the resources are so scarce that dividing them is so important that even CEV-s agree on the necessity of killing, then again, I prefer humans to decide who gets them.

The resources are not scarce at all. But, there's no consensus of CEVs. The CEVs of 80% want to kill the rest. The CEVs of 20% obviously don't want to be killed. Because there's no consensus, your version of CEV would not interfere, and the 80% would be free to kill the 20%.

No. CEV does not update anyone's beliefs. It is calculated by extrapolating values in the presence of full knowledge and sufficient intelligence.

I meant that the AI that implements your version of CEV would forcibly update people's actual beliefs to match what it CEV-extrapolated for them. Sorry for the confusion.

As I said elsewhere, if a person's beliefs are THAT incompatible with truth, I'm ok with ignoring their volition. Note that their CEV is undefined in this case. But I don't believe there exist such people (excluding the totally insane).

A case could be made that many millions of religious "true believers" have un-updatable 0/1 probabilities. And so on.

Your solution is to not give them a voice in the CEV at all. Which is great for the rest of us - our CEV will include some presumably reduced term for their welfare, but they don't get to vote on things. This is something I would certainly support in an FAI (regardless of CEV), just as I would support using CEV<few people + me> or even CEV<few people like me in crucial respects> in preference to CEV<everyone>.

The only difference between us then is that I estimate there to be many such people. If you believed there were many such people, would you modify your solution, or is ignoring them however many they are fine by you?

PD reasoning says you should cooperate (assuming cooperation is precommittable).

As I said before, this reasoning is inapplicable, because this situation is nothing like a PD.

  1. The PD reasoning to cooperate only applies in case of iterated PD, whereas creating a singleton AI is a single game.
  2. Unlike PD, the payoffs are different between players, and players are not sure of each other's payoffs in each scenario. (E.g., minor/weak players are more likely to cooperate than big ones that are more likely to succeed if they defect.)
  3. The game is not instantaneous, so players can change their strategy based on how other players play. When they do so they can transfer value gained by themselves or by other players (e.g. join research alliance 1, learn its research secrets, then defect and sell the secrets to alliance 2).
  4. It is possible to form alliances, which gain by "defecting" as a group. In PD, players cannot discuss alliances or trade other values to form them before choosing how to play.
  5. There are other games going on between players, so they already have knowledge and opinions and prejudices about each other, and desires to cooperate with certain players and not others. Certain alliances will form naturally, others won't.
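Point 2 above can be made concrete with a toy one-shot game between a "strong" and a "weak" player (all payoff numbers are invented purely for illustration): with asymmetric payoffs, defection can dominate for the strong player even though mutual cooperation beats mutual defection in total.

```python
# Toy asymmetric one-shot game, unlike the symmetric PD.
# Payoffs (strong, weak) are invented for illustration only.

payoffs = {
    # (strong's move, weak's move): (strong's payoff, weak's payoff)
    ("C", "C"): (5, 5),  # joint alliance, shared FAI
    ("C", "D"): (1, 3),  # strong cooperates, weak defects secretly
    ("D", "C"): (9, 0),  # strong wins the race outright
    ("D", "D"): (4, 1),  # open arms race; strong likely wins anyway
}

def best_response(player, other_move):
    """Best move for `player` ('strong' or 'weak') given the other's move."""
    idx = 0 if player == "strong" else 1
    moves = ["C", "D"]
    if player == "strong":
        return max(moves, key=lambda m: payoffs[(m, other_move)][idx])
    return max(moves, key=lambda m: payoffs[(other_move, m)][idx])

# Under these numbers, defecting dominates for the strong player:
assert best_response("strong", "C") == "D"  # 9 > 5
assert best_response("strong", "D") == "D"  # 4 > 1
```

Since the strong player defects regardless, the weak player's best response is also to defect, so (D, D) is the equilibrium here despite (C, C) having the highest combined payoff - the symmetric-PD intuition doesn't transfer.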

adoption of total transparency for everybody of all governmental and military matters.

This counts as very weak evidence because it proves it's at least possible to achieve this, yes. (If all players very intensively inspect all other players to make sure a secret project isn't being hidden anywhere - they'd have to recruit a big chunk of the workforce just to watch over all the rest.)

But the probability of this happening in the real world, between all players, as they scramble to be the first to build an apocalyptic new weapon, is so small it's not even worth discussion time. (Unless you disagree, of course.) I'm not convinced by this that it's an easier problem to solve than that of building AGI or FAI or CEV.

Comment author: gRR 24 May 2012 09:51:55PM 1 point [-]

The resources are not scarce at all. But, there's no consensus of CEVs. The CEVs of 80% want to kill the rest.

The resources are not scarce, yet the CEV-s want to kill? Why?

I meant that the AI that implements your version of CEV would forcibly update people's actual beliefs to match what it CEV-extrapolated for them.

It would do so only if everybody's CEV-s agree that updating these people's beliefs is a good thing.

If you believed there were many such people, would you modify your solution, or is ignoring them however many they are fine by you?

People that would still have false factual beliefs no matter how much evidence and how much intelligence they have? They would be incurably insane. Yes, I would agree to ignore their volition, no matter how many they are.

The PD reasoning to cooperate only applies in case of iterated PD

Err. What about arguments of Douglas Hofstadter and EY, and decision theories like TDT?

Unlike PD, the payoffs are different between players, and players are not sure of each other's payoffs in each scenario

This doesn't really matter for a broad range of possible payoff matrices.

join research alliance 1, learn its research secrets, then defect and sell the secrets to alliance 2

Cooperating in this game would mean there is exactly one global research alliance. A cooperating move is a precommitment to abide by its rules. Enforcing such precommitment is a separate problem. Let's assume it's solved.

I'm not convinced by this that it's an easier problem to solve than that of building AGI or FAI or CEV.

Maybe you're right. But IMHO it's a less interesting problem :)

Comment author: DanArmak 25 May 2012 08:50:24PM 1 point [-]

The resources are not scarce, yet the CEV-s want to kill? Why?

Sorry for the confusion. Let's taboo "scarce" and start from scratch.

I'm talking about a scenario where - to simplify only slightly from the real world - there exist some finite (even if growing) resources that almost everyone, no matter how much they already have, wants more of. A coalition of 80% of the population forms, which would like to kill the other 20% in order to get their resources. Would the AI prevent this, although there is no consensus against the killing?

If you still want to ask whether the resource is "scarce", please specify what that means exactly. Maybe any finite and highly desirable resource, with returns diminishing weakly or not at all, can be considered "scarce".

It would do so only if everybody's CEV-s agree that updating these people's beliefs is a good thing.

People that would still have false factual beliefs no matter how much evidence and how much intelligence they have? They would be incurably insane. Yes, I would agree to ignore their volition, no matter how many they are.

As I said - this is fine by me insofar as I expect the CEV not to choose to ignore me. (Which means it's not fine through the Rawlsian veil of ignorance, but I don't care and presumably neither do you.)

The question of definition - who is to be included in the CEV, or who is considered sane? - becomes of paramount importance. Since it is not itself decided by the CEV, it is presumably hardcoded into the AI design (or evolves within that design as the AI self-modifies, but that's very dangerous without formal proofs that it won't evolve to include the "wrong" people). The simplest way to hardcode it is to directly specify the people to be included, but you prefer testing on qualifications.

However this is realized, it would give people even more incentive to influence or stop your AI building process or to start their own to compete, since they would be afraid of not being included in the CEV used by your AI.

The PD reasoning to cooperate only applies in case of iterated PD

Err. What about arguments of Douglas Hofstadter and EY, and decision theories like TDT?

TDT applies where agents are "similar enough". I doubt I am similar enough to e.g. the people you labelled insane.

Which arguments of Hofstadter and Yudkowsky do you mean?

Cooperating in this game would mean there is exactly one global research alliance.

Why? What prevents several competing alliances (or single players) from forming, competing for the cooperation of the smaller players?

Comment author: gRR 26 May 2012 03:15:22AM 0 points [-]

A coalition of 80% of the population forms, which would like to kill the other 20% in order to get their resources

I have trouble thinking of a resource that would make even one person's CEV, let alone 80%, want to kill people, in order to just have more of it.

The question of definition - who is to be included in the CEV, or who is considered sane?

This is easy, and does not need any special hardcoding. If someone is so insane that their beliefs are totally closed and impossible to move by knowledge and intelligence, then their CEV is undefined. Thus, they are automatically excluded.

TDT applies where agents are "similar enough". I doubt I am similar enough to e.g. the people you labelled insane.

We are talking about people building FAI-s. Surely they are intelligent enough to notice the symmetry among themselves. If you say that logic and rationality make you decide to 'defect' (=try to build FAI on your own, bomb everyone else), then logic and rationality would make everyone decide to defect. So everybody bombs everybody else, no FAI gets built, everybody loses. Instead you can 'cooperate' (=precommit to build FAI<everybody's CEV> and to bomb everyone who did not make the same precommitment). This gets us a single global alliance.

Comment author: DanArmak 26 May 2012 08:40:05AM 1 point [-]

I have trouble thinking of a resource that would make even one person's CEV, let alone 80%, want to kill people, in order to just have more of it.

*Shrug.* Space (land or whatever is being used). Mass and energy. Natural resources. Computing power. Finite-supply money and luxuries, if such exist.

Or are you making an assumption that CEVs are automatically more altruistic or nice than non-extrapolated human volitions are?

This is easy, and does not need any special hardcoding. If someone is so insane that their beliefs are totally closed and impossible to move by knowledge and intelligence, then their CEV is undefined. Thus, they are automatically excluded.

Well, it does need hardcoding: you need to tell the CEV to exclude people whose EVs are too similar to their current values despite learning contrary facts. Or even all those whose belief-updating process differs too much from perfect Bayesian (and how much is too much?). This is something you'd hardcode in, because you could also write ("hardcode") a CEV that does include them, allowing them to keep EVs close to their current values.

Not that I'm opposed to this decision (if you must have CEV at all).

We are talking about people building FAI-s. Surely they are intelligent enough to notice the symmetry between themselves.

There's a symmetry, but "first person to complete AI wins, everyone 'defects'" is also a symmetrical situation. Single-iteration PD is symmetrical, but everyone defects. Mere symmetry is not sufficient for TDT-style "decide for everyone", you need similarity that includes similarly valuing the same outcomes. Here everyone values the outcome "have the AI obey ME!", which is not the same.

If you say that logic and rationality makes you decide to 'defect' (=try to build FAI on your own, bomb everyone else), then logic and rationality would make everyone decide to defect. So everybody bombs everybody else, no FAI gets built, everybody loses.

Or someone is stronger than everyone else, wins the bombing contest, and builds the only AI. Or someone succeeds in building an AI in secret, avoiding being bombed. Or there's a player or alliance that's strong enough to deter bombing due to the threat of retaliation, and so completes their AI which doesn't care about everyone else much. There are many possible and plausible outcomes besides "everybody loses".

Instead you can 'cooperate' (=precommit to build FAI<everybody's CEV> and to bomb everyone that did not make the same precommitment). This gets us a single global alliance.

Or while the alliance is still being built, a second alliance or very strong player bombs them to get the military advantages of a first strike. Again, there are other possible outcomes besides what you suggest.

Comment author: dlthomas 24 May 2012 08:14:36PM *  0 points [-]

Because there's no consensus, your version of CEV would not interfere, and the 80% would be free to kill the 20%.

There may be a distinction between "the AI will not prevent the 80% from killing the 20%" and "nothing will prevent the 80% from killing the 20%" that is getting lost in your phrasing. I am not convinced that the math doesn't make them equivalent, in the long run - but I'm definitely not convinced otherwise.

Comment author: DanArmak 24 May 2012 08:24:21PM 0 points [-]

I'm assuming the 80% are capable of killing the 20% unless the AI interferes. That's part of the thought experiment. It's not unreasonable, since they are 4 times as numerous. But if you find this problematic, suppose it's 99% killing 1% at a time. It doesn't really matter.

Comment author: dlthomas 24 May 2012 08:28:40PM 1 point [-]

My point is that we currently have methods of preventing this that don't require an AI, and which do pretty well. Why do we need the AI to do it? Or more specifically, why should we reject an AI that won't, but may do other useful things?

Comment author: DanArmak 24 May 2012 08:34:02PM *  0 points [-]

There have been, and are, many mass killings of minority groups and of enemy populations and conscripted soldiers at war. If we cure death and diseases, this will become the biggest cause of death and suffering in the world. It's important and we'll have to deal with it eventually.

The AI under discussion not just won't solve the problem, it would (I contend) become a singleton and prevent me from building another AI that does solve the problem. (If it chooses not to become a singleton, it will quickly be supplanted by an AI that does try to become one.)

Comment author: thomblake 24 May 2012 07:01:09PM 0 points [-]

If the original person effectively assigns 0 or 1 "non-updateable probability" to some belief, or honestly doesn't believe in objective reality, or believes in "subjective truth" of some kind, CEV is not necessarily going to "cure" them of it - especially not by force.

I think you're skipping between levels hereabouts. CEV, the theoretical construct, might consider people so modified, even if a FAI based on CEV would not modify them. CEV is our values if we were better, but does not necessitate us actually getting better.

Comment author: DanArmak 24 May 2012 07:24:40PM 0 points [-]

In this thread I always used CEV in the sense of an AI implementing CEV. (Sometimes you'll see descriptions of what I don't believe to be the standard interpretation of how such an AI would behave, where gRR suggests such behaviors and I reply.)