Sewing-Machine comments on Holden's Objection 1: Friendliness is dangerous - Less Wrong

Post author: PhilGoetz 18 May 2012 12:48AM


Comment author: gRR 26 May 2012 03:15:22AM 0 points

> A coalition of 80% of the population forms, which would like to kill the other 20% in order to get their resources

I have trouble thinking of a resource that would make even one person's CEV, let alone 80%, want to kill people, in order to just have more of it.

> The question of definition, who is to be included in the CEV? or - who is considered sane?

This is easy, and does not need any special hardcoding. If someone is so insane that their beliefs are totally closed and impossible to move by knowledge and intelligence, then their CEV is undefined. Thus, they are automatically excluded.

TDT applies where agents are "similar enough". I doubt I am similar enough to e.g. the people you labelled insane.

We are talking about people building FAI-s. Surely they are intelligent enough to notice the symmetry between themselves. If you say that logic and rationality makes you decide to 'defect' (=try to build FAI on your own, bomb everyone else), then logic and rationality would make everyone decide to defect. So everybody bombs everybody else, no FAI gets built, everybody loses. Instead you can 'cooperate' (=precommit to build FAI<everybody's CEV> and to bomb everyone that did not make the same precommitment). This gets us a single global alliance.

Comment author: DanArmak 26 May 2012 08:40:05AM 1 point

> I have trouble thinking of a resource that would make even one person's CEV, let alone 80%, want to kill people, in order to just have more of it.

*Shrug.* Space (land or whatever is being used). Mass and energy. Natural resources. Computing power. Finite-supply money and luxuries if such exist.

Or are you making an assumption that CEVs are automatically more altruistic or nice than non-extrapolated human volitions are?

> This is easy, and does not need any special hardcoding. If someone is so insane that their beliefs are totally closed and impossible to move by knowledge and intelligence, then their CEV is undefined. Thus, they are automatically excluded.

Well it does need hardcoding: you need to tell the CEV to exclude people whose EVs are too similar to their current values despite learning contrary facts. Or even all those whose belief-updating process differs too much from perfect Bayesian (and how much is too much?) This is something you'd hardcode in, because you could also write ("hardcode") a CEV that does include them, allowing them to keep the EVs close to their current values.

Not that I'm opposed to this decision (if you must have CEV at all).

> We are talking about people building FAI-s. Surely they are intelligent enough to notice the symmetry between themselves.

There's a symmetry, but "first person to complete AI wins, everyone 'defects'" is also a symmetrical situation. Single-iteration PD is symmetrical, but everyone defects. Mere symmetry is not sufficient for TDT-style "decide for everyone"; you need similarity that includes similarly valuing the same outcomes. Here everyone values the outcome "have the AI obey ME!", which is not the same outcome for each of them.

> If you say that logic and rationality makes you decide to 'defect' (=try to build FAI on your own, bomb everyone else), then logic and rationality would make everyone decide to defect. So everybody bombs everybody else, no FAI gets built, everybody loses.

Or someone is stronger than everyone else, wins the bombing contest, and builds the only AI. Or someone succeeds in building an AI in secret, avoiding being bombed. Or there's a player or alliance that's strong enough to deter bombing due to the threat of retaliation, and so completes their AI which doesn't care about everyone else much. There are many possible and plausible outcomes besides "everybody loses".

> Instead you can 'cooperate' (=precommit to build FAI<everybody's CEV> and to bomb everyone that did not make the same precommitment). This gets us a single global alliance.

Or while the alliance is still being built, a second alliance or very strong player bombs them to get the military advantages of a first strike. Again, there are other possible outcomes besides what you suggest.

Comment author: gRR 26 May 2012 04:04:05PM 0 points

> Space (land or whatever is being used). Mass and energy. Natural resources. Computing power. Finite-supply money and luxuries if such exist. Or are you making an assumption that CEVs are automatically more altruistic or nice than non-extrapolated human volitions are?

These all have the property that you only need so much of them. If there is a sufficient amount for everybody, then there is no point in killing in order to get more. I expect CEV-s not to be greedy just for the sake of greed. It's people's CEV-s we're talking about, not paperclip maximizers'.

> Well it does need hardcoding: you need to tell the CEV to exclude people whose EVs are too similar to their current values despite learning contrary facts. Or even all those whose belief-updating process differs too much from perfect Bayesian (and how much is too much?) This is something you'd hardcode in, because you could also write ("hardcode") a CEV that does include them, allowing them to keep the EVs close to their current values.

Hmm, we are starting to argue about the exact details of the extrapolation process...

> There are many possible and plausible outcomes besides "everybody loses".

Let's formalize the problem. Let F(R, Ropp) be the probability of a team successfully building a FAI first, given R resources and an opposition with Ropp resources. Let Uself, Ueverybody, and Uother be the rewards for being first to build FAI<self>, FAI<everybody>, and FAI<other>, respectively. Naturally, F is monotonically increasing in R and decreasing in Ropp, and Uother < Ueverybody < Uself.

Assume there are just two teams, with resources R1 and R2, and each can perform one of two actions: "cooperate" or "defect". Let's compute the expected utilities for the first team:

We cooperate, opponent team cooperates: EU("CC") = Ueverybody * F(R1+R2, 0)
We cooperate, opponent team defects: EU("CD") = Ueverybody * F(R1, R2) + Uother * F(R2, R1)
We defect, opponent team cooperates: EU("DC") = Uself * F(R1, R2) + Ueverybody * F(R2, R1)
We defect, opponent team defects: EU("DD") = Uself * F(R1, R2) + Uother * F(R2, R1)

Then, EU("CD") < EU("DD") < EU("DC"), which gives us most of the structure of a PD problem. The rest, however, depends on the finer details. Let A = F(R1,R2)/F(R1+R2,0) and B = F(R2,R1)/F(R1+R2,0). Then:

  1. If Ueverybody <= Uself*A + Uother*B, then EU("CC") <= EU("DD"), and there is no point in cooperating. This is your position: Ueverybody is much less than Uself, or Uother is not much less than Ueverybody, and/or your team has far more resources than the other.

  2. If Uself*A + Uother*B < Ueverybody < Uself*A/(1-B), this is a true Prisoner's dilemma.

  3. If Ueverybody >= Uself*A/(1-B), then EU("CC") >= EU("DC"), and "cooperate" is the obviously correct decision. This is my position: Ueverybody is not much less than Uself, and/or the teams are more evenly matched.
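The three regimes can be checked numerically. Here is a minimal Python sketch; the particular functional form of F below, and the sample numbers, are illustrative assumptions of mine (the model above only requires F to increase in R and decrease in Ropp):

```python
# Two-team cooperate/defect payoffs from the model above.
# The form of F is an illustrative assumption: success probability
# rises with own resources R and falls with opposition resources R_opp.

def F(R, R_opp):
    """Probability of building FAI first with R resources against R_opp."""
    return R / (R + R_opp + 1.0)  # "+1" stands in for baseline difficulty

def expected_utilities(R1, R2, U_self, U_everybody, U_other):
    """Team 1's EU for each (team 1, team 2) action pair."""
    return {
        "CC": U_everybody * F(R1 + R2, 0),
        "CD": U_everybody * F(R1, R2) + U_other * F(R2, R1),
        "DC": U_self * F(R1, R2) + U_everybody * F(R2, R1),
        "DD": U_self * F(R1, R2) + U_other * F(R2, R1),
    }

def classify(R1, R2, U_self, U_everybody, U_other):
    """Return which of the three regimes the parameters fall into."""
    A = F(R1, R2) / F(R1 + R2, 0)
    B = F(R2, R1) / F(R1 + R2, 0)
    if U_everybody <= U_self * A + U_other * B:
        return "1: no point in cooperating"
    if U_everybody < U_self * A / (1 - B):
        return "2: true Prisoner's Dilemma"
    return "3: cooperate is correct"

# Evenly matched teams: raising Ueverybody toward Uself moves the game
# from regime 1 through the true PD into the cooperative regime 3.
for U_everybody in (4, 9, 10):
    print(U_everybody, classify(1, 1, 10, U_everybody, 0))
```

With evenly matched teams (so A = B) and Uother = 0, the crossover into regime 3 sits exactly at Ueverybody = Uself*A/(1-B), matching condition 3 above.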

Comment author: [deleted] 26 May 2012 04:11:44PM 1 point

> These all have property that you only need so much of them.

All of those resources are fungible and can be exchanged for time. There might be no limit to the amount of time people desire, even very enlightened posthuman people.

Comment author: gRR 26 May 2012 04:53:13PM 0 points

I don't think you can get an everywhere-positive exchange rate. There are diminishing returns, and a threshold after which exchanging more resources won't get you any more time. There's only 30 hours in a day, after all :)

Comment author: DanArmak 26 May 2012 06:55:49PM 0 points

You can use some resources like computation directly and in unlimited amounts (e.g. living for unlimitedly long virtual times per real second inside a simulation). There are some physical limits on that, because the speed of light limits effective brain size, but that depends on brain design, and in any case the limits seem to be pretty high.

More generally: the number of configurations physically possible in a given volume of space is limited (by the entropy of a black hole of that size). If you have a utility function unbounded from above, then as utility rises it must eventually map to states that describe more space or matter than the amount you started with. Any utility maximizer with unbounded utility eventually wants to expand.
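That bound can be made concrete with the Bekenstein-Hawking entropy: a black hole has the maximum entropy of any region of its size, so it caps the number of bits any volume can hold. A back-of-the-envelope sketch (the one-metre example and rounded constants are my own illustration):

```python
import math

# Bekenstein-Hawking entropy S = A * c^3 / (4 * G * hbar) in nats,
# where A is the horizon area; dividing by ln 2 converts to bits.
# A black hole maximizes entropy for its size, so this upper-bounds
# the information content of any sphere of the same radius.

c = 2.998e8       # speed of light, m/s
G = 6.674e-11     # gravitational constant, m^3 / (kg s^2)
hbar = 1.055e-34  # reduced Planck constant, J s

def max_bits(radius_m):
    """Upper bound on bits storable within a sphere of the given radius."""
    area = 4.0 * math.pi * radius_m ** 2
    return area * c**3 / (4.0 * G * hbar * math.log(2))

# A one-metre sphere holds at most ~10^70 bits: enormous, but finite,
# which is why an unbounded utility maximizer must eventually expand.
```

Since the bound scales with area rather than volume, no fixed region can represent unboundedly many distinct states, whatever the brain or simulation design.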

Comment author: [deleted] 26 May 2012 06:04:59PM 0 points

I don't know what the exchange rates are, but it does cost something (computer time, energy, negentropy) to stay alive. That holds for simulated creatures too. If the available resources to keep someone alive are limited, then I think there will be conflict over those resources.