The AI can be adapted for other, less restricted, domains
That the ideas from a safe AI can be used to build an unsafe AI is a general argument against working on (or even talking about) any kind of AI whatsoever.
The AI adds code that will evolve into another AI into its output
The output is to contain only proofs of theorems. Specifically, a proof (or refutation) of the theorem in the input. The state of the system is to be reset after each run so as not to accumulate information.
The AI could self-modify incorrectly and result in unfriendly AI
An...
Well, I liked the paper, but I'm not knowledgeable enough to judge its true merits. It deals heavily with Bayesian-related questions, somewhat in Jaynes's style, so I thought it could be relevant to this forum.
At least one of the authors is a well-known theoretical physicist with an awe-inspiring Hirsch index, so presumably the paper would not be trivially worthless. I think it merits a more careful read.
Regarding the "he's here... he is the end of the world" prophecy, in view of the recent events, it seems like it can become literally true without it being a bad thing. After all, it does not specify a time frame. So Harry may become immortal and then tear apart the very stars in heaven, some time during a long career.
You're treating resources as one single kind, where really there are many kinds with possible trades between teams
I think this is reasonably realistic. Let R signify money. Then R can buy other necessary resources.
But my point was exactly that there would be many teams who could form many different alliances. Assuming only two is unrealistic and just ignores what I was saying.
We can model N teams by letting them play two-player games in succession. For example, any two teams with nearly matched resources would cooperate with each other, producing a ...
I don't think you can get an everywhere-positive exchange rate. There are diminishing returns and a threshold after which exchanging more resources won't get you any more time. There are only 30 hours in a day, after all :)
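As a toy illustration of that kind of threshold, here is a minimal sketch in Python, assuming a simple saturating exchange function for converting money into usable time; the constants and the functional form are illustrative assumptions, not anything stated in the discussion:

```python
# Hypothetical saturating exchange: money (R) buys extra usable time,
# but with diminishing returns and a hard cap on hours per day.
import math

BASE_HOURS = 16.0   # assumed hours available without spending anything
CAP_HOURS = 30.0    # assumed hard ceiling on usable hours per day
SCALE = 50.0        # assumed amount of money over which returns flatten

def usable_hours(money: float) -> float:
    """Usable hours per day as a saturating function of money spent."""
    extra = (CAP_HOURS - BASE_HOURS) * (1.0 - math.exp(-money / SCALE))
    return BASE_HOURS + extra

for r in (0, 25, 50, 100, 200, 1000):
    print(f"R={r:5d} -> {usable_hours(r):5.2f} hours/day")
# Past a certain point, more R buys essentially no additional time:
# the exchange rate is positive at first, then effectively zero.
```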
Space (land or whatever is being used). Mass and energy. Natural resources. Computing power. Finite-supply money and luxuries if such exist. Or are you making an assumption that CEVs are automatically more altruistic or nice than non-extrapolated human volitions are?
These all have the property that you only need so much of them. If there is a sufficient amount for everybody, then there is no point in killing in order to get more. I expect CEVs not to be greedy just for the sake of greed. It's people's CEVs we're talking about, not paperclip maximizers'.
...A coalition of 80% of the population forms, which would like to kill the other 20% in order to get their resources
I have trouble thinking of a resource that would make even one person's CEV, let alone 80% of them, want to kill people just in order to have more of it.
The question of definition: who is to be included in the CEV? Or: who is considered sane?
This is easy, and does not need any special hardcoding. If someone is so insane that their beliefs are totally closed and impossible to move by knowledge and intelligence, then their CEV is undefined. Thus...
The resources are not scarce at all. But, there's no consensus of CEVs. The CEVs of 80% want to kill the rest.
The resources are not scarce, yet the CEVs want to kill? Why?
I meant that the AI that implements your version of CEV would forcibly update people's actual beliefs to match what it CEV-extrapolated for them.
It would do so only if everybody's CEVs agree that updating these people's beliefs is a good thing.
If you believed there were many such people, would you modify your solution, or is ignoring them however many they are fine by you?
Peo...
So you're OK with the FAI not interfering if they want to kill them for the "right" reasons?
I wouldn't like it. But if the alternative is, for example, to have FAI directly enforce the values of the minority on the majority (or vice versa) - the values that would make them kill in order to satisfy/prevent - then I prefer FAI not interfering.
"if we kill them, we will benefit by dividing their resources among ourselves"
If the resources are so scarce that dividing them is so important that even CEVs agree on the necessity of killin...
If a majority of humanity wishes to kill a minority, obviously there won't be a consensus to stop the killing, and AI will not interfere
The majority may wish to kill the minority for wrong reasons - based on false beliefs or insufficient intelligence. In which case their CEVs won't endorse it, and the FAI will interfere. "Fundamentally different" means their killing each other is endorsed by someone's CEV, not just by themselves.
But you said it would only do things that are approved by a strong human consensus.
Strong consensus of their CE...
Will it only interfere if a consensus of humanity allows it to do so? Will it not stop a majority from murdering a minority?
If the majority and the minority are so fundamentally different that their killing each other is not forbidden by the universal human CEV, then no. On what moral grounds would it do the prevention?
The first AGI that does not favor inaction will become a singleton, destroying the other AIs and preventing future new AIs
Until everybody agrees that this new AGI is not good after all. Then the original AGI will interfere and dismantl...
A FAI that never does anything except prevent existential risk - which, in a narrow interpretation, means it doesn't stop half of humanity from murdering the other half - isn't a future worth fighting for IMO. We can do so much better.
No one said you have to stop with that first FAI. You can try building another. The first FAI won't oppose it (non-interference). Or, better yet, you can try talking to the other half of the humans.
There are people who believe religiously that End Times must come
Yes, but we assume they are factually wrong, and so their...
The problems look like a kind of anti-Prisoner's Dilemma. An agent plays against an opponent, and gets a reward iff they played differently. Then any agent playing against itself is screwed.
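A minimal sketch of that game, assuming two toy deterministic agents (the agent definitions are made up for illustration):

```python
# "Anti-Prisoner's Dilemma": each player is rewarded iff the two
# players chose *different* actions. A deterministic agent paired
# with a copy of itself always produces matching actions and loses.

def payoff(a, b):
    """Reward 1 to each player iff they played differently."""
    return (1, 1) if a != b else (0, 0)

def always_cooperate():
    return "C"

def always_defect():
    return "D"

# Mixed pairing: the actions differ, so both players are rewarded.
print(payoff(always_cooperate(), always_defect()))   # (1, 1)

# Self-play: identical deterministic agents can never play differently.
for agent in (always_cooperate, always_defect):
    print(payoff(agent(), agent()))                   # (0, 0) both times
```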
I would be fine with FAI removing existential risks and not doing any other thing until everybody('s CEV) agrees on it. (I assume here that removing existential risks is one such thing.) And an FAI team that credibly precommitted to implementing CEV(humanity) instead of CEV(themselves) would probably get more resources and would finish first.
Well, my own proposed plan is also a contingent modification. The strongest possible claim about CEV would be:
There is a unique X, such that for all living people P, CEV(P) = X.
Assuming there is no such X, there could still be a plausible claim:
Y is not empty, where Y = Intersection{over all living people P} of CEV(P).
And then the AI would do well if it optimized for Y while interfering the least with other things (whatever that means exactly). This way, whatever "evolving" happens due to the AI's influence is at least agreed upon by everyone('s CEV).
Back here you said "Well, perhaps yes." I understand that to mean you agree with my point that it's wrong / bad for the AI to promote extrapolated values while the actual values are different and conflicting
I meant that "it's wrong/bad for the AI to promote extrapolated values while the actual values are different and conflicting" will probably be a part of the extrapolated values, and the AI would act accordingly, if it can.
...My position is that the AI must be guided by the humans' actual present values in choosing to steer human (s
Humans don't know which of their values are terminal and which are instrumental, and whether this question even makes sense in general. Their values were created by two separate evolutionary processes. In the boxes example, humans may not know about the diamond. Maybe they value blue boxes because their ancestors could always bring a blue box to a jeweler and exchange it for food, or something.
This is precisely the point of extrapolation - to untangle the values from each other and build a coherent system, if possible.
No, the "actual" values would tell it to give the humans the blue boxes they want, already.
the AI would build a new (third) box, put a diamond inside, paint it blue, and give it to the person
If the AI could do this, then this is exactly what the extrapolated values would tell it to do. [Assuming some natural constraints on the original values.]
If it extrapolates coherently, then it's a single concept, otherwise it's a mixture :)
This may actually be doable, even at the present level of technology. You gather a huge text corpus, find the contexts where the word "sound" appears, and do the clustering using some word co-occurrence metric. The result is a list of different meanings of "sound", and a mapping from each mention to the specific meaning. You can also do this simultaneously for many words together, and then it becomes a global optimization problem.
Of course, AGI would be able to do this at a deeper level than this trivial syntactic one.
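A rough sketch of that syntactic version, in Python with scikit-learn; the tiny corpus, the bag-of-words context vectors, and the fixed number of clusters are illustrative assumptions, and a real system would need a large corpus and a better co-occurrence metric:

```python
# Cluster the contexts in which "sound" occurs; each cluster is taken
# to represent one meaning of the word, and each mention maps to the
# cluster of its context.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

corpus = [
    "the sound of the violin filled the hall",
    "a loud sound woke the neighbours",
    "the argument is sound and the proof is valid",
    "a sound investment with little risk",
]

target = "sound"
# Context = the sentence with the target word removed.
contexts = [" ".join(w for w in sent.split() if w != target) for sent in corpus]

vectors = CountVectorizer().fit_transform(contexts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for sent, label in zip(corpus, labels):
    print(label, "|", sent)
# Mentions assigned the same label are treated as the same sense of "sound".
```

Doing this for many words at once, as suggested above, presumably turns it into a global optimization problem because each word's sense assignments affect the context representations of the others.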
Does it rely on true meanings of words, particularly? Why not on concepts? Individually, "vibrations of air" and "auditory experiences" can be coherent.
I think seeking and refining such plans would be a worthy goal. For one thing, it would make LW discussions more constructive. Currently, as far as I can tell, CEV is very broadly defined, and its critics usually point at some feature and cast (legitimate) doubt on it. Very soon, CEV is apparently full of holes, and one may wonder why it has not been thrown away already. But these may not be real holes, just places where we do not know enough yet. If these points are identified and stated in the form of questions of fact, which can be answered by future research, then a global plan, in the form of a decision tree, could be made and reasoned about. That would be definite progress, I think.
Why is it important that it be uncontroversial?
I'm not sure. But it seems a useful property to have for an AI being developed. It might allow centralizing the development. Or something.
Ok, you're right in that a complete lack of controversy is impossible, because there are always trolls, cranks, conspiracy theorists, etc. But is it possible to reach a consensus among all sufficiently well-informed, sufficiently intelligent people? Where "sufficiently" is not too high a threshold?
What I'm trying to do is find some way to fix the goalposts: find a set of conditions on CEV that would suffice. Whether such a CEV actually exists and how to build it are questions for later. Let's just pile up constraints until a sufficient set is reached. So, let's assume that:
would you say that running it is uncontroversial? If not, what other conditions are required?
I value the universe with my friend in it more than one without her.
Ok, but do you grant that running a FAI with "unanimous CEV" is at least (1) safe, and (2) uncontroversial? That the worst problem with it is that it may just stand there doing nothing - if I'm wrong about my hypothesis?
People are happy, by definition, if their actual values are fulfilled
Yes, but values depend on knowledge. There was an example by EY, I forgot where, in which someone values a blue box because they think the blue box contains a diamond. But if they're wrong, and it's actually the red box that contains the diamond, then what would actually make them happy - giving them the blue or the red box? And would you say giving them the red box is making them suffer?
Well, perhaps yes. Therefore, a good extrapolated wish would include constraints on the speed of it...
VHEMT supports human extinction primarily because, in the group's view, it would prevent environmental degradation. The group states that a decrease in the human population would prevent a significant amount of man-made human suffering.
Obviously, human extinction is not their terminal value.
I believe there exist (extrapolated) wishes universal for humans (meaning, true for literally everyone). Among these wishes, I think there is the wish for humans to continue existing. I would like the AI to fulfill this wish (and other universal wishes, if there are any), while letting people decide everything else for themselves.
But he assumes that it is worse for me because it is bad for my friend to have died. Whereas, in fact, it is worse for me directly.
...People sometimes respond that death isn't bad for the person who is dead. Death is bad for the survivors. But I don't think that can be central to what's bad about death. Compare two stories.
Story 1. Your friend is about to go on the spaceship that is leaving for 100 Earth years to explore a distant solar system. By the time the spaceship comes back, you will be long dead. Worse still, 20 minutes after the ship takes off, all radio contact between the Earth and the ship will be lost until its return. You're losing all contact with your closest friend.
Stor
For extrapolation to be conceptually plausible, I imagine "knowledge" and "intelligence level" to be independent variables of a mind, knobs to turn. To be sure, this picture looks ridiculous. But assuming, for the sake of argument, that this picture is realizable, extrapolation appears to be definable.
Yes, many religious people wouldn't want their beliefs erased, but only because they believe them to be true. They wouldn't oppose increasing their knowledge if they knew it was true knowledge. Cases of belief in belief would be dissolved ...
Paperclipping is also self-consistent in that limit. That doesn't make me want to include it in the CEV
Then we can label paperclipping as a "true" value too. However, I still prefer true human values to be maximized, not true clippy values.
...Evidence please. There's a long long leap from ordinary gaining knowledge and intelligence through human life, to "the limit of infinite knowledge and intelligence". Moreover we're considering people who currently explicitly value not updating their beliefs in the face of knowledge, and basing th
What makes you give them such a label as "true"?
They are reflectively consistent in the limit of infinite knowledge and intelligence. This is a very special and interesting property.
In your CEV future, the extrapolated values are maximized. Conflicting values, like the actual values held today by many or all people, are necessarily not maximized.
But people would change - gaining knowledge and intelligence - and thus would become happier and happier with time. And I think CEV would try to synchronize this with the timing of its optimization process.
why extrapolate values at all
Extrapolated values are the true values. Whereas the current values are approximations, sometimes very bad and corrupted approximations.
they will suffer in the CEV future
This does not follow.
Errr. This is a question of simple fact, which is either true or false. I believe it's true, and build the plans accordingly. We can certainly think about contingency plans for what to do if the belief turns out to be false, but so far no one has agreed that the plan is good even in the case where the belief is true.
Dunno... propose to kill them quickly and painlessly, maybe? But why do you ask? As I said, I don't expect this to happen.
No, because "does CEV fulfill....?" is not a well-defined or fully specified question. But I think, if you asked "whether it is possible to build FAI+CEV in such a way that it fulfills the wish(es) of literally everyone while affecting everything else the least", they would say they do not know.
I'd think someone's playing a practical joke on me.
Aumann update works only if I believe you're a perfect Bayesian rationalist. So, no thanks.
Too bad. Let's just agree to disagree then, until the brain scanning technology is sufficiently advanced.
I've pointed out people who don't wish for the examples you gave
So far, I didn't see a convincing example of a person who truly wished for everyone to die, even in extrapolation.
Otherwise the false current beliefs will keep on being very relevant to them
To them, yes, but not to their CEV.
You could very easily build a much happier life for them just by allocating some resources (land, computronium, whatever) and going by their current values
Well... ok, let's assume a happy life is their single terminal value. Then, by the definition of their extrapolated values, you couldn't build a happier life for them by doing anything other than following their extrapolated values!
In all of their behavior throughout their lives, and in their own words today, they honestly have this value
This is the conditional that I believe is false when I say "they are probably lying, trolling, joking". I believe that when you use the brain scanner on those nihilists, and ask them whether they would prefer the world where everyone is dead to any other possible world, and they say yes, the brain scanner would show they are lying, trolling or joking.
Well, assuming EY's view of intelligence, the "cautionary position" is likely to be a mathematical statement. And then why not prove it? Given several decades? That's a lot of time.
Even if they do, it will be the best possible thing for them, according to their own (extrapolated) values.
we anticipate there will be no extrapolated wishes that literally everyone agrees on
Well, now you know there exist people who believe that there are some universally acceptable wishes. Let's do the Aumann update :)
Lots of people religiously believe...
False beliefs => irrelevant after extrapolation.
Some others believe that life in this world is suffering, negative utility, and ought to be stopped for its own sake (stopping the cycle of rebirth)
False beliefs (rebirth, existence of nirvana state) => irrelevant after extrapolation.
My conditional was "cautionary position is the correct one". I meant, provably correct.
How can we even start defining CEV without brain scanning technology able to do much more than answering the original question?
What is the chance some of them will try to seize first-mover advantage, and refuse to wait for another 30 years, and ignore Friendliness? I estimate high. The payoff is the biggest in human history: first-mover will potentially control a singleton that will rewrite to order the very laws of physics in its future light-cone, and prevent any other AGI from ever being built! This is beyond even "rule the world forever and reshape it in your image" territory. The greatest temptation ever. Do you seriously expect no-one would succumb to it?
Remembe...
I only proposed a hypothesis, which will become testable earlier than the time when CEV could be implemented.
I am confused about how Philosopher's stone could help with reviving Hermione. Does QQ mean to permanently transfigure her dead body into a living Hermione? But then, would it not mean that Harry could do it now, albeit temporarily? And, he wouldn't even need a body. He could then just temporarily transfigure any object into a living Hermione. Also, now that I think of it, he could transfigure himself a Feynman and a couple of Einsteins...