It’s the year 2045, and Dr. Evil and the Singularity Institute have been in a long and grueling race to be the first to achieve machine intelligence, thereby controlling the course of the Singularity and the fate of the universe. Unfortunately for Dr. Evil, SIAI is ahead in the game. Its Friendly AI is undergoing final testing, and Coherent Extrapolated Volition is scheduled to begin in a week. Dr. Evil learns of this news, but there’s not much he can do, or so it seems. He has succeeded in developing brain scanning and emulation technology, but the emulation speed is still way too slow to be competitive.
There is no way to catch up with SIAI's superior technology in time, but Dr. Evil suddenly realizes that maybe he doesn’t have to. CEV is supposed to give equal weighting to all of humanity, and surely uploads count as human. If he had enough storage space, he could simply upload himself, and then make a trillion copies of the upload. The rest of humanity would end up with less than 1% weight in CEV. Not perfect, but he could live with that. Unfortunately he only has enough storage for a few hundred uploads. What to do…
Ah ha, compression! A trillion identical copies of an object would compress down to be only a little bit larger than one copy. But would CEV count compressed identical copies to be separate individuals? Maybe, maybe not. To be sure, Dr. Evil gives each copy a unique experience before adding it to the giant compressed archive. Since they still share almost all of the same information, a trillion copies, after compression, just manages to fit inside the available space.
Now Dr. Evil sits back and relaxes. Come next week, the Singularity Institute and the rest of humanity are in for a rather rude surprise!
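As an aside on the compression step above, here is a minimal sketch of why near-identical copies barely inflate the archive, assuming a random 16 KB blob stands in for one upload and Python's zlib stands in for whatever archive format Dr. Evil uses:

```python
import os
import zlib

# Illustrative sketch only: a random 16 KB blob plays the role of one upload,
# and DEFLATE (zlib) plays the role of the archive format. Each copy differs
# only by a tiny unique "experience", so the compressed archive of 1000 copies
# is far smaller than 1000 times the size of one compressed copy.

upload = os.urandom(16 * 1024)  # one "upload": incompressible on its own

one_copy = zlib.compress(upload, 9)
archive = b"".join(upload + f"experience {i}".encode() for i in range(1000))
thousand_copies = zlib.compress(archive, 9)

print(f"one copy compressed:    {len(one_copy):,} bytes")
print(f"1000 copies compressed: {len(thousand_copies):,} bytes")
print(f"growth factor: {len(thousand_copies) / len(one_copy):.1f}x, not 1000x")
```

A deduplicating archiver, or a format with a longer match window such as LZMA, would get even closer to the size of a single copy.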
Since I wrote about Extrapolated Volition as a solution to Goodhart's law, I think I should explain why I did so.
Here, what is sought is friendliness (your goal, G), whereas the friendliness architecture, the actual measurable thing, is the proxy (G*).
Extrapolated volition is one way of keeping G* from diverging from G: when one extrapolates the volition of the persons involved, one gets closer to G.
In Friendly AI, the volition of all of living humanity is to be extrapolated. Unfortunately, this proxy, like any other proxy, is open to hacking, and the scale of the problem is such that the other proposed solutions cannot be used.
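A toy illustration of the G/G* gap described above, under the assumption that the proxy is simply the true goal plus measurement noise: selecting the candidate that scores highest on G* systematically picks one whose true G falls short of the achievable best.

```python
import random

# Sketch of Goodhart-style proxy divergence (not from the original post):
# G is the true goal, G* = G + noise is the measurable proxy. Optimizing G*
# favors candidates whose measurement error happens to be large and positive.

random.seed(0)

true_goal = [random.uniform(0, 1) for _ in range(10_000)]        # G
proxy = [g + random.gauss(0, 0.5) for g in true_goal]            # G* = G + error

best_by_proxy = max(range(len(true_goal)), key=lambda i: proxy[i])
best_by_goal = max(range(len(true_goal)), key=lambda i: true_goal[i])

print(f"true G of candidate chosen by optimizing G*: {true_goal[best_by_proxy]:.3f}")
print(f"true G of candidate chosen by optimizing G : {true_goal[best_by_goal]:.3f}")
# Under heavy optimization pressure the proxy winner's true G lags the best,
# which is the divergence that extrapolating volition is meant to reduce.
```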
EDIT : edited for grammar in 3rd para
That's the number one thing they are doing wrong, then. This is exactly why you don't want to do that. Instead, the original programmer(s)' volition should be the one to be extrapolated. If the programmer wants what is best for humanity, then the AI will also. If the programmer doesn't want what's best, then why would you expect him to make this for humanity in the first place? See, by wanting what is best for humanity, the programmer also doesn't want all potential bugs an...