Nick_Beckstead asked me to link to posts I referred to in this comment. I should put up or shut up, so here's an attempt to give an organized overview of them.
Since I wrote these, LukeProg has begun tackling some related issues. He has accomplished the seemingly-impossible task of writing many long, substantive posts, none of which I recall disagreeing with. And I have, irrationally, not read most of his posts, so he may already have dealt with more of these same issues.
I think that I only raised Holden's "objection 2" in comments, which I couldn't easily dig up; and in a critique of a book chapter, which I emailed to LukeProg and did not post to LessWrong. So I'm only going to talk about "Objection 1: It seems to me that any AGI that was set to maximize a "Friendly" utility function would be extraordinarily dangerous." I've arranged my previous posts and comments on this point into categories. (Much of what I've said on the topic has been in comments on LessWrong and Overcoming Bias, and in email lists including SL4, and isn't here.)
The concept of "human values" cannot be defined in the way that FAI presupposes
Human errors, human values: Suppose all humans shared an identical set of values, preferences, and biases. Even then, we could not retain human values without retaining human errors, because there is no principled distinction between them.
A comment on this post: There are at least three distinct levels of human values: the values an evolutionary agent holds that maximize its reproductive fitness, the values a society holds that maximize its fitness, and the values a rational optimizer holds who has chosen to maximize social utility. They often conflict. Which of them are the real human values?
Values vs. parameters: Eliezer has suggested using human values, but without time discounting (that is, with the time-discounting parameter changed). CEV presupposes that we can abstract human values and apply them in a different situation that has different parameters. But the parameters are values. There is no distinction between parameters and values.
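To see why the discount parameter is itself a value, consider the standard exponential-discounting form of a utility function (my illustration, not a formula from the original post):

```latex
% Discounted utility over a trajectory of states s_0, s_1, s_2, ...
% gamma is the time-discounting parameter.
U = \sum_{t=0}^{\infty} \gamma^{t} \, u(s_t), \qquad 0 < \gamma \le 1
```

Setting gamma = 1 rather than, say, gamma = 0.99 changes which futures the agent prefers, which is exactly the kind of difference we would otherwise call a difference in values.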
A comment on "Incremental progress and the valley": The "values" that our brains try to maximize in the short run are designed to maximize different values for our bodies in the long run. Which are human values: The motivations we feel, or the effects they have in the long term? LukeProg's post Do Humans Want Things? makes a related point.
Group selection update: The reason I harp on group selection, besides my outrage at the way it's been treated for the past 50 years, is that group selection implies that some human values evolved at the group level, not at the level of the individual. This means that increasing the rationality of individuals may enable people to act more effectively in their own interests, rather than in the group's interest, and thus diminish the degree to which humans embody human values. Identifying the values embodied in individual humans - supposing we could do so - would still not arrive at human values. Transferring human values to a post-human world, which might contain groups at many different levels of a hierarchy, would be problematic.
I wanted to write about my opinion that human values can't be divided into final values and instrumental values, the way discussion of FAI presumes they can. That division is an idea that comes from mathematics, symbolic logic, and classical AI. A symbolic approach would probably make proving safety easier. But human brains don't work that way. You can and do change your values over time, because you don't really have terminal values.
Strictly speaking, it is impossible for an agent whose goals are all indexical goals, describing states involving itself, to have preferences about a situation in which it does not exist. Those of you operating under the assumption that we are maximizing a utility function with evolved terminal goals should, I think, admit that these terminal goals all involve either ourselves or our genes. If they involve ourselves, then utility functions based on them cannot even be computed once we die. If they involve our genes, then they are goals our bodies are pursuing, which we, the conscious agents inside our bodies, call errors rather than goals when we evaluate them. In either case, there is no logical reason for us to wish to maximize a utility function based on these goals after our own deaths. Any action I wish to take regarding the distant future therefore presupposes that the entire SIAI approach to goals is wrong.
My view, under which it does make sense for me to say I have preferences about the distant future, is that my mind has learned "values" that are not symbols, but analog numbers distributed among neurons. As described in "Only humans can have human values", these values do not exist in a hierarchy with some at the bottom and some at the top, but in a recurrent network with no top or bottom, because the different parts of the network developed simultaneously. These values therefore can't be categorized into instrumental and terminal. They can include very abstract values that don't refer specifically to me, because other values elsewhere in the network do refer to me, and those ensure that any action I finally execute incorporating the abstract values is also influenced by the values that are about me.
Even if human values existed, it would be pointless to preserve them
Only humans can have human values:
- The only preferences that can be unambiguously determined are the preferences a person (mind+body) implements, which are not always the preferences expressed by their beliefs.
- If you extract a set of consciously-believed propositions from an existing agent, then build a new agent to use those propositions in a different environment, with an "improved" logic, you can't claim that it has the same values, since it will behave differently.
- Values exist in a network of other values. A key ethical question is to what degree values are referential (meaning they can be tested against something outside that network) or non-referential (and hence relative).
- Supposing that values are referential helps only by telling you to ignore human values.
- You cannot resolve the problem by combining information from different behaviors, because the needed information is missing.
- Today's ethical disagreements are largely the result of attempting to extrapolate ancestral human values into a changing world.
- The future will thus be ethically contentious even if we accurately characterize and agree on present human values, because those values will fail to address the important new problems.
Human values differ as much as values can differ: There are two fundamentally different categories of values:
- Non-positional, mutually-satisfiable values (physical luxury, for instance)
- Positional, zero-sum social values, such as wanting to be the alpha male or the homecoming queen
All mutually-satisfiable values have more in common with each other than they do with any non-mutually-satisfiable values, because mutually-satisfiable values are compatible with social harmony and non-problematic utility maximization, while non-mutually-satisfiable values require eternal conflict. If you found an alien life form from a distant galaxy with non-positional values, it would be easier to integrate those values into a human culture that had only human non-positional values than to integrate already-existing positional human values into that culture.
It appears that some humans have mainly the one type, while other humans have mainly the other type. So talking about trying to preserve human values is pointless - the values held by different humans have already passed the most-important point of divergence.
Enforcing human values would be harmful
The human problem: This argues that the qualia and values we have now are only the beginning of those that could evolve in the universe, and that ensuring that we maximize human values - or any existing value set - from now on will stop this process in its tracks and prevent anything better from ever evolving. This is the most-important objection of all.
Re-reading this, I see that the critical paragraph is painfully obscure, as if written by Kant; but it summarizes the argument: "Once the initial symbol set has been chosen, the semantics must be set in stone for the judging function to be "safe" for preserving value; this means that any new symbols must be defined completely in terms of already-existing symbols. Because fine-grained sensory information has been lost, new developments in consciousness might not be detectable in the symbolic representation after the abstraction process. If they are detectable via statistical correlations between existing concepts, they will be difficult to reify parsimoniously as a composite of existing symbols. Not using a theory of phenomenology means that no effort is being made to look for such new developments, making their detection and reification even more unlikely. And an evaluation based on already-developed values and qualia means that even if they could be found, new ones would not improve the score. Competition for high scores on the existing function, plus lack of selection for components orthogonal to that function, will ensure that no such new developments last."
Averaging value systems is worse than choosing one: This describes a neural network that encodes preferences: it takes an input pattern and computes a new pattern that optimizes those preferences. Such a system is taken as an analog for a value system together with an ethical system for attaining those values. I then define a measure of the internal conflict produced by a set of values, and show that a system built by averaging together the parameters of many different systems will have higher internal conflict than any of the systems that were averaged to produce it. The point is that the CEV plan of "averaging together" human values will result in a set of values that is worse (more self-contradictory) than any of the value systems it was derived from.
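To make the kind of setup described above more concrete, here is a minimal toy sketch in Python. It is my own construction, not the model or conflict measure from the original post: it assumes a Hopfield-style network in which a symmetric weight matrix encodes pairwise preferences among features, and it uses the minimum achievable energy as a stand-in for "internal conflict." The function names and the choice of measure are mine.

```python
# Toy sketch: "value systems" as Hopfield-style preference networks.
# An illustration of the general setup only, not the post's actual model.
import itertools

import numpy as np

rng = np.random.default_rng(0)
n = 8  # number of value-relevant features (small enough to enumerate states)


def random_value_system(n):
    """A random symmetric weight matrix: W[i, j] > 0 means features i and j
    'should' co-occur, W[i, j] < 0 means they should not."""
    W = rng.normal(size=(n, n))
    W = (W + W.T) / 2
    np.fill_diagonal(W, 0)
    return W


def conflict(W):
    """Internal conflict, measured here as the best (minimum) Hopfield energy
    over all +/-1 states. The closer this is to zero, the less any single
    state can satisfy the encoded preferences."""
    best = np.inf
    for bits in itertools.product([-1, 1], repeat=W.shape[0]):
        s = np.array(bits)
        best = min(best, -0.5 * s @ W @ s)
    return best


systems = [random_value_system(n) for _ in range(5)]
averaged = sum(systems) / len(systems)  # the "averaged value system"

print("conflict of each source system:",
      [round(float(conflict(W)), 2) for W in systems])
print("conflict of the averaged system:", round(float(conflict(averaged)), 2))
```

Under this toy measure the averaged matrix typically scores worse simply because disagreeing preferences cancel out and leave less that any single state can satisfy; the original post defines its own conflict measure and makes the argument more carefully.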
A point I may not have made in these posts, but have made in comments, is that the majority of humans today think that women should not have full rights, that homosexuals should be killed or at least severely persecuted, and that nerds should be given wedgies. These are not incompletely-extrapolated values that will change with more information; they are values. Opponents of gay marriage make it clear that they do not object to gay marriage based on a long-range utilitarian calculation; they directly value not allowing gays to marry. Many human values horrify most people on this list, so those people shouldn't be trying to preserve them.
It made me a lot more comfortable dealing with people who might be seen as "regressive", "bland", "conservative", or who otherwise seem not very in sync with my own social attitudes and values. Understanding that culture and culturally-transmitted worldviews do constitute umbrella groups, but that people vary within them to similar degrees across such umbrellas, made it easier to just deal with people and adapt my own social responses to the situation; and where I feel a person has incorrect, problematic or misguided ideas, it made it easier to choose my responses and present them effectively.
It made me more socially-conscious and a bit more socially-successful. I have some considerable obstacles there, but just having cultural details available was huge in informing my understanding of certain interactions. When I taught ESL, many of my students were Somali and Muslim. I'm also trans, and gender is a very big thing in many Islam-influenced societies (particularly ones where men and women for the most part don't socialize). I learned a bit about fashion sense and making smart choices just by noticing how the men reacted to what I wore, particularly on hot days. I learned a lot about gender-marked social behavior and signifiers from my interactions with the older women in the class and the degree to which they accepted me (which I could gauge readily by their willingness to engage in casual touch, say to get my attention or when thanking me, or the occasional hug from some of my students).
It made me a far better worldbuilder than I was before, because I have some sense of just how variable human cultures really are, and how easy it is to construct a superficially-plausible theory of human cultures, history or behavior while missing out on the incredible variance that actually exists.
It made me far less interested in evolutionary psychology as an explanation for surface-level behaviors, let alone for broad social patterns of behavior, because all too often the cited examples turn out to be culturally contingent. I think the average person in Western society has a very confused idea of just how different other cultures can be.
It made me skeptical of CEV as a thing that will return an output. I'm not sure human volition can be meaningfully extrapolated, and even if it can, I'm far from persuaded that the bits of it that cohere add up to anything you'd base FAI on.
It convinced me that the sort of attitudes I see expressed on LW towards "tradition" and traditional culture (especially where it comes into conflict with global capitalism) are so hopelessly confused about the thing they're trying to address that they essentially have nothing meaningful to say about it, or at best cover only a small subset of the cases they're applied to. It didn't make me a purist or instill some sort of half-baked Prime Directive or anything; cultures change, and they'll do that no matter what.
It helped me grasp my own cultural background and influences better. It gave me some insight into the ways in which that background can lock in your perceptions and decisions, how hard it is to change, and how easy it is to confuse it with something "innate" (and how easy it is to confuse "innate" with "genetic"). It helped me grasp how I could substitute or reprogram bits of it, and with a bit of time and practice it helped me understand the limitations on that.
There's...probably a whole ton more, but I'm running out of focus right now.
EDIT: Oh! It made me hugely more competent at navigating, interpreting and understanding art, especially from other cultures. Literary modes, aesthetics, music and styles; also narrative and its uses.
Fascinating, but... my Be Specific detector is going off and asking not just for the abstract generalizations you concluded, but for the specific examples that made you conclude them. Filling in at least one case of "I thought I should dress like X, but then Y happened, and now I dress like Z", even - my detector is going off because all of the paragraphs describe abstract conclusions.