Holden's Objection 1: Friendliness is dangerous
Nick_Beckstead asked me to link to posts I referred to in this comment. I should put up or shut up, so here's an attempt to give an organized overview of them.
Since I wrote these, LukeProg has begun tackling some related issues. He has accomplished the seemingly-impossible task of writing many long, substantive posts none of which I recall disagreeing with. And I have, irrationally, not read most of his posts. So he may have dealt with more of these same issues.
I think that I only raised Holden's "objection 2" in comments, which I couldn't easily dig up, and in a critique of a book chapter, which I emailed to LukeProg and did not post to LessWrong. So I'm only going to talk about Objection 1: "It seems to me that any AGI that was set to maximize a 'Friendly' utility function would be extraordinarily dangerous." I've arranged my previous posts and comments on this point into categories. (Much of what I've said on the topic has been in comments on LessWrong and Overcoming Bias, and on mailing lists including SL4, and isn't here.)
The concept of "human values" cannot be defined in the way that FAI presupposes
Human errors, human values: Suppose all humans shared an identical set of values, preferences, and biases. We cannot retain human values without retaining human errors, because there is no principled distinction between them.
A comment on this post: There are at least three distinct levels of human values: the values an evolutionary agent holds that maximize its reproductive fitness; the values a society holds that maximize its fitness; and the values held by a rational optimizer who has chosen to maximize social utility. They often conflict. Which of them are the real human values?
Values vs. parameters: Eliezer has suggested using human values, but without time discounting (that is, with the time-discounting parameter changed). CEV presupposes that we can abstract human values and apply them in a different situation that has different parameters. But the parameters are values. There is no distinction between parameters and values.
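A minimal sketch of the point, assuming simple exponential discounting. The option names and reward numbers are hypothetical and not taken from any actual proposal. Only the discount parameter changes between the two calls, yet the agent's preferred outcome flips, so "changing a parameter" already changes what the agent values.

```python
def discounted_value(rewards, gamma):
    """Exponentially discounted sum: rewards[t] is weighted by gamma**t."""
    return sum(r * gamma**t for t, r in enumerate(rewards))


def preferred(options, gamma):
    """Return the name of the option with the highest discounted value."""
    return max(options, key=lambda name: discounted_value(options[name], gamma))


# Hypothetical reward streams over five time steps.
options = {
    "small reward now": [10, 0, 0, 0, 0],
    "large reward later": [0, 0, 0, 0, 20],
}

# With discounting (gamma = 0.8) the later reward is worth 20 * 0.8**4, about 8.19 < 10.
print(preferred(options, gamma=0.8))  # small reward now
# Without discounting (gamma = 1.0) the same reward streams rank the other way.
print(preferred(options, gamma=1.0))  # large reward later
```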
A comment on "Incremental progress and the valley": The "values" that our brains try to maximize in the short run are designed to maximize different values for our bodies in the long run. Which are human values: The motivations we feel, or the effects they have in the long term? LukeProg's post Do Humans Want Things? makes a related point.
Group selection update: The reason I harp on group selection, besides my outrage at the way it's been treated for the past 50 years, is that group selection implies that some human values evolved at the group level, not at the level of the individual. This means that increasing the rationality of individuals may enable people to act more effectively in their own interests, rather than in the group's interest, and thus diminish the degree to which humans embody human values. Identifying the values embodied in individual humans - supposing we could do so - would still not arrive at human values. Transferring human values to a post-human world, which might contain groups at many different levels of a hierarchy, would be problematic.
I wanted to write about my opinion that human values can't be divided into final values and instrumental values, the way discussion of FAI presumes they can. This is an idea that comes from mathematics, symbolic logic, and classical AI. A symbolic approach would probably make proving safety easier. But human brains don't work that way. You can and do change your values over time, because you don't really have terminal values.
Strictly speaking, it is impossible for an agent whose goals are all indexical goals describing states involving itself to have preferences about a situation in which it does not exist. Those of you who are operating under the assumption that we are maximizing a utility function with evolved terminal goals should, I think, admit that these terminal goals all involve either ourselves or our genes. If they involve ourselves, then utility functions based on these goals cannot even be computed once we die. If they involve our genes, then they are goals that our bodies are pursuing, which we call errors, not goals, when we, the conscious agents inside our bodies, evaluate them. In either case, there is no logical reason for us to wish to maximize some utility function based on these goals after our own deaths. Any action I wish to take regarding the distant future necessarily presupposes that the entire SIAI approach to goals is wrong.
My view, under which it does make sense for me to say I have preferences about the distant future, is that my mind has learned "values" that are not symbols, but analog numbers distributed among neurons. As described in "Only humans can have human values", these values do not exist in a hierarchy with some at the bottom and some on the top, but in a recurrent network which does not have a top or a bottom, because the different parts of the network developed simultaneously. These values therefore can't be categorized into instrumental or terminal. They can include very abstract values that don't need to refer specifically to me, because other values elsewhere in the network do refer to me, and this will ensure that actions I finally execute incorporating those values are also influenced by my other values that do talk about me.
Even if human values existed, it would be pointless to preserve them
Only humans can have human values:
- The only preferences that can be unambiguously determined are the preferences a person (mind+body) implements, which are not always the preferences expressed by their beliefs.
- If you extract a set of consciously-believed propositions from an existing agent, then build a new agent to use those propositions in a different environment, with an "improved" logic, you can't claim that it has the same values, since it will behave differently.
- Values exist in a network of other values. A key ethical question is to what degree values are referential (meaning they can be tested against something outside that network); or non-referential (and hence relative).
- Supposing that values are referential helps only by telling you to ignore human values.
- You cannot resolve the problem by combining information from different behaviors, because the needed information is missing.
- Today's ethical disagreements are largely the result of attempting to extrapolate ancestral human values into a changing world.
- The future will thus be ethically contentious even if we accurately characterize and agree on present human values, because these values will fail to address the new important problems.
Human values differ as much as values can differ: There are two fundamentally different categories of values:
- Non-positional, mutually-satisfiable values (physical luxury, for instance)
- Positional, zero-sum social values, such as wanting to be the alpha male or the homecoming queen
All mutually-satisfiable values have more in common with each other than they do with any non-mutually-satisfiable values, because mutually-satisfiable values are compatible with social harmony and non-problematic utility maximization, while non-mutually-satisfiable values require eternal conflict. If you find an alien life form from a distant galaxy with non-positional values, it would be easier to integrate those values into a human culture with only human non-positional values than to integrate already-existing positional human values into that culture.
It appears that some humans have mainly the one type, while other humans have mainly the other type. So talking about trying to preserve human values is pointless - the values held by different humans have already passed the most-important point of divergence.
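A toy illustration of the positional/non-positional distinction. It assumes each agent's utility is either its own holdings (non-positional) or its rank relative to everyone else (positional), and both utility functions are hypothetical simplifications. Doubling everyone's resources raises total non-positional utility. Positional utility, however, always sums to zero, so one agent's gain is another's loss.

```python
def absolute_utility(wealth):
    """Non-positional: each agent values only its own holdings."""
    return list(wealth)


def positional_utility(wealth):
    """Positional: +1 for each agent you outrank, -1 for each agent who outranks you."""
    return [sum((w > other) - (w < other) for other in wealth) for w in wealth]


wealth = [1, 2, 3]
richer = [2 * w for w in wealth]  # everyone's holdings double

print(sum(absolute_utility(wealth)), sum(absolute_utility(richer)))      # 6 12
print(sum(positional_utility(wealth)), sum(positional_utility(richer)))  # 0 0
```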
Enforcing human values would be harmful
The human problem: This argues that the qualia and values we have now are only the beginning of those that could evolve in the universe, and that ensuring that we maximize human values - or any existing value set - from now on, will stop this process in its tracks, and prevent anything better from ever evolving. This is the most-important objection of all.
Re-reading this, I see that the critical paragraph is painfully obscure, as if written by Kant; but it summarizes the argument: "Once the initial symbol set has been chosen, the semantics must be set in stone for the judging function to be "safe" for preserving value; this means that any new symbols must be defined completely in terms of already-existing symbols. Because fine-grained sensory information has been lost, new developments in consciousness might not be detectable in the symbolic representation after the abstraction process. If they are detectable via statistical correlations between existing concepts, they will be difficult to reify parsimoniously as a composite of existing symbols. Not using a theory of phenomenology means that no effort is being made to look for such new developments, making their detection and reification even more unlikely. And an evaluation based on already-developed values and qualia means that even if they could be found, new ones would not improve the score. Competition for high scores on the existing function, plus lack of selection for components orthogonal to that function, will ensure that no such new developments last."
Averaging value systems is worse than choosing one: This describes a neural network that encodes preferences, takes some input pattern, and computes a new pattern that optimizes these preferences. Such a system is taken as an analogue for a value system plus an ethical system that attains those values. I then define a measure for the internal conflict produced by a set of values, and show that a system built by averaging together the parameters from many different systems will have higher internal conflict than any of the systems that were averaged together to produce it. The point is that the CEV plan of "averaging together" human values will result in a set of values that is worse (more self-contradictory) than any of the value systems it was derived from.
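The original post's network and conflict measure are not reproduced here. As a rough stand-in, assume a value system is a set of signed pairwise weights between binary "preferences" (Hopfield-style). Its internal conflict is then defined, hypothetically, as the smallest total weight of preferences that any state of the network must leave unsatisfied. Under these assumptions, three conflict-free systems can average into one that is frustrated:

```python
from fractions import Fraction
from itertools import product


def system_from_state(state):
    """A conflict-free value system: every pairwise weight agrees with `state`."""
    n = len(state)
    return {
        (i, j): Fraction(state[i] * state[j])
        for i in range(n)
        for j in range(i + 1, n)
    }


def conflict(weights, n):
    """Smallest total |weight| of pairwise preferences violated by any of the 2**n states."""
    return min(
        sum((abs(w) for (i, j), w in weights.items() if w * s[i] * s[j] < 0), Fraction(0))
        for s in product((-1, 1), repeat=n)
    )


def average(systems):
    """Average the weights of several value systems, parameter by parameter."""
    return {k: sum(system[k] for system in systems) / len(systems) for k in systems[0]}


# Three hypothetical value systems over three binary preferences.
systems = [system_from_state(s) for s in [(1, 1, 1), (1, -1, 1), (1, 1, -1)]]
blended = average(systems)

print(*[conflict(system, 3) for system in systems])  # 0 0 0
print(conflict(blended, 3))                         # 1/3
```

Averaging only the first two of these systems happens not to create conflict. The frustration appears here because the three systems disagree cyclically, so this is an existence example under the stated assumptions, not a proof of the general claim.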
A point I may not have made in these posts, but made in comments, is that the majority of humans today think that women should not have full rights, homosexuals should be killed or at least severely persecuted, and nerds should be given wedgies. These are not incompletely-extrapolated values that will change with more information; they are values. Opponents of gay marriage make it clear that they do not object to gay marriage based on a long-range utilitarian calculation; they directly value not allowing gays to marry. Many human values horrify most people on this list, so they shouldn't be trying to preserve them.
Comments (428)
You may have wanted to - but AFAICS, you didn't - apart from this paragraph. It seems to me that it fails to make its case. The split applies to any goal-directed agent, irrespective of implementation details.
Without endorsing the remainder of your argument, I agree that these observations must be adequately explained, and rejection of the conclusions well justified - or the concept of provably Friendly AI must be considered impossible.
If you can convince people that something is better than present human values, then CEV will implement these new values. I mean, if you just took CEV(PhilGoetz), and you have the desire to see the universe adopt "evolved" values, then CEV will extrapolate this desire. The only issue is that other people might not share this desire, even when extrapolated. In that case insisting that values "evolve" is imposing minority desires on everyone, mostly people who could never be convinced that these values are good. Which might be a good thing, but it can be handled in CEV by taking CEV(some "progressive" subset of humans).
This seems a nice place to link to Marcello's objection to CEV, which says you might be able to convince people of pretty much anything, depending on the order of arguments.
I think it would be impossible to convince people (assuming suitably extrapolated intelligence and knowledge) that total obliteration of all life on Earth is a good thing, no matter the order of arguments. And this is a very good value for a FAI. If it optimizes this (saves life) and otherwise interferes as little as possible, it will already have done excellently.
There are nihilists who at least claim that position.
Lots of people honestly wish for the literal end of the universe to come, because they believe in an afterlife/prophecy/etc.
You might say they would change their minds given better or more knowledge (e.g. that there is no afterlife and the prophecy was false/fake/wrong). But such people are often exposed to such arguments and reject them; and they make great efforts to preserve their current beliefs in the face of evidence. And they say these beliefs are very important to them.
There may well be methods of "converting" them anyway, but how are these methods ethically or practically different from "forcibly changing their minds" or their values? And if you're OK with forcibly changing their minds, why do you think that's ethically better than just ignoring them and building a partial-CEV that only extrapolates your own wishes and those of people similar to yourself?
I (and CEV) do not propose changing their minds or their values. What happens is that their current values (as modeled within FAI) get corrected in the presence of truer knowledge and lots of intelligence, and these corrected values are used for guiding the FAI.
If someone's mind & values are so closed as to be unextrapolateable - completely incompatible with truth - then I'm ok with ignoring these particular persons. But I don't believe there are actually any such people.
So the future is built to optimize different values. And their original values aren't changed. Wouldn't they suffer living in such a future?
Even if they do, it will be the best possible thing for them, according to their own (extrapolated) values.
Who cares about their extrapolated values? Not them (they keep their original values). Not others (who have different actual and extrapolated values). Then why extrapolate their values at all? You could very easily build a much happier life for them just by allocating some resources (land, computronium, whatever) and going by their current values.
Well... ok, let's assume a happy life is their single terminal value. Then by definition of their extrapolated values, you couldn't build a happier life for them if you did anything else other than follow their extrapolated values!
This is completely wrong. People are happy, by definition, if their actual values are fulfilled; not if some conflicting extrapolated values are fulfilled. CEV was supposed to get around this by proposing (without saying how) that people would actually grow to become smarter etc. and thereby modify their actual values to match the extrapolated ones, and then they'd be happy in a universe optimized for the extrapolated (now actual) values. But you say you don't want to change other people's values to match the extrapolation. That makes CEV a very bad idea - most people will be miserable, probably including you!
I think the standard sort of response for this is The Hidden Complexity of Wishes. Just off the top of my (non-superintelligent) head, the AI could notice a method for near-perfect continuation of life by preserving some bacteria at the cost of all other life forms.
I did not mean the comment that literally. I dropped too many steps for brevity, thinking they were clear; I apologize.
It would be just as impossible (or even more impossible) to convince people that total obliteration of people is a good thing. On the other hand, people don't care much about bacteria, even whole species of them, and as long as a few specimens remain in laboratories, people will be ok about the rest being obliterated.
Ah, the FAI problem in a nutshell.
There are lots of people who do think that's a good thing, and I don't think those people are trolling or particularly insane. There are entire communities where people have sterilized themselves as part of a mission to end humanity (for the sake of Nature, or whatever).
I think those people do have insufficient knowledge and intelligence. For example, the Skoptsy sect, who believed they were following God's will, were presumably factually wrong. And people who want to end humanity for the sake of Nature want that instrumentally, because they believe that otherwise Nature will be destroyed. Assuming FAI is created, this belief is also probably wrong.
You're right in there being people who would place "all non-intelligent life" before "all people", if there was such a choice. But that does not mean they would choose "non-intelligent life" before "non-intelligent life + people".
Not that I'm a proponent of voluntary human extinction, but that's an awfully big conditional.
It's not even strictly true. It's entirely conceivable that FAI will lead to the Sol system being converted into a big block of computronium to run human brain simulations. Even if those simulations have trees and animals in them, I think that still counts as the destruction of nature.
But if FAI is based on CEV, then this will only happen if this is the extrapolated wish of everybody. Assuming existence of people truly (even after extrapolation) valuing Nature in its original form, such computroniums won't be forcefully built.
That depends a lot on what I understand Nature to be.
If Nature is something incompatible with artificial structuring, then as soon as a superhuman optimizing system structures my environment, Nature has been destroyed... no matter how many trees and flowers and so forth are left.
Personally, I think caring about Nature as something independent of "trees and flowers and so forth" is kind of goofy, but there do seem to be people who care about that sort of thing.
What if particular arrangements of flowers, trees and so forth are complex and interconnected, in ways that can be undone to the net detriment of said flowers, trees and so forth? I'm thinking here of attempts at scientifically "managing" forest resources in Germany with the goal of making them as accessible and productive as possible. The resulting tree farms were far less resistant to disease, climatic aberration, and so on, and generally not very healthy, because it turns out that the illegible, sloppy factor that made forests seem less conveniently organized for human uses was a non-negligible portion of what allowed them to be so productive and robust in the first place.
No individual tree or flower is all that important, but the arrangement is, and you can easily destroy it without necessarily destroying any particular tree or flower. I'm not sure what to call this, and it's definitely not independent of the trees and flowers and so forth, but it can be destroyed to the concrete and demonstrable detriment of what's left.
I think Marcello's objection dissolves when the subject becomes aware of the order-of-arguments effects. After all, those effects are part of the factual information that the subject considers in refining its values. Most people don't like to have values that change depending on the order in which arguments are presented, so they will reflect further until they each find a stable value set. At least, that would be my hypothesis.
This link is broken.
"A point I may not have made in these posts, but made in comments, is that the majority of humans today think that women should not have full rights, homosexuals should be killed or at least severely persecuted, and nerds should be given wedgies. These are not incompletely-extrapolated values that will change with more information; they are values. Opponents of gay marriage make it clear that they do not object to gay marriage based on a long-range utilitarian calculation; they directly value not allowing gays to marry. Many human values horrify most people on this list, so they shouldn't be trying to preserve them."
This has always been my principal objection to CEV. I strongly suspect that were it implemented, it would want the death of a lot of my friends, and quite possibly me, too.
Why do we need a single CEV value system? A FAI can calculate as many value systems as it needs and keep incompatible humans separate. Group size is just another parameter to optimize. Religious fundamentalists can live in their own simulated universe, liberals in another.
What if space travel turns out to be impossible, and the superintelligence has to allocate the limited computational resources of the solar system?
Upvoting back to zero because I think this is an important question to address.
If I prefer that people not be tortured, and that's more important to me than anything else, then I ought not prefer a system that puts all the torturers in their own part of the world where I don't have to interact with them over a system that prevents them from torturing.
More generally, this strategy only works if there's nothing whose existence I prefer/antiprefer, but merely things that I prefer/antiprefer to be aware of.
It's a potential outcome, I suppose, in that
is a conceivable extrapolation from a starting point where you antiprefer something's existence (in the extreme, with MWI you may not have much say what does/doesn't exist, just how much of it in which branches).
It's also possible that you hold both preferences (prefer X not exist, prefer not to be aware of X) and the existence preference gets dropped for being incompatible with other values held by other people while the awareness preference does not.
The child molester cluster (where they grow children simply to molest them, then kill them) doesn't bother you, even if you never interact with it?
Because I'm fairly certain I wouldn't like what CEV(child molester) would output and wouldn't want an AI to implement it.
Assuming 100% isolation it would be indistinguishable from living in a universe where the Many Worlds Interpretation is true, but it still seems wrong. The FAI could consider avoiding groups whose even theoretical existence could cause offence, but I don't see any good way to assign weight to this optimization pressure.
Even so, I think splitting humanity into multiple groups is likely to be a better outcome than a single group. I don't consider the "failed utopia" described in http://lesswrong.com/lw/xu/failed_utopia_42/ to be particularly bad.
Well, not if "child-molesters" and "non-child-molestors" are competing for limited resources.
The failed utopia is better than our current world, certainly. But the genie isn't Friendly.
In principle, I could interact with the immoral cluster. The AI's interference is not relevant to the morality of the situation, because I was part of the creation of the AI. Otherwise, I would be morally justified in ignoring the suffering in some distant part of the world because it will have no practical impact on my life. By contrast, I simply cannot interact with other branches under the MWI - it's a baked-in property of the universe that I never had any input into.
Wouldn't CEV need to extract consensus values under a Rawlsian "veil of ignorance"?
It strikes me as very unlikely that there would be a consensus (or even majority) vote for killing gays or denying full rights to women under such a veil, because of the significant probability of ending up gay, and the more than 50% probability of being a woman. Prisons would be a lot better as well. The only reason illiberal values persist is because those who hold them know (or are confident) that they're not personally going to be victims of them.
So CEV is either going to end up very liberal, or if done without the veil of ignorance, is not going to end up coherent at all. Sorry if that's politics, the mind-killer.
Most of those who propose illiberal values do not do so under the presumption that they thereby harm the affected groups. A paternalistic attitude is much more common, and is not automatically inconsistent with preferences beyond a Rawlsian veil of ignorance.
An Omelasian attitude also seems consistent, for that matter, though even less likely.
As a matter of empirical fact, I think this is wrong. Men in sexist societies are really glad they're not women (and even thank God they are not in some cases). They are likely to run in horror from the Rawlsian veil when they see the implications.
And anyway, isn't that paternalism itself inconsistent with Rawlsian ignorance? Who would voluntarily accept a more than 50% chance of being treated like a patronized child (and a second-class citizen) for life?
And how is killing gays in the slightest bit a paternalistic attitude?
I'd never heard of Omelas, or anything like it, so I doubt this will be part of CEV. Again, who would voluntarily accept the risk of being such a scapegoat, if it were an avoidable risk? (If it is not avoidable for some reason, then that is a fact that CEV would have to take into account, as would the Rawlsian choosers.)
Kill their bodies, save their souls.
Someone believing that this sort of paternalism is essential to gender and unable or unwilling to accept a society without it. Someone convinced that this was part of God's plan or otherwise metaphysically necessary. Someone not very fond of making independent decisions. I don't think any of these categories are strikingly rare.
That's about as specific as I'd like to get; anything more so would incur an unacceptable risk of political entanglements. In general, though, I think it's important to distinguish fears and hatreds arising against groups which happen to be on the wrong side of some social line (and therefore identity) from the processes that led to that line being drawn in the first place: it's possible, and IMO quite likely, for people to coherently support most traditional values concerning social dichotomies without coherently endorsing malice across them. This might not end up being stable, human psychology being what it is, but it doesn't seem internally inconsistent.
The way people's values intersect with the various consequences of their identities is quite complicated and I'm not sure I completely understand it, but I wouldn't describe either as a subset of the other.
(Incidentally, around 51% of human births are male; more living humans are female but that's because women live longer. This has absolutely no bearing on the argument, but it was bugging me.)
Thanks for the reply here, that was helpful.
What you've described here is a person who would put adherence to an ideological system (or set of values derived from that system) above their own probable welfare. They would reason to themselves: yes, my own personal welfare would probably be higher in an egalitarian society (or the risk of low personal welfare would be lower); but stuff that, I'm going to implement my current value system anyway. Even if it comes back to shoot me in the foot.
I agree that's possible, but my impression is that very few humans would really want to do that. The tendency to put personal welfare first is enormous, and I really do believe that most of us would do that if choosing behind a Rawlsian veil.
What's odd is that it is a classical conservative insight that human beings are mostly self-interested, and rather risk-averse, and that society needs to be constructed to take that into account. It's an insight I agree with, by the way, and yet it is precisely this insight that leads to Rawlsian liberalism. Whereas to choose a different (conservative) value system, the choosers have to sacrifice their self-interest to that value system.
I've met women who honestly and persistently profess that women should not be allowed to vote. In at least one case, even in private, to a person they really want to like them and who very clearly disagrees with them.
That doesn't surprise me... I've had the same experience once or twice, in mixed company, and with strong feminists in the room. The subsequent conversations along the lines of "But women chained themselves to railings, and threw themselves under horses to get the vote; how can you betray them like that?" were quite amusing. Especially when followed by the retort "Well I've got a right to my own opinion just as much as anyone else - surely you respect that as a feminist!"
I've also met quite a few people who think that no-one should vote. ("If it did any good, it would have been abolished years ago" a position I have a lot more sympathy for these days than I ever used to).
My preferred society (in a Rawlsian setting) might not actually have much voting at all, except on key constitutional issues. State and national political offices (parliaments, presidents etc) would be filled at random (in an analogue to jury service) and for a limited time period. After the victims had passed a few laws and a budget, they would be allowed to go home again. No-one would give a damn about gaffes, going off message, or the odd sex scandal, because it would happen all the time, and have very limited impact. I think there would also need to be mandatory citizen service on boring committees, local government roles, planning permission and drainage enquiries etc to stop professional civil servants, lobbyists or wonks ruling the roost: the necessary tedium would be considered part of everyone's civic duty. This - in my opinion - is probably the biggest problem with politics. Much of it is so dull, or soul-destroying, that no-one with any sense wants to do it, so it is left to those without any sense.
Self-assessed welfare isn't cleanly separable from ideology. People aren't strict happiness maximizers; we value all sorts of abstract things, many of which are linked to the social systems and identity groups in which we're embedded. Sometimes this ends up looking pretty irrational from the outside view, but from the inside giving them up would look unattractive for more or less the same reason that wireheading is unattractive to (most of) us.
Now, this does drift over time, both on a sort of random walk and in response to environmental pressures, which is what allows things like sexual revolutions to happen. During phase changes in this space, the affected social dichotomies are valued primarily in terms of avoiding social costs; that's the usual time when they're a salient issue instead of just part of the cultural background, and so it's easy to imagine that that's always what drives them. But I don't think that's the case; I think there's a large region of value space where they really are treated as intrinsic to welfare, or as first-order consequences of intrinsic values.
Thanks again. I'm still not sure of the exact point you are making here, though.
Let's take gender-based discrimination and unequal rights as a sample case. Are you arguing that someone wedded to an existing gender-biased value system would deliberately select a discriminatory society (over an equal rights one) even if they were choosing on the basis of self-interest? That they would fully understand that they have roughly 50% chance of getting the raw end of the deal, but still think that this deal would maximise their welfare overall?
I get the point that a committed ideologue could consciously decide here against self-interest. I'm less clear how someone could decide that way while still thinking it was in their self-interest. The only way I can make sense of such a decision is if it were made on the basis of faulty understanding (i.e. they really can't empathize very well, and think it would not be so bad after all to be born female in such a society).
In a separate post, I suggested a way that an AI could make the Rawlsian thought experiment real, by creating a simulated society to the deciders' specifications, and then beaming them into roles in the simulation at random (via virtual reality/total immersion/direct neural interface or whatever). One variant to correct for faulty understanding might be to do it on an experimental basis. Once the choosers think they have made their minds up, they get beamed into a few randomly-selected folks in the sim, maybe for a few days or weeks (or years) at a time. After the experience of living in their chosen world for a while, in different places, times, roles etc. they are then asked if they want to change their mind. The AI will repeat until there is a stable preference, and then beam in permanently.
Returning to the root of the thread, the original objection to CEV was that most people alive today believe in unequal rights for women and essentially no rights for gays. The key question is therefore whether most people would really choose such a world in the Rawlsian set-up. And then, would most people continue to so-choose even after living in that world for a while in different roles?
If the answers are "no" then the Rawlsian veil of ignorance can remove this particular objection to CEV. If they are "yes" then it cannot. Agreed?
A lot of oppression of women seems to be justified by claims that if women aren't second-class citizens, they won't choose to have children, or at least not enough children for replacement. This makes women's rights into an existential risk.
This argument also implies that societies and smaller groups where women have lower status and more children will out-breed and so eventually outcompete societies where women have equal rights. So people can also defend the lower status of women as a nationalistic or cultural self-defense impulse.
Yes and no. Someone who'd internalized a discriminatory value system -- who really believed in it, not just belief-in-belief, to use LW terminology -- would interpret their self-interest and therefore their welfare in terms of that value system. They would be conscious of what we would view as unequal rights, but would see these as neutral or positive on both sides, not as one "getting the raw end of the deal" -- though they'd likely object to some of their operational consequences. This implies, of course, a certain essentialism, and only applies to certain forms of discrimination: recent top-down imposition of values isn't stable in this way.
As a toy example, read 1 Corinthians 11, and try to think of the mentality implied by taking that as the literal word of God -- not just advice from some vague authority, but an independent axiom of a value system backed by the most potent proofs imaginable. Applied to an egalitarian society, what would such a value system say about the (value-subjective) welfare of the women -- or for that matter the men -- in it?
This, on the other hand, is essentially an anthropology question. The answer depends on the extent of discriminatory traditional cultures, on the strength of belief in them, and on the commonalities between them: "unequal rights" isn't a value, it's a judgment call over a value system, and the specific unequal values that we object to may be quite different between cultures. I'm not an anthropologist, so I can't really answer that question -- but if I had to, I'd doubt that a reflectively stable consensus exists for egalitarianism or for any particular form of discrimination, with or without the Rawlsian wrinkle.
So this would be like the "separate but equal" argument? To paraphrase in a gender context: "Men and women are very different, and need to be treated differently under the law - both human and divine law. But it's not like the female side is really worse off because of this different treatment".
That - I think - would count as a rather basic factual misunderstanding of how discrimination really works. It ought to be correctable pretty damn fast by a trip into the simulator.
(Incidentally, I grew up in a fundamentalist church until my teens, and one of the things I remember clearly was the women and teen girls being very upset about being told that they had to shut up in church, or wear hats or long hair, or that they couldn't be elders, or whatever. They also really hated having St Paul and the Corinthians thrown at them; the ones who believed in Bible inerrancy were sure the original Greek said something different, and that the sacred text was being misinterpreted and spun against them. Since it is an absolute precondition for an inerrantist position that correct interpretations are difficult, and up for grabs, this was no more unreasonable than the version spouted by the all-male elders.)
The obvious (paternalistic) answer is that they believe that, conditioned on them being born female, their self-interest is improved by paternalistic treatment of all women vs equality.
In order to convince them otherwise, you would (at a minimum) have to run multiple world sims, not just multiple placements in one world. You would also have to forcibly give them sufficiently rational thought processes that they could interpret the evidence you forced upon them. I'm not sure that forcibly messing with people's thought processes is ethical, or that you could really claim it was a coherent extrapolation after you had performed that much involuntary mind surgery on them.
Disagree. A simple classroom lesson is often sufficient to get the point across:
http://www.uen.org/Lessonplan/preview.cgi?LPid=536
Discrimination REALLY sucks.
Just because some despised minorities exist today, doesn't mean they will continue to exist in the future under CEV. If a big enough majority clearly wishes that "no members of that group continue to exist" (e.g. kill existing gays AND no new ones ever to be born), then the CEV may implement that, and the veil of ignorance won't change this, because you can't be ignorant about being a minority member in a future where no-one is.
You might be right, but I'm less sure of this.
Someone with more historical or anthropological knowledge than I is welcome to correct me, but I'm given to understand that many of those whom we would consider victims of an oppressive social system actually support the system. (E.g., while women's suffrage seems obvious now, there were many female anti-suffragists at the time.) It's likely that such sentiments would be nullified by a "knew more, thought faster, &c." extrapolation, but I don't want to be too confident about the output of an algorithm that is as yet entirely hypothetical.
Furthermore, the veil of ignorance has its own problems: what does it mean for someone to have possibly been someone else? To illustrate the problem, consider an argument that might be made by (our standard counterexample) a hypothetical agent who wants only to maximize the number of paperclips in the universe:
---which does not seem convincing. Of course, humans in oppressed groups and humans in privileged groups are inexpressibly more similar to each other than humans are to paperclip-maximizers, but I still think this thought experiment highlights a methodological issue that proponents of a veil of ignorance would do well to address.
Isn't the main evidence that victims of oppressive social systems want to escape from them at every opportunity? There are reasons for refugees, and reasons that the flows are in consistent directions.
And if anti-suffragism had been truly popular, then having got the vote, women would have immediately voted to take it away again. Does this make sense?
Some other points:
CEV is about human values, and human choices, rather than paper-clippers. I doubt we'd get a CEV across wildly-different utility functions in the first place.
I'm happy to admit that CEV might not exist in the veil of ignorance case either, but it seems more likely to.
I'm getting a few down-votes here. Is the general consensus here that this is too close to politics, and that is a taboo subject (as it is a mind-killer)? Or is the "veil of ignorance" idea not an important part of CEV?
Isn't there substantial disagreement over whether the veil of ignorance is sufficient or necessary to justify a moral theory?
Edit: Or just read what Nornagest said
Perhaps, but I think my point stands. CEV will use a veil of ignorance, or it won't be coherent. It may be incoherent with the veil as well, but I doubt it. Real human beings look after number one much more than they'd ever care to admit, and won't take stupid risks when choosing under the veil.
One very intriguing thought about an AI is that it could make the Rawlsian choice a real one. Create a simulated society to the choosers' preferences, and then beam them in at random...
Even with a veil of ignorance, people won't make the same choices: people fall in different places on the risk-aversion/reward-seeking spectrum.
Note that there's nothing physically impossible about altering the probability of being born gay, straight, bi, male, female, asexual, etc.
True, and this could create some interesting choices for Rawlsians with very conservative values. Would they create a world with no gays, or no women? Would they do both???
Heinlein's "Starship Troopers" discusses the death penalty imposed on a violent child rapist/murderer. The narrator says there are two possibilities:
1) The killer was so deranged he didn't know right from wrong. In that case, killing (or imprisoning) him is the only safe solution for the rest. Or,
2) The killer knew right from wrong, but couldn't stop himself. Wouldn't killing (or stopping) him be a favor, something he would want?
Why can't that type of reasoning exist behind the veil of ignorance? Doesn't it completely justify certain kinds of oppression? That said, there's also an empirical question whether the argument applies to the particular group being oppressed.
As long as we're using sci-fi to inform our thinking on criminality and corrections, The Demolished Man is an interesting read.
Not dealing with your point, but that sort of analysis is why I find Heinlein so distasteful: the awful philosophy. For example, in #1, five seconds of thought suffices to come up with counterexamples like temporary derangements (drug use, treatable disease, particularly stressful circumstances, blows to the head), and more effort would likely turn up strong empirical evidence, possibly the observation that most murderers do not murder again even after release (release, obviously, rather than execution).
Absolutely. What finally made me realize that Heinlein was not the bestest moral philosopher ever was noticing that all his books contained superheroes - Stranger in a Strange Land is the best example. I'm not talking about the telekinetic powers, but the mental discipline. His moral theory might work for human-like creatures with perfect mental discipline, but for ordinary humans . . . not so much.
This was pretty common in sf of the early 20th century, actually — the trope of a special group of people with unusual mental disciplines giving them super powers and special moral status. See A. E. van Vogt (the Null-A books) or Doc Smith (the Lensman books) for other examples. There's a reason Dianetics had so much success in the sf community of that era, I suspect — fans were primed for it.
Is that true of all of Heinlein's books? I would say that most of them (including Starship Troopers) don't have superheroes.
Well, I'm not exactly a Heinlein scholar, but I'd say it shows up mainly in his late-period work, post Stranger in a Strange Land. Time Enough for Love and its sequels definitely qualify, but some of the stuff he's most famous for -- The Moon is a Harsh Mistress, Have Space Suit, Will Travel, et cetera -- don't seem to. Unfortunately, Heinlein's reputation is based mainly on that later stuff.
The revolution in "Moon is a Harsh Mistress" cannot succeed without the aid of the supercomputer. That makes any moral philosophy implicit in that revolution questionable to the extent one asserts that the moral philosophy is true of humanity now.
To a lesser extent, "Starship Troopers" asserts that military service is a reliable way of screening for the kinds of moral qualities (like mental discipline) that make one trustworthy enough to be a high government official (or even to vote, if I recall correctly). In reality, those moral qualities are very thin on the ground, much less common than the book suggests. If the appropriate moral qualities were really that frequent, the sanity line would already be much higher than it is.
What would a Rawlsian decider do? Institute a prison and psychiatric system, and some method of deciding between case 1 (psychiatric imprisonment to try and treat or at least prevent further harm) and case 2 (criminal imprisonment to deter like-minded people and prevent further harm from the killer/rapist). Also set up institutions for detecting and encouraging early treatment of child sex offenders before they moved to murder.
They would not want the death penalty in either case, nor would they want the prison/psychiatric system to be so appalling that they might prefer to be dead.
The Rawlsian would need to weigh the risk of being the raped/murdered child (or their parent) against the risk of being born with psychopathic or paedophile tendencies. If there was genuinely a significant deterrent from the death penalty, then the Rawlsian might accept it. But that looks unlikely in such cases.
I don't know how to reply to this without violating the site's proscription on discussions of politics, which I prefer not to do.
OK - the comment was pretty flippant anyway. Consider it withdrawn.
That's a little unfair to the concept of CEV. If irreconcilable value conflicts persist after coherent extrapolation, I would think that a CEV function would output nothing, rather than using majoritarian analysis to resolve the conflict.
Then since there is not one single value about which every single human being on the planet can agree, a CEV function would output nothing at all.
Tense confusion.
CEV is supposed to preserve those things that people value, and would continue to value were they more intelligent and better informed. I value the lives of my friends. Many other people value the death of people like my friends. There is no reason to think that this is because they are less intelligent or less well-informed than me, as opposed to actually having different preferences. TimS claimed that in a situation like that, CEV would do nothing, rather than impose the extrapolated will of the majority.
My claim is that there is nothing -- not one single thing -- which would be a value held by every person in the world, even were they more intelligent and better informed. An intelligent, informed psychopath has utterly different values from mine, and will continue to have utterly different values upon reflection. The CEV therefore either has to impose the majority preferences upon the minority, or do nothing at all.
I agree with you in general, and want to further point out that there is no such thing as "doing nothing". If doing nothing tends to allow your friends to continue living (because they have the power to defend themselves in the status quo), that is favoring their values. If doing nothing tends to allow your friends to be killed (because they are a powerless, persecuted minority in the status quo) that is favoring the other people's values.
There are lots of reasons to think so. For example, they might want the death of your friends because they mistakenly believe that a deity exists.
Or for any number of other, non-religious reasons. And it could well be that extrapolating those people's preferences would lead, not to them rejecting their beliefs, but to them wishing to bring their god into existence.
Either people have fundamentally different, irreconcilable, values or they don't. If they do, then the argument I made is valid. If they don't, then CEV(any random person) will give exactly the same result as CEV(humanity).
That means that either calculating CEV(humanity) is an unnecessary inefficiency, or CEV(humanity) will do nothing at all, or CEV(humanity) would lead to a world that is intolerable for at least some minority of people. I actually doubt that any of the people from the SI would disagree with that (remember the torture vs dust specks argument).
That may be considered a reasonable tradeoff by the developers of an "F"AI, but it gives those minority groups to whom the post-AI world would be inimical equally rational reasons to oppose such a development.
As someone who does not believe in moral realism, I agree that CEV over all humans who ever lived (excluding sociopaths and such) will not output anything.
But I think that a moral realist should believe that CEV will output some value system, and that the produced value system will be right.
In short, I think one's belief about whether CEV will output something is isomorphic to whether one believes in [moral realism] (plato.stanford.edu/entries/moral-realism/).
Edit: link didn't work, so separated it out.
Have you tried putting http:// in front of the URL? (Edit: the backtick thing to show verbatim code isn't working properly for some reason, but you know what I mean.)
moral realism.
Edit: Apparently that was the problem. Thanks.
Edit2: It appears that copying and pasting from some places includes "http" even when my browser address doesn't. But I did something wrong when copying from the philosophy dictionary.
I agree -- assuming that CEV didn't impose a majority view on a minority. My understanding of the SI's arguments (and it's only my understanding) is that they believe it will impose a majority view on a minority, but that they think that would be the right thing to do -- that if the choice were between 3^^^3 people getting a dustspeck in the eye or one person getting tortured for fifty years, the FAI would always make a choice, and that choice would be for the torture rather than the dustspecks.
Now, this may well be, overall, the rational choice to make as far as humanity as a whole goes, but it would most definitely not be the rational choice for the person who was getting tortured to support it.
And since, as far as I can see, most people only value a very small subset of humanity who identify as belonging to the same groups as them, I strongly suspect that in the utilitarian calculations of a "friendly" AI programmed with CEV, they would end up in the getting-tortured group, rather than the avoiding-dustspecks one.
This is not clear.
That is an entirely separate issue. If CEV(everyone) outputted a moral theory that held utility was additive, then the AI implementing it would choose torture over specks. In other words, utilitarians are committed to believing that specks is the wrong choice.
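A minimal worked version of that claim, assuming (for illustration only) a constant disutility ε > 0 per dust speck and a finite disutility T for fifty years of torture; neither symbol comes from the thread:

```latex
\begin{aligned}
U_{\text{specks}} &= N\,\varepsilon, \qquad U_{\text{torture}} = T, \qquad N = 3\uparrow\uparrow\uparrow 3 \\
\text{additive theory prefers torture} &\iff N\,\varepsilon > T \iff N > T/\varepsilon
\end{aligned}
```

For any fixed ε > 0 and finite T, the threshold T/ε is some finite number, and the thought experiment picks N = 3↑↑↑3 precisely so that it dwarfs any plausible threshold. A theory that does not simply sum disutilities across people need not have any such threshold to cross.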
But there is no guarantee that CEV will output a utilitarian theory, even if you believe it will output something. SI (Eliezer, at least) believes CEV will output a utilitarian theory because SI believes utilitarian theories are right. But everyone agrees that "whether CEV will output something" is a different issue than "what CEV will output."
Personally, I suspect CEV(everyone in the United States) would output something deontological - and might even output something that would pick specks. Again, assuming it outputs anything.
This is a false dilemma. If people have some values that are the same or reconcilable, then you will get different output from CEV(any random person) and CEV(humanity).
And note that an actual move by virtue ethicists is to exclude sociopaths from "humanity".
Of course, a lot depends on what we're willing to consider a minority as opposed to something outside the set of things being considered at all.
E.g., I'm in a discussion elsethread with someone who I think would argue that if we ran CEV on the set of things capable of moral judgments, it would not include psychopaths in the first place, because psychopaths are incapable of moral judgments.
I disagree with this on several levels, but my point is simply that there's an implicit assumption in your argument that terms like "person" have shared referents in this context, and I'm not sure they do.
In which case we wouldn't be talking about CEV(humanity) but CEV(that subset of humanity which already share our values), where "our values" in this case includes excluding a load of people from humanity before you start. Psychopaths may or may not be capable of moral judgements, but they certainly have preferences, and would certainly find living in a world where all their preferences are discounted as intolerable as the rest of us would find living in a world where only their preferences counted.
I agree that psychopaths have preferences, and would find living in a world that anti-implemented their preferences intolerable.
If you mean to suggest that the fact that the former phrase gets used in place of the latter is compelling evidence that we all agree about who to include, I disagree.
If you mean to suggest that it would be more accurate to use the latter phrase when that's what we mean, I agree.
Ditto "CEV(that set of preference-havers which value X, Y, and Z)".
I definitely meant the second interpretation of that phrase.
I hope that everyone who discusses CEV understands that a very hard part of building a CEV function would be defining the criteria for inclusion in the subset of people whose values are considered. It's almost circular, because figuring out who to exclude as "insufficiently moral" almost inherently requires the output of a CEV-like function to process.
'Coherent' in CEV means that it makes up a coherent value system for all of humanity. By definition that means that there will be no value conflicts in CEV. But it does not mean that you will necessarily like it.
Regarding CEV: My own worry is that lots of parts of human value get washed out as "incoherent" - whatever X is, if it isn't a basic human biological drive, there are enough people out there that have different opinions on it to make CEV throw up its hands, declare it an "incoherent" desire, and proceed to leave it unsatisfied. As a result, CEV ends up saying that the best we can do is just make everyone a wirehead because pleasure is one of our few universal coherent desires while things like "self-determination" and "actual achievement in the real world" are a real mess to provide and barely make sense in the first place. Or something like that.
(Universal wireheading - with robots taking care of human bodies - at least serves as a lower bound on any proposed utopia; people, in general, really do want pleasure, even if they also want other things. See also "Reedspace's Lower Bound".)
I would like to see more discussion on the question of how we should distinguish between 1) things we value even at the expense of pleasure, and 2) things we mistakenly alieve are more pleasurable than pleasure.
Surely if there is something I will give up pleasure for, which I do not experience as pleasurable, that's strong evidence that it is an example of 1 and not 2?
Yes, but there are other cases. If you prefer eating a cookie to having the pleasure centers in your brain maximally stimulated, are you sure that's not because eating a cookie sounds on some level like it would be more pleasurable?
I'm not sure how I could ever be sure of such a thing, but it certainly seems implausible to me.
Um, if you would object to your friends being killed (even if you knew more, thought faster, and grew up further with others), then it wouldn't be coherent to value killing them.
Just because I wouldn't value that, doesn't mean that the majority of the world wouldn't. Which is my whole point.
My understanding is that CEV is based on consensus, in which case the majority is meaningless.
There is absolutely no reason to think that the values of all humans, extrapolated in some way, will arrive at a consensus.
Some quotes from the CEV document:
Though it's not clear to me how the document would deal with Wei Dai's point in the sibling comment. In the absence of coherence on the question of whether to protect, persecute, or ignore unpopular minority groups, does CEV default to protecting them or ignoring them? You might say that as written, it would obviously not protect them, because there was no coherence in favor of doing so; but what if protection of minority groups is a side effect of other measures CEV was taking anyway?
(For what it's worth, I suspect that extrapolation would in fact create enough coherence for this particular scenario not to be a problem.)
Thank you. So, not quite consensus, but similarly biased in favor of inaction.
If CEV doesn't positively value some minority group not being killed (i.e., if it's just indifferent due to not having a consensus), then the majority would be free to try to kill that group. So we really do need CEV to say something about this, instead of nothing.
Assuming we have no other checks on behavior, yes. I'm not sure, pending more reflection, whether that's a fair assumption or not...
Thanks for tying these together.
I would love to hear a case from someone who believes it is possible in principle to perform a bottom-up extrapolation of human values into a coherent whole, one that a system vastly different from a human could implement in a way I ought to endorse, and who addresses these concerns specifically. While I don't fully agree with everything said here, it captures much of my own skepticism about that viability far more coherently than I've been able to express it myself.
David Friedman pointed out that this isn't correct: it's actually quite easy to make positional values mutually satisfiable:
[Emphasis mine]
A FAI could simply make sure that everyone is a member of enough social groups that everyone has high status in some of them. Positional goals can be mutually satisficed, if one is smart enough about it. Those two types of value don't differ as much as you seem to think they do. Positional goals just require a little more work to make implementing them conflict-free than the other type does.
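A toy sketch of that idea, using made-up people, dimensions, and scores (all hypothetical, not from the thread): if each social group ranks its members on a different dimension, every person can lead some group even though no single ranking puts everyone on top.

```python
# Hypothetical toy model: each person has scores on several dimensions,
# and each "social group" ranks its members by exactly one dimension.
scores = {
    "alice": {"chess": 9, "music": 3, "running": 5},
    "bob":   {"chess": 4, "music": 8, "running": 2},
    "carol": {"chess": 6, "music": 5, "running": 9},
}

def group_leader(dimension: str) -> str:
    """Return the member with the highest score in the group ranked by `dimension`."""
    return max(scores, key=lambda person: scores[person][dimension])

def has_high_status_somewhere(person: str, groups: list[str]) -> bool:
    """True if `person` leads at least one of the given groups."""
    return any(group_leader(g) == person for g in groups)

groups = ["chess", "music", "running"]
leaders = {g: group_leader(g) for g in groups}
print(leaders)  # {'chess': 'alice', 'music': 'bob', 'running': 'carol'}
print(all(has_high_status_somewhere(p, groups) for p in scores))  # True
```

With a single shared ranking only one person could be on top; spreading status across several differently ranked groups is what makes the positional goals jointly satisfiable in this toy. Whether real human status works this way is a separate empirical question.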
I don't think I agree with this. Couldn't you take that argument further and claim that if I undergo some sort of rigorous self-improvement program in order to better achieve my goals in life, I must now have different values? In fact, couldn't you just as easily say that I am behaving pointlessly because I'm not achieving my values better, I'm just changing them? It seems likely that most of the things that you are describing as values aren't really values; they're behaviors. I'd regard values as more "the direction in which you want to steer the world," both in terms of your external environment and your emotional states. Behaviors are things you do, but they aren't necessarily what you really prefer.
I agree that a more precise and articulate definition of these terms might be needed to create a FAI, especially if human preferences are part of a network of some sort as you claim, but I do think that they cleave reality at the joints.
I can't really see how you can attack CEV by this route without also attacking any attempt at self-improvement by a person.
The fact that these values seem to change or weaken as people become wealthier and better educated indicates that they probably are poorly extrapolated values. Most of these people don't really want to do these things, they just think they do because they lack the cognitive ability to see it. This is emphasized by the fact that these people, when called out on their behavior, often make up some consequentialist justification for it (if I don't do it God will send an earthquake!)
I'll use an example from my own experience to illustrate this. When I was little (around 2-5), I thought horror movies were evil because they scared me. I didn't want to watch horror movies or even be in the same room with a horror movie poster. I thought people should be punished for making such scary things. Then I got older and learned about freedom of speech and realized that I had no right to arrest people just because they scare me.
Then I got even older and started reading movie reviews. I became a film connoisseur and became sick of hearing about incredible classic horror movies, but not being able to watch them because they scared me. I forced myself to sit through Halloween, A Nightmare on Elm Street, and The Grudge, and soon I was able to enjoy horror movies like a normal person.
Not watching horror movies and punishing the people who made them were the preferences of young me. But my CEV turned out to be "Watch horror movies and reward the people who create them." I don't think this was random value drift, I think that I always had the potential to love horror movies and would have loved them sooner if I'd had the guts to sit down and watch them. The younger me didn't have different terminal values, his values were just poorly extrapolated.
I think most of the types of people you mention would be the same if they could pierce through their cloud of self-deception. I think their values are wrong and that they themselves would recognize this if they weren't irrational. I think a CEV would extrapolate this.
But even if I'm wrong, if there's a Least Convenient Possible world where there are otherwise normal humans who have "kill all gays" irreversibly and directly programmed into their utility function, I don't think a CEV of human morality would take that into account. I tend to think that, from an ethical standpoint, malicious preferences (that is, preferences where frustrating someone else's desires is an end in itself, rather than a byproduct of competing for limited resources) deserve zero respect. I think that if a CEV took properly extrapolated human ethics it would realize this. It might not hurt to be extra careful about that when programming a CEV, however.
I had a somewhat similar experience growing up, although a few details are different (I never thought people should be banned from making such films or that they were evil things just because they scared me, for instance, and I made the decision to try watching some of them, mostly Alien and a few other works from the same general milieu, at a much younger age and for substantially different reasons). However, I didn't wind up loving horror movies; I wound up liking one or two films that only pushed my buttons in nice, predictable places and without actually squicking me per se. I honestly still don't get how someone can sit through films like Halloween or Friday the 13th -- I mean, I get the narrative underpinnings and some of the psychological buttons they push very well (reminds me of ghost tales and other things from my youth), but I can't actually feel the same way as your putative "normal person" when sitting through it. Even movies most people consider "very tame" or "not actually scary" make me too uncomfortable to want to sit through them, a good portion of the time. And I've actively tried to cultivate this, not for its own sake (I could go my whole life never sitting through such a film again and not be deprived, even one of the ones I've enjoyed many times) but because of the small but notable handful of horror-themed movies that I do like and the number of people I know who enjoy such films with whom I'd have even more social-yay if I did self-modify to enjoy those movies. It simply didn't take -- after much exposure and effort, I now find most such films both squicky and actively uninteresting. I can see why other people like 'em, but I can't relate.
Are my terminal values "insufficiently extrapolated?" Or just not coherent with yours?
I don't think it's either. We both have the general value, "experience interesting stories," it's just expressed in slightly different ways. I don't think that really really specific preferences for art consumption would be something that CEV extrapolates. I think CEV is meant to figure out what general things humans value, not really specific things (i.e. a CEV might say, "you want to experience fun adventure stories," it would not say "read Green Lantern #26" or "read King Solomon's Mines"). The impression I get is that CEV is more about general things like "How should we treat others?" and "How much effort should we devote to liking activities vs. approving ones?"
I don't think our values are incoherent, you don't want to stop me from watching horror movies and I don't want to make you watch them. In fact, I think a CEV would probably say "It's good to have many people who like different activities because that makes life more interesting and fun." Some questions (like "Is it okay to torture people") likely only have one true, or very few true, CEVs, but others, like matters of personal taste, probably vary from person to person. I think a FAI would probably order everyone not to torture toddlers, but I doubt it would order us all to watch "Animal House" at 9:00pm this coming Friday.
I'm glad you pointed this out - I don't think this view is common enough around here.
I'm not sure what to make of your use of the word "proper". Are you predicting that a CEV will not be utilitarian or saying that you don't want it to be?
I am saying that a CEV that extrapolated human morality would generally be utilitarian, but that it would grant a utility value of zero to satisfying what I call "malicious preferences." That is, if someone valued frustrating someone else's desires purely for their own sake, not because they needed the resources that person was using or something like that, the AI would not fulfill it.
This is because I think that a CEV of human morality would find the concept of malicious preferences to be immoral and discard or suppress it. My thinking on this was inspired by reading about Bryan Caplan's debate with Robin Hanson, where Bryan mentioned:
I don't often agree with Bryan's intuitionist approach to ethics, but I think he made a good point: satisfying the preferences of those trillion Nazis doesn't seem like part of the meaning of right, and I think a CEV of human ethics would reflect this. I think that the preference of the six million Jews to live should be respected and the preferences of the six trillion Nazis be ignored.
I don't think this is because of scope insensitivity, or because I am not a utilitarian. I endorse utilitarian ethics for the most part, but think that "malicious preferences" have zero or negative utility in their satisfaction, no matter how many people have them. For conflicts of preferences that involve things like disputes over use of scarce resources, normal utilitarianism applies.
In response to your question I have edited my post and changed "a proper CEV" to "a CEV of human morality."
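A toy model may make the "malicious preferences" rule in this comment, and the zero-versus-negative question raised in the reply, more concrete. This is a minimal sketch under loudly stated assumptions: the `PreferenceGroup` type, the `strength` scale, and the `malicious_weight` parameter are hypothetical illustrations, not anything the commenter specified. Ordinary preferences are summed at full weight; malicious ones are multiplied by a fixed weight, so at weight 0 no head count can make them add utility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferenceGroup:
    """A block of identical preferences held by `count` people (hypothetical, illustrative only)."""
    count: int
    strength: float   # utility each member gets if the preference is satisfied (assumed scale)
    malicious: bool   # True if the goal is to frustrate others' desires for its own sake


def outcome_utility(satisfied, malicious_weight=0.0):
    """Total utility of an outcome, given the preference groups it satisfies.

    Ordinary preferences count at full weight. Malicious preferences are scaled by
    `malicious_weight`: 0.0 reproduces the "zero utility" rule, and a negative
    value makes satisfying them count against the outcome.
    """
    total = 0.0
    for group in satisfied:
        weight = malicious_weight if group.malicious else 1.0
        total += group.count * group.strength * weight
    return total


victims = PreferenceGroup(count=6_000_000, strength=1.0, malicious=False)
aggressors = PreferenceGroup(count=6_000_000_000_000, strength=1.0, malicious=True)

spare = outcome_utility([victims])        # the victims' preference to live is satisfied
comply = outcome_utility([aggressors])    # the malicious preference is satisfied
print(spare > comply)                     # True: zero-weighted preferences add nothing, however many

naive = outcome_utility([aggressors], malicious_weight=1.0)
print(naive > spare)                      # True: plain summation favors the larger malicious group
```

The only design choice here is where the weight sits: scaling per preference rather than per person is what makes the result independent of how many people hold the malicious preference.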
Zero is a strange number to have specified there, but then I don't know the shape of the function you're describing. I would have expected a non-specific "negative utility" in its place.
You're probably right, I was typing fairly quickly last night.
Ah, okay. This sounds somewhat like Nozick's "utilitarianism with side-constraints". This position seems about as reasonable as the other major contenders for normative ethics, but some LessWrongers (pragmatist, Will_Sawin, etc...) consider it to be not even a kind of consequentialism.
I have a few more objections I didn't cover in my last comment because I hadn't thoroughly thought them out yet.
No, these terminal goals can also involve other people and the state of the world, even if they are evolved. There are several reasons human consciousnesses might have evolved goals that do not involve themselves or their genes. The most obvious one is that an entity that values only itself and its genes is far less trustworthy than one that values other people as ends in themselves, and hence would have difficulty getting other entities to engage in positive-sum games with it. Evolving to value other people makes it possible for other people who might prove useful trading partners to trust the agent in question, since they know it won't betray them the instant they have outlived their usefulness.
Another obvious one is kin selection. Evolution metaphorically "wants" us to value our relatives since they share some of our genes. But rather than waste energy developing some complex adaptation to determine how many genes you share with someone, it took a simpler route and just made us value people we grew up around.
And no, the fact that I know my altruism and love for others evolved for game-theoretic reasons does not make it any less wonderful and any less morally right.
Again, it is quite possible for a conscious agent to value things other than itself, but not value the goals of evolution or its genes. There are many errors that our bodies make that occur because they involve our genes, not our real goals. But valuing other people and the future is not one of them, it is an intrinsic part of the makeup of the conscious agent part.
Alan Carter, who is rapidly becoming my favorite living philosopher, explains here how it is quite possible to have a pluralistic metaethics without being incoherent. His main argument is that as long as you hold values to be incremental rather than absolute, it is possible to trade one off against the other without being incoherent.
The link to your group selection update seems broken. Looks like it's got an extra lesswrong.com/ in it.
Thanks; fixed.
It seems that what you have argued here is not much related to Holden's objection 1 - his objection is that we cannot reasonably expect a safe and secure implementation of a "Friendly" utility function (even if we had one), because humans have consistently been unable to construct bug-free working-correctly (computer) systems on the first try, proofs have been wrong, etc. You, on the other hand, are arguing against the Friendliness concept on object-level / meta-level ethical grounds.
Better by which set of, ahem, values? And anyway, if evolution of values is a value, then maximising overall value will by construction take that into account.
Yes, I object less to CEV if you go one or two levels meta. But if evolution of values is your core value, you find that it's pretty hard to do better than to just not interfere except to keep the ecosystem from collapsing. See John Holland's book and its theorems showing that an evolutionary algorithm as described does optimal search.
CEV goes infinite levels meta, that's what the "extrapolated" part means.
Countably infinite levels or uncountably infinite levels? ;)
Countably I think, since computing power is presumably finite so the infinity argument relies on the series being convergent.
No, that isn't what the "extrapolated" part means. The "extrapolated" part means closure and consistency over inference. This says nothing at all about the level of abstraction used for setting goals.
The whole point of CEV is that it goes as many levels meta as necessary! And the other whole point of CEV is that it is better at coming up with strategies than you are.
Please explain either one of your claims. For the first, show me where something Eliezer has written indicates CEV has some notion of how meta it is going, or how meta it "should" go, or anything at all relating to your claim. The second appears to merely be a claim that CEV is effective, so its use in any argument can only be presuming your conclusion.
My emphasis. Or to paraphrase, "as meta as we require."
Writing "I define my algorithm for problem X to be that algorithm which solves problem X" is unhelpful. Quoting said definition, doubly so.
In any case, the passage you quote says nothing about how meta to go. There's nothing meta in that entire passage.
Isn't this exactly what we wish FAI to do - interfere the least while keeping everything alive?
Almost certainly not. We'd have massive overpopulation in no time. I remember someone did this analysis, I think it was insects that cover the Earth in days.
Presumably, values will evolve differently depending on future contingencies. For example, a future with a world government that imposes universal birth control to limit population growth would probably evolve different values compared to a future that has no such global Singleton. Do you agree, and if so do you think the values evolved in different possible futures are all equivalent as far as you are concerned? If not, what criteria are you using to judge between them?
ETA: Can you explain John Holland's theorems, or at least link to the book you're talking about (Wikipedia says he wrote three). If you think allowing values to evolve is the right thing to do, I'm surprised you haven't put more effort into making a case for it, as opposed to just criticizing SI's plan.
Probably
Adaptation in Natural and Artificial Systems. Here's Holland's most famous theorem in the area. It doesn't suggest genetic algorithms make for some kind of optimal search; indeed, classical genetic algorithms are a pretty stupid sort of search.

That is the book. I'm referring to the entire contents of chapters 5-7. The schema theorem is used in chapter 7, but it's only part of the entire argument, which does show that genetic algorithms approach an optimal distribution of trials among the different possibilities, for a specific definition of optimal that is not easy to parse out of Holland's book, because he never gives an overview or decent summary of what he is doing. It doesn't say anything about forms of search other than those that take a big set of possible answers, which give stochastic results when tested, and allocate trials among them.
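For reference, the schema theorem mentioned in this exchange is usually stated (in its standard textbook form, not in notation taken from this thread) as a lower bound on how many strings matching a schema $H$ a classical genetic algorithm expects in the next generation, assuming fitness-proportional selection, one-point crossover with probability $p_c$, and per-bit mutation with probability $p_m$. Here $m(H,t)$ is the number of matching strings at generation $t$, $f(H,t)$ their mean fitness, $\bar f(t)$ the population's mean fitness, $\delta(H)$ the schema's defining length, $o(H)$ its order, and $\ell$ the string length:

$$
\mathbb{E}\big[m(H,t+1)\big] \;\ge\; m(H,t)\,\frac{f(H,t)}{\bar f(t)}\left[1 - p_c\,\frac{\delta(H)}{\ell-1} - o(H)\,p_m\right]
$$

Read loosely, short, low-order schemata with above-average fitness receive exponentially increasing numbers of trials. As it is usually summarized, Holland's argument connects this growth to his analysis of allocating trials in the two-armed (and k-armed) bandit problem, which is the narrow sense of "optimal" being debated here.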
CEV is not any old set of evolved values. It is the optimal set of evolved values; the set you get when everything goes exactly right. Of your two proposed futures, one of them is a better approximation to this than the other; I just can't say which one, at this time, because of lack of computational power. That's what we want a FAI for. :)
Instead of pushing Phil to accept the entirety of your position at once, it seems better to introduce some doubt first: Is it really very hard to do better than to just not interfere? If I have other values besides evolution, should I give them up so quickly?
Also, if Phil has already thought a lot about these questions and thinks he is justified in being pretty certain about his answers, then I'd be genuinely curious what his reasons are.
I misread the nesting, and responded as though your comment were a critique of CEV, rather than Phil's objection to CEV. So I talked a bit past you.
But you're evading Wei_Dai's question here.
What criteria does the CEV-calculator use to choose among those options? I agree that significant computational power is also required, but it's not sufficient.
If we were able to formally specify the algorithm by which a CEV calculator should extrapolate our values, we would already have solved the Friendliness problem; your query is FAI-complete. But informally, we can say that the CEV evaluates by whatever values it has at a given step in its algorithm, and that the initial values are the ones held by the programmers.
The problem with this kind of reasoning (as the OP makes plain) is that there's no good reason to think such CEV maximization is even logically possible. Not only do we not have a solution, we don't have a well-defined problem.
(nods) Fair enough. I don't especially endorse that, but at least it's cogent.
Do you think an AI reasoning about ethics would be capable of coming to your conclusions? And what "superintelligence policy" do you think it would recommend?
I'm pretty sure that FAI+CEV is supposed to prevent exactly this scenario, in which an AI is allowed to come to its own, non-foreordained conclusions
FAI is supposed to come to whatever conclusions we would like it to come to (if we knew better etc.). It's not supposed to specify the whole of human value ahead of time; it's supposed to ensure that the FAI extrapolates the right stuff.
Well, most of them do so in part because their deity tells them that that's a value. If the extrapolated CEV takes into account that they are just wrong about there being such a deity, it should respond accordingly. (I'm working under the assumption, which should not be controversial, that the AGI isn't going to find out that in fact there is such a deity hanging around.)
There's a chicken-and-egg issue here. Were pre-existing anti-homosexuality values co-opted into early Judaism? Or did the Judeo-Christian ideology spread the values beyond their "natural" spread? The only empirical evidence for this question I can think of is non-Judeo-Christian attitudes. What are the historical attitudes towards homosexuality among East Asians and South Asians?
More broadly, people's attitudes towards women and nerds are just as much expressions of values, not long-ranged utilitarian calculations.
Man, that's variable. Especially in South Asia, where "Hinduism" is more like a nice box for outsiders to describe a huge body of different practices and theoretical approaches, some of them quite divergent. Chastity in general was and is a core value in many cases; where that's not the case, or where the particular sect deals pragmatically with the human sex drive despite teaching chastity as a quicker path to moksha, there might be anything from embrace of erotic imagery and sexual diversity to fairly strict rules about that sort of conduct. Some sects unabashedly embrace sexuality as a good thing, including same-sex sexuality. Islam has historically been pretty doctrinally down on it, but even that has its nuances -- sodomy was often considered a grave sin and still is in many places, while non-penetrative same-sex contact might well be seen as simply a minor thing, not strictly appropriate but hardly anything to get worked up about.
"East Asia" has a very large number of religions as well, and the influence of Confucianism and Buddhism hasn't been uniform in this regard. One vague generality that I might suggest as a rough guideline is that traditionally, homosexuality is sort of tolerated in the closet -- sure, it happens, but as long as everyone keeps up appearances and doesn't make a scene or get caught doing something inappropriate, it's no big deal. Some strains within Mahayana Buddhism have a degree of deprecation of sexual or gender-variant behavior; others don't. Theravada varies as well, but in different ways.
In both cases, cultures vary tremendously. If you widen the scope, many cultures, including many of the foregoing, have traditionally been a lot more accepting of sex and gender variance. There are and were some cultures that were extremely permissive about it.
If you want more on the subject of how people think about sexuality, try Straight by Hanne Blank. She tracks the invention of heterosexuality (a concept which she says is less than a century old) in the west.
If part of CEV is finding out how much of what we think is obviously true is just stuff that people made up, life could get very strange.
Hell, you don't need CEV for that. A decent anthropology textbook will get you quite a distance there (even if only superficially)...
Can you recommend a book / author? (Interested outsider, no idea what the good stuff is, have read Jared Diamond and similar works.)
The Reindeer People by Piers Vitebsky, which focuses on the Eveny people of Siberia, is a favorite of mine. The Shaman's Coat: A Native History of Siberia, by Anna Reid, is a good overview of Siberian peoples. Marshall Sahlins' entire corpus is pretty good, although his style puts some lay readers off. Argonauts of the Western Pacific by Bronislaw Malinowski deals with Melanesian trade and business ventures; it's rather old at this point, but Malinowski had a fair influence on the development of anthropology thereafter. Wisdom Sits in Places by Keith Basso deals with an Apache group. The Nuer by E. E. Evans-Pritchard is older, and very dry, but widely regarded as a classic in the field; it deals with the Nuer people of Sudan. The Spirit Catches You and You Fall Down by Anne Fadiman is not strictly an ethnography, but it's very relevant to anthropological mindsets and is often required reading in first-year courses in the field. Liquidated: An Ethnography of Wall Street by Karen Ho is pretty much what it says in the title, and a bit more contemporary. Debt: The First 5000 Years by David Graeber mixes in history and economics, but it's generally relevant. Pathologies of Power by Paul Farmer focuses on the poor in Haiti. Friction: An Ethnography of Global Connection by Anna Tsing is kind of complicated to explain. Short version: it takes a look at events in Indonesia and traces out actors, groups, their motivating factors, and so on.
I wonder whether people who've studied anthropology find that it's affected their choices.
It certainly did mine.
I'm interested in any details you'd like to share.
Me too.
It made me a lot more comfortable dealing with people who might be seen as "regressive", "bland", "conservative" or who just seem otherwise not very in sync with my own social attitudes and values. Getting to understand that culture and culturally-transmitted worldviews do constitute umbrella groups, but that people vary within them to similar degrees across such umbrellas, made it easier to just deal with people and adapt my own social responses to the situation, and where I feel like the person has incorrect, problematic or misguided ideas, it made it easier to choose my responses and present them effectively.
It made me more socially-conscious and a bit more socially-successful. I have some considerable obstacles there, but just having cultural details available was huge in informing my understanding of certain interactions. When I taught ESL, many of my students were Somali and Muslim. I'm also trans, and gender is a very big thing in many Islam-influenced societies (particularly ones where men and women for the most part don't socialize). I learned a bit about fashion sense and making smart choices just by noticing how the men reacted to what I wore, particularly on hot days. I learned a lot about gender-marked social behavior and signifiers from my interactions with the older women in the class and the degree to which they accepted me (which I could gauge readily by their willingness to engage in casual touch, say to get my attention or when thanking me, or the occasional hug from some of my students).
It made me a far better worldbuilder than I was before, because I have some sense of just how variable human cultures really are, and how easy it is to construct a superficially-plausible theory of human cultures, history or behavior while missing out on the incredible variance that actually exists.
It made me far less interested in evolutionary psychology as an explanation for surface-level behaviors, let alone broad social patterns of behavior, because all too often cited examples turn out to be culturally-contingent. I think the average person in Western society has a very confused idea of just how different other cultures can be.
It made me skeptical of CEV as a thing that will return an output. I'm not sure human volition can be meaningfully extrapolated, and even if it can, I'm far from persuaded that the bits of it that cohere add up to anything you'd base FAI on.
It convinced me that the sort of attitudes I see expressed on LW towards "tradition" and traditional culture (especially where it comes into conflict with global capitalism) are so hopelessly confused about the thing they're trying to address that they essentially don't have anything meaningful to say about it, or at best only cover a small subset of the cases that they're applied to. It didn't make me a purist or instill some sort of half-baked Prime Directive or anything; cultures change and they'll do that no matter what.
It helped me grasp my own cultural background and influences better. It gave me some insight into the ways in which that can lock in your perceptions and decisions, how hard that is to change, and how easy it is to confuse that with something "innate" (and how easy it is to confuse "innate" with "genetic"). It helped me grasp how I could substitute or reprogram bits of that, and with a bit of time and practice it helped me understand the limitations on that.
There's...probably a whole ton more, but I'm running out of focus right now.
EDIT: Oh! It made me hugely more competent at navigating, interpreting and understanding art, especially from other cultures. Literary modes, aesthetics, music and styles; also narrative and its uses.
(I think this could make an interesting and valuable top-level post.)
Fascinating, but... my Be Specific detector is going off and asking, not just for the abstract generalizations you concluded, but the specific examples that made you conclude them. Filling in at least one case of "I thought I should dress like X, but then Y happened, now I dress like Z", even - my detector is going off because all the paragraphs are describing the abstract conclusions.
The word is likely that recent, but is she claiming that the idea of being interested in members of the other sex but not in members of the same sex as sexual partners was unheard-of before that? Or what does she mean exactly?
It's a somewhat complex book, but part of her meaning is that the idea that there are people who are only sexually interested in members of the other sex, and that this is an important category, is recent.
How could such a thesis be viable, when so much of the historical data has been lost?
There's more historical data than you might think; for example, the way the Catholic Church defined sexual sin in terms of actions rather than in terms of certain sins being associated with types of people who were especially tempted to engage in them.
There's also some history of how sexual normality became more and more narrowly defined (Freud has a lot to answer for), and then the definitions shifted.
A good bit of the book is available for free at amazon, and I think that would be the best way for you to see whether Blank's approach is reasonable.
The introduction is a catalog of ambiguities about sex, gender, and sexual orientation:
All of these are fair enough, and I've only read the introduction, but I don't have a lot of confidence that she goes on to resolve these contradictions in Less Wrong tree-falls-in-a-forest style. Instead of trying to clarify what people mean when they say something like "most people are heterosexual," I get the feeling she only wants to muddy the waters enough to say "no they aren't."
I think her point is closer to "people make things up, and keep repeating those things until they seem like laws of the universe".
A possible conclusion is that once people make a theory about how something ought to be, it's very hard to go back to the state of mind of not having an opinion about that thing.
The amazon preview includes the last couple of chapters of the book.
The book could be viewed as a large expansion of two Heinlein quotes: "Everybody lies about sex" and "Freedom begins when you tell Mrs. Grundy to fly a kite".
Oh, so her thesis is that in the west, orientation-as-identity dates back to 1860-ish. I can imagine that being defensible. That's way different from what you originally wrote, though.
You see, the first thing that came to mind was Aristophanes' speech in the Symposium, which explicitly recognizes orientation-as-identity and predates the Catholic Church by a couple centuries.
Thanks for the cite.
Like most of Leviticus, the edicts against homosexuality were an attempt to belatedly change 'have no gods before me' into 'don't have any other gods, period' by banning all of the specific religious practices of the competing local religions, which involved things like, say, eating shellfish, wearing sacred garb composed of mixed fibers, etc.
So maybe some of them were homophobes, but it's not necessary; and if they'd all been homophobes there wouldn't have been a need to establish the rule.
That's a good point. It fairly strongly suggests that Judeo-Christian anti-homosexuality values would not survive coherent extrapolation because it provides an explanation for why the value was included originally. As JoshuaZ stated, I don't expect religious values whose sole function was religious in-group-ism to persist after a CEV process.
Well, if Christian anti-homosexuality was just religious in-group-ism, they wouldn't be outraged by non-Christians having sex with members of the same sex any more than by (say) non-Christians eating meat on Lent Fridays. Are they?
I don't know the history in East Asia, but closer to where the Abrahamic religions arose one had the ancient Greeks who were ok with most forms of homosexuality. The only reservations they had about homosexuality as I understand it had to do with issues of honor if one were a male who was penetrated.
Edit: I get the impression from this article that the attitudes of ancient Indians to homosexuality have become so bogged down in modern politics that it may be difficult for non-experts to tell. I'll try to look into this more later.
IIRC, in pre-Christian Rome/Greece, homosexuality was considered OK only if the receiving partner was young enough.
Extrapolated CEV would be working from observable evidence + a good prior. Whereas lots of people insist it's very important to them to believe in a deity through faith, despite any contrary evidence (let alone lack of evidence). How are you going to tell the CEV to ignore such values?
Just as helpfully, if the FAI concludes that there is a deity around who we should please and who would prefer objecting to gay marriage, it will properly regard that as a value.
Or, presumably, if it concludes that there might some day come to be a deity, or other vastly powerful entity, who would prefer having objected to gay marriage.
Of course, all of this further presumes that there aren't/won't be other vastly powerful entities whose preferences have equal weight in opposite directions.
(emphasis added.)
Except Peter de Blanc's comments.
Now that the huffy remark has been removed, I can't see what post it used to refer to!
Peter deBlanc is a better mathematician than I am, so I'd better look at them.
ADDED. I see I responded to them before. I think they're good points but don't invalidate the model. I'll retract my huffy statement from the post, though.
The point of his remarks, in my view, was that your model needed validation in the first place. Every mathematical biology or computational cognitive science paper I've read makes some attempt to rationalize why they are bothering to examine whatever idealized model is under consideration.
I'm not sure if this is appropriate, but like the original author I am unsure whether a CEV is a thing that can be expressed in formal logic, even if the brain were fully mapped into a virtual environment. A lot of how we craft our values is based on complex environmental factors that are not easily modeled. Please read Schnall et al.'s "Disgust as Embodied Moral Judgment" or Greene et al.'s "An fMRI Investigation of Emotional Engagement in Moral Judgment." Our values are fluid and non-hierarchical. Developing values that have a strict hierarchy, as the OP says, can lead to systems that cannot change.
If the evolutionary process results in either convergence, divergence or extinction, and most often results in extinction, what reason(s) do I have to think that this 23rd emerging complex homo will not go the way of extinction also? Are we throwing all our hope towards super intelligence as our salvation?
The much stronger issue he raised is that it may well be that outside imagination and fiction, there is no monolithic 'intelligence' thing, and the 'benevolent ruler of the earth' software is then more dangerous than e.g. software that uses search and hill climbing to design better microchips, or design cures for diseases, or the like, without being 'intelligent' in the science fictional sense, and while lacking any form of real world volition. The "benevolent ruler of the earth" software would then, also, fail to provide any superior technical solutions to our problems, as this "intelligence" does not bring any important advantage over the algorithms normally used for problem solving.
The chip improver would spit out the blueprints, the cure designer would spit out the projected molecular images and DNA sequences, etc - no oracle crap with the 'utility' of making people understand something, which appears both near-impossible and entirely unnecessary.
Outside of mystic circles, it is fairly uncontroversial that it is in principle possible to construct out of matter an object capable of general intelligence. Proof is left to the reader.
Humans have a values hierarchy. Trouble is, most do not even know what it is (or, they are). IOW, for me honesty is one of the most important values to have. Also, sanctity of (and protection of) life is very high on the list. I would lie in a second to save my son's life. Some choices like that are no-brainers; however, few people know all the values that they live by, let alone the hierarchy. Often humans only discover what these values are as they find themselves in various situations.
Just wondering... has anyone compiled a list of these values, morals, ethics... and applied them to various real-life situations to study the possible 'choices' an AI has and the potential outcomes with differing hierarchies?
ADDED: Sometimes humans know the right thing but choose to do something else. Isn't that because of emotion? If so, what part does emotion play in superintelligence?
EDIT: To edit and simplify my thoughts, in order to get a General Intelligence Algorithm Instance to do anything requires masterful manipulation of parameters with full knowledge of generally how it is going to behave as a result. A level of understanding of psychology of all intelligent (and sub-intelligent) behavior. It is not feasible that someone would accidentally program something that would become an evil mastermind. GIA instances could easily be made to behave in a passive manner even when given affordances and output, kind of like a person that was happy to assist in any way possible because they were generally warm or high or something.
You can define the most important elements of human values for a GIA instance, because most human values are a direct logical consequence of something that cannot be separated from the GIA; i.e., if general motivation X accidentally drove intelligence (see: Orthogonality Thesis) and also drove positive human values, then positive human values would be unavoidable. It is true that the specifics of body and environment drive some specific human values, but those are just side effects of X in that environment, and X in different environments changes only so much, and in predictable ways.
You can directly implant knowledge/reasoning into a GIA instance. The easiest way to do this is to train one under very controlled circumstances, and then copy the pattern. This reasoning would then condition the GIA instance's interpretation of future input. However, under conditions which directly disprove the value of that reasoning in obtaining X the GIA instance would un-integrate that pattern and reintegrate a new one. This can be influenced with parameter weights.
I suppose this could be a concern regarding the potential generation of an anger instinct. This HEAVILY depends on all the parameters however, and any outputs given to the GIA instance. Also, robots and computers do not have to eat, and have no associated instincts with killing things in order to do so... Nor do they have reproductive instincts...
When you say "predictable", do you mean in principle or actually predictable?
That is, are you claiming that you can predict what any human values given their environment, and furthermore that the environment can be easily and compactly specified?
Can you give an example?
Mathematically predictable but somewhat intractable without a faster running version of the instance, with the same frequency of input. Or predictable within ranges of some general rule.
Or just generally predictable with the level of understanding afforded to someone capable of making one in the first place, that for instance could describe the cause of just about any human psychological "disorder".
Name three values all agents must have, and explain why they must have them.
The concept of agent is logically inconsistent with the General Intelligence Algorithm. What you are trying to refer to with Agent/tool etc are just GIA instances with slightly different parameters, inputs, and outputs.
Even if it could be logically extended to the point of "Not even wrong" it would just be a convoluted way of looking at it.
I'm sorry, I wasn't trying to use terminology to misstate your position.
What are three values that a GIA must have, and why must they have them?
ohhhh... sorry... There is really only one, and everything else is derived from it. Familiarity. Any other values would depend on the input, output and parameters. However familiarity is inconsistent with the act of killing familiar things. The concern comes in when something else causes the instance to lose access to something it is familiar with, and the instance decides it can just force that to not happen.
Well, I'm not sure that Familiarity is sufficient to resolve every choice faced by a GIA - for example, how does one derive a reasonable definition of self-defense from Familiarity. But let's leave that aside for a moment.
Why must a GIA subscribe to the value of Familiarity?
"I Have No Mouth, and I Must Scream".