Kaj_Sotala comments on Yet more "stupid" questions - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Every now and then, there are discussions or comments on LW where people talk about finding a "correct" morality, or where they argue that some particular morality is "mistaken". (Two recent examples: [1] [2]) Now I would understand that in an FAI context, where we want to find such a specification for an AI that it won't do something that all humans would find terrible, but that's generally not the context of those discussions. Outside such a context, it sounds like people were presuming the existence of an objective morality, but I thought that folks on LW rejected that. What's up with that?
Objective morality in one (admittedly rather long) sentence: For any moral dilemma, there is some particular decision you would make after a thousand years of collecting information, thinking, upgrading your intelligence, and reaching reflective equilibrium with all other possible moral dilemmas; this decision is the same for all humans, and is what we refer to when we say that an action is 'correct'.
I find that claim to be very implausible: to name just one objection to it, it seems to assume that morality is essentially "logical" and based on rational thought, whereas in practice moral beliefs seem to be much more strongly derived from what the people around us believe in. And in general, the hypothesis that all moral beliefs will eventually converge seems to be picking out a very narrow region in the space of possible outcomes, whereas "beliefs will diverge" contains a much broader space. Do you personally believe in that claim?
I'm not sure what I was expecting, but I was a little surprised to see you say you object to objective morality. I probably don't understand CEV well enough, and I'm pretty sure this is not the case, but there seems to be so much similarity between CEV and some form of objective morality as described above. In other words, if you don't think moral beliefs will eventually converge given enough intelligence, reflection, data gathering, etc., then how do you convince someone that FAI will make the "correct" decisions based on the extrapolated volition?
CEV in its current form is quite under-specified. I expect that there would exist many, many different ways of specifying it, each of which would produce a different CEV that would converge at a different solution.
For example, Tarleton (2010) notes that CEV is really a family of algorithms which share the following features:
He comments:
Although one of Eliezer's desired characteristics for CEV was to ”avoid creating a motive for modern-day humans to fight over the initial dynamic”, a more rigorous definition of CEV will probably require making many design choices for which there will not be any objective answer, and which may be influenced by the designer's values. The notion that our values should be extrapolated according to some specific criteria is by itself a value-laden proposal: it might be argued that it was enough to start off from our current-day values just as they are, and then incorporate additional extrapolation only if our current values said that we should do so. But doing so would not be a value-neutral decision either, but rather one supporting the values of those who think that there should be no extrapolation, rather than of those who think there should be.
I don't find any of these issues to be problems, though: as long as CEV found any of the solutions in the set-of-final-values-that-I-wouldn't-consider-horrible, the fact that the solution isn't unique isn't much of an issue. Of course, it's quite possible that CEV will hit on some solution in that set that I would judge to be inferior to many others also in that set, but so it goes.
That argument seems like it would apply equally well to non-moral beliefs.
It seems there are two claims: One, that each human will be reflectively self-consistent given enough time; two, that the self-consistent solution will be the same for all humans. I'm highly confident of the first; for the second, let me qualify slightly:
With those qualifications, yes, I believe the second claim with, say, 85% confidence.
I find the first claim plausible though not certain, but I would expect that if such individual convergence happens, it will lead to collective divergence not convergence.
When we are young, our moral intuitions and beliefs are a hodge-podge of different things, derived from a wide variety of sources, probably reflecting something like a "consensus morality" that is the average of different moral positions in society. If/when we begin to reflect on these intuitions and beliefs, we will find that they are mutually contradictory. But one person's modus ponens is another's modus tollens: faced with the fact that a utilitarian intuition and a deontological intuition contradict each other, say, we might end up rejecting the utilitarian conclusion, rejecting the deontological conclusion, or trying to somehow reconcile them. Since logic by itself does not tell us which alternative we should choose, it becomes determined by extra-logical factors.
Given that different people seem to arrive at different conclusions when presented with such contradictory cases, and given that their judgement seems to be at least weakly predicted by their existing overall leanings, I would guess that the choice of which intuition to embrace would depend on their current balance of other intuitions. Thus, if you are already leaning utilitarian, the intuitions which are making you lean that way may combine together and cause you to reject the deontological intuition, and vice versa if you're leaning deontologist. This would mean that a person who initially started with an even mix of both intuitions would, by random drift, eventually end up in a position where one set of intuitions was dominant, after which there would be a self-reinforcing trajectory towards an area increasingly dominated by intuitions compatible with the ones currently dominant. (Though of course the process that determines which intuitions get accepted and which ones get rejected is nowhere near as simple as just taking a "majority vote" of intuitions, and some intuitions may be felt so strongly that they are almost impossible to reject.) This would mean that as people carried out self-reflection, their position would end up increasingly idiosyncratic and distant from the consensus morality. This seems to be roughly compatible with what I have anecdotally observed in various people, though my sample size is relatively small.
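The drift-then-reinforcement dynamic described above can be illustrated with a toy simulation (a sketch under assumed dynamics, not a model of actual moral psychology; all the numbers here are arbitrary): the current balance of intuitions determines which side wins each resolved contradiction, and each win shifts the balance further toward the winner.

```python
import random

def reflect(balance, steps=1000, seed=None):
    """Toy drift model: `balance` is the fraction of intuitions leaning
    one way (0.0 = fully deontological, 1.0 = fully utilitarian).
    Each reflection step resolves one contradiction; the currently
    dominant side wins with probability equal to its current share,
    and each win shifts the balance further toward the winner."""
    rng = random.Random(seed)
    for _ in range(steps):
        if rng.random() < balance:
            balance = min(1.0, balance + 0.01)  # this side's intuition wins
        else:
            balance = max(0.0, balance - 0.01)  # the other side wins
    return balance

# Agents that start at an even 50/50 mix still end up near one extreme,
# and different random histories (seeds) can land at different extremes.
outcomes = [reflect(0.5, seed=s) for s in range(10)]
```

At an even balance the expected step is zero, but any chance deviation is amplified (the expected drift per step is proportional to 2·balance − 1), so positions polarize rather than returning to the middle: the unstable equilibrium at 0.5 plays the role of consensus morality, and the extremes play the role of idiosyncratic reflective positions.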
I feel that I have personally been undergoing this kind of drift: I originally had the generic consensus morality that one adopts by spending one's childhood in a Western country, after which I began reading LW, which worked to select and reinforce my existing set of utilitarian intuitions - but had I not already been utilitarian-leaning, the utilitarian emphasis on LW might have led me to reject those claims and seek out a (say) more deontological influence. But as time has gone by, I have become increasingly aware of the fact that some of my strongest intuitions lean towards negative utilitarianism, whereas LW leans more towards classical utilitarianism. Reflecting upon various intuitions has led me to gradually reject several that I previously took to support classical rather than negative utilitarianism, thus causing me to move away from the general LW consensus. And since this process has caused some of the intuitions that previously supported a classical utilitarian position to lose their appeal, I expect that moving back towards CU is less likely than continued movement towards NU.
Seconding Kaj_Sotala's question. Is there a good argument why self-improvement doesn't have diverging paths due to small differences in starting conditions?
Dunno. CEV actually contains the phrase, "and had grown up farther together," which the above leaves out. But I feel a little puzzled about the exact phrasing, which does not make "were more the people we wished we were" conditional on this other part - I thought the main point was that people "alone in a padded cell," as Eliezer puts it there, can "wish they were" all sorts of Unfriendly entities.
I assume the same, but with "all humans" replaced by the weaker "the people participating in this conversation".
I don't think even that's a sufficient definition.
It's that all observers (except psychos), no matter their own particular circumstances and characteristics, would assign approval/disapproval in exactly the same way.
Psychopaths are quite capable of perceiving objective truths. In fact if there was an objective morality I expect it would work better for psychopaths than for anyone else.
I believe Rolf has excommunicated psychopaths (and Clippy) from the set of agents from whom "human morality" is calculated.
First they purged the psychopaths...
Me, I don't think everyone else converges to the same conclusions. Non-psychopaths just aren't all made out of the same moral cookie cutter. It's not that we have to "figure out" what is right; it's that we have different values. If casual observation doesn't convince you of this, Haidt's quantified approach should.
That's one possible definition of objective morality, but not the only one.
At least some of the prominent regulars seem to believe in objective morality outside of any FAI context, I think (Alicorn? palladias?).
The connotations of "objective" (also discussed in the other replies in this thread) don't seem relevant to the question about the meaning of "correct" morality. Suppose we are considering a process of producing an idealized preference that gives different results for different people, and also nondeterministically gives one of many possible results for each person. Even in this case, the question of expected ranking of consequences of alternative actions according to this idealization process applied to someone can be asked.
Should this complicated question be asked? If the idealization process is such that you expect it to produce a better ranking of outcomes than you can when given only a little time, then it's better to base actions on what the idealization process could tell you than on your own guess (e.g. desires). To the extent your own guess deviates from your expectation of the idealization process, basing your actions on your guess (desires) is an incorrect decision.
A standard example of an idealization dynamic is what you would yourself decide given much more time and resources. If you anticipate that the results of this dynamic can nondeterministically produce widely contradictory answers, this too will be taken into account by the dynamic itself, as the abstract you-with-more-time starts to contemplate the question. The resulting meta-question of whether taking the diverging future decisions into account produces worse decisions can be attacked in the same manner, etc. If done right, such a process can reliably give a better result than you-with-little-time can, because any problem with it that you could anticipate will be taken into account.
A hypothetical idealization dynamic may not be helpful in actually making decisions, but its theoretical role is that it provides a possible specification of the "territory" that moral reasoning should explore, a criterion of correctness. It is a hard-to-use criterion of correctness, you might need to build a FAI to actually access it, but at least it's meaningful, and it illustrates the way in which many ways of thinking about morality are confused.
(As an analogy, we might posit the problem of drawing an accurate map of the surface of Pluto. My argument amounts to pointing out that Pluto can be actually located in the world, even if we don't have much information about the details of its surface, and won't be able to access it without building spacecraft. Given that there is actual territory to the question of the surface of Pluto, many intuition-backed assertions about it can already be said to be incorrect (as antiprediction against something unfounded), even if there is no concrete knowledge about what the correct assertions are. "Subjectivity" may be translated as different people caring about surfaces of different celestial bodies, but all of them can be incorrect in their respective detailed/confident claims, because none of them have actually observed the imagery from spacecraft.)
I think that such a specification probably isn't the correct specification of the territory that moral reasoning should explore. By analogy, it's like specifying the territory for mathematical reasoning based on idealizing human mathematical reasoning, or specifying the territory for scientific reasoning based on idealizing human scientific reasoning. (As opposed to figuring out how to directly refer to some external reality.) It seems like a step that's generally tempting to take when you're able to informally reason (to some extent) about something but you don't know how to specify the territory, but I would prefer to just say that we don't know how to specify the territory yet. But...
Maybe I'm underestimating the utility of having a specification that's "at least meaningful" even if it's not necessarily correct. (I don't mind "hard-to-use" so much.) Can you give some examples of how it illustrates the way in which many ways of thinking about morality are confused?
The usual Typical Mind Fallacy, which is really, REALLY pervasive.
I wrote a post to try to answer this question. I talk about "should" in the post, but it applies to "correct" as well.
I just assumed it meant "my extrapolated volition" and also "your extrapolated volition", along with the implication that those are identical.
Here is a decent discussion of objective morality.
People are often wrong about what their preferences are, and most humans have roughly similar moral hardware. Not identical, but close enough to behave as if we all share a common moral instinct.
When you present someone with an argument and they change their mind on a moral issue, you haven't changed their underlying preferences; you've simply given them insight into what their true preferences are.
For example, if a neurotypical human said that belief in God was the reason they don't go around looting and stealing, they'd be wrong about themselves as a matter of simple fact.
-as per the definition of preference that I think makes the most sense.
-Alternatively, you might actually be re-programming their preferences...I think it's fair to say that at least some preferences commonly called "moral" are largely culturally programmed.