Six Plausible Meta-Ethical Alternatives

Wei Dai


6th Aug 2014

AI Alignment Forum


In this post, I list six metaethical possibilities that I think are plausible, along with some arguments or plausible stories about how/why they might be true, where that's not obvious. A lot of people seem fairly certain in their metaethical views, but I'm not and I want to convey my uncertainty as well as some of the reasons for it.

1. Most intelligent beings in the multiverse share similar preferences. This came about because there are facts about what preferences one should have, just like there exist facts about what decision theory one should use or what prior one should have, and species that manage to build intergalactic civilizations (or the equivalent in other universes) tend to discover all of these facts. There are occasional paperclip maximizers that arise, but they are a relatively minor presence or tend to be taken over by more sophisticated minds.
2. Facts about what everyone should value exist, and most intelligent beings have a part of their mind that can discover moral facts and find them motivating, but those parts don't have full control over their actions. These beings eventually build or become rational agents with values that represent compromises between different parts of their minds, so most intelligent beings end up having shared moral values along with idiosyncratic values.
3. There aren't facts about what everyone should value, but there are facts about how to translate non-preferences (e.g., emotions, drives, fuzzy moral intuitions, circular preferences, non-consequentialist values, etc.) into preferences. These facts may include, for example, what is the right way to deal with ontological crises. The existence of such facts seems plausible because if there were facts about what is rational (which seems likely) but no facts about how to become rational, that would seem like a strange state of affairs.
4. None of the above facts exist, so the only way to become or build a rational agent is to just think about what preferences you want your future self or your agent to hold, until you make up your mind in some way that depends on your psychology. But at least this process of reflection is convergent at the individual level so each person can reasonably call the preferences that they endorse after reaching reflective equilibrium their morality or real values.
5. None of the above facts exist, and reflecting on what one wants turns out to be a divergent process (e.g., it's highly sensitive to initial conditions, like whether or not you drank a cup of coffee before you started, or to the order in which you happen to encounter philosophical arguments). There are still facts about rationality, so at least agents that are already rational can call their utility functions (or the equivalent of utility functions in whatever decision theory ends up being the right one) their real values.
6. There aren't any normative facts at all, including facts about what is rational. For example, it turns out there is no one decision theory that does better than every other decision theory in every situation, and there is no obvious or widely-agreed-upon way to determine which one "wins" overall.

(Note that for the purposes of this post, I'm concentrating on morality in the axiological sense (what one should value) rather than in the sense of cooperation and compromise. So alternative 1, for example, is not intended to include the possibility that most intelligent beings end up merging their preferences through some kind of grand acausal bargain.)

It may be useful to classify these possibilities using labels from academic philosophy. Here's my attempt: 1. realist + internalist 2. realist + externalist 3. relativist 4. subjectivist 5. moral anti-realist 6. normative anti-realist. (A lot of debates in metaethics concern the meaning of ordinary moral language, for example whether they refer to facts or merely express attitudes. I mostly ignore such debates in the above list, because it's not clear what implications they have for the questions that I care about.)

One question LWers may have is, where does Eliezer's metaethics fall into this schema? Eliezer says that there are moral facts about what values every intelligence in the multiverse should have, but only humans are likely to discover these facts and be motivated by them. To me, Eliezer's use of language is counterintuitive, and since it seems plausible that there are facts about what everyone should value (or how each person should translate their non-preferences into preferences) that most intelligent beings can discover and be at least somewhat motivated by, I'm reserving the phrase "moral facts" for these. In my language, I think 3 or maybe 4 is probably closest to Eliezer's position.

Tags: Ethics & Morality, Moral uncertainty, Philosophy, Metaethics, World Modeling


Eliezer Yudkowsky (12y)

Given your terminology without dispute, and then ignoring all debates about what ordinary human language refers to, yes 3-4. I think we have enough knowledge at this point to reject internalism out of hand, and if I were going to dispute your terminology then I would say that 2 is also internalism, just weaker internalism, and that the internalism/externalism debate shouldn't ought to be said to have things to do with realism, see e.g. "An Introduction to Contemporary Metaethics" in which externalist theories are still classified as realistic; I think a lot of what feels like a naively necessary quality of cognitivism/realism is actually particular kinds of non-naturalism in the standard schema. E.g. I would consider "a fact such that knowledge of it is inherently motivating to every possible mind" to be non-reductionist because it's a kind of Mind Projection Fallacy of the quality of motivating-ness that facts have to us, but that has nothing to do with whether our own morals have the property of cognitivism/realism. If I were further going to dispute terminology, I would replace a lot of what you would call "facts" with what I would call "validities" and try to ground them in values every time they involved any kind of preference or betterness or choice, since the laws of physics contain no little < or > signs. But on your scheme, yes 3-4.

_will_ (1y)

Great post! I find myself coming back to it—especially possibility 5—as I sit here in 2025 thinking/worrying about AI philosophical competence and the long reflection.

On 6,[1] I’m curious if you’ve seen this paper by Joar Skalse? It begins:

I present an argument and a general schema which can be used to construct a problem case for any decision theory, in a way that could be taken to show that one cannot formulate a decision theory that is never outperformed by any other decision theory.

[1] Pasting here for easy reference (emphasis my own):
6. There aren’t any normative facts at all, including facts about what is rational. For example, it turns out there is no one decision theory that does better than every other decision theory in every situation, and there is no obvious or widely-agreed-upon way to determine which one “wins” overall.

Noosphere89 (1y)

This post gives a pretty short proof, and my main takeaway is that intelligence and consciousness converge to look-up tables which are infinitely complicated, so as to deal with every possible situation:

https://www.lesswrong.com/posts/2LvMxknC8g9Aq3S5j/ldt-and-everything-else-can-be-irrational

I agree with this implication for optimization:

https://www.lesswrong.com/posts/yTvBSFrXhZfL8vr5a/worst-case-thinking-in-ai-alignment#N3avtTM3ESH4KHmfN

ShardPhoenix (12y)

I found this a useful summary of (at least some of) the possibilities.

A small issue is that I find that it's clearer to avoid using the word 'should' as much as possible when discussing meta-ethics, since leaving the goal implied ('should' do X for what end?) can be ambiguous. I think it's clearer to talk about what will happen if a being uses such-and-such a decision theory/ethics/etc, and when translating that to 'should' to be very clear what goal is being targeted.

edit: To give a specific example, using an unqualified 'should' can lead to (intentionally or unintentionally) equivocating between the values of the being under discussion and the values of its species/humanity/the author/some other ideal.

Wei Dai (12y)

I wrote a post on the meaning of unqualified 'should', and my usage here is in line with that.

ShardPhoenix (12y)

As far as I can tell, that post just demonstrates that 'should' really is too ambiguous to use in this kind of technical discussion where precise communication is desired.

Slider (12y)

As I read the unqualified 'should' here, it means that natural selection favours certain kinds of beliefs, i.e. those that help prosperity. Although for some people it also means specifications of which kind of prosperity to shoot for. I tend to be very suspicious of claims that one direction of radiation is better than another (before / unrelated to the genocide mechanism).

Although my concept deconstruction might have erased any "should" and might be a bit nonstandard. It's no longer about "good" and "evil" but about what is possible and what's impossible. Some actions are highly dangerous and resource depleting, being very nearly impossible, and need to be offset by a lot of enabling features. Thus it's about which forms of life are maintainable, under which conditions, and which have a half-life. In this way, whether helium happens more than uranium is the same kind of question as whether a code of conduct results in a prosperous society (say, two forms of government), but just orders of magnitude harder to answer. But does it mean that uranium is more evil than helium as elements? Humans know how to use uranium as part of global security, so as a natural resource it clearly can produce a human good.

Peter Wildeford (12y)

“Most intelligent beings in the multiverse share similar preferences. This came about because there are facts about what preferences one should have”

Notably, this could come about for other reasons, such as evolutionary.

cousin_it (12y)

Well, our preferences did come from evolution. I suppose an interesting question is how evolution manages to "elicit" normative facts, if those exist.

Azathoth123 (12y)

The same way it elicits physical facts.

Shmi (12y)

7. We don't have nearly enough knowledge to reason about multiverses and intergalactic civilizations, and picking 6 possibilities which you happened to think of is privileging 6 hypotheses (described in poorly defined terms) out of a countless number. Maybe we ought to concentrate on models which can be tested/simulated/falsified instead.

[anonymous] (12y)

It seems to me that Wei_Dai's six hypotheses do a good job of covering a lot of the logical space. A good enough job that even though I've been professionally trained to think about this problem, I can't come up with any significantly different suggestions.

But maybe I'm being unimaginative (a side effect of training, often enough). If you think these are merely six of countless hypotheses, do you think you could come up with, say, two more?

blacktrance (12y)

If you think these are merely six of countless hypotheses, do you think you could come up with, say, two more?

Two more possible positions:

There is a great variety of possible consistent preferences that intelligent beings can have, and there are no facts about what one should value that apply to all possible intelligent beings. However, there are still facts about rationality that do apply to all intelligent beings. Also, if you narrow the scope from "intelligent beings" to "humans", most humans, when consistent, share similar preferences, and there exist facts about what they should value. (So, 4 or 5 for intelligent beings in general, but 1 for humans.)
Morality has nothing to do with value.

Wei Dai (12y)

Your first suggestion isn't an additional alternative, it's just a subdivision within 4 or 5.

I'm not sure I understand the second one. Are you trying to draw the distinction between consequentialism and non-consequentialist moralities? If so, I think that is usually considered to be a distinction in normative ethics rather than metaethics. Although I repeatedly use "preferences" and "values" in this post, that was just for convenience rather than trying to imply that morality must have something to do with values.

blacktrance (12y)

Your first suggestion isn't an additional alternative, it's just a subdivision within 4 or 5.

Perhaps, but it seems like there's a substantive difference between those who believe there are no facts about what all intelligent beings should value and those who believe that, in addition to that, there are also no facts about what humans should value.

Although I repeatedly use "preferences" and "values" in this post, that was just for convenience rather than trying to imply that morality must have something to do with values.

Could you give an example of one of these positions put in terms that would be inclusive of both consequentialist and non-consequentialist ethical theories?

Wei Dai (12y)

Could you give an example of one of these positions put in terms that would be inclusive of both consequentialist and non-consequentialist ethical theories?

Sure. 1. Most intelligent beings in the multiverse end up sharing similar moralities. This came about because there are facts about what morals one should have. For example, suppose there are facts about what preferences one should have along with facts about what decision theory one should use or what prior one should have, and species that manage to build intergalactic civilizations (or the equivalent in other universes) tend to discover all of these facts. There are occasional paperclip maximizers that arise, but they are a relatively minor presence or tend to be taken over by more sophisticated minds.

Shmi (12y)

do you think you could come up with, say, two more?

OP discusses "facts about what everyone should value", (which is an odd use of the term "fact", by the way). His classification is:

There is a unique set of values which
  1. is a limit
  2. is an attractor of sorts
There is no unique set of values
  3. (I failed to understand what this item says)
  4. but you can come up with your own "consistent" (in some sense) set of preferences to optimize for
  5. you cannot come up with a consistent set of values (preferences?), though you can optimize for each one separately
  6. value is not something you can optimize for at all.

Eliezer's position is something like "1. but limited to humans/FAI only", which seems like a separate hypothesis. Other options off the top of my head are that there can be multiple self-consistent limits or attractors, or that the notion of value only makes sense for humans or some subset of them.

Or maybe a hard enough optimization attempt disturbs the value enough to change it, so one can only optimize so much without changing preferences. Or maybe the way to meta-morality is maximizing the diversity of moralities by creating/simulating a multiverse with all the ethical systems you can think of, consistent or inconsistent. Or maybe we should (moral "should") matrix-like break out of the simulation we are living in and learn about the level above us. Or that the concept of "intelligent being" is inconsistent to begin with. Or...

Options are many and none are testable, so, while it's good to ask grand questions, it's silly to try to give grand answers or classification schemes.

VAuroch (12y)

To fill in the gap in 3: There is no unique set of values, but there is a unique process for deriving an optimal set of consistent preferences (up to some kind of isomorphism), though distinct individuals will get different results after carrying out this process.

As opposed to 4, which states that there is some set of processes that can derive consistent preferences but that no claims about which of these processes is best can be substantiated.

And as I said above, Eliezer believes something like 3, but insists on the caveat that, if we consider only humans, all consistent sets of preferences generated will substantially overlap, and that therefore we can create an FAI whose consistent preferences will entirely overlap that set.

Wei Dai (12y)

Maybe we ought to concentrate on models which can be tested/simulated/falsified instead.

Can you give some examples of what you'd like to concentrate on instead?

Shmi (12y)

Say, trying to understand how the abstractions we call "value" or "preference" emerge and under what conditions would be a start. For example, does Deep Blue have values? It certainly has preferences. What is the difference? How would one write an algorithm which has both/either? How competitive would these be and under what conditions? Maybe run a simulation or a dozen to test it.

You know, concentrate on answerable questions.

VAuroch (12y)

I would characterize Eliezer's metaethics slightly differently; I'd say he believes that 'moral facts' as conceived of by humans are a human-specific notion with no relevance to any other type of mind, but that they exist, and that he would place it between 2 and 3. Or more specifically, he'd endorse 3 with the caveat that if you restrict the domain of 'everyone' to humans, 2 would also be true.

I'd tentatively agree but don't feel informed enough to have a strong opinion or motivated to form one.

Wei Dai (12y)

Or more specifically, he'd endorse 3

The reason I said 3 or 4 is that it's not clear to me to what extent Eliezer thinks there are facts about how one ought to translate non-preferences into preferences (in a sense that is relevant to everyone, not just humans). I don't know if he has taken any position on this question.

with the caveat that if you restrict the domain of 'everyone' to humans, 2 would also be true.

Yes, assuming you mean to also restrict the domain of "most intelligent beings" to humans. However I think he would deny 2 as written.

VAuroch (12y)

You are of course correct about the intended domain-restriction.

I'd be surprised to hear an argument for how 4 was compatible with CEV or something like it, since lack of rigid general preference-creation would make convergence on a broad scale fairly implausible. And that conclusion does seem at odds with statements he's made. But I do see your point.

Lukas_Gloor (12y)

facts about what preferences one should have

The "should" here is not defined clearly enough (or at all!), even though this seems to be the central point in the debate. We have the intuition that the question is meaningful, but I suspect that it really isn't. I don't understand what this could possibly mean -- except for trivial cases where you already specify a goal. I would leave it at "Most intelligent beings in the multiverse share similar preferences", perhaps adding a qualifier like "evolved/intelligently designed". Note that this would then be answering a slightly different question than 3., 4. and 5.

My own view is roughly a 4.3 on the spectrum from 4. to 5.

The way "complexity of value" is used by Eliezer seems to suggest that he adheres to view 3, although I could well imagine him also going for 4 or 5.

I'm unsure about 6; I suspect/hope that you can just define "winning" clearly enough in whatever utility function you're interested in and decision theory will sort itself out. But maybe it's more complicated.

Tyrrell_McAllister (12y)

I'm not getting the essential difference between (3) and (4). It seems like (4) is just a special case of (3), in that (4)'s "process of reflection [that] is convergent at the individual level" could just be (3)'s "how to translate non-preferences ... into preferences".

Is the difference that, if (3) held, then there could be only one correct "process of reflection" of the kind described in (4), so that this one process would be the only correct path for all pre-rational intelligent agents to take to become rational?

Wei Dai (12y)

Is the difference that, if (3) held, then there could be only one correct "process of reflection" of the kind described in (4), so that this one process would be the only correct path for all pre-rational intelligent agents to take to become rational?

Yes, or at least there are right and wrong ways to reflect on what one wants, so that even if someone were to reach reflective equilibrium via a convergent process, it would make sense to say that they did it wrong and ended up with wrong values (or "wrong values for them").

Tyrrell_McAllister (12y)

even if someone were to reach reflective equilibrium via a convergent process, it would make sense to say that they did it wrong and ended up with wrong values (or "wrong values for them").

Thanks. I can sort-of imagine that some, but not all, ways of reaching reflective equilibrium could be "wrong", even if the values held in that equilibrium state could not be said to be "wrong".

But, under hypothesis (3), that's the most we could say, right? How could we go on to say that the agent ended up with "wrong values" if, under hypothesis (3), there is no fact of the matter about which values are "wrong"?

Or maybe a scenario (2.5) could be added intermediate between your (2) and your (3). Under this scenario, as in (3), there are no facts about what everyone should value. Nonetheless, for each individual agent, there is a fact about what that agent should value. However, as in (2), the typical agent will not converge exactly on its "correct" values. Instead, the typical agent will converge on its values "along with idiosyncratic values".

manueldelrio (4mo)

This was a nice and relatively short post as well. I started reading it from what I assume is an anti-realist position (ethics as something constructed, a framework of agreements between rational agents to enable cooperation and mutual benefit, and therefore something mostly procedural and contractual. Probably aligned with Hobbes and Gauthier, once I find the time to read them). I was unsurprised that, having chosen your 5 as the most similar to my views, you described it as 'moral anti-realist'. I have the impression that EAs and perhaps a lot of Rationalists seem to resonate a lot with Utilitarianism. I appreciate any suggestions of older posts to read in this regard (I am new to all this).

TAG (8mo)

Which problem are you trying to solve? What metaethics is, or what rational behaviour is?

Canaletto (9mo)

I would also add:

Most intelligent beings in the multiverse share similar preferences, because the processes by which they were created / came to exist share major parts / have similar constraints / it's a logical fact that some preferences are more likely to arise. So it's not a fact about the logical structure of the "having preferences, being an agent" thing, but of the "coming to exist in the multiverse" thing. And if you discover what kinds of preferences are likely, you can invert it without major obstacles and create an atypical agent, but you yourself are an agent who wouldn't want to do that, most likely.

Kind of like #2? But I'm not sure.

Cole Wyeth (1y)

I believe 3 is about right in principle but 5 describes humans today.

KnaveOfAllTrades (12y)

Thanks for posting this! Is this list drawn up hodge-podge, or is there some underlying process that generated it? How likely do you think it is to be exhaustive?

It looks like your list is somewhat methodical, arising from combinations of metaethical desiderata/varyingly optimistic projections of the project of value loading?

Are you able to put probabilities to the possibilities?

For example, it turns out there is no one decision theory that does better than every other decision theory in every situation, and there is no obvious or widely-agreed-upon way to determine which one "wins" overall.

I'm very confident that no decision theory does better than every other in every situation, insomuch as the decision theories are actually implemented. For any implementing agent, I can put that agent in the adversarial world where Omega instantly destroys any agent implementing that decision theory, assuming it has an instantiation in some world where that makes sense (e.g. where 'instantly' and 'destroy' make sense). This is what we would expect on grounds of Created Already In Motion and general No Free Lunch principles.

The only way I currently see to resolve this is something along the lines of having a measure of performance over instantiations of the decision theory, and some scoring rule over that measure over instantiations. Might be other ways, though.
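As a rough illustration of the two paragraphs above, here is a minimal toy sketch. Everything in it is an assumption made for illustration: the theory labels are placeholders (no decision theory is actually implemented), payoffs are just 0 or 1, and the helper names `omega_world`, `is_outperformed_somewhere`, and `weighted_score` are hypothetical. It shows that once adversarial worlds are allowed, every theory is beaten somewhere, and that a ranking only reappears after choosing a measure over worlds.

```python
from typing import Callable, List

# A world maps the label of the decision theory an agent implements to a payoff.
World = Callable[[str], float]

# Placeholder labels only; no actual decision theory is implemented here.
THEORIES: List[str] = ["CDT", "EDT", "UDT"]


def omega_world(target: str) -> World:
    """A world where Omega destroys (payoff 0) any agent implementing `target`."""
    return lambda theory: 0.0 if theory == target else 1.0


def is_outperformed_somewhere(theory: str, worlds: List[World]) -> bool:
    """True if, in at least one world, some rival gets a strictly higher payoff."""
    return any(
        world(rival) > world(theory)
        for world in worlds
        for rival in THEORIES
        if rival != theory
    )


def weighted_score(theory: str, worlds: List[World], measure: List[float]) -> float:
    """Expected payoff of `theory` under an assumed measure over worlds."""
    return sum(weight * world(theory) for world, weight in zip(worlds, measure))


# One adversarial world per theory.
worlds = [omega_world(t) for t in THEORIES]

# Every theory loses to a rival in the world built against it.
all_beaten = all(is_outperformed_somewhere(t, worlds) for t in THEORIES)

# A ranking only appears once a measure is chosen; this one is arbitrary.
skewed = [0.8, 0.1, 0.1]
scores = {t: weighted_score(t, worlds, skewed) for t in THEORIES}
```

With the skewed measure above, the theory targeted by the heavily weighted world scores lowest, which is the sense in which the choice of measure, rather than the theories themselves, does the work.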

Eliezer says that there are moral facts about what values every intelligence in the multiverse should have, but only humans are likely to discover these facts and be motivated by them.

Just to check, you mean

(A) For all I, there exists some M such that I should observe M

and not

(B) There exists some set of moral facts M, such that for each intelligence I, I should observe M

right?
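In quantifier notation, the two readings differ only in the order of the quantifiers. The predicate name ShouldObserve is an assumption introduced here for illustration, as is the subscript on M in (A), which marks that the moral facts may vary with the intelligence:

```latex
\text{(A)}\quad \forall I \;\exists M_I :\ \mathrm{ShouldObserve}(I, M_I)
\qquad\qquad
\text{(B)}\quad \exists M \;\forall I :\ \mathrm{ShouldObserve}(I, M)
```

(B) implies (A), but not the other way around: (A) is compatible with each intelligence having its own M.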

Tyrrell_McAllister (12y)

Just to check, you mean

(A) For all I, there exists some M such that I should observe M

and not

(B) There exists some set of moral facts M, such that for each intelligence I, I should observe M

right?

Eliezer uses "should" in an idiosyncratic way, which he thought (and maybe still thinks) would prevent a particular kind of confusion.

On this usage of "should", Eliezer would probably* endorse something very close to (B). However, the "should" is with respect to the moral values towards which human CEV points (in the actual world, not in some counterfactual or future world in which the human CEV is different). These values make up the M that is asserted to exist in (B). And, as far as M is concerned, it would probably be best if all intelligent agents observed M.

* I'm hedging a little bit because maybe, under some perverse circumstances, it would be moral for an agent to be unmoved by moral facts. To give a fictional example, apparently God was in such a circumstance when he hardened the heart of Pharaoh.

Nectanebo (12y)

So is this roughly one aspect of why MIRI's position on AI safety concerns differs from that of similar parties? That is, that they're generally more sympathetic to possibilities further away from 1 than their peers? I don't really know, but that's what the pebblesorters/value-is-fragile strain of thinking seems to suggest to me.

[anonymous] (12y)

That's one reason. As an example, Goertzel seems to fall somewhat in (1) with his cosmist manifesto.

But more importantly I think are issues of hard takeoff timeline and AGI design. The mainstream opinion, I think, is that a hard-takeoff would take years at the minimum, and there would be both sufficient time to recognize what is going on and to stop the experiment. Also MIRI seems for some reason to threat-model its AGI's as some sort of perfectly rational alien utility-maximizer, whereas real AGIs are implemented with all sorts of heuristic tricks that actually do a better job of emulating the quirky way humans think. Combined with the slow takeoff, projects like OpenCog intend to teach robot children in a preschool like environment, thereby value-loading them in the same way that we value-load our children.

torekp (12y)

Also MIRI seems for some reason to threat-model its AGI's as some sort of perfectly rational alien utility-maximizer, whereas real AGIs are implemented with all sorts of heuristic tricks that actually do a better job of emulating the quirky way humans think.

This is extremely important, and I hope you will write a post about it.

Nectanebo (12y)

Yeah, I was thinking of Goertzel as well.

So you don't think MIRI's work is all that useful? What probability would you assign to hard-takeoff happening of the speed they're worried about?

[anonymous] (12y)

Indistinguishable from zero, at least with current levels of technology. The mind is an immensely complex machine capable of processing information orders of magnitude faster than the largest HPC clusters. Why should we expect an early dumb intelligence running on mediocre hardware to recursively self-improve so quickly? The burden of proof rests with MIRI, I believe. (And I'm still waiting.)

Manfred (12y)

Well, then I'll attempt to mindread, and guess Eliezer's position is more like 5.

I'm not sure how much that's just an overly complicated way of saying "my own position is ~5, and projection is a thing," though. Does this mean I can reverse-mindread you and guess that, if you absolutely had to pick one of the six, you would pick 3 or maybe 4?

Mestroyer (12y)

~5, huh? Am I to credit?

Manfred (12y)

Nay, all credit (okay, not actually all credit) goes to the blue-minimizing robot.

hairyfigment (12y)

Since Eliezer somewhere said he wants FAI to extrapolate multiple past versions of you - I think he said, as many as possible - he seems to allow for 5.
