Six Plausible Meta-Ethical Alternatives
In this post, I list six metaethical possibilities that I think are plausible, along with some arguments or plausible stories about how/why they might be true, where that's not obvious. A lot of people seem fairly certain in their metaethical views, but I'm not and I want to convey my uncertainty as well as some of the reasons for it.
1. Most intelligent beings in the multiverse share similar preferences. This came about because there are facts about what preferences one should have, just like there exist facts about what decision theory one should use or what prior one should have, and species that manage to build intergalactic civilizations (or the equivalent in other universes) tend to discover all of these facts. There are occasional paperclip maximizers that arise, but they are a relatively minor presence or tend to be taken over by more sophisticated minds.
2. Facts about what everyone should value exist, and most intelligent beings have a part of their mind that can discover moral facts and find them motivating, but those parts don't have full control over their actions. These beings eventually build or become rational agents with values that represent compromises between different parts of their minds, so most intelligent beings end up having shared moral values along with idiosyncratic values.
3. There aren't facts about what everyone should value, but there are facts about how to translate non-preferences (e.g., emotions, drives, fuzzy moral intuitions, circular preferences, non-consequentialist values, etc.) into preferences. These facts may include, for example, what is the right way to deal with ontological crises. The existence of such facts seems plausible because if there were facts about what is rational (which seems likely) but no facts about how to become rational, that would seem like a strange state of affairs.
4. None of the above facts exist, so the only way to become or build a rational agent is to just think about what preferences you want your future self or your agent to hold, until you make up your mind in some way that depends on your psychology. But at least this process of reflection is convergent at the individual level so each person can reasonably call the preferences that they endorse after reaching reflective equilibrium their morality or real values.
5. None of the above facts exist, and reflecting on what one wants turns out to be a divergent process (e.g., it's highly sensitive to initial conditions, like whether or not you drank a cup of coffee before you started, or to the order in which you happen to encounter philosophical arguments). There are still facts about rationality, so at least agents that are already rational can call their utility functions (or the equivalent of utility functions in whatever decision theory ends up being the right one) their real values.
6. There aren't any normative facts at all, including facts about what is rational. For example, it turns out there is no one decision theory that does better than every other decision theory in every situation, and there is no obvious or widely-agreed-upon way to determine which one "wins" overall.
(Note that for the purposes of this post, I'm concentrating on morality in the axiological sense (what one should value) rather than in the sense of cooperation and compromise. So alternative 1, for example, is not intended to include the possibility that most intelligent beings end up merging their preferences through some kind of grand acausal bargain.)
It may be useful to classify these possibilities using labels from academic philosophy. Here's my attempt: 1. realist + internalist 2. realist + externalist 3. relativist 4. subjectivist 5. moral anti-realist 6. normative anti-realist. (A lot of debates in metaethics concern the meaning of ordinary moral language, for example whether moral statements refer to facts or merely express attitudes. I mostly ignore such debates in the above list, because it's not clear what implications they have for the questions that I care about.)
One question LWers may have is, where does Eliezer's metaethics fall into this schema? Eliezer says that there are moral facts about what values every intelligence in the multiverse should have, but only humans are likely to discover these facts and be motivated by them. To me, Eliezer's use of language is counterintuitive, and since it seems plausible that there are facts about what everyone should value (or how each person should translate their non-preferences into preferences) that most intelligent beings can discover and be at least somewhat motivated by, I'm reserving the phrase "moral facts" for these. In my language, I think 3 or maybe 4 is probably closest to Eliezer's position.
Three Approaches to "Friendliness"
I put "Friendliness" in quotes in the title, because I think what we really want, and what MIRI seems to be working towards, is closer to "optimality": create an AI that minimizes the expected amount of astronomical waste. In what follows I will continue to use "Friendly AI" to denote such an AI since that's the established convention.
I've often stated my objections to MIRI's plan to build an FAI directly (instead of after human intelligence has been substantially enhanced). But it's not because, as some have suggested while criticizing MIRI's FAI work, we can't foresee what problems need to be solved. I think it's because we can largely foresee what kinds of problems need to be solved to build an FAI, but they all look superhumanly difficult, either due to their inherent difficulty, or the lack of opportunity for "trial and error", or both.
When people say they don't know what problems need to be solved, they may be mostly talking about "AI safety" rather than "Friendly AI". If you think in terms of "AI safety" (i.e., making sure some particular AI doesn't cause a disaster) then that does look like a problem that depends on what kind of AI people will build. "Friendly AI" on the other hand is really a very different problem, where we're trying to figure out what kind of AI to build in order to minimize astronomical waste. I suspect this may explain the apparent disagreement, but I'm not sure. I'm hoping that explaining my own position more clearly will help figure out whether there is a real disagreement, and what's causing it.
The basic issue I see is that there is a large number of serious philosophical problems facing an AI that is meant to take over the universe in order to minimize astronomical waste. The AI needs a full solution to moral philosophy to know which configurations of particles/fields (or perhaps which dynamical processes) are most valuable and which are not. Moral philosophy in turn seems to have dependencies on the philosophy of mind, consciousness, metaphysics, aesthetics, and other areas. The FAI also needs solutions to many problems in decision theory, epistemology, and the philosophy of mathematics, in order to not be stuck with making wrong or suboptimal decisions for eternity. These essentially cover all the major areas of philosophy.
For an FAI builder, there are three ways to deal with the presence of these open philosophical problems, as far as I can see. (There may be other ways for the future to turn out well without the AI builders making any special effort, for example if being philosophical is just a natural attractor for any superintelligence, but I don't see any way to be confident of this ahead of time.) I'll name them for convenient reference, but keep in mind that an actual design may use a mixture of approaches.
- Normative AI - Solve all of the philosophical problems ahead of time, and code the solutions into the AI.
- Black-Box Metaphilosophical AI - Program the AI to use the minds of one or more human philosophers as a black box to help it solve philosophical problems, without the AI builders understanding what "doing philosophy" actually is.
- White-Box Metaphilosophical AI - Understand the nature of philosophy well enough to specify "doing philosophy" as an algorithm and code it into the AI.
The problem with Normative AI, besides the obvious inherent difficulty (as evidenced by the slow progress of human philosophers after decades, sometimes centuries of work), is that it requires us to anticipate all of the philosophical problems the AI might encounter in the future, from now until the end of the universe. We can certainly foresee some of these, like the problems associated with agents being copyable, or the AI radically changing its ontology of the world, but what might we be missing?
Black-Box Metaphilosophical AI is also risky, because it's hard to test/debug something that you don't understand. Besides that general concern, designs in this category (such as Paul Christiano's take on indirect normativity) seem to require that the AI achieve superhuman levels of optimizing power before being able to solve its philosophical problems, which seems to mean that a) there's no way to test them in a safe manner, and b) it's unclear why such an AI won't cause disaster in the time period before it achieves philosophical competence.
White-Box Metaphilosophical AI may be the most promising approach. There is no strong empirical evidence that solving metaphilosophy is superhumanly difficult, simply because not many people have attempted to solve it. But I don't think that a reasonable prior combined with what evidence we do have (i.e., absence of visible progress or clear hints as to how to proceed) gives much reason for optimism either.
To recap, I think we can largely already see what kinds of problems must be solved in order to build a superintelligent AI that will minimize astronomical waste while colonizing the universe, and it looks like they probably can't be solved correctly with high confidence until humans become significantly smarter than we are now. I think I understand why some people disagree with me (e.g., Eliezer thinks these problems just aren't that hard, relative to his abilities), but I'm not sure why some others say that we don't yet know what the problems will be.
What do professional philosophers believe, and why?
LessWrong has twice discussed the PhilPapers Survey of professional philosophers' views on thirty controversies in their fields — in early 2011 and, more intensively, in late 2012. We've also been having some lively debates, prompted by LukeProg, about the general value of contemporary philosophical assumptions and methods. It would be swell to test some of our intuitions about how philosophers go wrong (and right) by looking closely at the aggregate output and conduct of philosophers, but relevant data is hard to come by.
Fortunately, Davids Chalmers and Bourget have done a lot of the work for us. They released a paper summarizing the PhilPapers Survey results two days ago, identifying, by factor analysis, seven major components consolidating correlations between philosophical positions, influences, areas of expertise, etc.
1. Anti-Naturalists: Philosophers of this stripe tend (more strongly than most) to assert libertarian free will (correlation with factor .66), theism (.63), the metaphysical possibility of zombies (.47), and A theories of time (.28), and to reject physicalism (.63), naturalism (.57), personal identity reductionism (.48), and liberal egalitarianism (.32).
Anti-Naturalists tend to work in philosophy of religion (.3) or Greek philosophy (.11). They avoid philosophy of mind (-.17) and cognitive science (-.18) like the plague. They hate Hume (-.14), Lewis (-.13), Quine (-.12), analytic philosophy (-.14), and being from Australasia (-.11). They love Plato (.13), Aristotle (.12), and Leibniz (.1).
2. Objectivists: They tend to accept 'objective' moral values (.72), aesthetic values (.66), abstract objects (.38), laws of nature (.28), and scientific posits (.28). Note 'Objectivism' is being used here to pick out a tendency to treat value as objectively binding and metaphysical posits as objectively real; it isn't connected to Ayn Rand.
A disproportionate number of objectivists work in normative ethics (.12), Greek philosophy (.1), or philosophy of religion (.1). They don't work in philosophy of science (-.13) or biology (-.13), and aren't continentalists (-.12) or Europeans (-.14). Their favorite philosopher is Plato (.1), least favorites Hume (-.2) and Carnap (-.12).
3. Rationalists: They tend to self-identify as 'rationalists' (.57) and 'non-naturalists' (.33), to accept that some knowledge is a priori (.79), and to assert that some truths are analytic, i.e., 'true by definition' or 'true in virtue of meaning' (.72). They also tend to posit metaphysical laws of nature (.34) and abstracta (.28). 'Rationalist' here clearly isn't being used in the LW or freethought sense; philosophical rationalists as a whole in fact tend to be theists.
Rationalists are wont to work in metaphysics (.14), and to avoid thinking about the sciences of life (-.14) or cognition (-.1). They are extremely male (.15), inordinately British (.12), and prize Frege (.18) and Kant (.12). They absolutely despise Quine (-.28, the largest correlation for a philosopher), and aren't fond of Hume (-.12) or Mill (-.11) either.
4. Anti-Realists: They tend to define truth in terms of our cognitive and epistemic faculties (.65) and to reject scientific realism (.6), a mind-independent and knowable external world (.53), metaphysical laws of nature (.43), and the notion that proper names have no meaning beyond their referent (.35).
They are extremely female (.17) and young (.15 correlation coefficient for year of birth). They work in ethics (.16), social/political philosophy (.16), and 17th-19th century philosophy (.11), avoiding metaphysics (-.2) and the philosophies of mind (-.15) and language (-.14). Their heroes are Kant (.23), Rawls (.14), and, interestingly, Hume (.11). They avoid analytic philosophy even more than the anti-naturalists do (-.17), and aren't fond of Russell (-.11).

5. Externalists: Really, they just like everything that anyone calls 'externalism'. They think the content of our mental lives in general (.66) and perception in particular (.55), and the justification for our beliefs (.64), all depend significantly on the world outside our heads. They also think that you can fully understand a moral imperative without being at all motivated to obey it (.5).
6. Star Trek Haters: This group is less clearly defined than the above ones. The main thing uniting them is that they're thoroughly convinced that teleportation would mean death (.69). Beyond that, Trekophobes tend to be deontologists (.52) who don't switch on trolley dilemmas (.47) and like A theories of time (.41).
Trekophobes are relatively old (-.1) and American (.13 affiliation). They are quite rare in Australia and Asia (-.18 affiliation). They're fairly evenly distributed across philosophical fields, and tend to avoid weirdo intuitions-violating naturalists — Lewis (-.13), Hume (-.12), analytic philosophers generally (-.11).
7. Logical Conventionalists: They two-box on Newcomb's Problem (.58), reject nonclassical logics (.48), and reject epistemic relativism and contextualism (.48). So they love causal decision theory, think all propositions/facts are generally well-behaved (always either true or false and never both or neither), and think there are always facts about which things you know, independent of who's evaluating you. Suspiciously normal.
They're also fond of a wide variety of relatively uncontroversial, middle-of-the-road views most philosophers agree about or treat as 'the default' — political egalitarianism (.33), abstract object realism (.3), and atheism (.27). They tend to think zombies are metaphysically possible (.26) and to reject personal identity reductionism (.26) — which aren't metaphysically innocent or uncontroversial positions, but, again, do seem to be remarkably straightforward and banal approaches to all these problems. Notice that a lot of these positions are intuitive and 'obvious' in isolation, but that they don't converge upon any coherent world-view or consistent methodology. They clearly aren't hard-nosed philosophical conservatives like the Anti-Naturalists, Objectivists, Rationalists, and Trekophobes, but they also clearly aren't upstart radicals like the Externalists (on the analytic side) or the Anti-Realists (on the continental side). They're just kind of, well... obvious.
Conventionalists are the only identified group that are strongly analytic in orientation (.19). They tend to work in epistemology (.16) or philosophy of language (.12), and are rarely found in 17th-19th century (-.12) or continental (-.11) philosophy. They're influenced by notorious two-boxer and modal realist David Lewis (.1), and show an aversion to Hegel (-.12), Aristotle (-.11), and Wittgenstein (-.1).
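For readers who want the one-box/two-box split made concrete, here is a rough sketch of the evidential side of the calculation. The payoffs ($1,000,000 in the opaque box, $1,000 in the transparent one) are the conventional textbook figures, not something taken from the survey, and the function names are my own hypothetical ones.

```python
def evidential_expected_values(accuracy, big=1_000_000, small=1_000):
    """Expected payoff of each choice, treating the choice as evidence
    about what the predictor put in the opaque box.
    Payoffs are the conventional illustrative amounts (an assumption)."""
    one_box = accuracy * big                 # predictor right: opaque box is full
    two_box = (1 - accuracy) * big + small   # predictor wrong: opaque box is full anyway
    return one_box, two_box

def break_even_accuracy(big=1_000_000, small=1_000):
    """Predictor accuracy at which the two evidential expectations are equal."""
    return 0.5 + small / (2 * big)

one_box, two_box = evidential_expected_values(accuracy=0.99)
print(one_box > two_box)                  # True
print(round(break_even_accuracy(), 4))    # 0.5005
```

The causal decision theorist's reply is not captured by this arithmetic: holding the already-fixed contents of the opaque box constant, taking both boxes is always worth exactly `small` more, which is why two-boxers regard the comparison above as beside the point.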
An observation: Different philosophers rely on — and fall victim to — substantially different groups of methods and intuitions. A few simple heuristics, like 'don't believe weird things until someone conclusively demonstrates them' and 'believe things that seem to be important metaphysical correlates for basic human institutions' and 'fall in love with any views starting with "ext"', explain a surprising amount of diversity. And there are clear common tendencies to either trust one's own rationality or to distrust it in partial (Externalism) or pathological (Anti-Realism, Anti-Naturalism) ways. But the heuristics don't hang together in a single Philosophical World-View or Way Of Doing Things, or even in two or three such world-views.
There is no large, coherent, consolidated group that's particularly attractive to LWers across the board, but philosophers seem to fall short of LW expectations for some quite distinct reasons. So attempting to criticize, persuade, shame, praise, or even speak of or address philosophers as a whole may be a bad idea. I'd expect it to be more productive to target specific 'load-bearing' doctrines on dimensions like the above than to treat the group as a monolith, for many of the same reasons we don't want to treat 'scientists' or 'mathematicians' as monoliths.
Another important result: Something is going seriously wrong with the high-level training and enculturation of professional philosophers. Or fields are just attracting thinkers who are disproportionately bad at critically assessing a number of the basic claims their field is predicated on or exists to assess.
Philosophers working in decision theory are drastically worse at Newcomb than are other philosophers, two-boxing 70.38% of the time where non-specialists two-box 59.07% of the time (normalized after getting rid of 'Other' answers). Philosophers of religion are the most likely to get questions about religion wrong — 79.13% are theists (compared to 13.22% of non-specialists), and they tend strongly toward the Anti-Naturalism dimension. Non-aestheticians think aesthetic value is objective 53.64% of the time; aestheticians think it's objective 73.88% of the time. Working in epistemology tends to make you an internalist, philosophy of science tends to make you a Humean, metaphysics a Platonist, ethics a deontologist. This isn't always the case; but it's genuinely troubling to see non-expertise emerge as a predictor of getting any important question in an academic field right.
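The "normalized after getting rid of 'Other' answers" step amounts to dividing by the number of respondents who picked one of the substantive answers instead of by all respondents. A minimal sketch, using made-up tallies (the survey's raw counts are not reproduced in this post, and `normalized_share` is a hypothetical helper):

```python
def normalized_share(counts, answer, excluded=("Other",)):
    """Share of `answer` among responses that are not in `excluded`."""
    kept = {k: v for k, v in counts.items() if k not in excluded}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no responses left after exclusion")
    return kept[answer] / total

# Hypothetical tallies for illustration only, NOT the real PhilPapers numbers.
hypothetical = {"one-box": 20, "two-box": 30, "Other": 50}

raw = hypothetical["two-box"] / sum(hypothetical.values())
print(raw)                                         # 0.3
print(normalized_share(hypothetical, "two-box"))   # 0.6
```

The gap between the raw and normalized figures grows with the size of the 'Other' group, so comparisons between specialists and non-specialists are only meaningful when both are normalized the same way.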
EDIT: I've replaced "cluster" talk above with "dimension" talk. I had in mind gjm's "clusters in philosophical idea-space", not distinct groups of philosophers. gjm makes this especially clear:
The claim about these positions being made by the authors of the paper is not, not even a little bit, "most philosophers fall into one of these seven categories". It is "you can generally tell most of what there is to know about a philosopher's opinions if you know how well they fit or don't fit each of these seven categories". Not "philosopher-space is mostly made up of these seven pieces" but "philosopher-space is approximately seven-dimensional".
I'm particularly guilty of promoting this misunderstanding (including in portions of my own brain) by not noting that the dimensions can be flipped to speak of (anti-anti-)naturalists, anti-rationalists, etc. My apologies. As Douglas_Knight notes below, "If there are clusters [of philosophers], PCA might find them, but PCA might tell you something interesting even if there are no clusters. But if there are clusters, the factors that PCA finds won't be the clusters, but the differences between them. [...] Actually, factor analysis pretty much assumes that there aren't clusters. If factor 1 put you in a cluster, that would tell pretty much all there is to say and would pin down your factor 2, but the idea in factor analysis is that your factor 2 is designed to be as free as possible, despite knowing factor 1."
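Douglas_Knight's point that the factors pick out differences between clusters, not the clusters themselves, can be checked on a toy example. The sketch below uses plain PCA on hypothetical 2-D data (two tight, invented "clusters of philosophers"); it is not the factor analysis procedure used in the paper, and the data and function name are assumptions made for illustration.

```python
import math

def first_principal_direction(points):
    """Unit vector along the first principal component of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed form for the major axis of a symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))

# Toy data (hypothetical): two tight clusters in a 2-D "opinion space".
offsets = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
center_a, center_b = (-2.0, -1.0), (2.0, 1.0)
points = [(cx + dx, cy + dy)
          for cx, cy in (center_a, center_b)
          for dx, dy in offsets]

pc1 = first_principal_direction(points)
diff = (center_b[0] - center_a[0], center_b[1] - center_a[1])
# Absolute cosine between PC1 and the line joining the two cluster centers.
alignment = abs(pc1[0] * diff[0] + pc1[1] * diff[1]) / math.hypot(*diff)
print(round(alignment, 6))  # 1.0
```

The first component lines up with the direction from one cluster center to the other. It describes the axis along which the groups differ, and says nothing by itself about how many groups there are or whether there are any.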
Normativity and Meta-Philosophy
I find Eliezer's explanation of what "should" means to be unsatisfactory, and here's an attempt to do better. Consider the following usages of the word:
1. You should stop building piles of X pebbles because X = Y*Z.
2. We should kill that police informer and dump his body in the river.
3. You should one-box in Newcomb's problem.
All of these seem to be sensible sentences, depending on the speaker and intended audience. #1, for example, seems a reasonable translation of what a pebblesorter would say after discovering that X = Y*Z. Some might argue for "pebblesorter::should" instead of plain "should", but it's hard to deny that we need "should" in some form to fill the blank there for a translation, and I think few people besides Eliezer would object to plain "should".
Normativity, or the idea that there's something in common about how "should" and similar words are used in different contexts, is an active area in academic philosophy. I won't try to survey the current theories, but my current thinking is that "should" usually means "better according to some shared, motivating standard or procedure of evaluation", but occasionally it can also be used to instill such a standard or procedure of evaluation in someone (such as a child) who is open to being instilled by the speaker/writer.
It seems to me that different people (including different humans) can have different motivating standards and procedures of evaluation, and apparent disagreements about "should" sentences can arise from having different standards/procedures or from disagreement about whether something is better according to a shared standard/procedure. In most areas my personal procedure of evaluation is something that might be called "doing philosophy" but many people apparently do not share this. For example a religious extremist may have been taught by their parents, teachers, or peers to follow some rigid moral code given in their holy books, and not be open to any philosophical arguments that I can offer.
Of course this isn't a fully satisfactory theory of normativity since I don't know what "philosophy" really is (and I'm not even sure it really is a thing). But it does help explain how "should" in morality might relate to "should" in other areas such as decision theory, does not require assuming that all humans ultimately share the same morality, and avoids the need for linguistic contortions such as "pebblesorter::should".