I'm generally a fan of pursuing this sort of moral realism of the ideals, but I want to point out one very hazardous amoral hole in the world that I don't think it will ever be able to bridge over for us, lest anyone assume otherwise and fall into the hole by being lax and building unaligned AGI because they think it will be kinder than it will be.
(I don't say this lightly: confidently assuming kindness that we won't get, as a result of overextended faith in moral realism, and thus adopting catastrophically bad alignment strategies, is a pattern I see shockingly often in abstract thinkers I have known. It's a very real thing.)
There seems to be a rule that inevitable power differentials actually have to be allowed to play out.
It only seems to apply to inevitable power differentials; interestingly, it doesn't seem to apply to situations where power differentials emerge by happenstance (for instance, differentials favoring whichever tribe unknowingly took up residence on copper-rich land before anyone knew about smelting). In those situations, FDT agents might choose to essentially redistribute: to consummate an old insurance policy against ending up on the bad end of colonization, to swap land, to send metal tools, to generally treat the less fortunate tribes equitably, to share their power. They certainly will if their utility function gives diminishing returns to power, and wealth in humans often seems to work that way, maybe relating to benefits from trade or something. (When the utility function gives increasing returns, on the other hand... well, let's not talk about that.)
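To make the diminishing-returns point concrete, here's a minimal sketch (my own illustration, not the commenter's; the numbers and the log utility are assumptions) of why two tribes who don't yet know who will end up on the copper would both agree to the insurance policy in advance:

```python
# A toy "insurance policy" behind a veil of ignorance: with a concave
# (diminishing-returns) utility function, committing to share power is
# better in expectation than gambling on being the lucky tribe.
# The specific numbers and the log utility are illustrative assumptions.
import math

rich, poor = 10.0, 1.0      # power of the tribe on copper vs. the tribe off it
u = math.log                # any concave utility exhibits diminishing returns

gamble = 0.5 * u(rich) + 0.5 * u(poor)   # expected utility with no agreement
share = u((rich + poor) / 2)             # utility if both commit to splitting

print(f"gamble: {gamble:.2f}, share: {share:.2f}")   # ~1.15 vs ~1.70
```

With a convex (increasing-returns) utility the comparison flips, which is the case the comment sets aside.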
But the insurance policy can't apply in every situation. Consider: it seems obviously wrong to extend moral equity to, for example, a hypothetical or fictional species that can't possibly emerge naturally, and which you'd then have to abiogenerate.
And this seems to apply to descendants of non-fictional extinct species too. You have to accept that for a species which evolved to strongly select itself for, say, over-exploiting its environment to irrecoverable degrees, and starved: even if it once existed, its descendants now don't, and couldn't have, so you don't owe them anything now.
It's obvious with chosen differentials (true neartermists, for instance, choose to continuously sell their power, because power over the future is less valuable to them than flourishing in the present). But I don't really know how to draw the line as crisply as we need between accidental and inevitable differentials.
I'll keep thinking about it.
Because of these shifts, a “selfish” agent using FDT can end up making choices more similar to the choices of an altruistic CDT agent than a selfish CDT agent, for reasons closely related to the traditional moral intuition of universalizability.
Can't you just make decisions using functions which optimize outcomes for a specific implementation? You'll need to choose how to aggregate scores under uncertainty, but this choice doesn't need to converge.
The debate over moral realism is often framed in terms of a binary question: are there ever objective facts about what’s moral to do in a given situation? The broader question of normative realism is framed in a similar way: are there ever objective facts about what’s rational to do in a given situation? But I think we can understand these topics better by reframing them in terms of the question: how much do normative beliefs converge or diverge as ontologies improve? In other words: let’s stop thinking about whether we can derive normativity from nothing, and start thinking about how much normativity we can derive from how little, given that we continue to improve our understanding of the world. The core intuition behind this approach is that, even if a better understanding of science and mathematics can’t directly tell us what we should value, it can heavily influence how our values develop over time.
Values under ontology improvements
By “ontology” I mean the set of concepts which we use to understand the world. Human ontologies are primarily formulated in terms of objects which persist over time, and which have certain properties and relationships. The details have changed greatly throughout history, though. To explain fire and disease, we used to appeal to spirits and curses; over time we removed them and added entities like phlogiston and miasmas; now we’ve removed those in turn and replaced them with oxidation and bacteria. In other cases, we still use old concepts, but with an understanding that they’re only approximations to more sophisticated ones - like absolute versus relative space and time. In other cases, we’ve added novel entities - like dark matter, or complex numbers - in order to explain novel phenomena.
I’d classify all of these changes as “improvements” to our ontologies. What specifically counts as an improvement (if anything) is an ongoing debate in the philosophy of science. For now, though, I’ll assume that readers share roughly common-sense intuitions about ontology improvement - e.g. the intuition that science has dramatically improved our ontologies over the last few centuries. Now imagine that our ontologies continue to dramatically improve as we come to better understand the world; and that we try to reformulate moral values from our old ontologies in terms of our new ontologies in a reasonable way. What might happen?
Here are two extreme options. Firstly, very similar moral values might end up in very different places, based on the details of how that reformulation happens, or just because the reformulation is quite sensitive to initial conditions. Or alternatively, perhaps even values which start off in very different places end up being very similar in the new ontology - e.g. because they turn out to refer to different aspects of the same underlying phenomenon. These, plus intermediate options between them, define a spectrum of possibilities. I’ll call the divergent end of this spectrum (which I’ve defended elsewhere) the “moral anti-realism” end, and the convergent end the “moral realism” end.
This will be much clearer with a few concrete examples (although note that these are only illustrative, because the specific beliefs involved are controversial). Consider two people with very different values: an egoist who only cares about their own pleasure, and a hedonic utilitarian. Now suppose that each of them comes to believe Parfit’s argument that personal identity is a matter of degree, so that now the concept of their one “future self” is no longer in their ontology. How might they map their old values to their new ontology? Not much changes for the hedonic utilitarian, but a reasonable egoist will start to place some value on the experiences of people who are “partially them”, who they previously didn’t care about at all. Even if the egoist’s priorities are still quite different from the utilitarian’s, their values might end up significantly closer together than they used to be.
An example going the other way: consider two deontologists who value non-coercion, and make significant sacrifices to avoid coercing others. Now consider an ontological shift where they start to think about themselves as being composed of many different subagents which care about different things - career, relationships, morality, etc. The question arises: does it count as “coercion” when one subagent puts a lot of pressure on the others, e.g. by inducing a strong feeling of guilt? It’s not clear that there’s a unique reasonable answer here. One deontologist might reformulate their values to only focus on avoiding coercion of others, even when they need to “force themselves” to do so. The other might decide that internal coercion is also something they care about avoiding, and reduce the extent to which they let their “morality” subagent impose its will on the others. So, from a very similar starting point, they’ve diverged significantly under (what we’re assuming is an) ontological improvement.
Other examples of big ontological shifts: converting from theism to atheism; becoming an illusionist about consciousness; changing one’s position on free will; changing one’s mind about the act-omission distinction (e.g. because the intuitions for why it’s important fall apart in the face of counterexamples); starting to believe in a multiverse (which has implications for infinite ethics); and many others which we can’t imagine yet. Some of these shifts might be directly prompted by moral debate - but I think that most “moral progress” is downstream of ontological improvements driven by scientific progress. Here I’m just defining moral progress as reformulating values into a better ontology, in any reasonable way - where a person on the anti-realist side of the spectrum expects that there are many possible outcomes of moral progress; but a person on the realist side expects there are only a few, or perhaps just one.
Normative realism
So far I’ve leaned heavily on the idea of a “reasonable” reformulation. This is necessary because there are always some possible reformulations which end up very divergent from others. (For example, consider the reformulation “given a new ontology, just pretend to have the old ontology, and act according to the old values”.) So in order for the framework I’ve given so far to not just collapse into anti-realism, we need some constraints on what’s a “reasonable” or “rational” way to shift values from one ontology to another.
Does this require that we commit to the existence of facts about what’s rational or irrational? Here I’ll just apply the same move as I did in the moral realism case. Suppose that we have a set of judgments or criteria about what counts as rational, in our current ontology. For example, our current ontology includes “beliefs”, “values”, “decisions”, etc; and most of us would classify the claim “I no longer believe that ‘souls’ are a meaningful concept, but I still value people’s souls” as irrational. But our ontologies improve over time. For example, Kahneman and Tversky’s work on dual process theory (as well as the more general distinction between conscious and unconscious processing) clarifies that “beliefs” aren’t a unified category - we have different types of beliefs, and different types of preferences too. Meanwhile, the ontological shifts I mentioned before (about personal identity, and internal subagents) also have ramifications for what we mean when talking about beliefs, values, etc. If we try to map our judgements of what’s reasonable into our new ontology in a reflectively consistent way (i.e. a way that balances between being “reasonable” according to our old criteria, and “reasonable” according to our new criteria), what happens? Do different conceptions of rationality converge, or diverge? If they strongly converge (the “normative realist” position) then we can just define reasonableness in terms of similarity to whatever conception of rationality we’d converge to under ontological improvement. If they strongly diverge, then…well, we can respond however we’d like; anything goes!
I’m significantly more sympathetic to normative realism as a whole than moral realism, in particular because of various results in probability theory, utility theory, game theory, decision theory, machine learning, etc, which are providing increasingly strong constraints on rational behavior (e.g. by constructing different types of Dutch books). In the next section, I’ll discuss one theory which led me to a particularly surprising ontological shift, and made me much more optimistic about normative realism. Having said that, I’m not as bullish on normative realism as some others; my best guess is that we’ll make some discoveries which significantly improve our understanding of what it means to be rational, but others which show us that there’s no “complete” understanding to be had (analogous to mathematical incompleteness theorems).
Functional decision theory as an ontological shift
There’s one particular ontological shift which inspired this essay, and which I think has dragged me significantly closer to the moral/normative realist end of the spectrum. I haven’t mentioned it so far, since it’s not very widely accepted, but I’m confident enough that there’s something important there that I’d like to discuss it now. The ontological shift is the one from Causal Decision Theory (CDT) to Functional Decision Theory (FDT). I won’t explain this in detail, but in short: CDT tells us to make decisions using an ontology based on the choices of individual agents. FDT tells us to make decisions using an ontology based on the choices of functions which may be implemented in multiple agents (expanding the concepts of causation and possible worlds to include logical causation and counterpossible worlds).
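To give a flavour of what this shift looks like in practice, here is a minimal sketch of a Prisoner’s Dilemma played against an exact copy of yourself. The payoffs and function names are my own illustrative assumptions, not a definitive formalization of either theory:

```python
# Prisoner's Dilemma against an exact copy of yourself. Payoffs are the
# standard illustrative ones; this is a sketch, not a full formalization.
PAYOFFS = {  # (my_move, copy_move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def cdt_choice(copy_move):
    # CDT's ontology: the copy is a separate agent whose move is causally
    # fixed by the time I decide, so I just pick the best response to it.
    return max(["C", "D"], key=lambda me: PAYOFFS[(me, copy_move)])

def fdt_choice():
    # FDT's ontology: one decision function, implemented in both agents.
    # Whatever it outputs, both agents play, so compare (C, C) against (D, D).
    return max(["C", "D"], key=lambda move: PAYOFFS[(move, move)])

# Defection dominates for CDT whatever it expects the copy to do...
assert cdt_choice("C") == "D" and cdt_choice("D") == "D"   # ends up at (D, D): 1
# ...while FDT cooperates, landing both copies at (C, C): 3.
assert fdt_choice() == "C"
```

The selfish FDT agent ends up cooperating for reasons a selfish CDT agent never would, which is the behaviour the next paragraph describes.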
Because of these shifts, a “selfish” agent using FDT can end up making choices more similar to the choices of an altruistic CDT agent than a selfish CDT agent, for reasons closely related to the traditional moral intuition of universalizability. FDT is still a very incomplete theory, but I find this a very surprising and persuasive example of how ontological improvements might drive convergence towards some aspects of morality, which made me understand for the first time how moral realism might be a coherent concept! (Another very interesting but more speculative point: one axis on which different versions of FDT vary is how “updateless” they are. Although we don’t know how to precisely specify updatelessness, increasingly updateless agents behave as if they’re increasingly altruistic, even towards other agents who could never reciprocate.)
Being unreasonable
Suppose an agent looks at a reformulation to a new ontology, and just refuses to accept it - e.g. “I no longer believe that ‘souls’ are a meaningful concept, but I still value people’s souls”. Well, we could tell them that they were being irrational; and most such agents care enough about rationality that this is a forceful objection. I think the framing I’ve used in this document makes this argument particularly compelling - when you move to a new ontology in which your old concepts are clearly inadequate or incoherent, then it’s pretty hard to defend the use of those old concepts. (This is a reframing of the philosophical debate on motivational internalism.)
But what if they said “I believe that I am being irrational, but I just refuse to stop being irrational”; how could we respond then? The standard answer is that we say “you lose” - we explain how we’ll be able to exploit them (e.g. via Dutch books). Even when abstract “irrationality” is not compelling, “losing” often is. Again, that’s particularly true under ontology improvement. Suppose an agent says “well, I just won’t take bets from Dutch bookies”. But then, once they’ve improved their ontology enough to see that all decisions under uncertainty are a type of bet, they can’t do that - or at least they need to be much more unreasonable to do so.
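For readers who haven’t seen a Dutch book spelled out, here is a minimal sketch (the scenario and numbers are my own illustrative assumptions): an agent whose credences in an outcome and its negation sum to more than 1 will accept a pair of bets that guarantee a loss.

```python
# A minimal Dutch book against incoherent credences. The scenario and numbers
# are illustrative assumptions, not taken from the essay.
credences = {"rain": 0.6, "no rain": 0.6}   # sums to 1.2: incoherent

# The bookie sells one bet per outcome: each costs the agent's credence in
# that outcome and pays 1 if it occurs. By the agent's own lights each bet
# is (just barely) fair, so it accepts both.
total_paid = sum(credences.values())        # 1.2 paid up front

for outcome in credences:
    payout = 1.0                            # exactly one of the bets pays off
    print(f"if '{outcome}': net = {payout - total_paid:+.2f}")   # -0.20 either way
```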
None of this is particularly novel. But one observation that I haven’t seen before: the “you lose” argument becomes increasingly compelling the bigger the world is. Suppose you and I only care about our wealth, but I use a discount rate 1% higher than yours. You tell me “look, in a century’s time I’ll end up twice as rich as you”. It might not be that hard for me to say “eh, whatever”. But suppose you tell me “we’re going to live for a millennium, after which I’ll predictably end up 20,000 times richer than you” - now it feels like a wealth-motivated agent would need to be much more unreasonable to continue applying high discounts. Or suppose that I’m in a Pascal’s mugging scenario where I’m promised very high rewards with very low probability. If I just shrug and say “I’m going to ignore all probabilities lower than one in a million”, then it might be pretty tricky to exploit me - a few simple heuristics might be able to prevent me from being Dutch-booked. But suppose now that we live in a multiverse where every possible outcome plays out, in proportion to how likely it is. Now ignoring small probabilities could cause me to lose a large amount of value in a large number of multiverse branches - something which hooks into our intuitive sense of “unreasonableness” much more strongly than the idea of “ignoring small probabilities” does in the abstract. (Relatedly, I don’t think it’s a coincidence that utilitarianism has become so much more prominent in the same era in which we’ve become so much more aware of the vastness of the universe around us.)
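The wealth numbers above follow from simple compounding; here is a quick check, assuming (as an illustrative simplification) that the 1% difference in discount rates translates into a one-percentage-point gap in annual wealth growth:

```python
# Quick check of the compounding arithmetic: a 1 percentage point annual gap
# compounds to roughly 2.7x over a century and ~21,000x over a millennium,
# consistent with "twice as rich" and "20,000 times richer" above.
gap = 1.01

print(f"after 100 years:  {gap ** 100:,.1f}x")    # ~2.7x
print(f"after 1000 years: {gap ** 1000:,.0f}x")   # ~20,959x
```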
Why am I talking so much about “reasonableness” and moral persuasion? After all, agents which are more rational will tend to survive more often, acquire more resources, and become more influential: in the long term, evolution will do the persuasion for us. But it’s not clear that the future will be shaped by evolutionary pressures - it might be shaped by the decisions of goal-directed agents. Our civilization might be able to “lock in” certain constraints - like enough centralization of decision-making that the future is steered by arguments rather than evolution. And thinking about convergence towards rationality also gives us a handle for reasoning about artificial intelligence. In particular, it would be very valuable to know how much applying a minimal standard of reasonableness to their decisions would affect how goal-directed they’ll be, and how aligned their goals will be with our own.
How plausible is this reasoning?
I’ve been throwing around a lot of high-level concepts here, and I wouldn’t blame readers for feeling suspicious or confused. Unfortunately, I don’t have the time to make them clearer. In lieu of that, I’ll briefly mention three intuitions which contribute towards my belief that the position I’ve sketched in this document is a useful one.
Firstly, I see my reframing as a step away from essentialism, which seems to me to be the most common mistake in analytic philosophy. Sometimes it’s pragmatically useful to think in terms of clear-cut binary distinctions, but in general we should almost always aim to be able to ground out those binary distinctions in axes of continuous variation, to avoid our standard bias towards essentialism. In particular, the moral realism debate tends to focus on a single binary question (do agents converge to the same morality given no pre-existing moral commitments?), whereas I think it’d be much more insightful to focus on a less binary question (how small or large is the space of pre-existing moral commitments which will converge?).
Secondly, there’s a nice parallel between the view of morality which I’ve sketched out here, and the approach some mathematicians take, of looking at different sets of axioms to see whether they lead to similar or different conclusions. In our case, we’d like to understand whether similar starting intuitions and values will converge or diverge under a given approach to ontological reformulation. (I discuss the ethics-mathematics analogy in more detail here.) If we can make progress on meta-ethics by actually answering object-level questions like “how would my values change if I believed X?”, that helps address another common mistake in philosophy - failing to link abstract debates to concrete examples which can be deeply explored to improve philosophers’ intuitions about the problem.
And thirdly, I think this framing fits well with our existing experiences. Our values are strongly determined by evolved instincts and emotions, which operate using a more primitive ontology than the rest of our brains. So we’ve actually got plenty of experience of struggling to shift various values from one ontology to another - of the ways in which some people manage to do so, while others remain unreasonable throughout. We just need to imagine this process continuing as we come to understand the world far better than we do today.