Epistemic status: This is basically a new categorisation scheme for, and analysis of, ideas that other people have proposed previously (both in relation to moral philosophy and in relation to AI alignment). I’m not an expert on the topics I cover here, and I’d appreciate feedback or comments in relation to any mistakes, unclear phrasings, etc. (and just in general!).
We are often forced to make decisions under conditions of uncertainty. This may be empirical uncertainty (e.g., what is the likelihood that nuclear war would cause human extinction?), or it may be moral uncertainty (e.g., is the wellbeing of future generations morally important?).
But what if you don’t believe that “morally important” is a coherent concept? What if you’re a moral antirealist and/or subjectivist, and thus reject the idea that there are any (objective) moral facts? Would existing work on moral uncertainty (see my prior posts) still be relevant to you?
I think that a lot of it is, to a large extent, for the reasons discussed in this footnote.[1] But I think that, to directly discuss how that work is relevant to antirealists and/or subjectivists, it would help to speak not of moral uncertainty but of value uncertainty (VU; i.e., uncertainty about what one “values” or “prefers”). Doing so also helps us to categorise different types of VU, and potential ways of resolving each of these types of VU. A final benefit is that such an analysis of VUs also has substantial relevance for moral realists, and for work on AI alignment.
So this post will:

Clarify what I mean by “values”

Name, describe, and suggest responses to four (overlapping) types of VU

Discuss two types of situations that could look like VU but aren’t

Discuss who and what these ideas are useful for
Values
I should clarify a few points about what I mean by “values” in this post:
I mean what a person actually values (or would if they knew more, or will in the future, or would if “idealised”), not what a person’s explicitly endorsed moral theory suggests they should value.
For example, after a bunch of caveats and moral uncertainty, I roughly identify as a classical utilitarian. However, in reality, psychologically speaking, I also value things other than whatever maximally increases the wellbeing of conscious beings.
I’ll often talk about “a person’s” values, but this could equally mean one’s own values, someone else’s values, a group’s values, or the values of humanity as a whole.
Types of value uncertainty
I’ll now name, briefly describe, and briefly suggest responses to four (overlapping) types of VU, and then two types of situations that aren’t VU but could appear to be VU. I hope to later write a post for each type of VU (and the two related situations), where I’ll go into more detail and highlight more connections to prior work (e.g., by Kaj Sotala and Justin Shovelain).
Note that this is not the only (and perhaps not the best) way to categorise types of VU or frame this sort of discussion. It also may leave out important types. I’m open to feedback about all of this, and in fact one motivation for summarising this categorisation scheme here and then later writing more about each type is that doing so allows those later posts to be influenced by feedback on this one.
Present
Description/cause: Present VU is uncertainty about a person’s (or group’s) current values. This occurs when multiple different sets of underlying values could explain the data (i.e., the behaviours you’ve observed from the person),[2] essentially creating a standard curve fitting problem.
One cause of Present VU is a lack of knowledge about aspects of the person other than their values, such as their decision theory, their rationality, their capabilities, and/or their beliefs.
But Present VU can also occur when, even holding constant all of the above factors, different sets of values would lead to the same behaviours (e.g., breathing, or following convergent instrumental goals) in a given circumstance.
This type of VU seems similar to the focus of inverse reinforcement learning. However, with Present VU, the “learner” may not be an AI. (In fact, Present VU may involve you trying to learn your own values, in which case it could perhaps be thought of as “Introspective” VU.)
Potential ways to resolve this: Gather more data, or do more thinking, regarding the person’s decision theory, rationality, capabilities, and/or beliefs.
Think about the assumptions you’re making about those factors. Try making different assumptions, and/or “minimal” assumptions. (Similar ideas, and some difficulties with them, have been discussed before by Armstrong and Worley, among others.)
Observe more of the person’s behaviours, ideally under different circumstances.
(Similar data and thinking regarding other people’s behaviours, rationality, etc. may also help to some extent. E.g., evidence about the degree to which people in general hyperbolically discount things could help you interpret the behaviour of some other person for whom there is no such data.)
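To make the “curve fitting” framing above concrete, here is a minimal sketch (in the spirit of inverse reinforcement learning) of inferring which of several candidate value functions best explains observed behaviour. Everything here is an illustrative assumption: the candidate value functions, the numbers, and the softmax-rationality model of how values produce choices.

```python
import math

# Two hypothetical candidate value functions that could each explain
# the same observed choices -- the "curve fitting" problem from the text.
actions = ["donate", "save", "spend"]

candidate_values = {
    "altruist": {"donate": 1.0, "save": 0.3, "spend": 0.1},
    "frugal":   {"donate": 0.2, "save": 1.0, "spend": 0.1},
}

def likelihood(action, values, rationality=2.0):
    """Softmax-rationality model: the agent noisily favours
    higher-valued actions (bounded, not perfect, rationality)."""
    exps = {a: math.exp(rationality * values[a]) for a in actions}
    return exps[action] / sum(exps.values())

def posterior(observed_actions, prior):
    """Bayesian update over candidate value functions given behaviour."""
    post = dict(prior)
    for action in observed_actions:
        for name in post:
            post[name] *= likelihood(action, candidate_values[name])
        norm = sum(post.values())
        post = {name: p / norm for name, p in post.items()}
    return post

print(posterior(["donate", "donate", "save"], {"altruist": 0.5, "frugal": 0.5}))
```

Note how the sketch bakes in an assumption about the person’s rationality; changing the `rationality` parameter (as suggested above, trying different or “minimal” assumptions) can change which value function the same behaviour appears to support.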
Informational
Description/cause: Informational VU is uncertainty about what a person’s (or group’s) values would be if their knowledge or beliefs improved.[3]
We could divide the potential sources of improved knowledge or beliefs into three categories: new experiences, new facts, and improvements in ontologies.
Potential ways to resolve this: Think about (or use models/simulations to work out) what new experiences, new facts, or improvements in ontologies would be most likely to affect the person’s values, and how the person’s values would change in response. (This could perhaps be informed by ideas from value of information analysis and sensitivity analysis.)
Try to expose the person to, or teach them about, these (or other) new experiences, new facts, and improvements in ontologies. (Note that this would involve not just predicting but actually causing changes in values. This should be done cautiously, if at all.)
For resolving uncertainty about how your own values would change if your knowledge or beliefs improved, this might look like just learning a lot, particularly about things you’re especially uncertain about and that seem especially relevant to your values.
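The value-of-information-style thinking mentioned above can be sketched very simply: for each candidate piece of information, estimate the probability of each possible finding and how much that finding would shift the person’s values, then prioritise by expected shift. All the candidate items, probabilities, and effect sizes below are illustrative assumptions, not real estimates.

```python
# For each hypothetical piece of information: a list of
# (probability of this finding, resulting shift in values, on a 0-1 scale).
candidate_info = {
    "facts about animal cognition":            [(0.6, 0.30), (0.4, 0.05)],
    "experiences of extreme poverty":          [(0.9, 0.20), (0.1, 0.00)],
    "an improved ontology of personal identity": [(0.3, 0.50), (0.7, 0.02)],
}

def expected_value_shift(findings):
    """Expected magnitude of the change in values from learning this."""
    return sum(p * shift for p, shift in findings)

# Rank candidate information sources by how much they'd be expected
# to affect the person's values.
ranked = sorted(candidate_info,
                key=lambda k: expected_value_shift(candidate_info[k]),
                reverse=True)
for name in ranked:
    print(f"{name}: {expected_value_shift(candidate_info[name]):.3f}")
```

This is only the crudest version of the idea; a sensitivity analysis would additionally vary the assumed probabilities and effect sizes to see how robust the ranking is.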
Predictive
Description/cause: Predictive VU is uncertainty about what a person’s (or group’s) values will be in the future.[5]
This overlaps with Present and Informational VU, because predicting a person’s future values requires some grasp of their current values, and because some of the changes that will occur in their values will come from changes (including improvements) in their knowledge or beliefs.
But Predictive VU also includes uncertainty about changes that will occur in a person’s values for other reasons, such as:
Potential ways to resolve this: The potential methods for resolving Present and Informational VU are relevant for parts of Predictive VU. E.g., thinking about what a person will learn about, and how it will affect their values, can help resolve parts of both Informational and Predictive VU.
Also, for all parts of this VU, techniques that are effective for prediction in general (e.g., reference class forecasting) should be useful. E.g., an aspiring effective altruist could predict that their values are likely to shift away from “typical EA values” over time, based on data indicating that that’s a common pattern, as well as the more general end-of-history illusion.
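The reference class forecasting idea in the EA example above can be sketched as a blend of an “outside view” base rate with the person’s own self-prediction, which the end-of-history illusion suggests is usually too optimistic about value stability. The base rates and weights below are illustrative assumptions, not real survey data.

```python
# Hypothetical reference class: of people who held "typical EA values"
# at year 0, what fraction still held them after n years?
reference_class = {1: 0.85, 3: 0.65, 5: 0.50}

def forecast_retention(years, inside_view=0.9, weight_on_base_rate=0.7):
    """Blend the outside-view base rate with the person's own
    self-prediction (inside view), weighting the base rate more
    heavily to correct for the end-of-history illusion."""
    base = reference_class[years]
    return weight_on_base_rate * base + (1 - weight_on_base_rate) * inside_view

print(f"P(values retained after 5 years) ~ {forecast_retention(5):.2f}")
```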
Idealised
Description/cause: Idealised VU is uncertainty about what a person’s (or group’s) values would be if those values were “idealised”, or about what values an “extrapolation” process (such as coherent extrapolated volition) would arrive at.
(Note that it seems to me that it’s hard to actually specify those key terms, and I won’t properly try to do so here; more details can be found in the links.)
This overlaps with the other types of VU, in that:
Potential ways to resolve this: This depends substantially on what we mean by the hard-to-specify terms involved. Also, it might be impossible or highly impractical to actually work out what values would result from the idealisation or extrapolation process, due to issues such as limited computing power.
But we can perhaps try to approximate such an idealisation or extrapolation process, or predict approximately what it would result in, using methods like:
Learning more
Engaging in more moral reflection
Trying to forecast (using best practices) what many simulations of the person would say if they’d had lots of time to learn and reflect more (see Muehlhauser)
Thinking about what apparent “moral progress” in the past has looked like, and what changes from current values might result from similar processes of change
(For details, see the sources linked to at the start of this subsection.)
Situations that could look like value uncertainty
I’ll now briefly discuss two other types of situations in which a person (or group) actually isn’t uncertain about their values, but could appear to be, or could even believe themselves to be.
Value conflict
Description/cause: Value conflict (VC) is when some or all of the values a person (or group) actually has are in conflict with each other. It’s like the person has multiple, competing utility functions, or different “parts of themselves” pushing them in different directions.
E.g., Dana is someone whose values include both maximising welfare and absolutely respecting people’s rights; it’s not simply that she’s uncertain which value she actually has deep down.
In some ways, the results of this can be the same as the results of VU (particularly Present VU). For this reason, the person’s situation may be misdiagnosed as VU by themselves or by others. (E.g., Dana may try to figure out which of those somewhat conflicting values she “really” has, rather than realising that she really has both.)
Potential ways to respond: It seems unclear whether VC is a “problem”, as opposed to an acceptable result of the fragility and complexity of our value systems. It thus also seems unclear whether and how one should try to “solve” it. That said, it seems like three of the most obvious options for “solving” it are to:
Engage in, approximate, or estimate the results of “idealisation” with regards to the conflicting values
(Note that, if one is trying to help someone else “solve” their VC, one might instead encourage or help that person to use this or the following options)
Use approaches similar to those meant for decision-making under moral uncertainty (see also this), except that here the person is actually certain about their values, so the “weight” given to each value is based on something like how “important” that value feels, rather than on one’s degree of belief in it
Embrace moral pluralism
(Related discussion can be found in the “Moral pluralism” section of this post.)
If the goal is just to understand the person or predict their behaviours (rather than helping them to “resolve” their conflict), then one might instead think about, model, or simulate what would happen if the person used one or more of the above options.
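The moral-uncertainty-style option above can be sketched as a weighted sum: each conflicting value scores the available options, and the weights reflect how “important” each value feels (not credence, since the person really holds both values). Dana’s values, the options, and all the numbers here are illustrative assumptions.

```python
options = ["break a promise to maximise welfare", "keep the promise"]

# Dana's two genuinely held, conflicting values, each scoring the options.
value_scores = {
    "maximise welfare": {options[0]: 0.9, options[1]: 0.4},
    "respect rights":   {options[0]: 0.0, options[1]: 1.0},
}

# Felt importance of each value (normalised) -- not degree of belief.
weights = {"maximise welfare": 0.6, "respect rights": 0.4}

def aggregate(option):
    """Weighted sum of each value's score for this option."""
    return sum(weights[v] * value_scores[v][option] for v in weights)

best = max(options, key=aggregate)
print(best, {o: round(aggregate(o), 2) for o in options})
```

Note that this glosses over the intertheoretic-comparison problems discussed in the moral uncertainty literature: it simply assumes the two values’ scores are on a common scale.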
Merely professed VU (or VC)
Description/cause: Merely professed VU (or merely professed VC) is when a person claims to be uncertain about their values (or to have multiple, conflicting values), despite this not being the case. They may do this for game-theoretic, signalling, or bargaining reasons.
An example of merely professed VU: Eric is certain about his values, and really wants to influence you to have similar values. But he also thinks that, if you believe that he’s uncertain and open to changing his mind, you’ll be more open to talking about values with him and to changing your mind. Thus, he feigns VU.
An example of merely professed VC: Fatma actually knows that she values only her own wellbeing, but she wishes to gain resources from altruists. To do so, she claims that there’s “part of her” that values benefiting only herself, and “another part of her” that values helping others.
(Perhaps this sort of thing could also play out on an unconscious level, so that the person themselves genuinely believes that they have VU or VC. But then it seems hard to disentangle this from actual VU or VC.)
Potential ways to respond: I haven’t thought much about this, and I think how to respond would depend a lot on what one wishes the responses to achieve and on the specific situation. It seems like often one should respond in the same ways that are generally useful when someone may be lying to you or trying to manipulate you.
Who and what are these ideas useful for?
Antirealists and/or subjectivists
As noted in the introduction, one purpose of this post is to explicitly discuss the ways in which (something like) moral uncertainty is relevant for moral antirealists and/or subjectivists.
Roughly speaking (see Joyce for details), a moral antirealist is someone who accepts one of the three following claims:
Noncognitivism: The position that moral sentences are neither true nor false; they are not beliefs or factual claims. For example, moral sentences might express one’s emotions (e.g., “Murder is bad” might mean something like “Murder - boo!”).
Error theory: The position that moral sentences are meant to be beliefs or factual claims, but are just never true, as there are simply no moral facts. Error theory is similar to “nihilism”.
Subjectivism (or non-objectivism): “moral facts exist and are mind-dependent (in the relevant sense)” (Joyce). In other words, moral claims can be true, but their truth or falsity depends on someone’s judgement, rather than being simply an objective fact about the universe.
(In contrast, a moral realist is someone who rejects all three of those claims. Thus, moral realists believe that moral sentences do (at least sometimes) reflect beliefs or factual claims, that they can sometimes be true, and that their truth or falsity is objective.)[6]
Of these types of antirealism, VU (and VC) is most clearly relevant in the case of subjectivism. For example, many subjectivists think that their own values (or their future or idealised values) are at least part of what determines the truth or falsity of moral claims. For these people, resolving uncertainty about their own values should seem very important. Other subjectivists may want to resolve uncertainty about the present, future, or idealised values of their society, of humanity as a whole, of all intelligent life, or something else like that (depending on what they think determines moral truth).
It’s less clear what special relevance VU would have for noncognitivists or error theorists (ignoring the argument that they should be metaethically uncertain about those positions). That said:
Moral realists
For moral realists, standard work on moral uncertainty is already clearly relevant. That said, VU still has additional relevance even for moral realists, because:
AI alignment
Ideally, we want our AIs to act in accordance with what we truly value (or what we’d value after some process of idealisation or CEV). Depending on definitions, this may be seen as the core of AI alignment, as one important part of AI alignment, or as at least a nice bonus (e.g., if we use Paul Christiano’s definition).
As such, recognising and resolving VUs (and VCs) seems of very clear relevance to AI alignment work. This seems somewhat evidenced by how many VU-related ideas I found in previous alignment-related work (e.g., value learning, inverse reinforcement learning, Stuart Armstrong’s research agenda, and CEV). Indeed, a major reason why I’m interested in the topic of VU is its relevance to AI alignment, and I hope that this post can provide useful concepts and framings for others who are also interested in AI alignment.
As mentioned earlier, please do comment if you think there are better categorisations/framings for this topic, better names, additional types worth mentioning, mistakes I’ve made, or whatever.
My thanks to Justin Shovelain and David Kristoffersson of Convergence Analysis for helpful discussions and feedback on this post.
Firstly, even someone “convinced” by antirealism and/or subjectivism probably shouldn’t be certain about those positions. Thus, such people should probably act as if metaethically uncertain, and that requires concepts and responses somewhat similar to those discussed in existing work on moral uncertainty. (See this post’s section on “Metaethical uncertainty”.)
Relevantly, MacAskill writes: “even if one endorsed a meta-ethical view that is inconsistent with the idea that there’s value in gaining more moral information, one should not be certain in that meta-ethical view. And it’s high-stakes whether that view is true — if there are moral facts out there but one thinks there aren’t, that’s a big deal! Even for this sort of antirealist, then, there’s therefore value in moral information, because there’s value in finding out for certain whether that meta-ethical view is correct.”
Secondly, a lot of existing work on moral uncertainty isn’t (explicitly) premised on moral realism.
Thirdly, in practice, many similar concepts and principles will be useful for:
(This third point has been discussed by, for example, Stuart Armstrong.) ↩︎
Here I use the term “behaviour” very broadly, to include not just our “physical actions” but also what decisions we make, what we say, and what we think (at least on a conscious level). This is because any of these could provide data about underlying values. So some examples of what I’d count as “behaviours” include:
I haven’t listed a specific type of VU for uncertainty about what a person’s values would be if their knowledge or beliefs changed, whether for the better or not. This is because I don’t see that as being particularly worth knowing about in its own right, separate from the other types of VU. But the next type of VU (Predictive VU) does incorporate uncertainty about what a person’s values will be after the changes that will occur to their knowledge or beliefs (whether or not these changes are improvements). ↩︎
Note the (somewhat fuzzy) distinction from Present VU:
One could argue that it doesn’t make sense to talk of “a person’s values changing”. Such arguments could be based on the idea that an “agent” is partly defined by its values (or utility function, or whatever), or the idea that people don’t fundamentally retain the “same identity” over time anyway. For this post, I wish to mostly set aside such complexities, and lean instead on the fact that it’s often useful to think and speak as if “the same person” persists over time (and even despite partial changes in values).
But I do think those complexities are worth acknowledging. One reason is that they can remind us to not take it for granted that a person will or should currently care about “their future self’s values” (or “their future self’s” ability to act on their values). This applies especially if the person doesn’t see any reason to care about other people’s values (or abilities to achieve their values). Ruairi Donnelly discusses similar points, and links this to the concept of value drift.
Similar points could also be raised regarding the type of VU I’ll cover next: Idealised VU. (See Armstrong for somewhat related ideas.) ↩︎
However, some philosophers classify subjectivists as moral realists instead of as antirealists. To account for this, some sources distinguish between minimal moral realism (which includes subjectivists) and robust moral realism (which excludes them). This is why I sometimes write “antirealists and/or subjectivists”. ↩︎