I don't know what my values are. I don't even know how to find out what my values are. But do I know something about how I (or an FAI) may be able to find out what my values are? Perhaps... and I've organized my answer to this question in the form of an "Outline of Possible Sources of Values". I hope it also serves as a summary of the major open problems in this area.
- External
  - god(s)
  - other humans
  - other agents
- Behavioral
  - actual (historical/observed) behavior
  - counterfactual (simulated/predicted) behavior
- Subconscious Cognition
  - model-based decision making
    - ontology
    - heuristics for extrapolating/updating model
    - (partial) utility function
  - model-free decision making
    - identity based (adopt a social role like "environmentalist" or "academic" and emulate an appropriate role model, actual or idealized)
    - habits
    - reinforcement based
- Conscious Cognition
  - decision making using explicit verbal and/or quantitative reasoning
    - consequentialist (similar to model-based above, but using explicit reasoning)
    - deontological
    - virtue ethical
    - identity based
  - reasoning about terminal goals/values/preferences/moral principles
  - responses (changes in state) to moral arguments (possibly context dependent)
  - distributions of autonomously generated moral arguments (possibly context dependent)
  - logical structure (if any) of moral reasoning
  - object-level intuitions/judgments
    - about what one should do in particular ethical situations
    - about the desirabilities of particular outcomes
    - about moral principles
  - meta-level intuitions/judgments
    - about the nature of morality
    - about the complexity of values
    - about what the valid sources of values are
    - about what constitutes correct moral reasoning
    - about how to explicitly/formally/effectively represent values (utility function, multiple utility functions, deontological rules, or something else) (if utility function(s), for what decision theory and ontology?) (see the toy sketch after this outline)
    - about how to extract/translate/combine sources of values into a representation of values
      - how to solve ontological crisis
      - how to deal with native utility function or revealed preferences being partial
      - how to translate non-consequentialist sources of values into utility function(s)
      - how to deal with moral principles being vague and incomplete
      - how to deal with conflicts between different sources of values
      - how to deal with lack of certainty in one's intuitions/judgments
    - whose intuition/judgment ought to be applied? (may be different for each of the above)
      - the subject's (at what point in time? current intuitions, eventual judgments, or something in between?)
      - the FAI designers'
      - the FAI's own philosophical conclusions
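To make the representation question in the outline a bit more concrete, here is a minimal toy sketch in Python of two of the candidate representations mentioned there: a partial utility function over outcomes in a fixed ontology, and a small deontological rule set. The ontology, the names (`Outcome`, `partial_utility`, `permitted`), and the numbers are all hypothetical placeholders rather than anyone's actual proposal; the point is only to show what "a representation of values" could look like once made explicit, and where the listed open problems (partiality, translating non-consequentialist rules) show up.

```python
# Toy illustration only: hypothetical names and numbers, assuming a
# fixed ontology in which outcomes are described by two features.

from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Outcome:
    """An outcome described in some fixed ontology (here: toy features)."""
    lives_saved: int
    promises_broken: int


# Candidate representation 1: a (partial) utility function over outcomes.
# Partiality is modeled by returning None for outcomes the function is
# silent about -- one of the open problems the outline lists.
def partial_utility(o: Outcome) -> Optional[float]:
    if o.promises_broken > 10:
        return None  # the function simply doesn't cover this region
    return 1.0 * o.lives_saved - 0.3 * o.promises_broken


# Candidate representation 2: a deontological rule set, here predicates
# on outcomes. Translating something like this into utility function(s)
# is another open problem from the outline.
Rules = Dict[str, Callable[[Outcome], bool]]

rules: Rules = {
    "no_promise_breaking": lambda o: o.promises_broken == 0,
}


def permitted(o: Outcome, rules: Rules) -> bool:
    return all(check(o) for check in rules.values())


if __name__ == "__main__":
    o = Outcome(lives_saved=3, promises_broken=1)
    print(partial_utility(o))   # 2.7
    print(permitted(o, rules))  # False
```

Even this toy version shows why the outline asks about partiality and about conflicts between sources: the two representations already disagree about the example outcome above.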
Using this outline, we can obtain a concise understanding of what many metaethical theories and FAI proposals are claiming/suggesting and how they differ from each other. For example, Nyan_Sandwich's "morality is awesome" thesis can be interpreted as the claim that the most important source of values is our intuitions about the desirability (awesomeness) of particular outcomes.
As another example, Aaron Swartz argued against "reflective equilibrium" by which he meant the claim that the valid sources of values are our object-level moral intuitions, and that correct moral reasoning consists of working back and forth between these intuitions until they reach coherence. His own position was that intuitions about moral principles are the only valid source of values and we should discount our intuitions about particular ethical situations.
A final example is Paul Christiano's "Indirect Normativity" proposal for FAI (n.b., "Indirect Normativity" was originally coined by Nick Bostrom to refer to an entire class of designs in which the AI's values are defined "indirectly"), where an important source of values is the distribution of moral arguments the subject is likely to generate in a particular simulated environment, together with the subject's responses to those arguments. Also, just about every meta-level question is left for the (simulated) subject to answer, except for the decision theory and ontology of the utility function that their values must finally be encoded in, which are fixed by the FAI designer.
I think the outline includes most of the ideas brought up in past LW discussions, or in moral philosophies that I'm familiar with. Please let me know if I left out anything important.
When you say "values", do you mean instrumental values or terminal values? If the former, the answer is simple: figuring them out is what we spend most of our time doing. Will tweaking my diet in this way cause me to have more energy? Will asking my friend in this particular way cause them to accept my request? Etc. This is as mundane as it gets.
If the latter, the answer is a bit more complicated, but it really shouldn't be all that confusing. As agents, we're built with motivation systems: out of all possible sensory patterns, some present to us as neutral, others as inherently desirable, and the rest as inherently undesirable. Desirability also comes in degrees (some things are more desirable, some less), so each of these sensory components varies along at least one dimension.
Sensory patterns that originally present as neutral may either be left as irrelevant (these are the things put on auto-ignore, which are apt to return to conscious awareness if certain substances are taken or if careful introspection is engaged in), or else acquire a 'secondary' desirability or undesirability by being seen as causally connected to something that presents as inherently one way or the other, for example finding running enjoyable because of certain positive benefits the activity has produced in the past.
Thus to discover one's terminal values, one simply has to identify these inherently desirable sensory patterns and figure out which of them top the list as 'most desirable' (in terms of nothing other than how they strike one's perception). A good heuristic is to see what other people consider enjoyable or fun, try it, and see what happens, while taking care to disentangle any identity issues from the result, for instance sexual hangups that keep one from enjoying something widely considered to have one of the strongest 'I want to engage in this because it's so great' effects: sexual or romantic interaction.
But at the most fundamental level, there's nothing to the task of figuring out one's terminal values beyond figuring out which sensory patterns are most 'enjoyable' in the most basic sense imaginable, on a timescale long enough that the answer can't be dismissed as 'akrasia'. Even someone literally physically unable to experience certain positive sensory patterns, such as someone with extremely low libido because of physiological problems, would most likely qualify as making a 'good choice' by pursuing a course of action apt to let them begin experiencing those patterns, for example implementing a lifestyle protocol likely to fix the physiological issues and bring their libido up to a healthy level.
It gets somewhat confusing once you factor in that the sensory patterns one is able to experience can shift over time (libido increasing or decreasing, going through puberty, and so on), along with akrasia and the other problems that make us seem like less 'coherent' agents. But I believe all the fog can be cut through by the simple observation that sensory patterns present to us as neutral, inherently desirable, or inherently undesirable, and that the latter two vary along a dimension of 'more or less'. Neutral sensory patterns acquire a 'secondary' position on these dimensions depending on what the agent believes their causal connections to other sensory patterns to be, with each chain ultimately needing to run up against an 'inherently motivating' sensory pattern in order to acquire significance.
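This causal-inheritance picture can be put into a small sketch. Purely as an illustration, and assuming that valences can be treated as signed numbers and believed causal connections as a simple mapping (all names and numbers below are made up), a neutral pattern's 'secondary' desirability would be whatever valence propagates back to it from the inherently motivating patterns it is believed to cause:

```python
# Toy sketch: inherent vs. 'secondary' valence of sensory patterns.
# All names and numbers are hypothetical illustrations.

from typing import Dict, List

# Inherent valence: positive = inherently desirable,
# negative = inherently undesirable, absent = neutral.
inherent: Dict[str, float] = {
    "physical_vigor": 1.0,
    "muscle_soreness": -0.3,
}

# Believed causal connections: pattern -> patterns it tends to bring about.
causes: Dict[str, List[str]] = {
    "running": ["physical_vigor", "muscle_soreness"],
}


def valence(pattern: str, depth: int = 3) -> float:
    """Inherent valence if the pattern has one; otherwise the summed
    valence of its believed effects (its 'secondary' desirability)."""
    if pattern in inherent:
        return inherent[pattern]
    if depth == 0:
        # A chain that never runs up against an inherently motivating
        # pattern acquires no significance.
        return 0.0
    return sum(valence(effect, depth - 1) for effect in causes.get(pattern, []))


# 'running' is neutral in itself but acquires a positive secondary
# valence because of what it is believed to cause.
print(valence("running"))  # 0.7
```

On this sketch, values 'shifting' over time would just be changes to the inherent table or to the believed causal map, which matches the point that beliefs about causal connections do all the work for anything that isn't inherently motivating.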
While I sympathize with you, I think you should lower your threshold for recognizing problems as difficult.
For example, you should be able to choose between things that will make no sensory difference to you, such as the well-being of people in Xela. And of course you dodge the question of what is "enjoyable"...