An Attempt at Preference Uncertainty Using VNM
(This is a (possibly perpetual) draft of some work that we (I) did at the Vancouver meetup. Thanks to my meetup buddies for letting me use their brains as supplementary computational substrate. Sorry about how ugly the LaTeX is; is there a way to make this all look a bit nicer?)
(Large swaths of this are obsolete. Thanks for the input, LW!)
The Problem of Decision Under Preference Uncertainty
Suppose you are uncertain whether it is good to eat meat or not. It could be OK, or it could be very bad, but having not done the thinking, you are uncertain. And yet you have to decide what to eat now; is it going to be the tasty hamburger or the morally pure vegetarian salad?
You have multiple theories about your preferences that contradict in their assessment, and you want to make the best decision. How would you decide, even in principle, when you have such uncertainty? This is the problem of Preference Uncertainty.
Preference Uncertainty is a daily fact of life for humans; we simply don't have introspective access to our raw preferences in many cases, but we still want to make the best decisions we can. Just going with our intuitions about what seems most awesome is usually sufficient, but for higher-stakes decisions and theoretical reasoning, we want formal methods with more transparent reasoning processes. We especially like transparent formal methods if we want to create a Friendly AI.
There is unfortunately very little formal analysis of the preference uncertainty problem, and what has been done is incomplete and more philosophical than formal. Nonetheless, there has been some good work in the last few years. I'll refer you to Crouch's thesis if you're interested in that.
Using VNM
I'm going to assume VNM. That is, that rational preferences imply a utility function, and we decide between lotteries, choosing the one with highest expected utility.
The implications here are that the possible moral theories ($m \in M$) each have an associated utility function ($U_m$) that represents their preferences. Also by VNM, our solution to preference uncertainty is a utility function $U'$.

We are uncertain between moral theories, so we have a probability distribution over moral theories, $P(m)$.

To make decisions, we need a way to compute the expected value of some lottery $L$. Each lottery is essentially a probability distribution $P(o \mid L)$ over the set of possible outcomes $O$.

Since we have uncertainty over multiple things ($M$ and $O$), the domain of the final preference structure is both moral theories and outcomes: $U' : M \times O \to \mathbb{R}$.
Now for some conditions. In the degenerate case of full confidence in one moral theory $m$, the overall preferences should agree with that theory:

$$P(m) = 1 \implies U'(m, o) = k_m U_m(o) + c_m$$

for some $k_m > 0$ and $c_m$ representing the degrees of freedom in utility function equality. That condition actually already contains most of the specification of $U'$:

$$U'(m, o) = k_m U_m(o) + c_m.$$
So we have a utility function, except for those unknown scaling and offset constants, which undo the arbitrariness in the basis and scale used to define each individual utility function.
Thus overall expectation looks like this:

$$EU(L) = \sum_{m \in M} P(m) \sum_{o \in O} P(o \mid L)\,\bigl(k_m U_m(o) + c_m\bigr).$$

This is still incomplete, though. If we want to get actual decisions, we need to pin down each $k_m$ and $c_m$.
Offsets and Scales
You'll see above that the probability distribution over $M$ is not dependent on the particular lottery, while $P(o \mid L)$ is a function of the lottery. This is because I assumed that actions can't change what is right.

With this assumption, the contribution of the $c_m$'s can be entirely factored out:

$$EU(L) = \sum_{m \in M} P(m)\,c_m \;+\; \sum_{m \in M} P(m) \sum_{o \in O} P(o \mid L)\,k_m U_m(o).$$

This makes it obvious that the effect of the $c_m$'s is an additive constant that affects all lotteries the same way and thus never affects preferences. Thus we can set them to any value that is convenient; for this article, all $c_m = 0$.
A similar process allows us to arbitrarily set exactly one of the $k_m$.

The remaining values of $k_m$ actually affect decisions, so setting them arbitrarily has real consequences. To illustrate, consider the opening example of choosing lunch between a burger and a salad when unsure about the moral status of meat.

Making up some details, we might have $U_{\text{meat}}(\text{burger}) = 1$, $U_{\text{meat}}(\text{salad}) = 0$, and $U_{\text{veg}}(\text{burger}) = 0$, $U_{\text{veg}}(\text{salad}) = 1$, and $P(\text{meat}) = 0.7$, $P(\text{veg}) = 0.3$. Importing this into the framework described thus far (setting $k_{\text{meat}} = 1$ and leaving $k_{\text{veg}}$ free), we might have the following payoff table:
| Moral theory | $U'(\text{burger})$ | $U'(\text{salad})$ | $P(m)$ |
|---|---|---|---|
| Meat OK (meat) | $1$ | $0$ | $0.7$ |
| Meat Bad (veg) | $0$ | $k_{\text{veg}}$ | $0.3$ |
| (expectation) | $0.7$ | $0.3\,k_{\text{veg}}$ | $1$ |
We can see that with those probabilities, the expected value of the salad exceeds that of the burger when $0.3\,k_{\text{veg}} > 0.7$ (when $k_{\text{veg}} > 7/3$), so the decision hinges on the value of that parameter.
The value of $k_m$ can be interpreted as the "intertheoretic weight" of a utility function candidate for the purposes of intertheoretic value comparisons.

In general, if $|M| = n$, then you have exactly $n - 1$ missing intertheoretic weights that determine how you respond to situations with preference uncertainty. These could be pinned down if you had $n - 1$ independent equations representing indifference scenarios.
For example, if we had $EU(\text{burger}) = EU(\text{salad})$ when $P(\text{veg}) = 0.1$, then we would have $k_{\text{veg}} = 0.9/0.1 = 9$, and the above decision would be determined in favor of the salad.
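None of the arithmetic above is deep, so it can be checked mechanically. Here is a minimal sketch in Python (the dictionary layout and function names are mine, not part of any established framework):

```python
# Candidate utility functions, with the arbitrary choices k_meat = 1
# and all offsets c_m = 0 already applied.
U = {
    "meat": {"burger": 1.0, "salad": 0.0},   # "Meat OK" theory
    "veg":  {"burger": 0.0, "salad": 1.0},   # "Meat Bad" theory, before scaling
}

def eu(option, P, k):
    """Expected utility of a sure outcome under moral uncertainty P(m)."""
    return sum(P[m] * k[m] * U[m][option] for m in P)

# Pin down k_veg from the stipulated indifference scenario:
# at P(veg) = 0.1, EU(burger) = EU(salad), so 0.9 * 1 = 0.1 * k_veg.
k = {"meat": 1.0, "veg": 0.9 / 0.1}          # k_veg = 9

# Decide the original lunch, where P(veg) = 0.3:
P = {"meat": 0.7, "veg": 0.3}
assert eu("salad", P, k) > eu("burger", P, k)   # ~2.7 vs 0.7: salad wins
```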
Expressing Arbitrary Preferences
Preferences are arbitrary, in the sense that we should be able to want whatever we want to want, so our mathematical constructions should not dictate or limit our preferences. If they do, we should just decide to disagree.
What that means here is that because the values of $k_m$ drive important preferences (like at what probability you feel it is safe to eat meat), the math must leave them unconstrained, to be selected by whatever moral reasoning process it is that selected the candidate utility functions and gave them probabilities in the first place.
We could ignore this idea and attempt to use a "normalization" scheme to pin down the intertheoretic weights from the object level preferences without having to use additional moral reasoning. For example, we could dictate that the "variance" of each candidate utility function equals 1 (with some measure assignment over outcomes), which would divide out the arbitrary scales used to define the candidate utility functions, preventing dominance by arbitrary factors that shouldn't matter.
Consider that any given assignment of intertheoretic weights is equivalent to some set of indifference scenarios (like the one we used above for vegetarianism). For example, the above normalization scheme gives us the indifference scenario $EU(\text{burger}) = EU(\text{salad})$ when $P(\text{veg}) = 0.5$.

If I find that I am actually indifferent at $P(\text{veg}) = 0.1$ like above, then I'm out of luck, unable to express this very reasonable preference. On the other hand, I can simply reject the normalization scheme and keep my preferences intact, which I much prefer.
(Notice that the normalization scheme was an unjustifiably privileged hypothesis from the beginning; we didn't argue that it was necessary, we simply pulled it out of thin air for no reason, so its failure was predictable.)
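For concreteness, here is roughly what that variance-normalization scheme computes in the lunch example. The uniform measure over the two outcomes is my own arbitrary choice, which is rather the point:

```python
import statistics

def normalize(u):
    # Rescale a candidate utility function so its variance over outcomes
    # (under a uniform measure) is 1, i.e. divide by the standard deviation.
    sd = statistics.pstdev(u.values())
    return {o: v / sd for o, v in u.items()}

U_meat = normalize({"burger": 1.0, "salad": 0.0})   # {"burger": 2.0, "salad": 0.0}
U_veg  = normalize({"burger": 0.0, "salad": 1.0})   # {"burger": 0.0, "salad": 2.0}

# Both theories now get equal weight, so indifference is forced to P(veg) = 0.5,
# whatever our actual intertheoretic preferences might be:
p = 0.5
eu_burger = (1 - p) * U_meat["burger"] + p * U_veg["burger"]
eu_salad  = (1 - p) * U_meat["salad"]  + p * U_veg["salad"]
assert eu_burger == eu_salad   # both 1.0
```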
Thus I reassert that the $k_m$'s are free parameters to be set in accordance with our actual intertheoretic preferences, on pain of stupidity. Consider an analogy to the move from ordinal to cardinal utilities; when you add risk, you need more degrees of freedom in your preferences to express how you might respond to that risk, and you need to actually think about what you want those values to be.
Uncertainty Over Intertheoretic Weights
(This section is less solid than the others. Watch your step.)
A weakness in the constructions described so far is that they assume that we have access to perfect knowledge of intertheoretic preferences, even though the whole problem is that we are unable to find perfect knowledge of our preferences.
It seems intuitive that we could have a probability distribution over each $k_m$. If we do this, making decisions is not much complicated, I think; a simple expectation should still work.

If expectation is the way, the expectation over $k_m$ can be factored out (by linearity or something), so that we just use $E[k_m]$ wherever we would have used $k_m$. Thus in any given decision with fixed preference uncertainties, we can pretend to have perfect knowledge of $k_m$.

Despite the seeming triviality of the above idea for dealing with uncertainty over $k_m$, I haven't formalized it much. We'll see if I figure it out soon, but for now, it would be foolish to make too many assumptions about this. Thus the rest of this article still assumes perfect knowledge of the $k_m$'s, on the expectation that we can extend it later.
Learning Values, Among Other Things
Strictly speaking, inference across the is-ought gap is not valid, but we do it every time we act on our moral intuitions, which are just physical facts about our minds. Strictly speaking, inferring future events from past observations (induction) is not valid either, but it doesn't bother us much:
We deal with induction by defining an arbitrary (but good-seeming, on reflection) prior joint probability distribution over observations and events. We can handle the is-ought gap the same way: instead of separate probability distributions over events $E$ and moral facts $M$, we define a joint prior over $E \times M$. Then learning value is just Bayesian updates on partial observations of $E \times M$. Note that this prior subsumes induction.
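As a toy illustration of such a joint prior (all numbers invented, and the "evidence" here is just what a trusted philosopher says), value learning becomes an ordinary Bayesian update:

```python
# A tiny discrete stand-in for the "moral prior": a joint distribution over
# (observable evidence, moral fact). Probabilities sum to 1.
joint = {
    ("says_meat_ok",  "meat_ok"):  0.40,
    ("says_meat_ok",  "meat_bad"): 0.05,
    ("says_meat_bad", "meat_ok"):  0.15,
    ("says_meat_bad", "meat_bad"): 0.40,
}

def posterior(evidence):
    # Bayes: condition the joint distribution on the observed evidence.
    z = sum(p for (e, m), p in joint.items() if e == evidence)
    return {m: p / z for (e, m), p in joint.items() if e == evidence}

# Hearing the philosopher condemn meat shifts P(meat_bad) from 0.45 to ~0.73.
print(posterior("says_meat_bad"))
```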
Making decisions is still just maximizing expected utility with our constructions from above, though we will have to be careful to make sure that $P(m)$ remains independent of the particular lottery.
The problem of how to define such a prior is beyond the scope of this article. I will note that this "moral prior" idea is the solid foundation on which to base Indirect Normativity schemes like Yudkowsky's CEV and Christiano's boxed philosopher. I will hopefully discuss this further in the future.
Recap
The problem was how to make decisions when you are uncertain about what your object-level preferences should be. To solve it, I assumed VNM, in particular that we have a set of possible utility functions, and we want to construct an overall utility function that does the right thing by those utility functions and their probabilities. The simple condition that the overall utility function should make the common sense choices in cases of moral certainty was sufficient to construct a utility function with a precise set of remaining degrees of freedom. The degrees of freedom being the intertheoretic weight and offset of each utility function candidate.
I showed that the offsets and an overall scale factor are superfluous, in the sense that they never affect the decision if we assume that actions don't affect what is desirable. The remaining intertheoretic weights do affect the decision, and I argued that they are critical to expressing whatever intertheoretic preferences we might want to have.
Uncertainty over intertheoretic weight seems tractable, but the details are still open.
I also mentioned that we can construct a joint distribution that allows us to embed value learning in normal Bayesian learning and induction. This "moral prior" would subsume induction and define how facts about the desirability of things could be inferred from physical observations like the opinions of moral philosophers. In particular, it would provide a solid foundation for Indirect Normativity schemes like CEV. The nature of this distribution is still open.
Open Questions
What are the details of how to deal with uncertainty over the intertheoretic weights? I am looking in particular for construction from an explicit set of reasonable assumptions like the above work, rather than simply pulling a method out of thin air unsupported.
What are the details of the Moral Prior? What is its nature? What implications does it have? What assumptions do we have to make to make it behave reasonably? How do we construct one that could be safely given to a superintelligence? This is going to be a lot of work.
I assumed that it is meaningful to assign probabilities over moral theories. Probability is closely tied up with utility, and probability over epiphenomena like preferences is especially difficult. It remains to be seen how much the framing here actually helps us, or if it effectively just disguises pulling a utility function out of a hat.
Is this at all correct? I should build it and see if it type-checks and does what it's supposed to do.
Utility Quilting
Related: Pinpointing Utility
Let's go for lunch at the Hypothetical Diner; I have something I want to discuss with you.
We will pick our lunch from the set of possible orders, and we will receive a meal drawn from the set of possible meals, O.
Speaking in general, each possible order has an associated probability distribution over O. The Hypothetical Diner takes care to simplify your analysis: the probability distribution is trivial; you always get exactly what you ordered.
Again to simplify your lunch, the Hypothetical Diner offers only two choices on the menu: the Soup, and the Bagel.
To then complicate things so that we have something to talk about, suppose there is some set M of ways other things could be that may affect your preferences. Perhaps you have sore teeth on some days.
Suppose for the purposes of this hypothetical lunch date that you are VNM rational. Shocking, I know, but the hypothetical results are clear: you have a utility function, U. The domain of the utility function is the product of all the variables that affect your preferences (which meal, and whether your teeth are sore): U: M x O -> utility.
In our case, if your teeth are sore, you prefer the soup, as it is less painful. If your teeth are not sore, you prefer the bagel, because it is tastier:
U(sore & soup) > U(sore & bagel)
U(~sore & soup) < U(~sore & bagel)
Your global utility function can be partially applied to some m in M to get an "object-level" utility function U_m: O -> utility. Note that the restrictions of U made in this way need not have any resemblance to each other; they are completely separate.
It is convenient to think about and define these restricted "utility function patches" separately. Let's pick some units and datums so we can get concrete numbers for our utilities:
U_sore(soup) = 1 ; U_sore(bagel) = 0
U_unsore(soup) = 0 ; U_unsore(bagel) = 1
Those are separate utility functions now, so we could pick units and datums separately. Because of this, the sore numbers are totally incommensurable with the unsore numbers. Don't try to compare them between the utility functions or you will get type-poisoning. The actual numbers are just a straightforward encoding of the preferences mentioned above.
What if we are unsure about where we fall in M? Suppose we won't know whether our teeth are sore until we take the first bite; that is, we have a probability distribution over M. Maybe we are 70% sure that your teeth won't hurt you today. What should you order?
Well, it's usually a good idea to maximize expected utility:
EU(soup) = 30%*U(sore&soup) + 70%*U(~sore&soup) = ???
EU(bagel) = 30%*U(sore&bagel) + 70%*U(~sore&bagel) = ???
Suddenly we need those utility function patches to be commensurable, so that we can actually compute these, but we went and defined them separately, darn. All is not lost, though: recall that they are just restrictions of a global utility function to a particular soreness circumstance, with some (positive) linear transforms, f_m, thrown in to make the numbers nice:
f_sore(U(sore&soup)) = 1 ; f_sore(U(sore&bagel)) = 0
f_unsore(U(~sore&soup)) = 0 ; f_unsore(U(~sore&bagel)) = 1
At this point, it's just a bit of clever function-inverting and all is dandy. We can pick some linear transform g to be canonical, and transform all the utility function patches into that basis. So for all m, we can get g(U(m & o)) by inverting the f_m and then applying g:
g.U(sore & x) = (g.inv(f_sore).f_sore)(U(sore & x))
= k_sore*U_sore(x) + c_sore
g.U(~sore & x) = (g.inv(f_unsore).f_unsore)(U(~sore & x))
= k_unsore*U_unsore(x) + c_unsore
(I'm using . to represent composition of those transforms. I hope that's not too confusing.)
Linear transforms are really nice; all the inverting and composing collapses down to a scale k and an offset c for each utility function patch. Now we've turned our bag of utility function patches into a utility function quilt! One more bit of math before we get back to deciding what to eat:
EU(x) = P(sore) *(k_sore *U_sore(x) + c_sore) +
(1-P(sore))*(k_unsore*U_unsore(x) + c_unsore)
Notice that the terms involving c_m do not involve x, meaning that the c_m terms don't affect our decision, so we can cancel them out and forget they ever existed! This is only true because I've implicitly assumed that P(m) does not depend on our actions. If it did, like if we could go to the dentist or take some painkillers, then it would be P(m | x) and c_m would be relevant in the whole joint decision.
We can define the canonical utility basis g to be whatever we like (among positive linear transforms); for example, we can make it equal to f_sore so that we can at least keep the simple numbers from U_sore. Then we throw all the c_m's away, because they don't matter. Then it's just a matter of getting the remaining k_m's.

Ok, sorry, those last few paragraphs were rather abstract. Back to lunch. We just need to define these mysterious scaling constants and then we can order lunch. There is only one left: k_unsore. In general there will be n-1, where n is the size of M. I think the easiest way to approach this is to let k_unsore = 1/5 and see what that implies:
g.U(sore & soup) = 1 ; g.U(sore & bagel) = 0
g.U(~sore & soup) = 0 ; g.U(~sore & bagel) = 1/5
EU(soup) = (1-P(~sore))*1 = 0.3
EU(bagel) = P(~sore)*k_unsore = 0.14
EU(soup) > EU(bagel)
After all the arithmetic, it looks like if k_unsore = 1/5, even though we expect your teeth to be non-sore (P(sore) = 0.3), we are unsure enough and the relative importance is big enough that we should play it safe and go with the soup anyway. In general we would choose soup if P(~sore) < 1/(k_unsore+1), or equivalently, if k_unsore < (1-P(~sore))/P(~sore).

So k is somehow the relative importance of possible preference structures under uncertainty. A smaller k in this lunch example means that the tastiness of a bagel over a soup is small relative to the pain saved by eating the soup instead. With this intuition, we can see that 1/5 is a somewhat reasonable value for this scenario, and for example, 1 would not be, and neither would 1/20.
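The lunch arithmetic, mechanized (a small sketch; the function and dictionary names are mine):

```python
# Utility patches expressed in the canonical basis g (chosen equal to f_sore),
# with the irrelevant offsets c_m already dropped.
U_sore   = {"soup": 1.0, "bagel": 0.0}
U_unsore = {"soup": 0.0, "bagel": 1.0}

def eu(option, p_sore, k_unsore):
    return p_sore * U_sore[option] + (1 - p_sore) * k_unsore * U_unsore[option]

# With P(sore) = 0.3 and k_unsore = 1/5: EU(soup) = 0.3 > EU(bagel) = 0.14.
assert eu("soup", 0.3, 1/5) > eu("bagel", 0.3, 1/5)

# The decision flips exactly at the threshold from the text:
# choose soup iff P(~sore) < 1 / (k_unsore + 1) = 1 / 1.2, about 0.83.
```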
What if we are uncertain about k? Are we simply pushing the problem up some meta-chain? It turns out that no, we are not. Because k is linearly related to utility, you can simply use its expected value if it is uncertain.
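That claim is easy to check numerically: because EU is linear in k, averaging EU over a distribution on k gives the same answer as plugging in E[k] directly (toy numbers, mine):

```python
# A small discrete distribution over k_unsore.
k_dist = {0.1: 0.25, 0.2: 0.50, 0.3: 0.25}
k_bar = sum(k * p for k, p in k_dist.items())    # E[k] = 0.2

def eu_bagel(p_sore, k_unsore):
    return (1 - p_sore) * k_unsore               # bagel's EU, as in the lunch example

# Expected EU over the uncertain k ...
lhs = sum(p * eu_bagel(0.3, k) for k, p in k_dist.items())
# ... equals the EU at the expected k, by linearity:
rhs = eu_bagel(0.3, k_bar)
assert abs(lhs - rhs) < 1e-12
```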
It's kind of ugly to have these k_m's and these U_m's, so we can just reason over the product K x M instead of M and K separately. This is nothing weird; it just means we have more utility function patches (many of which encode the exact same object-level preferences).

In the most general case, the utility function patches in K x M are the space of all functions O -> ℝ, with offset equivalence, but not scale equivalence (sovereign utility functions have full linear-transform equivalence, but these patches are only equivalent under offset). Remember, though, that these are just restricted patches of a single global utility function.
So what is the point of all this? Are we just playing in the VNM sandbox, or is this result actually interesting for anything besides sore teeth?
Perhaps Moral/Preference Uncertainty? I didn't mention it until now because it's easier to think about lunch than a philosophical minefield, but it is the point of this post. Sorry about that. Let's conclude with everything restated in terms of moral uncertainty.
TL;DR:

If we have:

- A set of object-level outcomes O,
- A set of "epiphenomenal" (outside of O) 'moral' outcomes M,
- A probability distribution over M, possibly correlated with uncertainty about O, but not in a way that allows our actions to influence uncertainty over M (that is, assuming moral facts cannot be changed by your actions),
- A utility function over O for each possible value of M (these can be arbitrary VNM-rational moral theories, as long as they share the same object-level outcomes),
- And we wish to be VNM rational over whatever uncertainty we have,

then we can quilt together a global utility function U: M x K x O -> ℝ, where K is the space of scale factors and U(m,k,o) = k*U_m(o), so that EU(o) is the sum over m of P(m)*E(k | m)*U_m(o).

Somehow this all seems like legal VNM.
Implications
So. Just the possible object-level preferences and a probability distribution over those is not enough to define our behaviour. We need to know the scale k_m for each possible preference structure so we know how to act when uncertain. This is analogous to the switch from ordinal preferences to interval preferences when dealing with object-level uncertainty.
Now we have a well-defined framework for reasoning about preference uncertainty, if all our possible moral theories are VNM rational, moral facts are immutable, and we have a joint probability distribution over OxMxK.
In particular, updating your moral beliefs upon hearing new arguments is no longer a mysterious dynamic; it is just a Bayesian update over possible moral theories.

This requires a "moral prior" that correlates moral outcomes and their relative scales with the observable evidence. In the lunch example, we implicitly used such a moral prior to update on observable thought experiments and conclude that 1/5 was a plausible value for k_unsore.
Moral evidence is probably things like preference thought-experiments, neuroscience and physics results, etc. The actual model for this, and discussion about the issues with defining and reasoning on such a prior are outside the scope of this post.
This whole argument couldn't prove its way out of a wet paper bag, and is merely suggestive. Bits and pieces may be found incorrect, and formalization might change things a bit.

This framework requires that we have already worked out the outcome-space O (which we haven't), have limited our moral confusion to a set of VNM-rational moral theories over O (which we haven't), and have defined a "Moral Prior" so we can have a probability distribution over moral theories and their weights (which we haven't).
Nonetheless, we can sometimes get those things in special limited cases, and even in the general case, having a model for moral uncertainty and updating is a huge step up from the terrifying confusion I (and everyone I've talked to) had before working this out.
Proof of fungibility theorem
Appendix to: A fungibility theorem
Suppose that $X$ is a set and we have functions $u_1, \dots, u_n : X \to \mathbb{R}$. Recall that for $x, y \in X$, we say that $y$ is a Pareto improvement over $x$ if for all $i$, we have $u_i(y) \ge u_i(x)$. And we say that it is a strong Pareto improvement if in addition there is some $i$ for which $u_i(y) > u_i(x)$. We call $x$ a Pareto optimum if there is no strong Pareto improvement over it.
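The Pareto definitions above can be checked directly for a finite $X$ (a small sketch; the example points and coordinate utilities are my own invention):

```python
def is_pareto_optimum(x, X, us):
    """True iff no y in X is a strong Pareto improvement over x."""
    def strong_improvement(y):
        return (all(u(y) >= u(x) for u in us)
                and any(u(y) > u(x) for u in us))
    return not any(strong_improvement(y) for y in X)

# Toy example: outcomes are points in the plane, u1 and u2 are the coordinates.
X = [(0, 0), (1, 0), (0, 1), (0.5, 0.5)]
us = [lambda p: p[0], lambda p: p[1]]

optima = [x for x in X if is_pareto_optimum(x, X, us)]
# (0, 0) is strongly dominated by (0.5, 0.5); the other three are Pareto optima.
# For (0.5, 0.5), the weights c = (1, 1) make c1*u1 + c2*u2 achieve its maximum
# there (tied with (1, 0) and (0, 1)), as the theorem below promises.
```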
Theorem. Let $X$ be a set and suppose $u_i : X \to \mathbb{R}$ for $i = 1, \dots, n$ are functions satisfying the following property: For any $x, y \in X$ and any $p \in [0, 1]$, there exists a $z \in X$ such that for all $i$, we have $u_i(z) = p\,u_i(x) + (1 - p)\,u_i(y)$.

Then if an element $x^*$ of $X$ is a Pareto optimum, then there exist nonnegative constants $c_1, \dots, c_n$ such that the function $c_1 u_1 + \dots + c_n u_n$ achieves a maximum at $x^*$.
Math appendix for: "Why you must maximize expected utility"
This is a mathematical appendix to my post "Why you must maximize expected utility", giving precise statements and proofs of some results about von Neumann-Morgenstern utility theory without the Axiom of Continuity. I wish I had the time to make this post more easily readable, giving more intuition; the ideas are rather straight-forward and I hope they won't get lost in the line noise!
The work here is my own (though closely based on the standard proof of the VNM theorem), but I don't expect the results to be new.
*
I represent preference relations as total preorders $\preceq$ on a simplex $\Delta = \{x \in \mathbb{R}^n : x_i \ge 0, \sum_i x_i = 1\}$; define $\succeq$, $\sim$, $\prec$ and $\succ$ in the obvious ways (e.g., $x \sim y$ iff both $x \preceq y$ and $y \preceq x$, and $x \prec y$ iff $x \preceq y$ but not $y \preceq x$). Write $e_i$ for the $i$'th unit vector in $\mathbb{R}^n$.
In the following, I will always assume that $\preceq$ satisfies the independence axiom: that is, for all $x, y, z \in \Delta$ and $p \in (0, 1]$, we have $x \prec y$ if and only if $px + (1-p)z \prec py + (1-p)z$. Note that the analogous statement with weak preferences follows from this: $x \preceq y$ holds iff not $y \prec x$, which by independence is equivalent to not $py + (1-p)z \prec px + (1-p)z$, which is just $px + (1-p)z \preceq py + (1-p)z$.
Lemma 1 (more of a good thing is always better). If $x \prec y$ and $0 \le p < q \le 1$, then $(1-p)x + py \prec (1-q)x + qy$.

Proof. Let $z = (1-q)x + qy$. Then, $x = (1-q)x + qx \prec (1-q)x + qy = z$ by independence, and we have $(1-p)x + py = (1 - \tfrac{p}{q})x + \tfrac{p}{q}z$ and $(1-q)x + qy = (1 - \tfrac{p}{q})z + \tfrac{p}{q}z$. Thus, the result follows from independence applied to $x$, $z$, $1 - \tfrac{p}{q}$, and $z$.
Lemma 2. If $x \prec y$ and $x \preceq z \preceq y$, then there is a unique $p \in [0, 1]$ such that $(1-q)x + qy \prec z$ for $q < p$ and $z \prec (1-q)x + qy$ for $q > p$.

Proof. Let $p$ be the supremum of all $q \in [0, 1]$ such that $(1-q)x + qy \preceq z$ (note that by assumption, this condition holds for $q = 0$). Suppose that $q < p$. Then there is a $q' \in (q, p]$ such that $(1-q')x + q'y \preceq z$. By Lemma 1, we have $(1-q)x + qy \prec (1-q')x + q'y \preceq z$, and the first assertion follows.

Suppose now that $q > p$. Then by definition of $p$, we do not have $(1-q)x + qy \preceq z$, which means that we have $z \prec (1-q)x + qy$, which was the second assertion.

Finally, uniqueness is obvious, because if both $p$ and $p' > p$ satisfied the condition, we would have $z \prec (1-q)x + qy \prec z$ for any $q \in (p, p')$.
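If one grants a concrete numeric utility (an assumption for illustration only; the lemma itself does not require continuity of $\preceq$), the calibration probability $p$ of Lemma 2 can be found by bisection:

```python
def calibration_p(u_x, u_y, u_z, tol=1e-9):
    """Find p with (1-p)*u_x + p*u_y = u_z by bisection, assuming u_x < u_z < u_y."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (1 - mid) * u_x + mid * u_y < u_z:
            lo = mid    # mixture still worse than z, so p lies above mid
        else:
            hi = mid
    return (lo + hi) / 2

assert abs(calibration_p(0.0, 1.0, 0.25) - 0.25) < 1e-6
```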
Definition 3. $x$ is much better than $y$, notation $x \gg y$ or $y \ll x$, if there are neighbourhoods $U$ of $x$ and $V$ of $y$ (in the relative topology of $\Delta$) such that we have $y' \prec x'$ for all $x' \in U$ and $y' \in V$. (In other words, the graph of $\gg$ is the interior of the graph of $\succ$.) Write $x \lesssim y$ or $y \gtrsim x$ when $x \not\gg y$ ($x$ is not much better than $y$), and $x \approx y$ ($x$ is about as good as $y$) when both $x \lesssim y$ and $y \lesssim x$.
Theorem 4 (existence of a utility function). There is a $u \in \mathbb{R}^n$ such that for all $x, y \in \Delta$,

$$x \ll y \iff u \cdot x < u \cdot y.$$

Unless $x \sim y$ for all $x$ and $y$, there are $x, y \in \Delta$ such that $u \cdot x < u \cdot y$.
Proof. Let $e_{i_0}$ be a worst and $e_{i_1}$ a best outcome, i.e. let $i_0, i_1$ be such that $e_{i_0} \preceq e_i \preceq e_{i_1}$ for all $i$. If $e_{i_0} \sim e_{i_1}$, then $e_i \sim e_j$ for all $i, j$, and by repeated applications of independence we get $x \sim y$ for all $x, y \in \Delta$, and therefore $x \approx y$ again for all $x, y$, and we can simply choose $u = 0$.
Thus, suppose that $e_{i_0} \prec e_{i_1}$. In this case, let $u \in \mathbb{R}^n$ be such that for every $i$, $u_i$ equals the unique $p \in [0, 1]$ provided by Lemma 2 applied to $e_{i_0} \prec e_{i_1}$ and $e_{i_0} \preceq e_i \preceq e_{i_1}$. Because of Lemma 1, $u_{i_0} = 0$ and $u_{i_1} = 1$. Let $z_p := (1-p)\,e_{i_0} + p\,e_{i_1}$.
We first show that $u \cdot x < u \cdot y$ implies $x \prec y$. For every $i$, we either have $u_i < 1$, in which case by Lemma 2 we have $e_i \prec z_{u_i + \varepsilon_i}$ for arbitrarily small $\varepsilon_i > 0$, or we have $u_i = 1$, in which case we set $\varepsilon_i = 0$ and find $e_i \preceq z_{u_i + \varepsilon_i} = e_{i_1}$. Set $\varepsilon := \sum_i x_i \varepsilon_i$. Now, by independence applied $n$ times, we have $x = \sum_i x_i e_i \preceq \sum_i x_i z_{u_i + \varepsilon_i} = z_{u \cdot x + \varepsilon}$; analogously, we obtain $z_{u \cdot y - \delta} \preceq y$ for arbitrarily small $\delta \ge 0$. Thus, using $u \cdot x + \varepsilon < u \cdot y - \delta$ and Lemma 1, $x \preceq z_{u \cdot x + \varepsilon} \prec z_{u \cdot y - \delta} \preceq y$ and therefore $x \prec y$ as claimed. Now note that if $u \cdot x < u \cdot y$, then this continues to hold for $x'$ and $y'$ in a sufficiently small neighbourhood of $x$ and $y$, and therefore we have $x \ll y$.
Now suppose that $u \cdot x \ge u \cdot y$. Since we have $u \cdot x \le 1$ and $0 \le u \cdot y$, we can find points $x'$ and $y'$ arbitrarily close to $x$ and $y$ such that the inequality becomes strict (either the left-hand side is smaller than one and we can increase it, or the right-hand side is greater than zero and we can decrease it, or else the inequality is already strict). Then, $y' \prec x'$ by the preceding paragraph. But this implies that $x \not\ll y$, which completes the proof.
Corollary 5. $\lesssim$ is a preference relation (i.e., a total preorder) that satisfies independence and the von Neumann-Morgenstern continuity axiom.

Proof. It is well-known (and straightforward to check) that this follows from the assertion of the theorem.

Corollary 6. $u$ is unique up to affine transformations.

Proof. Since $u$ is a VNM utility function for $\lesssim$, this follows from the analogous result for that case.
Corollary 7. Unless $x \sim y$ for all $x, y \in \Delta$, for all $c \in \mathbb{R}$ the set $\{x \in \Delta : u \cdot x = c\}$ has lower dimension than $\Delta$ (i.e., it is the intersection of $\Delta$ with a lower-dimensional subspace of $\mathbb{R}^n$).
Proof. First, note that the assumption implies that there are $x, y$ with $u \cdot x < u \cdot y$. Let $H_c$ be given by $H_c = \{x \in \mathbb{R}^n : u \cdot x = c\}$, $H' = \{x \in \mathbb{R}^n : (1, \dots, 1) \cdot x = 1\}$, and note that $\Delta$ is the intersection of the hyperplane $H'$ with the closed positive orthant $\{x \in \mathbb{R}^n : x_i \ge 0\}$. By the theorem, $u$ is not parallel to $(1, \dots, 1)$, so the hyperplane $H_c$ is not parallel to $H'$. It follows that $H_c \cap H'$ has dimension $n - 2$, and therefore $\{x \in \Delta : u \cdot x = c\}$ can have at most this dimension. (It can have smaller dimension or be the empty set if $H_c \cap H'$ only touches or lies entirely outside the positive orthant.)