Model of unlosing agents
Some have expressed skepticism that "unlosing agents" can actually exist. So to provide an existence proof, here is a model of an unlosing agent. It's not a model you'd want to use constructively to build one, but it's sufficient for the existence result.
Let D be the set of all decisions the agent has made in the past, let U be the set of all utility functions that are compatible with those decisions, and let P be a "better than" relationship on the set of outcomes (possibly intransitive, dependent, incomplete, etc...).
By "utility functions that are compatible those decisions" I mean that an expected utility maximising agent with any u in U would reach the same decisions D as the agent actually did. Notice that U starts off infinitely large when D is empty; when the agent faces a new decision d, here is a decision criteria that leaves U non-empty:
- Restrict to the set of possible decision choices that would leave U non-empty. This is always possible, as any u in U would advocate a particular decision choice du at d, and therefore choosing du would leave u in the updated U. Call this set compatible.
- Among those compatible choices, choose one that is the least incompatible with P, using some criterion (such as needing to do the least work to remove intransitivities and dependencies and so on).
- Make that choice, update P as in the previous step, and update D and U (leaving U non-empty, as seen in the first step).
- Proceed. (A toy implementation is sketched below.)
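Here is that toy implementation: a minimal Python sketch of the loop above. It cuts the model down for illustration only - U is a finite list of candidate utility functions rather than an infinite set, P is a bare set of ordered pairs, and the "least incompatible with P" test is a crude conflict count. All of these are assumptions of the sketch, not part of the model.

```python
# Toy unlosing agent: U is a finite list of candidate utility
# functions, P a "better than" relation given as a set of ordered
# pairs over outcomes (allowed to be intransitive or incomplete).

def unlosing_choice(options, U, P):
    """Pick an option that keeps U non-empty and clashes least with P."""
    # Step 1: options that at least one u in U would itself choose;
    # picking such an option keeps that u in the updated U.
    compatible = {max(options, key=u) for u in U}
    # Step 2: among compatible options, minimise conflict with P,
    # crudely scored as the number of options P ranks above it.
    def conflict(o):
        return sum(1 for other in options if (other, o) in P)
    choice = min(compatible, key=conflict)
    # Step 3: keep only the utility functions that advocate the choice.
    U[:] = [u for u in U if max(options, key=u) == choice]
    return choice

# Example: three candidate utility functions over numeric outcomes.
U = [lambda x: x, lambda x: -((x - 2) ** 2), lambda x: x % 3]
P = {(1, 3)}  # the agent's raw preferences rank outcome 1 above 3
print(unlosing_choice([1, 2, 3], U, P))  # -> 2 (compatible, no conflict)
```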
That's the theory. In practice, we would want to restrict the utilities initially allowed into U to avoid really stupid utilities ("I like losing money to people called Rob at 15:46.34 every alternate Wednesday if the stock market is up; otherwise I don't."). When constructing the initial P and U, it could be a good start to look only at categories that humans naturally express preferences between. But those are implementation details. And again, using this kind of explicit design violates the spirit of unlosing agents (unless the set U is defined in ways that are different from simply listing all u in U).
The proof that this agent is unlosing is that a) U will never be empty, and b) for any u in U, the agent will have behaved indistinguishably from a u-maximiser.
Expected utility, unlosing agents, and Pascal's mugging
Still very much a work in progress
EDIT: model/existence proof of unlosing agents can be found here.
Why do we bother about utility functions on Less Wrong? Well, because of the results of von Neumann and Morgenstern, which showed that, essentially, if you make decisions, you'd better use something equivalent to expected utility maximisation. If you don't, you lose. Lose what? It doesn't matter - money, resources, whatever: the point is that any other system can be exploited by other agents or the universe itself to force you into a pointless loss. A pointless loss being a loss that gives you no benefit or possibility of benefit - it's really bad.
The justifications for the axioms of expected utility are, roughly:
- (Completeness) "If you don't decide, you'll probably lose pointlessly."
- (Transitivity) "If your choices form loops, people can make you lose pointlessly." (A money pump; see the sketch after this list.)
- (Continuity/Archimedean) This axiom (and acceptable weaker versions of it) is much more subtle than it seems; "No choice is infinitely important" is what it seems to say, but " 'I could have been a contender' isn't good enough" is closer to what it does. Anyway, that's a discussion for another time.
- (Independence) "If your choices aren't independent, people can expect to make you lose pointlessly."
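The transitivity bullet is the classic money pump, sketched below in Python; the three items, the one-cent fee, and the starting cash are invented for the illustration.

```python
# An agent with the cyclic preference A > B > C > A pays a small fee
# for each "upgrade"; a trader cycling through the three items can
# drain it indefinitely.

prefers = {("A", "B"), ("B", "C"), ("C", "A")}  # (x, y): x preferred to y

def accepts_trade(offered, held):
    return (offered, held) in prefers

cash, held = 100.0, "C"
for _ in range(9):
    # the trader offers the item the agent prefers to its current one
    offered = next(x for (x, y) in prefers if y == held)
    if accepts_trade(offered, held) and cash >= 0.01:
        cash -= 0.01  # the agent gladly pays a penny to "upgrade"
        held = offered

# Three full loops later the agent holds C again, $0.09 poorer.
print(f"holding {held}, cash ${cash:.2f}")  # holding C, cash $99.91
```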
Equivalency is not identity
A lot of people believe a subtly different version of the result:
- If you don't have a utility function, you'll lose pointlessly.
This is wrong. The correct result is:
- If you don't lose pointlessly, then your decisions are equivalent to having a utility function.
Math appendix for: "Why you must maximize expected utility"
This is a mathematical appendix to my post "Why you must maximize expected utility", giving precise statements and proofs of some results about von Neumann-Morgenstern utility theory without the Axiom of Continuity. I wish I had the time to make this post more easily readable, giving more intuition; the ideas are rather straight-forward and I hope they won't get lost in the line noise!
The work here is my own (though closely based on the standard proof of the VNM theorem), but I don't expect the results to be new.
*
I represent preference relations as total preorders $\succeq$ on a simplex $\Delta := \{x \in \mathbb{R}^n : x_i \ge 0,\ \sum_i x_i = 1\}$ (interpreted as the set of lotteries over $n$ outcomes); define $\sim$, $\succ$, and $\prec$ in the obvious ways (e.g., $x \sim y$ iff both $x \succeq y$ and $y \succeq x$, and $x \succ y$ iff $x \succeq y$ but not $y \succeq x$). Write $e_i$ for the $i$'th unit vector in $\mathbb{R}^n$.
In the following, I will always assume that $\succeq$ satisfies the independence axiom: that is, for all $x, y, z \in \Delta$ and $p \in (0,1]$, we have $x \succ y$ if and only if $px + (1-p)z \succ py + (1-p)z$. Note that the analogous statement with weak preferences follows from this: $x \succeq y$ holds iff $\neg(y \succ x)$, which by independence is equivalent to $\neg\big(py + (1-p)z \succ px + (1-p)z\big)$, which is just $px + (1-p)z \succeq py + (1-p)z$.
Lemma 1 (more of a good thing is always better). If $x \succ y$ and $0 \le p < q \le 1$, then $qx + (1-q)y \succ px + (1-p)y$.

Proof. (If $p = 0$ and $q = 1$ there is nothing to prove, so assume $1 - q + p > 0$.) Let $\alpha := q - p$ and $z := \frac{p\,x + (1-q)\,y}{1 - q + p}$, a convex combination of $x$ and $y$. Then, $\alpha x + (1-\alpha)z = qx + (1-q)y$ and $\alpha y + (1-\alpha)z = px + (1-p)y$. Thus, the result follows from independence applied to $x$, $y$, $z$, and $\alpha$.
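The two identities in the proof can be checked symbolically; here is a quick sympy verification, treating the lotteries $x$ and $y$ as scalar symbols (which suffices, since the identities are affine):

```python
# Verify the mixture identities used in the proof of Lemma 1.
import sympy as sp

p, q, x, y = sp.symbols("p q x y")
alpha = q - p
z = (p * x + (1 - q) * y) / (1 - q + p)  # as defined in the proof

print(sp.simplify(alpha * x + (1 - alpha) * z - (q * x + (1 - q) * y)))  # 0
print(sp.simplify(alpha * y + (1 - alpha) * z - (p * x + (1 - p) * y)))  # 0
```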
Lemma 2. If $x \succ y$ and $x \succeq z \succeq y$, then there is a unique $p^* \in [0,1]$ such that $z \succ px + (1-p)y$ for $p < p^*$ and $px + (1-p)y \succ z$ for $p > p^*$.

Proof. Let $p^*$ be the supremum of all $p \in [0,1]$ such that $z \succeq px + (1-p)y$ (note that by assumption, this condition holds for $p = 0$). Suppose that $p < p^*$. Then there is a $p' \in (p, p^*]$ such that $z \succeq p'x + (1-p')y$. By Lemma 1, we have $p'x + (1-p')y \succ px + (1-p)y$, and the first assertion follows.

Suppose now that $p > p^*$. Then by definition of $p^*$, we do not have $z \succeq px + (1-p)y$, which means that we have $px + (1-p)y \succ z$, which was the second assertion.

Finally, uniqueness is obvious, because if both $p^*$ and $q^* > p^*$ satisfied the condition, we would have $px + (1-p)y \succ z \succ px + (1-p)y$ for any $p \in (p^*, q^*)$.
Definition 3. $x$ is much better than $y$, notation $x \gg y$ or $y \ll x$, if there are neighbourhoods $U$ of $x$ and $V$ of $y$ (in the relative topology of $\Delta$) such that we have $x' \succ y'$ for all $x' \in U$ and $y' \in V$. (In other words, the graph of $\gg$ is the interior of the graph of $\succ$.) Write $x \succsim y$ or $y \precsim x$ when $y \not\gg x$ ($y$ is not much better than $x$), and $x \approx y$ ($x$ is about as good as $y$) when both $x \succsim y$ and $y \succsim x$.
Theorem 4 (existence of a utility function). There is a $u \in \mathbb{R}^n$ such that for all $x, y \in \Delta$,

$$x \gg y \quad\Longleftrightarrow\quad u \cdot x > u \cdot y.$$

Unless $x \sim y$ for all $x$ and $y$, there are $i, j$ such that $u_i \neq u_j$.
Proof. Let $e_\ell$ be a worst and $e_m$ a best outcome, i.e. let $\ell, m \in \{1, \dots, n\}$ be such that $e_m \succeq e_i \succeq e_\ell$ for all $i$. If $e_m \sim e_\ell$, then $e_i \sim e_j$ for all $i, j$, and by repeated applications of independence we get $x \sim e_\ell$ for all $x \in \Delta$, and therefore $x \sim y$ again for all $x, y \in \Delta$, and we can simply choose $u = 0$.

Thus, suppose that $e_m \succ e_\ell$. In this case, let $u \in \mathbb{R}^n$ be such that for every $i$, $u_i$ equals the unique $p^*$ provided by Lemma 2 applied to $x = e_m$, $y = e_\ell$, and $z = e_i$. Because of Lemma 1, $u_m = 1$ and $u_\ell = 0$. Let $\mu(p) := p\,e_m + (1-p)\,e_\ell$ for $p \in [0,1]$.
We first show that $u \cdot x > u \cdot y$ implies $x \succ y$. For every $i$, we either have $u_i > 0$, in which case by Lemma 2 we have $e_i \succ \mu(u_i - \varepsilon_i)$ for arbitrarily small $\varepsilon_i > 0$, or we have $u_i = 0$, in which case we set $\varepsilon_i = 0$ and find $e_i \succeq e_\ell = \mu(u_i - \varepsilon_i)$. Set $\varepsilon := \sum_i x_i \varepsilon_i$. Now, by independence applied $n$ times, we have $x = \sum_i x_i e_i \succeq \sum_i x_i\,\mu(u_i - \varepsilon_i) = \mu(u \cdot x - \varepsilon)$; analogously, we obtain $\mu(u \cdot y + \delta) \succeq y$ for arbitrarily small $\delta \ge 0$. Thus, choosing $\varepsilon$ and $\delta$ small enough that $u \cdot x - \varepsilon > u \cdot y + \delta$ and using Lemma 1, $x \succeq \mu(u \cdot x - \varepsilon) \succ \mu(u \cdot y + \delta) \succeq y$, and therefore $x \succ y$ as claimed. Now note that if $u \cdot x > u \cdot y$, then this continues to hold for $x'$ and $y'$ in a sufficiently small neighbourhood of $x$ and $y$, and therefore we have $x \gg y$.
Now suppose that $u \cdot y \ge u \cdot x$. Since we have $u \cdot y \le 1$ and $u \cdot x \ge 0$, we can find points $y'$ and $x'$ arbitrarily close to $y$ and $x$ such that $u \cdot y' > u \cdot x'$ (either the left-hand side is smaller than one and we can increase it, or the right-hand side is greater than zero and we can decrease it, or else the inequality is already strict). Then, $y' \succ x'$ by the preceding paragraph. But this implies that $x \not\gg y$, which completes the proof.
Corollary 5. $\succsim$ is a preference relation (i.e., a total preorder) that satisfies independence and the von Neumann-Morgenstern continuity axiom.

Proof. It is well-known (and straightforward to check) that this follows from the assertion of the theorem: by the theorem, $x \succsim y$ iff $u \cdot x \ge u \cdot y$, so $\succsim$ has an expected utility representation.
Corollary 6. $u$ is unique up to positive affine transformations.

Proof. Since $u$ is a VNM utility function for $\succsim$, this follows from the analogous result for that case.
Corollary 7. Unless $x \sim y$ for all $x, y \in \Delta$, for all $y \in \Delta$ the set $\{x \in \Delta : x \approx y\}$ has lower dimension than $\Delta$ (i.e., it is the intersection of $\Delta$ with a lower-dimensional subspace of $\mathbb{R}^n$).
Proof. First, note that by the theorem, $x \approx y$ holds iff $u \cdot x = u \cdot y$, so that $\{x \in \Delta : x \approx y\} = \Delta \cap H_y$, where $H_y := \{x \in \mathbb{R}^n : u \cdot x = u \cdot y\}$. Let $E$ be given by $E := \{x \in \mathbb{R}^n : \sum_i x_i = 1\}$, and note that $\Delta$ is the intersection of the hyperplane $E$ with the closed positive orthant $\{x \in \mathbb{R}^n : x_i \ge 0\}$. By the theorem and the assumption, there are $i, j$ with $u_i \neq u_j$, so $u$ is not parallel to $(1, \dots, 1)$ and the hyperplane $H_y$ is not parallel to $E$. It follows that $H_y \cap E$ has dimension $n - 2$, and therefore $\{x \in \Delta : x \approx y\}$ can have at most this dimension. (It can have smaller dimension or be the empty set if $H_y \cap E$ only touches or lies entirely outside the positive orthant.)
Mathematical Measures of Optimization Power
In explorations of AI risk, it is helpful to formalize concepts. One particularly important concept is intelligence. How can we formalize it, or better yet, measure it? “Intelligence” is often considered mysterious or is anthropomorphized. One way to taboo “intelligence” is to talk instead about optimization processes. An optimization process (OP, also optimization power) selects some futures from a space of possible futures. It does so according to some criterion; that is, it optimizes for something. Eliezer Yudkowsky spends a few of the sequence posts discussing the nature and importance of this concept for understanding AI risk. In them, he informally describes a way to measure the power of an OP. We consider mathematical formalizations of this measure.
Here's EY's original description of his measure of OP.
Put a measure on the state space - if it's discrete, you can just count. Then collect all the states which are equal to or greater than the observed outcome, in that optimization process's implicit or explicit preference ordering. Sum or integrate over the total size of all such states. Divide by the total volume of the state space. This gives you the power of the optimization process measured in terms of the improbabilities that it can produce - that is, improbability of a random selection producing an equally good result, relative to a measure and a preference ordering.
If you prefer, you can take the reciprocal of this improbability (1/1000 becomes 1000) and then take the logarithm base 2. This gives you the power of the optimization process in bits.
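For concreteness, here is the discrete recipe from the quote in a few lines of Python; the hundred-state space, uniform measure, and identity preference scoring are toy assumptions for the example.

```python
# EY's discrete measure: count states at least as good as the achieved
# one, divide by the total, and convert the improbability to bits.
import math

states = range(100)          # 100 equally likely world states
quality = lambda s: s        # preference ordering: higher is better
achieved = 97                # outcome the optimization process produced

as_good = sum(1 for s in states if quality(s) >= quality(achieved))
improbability = as_good / len(states)     # 3/100
bits = math.log2(1 / improbability)       # ~5.06 bits
print(f"optimization power: {bits:.2f} bits")
```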
Let's say that at time $t_0$ we have a formalism to specify all possible world states $w \in W$ at some future time $t_1$. Perhaps it is a list of particle locations and velocities, or perhaps it is a list of all possible universal wave functions. Or maybe we're working in a limited domain, and it's a list of all possible next-move chess boards. Let's also assume that we have a well-justified prior $P(w)$ over these states being the next ones to occur in the absence of an OP (more on that later).
We order $W$ according to the OP's preferences. For the moment, we actually don't care about the density, or "measure", of our ordering. Now we have a probability distribution over $W$. The integral of this distribution from $w_1$ to $w_2$ represents the probability that the worldstate at $t_1$ will be better than $w_1$, and worse than $w_2$. When time continues, and the OP acts to bring about some worldstate $w^*$, we can calculate the probability of an equal or better outcome occurring, and hence the OP's power in bits:

$$\mathrm{OP} = -\log_2 \Pr(w \succeq w^*) = -\log_2 \int_{w^*}^{w_{\max}} P(w)\,dw.$$
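As a sketch of the same calculation in this continuous setting (with an assumed standard normal prior standing in for $P(w)$, and states ordered so that better is to the right):

```python
# OP in bits for an achieved state w*, given a prior over states
# ordered by the OP's preferences.
import math
from scipy.stats import norm

w_star = 2.0                 # the worldstate the OP brought about
p = norm.sf(w_star)          # P(w >= w*) under the prior, ~0.0228
print(f"{math.log2(1 / p):.2f} bits")  # ~5.46 bits
```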
The formula above is a simple generalization of what EY describes. Here are some things I am confused about.
Finding a specification for all possible worldstates is hard, but it's been done before. There are many ways to reasonably represent this. What I can't figure out is how to specify possible worldstates "in the absence of an OP". This phrase hides tons of complexity. How can we formally construct this counterfactual? Is the matter that composes the OP no longer present? Is it present but "not acting"? What constitutes a null action? Are we considering the expected worldstate distribution as if the OP never existed? If the OP is some kind of black-box AI agent, it's easier to imagine this. But if the OP is evolution, or a forest fire, it's harder to imagine. Furthermore, is the specification dualist, or is the agent part of the worldstates? If it's dualist, that's a fundamental inaccuracy which can have lots of bad implications. If the agent is part of the worldstates, how do we represent them "in the absence of an OP"?
But for the rest of this article, let's pretend we have such a specification. There's also a loss from ignoring the cardinal utility of the worldstates. Let's say you have two distributions of utility over the set $W$, representing two different OPs. In both, the OP chooses a worldstate $w^*$ with the same utility $u(w^*)$. The distributions are the same on the left side of $u(w^*)$, and the second distribution has a longer tail on the right. It seems like the OP in distribution 1 was more impressive; the second OP missed all the available higher utility. We could make the expected utility of the second distribution arbitrarily high, while maintaining the same fraction of probability mass above the achieved worldstate. Conversely, we could instead extend the left tail of the second distribution, and say that the second OP was more impressive because it managed to avoid all the bad worlds.
Perhaps it is more natural to consider two distributions: the distribution of utility over entire world futures assuming the OP isn't present, versus the distribution after the OP takes its action. So instead of selecting a single possibility with certainty, the OP has just shifted the probabilities.
How should we reduce this distribution shift to a single number which we call OP? Any shift of probability mass upwards in utility should increase the measure of OP, and vice versa. I think also that an increase in the expected utility (EU) of these distributions should be measured as a positive OP, and vice versa. EU seems like the critical metric to use. Let's generalize a little further, and say that instead of measuring OP between two points in time, we let the time difference go to zero, and measure instantaneous OP. Therefore we're interested in some equation which has the same sign as $\frac{d\,\mathrm{EU}}{dt}$.
Besides that, I'm not exactly sure which specific equation should equal OP. I seem to have two contradicting desires:

1a) The sign of $\frac{d\,\mathrm{EU}}{dt}$ should be the sign of the OP.

1b) Negative $\mathrm{EU}$ and negative $\frac{d\,\mathrm{EU}}{dt}$ should be possible.

2) Constant positive OP should imply exponentially increasing $\mathrm{EU}$.
Criterion 1) feels pretty obvious. Criterion 2) feels like a recognition of what is “natural” for OPs; to improve upon themselves, so that they can get better and better returns. The simplest differential equation that represents positive feedback yields exponentials, and is used across many domains because of its universal nature.
This intuition certainly isn't anthropocentric, but it might be this-universe biased. I'd be interested in seeing if it is natural in other computable environments.
If we just use $\mathrm{OP} = \frac{d\,\mathrm{EU}}{dt}$, then criterion 2) is not satisfied. If we use $\mathrm{OP} = \ln\frac{d\,\mathrm{EU}}{dt}$, then decreases in EU are not defined, and constant EU is negative infinite OP, violating 1). If we use $\mathrm{OP} = \frac{1}{\mathrm{EU}}\frac{d\,\mathrm{EU}}{dt}$, then 2) is satisfied, but negative and decreasing EU give positive OP, violating 1a). If we use $\mathrm{OP} = \frac{d}{dt}\ln(\mathrm{EU})$, then 2) is still satisfied, but negative $\mathrm{EU}$ gives complex OP, violating 1a). Perhaps the only consistent equation would be $\mathrm{OP} = \frac{1}{|\mathrm{EU}|}\frac{d\,\mathrm{EU}}{dt}$. But seriously, who uses absolute values? I can't recall a fundamental equation that relied on them. They feel totally ad hoc. Plus, there's this weird singularity at $\mathrm{EU} = 0$. What's up with that?
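A quick numeric check of how two of these candidates behave on a constant-OP trajectory $\mathrm{EU}(t) = e^{kt}$ (the value $k = 0.5$ is an arbitrary choice for the example):

```python
import math

k = 0.5
for t in (1.0, 5.0):
    EU = math.exp(k * t)   # EU(t) = e^{kt}
    dEU = k * EU           # d(EU)/dt
    print(f"t={t}: dEU/dt={dEU:.3f}, (dEU/dt)/EU={dEU / EU:.3f}")

# dEU/dt grows with t even though the "optimizer" here is constant,
# so it fails criterion 2); (dEU/dt)/EU stays fixed at k = 0.5.
```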
Classically, utility is invariant up to positive affine transformations. Criterion 1) respects this because the derivative removes the additive constant, but 2) doesn't. It is still scale invariant, but it has an intrinsic zero. This made me consider the nature of “zero utility”. At least for humans, there is an intuitive sign to utility. We wouldn't say that stubbing your toe is 1,000,000 utils, and getting a car is 1,002,000 utils. It seems to me, especially after reading Omohundro's “Basic AI Drives”, that there is in some sense an intrinsic zero utility for all OPs.
All OPs need certain initial conditions to even exist. After that, they need resources. AIs need computer hardware and energy. Evolution needed certain chemicals and energy. Having no resources makes it impossible, in general, to do anything. If you have literally zero resources, you are not a "thing" which "does". So that is a type of intrinsic zero utility. Then what would having negative utility mean? It would mean the OP anti-exists. It's making it even less likely for it to be able to start working toward its utility function. What would exponentially decreasing utility mean? It would mean that it is a constant OP for the negative of the utility function that we are considering. So, it doesn't really have negative optimization power; if that's the result of our calculation, we should negate the utility function, and say it has positive OP. And that singularity at $\mathrm{EU} = 0$? When you go from the positive side, getting closer and closer to 0 is really bad, because you're destroying the last bits of your resources; your last chance of doing any optimization. And going from negative utility to positive is infinitely impressive, because you bootstrapped from optimizing away from your goal to optimizing toward your goal.
So perhaps we should drop the part of 1b) that says negative EU can exist. Certainly world-states can exist that are terrible for a given utility function, but if an OP with that utility function exists, then the expected utility of the future is positive.
If this is true, then it seems there is more to the concept of utility than the von Neumann-Morgenstern axioms.
How do people feel about criterion 2), and my proposal that $\mathrm{OP} = \frac{1}{\mathrm{EU}}\frac{d\,\mathrm{EU}}{dt}$?
Risk aversion does not explain people's betting behaviours
Expected utility maximisation is an excellent prescriptive decision theory. It has all the nice properties that we want and need in a decision theory, and can be argued to be "the" ideal decision theory in some senses.
However, it is completely wrong as a descriptive theory of how humans behave. Those on this list are presumably aware of oddities like the Allais paradox. But we may retain the notion that expected utility still has some descriptive uses, such as modelling risk aversion. The story here is simple: each subsequent dollar gives less utility (the utility of money curve is concave), so people would need a premium to accept deals where they have a 50-50 chance of gaining or losing $100.
As a story or mental image, it's useful to have. As a formal model of human behaviour on small bets, it's spectacularly wrong. Matthew Rabin showed why. If people are consistently slightly risk averse on small bets and expected utility theory is approximately correct, then they have to be massively, stupidly risk averse on larger bets, in ways that are clearly unrealistic. Put simply, the small bets behaviour forces their utility to become far too concave.
For illustration, let's introduce Neville. Neville is risk averse. He will reject a single 50-50 deal where he gains $55 or loses $50. He might accept this deal if he were rich enough, and felt rich - say, if he had $20 000 in capital, he would accept the deal. I hope I'm not painting a completely unbelievable portrait of human behaviour here! And yet expected utility maximisation then predicts that if Neville had fifteen thousand dollars ($15 000) in capital, he would reject a 50-50 bet that either lost him fifteen hundred dollars ($1 500), or gained him a hundred and fifty thousand dollars ($150 000) - a ratio of a hundred to one between gains and losses!
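Here is a rough, back-of-envelope Python version of the Rabin-style calculation behind Neville's numbers. It is a sketch, not Rabin's exact theorem: the per-$105 geometric decay of marginal utility, its flat extension above $20 000, and the step anchoring are all simplifying assumptions of mine.

```python
# If a concave expected-utility maximiser rejects a 50-50 lose-$50 /
# gain-$55 bet at every wealth level up to $20,000, his marginal
# utility must shrink by roughly a factor of 50/55 per $105 of wealth
# inside that region. We bound Neville's big bet accordingly.

LOSS, GAIN = 50, 55
STEP = LOSS + GAIN            # $105: one link in the rejection chain
DECAY = LOSS / GAIN           # marginal-utility factor per step

wealth, cap = 15_000, 20_000  # Neville's capital; rejections below cap
m = 1.0                       # normalise marginal utility at $15,000

# Upper bound on utility gained from winning $150,000: marginal
# utility decays up to the cap, then (most favourably) stays flat.
gain_u, remaining, w = 0.0, 150_000, wealth
while remaining > 0:
    chunk = min(STEP, remaining)
    gain_u += chunk * m
    remaining -= chunk
    w += chunk
    if w < cap:
        m *= DECAY

# Lower bound on utility lost by dropping $1,500: going down,
# marginal utility grows by the inverse factor per step.
loss_u, remaining, m = 0.0, 1_500, 1.0
while remaining > 0:
    chunk = min(STEP, remaining)
    loss_u += chunk * m
    remaining -= chunk
    m /= DECAY

print(f"gain bound: {gain_u:.0f}, loss bound: {loss_u:.0f}")
print("Neville rejects the big bet:", gain_u < loss_u)  # True
```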