Math appendix for: "Why you must maximize expected utility"

8 Benja 13 December 2012 01:11AM

This is a mathematical appendix to my post "Why you must maximize expected utility", giving precise statements and proofs of some results about von Neumann-Morgenstern utility theory without the Axiom of Continuity. I wish I had the time to make this post more easily readable, giving more intuition; the ideas are rather straight-forward and I hope they won't get lost in the line noise!

The work here is my own (though closely based on the standard proof of the VNM theorem), but I don't expect the results to be new.

*

I represent preference relations as total preorders on a simplex ; define , , and in the obvious ways (e.g., iff both and , and iff but not ). Write for the 'th unit vector in .

In the following, I will always assume that satisfies the independence axiom: that is, for all and , we have  if and only if . Note that the analogous statement with weak preferences follows from this: holds iff , which by independence is equivalent to , which is just .

Lemma 1 (more of a good thing is always better). If and , then .

Proof. Let . Then, and . Thus, the result follows from independence applied to , , and .

Lemma 2. If  and , then there is a unique such that for and for .

Proof. Let be the supremum of all such that (note that by assumption, this condition holds for ). Suppose that . Then there is an such that . By Lemma 1, we have , and the first assertion follows.

Suppose now that . Then by definition of , we do not have , which means that we have , which was the second assertion.

Finally, uniqueness is obvious, because if both and satisfied the condition, we would have .

Definition 3. is much better than , notation or , if there are neighbourhoods of and of (in the relative topology of ) such that we have for all and . (In other words, the graph of is the interior of the graph of .) Write  or when ( is not much better than ), and ( is about as good as ) when both and .

Theorem 4 (existence of a utility function). There is a such that for all ,

Unless for all and , there are  such that .

Proof. Let be a worst and a best outcome, i.e. let be such that for all . If , then  for all , and by repeated applications of independence we get for all , and therefore again for all , and we can simply choose .

Thus, suppose that . In this case, let be such that for every , equals the unique provided by Lemma 2 applied to and . Because of Lemma 1, . Let .

We first show that implies . For every , we either have , in which case by Lemma 2 we have for arbitrarily small , or we have , in which case we set  and find . Set . Now, by independence applied times, we have ; analogously, we obtain for arbitrarily small . Thus, using and Lemma 1, and therefore  as claimed. Now note that if , then this continues to hold for and in a sufficiently small neighbourhood of and , and therefore we have .

Now suppose that . Since we have  and , we can find points and arbitrarily close to and such that the inequality becomes strict (either the left-hand side is smaller than one and we can increase it, or the right-hand side is greater than zero and we can decrease it, or else the inequality is already strict). Then, by the preceding paragraph. But this implies that , which completes the proof.

Corollary 5. is a preference relation (i.e., a total preorder) that satisfies independence and the von Neumann-Morgenstern continuity axiom.

Proof. It is well-known (and straightforward to check) that this follows from the assertion of the theorem.

Corollary 6. is unique up to affine transformations.

Proof. Since  is a VNM utility function for , this follows from the analogous result for that case.

Corollary 7. Unless for all , for all the set has lower dimension than (i.e., it is the intersection of with a lower-dimensional subspace of ).

Proof. First, note that the assumption implies that . Let be given by , , and note that is the intersection of the hyperplane with the closed positive orthant . By the theorem, is not parallel to , so the hyperplane is not parallel to . It follows that has dimension , and therefore can have at most this dimension. (It can have smaller dimension or be the empty set if only touches or lies entirely outside the positive orthant.)

Why you must maximize expected utility

20 Benja 13 December 2012 01:11AM

This post explains von Neumann-Morgenstern (VNM) axioms for decision theory, and what follows from them: that if you have a consistent direction in which you are trying to steer the future, you must be an expected utility maximizer. I'm writing this post in preparation for a sequence on updateless anthropics, but I'm hoping that it will also be independently useful.

The theorems of decision theory say that if you follow certain axioms, then your behavior is described by a utility function. (If you don't know what that means, I'll explain below.) So you should have a utility function! Except, why should you want to follow these axioms in the first place?

A couple of years ago, Eliezer explained how violating one of them can turn you into a money pump — how, at time 11:59, you will want to pay a penny to get option B instead of option A, and then at 12:01, you will want to pay a penny to switch back. Either that, or the game will have ended and the option won't have made a difference.

When I read that post, I was suitably impressed, but not completely convinced: I would certainly not want to behave one way if behaving differently always gave better results. But couldn't you avoid the problem by violating the axiom only in situations where it doesn't give anyone an opportunity to money-pump you? I'm not saying that would be elegant, but is there a reason it would be irrational?

It took me a while, but I have since come around to the view that you really must have a utility function, and really must behave in a way that maximizes the expectation of this function, on pain of stupidity (or at least that there are strong arguments in this direction). But I don't know any source that comes close to explaining the reason, the way I see it; hence, this post.

I'll use the von Neumann-Morgenstern axioms, which assume probability theory as a foundation (unlike the Savage axioms, which actually imply that anyone following them has not only a utility function but also a probability distribution). I will assume that you already accept Bayesianism.

*

Epistemic rationality is about figuring out what's true; instrumental rationality is about steering the future where you want it to go. The way I see it, the axioms of decision theory tell you how to have a consistent direction in which you are trying to steer the future. If my choice at 12:01 depends on whether at 11:59 I had a chance to decide differently, then perhaps I won't ever be money-pumped; but if I want to save as many human lives as possible, and I must decide between different plans that have different probabilities of saving different numbers of people, then it starts to at least seem doubtful that which plan is better at 12:01 could genuinely depend on my opportunity to choose at 11:59.

So how do we formalize the notion of a coherent direction in which you can steer the future?

continue reading »

Pascal's Mugging for bounded utility functions

8 Benja 06 December 2012 10:28PM

Followup to: Pascal's Mugging: Tiny probabilities of vast utilities; The Lifespan Dilemma

This is Pascal's Mugging: Someone comes to you and says, "Give me five dollars, and I'll use my powers from outside the matrix to grant you 4^^^^4 years of fun." And they're lying, of course, but under a Solomonoff prior, the probability that they're not, though surely very small, isn't going to be less than one in 3^^^3; and so if you shut up and multiply, it's clear that the expected utility of paying up outweighs the expected utility of anything sensible you might be doing with those five dollars, and therefore—

Well, fortunately, if you're afraid that your utility-maximizing AI will end up paying all its money to the first clever mugger to come along and ask: never to worry! It will do so only if it can't think of anything better to do with five dollars, after all. So to avoid being mugged, all it has to do is to think of a harebrained scheme for spending $5 that has more than a one-in-4^^^4 chance of providing 5^^^^5 years of fun. Problem solved.

If, however, you would like to be there be a chance greater than one-in-hell that your AI ends up doing something actually useful, you'll need to do something else. And the simplest answer is to adopt a bounded utility function: any positive singularity gives at least 50 utils, a billion years gives 80 utils, a googol years gives 99 utils, a googolplex years gives 99.9 utils, and 4^^^^4 years of fun give 100 utils (minus epsilon).

This will, indeed, solve the problem. Probability of getting mugged: used to be one (minus epsilon, of course); has now been brought down to zero. That's right: zero.

(Plus epsilon.)

But let's suppose that the impossible happens, and the universe turns out to be able to support TREE(100) years of fun, and we've already lived out 4^^^^4 of them, and the AI has long since folded up operations and faded out of existence because humanity has become sufficiently sane that we no longer need it—

And lo, someone comes to you and says, "Alas, you're not really experiencing 4^^^^4 years of fun here; you're really a mere billion-year-old living in a very convincing simulation. Give me five dollars, and I'll use my powers from outside the matrix to extend your lifespan to a googol years."

And they're lying, of course — but it has been a long time indeed since you last faced a choice that could make a difference of nineteen whole utils...

*

If you truly have a bounded utility function, you must agree that in this situation, paying up is exactly what you'd want to do. Even though it means that you will not experience 4^^^^4 years of fun, even conditional on the universe being capable of supporting TREE(100) of them.

[ETA: To clarify, by "4^^^^4", I really mean any number so large that your utility function assigns (100 - epsilon) utils to it. It's possible to have a utility function where this is only true for infinite numbers which are so incredibly infinite that, given a particular formal language, their definition is so long and complicated that no mere human-sized mind could comprehend it. See this comment thread for discussion of bounded utility functions that assign significant weight to very large lifetimes.]

A model of UDT with a concrete prior over logical statements

43 Benja 28 August 2012 09:45PM

I've been having difficulties with constructing a toy scenario for AI self-modification more interesting than Quirrell's game, because you really want to do expected utility maximization of some sort, but currently our best-specified decision theories search through the theorems of one particular proof system and "break down and cry" if they can't find one that tells them what their utility will be if they choose a particular option. This is fine if the problems are simple enough that we always find the theorems we need, but the AI rewrite problem is precisely about skirting that edge. It seems natural to want to choose some probability distribution over the possibilities that you can't rule out, and then do expected utility maximization (because if you don't maximize EU over some prior, it seems likely that someone could Dutch-book you); indeed, Wei Dai's original UDT has a "mathematical intuition module" black box which this would be an implementation of. But how do you assign probabilities to logical statements? What consistency conditions do you ask for? What are the "impossible possible worlds" that make up your probability space?

Recently, Wei Dai suggested that logical uncertainty might help avoid the Löbian problems with AI self-modification, and although I'm sceptical about this idea, the discussion pushed me into trying to confront the logical uncertainty problem head-on; then, reading Haim Gaifman's paper "Reasoning with limited resources and assigning probabilities to logical statements" (which Luke linked from So you want to save the world) made something click. I want to present a simple suggestion for a concrete definition of "impossible possible world", for a prior over them, and for an UDT algorithm based on that. I'm not sure whether the concrete prior is useful—the main point in giving it is to have a concrete example we can try to prove things about—but the definition of logical possible worlds looks like a promising theoretical tool to me.

continue reading »

How to cheat Löb's Theorem: my second try

14 Benja 22 August 2012 06:21PM

In his open problems talk, Eliezer explains how Löb's theorem prevents you from having a consistent proof system P with an axiom schema that anything P proves is actually true, and asks how we can then "build an AI that could completely rewrite itself, without decreasing the amount of trust it had in math every time it executed that self-rewrite" (18:46).

Recently, I posted about an attempt to apply a general trick for avoiding diagonalization problems to a minimal toy version of this problem. Since then, Wei Dai has posted an interesting quining approach to the same toy problem, and Giles had a promising idea for doing something similar in a different way and will hopefully do a write-up filling in the details. Unfortunately my own "proof" turned out to be broken.

I think I've fixed the problem and made the proof more comprehensible and intuitive in the process. (To avoid confusion, note that what I'm proving is slightly different from, though related to, what I did in the previous post.) However, getting the details right seems to be far from trivial, so I would very much appreciate if people checked my new argument, and told me that it looks okay / where it goes wrong / where they get lost. Thanks in advance!

I'll be more explicit about quoting/unquoting than before, which means I'll need to introduce some notation. However, to sustain you through the schlep of preliminaries, I thought I'd start with an informal summary.

continue reading »

An angle of attack on Open Problem #1

30 Benja 18 August 2012 12:08PM

There is a problem with the proof here and I have to think about whether I can fix it. Thanks to vi21maobk9vp for pointing me in the right direction! I have posted a new and hopefully correct proof attempt. Thanks again to vi21maobk9vp!

In his talk on open problems in Friendly AI, Eliezer's first question is how, given Löb's theorem, an AI can replace itself with a better expected utility maximizer that believes in as much mathematics as the original AI. I know exactly one trick for that sort of problem, so I decided to try that on a toy variant. To my surprise, it more or less just worked. Therefore:

Professor Quirrell proposes a game. You start with a score of one. Professor Quirrell moves first, by choosing a computer program and showing you its source code. You then have three options: Take your winnings; double down; or self-destruct.

If you take your winnings, the game ends, and your score is converted to Quirrell points.

If you self-destruct, the game ends, your score is lost, you'll be sent to bed without dinner, you'll lose 150 House points, Rita Skeeter will write a feature alleging that you're a Death Eater, and Professor Quirrell will publicly critique your performance. You are advised not to pick this option.

If you double down, your score doubles, and you advance to the next round. Professor Quirrell again moves first by choosing a computer program. Then, it's your turn—except that this time, you don't get to choose your move yourself: instead, it'll be chosen by Professor Quirrell's program from the previous round.

Professor Quirrell will endeavor to present an educational sequence of programs.

continue reading »

Self-modification is the correct justification for updateless decision theory

12 Benja 11 April 2010 04:39PM

Reply to: Late great filter is not bad news

Suppose that you build an AI, and Omega appears to it and says:

Here's a button. A million years ago I calculated the umpteenth digit of pi. If it is even, I calculated whether you would press this button (in such a way that your human creator was never simulated as a conscious being). If I predicted that you wouldn't press the button, I destroyed Earth right then and there.* If it is odd, I created a doomsday device that will destroy the solar system if you press this button.

[* ETA: Assume that if the digit is even and the AI is predicted to press the button, Omega does not destroy Earth, but does turn Alpha Centauri purple (say). The point is for this to be a scenario that you, the AI creator, know not to have come to pass.]

Suppose you're the kind of AI creator whose AI is time consistent in a certain sense from the beginning of time and presses the button. Then you have an AI that satisfies a certain kind of philosopher, wins big in a certain logically impossible world, and destroys humanity.

Suppose, on the other hand, that you're a very similar kind of AI creator, only you program your AI not to take into account impossible possible worlds that had already turned out to be impossible (when you created the AI | when you first became convinced that timeless decision theory is right). Then you've got an AI that most of the time acts the same way, but does worse in worlds we know to be logically impossible, and destroys humanity less often in worlds we do not know to be logically impossible.

Wei Dai's great filter post seems to suggest that under UDT, you should be the first kind of AI creator. I don't think that's true, actually; I think that in UDT, you should probably not start with a "prior" probability distribution that gives significant weight to logical propositions you know to be false: do you think the AI should press the button if it was the first digit of pi that Omega calculated?

But obviously, you don't want tomorrow's you to pick the prior that way just after Omega has appeared to it in a couterfactual mugging (because according to your best reasoning today, there's a 50% chance this loses you a million dollars).

The most convincing argument I know for timeless flavors of decision theory is that if you could modify your own source code, the course of action that maximizes your expected utility is to modify into a timeless decider. So yes, you should do that. Any AI you build should be timeless from the start; and it's reasonable to make yourself into the kind of person that will decide timelessly with your probability distribution today (if you can do that).

But I don't think you should decide that updateless decision theory is therefore so pure and reflectively consistent that you should go and optimize your payoff even in worlds whose logical impossibility was clear before you first decided to be a timeless decider (say). Perhaps it's less elegant to justify UDT through self-modification at some arbitrary point in time than through reflective consistency all the way from the big bang on; but in the worlds we can't rule out yet, it's more likely to win.

View more: Prev | Next