# A summary of Savage's foundations for probability and utility.

**Edit:** I think the P2c I wrote originally may have been a bit too weak; fixed that. Nevermind, rechecking, that wasn't needed.

**More edits (now consolidated):** Edited nontriviality note. Edited totality note. Added in the definition of numerical probability in terms of qualitative probability (though not the proof that it works). Also slight clarifications on implications of P6' and P6''' on partitions into equivalent and almost-equivalent parts, respectively.

**One very late edit, June 2:** Even though we don't get countable additivity, we still want a σ-algebra rather than just an algebra (this is needed for some of the proofs in the "partition conditions" section that I don't go into here). Also noted nonemptiness of gambles.

The idea that rational agents act in a manner isomorphic to expected-utility maximizers is often used here, typically justified with the Von Neumann-Morgenstern theorem. (The last of Von Neumann and Morgenstern's axioms, the independence axiom, can be grounded in a Dutch book argument.) But the Von Neumann-Morgenstern theorem assumes that the agent already measures its beliefs with (finitely additive) probabilities. This in turn is often justified with Cox's theorem (valid so long as we assume a "large world", which is implied by e.g. the existence of a fair coin). But Cox's theorem assumes as an axiom that the plausibility of a statement is taken to be a real number, a very large assumption! I have also seen this justified here with Dutch book arguments, but these all seem to assume that we are already using some notion of expected utility maximization (which is not only somewhat circular, but also a considerably stronger assumption than that plausibilities are measured with real numbers).

There is a way of grounding both (finitely additive) probability and utility simultaneously, however, as detailed by Leonard Savage in his *Foundations of Statistics* (1954). In this article I will state the axioms and definitions he gives, give a summary of their logical structure, and suggest a slight modification (which is equivalent mathematically but slightly more philosophically satisfying). I would also like to ask the question: To what extent can these axioms be grounded in Dutch book arguments or other more basic principles? I warn the reader that I have not worked through all the proofs myself and I suggest simply finding a copy of the book if you want more detail.

Peter Fishburn later showed in *Utility Theory for Decision Making* (1970) that the axioms set forth here actually imply that utility is bounded.

(Note: The versions of the axioms and definitions in the end papers are formulated slightly differently from the ones in the text of the book, and in the 1954 version have an error. I'll be using the ones from the text, though in some cases I'll reformulate them slightly.)

## Primitive notions; preference given a set of states

We will use the following primitive notions. Firstly, there is a set S of "states of the world"; the exact current state of the world is unknown to the agent. Secondly, there is a set F of "consequences" - things that can happen as a result of the agent's actions. *Actions* or *acts* will be interpreted as functions f:S→F, as two actions which have the same consequences regardless of the state of the world are indistinguishable and hence considered equal. While the agent may be uncertain as to the exact results of its actions, this can be folded into its uncertainty about the state of the world. Finally, we introduce as primitive a relation ≤ on the set of actions, interpreted as "is not preferred to". I.e., f≤g means that given a choice between actions f and g, the agent will either prefer g or be indifferent. As usual, sets of states will be referred to as "events", and for the usual reasons we may want to restrict the set of admissible events to a boolean σ-subalgebra of ℘(S), though I don't know if that's really necessary here (Savage doesn't seem to do so, though he does discuss it some).

In any case, we then have the following axiom:

*P1. The relation ≤ is a total preorder.*

The intuition here for transitivity is pretty clear. For totality, if the agent is presented with a choice of two acts, it must choose one of them! Or be indifferent. Perhaps we could instead use a partial preorder (or order?), though this would give us two different indistinguishable flavors of indifference, which seems problematic. But this could be useful if we wanted intransitive indifference. So long as indifference is transitive, though, we can collapse this into a total preorder.

As usual we can then define f≥g, f<g (meaning "it is false that g≤f"), and g>f. I will use f≡g to mean "f≤g and g≤f", i.e., the agent is indifferent between f and g. (Savage uses an equals sign with a dot over it.)

Note that though ≤ is defined in terms of how the agent chooses when presented with two options, Savage later notes that there is a construction of W. Allen Wallis that allows one to elicit the agent's preference ordering among a finite set of more than two options (modulo indifference): Simply tell the agent to rank the options given, and that afterward, two of them will be chosen uniformly at random, and it will get whichever one it ranked higher.
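To see why ranking truthfully is the agent's best response in Wallis's construction, here is a minimal sketch. It assumes a hypothetical numerical payoff for each option (the `utility` dict), which the theory has not derived at this point and which is used only to score rankings; the function names are illustrative, not Savage's.

```python
from fractions import Fraction
from itertools import combinations, permutations

def expected_payoff(ranking, utility):
    """Average payoff when two options are drawn uniformly at random and the
    agent receives whichever of the two it ranked higher (earlier in `ranking`)."""
    position = {option: i for i, option in enumerate(ranking)}
    pairs = list(combinations(ranking, 2))
    total = sum(utility[min(pair, key=position.get)] for pair in pairs)
    return Fraction(total, len(pairs))

def best_rankings(utility):
    """All reported rankings that maximize the expected payoff."""
    scored = {r: expected_payoff(r, utility) for r in permutations(utility)}
    top = max(scored.values())
    return [r for r, value in scored.items() if value == top]

# Hypothetical payoffs, assumed only for this illustration.
utility = {"a": 3, "b": 1, "c": 2}
```

With strict preferences, any misreported ranking gets at least one pair the wrong way round, so `best_rankings(utility)` contains only the truthful ordering.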

The second axiom states that if two actions have the same consequences in some situation, just what that equal consequence is does not affect their relative ordering:

*P2. Suppose f≤g, and B is a set of states such that f and g agree on B. If f' and g' are another pair of acts which, outside of B, agree with f and g respectively, and on B, agree with each other, then f'≤g'.*

In other words, to decide between two actions, only the cases where they actually have different consequences matter.
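As a sanity check on what P2 demands, the following sketch searches a small finite example for violations. Everything concrete here is an assumption made for illustration: the three states, the three consequences, the probabilities, the utilities, and the use of expected utility as the preference relation (which the axioms only deliver much later).

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite example: these states, consequences, probabilities, and
# utilities are assumptions made for illustration only.
STATES = ("s1", "s2", "s3")
CONSEQUENCES = ("lose", "draw", "win")
PROB = {"s1": Fraction(1, 2), "s2": Fraction(1, 3), "s3": Fraction(1, 6)}
UTIL = {"lose": 0, "draw": 1, "win": 3}

def expected_utility(act):
    return sum(PROB[s] * UTIL[act[s]] for s in STATES)

def leq(f, g):
    """f <= g: f is not preferred to g."""
    return expected_utility(f) <= expected_utility(g)

def all_acts():
    return [dict(zip(STATES, values))
            for values in product(CONSEQUENCES, repeat=len(STATES))]

def p2_violations(B):
    """Return tuples (f, g, f2, g2) with f <= g, f and g agreeing on B, f2 and g2
    obtained by overwriting both on B with a new common part, and not f2 <= g2."""
    B = tuple(B)
    violations = []
    for f in all_acts():
        for g in all_acts():
            if any(f[s] != g[s] for s in B) or not leq(f, g):
                continue
            for common in product(CONSEQUENCES, repeat=len(B)):
                f2, g2 = dict(f), dict(g)
                for s, c in zip(B, common):
                    f2[s] = g2[s] = c
                if not leq(f2, g2):
                    violations.append((f, g, f2, g2))
    return violations
```

For an expected-utility agent the list is empty for every B, since overwriting the common part shifts both expected utilities by the same amount.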

With this axiom, we can now define:

*D1. We say "f≤g given B" to mean that if f' and g' are actions such that f' agrees with f on B, g' agrees with g on B, and f' and g' agree with each other outside of B, then f'≤g'.*

Due to axiom P2, this is well-defined.

Here is where I would like to suggest a small modification to this setup. The notion of "f≤g given B" is implicitly taken to be how the agent makes decisions if it knows that B obtains. However it seems to me that we should actually take "f≤g given B", rather than f≤g, to be the primitive notion, explicitly interpreted as "the agent does not prefer f to g if it knows that B obtains". The agent always has some state of prior knowledge and this way we have explicitly specified decisions under a given state of knowledge - the acts we are concerned with - as the basis of our theory. Rather than defining f≤g given B in terms of ≤, we can define f≤g to mean "f≤g given S" and then add additional axioms governing the relation between "≤ given B" for varying B, which in Savage's setup are theorems or part of the definition D1.

(Specifically, I would modify P1 and P2 to talk about "≤ given B" rather than ≤, and add the following theorems as axioms:

*P2a. If f and g agree on B, then f≡g given B.*

*P2b. If B⊆C, f≤g given C, and f and g agree outside B, then f≤g given B.*

*P2c. If B and C are disjoint, and f≤g given B and given C, then f≤g given B∪C.*

This is a little unwieldy and perhaps there is an easier way - these might not be minimal. But they do seem to be sufficient.)

In any case, regardless which way we do it, we've now established the notion of preference given that a set of states obtains, as well as preference without additional knowledge, so henceforth I'll freely use both as Savage does without worrying about which makes a better foundation, since they are equivalent.

## Ordering on preferences

The next definition is simply to note that we can sensibly talk about f≤b, b≤f, b≤c where here b and c are consequences rather than actions, simply by interpreting consequences as constant functions. (So the agent does have a preference ordering on consequences, it's just induced from its ordering on actions. We do it this way since it's its choices between actions we can actually see.)

However, the third axiom reifies this induced ordering somewhat, by demanding that it be invariant under gaining new information.

*P3'. If b and c are consequences and b≤c, then b≤c given any B.*

Thus the fact that the agent may change preferences given new information just reflects its uncertainty about the results of its actions, rather than actually preferring different consequences in different states (any such preferences can be done away with by simply expanding the set of consequences).

Really this is not strong enough, but to state the actual P3 we will first need a definition:

*D3. An event B is said to be null if f≤g given B for any actions f and g.*

Null sets will correspond to sets of probability 0, once numerical probability is introduced. Probability here is to be inferred from the agent's preferences, so we cannot distinguish between "the agent is certain that B will not happen" and "if B obtains, the agent doesn't care what happens".
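Definitions D1 and D3 can be made concrete on a toy example. This is a sketch under stated assumptions: the states, the two consequences, the probabilities (with one state given probability 0), and the expected-utility preference are all hypothetical, chosen only to show a null event appearing.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: the agent assigns probability 0 to state s3.
STATES = ("s1", "s2", "s3")
CONSEQUENCES = ("bad", "good")
PROB = {"s1": Fraction(1, 2), "s2": Fraction(1, 2), "s3": Fraction(0)}
UTIL = {"bad": 0, "good": 1}

def expected_utility(act):
    return sum(PROB[s] * UTIL[act[s]] for s in STATES)

def leq_given(f, g, B, filler="bad"):
    """D1: compare f2 and g2, which copy f and g on B and share `filler` outside B.
    By P2 the choice of filler does not affect the answer."""
    f2 = {s: (f[s] if s in B else filler) for s in STATES}
    g2 = {s: (g[s] if s in B else filler) for s in STATES}
    return expected_utility(f2) <= expected_utility(g2)

def is_null(B):
    """D3: B is null if f <= g given B for every pair of acts."""
    acts = [dict(zip(STATES, v)) for v in product(CONSEQUENCES, repeat=len(STATES))]
    return all(leq_given(f, g, B) for f in acts for g in acts)
```

Here `is_null({"s3"})` holds while `is_null({"s1"})` does not. From preferences alone one cannot tell whether s3 is believed impossible or simply not cared about.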

Now we can state the actual P3:

*P3. If b and c are consequences and B is not null, then b≤c given B if and only if b≤c.*

P3', by contrast, allowed some collapsing of preference on gaining new information; here we have disallowed that except in the case where the new information is enough to collapse all preferences entirely (a sort of "end of the world" or "fatal error" scenario).

## Qualitative probability

We've introduced above the idea of "probability 0" (and hence implicitly probability 1; observe that "¬B is null" is equivalent to "for any f and g, f≤g given B if and only if f≤g"). Now we want to expand this to probability more generally. But we will not initially get numbers out of it; rather we will first just get another total preordering, A≤B, "A is at most as probable as B".

How can we determine which of two events the agent thinks is more probable? Have it bet on them, of course! First, we need a nontriviality axiom so it has some things to bet on.

*P5. There exist consequences b and c such that b>c.*

(I don't know what the results would be if instead we used the weaker nontriviality axiom "there exist actions f and g such that f<g", i.e., "S is not null". That we eventually get expected utility for comparing all acts suggests that this should work, but I haven't checked.)

So let us now consider a class of actions which I will call "wagers". (Savage doesn't have any special term for these.) Define "the wager on A for b over c" to mean the action that, on A, returns b, and otherwise, returns c. Denote this by w_{A,b,c}. Then we postulate:

*P4. Let b>b' be a pair of consequences, and c>c' another such pair. Then for any events A and B, w_{A,b,b'}≤w_{B,b,b'} if and only if w_{A,c,c'}≤w_{B,c,c'}.*

That is to say, if the agent is given the choice between betting on event A and betting on event B, and the prize and booby prize are the same regardless of which it bets on, then it shouldn't matter just what the prize and booby prize are - it should just bet on whichever it thinks is more probable. Hence we can define:

*D4. For events A and B, we say "A is at most as probable as B", denoted A≤B, if w_{A,b,b'}≤w_{B,b,b'}, where b>b' is a pair of consequences.*

By P4, this is well-defined. We can then show that the relation on events ≤ is a total preorder, so we can use the usual notation when talking about it (again, ≡ will denote equivalence).

In fact, ≤ is not only a total preorder, but a *qualitative probability*:

- ≤ is a total preorder
- ∅≤A for any event A
- ∅<S
- Given events B, C, and D with D disjoint from B and C, then B≤C if and only if B∪D≤C∪D.

(There is no condition corresponding to countable additivity; as mentioned above, we simply won't get countable additivity out of this.) Note also that under this, A≡∅ if and only if A is null in the earlier sense. Also, we can define "A≤B given C" by comparing the wagers given C; this is equivalent to the condition that A∩C≤B∩C. This relation too is a qualitative probability.
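The four conditions above can be checked by brute force on a small example. The sketch below builds the ordering on events from wagers, as in D4, and then tests each condition. The states, probabilities, and the prize and booby-prize utilities are assumptions for illustration, as is the use of expected utility to compare wagers.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical example data, assumed for illustration only.
STATES = ("s1", "s2", "s3")
PROB = {"s1": Fraction(1, 2), "s2": Fraction(1, 3), "s3": Fraction(1, 6)}
UTIL = {"booby": 0, "prize": 1}

def wager(A, b, c):
    """The act returning b on A and c elsewhere."""
    return {s: (b if s in A else c) for s in STATES}

def expected_utility(act):
    return sum(PROB[s] * UTIL[act[s]] for s in STATES)

def at_most_as_probable(A, B):
    """D4: compare the wagers on A and on B for the same prize pair."""
    return (expected_utility(wager(A, "prize", "booby"))
            <= expected_utility(wager(B, "prize", "booby")))

def events():
    return [frozenset(c) for r in range(len(STATES) + 1)
            for c in combinations(STATES, r)]

def is_qualitative_probability():
    ev, leq = events(), at_most_as_probable
    empty, full = frozenset(), frozenset(STATES)
    total = all(leq(A, B) or leq(B, A) for A in ev for B in ev)
    transitive = all(leq(A, C) for A in ev for B in ev for C in ev
                     if leq(A, B) and leq(B, C))
    nonnegative = all(leq(empty, A) for A in ev)
    nontrivial = leq(empty, full) and not leq(full, empty)
    additive = all(leq(B, C) == leq(B | D, C | D)
                   for B in ev for C in ev for D in ev
                   if not (D & B) and not (D & C))
    return total and transitive and nonnegative and nontrivial and additive
```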

## Partition conditions and numerical probability

In order to get real numbers to appear, we are of course going to have to make some sort of Archimedean assumption. In this section I discuss what some of these look like and then ultimately state P6, the one Savage goes with.

First, definitions. We will be considering finitely-additive probability measures on the set of states, i.e. a function P from the set of events to the interval [0,1] such that P(S)=1, and for disjoint B and C, P(B∪C)=P(B)+P(C). We will say "P agrees with ≤" if for every A and B, A≤B if and only if P(A)≤P(B); and we will say "P almost agrees with ≤" if for every A and B, A≤B implies P(A)≤P(B). (I.e., in the latter case, numerical probability is allowed to collapse some distinctions between events that the agent might not actually be indifferent between.)

We'll be considering here partitions of the set of states S. We'll say a partition of S is "uniform" if the parts are all equivalent. More generally we'll say it is "almost uniform" if, for any r, the union of any r parts is at most as probable as the union of any r+1 parts. (This is using ≤, remember; we don't have numerical probabilities yet!) (Note that any uniform partition is almost uniform.) Then it turns out that the following are equivalent:

- There exist almost-uniform partitions of S into arbitrarily large numbers of parts.
- For any B>∅, there exists a partition of S with each part less probable than B.
- There exists a (necessarily unique) finitely additive probability measure P that almost agrees with ≤, which has the property that for any B and any 0≤λ≤1, there is a C⊆B such that P(C)=λP(B).

(Definitely not going into the proof of this here. However, the actual definition of the numerical probability P(A) is not so complicated: Let k(A,n) denote the largest r such that there exists an almost-uniform partition of S into n parts, for which there is some union of r parts, C, such that C≤A. Then the sequence k(A,n)/n always converges, and we can define P(A) to be its limit.)
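To get a feel for why a limit of the form k(A,n)/n recovers a probability, here is a deliberately simplified sketch. It is not the definition above: it looks at a single uniform partition into n parts rather than all almost-uniform ones, and it assumes an agreeing measure already exists so that each part has probability 1/n. The names `parts_fitting_inside` and `approximations` are hypothetical.

```python
from fractions import Fraction
from math import floor

def parts_fitting_inside(p_A, n):
    """Largest r with r/n <= p_A: how many parts of one uniform n-part
    partition can be united while staying at most as probable as A.
    Simplification: assumes each part has probability exactly 1/n."""
    return floor(p_A * n)

def approximations(p_A, ns):
    """The ratios r/n for each n in `ns`; they approach p_A from below."""
    return [Fraction(parts_fitting_inside(p_A, n), n) for n in ns]
```

Under these assumptions the ratio is never above P(A) and is within 1/n of it, so it converges to P(A) as n grows.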

So we could use this as our 6th axiom:

*P6'''. For any B>∅, there exists a partition of S with each part less probable than B.*

Savage notes that other authors have assumed the stronger

*P6''. There exist uniform partitions of S into arbitrarily large numbers of parts.*

since there's an obvious justification for this: the existence of a fair coin! If a fair coin exists, then we can generate a uniform partition of S into 2^{n} parts simply by flipping it n times and considering the result. We'll actually end up assuming something even stronger than this.
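The fair-coin construction can be spelled out directly. In this sketch each part of the partition is labelled by one exact sequence of n flips; the labelling by strings of "H" and "T" and the function names are illustrative choices, and fairness of the coin is the assumption doing the work.

```python
from fractions import Fraction
from itertools import product

def coin_partition(n):
    """Labels of the 2**n parts: each part is the event
    'the n flips came out as exactly this sequence'."""
    return ["".join(seq) for seq in product("HT", repeat=n)]

def part_probability(n):
    """Probability of each part, assuming a fair coin and independent flips."""
    return Fraction(1, 2 ** n)
```

For n = 3 this gives eight parts, each with probability 1/8, so the parts are all equivalent and the partition is uniform.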

So P6''' does get us numerical probabilities, but they don't necessarily reflect all of the qualitative probability; P6''' is only strong enough to force almost agreement. Though it is stronger than that when ∅ is involved - it does turn out that P(B)=0 if and only if B≡∅. (And hence also P(B)=1 if and only if B≡S.) But more generally it turns out that P(B)=P(C) if and only if B and C are "almost equivalent", which I will denote B≈C (Savage uses a symbol I haven't seen elsewhere), which is defined to mean that for any E>∅ disjoint from B, B∪E≥C, and for any E>∅ disjoint from C, C∪E≥B.

(It's not obvious to me that ≈ is in general an equivalence relation, but it certainly is in the presence of P6'''; Savage seems to use this implicitly. Note also that another consequence of P6''' is that for any n there exists a partition of S into n almost-equivalent parts; such a partition is necessarily almost-uniform.)

However the following stronger version of P6''' gets rid of this distinction:

*P6'. For any B>C, there exists a partition of S, each part D of which satisfies C∪D<B.*

(Observe that P6''' is just P6' for C=∅.) Under P6', almost equivalence is equivalence, and so numerical probability agrees with qualitative probability, and we finally have what we wanted. (So by earlier, P6' implies P6'', not just P6'''. Indeed by above it implies the existence of uniform partitions into n parts for any n, not just arbitrarily large n.)

In actuality, Savage assumes an even stronger axiom, which is needed to get utility and not just probability:

*P6. For any acts g<h, and any consequence b, there is a partition of S such that if g is modified on any one part to be constantly b there, we would still have g<h; and if h is modified on any one part to be constantly b there, we would also still have g<h.*

Applying P6 to wagers yields the weaker P6'.

We can now also get conditional probability - if P6' holds, it also holds for the preorderings "≤ given C" for non-null C, and hence we can define P(B|C) to be the probability of B under the quantitative probability we get corresponding to the qualitative probability "≤ given C". Using the uniqueness of agreeing probability measures, it's easy to check that indeed, P(B|C)=P(B∩C)/P(C).
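The agreement between the ratio formula and the ordering "≤ given C" can be checked on a finite toy measure. A finite state space cannot satisfy P6', so this is only an illustration of the formula, with hypothetical states and probabilities. It uses the earlier observation that "A≤B given C" is equivalent to A∩C≤B∩C.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical measure on three states, assumed for illustration only.
PROB = {"s1": Fraction(1, 2), "s2": Fraction(1, 3), "s3": Fraction(1, 6)}

def P(event):
    return sum((PROB[s] for s in event), Fraction(0))

def P_given(B, C):
    """P(B|C) = P(B∩C)/P(C), for non-null C."""
    return P(B & C) / P(C)

def leq_given(A, B, C):
    """Qualitative comparison given C, via the equivalent condition A∩C <= B∩C."""
    return P(A & C) <= P(B & C)

def events():
    states = tuple(PROB)
    return [frozenset(c) for r in range(len(states) + 1)
            for c in combinations(states, r)]
```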

## Utility for finite gambles

Now that we have numerical probability, we can talk about finite gambles. If we have consequences b_{1}, ..., b_{n}, and probabilities λ_{1}, ..., λ_{n} summing to 1, we can consider the gamble ∑λ_{i}b_{i}, represented by any action which yields b_{1} with probability λ_{1}, b_{2} with probability λ_{2}, etc. (And with probability 0 does anything; we don't care about events with probability 0.) Note that by above such an action necessarily exists. It can be proven that any two actions representing the same gamble are equivalent, and hence we can talk about comparing gambles. We can also sensibly talk about mixing gambles - taking ∑λ_{i}f_{i} where the f_{i} are finite gambles, and the λ_{i} are probabilities summing to 1 - in the obvious fashion.

With these definitions, it turns out that Von Neumann and Morgenstern's independence condition holds, and, using axiom P6, Savage shows that the continuity (i.e. Archimedean) condition also holds, and hence there is indeed a utility function, a function U:F→**R** such that for any two finite gambles represented by f and g respectively, f≤g if and only if the expected utility of the first gamble is less than or equal to that of the second. Furthermore, any two such utility functions are related via an increasing affine transformation.
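A small numerical sketch may help fix ideas about mixing and about the affine freedom in U. The consequences "x", "y", "z" and their utilities are hypothetical, and representing a gamble as a list of (probability, consequence) pairs is a convenience of this example rather than anything in Savage.

```python
from fractions import Fraction

def expected_utility(gamble, utility):
    """gamble: list of (probability, consequence) pairs whose probabilities sum to 1."""
    assert sum(p for p, _ in gamble) == 1
    return sum(p * utility[c] for p, c in gamble)

def mix(weighted_gambles):
    """The mixture sum(lam_i * f_i) of finite gambles, given as (lam_i, f_i) pairs."""
    return [(lam * p, c) for lam, gamble in weighted_gambles for p, c in gamble]

def affine(utility, a, b):
    """Increasing affine transformation a*U + b, with a > 0."""
    assert a > 0
    return {c: a * u + b for c, u in utility.items()}

half = Fraction(1, 2)
U = {"x": 0, "y": 1, "z": 5}          # hypothetical utilities
f = [(Fraction(1), "y")]              # y for sure
g = [(half, "x"), (half, "z")]        # even chance of x or z
h = [(Fraction(1), "x")]              # x for sure
```

Here f≤g, and mixing each of them half-and-half with h preserves the comparison, as independence requires. Replacing `U` by `affine(U, 3, 7)` leaves every comparison unchanged.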

We can also take expected value knowing that a given event C obtains, since we have numerical probability; and indeed this agrees with the preference ordering on gambles given C.

## Expected utility in general and boundedness of utility

Finally, Savage shows that if we assume one more axiom, P7, then we have that for any essentially bounded actions f and g, we have f≤g if and only if the expected utility of f is at most that of g. (It is possible to define integration with respect to a finitely additive measure similarly to how one does with respect to a countably additive measure; the result is linear and monotonic but doesn't satisfy convergence properties.) Similarly with respect to a given event C.

The axiom P7 is:

*P7. If f and g are acts and B is an event such that f≤g(s) given B for every s∈B, then f≤g given B. Similarly, if f(s)≤g given B for every s in B, then f≤g given B.*

So this is just another variant on the "sure-thing principle" that I earlier labeled P2c.

Now in fact it turns out as mentioned above that P7, when taken together with the rest, implies that utility is bounded, and hence that we do indeed have that for any f and g, f≤g if and only if the expected utility of f is at most that of g! This is due to Peter Fishburn and postdates the first edition of *Foundations of Statistics*, so in there Savage simply notes that it would be nice if this worked for f and g not necessarily essentially bounded (so long as their expected values exist, and allowing them to be ±∞), but that he can't prove this, and then adds a footnote giving a reference for bounded utility. (Though he does prove using P7 that if you have two acts f and g such that f,g≤b for all consequences b, then f≡g; similarly if f,g≥b for all b. Actually, this is a key lemma in proving that utility is bounded; Fishburn's proof works by showing that if utility were unbounded, you could construct two actions that contradict this.)

Of course, if you really don't like the conclusion that utility is bounded, you could throw out axiom 7! It's pretty intuitive, but it's not clear that ignoring it could actually get you Dutch-booked. After all, the first 6 axioms are enough to handle finite gambles, 7 is only needed for more general situations. So long as your Dutch bookie is limited to finite gambles, you don't need this.

## Questions on further justification

So now that I've laid all this out, here's the question I originally meant to ask: To what extent can these axioms be grounded in more basic principles, e.g. Dutch book arguments? It seems to me that most of these are too basic for that to apply - Dutch book arguments need more working in the background. Still, it seems to me axioms P2, P3, and P4 might plausibly be grounded this way, though I have not yet attempted to figure out how. P7 presumably can't, for the reasons noted in the previous section. P1 I assume is too basic. P5 obviously can't (if the agent doesn't care about anything, that's its own problem).

P6 is an Archimedean condition. Typically I've seen those (specifically Von Neumann and Morgenstern's continuity condition) justified on this site with the idea that infinitesimals will never be relevant in any practical situation - if c has only infinitesimally more utility than b, the only case when the distinction would be relevant is if the probabilities of accomplishing them were exactly equal, which is not realistic. I'm guessing infinitesimal probabilities can probably be done away with in a similar manner?

Or are these not good axioms in the first place? You all are more familiar with these sorts of things than me. Ideas?

## Comments (88)

Off topic but amusing:

Leonard Savage, *Foundations of Statistics*, page 27

See this and this.

Without context, I can't tell whether he was trying to say the chances were high, low, extraordinarily different, or slightly different.

Chance of snow 40% and chance of Republican win 45% satisfies the quote.

I would have guessed more like 20% and 45%, but the point was that he was unlucky, not miscalibrated.

Great summary!

P7 looks self-evident to me. I'm less comfortable with P6. Unbounded utility depends on P6 requiring a partition into an arbitrarily large number of parts - is this used in the proof of bounded utility? In general, I don't think Archimedean axioms are safe given our current level of understanding of Pascal's mugger-like problems.

EDIT: P7 doesn't look as self-evident anymore. Consider the St. Petersburg lottery. Any particular payout from buying a ticket for $2 is worse than the expected value of buying a ticket for $1, but obviously it is preferable to buy the ticket for $1. Again, I don't think we can judge this given our current level of understanding of Pascal's mugger-like problems.

Numerical utility at all relies on this, so I'm not sure what you mean here.

Any particular finite gamble requires a finite number of parts. A St. Petersburg lottery requires an infinite number of parts.

Constructing a St. Petersburg lottery relies on this, but I don't see why that means "unbounded utility" depends on it; unbounded utility isn't even a consequence of these axioms, indeed the opposite is so.

In any case I don't even see how you state the notion of bounded (or unbounded) utility without numerical utility; we don't mean bounded in the sense of having internally a maximum and minimum, we mean corresponding to a bounded set of real numbers. No internal maximum or minimum is needed; how do you state that without setting up the correspondence? And to get real numbers you need an Archimedean assumption of some sort.

If there are only a finite number of options, utility can only be unbounded if at least one of the options has the possibility of utilities with arbitrarily large absolute value. It is hard to deal with an infinite number of options, but it might be possible depending on how that works with the other axioms, but this is irrelevant because P6 was not connected to the proof of bounded utility.

That is why I was initially concerned about P6.

I don't think real numbers are the best field to use for utility because of Pascal's mugging, some of the stuff described here, and this paper.

Okay, what field do you think works for utility that's better than real numbers?

The obvious candidates are surreal numbers or non-standard reals. Wikipedia says that the former doesn't have omega plus 1, where omega is the number of ordinary integers, but IIRC the latter does, so I'd try the latter first. I do not feel confident that it solves the problem, though.

The surreals do have ω+1 - see the "..And Beyond" section of the wiki page. If this is contradicted anywhere else on the page, tell me where and I'll correct it.

The surreals are probably the best to use for this, though they'll need to emerge naturally from some axioms, not just be proclaimed correct. From WP: "In a rigorous set theoretic sense, the surreal numbers are the largest possible ordered field; all other ordered fields, such as the rationals, the reals, the rational functions, the Levi-Civita field, the superreal numbers, and the hyperreal numbers, are subfields of the surreals.", so even if the surreals are not necessary, they will probably be sufficient.

Conway used surreal numbers for go utilities. I discussed the virtues of surreal utilities here.

Those aren't really utilities because they aren't made for taking expectations, though any totally ordered set can be embedded in the surreals, so they are perfect for choosing from possibly-infinite sets of certain outcomes.

Checking with the definition of utility, *expectations* do not seem critical. Conway's move values may usefully be seen as utilities associated with possible moves.

Are we agreed that bounded real-valued (or rational-valued) utility gets rid of Pascal's mugging?

Yes. Bounded utility solves tons of problems, it just doesn't, AFAICT, describe my preferences.

The bound would also have to be substantially less than 3^^^^3.

As you know, if there is a bound, without loss of generality we can say all utilities go from 0 to 1.

Repairing your claim to take that into account, if you're being mugged for $5, and the plausibility of the mugger's claim is 1/X where X is large, and the utility the mugger promises you is about 1, then you get mugged if your utility for $5 is less than 1/X, roughly. So I agree that there are utility functions that would result in the mugging, but they don't appear especially simple or especially consistent with observed human behavior, so the mugging doesn't seem likely.

Now, if the programming language used to compute the prior on the utility functions has a special instruction that loads 3^^^^3 into an accumulator with one byte, maybe the mugging will look likely. I don't see any way around that.

This strikes me as very similar to Fishburn's proof that P7 implies utility is bounded. (Maybe it's essentially the same? Need to read more carefully; point is, his proof also works by comparing two St. Petersburg lotteries.). Of course, we only get the problem if we imagine that the St. Petersburg lottery is for utility, rather than for money with decreasing marginal (and in this theory, ultimately bounded) utility...

Yes, this is Fishburn's proof, just as a modus tollens rather than a modus ponens.

Here is a small counterexample to P2. States = { Red, Green, Blue }. Outcomes = { Win, Lose }. Since there are only two outcomes, we can write actions as the subset of states that Win. My preferences are: {} < { Green } = { Blue } < { Red } < { Red,Green } = { Red,Blue } < { Green,Blue } < { Red,Green,Blue }

This contradicts P2 because { Green } < { Red } but { Red,Blue } < { Green,Blue }.

Here is a situation where this may apply: There is an urn with 300 balls. 100 of them are red. The rest are either green or blue. You draw a ball from this urn.

So Red represents definite probability 1/3, while Green and Blue are unknowns. Depending on context, it sure looks like these are the right preferences to have. This is called the Ellsberg paradox.

Even if you insist this is somehow wrong, it is not going to be Dutch booked. Even if we extend the state space to include arbitrarily many fair coins (as P6 may require), and even if we extend the result space to allow for multiple draws or other payouts, we can define various consistent objective functions (that are not expected utility) which show this behaviour.
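The counterexample above can be checked mechanically. The sketch below encodes the stated ordering on win-sets as integer ranks (the rank numbers are an encoding chosen here, and only their order matters) and searches for triples where adding a common disjoint set D to both sides flips the comparison, which is exactly what P2 forbids for two-outcome acts.

```python
# Ranks encode the ordering
# {} < {G} = {B} < {R} < {R,G} = {R,B} < {G,B} < {R,G,B}.
RANK = {
    frozenset(): 0,
    frozenset({"G"}): 1, frozenset({"B"}): 1,
    frozenset({"R"}): 2,
    frozenset({"R", "G"}): 3, frozenset({"R", "B"}): 3,
    frozenset({"G", "B"}): 4,
    frozenset({"R", "G", "B"}): 5,
}

def leq(A, B):
    """The act winning on A is not preferred to the act winning on B."""
    return RANK[frozenset(A)] <= RANK[frozenset(B)]

def additivity_violations():
    """Triples (B, C, D), with D disjoint from B and C, where the comparison
    of B with C differs from the comparison of B∪D with C∪D."""
    ev = list(RANK)
    return [(B, C, D) for B in ev for C in ev for D in ev
            if not (B & D) and not (C & D) and leq(B, C) != leq(B | D, C | D)]
```

The triple ({G}, {R}, {B}) appears in the result, so no additive probability, and hence no expected-utility representation, can reproduce this ordering.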

This can be Dutch booked. As described on this Wikipedia page, you are asked to set prices for promises to pay $1 conditional on events and an adversary chooses whether to buy these from you or sell them to you at that price. If Price( {Green} ) + Price( {Red, Blue} ) ≠ $1, the adversary can ensure you lose money, and the same holds for {Red} and {Green, Blue}. However, this is incompatible with {Green} < {Red} and {Red, Blue} < {Green, Blue}.

I'm aware of this. In this case my "operational subjective probability", as described on that same page, is necessarily not consistent with my preferences.

To put this another way, suppose that I do put the same price on Red, Green, and Blue when faced with that particular choice (i.e. knowing that I will have to buy or sell at the price I name). Why does it follow that I should not choose Red over Green in other circumstances? Or more to the point, how can I be Dutch booked if I then choose Red over Green in other circumstances?

You're completely right. Dutch book arguments prove almost nothing interesting. Your preference is rational.

I realize it has been a while, but can you answer some question about your preferences?

In the hypothetical world where all probabilities that you were asked to bet on were known, would you be a Bayesian?

How stable is your preference for Knightian risk over uncertainty? In other words, how much more would winning on green have to be worth for you to prefer it to red (feel free to interpret this as much as is necessary to make it precise)?

I'm not really clear on the first question. But since the second question asks how much something is worth, I take it the first question is asking about a utility function. Do I behave as if I were maximising expected utility, ie. obey the VNM postulates as far as known probabilities go? A yes answer then makes the second question go something like this: given a bet on red whose payoff has utility 1, and a bet on green whose payoff has utility N, what is the critical N where I am indifferent between the two?

For every N>1, there are decision procedures for which the answer to the first is yes, the answer to the second is N, and which displays the Ellsberg-paradoxical behaviour. Ellsberg himself had proposed one. I did have a thought on how one of these could be well illustrated in not too technical terms, and maybe it would be appropriate to post it here, but I'd have to get around to writing it up. In the meantime I can also illustrate interactively: 1) yes, 2) you can give me an N>1 and I'll go with it.

Okay. Let N = 2 for simplicity and let $ denote utilons like you would use for decisions involving just risk and no uncertainty.

P(Red) = 1/3, so you are indifferent between $-1 unconditionally and ($-3 if Red, $0 otherwise). You are also indifferent between $-3 iff Red and $-3N (= $-6) iff Green (or equivalently Blue). By transitivity, you are therefore indifferent between $-1 unconditionally and $-6 iff Green. Also, you are obviously indifferent between $4 unconditionally and $6 iff (die ≥ 3).

I would think that you would allow a 'pure risk' bet to be added to an uncorrelated uncertainty bet - correct me if that is wrong. In that case, you would be indifferent between $3 unconditionally and $6 iff (die ≥ 3) - $6 iff Green, but you would not be indifferent between $3 unconditionally and $6 iff (Green ∨ Blue) - $6 iff Green, which is the same as $6 iff Blue, which you value at $1.

This seems like a strange set of preferences to have, especially since (die ≥ 3) and (Green ∨ Blue) are both pure risk, but it could be correct.

That's right.

I take it what is strange is that I could be indifferent between A and B, but not indifferent between A+C and B+C.

For a simpler example let's add a fair coin (and again let N=2). I think $1 iff Green is as good as $1 iff (Heads and Red), but $1 iff (Green or Blue) is better than $1 iff ((Heads and Red) or Blue). (All payoffs are the same, so we can actually forget the utility function.) So again: A is as good as B, but A+C is better than B+C. Is this the same strangeness?

Not quite.

I think that the situation that you described is less strange than the one that I described. In yours, you are combining two 'unknown probabilities' to produce 'known probabilities'.

I find my situation stranger because the only difference between a choice that you are indifferent about and one that you do have a preference about is the substitution of (Green ∨ Blue) for (die ≥ 3). Both of these have clear probabilities and are equivalent in almost any situation. To put this another way, you would be indifferent between $3 unconditionally and $6 iff (Green ∨ Blue) - $6 iff Green if the two bets on coloured balls were taken to refer to *different* draws from the (same) urn. This looks a lot like risk aversion, and mentally feels like risk aversion to me, but it is not risk aversion since you would not make these bets if all probabilities were known to be 1/3.

Ohh, I see. Well done! Yes, I lose.

If I had a do-over on my last answer, I would not agree that $-6 iff Green is worth $-1. It's $-3.

But, given that I can't seem to get it straight, I have to admit I haven't given LW readers much reason to believe that I do know what I'm talking about here, and at least one good reason to believe that I don't.

In case anyone's still humouring me, if an event has unknown probability, so does its negation; I prefer a bet on Red to a bet on Green, but I also prefer a bet against Red to a bet against Green. This is actually the same thing as combining two unknown probabilities to produce a known one: both Green and (not Green) are unknown, but (Green or not Green) is known to be 100%.

$-6 iff Green is actually identical to $-6 + $6 iff (not Green). (not Green) is identical to (Red or Blue), and Red is a known probability of 1/3. $6 iff Blue is as good as $6 iff Green, which, for N=2, is worth $1. $-6 iff Green is actually worth $-3, rather than $-1.
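The corrected figures here are consistent with at least one simple rule: value each bet at its worst-case expectation over a set of candidate urn compositions. The sketch below is an assumption rather than something stated in the thread. It fixes P(Red) = 1/3, lets P(Green) range over [1/(3N), 2/3 - 1/(3N)], and treats $ as utilons; the names `priors` and `worth` are hypothetical.

```python
from fractions import Fraction as F

P_RED = F(1, 3)  # the only probability the urn description pins down


def priors(n=2):
    """Extreme candidate priors, assuming P(Green) lies in [1/(3n), 2/3 - 1/(3n)]."""
    lo = P_RED / n
    hi = F(2, 3) - lo
    # Expected payoff is linear in P(Green), so checking both endpoints suffices.
    return [{"Red": P_RED, "Green": g, "Blue": F(2, 3) - g} for g in (lo, hi)]


def worth(bet, n=2):
    """Worst-case expected payoff of `bet`, a dict mapping colour -> utilons."""
    return min(sum(p[c] * bet.get(c, 0) for c in p) for p in priors(n))


print(worth({"Green": 6}))             # $6 iff Green
print(worth({"Green": -6}))            # $-6 iff Green
print(worth({"Green": 6, "Blue": 6}))  # $6 iff (Green or Blue), a pure-risk bet
```

Under these assumptions the three bets come out at 1, -3 and 4 for N = 2. Note that `worth` is not additive across bets on the same event, which is the property the surrounding comments probe.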

Hmm. Now we have that $6 iff Green is worth $1 and $-6 iff Green is worth $-3, but $6-6 = $0 iff Green is not equivalent to $1-3 = $-2.

In particular, if you have $6 conditional on Green, you will trade that to me for $1. Then, we agree that if Green occurs, I will give you $6 and you will give me $6, since this adds up to no change. However, then I agree to waive your having to pay me the $6 back if you give me $3. You now have your original $6 iff Green back, but are down an unconditional $2, an indisputable net loss.
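The sequence of trades in this paragraph can be replayed as simple bookkeeping. This is a minimal sketch; the function name `money_pump` and its parameters are illustrative, and the prices are the ones quoted in the thread ($1 for the claim, $3 for the waiver, $6 stake).

```python
def money_pump(buy_price=1, waiver_fee=3, stake=6):
    """Replay the three trades; returns (net cash, net payoff if Green) for the agent."""
    cash = 0
    receives_if_green = stake  # the agent starts out holding $6 iff Green
    owes_if_green = 0

    # Trade 1: the agent sells the claim for its stated worth.
    cash += buy_price
    receives_if_green -= stake

    # Trade 2: a swap that nets to zero. Each side pays the other $6 iff Green.
    receives_if_green += stake
    owes_if_green += stake

    # Trade 3: the agent pays to be released from its side of the swap.
    cash -= waiver_fee
    owes_if_green -= stake

    return cash, receives_if_green - owes_if_green


print(money_pump())
```

The agent ends with the same conditional claim it started with and two units less cash, regardless of which colour is drawn.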

Also, this made me realize that I could have just added an unconditional $6 in my previous example rather than complicating things by making the $6 first conditional on (die ≥ 3) and then on (Green ∨ Blue). That would be much clearer.

Isn't this -1 and -4, not -1 and -1? I think you want -3/N = -1.5.

I'm not quite sure what your first sentence is referring to, but fool prefers risk to uncertainty. From his post:

Sorry, but that is highly nonobvious! Why do you claim that?

Note BTW that your state space is wrong in that it doesn't include differing states of how many green balls there are, but I assume you're just restricting your ordering to those actions which depend only on the color of the ball (since other actions would not be possible in this context).

"Consistent" in what sense?

As to the state space, as you say, we could expand the state space and restrict the actions as you suggest, and it wouldn't matter. But if you prefer we could draw a ball from the urn, set it aside, and destroy the urn before revealing the colour of the ball. At that point colour really is the only state, as I understand the word "state".

As to why it looks right: red is a known probability, green and blue aren't. It seems quite reasonable to choose the known risk over the unknown one. Especially under adversarial conditions. This is sometimes called ambiguity aversion or uncertainty aversion, which is sort of orthogonal to risk aversion.

As for consistency, if you're maximising a single function, you're not going to end up in a lower state via upward-moving steps.

Beyond that I can point to literature on the Ellsberg paradox. The wikipedia page has some info and some resources.

FWIW, it doesn't seem right to me to mention adversarial situations when that's not given in the problem. Preferring safer bets seems right in the presence of an adversary, but this example isn't displaying that reasoning.

FWIW, agreed, "not given in the problem". My bad.

Betting generally includes an adversary who wants you to lose money so they win it. Possibly in psychology experiments, betting against the experimenter, you are more likely to have a betting partner who is happy to lose money on bets. And there was a case of a bet happening on Less Wrong recently where the person offering the bet had another motivation, demonstrating confidence in their suspicion. But generally, ignoring the possibility of someone wanting to win money off you when they offer you a bet is a bad idea.

Now betting is supposed to be a metaphor for options with possibly unknown results. In which case sometimes you still need to account for the possibility that the options were made available by an adversary who wants you to choose badly, but less often. And you also should account for the possibility that they were from other people who wanted you to choose well, or that the options were not determined by any intelligent being or process trying to predict your choices, so you don't need to account for an anticorrelation between your choice and the best choice. Except for your own biases.

The probability of drawing a blue ball is 1/3, as is that of drawing a green ball.

I'd insist that my preferences are {} < {Red} = {Green} = {Blue} < {Red, Green} = {Red, Blue} = {Blue, Green} < {Red, Green, Blue}. There's no reason to prefer Red to Green: the possibility of there being few Green balls is counterbalanced by the possibility of there being close to 200 of them.

ETA: Well, there are situations in which your preference order is a good idea, such as when there is an adversary changing the colours of the balls in order to make you lose. They can't touch red without being found out, they can only change the relative numbers of Blue and Green. But in that case, choosing the colour that makes you win isn't the only effect of an action - it also affects the colours of the balls, so you need to take that into account.

So the true state space would be `{Ball Drawn = i}` for each value of i in [1..300]. The contents of the urn are chosen by the adversary, to be `{Red = 100, Green = n, Blue = 200 - n}` for n in [0..200]. When you take the action `{Green}`, the adversary sets `n` to 0, so that action maps all `{Ball Drawn = i}` to `{Lose}`. And so on. Anyway, I don't think this is a counter-example for that reason: you're not just deciding the winning set, you're affecting the balls in the urn.

I see. No, that's not the kind of adversary I had in mind when I said that.

How about a four-state example. The states are { (A,Heads), (A,Tails), (B,Heads), (B,Tails) }.

The outcomes are { Win, Lose }. I won't list all 16 actions, just to say that by P1 you must rank them all. In particular, you must rank the actions X = { (A,Heads), (A,Tails) }, Y = { (B,Heads), (B,Tails) }, U = { (A,Heads), (B,Tails) }, and V = { (A,Tails), (B,Heads) }. Again I'm writing actions as events, since there are only two outcomes.

To motivate this, consider the game where you and your (non-psychic, non-telekinetic etc) adversary are to simultaneously reveal A or B; if you pick the same, you win, if not, your adversary wins. You are at a point in time where your adversary has written "A" or "B" on a piece of paper face down, and you have not. You have also flipped a coin, which you have not looked at (and are not required to look at, or show your adversary). Therefore the above four states do indeed capture all the state information, and the four actions I'm singling out correspond to: you ignore the coin and write "A", or ignore and write "B"; or else you decide to base what you write on the flip of the coin, one way, or the other. As I say, by P1, you must rank these.

Me, I'll take the coin, thanks. I rank X=Y<U=V. I just violated P2. Am I really irrational?
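That the ranking X=Y<U=V violates P2 can be checked mechanically. The sketch assumes the usual statement of P2 (the sure-thing principle): if f and g agree outside an event B, f' and g' agree outside B, f agrees with f' on B and g agrees with g' on B, then f ≤ g exactly when f' ≤ g'. The helper names and the numeric `rank` encoding are illustrative choices.

```python
STATES = frozenset((move, coin) for move in "AB" for coin in ("Heads", "Tails"))
TAILS = frozenset(s for s in STATES if s[1] == "Tails")

# Acts written as the set of states in which you win.
X = frozenset({("A", "Heads"), ("A", "Tails")})
Y = frozenset({("B", "Heads"), ("B", "Tails")})
U = frozenset({("A", "Heads"), ("B", "Tails")})
V = frozenset({("A", "Tails"), ("B", "Heads")})

rank = {X: 0, Y: 0, U: 1, V: 1}  # encodes X = Y < U = V


def agree(f, g, event):
    """True if acts f and g win in exactly the same states within `event`."""
    return f & event == g & event


def p2_premises(f, g, f2, g2, event):
    """The four agreement conditions under which P2 constrains the ranking."""
    rest = STATES - event
    return (agree(f, g, rest) and agree(f2, g2, rest)
            and agree(f, f2, event) and agree(g, g2, event))


def violates_p2(f, g, f2, g2, event):
    """True if P2 applies to the four acts but the ranking disagrees with it."""
    return p2_premises(f, g, f2, g2, event) and (
        (rank[f] <= rank[g]) != (rank[f2] <= rank[g2]))


print(violates_p2(X, U, V, Y, TAILS))
```

Here X and U agree on the Heads states, as do V and Y, while X matches V and U matches Y on the Tails states, so P2 would require X ≤ U exactly when V ≤ Y.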

And even if you think I am, one of the questions originally asked was how things could be justified by Dutch book arguments or the like. So the Ellsberg paradox and variants is still relevant to that question, normative arguments aside.

So P2 doesn't apply in this example. Why not? Well, the reason you prefer to use the coin is because you suspect the adversary to be some kind of predictor, who is slightly more likely to write down a B if you just write down A (ignoring the coin).

That's *not* something captured by the state information here. You clearly don't think that `(A,Tails)` is simultaneously more and less likely than `(B,Tails)`, just that the action you choose can have some influence on the outcome. I think it might be that if you expanded the state space to include a predictor with all the possibilities of what it could do, P2 would hold again.

That isn't the issue. At the point in time I am talking about, the adversary has already made his non-revealed choice (and he is not telekinetic). There is no other state.

Tails versus Heads is objectively 1:1 resulting from the toss of a fair coin, whereas A versus B has an uncertainty that results from my adversary's choice. I may not have reason to think that he will choose A over B, so I can still call it 1:1, but there is still a qualitative distinction between uncertainty and randomness, or ambiguity and risk, or objective and subjective probability, or whatever you want to call it, and it is not irrational to take it into account.

I have to admit, this ordering seems reasonable... for the reasons nshepperd suggests. Just saying that he's not telepathic isn't enough to say he's not any sort of predictor - after all, I'm a human, I'm bad at randomizing, maybe he's played this game before and compiled statistics. Or he just has a good idea how people tend to think about this sort of thing. So I'm not sure you're correct in your conclusion that this isn't the issue.

Then I claim that a non-psychic predictor, no matter how good, is very different from a psychic.

The powers of a non-psychic predictor are entirely natural and causal. Once he has written down his hidden choice, then he becomes irrelevant. If this isn't clear, then we can make an analogy with the urn example. After the ball is drawn but before its colour is revealed, the contents of the urn are irrelevant. As I pointed out, the urn could even be destroyed before the colour of the ball is revealed, so that the ball's colour truly is the only state. Similarly, after the predictor writes his choice but before it is revealed, he might accidentally behead himself while shaving.

Now of course your beliefs about the talents of the late predictor might inform your beliefs about his hidden choice. But that's the only way they can possibly be relevant. The coin and the predictor's hidden choice on the paper really are the only states of the world now, and your own choice is free and has no effect on the state. So, if you display a strict preference for the coin, then your uncertainty is still not captured by subjective probability. You still violate P2.

To get around this, it seems you would have to posit some residual entanglement between your choice and the external state. To me this sounds like a strange thing to argue. But I suppose you could say your cognition is flawed in a way that is invisible to you, yet was visible to the clever but departed predictor. So, you might argue that, even though there is no actual psychic effect, your choice is not really free, and you have to take into account your internalities in addition to the external states.

My question then would be, does this entanglement prevent you from having a total ordering over all maps from states (internal and external) to outcomes? If yes, then P1 is violated. If no, then can I not just ask you about the ordering of the maps which only depend on the external states, and don't we just wind up where we were?

Well, that sounds irrational. Why would you pay to switch from X to U, a change that makes no difference to the probability of you winning?

Because there might be more to uncertainty than subjective probability.

Let's take a step back.

Yes, if you assume that uncertainty is entirely captured by subjective probability, then you're completely right. But if you assume that, then you wouldn't need the Savage axioms in the first place. The Savage axioms are one way of justifying this assumption (as well as expected utility). So, what justifies the Savage axioms?

One suggestion the original poster made was to use Dutch book arguments, or the like. But now here's a situation where there does seem to be a qualitative difference between a random event and an uncertain event, where there is a "reasonable" thing to do that violates P2, and where nothing like a Dutch book argument seems to be available to show that it is suboptimal.

I hope that clarifies the context.

EDIT: I put "reasonable" in scare-quotes. It *is* reasonable, and I am prepared to defend that. But it isn't necessary to believe it is reasonable to see why this example matters in this context.

Peter Wakker apparently thinks he found a way to have unbounded utilities and obey most of Savage's axioms. See Unbounded utility for Savage's "Foundations of Statistics," and other models. I'll say more if and when I understand that paper.

I don't think P2 can be justified by Dutch Book type arguments. I don't think it can be justified, as a *rational requirement* of choice, at all. My reservations are similar to Edward McClennen's in "Sure Thing Doubts".

So you would argue that, knowing a fact, your preferences can depend on what would have happened had that fact been false?

Right. (Not *my* preference necessarily, but a rational person's.) The facts in question include past actions, which can form the basis of regrets. The value of an event can depend on its historical context - that doesn't seem unreasonable.

Would we be able to write the outcomes as full histories?

I don't see why not. However, I haven't seen many (any?) decision theory treatments that do so.

This is very nice.

One point I find less than perfectly convincing: the motivation of the "total" part of P1 by saying that if our preorder were partial then we'd have two different kinds of indifference.

First off, I don't see anything bad about that in terms of mathematical elegance. Consider, e.g., Conway's beautiful theory of numbers and (two player perfect-information) games, in which the former turn out to be a special case of the latter. When you extend the <= relation on numbers to games, you get a partial preorder with, yes, two kinds of "indifference". One means "these two games are basically the same game; they are interchangeable in almost all contexts". The other means "these are quite different games, but neither is unambiguously a better game to find yourself playing".

This sort of thing also seems eminently plausible to me on (so to speak) psychological grounds. Real agents *do* have multiple kinds of indifference. Sometimes two situations just don't differ in any way we care about. Sometimes they differ a great deal but neither seems clearly preferable.

It would probably be much harder to extract von Neumann / Morgenstern from a version of the axioms with P1 weakened to permit non-totality. But I wonder whether what you *would* get (perhaps with some other strengthenings somewhere) might end up being a better match for real agents' real preferences.

(That would not necessarily be a good thing; perhaps our experience of internal conflicts from multiple incommensurable-feeling values merely indicates a suboptimality in our thinking. After all, agents do have to decide what to do in any given situation.)

I basically agree with you on this. Savage doesn't seem to actually justify totality much; that was my own thought as I was writing this. The real question, I suppose, is not "are there two flavors of indifference" but "is indifference transitive", since that's equivalent to totality. I didn't bother talking about totality any further because, while I'm not entirely comfortable with it myself, it seems to be a standard assumption here.

I'll add a note to the post about how totality can be considered as transitivity of indifference.

Yes, that's a good way of looking at it.

If we (1) look at the way our preferences actually are and (2) consider "aargh, conflict of incommensurable values, can't decide" to be a kind of indifference, then indifference certainly isn't transitive. But, again, maybe we'd do better to consider idealized agents that don't have such confusions.

In particular because agents which do have such confusions should leave money on the table -- they are incapable of dutch-booking people who can be dutch-booked.

How so? (It looks to me as though the ability to dutch-book someone dutch-book-able doesn't depend at all on one's value system. In particular, the individual transactions that go to make up the d.b. don't need to be of positive utility on their own, because the dutch-book-er knows that the dutch-book-ee is going to be willing to continue through to the end of the process. I think. What am I missing?)

Hmm. That's not quite the right description of the illogic but something very odd is going on:

Suppose I find A and B incomparable and B and C incomparable but A is preferable to C.

Joe is willing to trade C=>B and B=>A.

I trade C into B knowing that I will eventually get A.

Then, I refuse to trade B to A!

But if Joe had not been willing to trade B=>A, I would not have traded C=>B!

Excellent summary. Savage's founding of statistics is nice because it only assumes that agents have to make choices between actions, making no assumptions about whether they have to have beliefs or goals. This is important because agents in general don't have to use beliefs or goals, but they do all have to choose actions.

Thanks for the info about boundedness, I didn't notice that on my quick skim through the book.

Yeah, obviously in the 1954 edition he didn't know that; in the 1972 edition, he leaves all the obsolete discussion in and just adds a footnote saying that Fishburn proved boundedness and giving a reference! Had to look that up separately. Didn't notice it either until late in writing this.

Fortunately (since I'm away from university right now) I found a PDF of Fishburn's book online: http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix=html&identifier=AD0708563

I think you mean that agents don't have to use beliefs or goals, but they do all have to choose between actions.

If you really meant what you said, then you drew some deep bizarre counterintuitive conclusion there that I can't understand, and I'd really like to see an argument for it.

Yep, my mistake. Fixed.

I don't think transitivity is a reasonable assumption.

Suppose an agent is composed of simpler submodules--this, to a very rough approximation, is how actual brains seem to function--and its expressed preferences (i.e. actions) are assembled by polling its submodules.

Bam, voting paradox. Transitivity is out.
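The voting paradox referred to here is the classic Condorcet cycle. A minimal sketch, assuming three hypothetical submodules that each hold a perfectly transitive ranking and an agent that acts on pairwise majority polls:

```python
from itertools import permutations

# Three hypothetical submodules, each with a transitive ranking (best first).
SUBMODULES = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]


def majority_prefers(x, y):
    """True if a strict majority of submodules rank x above y."""
    votes = sum(r.index(x) < r.index(y) for r in SUBMODULES)
    return votes > len(SUBMODULES) / 2


def intransitive_triples():
    """All (x, y, z) where the poll says x > y and y > z but not x > z."""
    return [(x, y, z) for x, y, z in permutations("ABC", 3)
            if majority_prefers(x, y) and majority_prefers(y, z)
            and not majority_prefers(x, z)]


print(intransitive_triples())
```

Each submodule is individually consistent, yet the polled agent prefers A to B, B to C, and C to A.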

Neural signals represent things cardinally rather than ordinally, so those voting paradoxes probably won't apply.

Even conditional on humans not having transitive preferences even in an approximate sense, I find it likely that it would be useful to come up with some 'transitivization' of human preferences.

Agreed that there's a good chance that game-theoretic reasoning about interacting submodules will be important for clarifying the structure of human preferences.

> Neural signals represent things cardinally rather than ordinally

I'm not sure what you mean by this. In the general case, resolution of signals is highly nonlinear, i.e. vastly more complicated than any simple ordinal or weighted ranking method. Signals at synapses are nearly digital, though: to first order, a synapse is either firing or it isn't. Signals along individual nerves are also digital-ish--bursts of high-frequency constant-amplitude waves interspersed with silence.

My point, though, is that it's not reasonable to assume that transitivity holds axiomatically when it's simple to construct a toy model where it doesn't.

On a macro level, I can imagine a person with dieting problems preferring starving > a hot fudge sundae, celery > starving, and a hot fudge sundae > celery.

My experience is that this is generally because of a measurement problem, not a reflectively endorsed statement.

Well, it's clearly pathological in some sense, but the space of actions to be (pre)ordered is astronomically big and reflective endorsement is slow, so you can't usefully error-check the space that way. cf. Lovecraft's comment about "the inability of the human mind to correlate all its contents".

I don't think it will do to simply assume that an actually instantiated agent will have a transitive set of expressed preferences. Bit like assuming your code is bug-free.

The agent is allowed to ask its submodules how they would feel about various gambles e.g. "Would you prefer B or a 50% probability of A and a 50% probability of C". Equipped with this extra information a voting paradox can be avoided. This is because the preferences over gambles tell you not just which order the submodule would rank the candidates in, but quantitatively how much it cares about each of them.

Assuming the submodules are rational (which they had better be if we want the overall agent to be rational) then their preferences over gambles can be expressed as a utility function on the outcomes. So then the main agent can make its utility function a weighted sum of theirs. This avoids non-transitivity.
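A minimal sketch of the weighted-sum aggregation described above. The utility numbers and equal weights are made up for illustration; the three submodules are chosen so that their ordinal rankings (A>B>C, B>C>A, C>A>B) are exactly the ones that produce a cycle under majority voting.

```python
from itertools import permutations

# Hypothetical cardinal utilities, one dict per submodule.
UTILITIES = [
    {"A": 10, "B": 9, "C": 0},
    {"B": 10, "C": 2, "A": 0},
    {"C": 10, "A": 5, "B": 0},
]
WEIGHTS = [1, 1, 1]  # assumed equal weights


def total_utility(option):
    """The agent's utility: a weighted sum of its submodules' utilities."""
    return sum(w * u[option] for w, u in zip(WEIGHTS, UTILITIES))


def prefers(x, y):
    return total_utility(x) > total_utility(y)


def has_cycle():
    """True if some x > y > z fails to give x > z under `prefers`."""
    return any(prefers(x, y) and prefers(y, z) and not prefers(x, z)
               for x, y, z in permutations("ABC", 3))


ranking = sorted("ABC", key=total_utility, reverse=True)
print(ranking, has_cycle())
```

Because every option is mapped to a single number, the resulting preference cannot cycle, whatever utilities and weights are plugged in.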

A preference order which says just what order the candidates come in is called an "ordinal utility function".

A utility function that actually describes the relative values of the candidates is a "cardinal utility function".

I do not like this modification. His way is more elegant because it starts with less information. It is less robust because it does not depend on an initial concept of "knowledge". "knowledge" does not make sense in all instances.

P1 is Dutch Book justifiable, I think. For instance x has to not be preferred to x, or else trading x for x would be a benefit.

This thought isn't original to me, but it's probably worth making. It feels like there are two sorts of axioms. I am following tradition in describing them as "rationality axioms" and "structure axioms". The rationality axioms (like the transitivity of the order among acts) are norms on action. The structure axioms (like P6) aren't normative at all. (It's about structure on the world, how bizarre is it to say "The world *ought* to be such that P6 holds of it"?)

Given this, and given the necessity of the structure axioms for the proof, it feels like Savage's theorem can't serve as a justification of Bayesian epistemology as a norm of rational behaviour.

P6 is really both. Structurally, it forces there to be something like a coin that we can flip as many times as we want. But normatively, we can say that if the agent has blah blah blah preference, it shall be able to name a partition such that blah blah blah. See e.g. [rule 4]. This of course doesn't address *why* we think such a thing is normative, but that's another issue.

But why ought the world be such that such a partition exists for us to name? That doesn't seem normative. I guess there's a minor normative element in that it demands "If the world conspires to allow us to have partitions like the ones needed in P6, then the agent must be able to know of them and reason about them" but that still seems secondary to the demand that the world is thus and so.

Agreed, the structural component is not normative. But to me, it is the structural part that seems benign.

If we assume the agent lives forever, and there's always some uncertainty, then surely the world *is* thus and so. If the agent doesn't live forever, then we're into bounded rationality questions, and even transitivity is up in the air.

P6 entails that there are (uncountably) infinitely many events. It is at least compatible with modern physics that the world is fundamentally discrete both spatially and temporally. The visible universe is bounded. So it may be that there are only finitely many possible configurations of the universe. It's a big number sure, but if it's finite, then Savage's theorem is irrelevant. It doesn't tell us anything about what to believe in our world. This is perhaps a silly point, and there's probably a nearby theorem that works for "appropriately large finite worlds", but still. I don't think you can just uncritically say "surely the world is thus and so".

If this is supposed to say something normative about how I should structure my beliefs, then the structural premises should be true of the world I have beliefs about.

But it was a conditional statement. If the universe is discrete and finite, then obviously there are no immortal agents either.

Basically I don't see that aspect of P6 as more problematic than the unbounded resource assumption. And when we question that assumption, we'll be questioning a lot more than P6.

Here is a small counterexample to P2. States = { Red, Green, Blue }. Outcomes = { Win, Lose }. Since there are only two outcomes, we can write actions as the subset of states that Win. My preferences are: {} < { Green } = { Blue } < { Red } < { Red,Green } = { Red,Blue } < { Green,Blue } < { Red,Green,Blue }

Here is a situation where this may apply: There is an urn with 300 balls. 100 of them are red. The rest are either green or blue. You draw a ball from this urn.

So Red represents definite probability 1/3, while Green and Blue are unknowns. Depending on context, it sure looks like these are the right preferences to have. This is called the Ellsberg paradox.

Even if you insist this is somehow wrong, it is not going to be Dutch booked. Even if we extend the state space to include arbitrarily many fair coins (as P6 may require), and even if we extend the result space to allow for multiple draws or other payouts, we can define various consistent objective functions (that are not expected utility) which show this behaviour.
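As one concrete instance of such an objective function, the sketch below scores each act by its worst-case probability of winning over a set of candidate urns. The choice of rule and the interval P(Green) ∈ [1/6, 1/2] are assumptions made for illustration, not something specified in the comment; they happen to reproduce the stated ordering exactly.

```python
from fractions import Fraction as F
from itertools import combinations

COLOURS = ("Red", "Green", "Blue")

# Assumed candidate urns: P(Red) = 1/3, P(Green) at either end of [1/6, 1/2].
# The win probability is linear in P(Green), so the endpoints are enough.
PRIORS = [{"Red": F(1, 3), "Green": g, "Blue": F(2, 3) - g}
          for g in (F(1, 6), F(1, 2))]


def value(act):
    """Worst-case probability of winning; `act` is the set of winning colours."""
    return min(sum(p[c] for c in act) for p in PRIORS)


acts = [frozenset(c) for r in range(4) for c in combinations(COLOURS, r)]
for act in sorted(acts, key=value):
    print(sorted(act), value(act))

# P2 check: {Green} and {Red} both lose on Blue; adding Blue to both flips the order.
G, R = frozenset({"Green"}), frozenset({"Red"})
GB, RB = G | {"Blue"}, R | {"Blue"}
print(value(G) < value(R), value(GB) > value(RB))
```

Since every act gets a single number, the ordering is total and transitive, yet it reverses when Blue is added to both sides, which is what P2 forbids.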

You mean P7 is implied already by P1-6 for finite B, I assume.

No, I meant that P1-P6 imply the expected utility hypothesis for finite gambles, i.e., if f and g each only take on finitely many values (outside a set of probability 0). They therefore also imply P7 for finite gambles, and hence in particular for finite B, but "finite B" is a very strict condition - under P1-P6, any finite B will always be null, so P7 will be true for them trivially!

Okay. I was considering finite gambles backed by a finite S, although of course that need not be the case. Do these axioms only apply to infinite S? If so, I didn't notice where that was stated - is it a consequence I missed? I'm also curious why P1-P6 imply that any finite B must be null:

A finite B necessarily has only finitely many subsets, while any nonnull B necessarily has at least continuum-many subsets, since there is always a subset of any given probability at most P(B).

Basically one of the effects of P6 is to ensure we're not in a "small world". See all that stuff about uniform partitions into arbitrarily many parts, etc.

Yes, P6 very clearly says that. Somehow I skipped it on first reading. So when you add P6, S is provably infinite. Thanks.