A summary of Savage's foundations for probability and utility.

34 Sniffnoy 22 May 2011 07:56PM

Edit: I think the P2c I wrote originally may have been a bit too weak; fixed that. Nevermind, rechecking, that wasn't needed.

More edits (now consolidated): Edited nontriviality note. Edited totality note. Added in the definition of numerical probability in terms of qualitative probability (though not the proof that it works). Also slight clarifications on implications of P6' and P6''' on partitions into equivalent and almost-equivalent parts, respectively.

One very late edit, June 2: Even though we don't get countable additivity, we still want a σ-algebra rather than just an algebra (this is needed for some of the proofs in the "partition conditions" section that I don't go into here). Also noted nonemptiness of gambles.

The idea that rational agents act in a manner isomorphic to expected-utility maximizers is often used here, typically justified with the Von Neumann-Morgenstern theorem.  (The last of Von Neumann and Morgenstern's axioms, the independence axiom, can be grounded in a Dutch book argument.)  But the Von Neumann-Morgenstern theorem assumes that the agent already measures its beliefs with (finitely additive) probabilities.  This in turn is often justified with Cox's theorem (valid so long as we assume a "large world", which is implied by e.g. the existence of a fair coin).  But Cox's theorem assumes as an axiom that the plausibility of a statement is taken to be a real number, a very large assumption!  I have also seen this justified here with Dutch book arguments, but these all seem to assume that we are already using some notion of expected utility maximization (which is not only somewhat circular, but also a considerably stronger assumption than that plausibilities are measured with real numbers).

There is a way of grounding both (finitely additive) probability and utility simultaneously, however, as detailed by Leonard Savage in his Foundations of Statistics (1954).  In this article I will state the axioms and definitions he gives, give a summary of their logical structure, and suggest a slight modification (which is equivalent mathematically but slightly more philosophically satisfying).  I would also like to ask the question: To what extent can these axioms be grounded in Dutch book arguments or other more basic principles?  I warn the reader that I have not worked through all the proofs myself and I suggest simply finding a copy of the book if you want more detail.

Peter Fishburn later showed in Utility Theory for Decision Making (1970) that the axioms set forth here actually imply that utility is bounded.

(Note: The versions of the axioms and definitions in the end papers are formulated slightly differently from the ones in the text of the book, and in the 1954 version have an error. I'll be using the ones from the text, though in some cases I'll reformulate them slightly.)


Colonization models: a tutorial on computational Bayesian inference (part 2/2)

19 snarles 17 May 2011 03:54AM

Recap

Part 1 was a tutorial for programming a simulation for the emergence and development of intelligent species in a universe 'similar to ours.'  In part 2, we will use the model developed in part 1 to evaluate different explanations of the Fermi paradox. However, keep in mind that the purpose of this two-part series is for showcasing useful methods, not for obtaining serious answers.

We summarize the model given in part 1:

SIMPLE MODEL FOR THE UNIVERSE

  • The universe is represented by the set of all points in Cartesian 4-space which are of Euclidean distance 1 from the origin (that is, the 3-sphere).  The distance between two points is taken to be the Euclidean distance (an approximation to the spherical distance which is accurate at small scales).
  • The lifespan of the universe consists of 1000 time steps.
  • A photon travels s=0.0004 units in a time step.
  • At the end of each time step, there is a chance that a Type 0 civilization will spontaneously emerge in an uninhabited region of space.  The base rate for civilization birth is controlled by the parameter a.  But this base rate is multiplied by the proportion of the universe which remains uncolonized by Type III civilizations.
  • In each time step, a Type 0 civilization has a probability b of self-destructing, a probability c of transitioning to a non-expansionist Type IIa civilization, and a probability d of transitioning to a Type IIb civilization.
  • Observers can detect all Type II and Type III civilizations within their past light cones.
  • In each time step, a Type IIb civilization has a probability e of transitioning to an expansionist Type III civilization.
  • In each time step, all Type III civilizations colonize space in all directions, expanding their sphere of colonization by k * s units per time step.
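
The model above can be sketched in a few dozen lines of Python. The parameter values below are placeholders, not taken from the post (which leaves a, b, c, d, e, and k free), and the suppression of new births by colonized volume is crudely approximated by a single sampled point:

```python
import random

# Illustrative parameter values (assumptions, not from the post).
A, B, C, D, E = 0.05, 0.01, 0.005, 0.005, 0.01  # birth/transition probabilities
S = 0.0004       # distance a photon travels per time step
K = 0.5          # Type III expansion speed as a fraction of light speed
T_MAX = 1000     # lifespan of the universe in time steps

def random_point():
    """Uniform random point on the 3-sphere (unit sphere in R^4),
    via a normalized 4-dimensional Gaussian."""
    while True:
        v = [random.gauss(0, 1) for _ in range(4)]
        norm = sum(x * x for x in v) ** 0.5
        if norm > 1e-12:
            return tuple(x / norm for x in v)

def dist(p, q):
    """Euclidean (chordal) distance, approximating spherical distance at small scales."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def simulate(seed=0):
    """One history of the universe; returns the list of civilizations."""
    random.seed(seed)
    civs = []  # each civ: position, type, birth time, colonization radius
    for t in range(T_MAX):
        # Transitions for existing civilizations.
        for civ in civs:
            r = random.random()
            if civ["type"] == "0":
                if r < B:
                    civ["type"] = "dead"          # self-destruction
                elif r < B + C:
                    civ["type"] = "IIa"           # non-expansionist
                elif r < B + C + D:
                    civ["type"] = "IIb"
            elif civ["type"] == "IIb" and r < E:
                civ["type"] = "III"               # becomes expansionist
            if civ["type"] == "III":
                civ["radius"] += K * S            # expand sphere of colonization
        # Spontaneous births, suppressed by colonized volume (crudely estimated
        # here by whether one trial point falls inside some Type III sphere).
        pos = random_point()
        colonized = any(c["type"] == "III" and dist(pos, c["pos"]) < c["radius"]
                        for c in civs)
        if not colonized and random.random() < A:
            civs.append({"pos": pos, "type": "0", "birth": t, "radius": 0.0})
    return civs
```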

 

Section III.  Inferential Methodology

In this section, no apologies are made for assuming that the reader has a solid grasp of the principles of Bayesian reasoning.  Those currently following the tutorial from Part 1 may find it a good idea to skip to Section IV first.

To dodge the philosophical controversies surrounding anthropic reasoning, we will employ an impartial observer model.  Like Jaynes, we introduce a robot which is capable of Bayesian reasoning, but here we imagine a model in which such a robot is instantaneously created and injected into the universe at a random point in space, at a time chosen uniformly from 1 to 1000 (and the robot is aware that it is created via this mechanism).  We limit ourselves to asking what kind of inferences this robot would make in a given situation.  Interestingly, the inferences made by this robot will turn out to be quite similar to those that would be made under the self-indication assumption.
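
The injection mechanism, and the past-light-cone test that determines what the robot can observe, can be sketched as follows (the function names are mine, not the post's):

```python
import random

S = 0.0004  # units a photon travels per time step (from the model summary)

def random_point_on_3sphere():
    """Uniform random point on the unit sphere in R^4 via normalized Gaussians."""
    while True:
        v = [random.gauss(0, 1) for _ in range(4)]
        n = sum(x * x for x in v) ** 0.5
        if n > 1e-12:
            return tuple(x / n for x in v)

def inject_robot():
    """Create the impartial observer: a uniform random place and time."""
    return random_point_on_3sphere(), random.randint(1, 1000)

def visible(civ_pos, civ_birth, robot_pos, robot_time):
    """A civilization lies in the robot's past light cone if light emitted at
    the civilization's birth has had time to reach the robot's location."""
    d = sum((a - b) ** 2 for a, b in zip(civ_pos, robot_pos)) ** 0.5
    return civ_birth < robot_time and d <= S * (robot_time - civ_birth)
```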


Colonization models: a programming tutorial (Part 1/2)

23 snarles 16 May 2011 11:33PM

Introduction

Are we alone in the universe?  How likely is our species to survive the transition from a Type 0 to a Type II civilization?  The answers to these questions would be of immense interest to our race; however, we have few tools to reason about these questions.  This does not stop us from wanting to find answers to these questions, often by employing controversial principles of inference such as 'anthropic reasoning.'  The reader can find a wealth of stimulating discussion about anthropic reasoning at Katja Grace's blog, the site from which this post takes its inspiration.  The purpose of this post is to give a quantitatively oriented approach to anthropic reasoning, demonstrating how computer simulations and Bayesian inference can be used as tools for exploration.

The central mystery we want to examine is the Fermi paradox: the fact that

  1. we are an intelligent civilization
  2. we cannot observe any signs that other intelligent civilizations ever existed in the universe

One explanation for the Fermi paradox is that we are the only intelligent civilization in the universe.  A far more chilling explanation is that intelligent civilizations emerge quite frequently, but that all other intelligent civilizations that have come before us ended up destroying themselves before they could manage to make their mark on their universe.

We can reason about which of the above two explanations is more likely if we have the audacity to assume a model for the emergence and development of civilizations in a universe 'similar to ours.'  In such a model, it is usually useful to distinguish different 'types' of civilizations.  Type 0 civilizations are civilizations with levels of technology similar to our own.  If a Type 0 civilization survives long enough and accumulates enough scientific knowledge, it can make the transition to a Type I civilization--a civilization which has attained mastery of its home planet.  A Type I civilization, over time, can transition to a Type II civilization if it colonizes its solar system.  We would suppose that a nearby civilization would have to have reached Type II in order for its activities to be prominent enough for us to be able to detect them.  In the original terminology, a Type III civilization is one which has mastery of its galaxy, but in this post we take it to mean something else.

The simplest model for the emergence and development of civilizations would have to specify the following:

  1. the rate at which intelligent life appears in universes similar to ours;
  2. the rate at which these intelligent species transition from Type 0 to Type II and Type III civilizations--or self-destruct in the process;
  3. the visibility of Type II and Type III civilizations to Type 0 civilizations elsewhere;
  4. the proportion of advanced civilizations which ultimately adopt expansionist policies;
  5. the speed at which those Type III civilizations can expand and colonize the universe.

In the model we propose in this post, the above parameters are held constant throughout the entire history of the universe.  The value of the model is that, given a particular specification of the parameters, we can apply Bayesian inference to see how well the model explains the Fermi paradox.  The idea is to simulate many different histories of universes for a given set of parameters, so as to find the expected number of observers who observe the Fermi paradox under that specification.  More details about Bayesian inference are given in Part 2 of this tutorial.
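
The inferential loop described above can be sketched in a few lines. Here `simulate` is a stand-in for the model of Part 1 (it should return the number of Fermi-observing observers in one simulated history), and the posterior is ordinary discrete Bayes over a finite grid of parameter settings:

```python
def fermi_likelihood(params, simulate, n_histories=200):
    """Monte Carlo estimate of the expected number of observers who see the
    Fermi paradox under `params`, averaged over simulated histories."""
    total = 0.0
    for _ in range(n_histories):
        total += simulate(params)
    return total / n_histories

def posterior(prior, likelihoods):
    """Discrete Bayes' theorem: posterior over parameter settings is
    proportional to prior times likelihood, normalized to sum to 1."""
    unnorm = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]
```

For example, a uniform prior over two parameter settings whose estimated likelihoods are 2 and 1 yields a posterior of 2/3 and 1/3.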

This post is targeted at readers who are interested in simulating the emergence and expansion of intelligent civilizations in 'universes similar to ours' but who lack the programming knowledge to code these simulations.  In this post we will guide the reader through the design and production of a relatively simple universe model and the methodology for doing 'anthropic' Bayesian inference using the model.


Dutch Books and Decision Theory: An Introduction to a Long Conversation

19 Jack 21 December 2010 04:55AM

For a community that endorses Bayesian epistemology we have had surprisingly few discussions about the most famous Bayesian contribution to epistemology: the Dutch Book arguments. In this post I present the arguments, though it is far from clear what the right way to interpret them is, or even whether they prove what they set out to. The Dutch Book arguments attempt to justify the Bayesian approach to science and belief; I will also suggest that any successful Dutch Book defense of Bayesianism cannot be disentangled from decision theory. But mostly this post is meant to introduce people to the argument and to get people thinking about a solution. The literature is scant enough that it is plausible people here could actually make genuine progress, especially since the problem is related to decision theory.1

Bayesianism fits together. Like a well-tailored jacket it feels comfortable and looks good. It's an appealing, functional aesthetic for those with cultivated epistemic taste. But sleekness is not a rigorous justification, and so we should ask: why must the rational agent adopt the axioms of probability as conditions on her degrees of belief? Further, why should agents accept the principle of conditionalization as a rule of inference? These are the questions the Dutch Book arguments try to answer.

The arguments begin with an assumption about the connection between degrees of belief and willingness to wager. An agent with degree of belief b in hypothesis h is assumed to be willing to pay any price up to and including $b for a unit wager on h, and to sell a unit wager on h at any price down to and including $b. For example, if my degree of belief that I can drink ten eggnogs without passing out is .3, I am willing to bet $0.30 on the proposition that I can drink the nog without passing out when the stakes of the bet are $1. Call this the Will-to-wager Assumption. As we will see, it is problematic.
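
Under the Will-to-wager Assumption, an agent whose credences in h and in not-h fail to sum to 1 can be pumped for a sure loss: a bookie sells the agent a unit wager on each at the agent's own prices, and exactly one of the two wagers pays out. A minimal sketch (the function and the numbers are mine, for illustration):

```python
def dutch_book_profit(credence_h, credence_not_h, stake=1.0):
    """The bookie's guaranteed profit against an agent who, per the
    Will-to-wager Assumption, buys a unit wager on h at price credence_h
    and a unit wager on not-h at price credence_not_h.  Exactly one wager
    pays the stake, whichever way the world turns out, so the bookie
    pockets what the agent paid minus one stake."""
    paid = stake * (credence_h + credence_not_h)
    return paid - stake

# An agent with credence 0.6 in h and 0.6 in not-h loses $0.20 no matter
# what happens; an agent whose credences sum to 1 cannot be booked this way.
```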


Inherited Improbabilities: Transferring the Burden of Proof

30 komponisto 24 November 2010 03:40AM

One person's modus ponens is another's modus tollens.

- Common saying among philosophers and other people who know what these terms mean.

If you believe A => B, then you have to ask yourself: which do I believe more? A, or not B?

- Hal Daume III, quoted by Vladimir Nesov.

Summary: Rules of logic have counterparts in probability theory. This post discusses the probabilistic analogue of modus tollens (the rule that if A=>B is true and B is false, then A is false), which is the inequality P(A) ≤ P(B)/P(B|A). What this says, in ordinary language, is that if A strongly implies B, then establishing A is at least (approximately) as difficult as establishing B.
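
The inequality follows from the law of total probability: P(B) = P(B|A)P(A) + P(B|¬A)P(¬A) ≥ P(B|A)P(A), so dividing by P(B|A) gives P(A) ≤ P(B)/P(B|A). A quick numerical check (the code is mine, offered as an illustration):

```python
def check_inequality(p_a, p_b_given_a, p_b_given_not_a):
    """Verify P(A) <= P(B) / P(B|A), with P(B) computed from the law of
    total probability.  A tiny tolerance absorbs floating-point rounding."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_a <= p_b / p_b_given_a + 1e-12
```

Because the bound holds for every consistent assignment of the three inputs, the check passes on any grid of values.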

The appeal trial for Amanda Knox and Raffaele Sollecito starts today, and so to mark the occasion I thought I'd present an observation about probabilities that occurred to me while studying the "motivation document"(1), or judges' report, from the first-level trial.

One of the "pillars" of the case against Knox and Sollecito is the idea that the apparent burglary in the house where the murder was committed -- a house shared by four people, namely Meredith Kercher (the victim), Amanda Knox, and two Italian women -- was staged. That is, the signs of a burglary were supposedly faked by Knox and Sollecito in order to deflect suspicion from themselves. (Unsuccessfully, of course...)

As the authors of the report, presiding judge Giancarlo Massei and his assistant Beatrice Cristiani, put it (p.44):

What has been explained up to this point leads one to conclude that the situation of disorder in Romanelli's room and the breaking of the window constitute an artificially created production, with the purpose of directing investigators toward someone without a key to the entrance, who would have had to enter the house via the window whose glass had been broken and who would then have perpetrated the violence against Meredith that caused her death.


If a tree falls on Sleeping Beauty...

83 ata 12 November 2010 01:14AM

Several months ago, we had an interesting discussion about the Sleeping Beauty problem, which runs as follows:

Sleeping Beauty volunteers to undergo the following experiment. On Sunday she is given a drug that sends her to sleep. A fair coin is then tossed just once in the course of the experiment to determine which experimental procedure is undertaken. If the coin comes up heads, Beauty is awakened and interviewed on Monday, and then the experiment ends. If the coin comes up tails, she is awakened and interviewed on Monday, given a second dose of the sleeping drug, and awakened and interviewed again on Tuesday. The experiment then ends on Tuesday, without flipping the coin again. The sleeping drug induces a mild amnesia, so that she cannot remember any previous awakenings during the course of the experiment (if any). During the experiment, she has no access to anything that would give a clue as to the day of the week. However, she knows all the details of the experiment.

Each interview consists of one question, “What is your credence now for the proposition that our coin landed heads?”
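
A short simulation makes it easy to see why both answers can look right: counting heads per coin flip gives 1/2, while counting heads per awakening gives 1/3, and the dispute is over which count the question is asking about. (The code is mine, offered as an illustration.)

```python
import random

def simulate_beauty(n_runs=100_000, seed=0):
    """Run the experiment many times.  Returns (fraction of runs in which
    the coin lands heads, fraction of awakenings that occur in heads-runs):
    roughly the halfer and thirder tallies, respectively."""
    random.seed(seed)
    heads_runs = 0
    awakenings = 0
    heads_awakenings = 0
    for _ in range(n_runs):
        heads = random.random() < 0.5
        if heads:
            heads_runs += 1
            awakenings += 1          # awakened Monday only
            heads_awakenings += 1
        else:
            awakenings += 2          # awakened Monday and Tuesday
    return heads_runs / n_runs, heads_awakenings / awakenings
```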

In the end, the fact that there were so many reasonable-sounding arguments for both sides, and so much disagreement about a simple-sounding problem among above-average rationalists, should have set off major alarm bells. Yet only a few people pointed this out; most commenters, including me, followed the silly strategy of trying to answer the question, and I did so even after I noticed that my intuition could see both answers as being right depending on which way I looked at it, which in retrospect would have been a perfect time to say “I notice that I am confused” and backtrack a bit…

And on reflection, considering my confusion rather than trying to consider the question on its own terms, it seems to me that the problem (as it’s normally stated) is completely a tree-falling-in-the-forest problem: a debate about the normatively “correct” degree of credence which only seemed like an issue because any conclusions about what Sleeping Beauty “should” believe weren’t paying their rent, were disconnected from any expectation of feedback from reality about how right they were.


Frugality and working from finite data

27 Snowyowl 03 September 2010 09:37AM

The scientific method is wonderfully simple, intuitive, and above all effective. Based on the available evidence, you formulate several hypotheses and assign prior probabilities to each one. Then, you devise an experiment which will produce new evidence to distinguish between the hypotheses. Finally, you perform the experiment, and adjust your probabilities accordingly. 

So far, so good. But what do you do when you cannot perform any new experiments?

This may seem like a strange question, one that leans dangerously close to unprovable philosophical statements with no real-world consequences. But it is in fact a serious problem facing the field of cosmology. We must learn that when there is no new evidence that will cause us to change our beliefs (or even when there is), the best thing to do is to rationally re-examine the evidence we already have.


Taking Ideas Seriously

51 Will_Newsome 13 August 2010 04:50PM

I, the author, no longer endorse this post.


 

Abstrummary: I describe a central technique of epistemic rationality that bears directly on instrumental rationality, and that I do not believe has been explicitly discussed on Less Wrong before. The technique is rather simple: it is the practice of taking ideas seriously. I also present the rather simple metaphor of an 'interconnected web of belief nodes' (like a Bayesian network) to describe what it means to take an idea seriously: it is to update a belief and then accurately and completely propagate that belief update through the entire web of beliefs in which it is embedded. I then give a few examples of ideas to take seriously, followed by reasons to take ideas seriously and what bad things happen if you don't (or society doesn't). I end with a few questions for Less Wrong.


Applied Bayes' Theorem: Reading People

24 Kaj_Sotala 30 June 2010 05:21PM

Or, how to recognize Bayes' theorem when you meet one making small talk at a cocktail party.

Knowing the theory of rationality is good, but it is of little use unless we know how to apply it. Unfortunately, humans tend to be poor at applying raw theory, instead needing several examples before it becomes instinctive. I found some very useful examples in the book Reading People: How to Understand People and Predict Their Behavior - Anytime, Anyplace. While I didn't think that it communicated the skill of actually reading people very well, I did notice that it had one chapter (titled "Discovering Patterns: Learning to See the Forest, Not Just the Trees") that could almost have been a collection of Less Wrong posts. It also serves as an excellent example of applying Bayes' theorem in every-day life.

In "What is Bayesianism?" I said that the first core tenet of Bayesianism is "Any given observation has many different possible causes". Reading People says:

If this book could deliver but one message, it would be that to read people effectively you must gather enough information about them to establish a consistent pattern. Without that pattern, your conclusions will be about as reliable as a tarot card reading.

In fact, the author is saying that Bayes' theorem applies when you're trying to read people (if this is not immediately obvious, just keep reading). Any particular piece of evidence about a person could have various causes. For example, in a later chapter we are offered a list of possible reasons for why someone may have dressed inappropriately for an occasion. They might (1) be seeking attention, (2) lack common sense, (3) be self-centered and insensitive to others, (4) be trying to show that they are spontaneous, rebellious, or nonconformists and don't care what other people think, (5) not have been taught how to dress and act appropriately, (6) be trying to imitate someone they admire, (7) value comfort and convenience over all else, or (8) simply not have the right attire for the occasion.

Similarly, very short hair on a man might indicate that he (1) is in the military, or was at some point in his life, (2) works for an organization that demands very short hair, such as a police force or fire department, (3) is trendy, artistic or rebellious, (4) is conservative, (5) is undergoing or recovering from a medical treatment, (6) thinks he looks better with short hair, (7) plays sports, or (8) keeps his hair short for practical reasons.

So much for reading people being easy. This, again, is the essence of Bayes' theorem: even though somebody being in the military might almost certainly mean that they'd have short hair, their having short hair does not necessarily mean that they are in the military. On the other hand, if someone has short hair, is clearly knowledgeable about weapons and tactics, displays a no-nonsense attitude, is in good shape, and has a very Spartan home... well, though it's still not for certain, it seems likely to me that of all the people having all of these attributes, quite a few of them are in the military or in similar occupations.
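
In odds form, Bayes' theorem makes this concrete: a single cue with a modest likelihood ratio barely moves a low base rate, while several (assumed independent) cues multiply together. The numbers below are illustrative assumptions of mine, not figures from the book:

```python
def posterior_odds(prior_odds, likelihood_ratios):
    """Combine several (assumed independent) cues by multiplying the prior
    odds by each cue's likelihood ratio P(cue|military) / P(cue|not military)."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

def odds_to_prob(odds):
    return odds / (1 + odds)

# Assume a 1% base rate of being in the military, and cues (short hair,
# weapons knowledge, etc.) each 9x as likely among military people.
# One cue alone leaves the probability low...
single = odds_to_prob(posterior_odds(0.01 / 0.99, [9]))
# ...but five such independent cues make "military or similar" quite likely.
many = odds_to_prob(posterior_odds(0.01 / 0.99, [9, 9, 9, 9, 9]))
```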


Development of Compression Rate Method

11 Daniel_Burfoot 20 May 2010 05:11PM

 

Summary: This post provides a brief discussion of the traditional scientific method, and mentions some areas where the method cannot be directly applied. Then, through a series of thought experiments, a set of minor modifications to the traditional method are presented. The result is a refined version of the method, based on data compression.

Related to: Changing the Definition of Science, Einstein's Arrogance, The Dilemma: Science or Bayes?

ETA: For those who are familiar with notions such as Kolmogorov Complexity and MML, this piece may have a low ratio of novelty:words. The basic point is that one can compare scientific theories by instantiating them as compression programs, using them to compress a benchmark database of measurements related to a phenomenon of interest, and comparing the resulting codelengths (taking into account the length of the compressor itself).
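
The codelength comparison can be sketched with an off-the-shelf compressor standing in for a theory-specific encoder. The two-part cost (bytes to state the theory, plus bytes to encode the residuals the theory leaves unexplained) is an MDL-style toy of my own construction, not the author's actual benchmark:

```python
import zlib

def codelength(theory_description: bytes, residuals: bytes) -> int:
    """Two-part codelength in the spirit of MML/MDL: the cost of stating
    the theory (here, its textual description) plus the cost of the data
    encoded under it (here, zlib-compressed residuals)."""
    return len(theory_description) + len(zlib.compress(residuals, 9))

# Toy benchmark "measurements": a noiseless linear phenomenon.
data = bytes((3 * i + 7) % 256 for i in range(1000))

# Theory A predicts nothing, so it must encode the raw measurements.
null_cost = codelength(b"", data)

# Theory B ("y = (3x + 7) mod 256", stated in a few bytes) predicts every
# measurement exactly, leaving only all-zero residuals to encode.
residuals = bytes(1000)  # data minus predictions
linear_cost = codelength(b"y=(3x+7)%256", residuals)

# The better theory wins by achieving the shorter total codelength.
```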

