This afternoon I heard a news story about a Middle Eastern country in which someone said of the defenses for a stockpile of nuclear weapons, "even if there is only a 1% probability of the defenses failing, we should do more to strengthen them given the consequences of their failure."  I have nothing against this person's reasoning, but I do have an issue with where that 1% figure came from.

The statement above and others like it share a common problem:  they are phrased such that it's unclear over what probability space the measure was taken.  In fact, many journalists and other people don't seem especially concerned by this.  Even some commenters on Less Wrong give little indication of the probability space over which they give a probability measure of an event, and nobody calls them on it.  So what is this probability space they are giving probability measurements over?

If I'm in a generous mood, I might give the person presenting such a statement the benefit of the doubt and suppose they were unintentionally ambiguous.  On the defenses of the nuclear weapon stockpile, the person might have meant to say "there is only a 1% probability of the defenses failing over all attacks", as in "in 1 attack out of every 100 we should expect the defenses to fail".  But given both my experiences with how people treat probability and my knowledge of naive reasoning about probability, I am dubious of my own generosity.  Rather, I suspect that many people act as though there were a universal probability space over which they may measure the probability of any event.

To illustrate the issue, consider the probability that a fair coin comes up heads.  We typically say that there is a 1/2 chance of heads, but what we are implicitly saying is that given a probability measure P on the measurable space ({heads, tails}, {{}, {heads}, {tails}, {heads, tails}}), P({heads}) = P({tails}) = 1/2, P({}) = 0, and P({heads, tails}) = 1.  But if we look at the issue of a coin coming up heads from a wider angle, we could interpret it as "what is the probability of some particular coin sitting heads-up over the span of all time", which is another question altogether.  What this is asking is "what is the probability of the event that this coin sits heads-up over the universal probability space", i.e. the probability space of all events that could occur at some time during the existence of the universe.  We have no clear way to calculate the probability of such an event, other than to say that the universal probability space must contain infinitely many (just how infinite is still up for debate) events of measure zero.  So there is a universal probability space; it's just not very useful to us (hence the title of the article), since for practical purposes it doesn't exist.
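For concreteness, here is a minimal sketch (in Python, not part of the original argument) of the finite probability space just described: the sample space, its power set as the sigma-algebra, and the measure P, with the stated values checked explicitly.

```python
from itertools import combinations

# A minimal sketch of the coin-flip probability space described above:
# sample space {heads, tails}, sigma-algebra (here just the power set),
# and a probability measure P defined on every event in it.

omega = frozenset({"heads", "tails"})

def power_set(s):
    """Return all subsets of s; for a finite sample space this is the sigma-algebra."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

sigma_algebra = power_set(omega)

def P(event):
    """Uniform measure on omega: each outcome contributes 1/|omega|."""
    event = frozenset(event)
    assert event <= omega, "P is only defined on subsets of the sample space"
    return len(event) / len(omega)

# The values listed in the text:
assert P(set()) == 0
assert P({"heads"}) == P({"tails"}) == 1 / 2
assert P({"heads", "tails"}) == 1
```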

None of this is to say, though, that the people committing these crimes against probability are aware of what probability space they are taking a measure over.  Many people act as if there is some number they can assign to any event which tells them how likely it is to occur and questions of "probability spaces" never enter their minds.  What does it mean that something happens 1% of the time?  I don't know; maybe that it doesn't happen 99% of the time?  How is 1% of the time measured?  I don't know; maybe one out of every 100 seconds?  Their crime is not one of mathematical abuse but of mathematical ignorance.

As aspiring rationalists, if we measure a probability, we ought to know over what probability space we're measuring.  Otherwise a probability isn't well defined and is just another number that, at best, is meaningless and, at worst, can be used to help us defeat ourselves.  Even if it's not always a good stylistic choice to make the probability space explicit in our speech and writing, we must always know over what probability space we are measuring a probability.  Otherwise we are just making up numbers to feel rational.

43 comments

Use common sense. This is no different from other matters of imprecise communication. Calling out the meaninglessness of assertions about coin flips really does sound silly.

probability measure P on the measurable space ({heads, tails}, {{}, {heads}, {tails}, {heads, tails}}), P({heads}) = P({tails}) = 1/2 and P({}) = P({heads, tails}) = 0

I think you mean P({heads, tails}) = 1.

Wow, that's a pretty bad error to miss. Fixed.

The more I think about it, the sadder it is that it took LW a day and a half to catch that error.

We typically say that there is a 1/2 chance of heads, but what we are implicitly saying is that given a probability measure P on the measurable space ({heads, tails}, {{}, {heads}, {tails}, {heads, tails}})

Giving a probability of one event only implies we think that particular event is possible. It doesn't say anything about what other events we are considering, so there is no necessity to describe the entire space of possibilities.

Many people act as if there is some number they can assign to any event which tells them how likely it is to occur and questions of "probability spaces" never enter their minds. What does it mean that something happens 1% of the time? I don't know; maybe that it doesn't happen 99% of the time? How is 1% of the time measured? I don't know; maybe one out of every 100 seconds?

Under a Bayesian interpretation of probability, which is generally used here, probability does not express how frequently something will occur. Instead, it represents your belief that an event will occur or that a proposition is true. Then p = 0.01 means "possible enough to consider, but very doubtful". I think most people naturally adopt a Bayesian perspective, so I'm not sure what the problem is.

Giving a probability of one event only implies we think that particular event is possible. It doesn't say anything about what other events we are considering, so there is no necessity to describe the entire space of possibilities.

Just because you don't care about measuring other probabilities in the space doesn't mean that you can ignore it. If you don't know what the space is, it's like taking a blank piece of paper, putting an "x" on it, and saying that's where the treasure is buried: not only do you not know the territory, but you don't even know enough about the map for that "x" to have any value.

I think most people naturally adopt a Bayesian perspective, so I'm not sure what the problem is.

I think you're giving too much credit here. Go out and slip into casual conversation a remark about the probability of something and see how people treat it. You could be right about the human brain, though, and maybe it's really a First World problem created by "numerical literacy" education in schools to try to help people read the news. Every time they hear a percentage they think of the frequentist interpretation they learned in school.

I'm surprised more people don't take this point seriously here. Really, if someone says there's a 1% chance of the defenses failing, what does that actually mean? Sure, in this context it might not matter because the point stands, but probabilities are misused this way all the time.

This sounds just like something I've always wondered about: the percentages they give in weather reports, for likelihood of rain. Does '30% chance of rain' mean that they estimate a 30% chance of getting any rain, or that they think it'll rain for 30% of the day, or what?

Yes, it means a 30% chance of any rain today, and as Cosmos points out it's based primarily on historical data: 30% of historical situations significantly like this one have had rain. And since the estimates update based on new data, they're practically tautologous.

I don't have a reference for that though.

I thought this was based off of historical data, although I don't remember the source and could easily be wrong.

If I am not wrong, it should be interpreted as: "It has rained 30% of the time we have had similar weather conditions in the past."

My memory is telling me that it should be translated, "We expect 30% of the geographic area to get rain today." I have no good reference for this.

It depends on where you live. According to The Straight Dope, it is typically used to mean the probability it will rain, calibrated against historical data. But in some places it is used differently, especially where the weather conditions permit more definite rain forecasts. I live in Florida, and there is rarely a question of whether or not it will rain but rather a question of how much geographical area will receive rain. During the wet months the probability of any rain is very high, so it's only a matter of how much area is going to get hit. During the dry months, the only rain we receive typically comes in the form of fronts moving down from the north, so again the chance of any rain tends towards 1 or 0 while the coverage can vary.
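For what it's worth, here is a toy sketch (with made-up fields and data, not from any real forecast system) of what the "historical frequency" reading of a forecast amounts to: among past days whose conditions were judged similar to today's, count the fraction that had rain.

```python
# Hypothetical historical records; the field names are illustrative only.
past_days = [
    {"similar_to_today": True,  "rained": True},
    {"similar_to_today": True,  "rained": False},
    {"similar_to_today": False, "rained": True},
    {"similar_to_today": True,  "rained": False},
    # ... many more records ...
]

# "Chance of rain" as a base rate over the similar historical days.
similar = [day for day in past_days if day["similar_to_today"]]
chance_of_rain = sum(day["rained"] for day in similar) / len(similar)
print(f"Chance of rain: {chance_of_rain:.0%}")  # ~33% on this toy data
```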

Please link to the news story you're referring to.

I think most LW people take a Bayesian view of probability, where probability is a consistent, numerical measure of how confident we are that a proposition is true, updated according to Bayes' rule, etc. You're advocating the mathematical view of probability, where probability really means "probability measure", i.e. a measure on a measure space of total measure 1.

It's not that one of these views is right and the other is wrong. They're actually describing two slightly different things. Measure-theoretic probability is something we can have a completely rigorous mathematical theory of, because its assumptions are technical and precise. But having a mathematical theory doesn't necessarily mean you can apply it to the real world. You need knowledge and judgment to determine what real-world phenomena are modeled by your theory and how well. Measure-theoretic probability models card games very well (measure space = the uniform distribution over the 52! ways that the deck could be shuffled) if the dealer is honest, but it doesn't help you decide whether to trust the dealer.
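As a concrete illustration of that card-game model, here is a minimal sketch (assuming only the uniform-shuffle measure just described) that computes the probability of one event, "the top card is an ace", both exactly and by simulation.

```python
import random
from fractions import Fraction

# A minimal sketch of the card-game model: the probability space is the
# uniform distribution over all 52! orderings of a standard deck, and the
# event is "the top card is an ace."

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# Exact answer by symmetry: every card is equally likely to land on top.
exact = Fraction(4, 52)

# Monte Carlo check: sample shuffles uniformly rather than enumerating 52!.
trials = 100_000
hits = sum(random.sample(deck, len(deck))[0][0] == "A" for _ in range(trials))
print(f"exact = {float(exact):.4f}, simulated = {hits / trials:.4f}")
```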

Bayesian probability is a less rigorous, but more directly-applicable-to-real-life theory. It shares a lot of terminology and theorems with measure theoretic probability, but it isn't quite the same thing. In particular, in Bayesian probability you don't have a probability space, so when you see people here talking probability without specifying the space, it's not an error, they're just being Bayesian.

Jaynes's "Probability Theory" is a great book on Bayesian probability theory, but if you've got a mathematical education, which I suspect you might, it's going to piss you off. Just ignore the parts where Jaynes bloviates about things he doesn't understand, and learn from the parts where he teaches the things he does. Most of the book is the latter.

Er... So where does the measure-theoretic definition of probability become incompatible with the "Bayesian probability" you talk about? Can you give a reference that supports your position? (I understand there are disputes about foundations, but representationally these all seem to be exactly the same thing.)

Who said they were incompatible? I only said they're different. In measure-theoretic probability theory, you start with a space; in Bayesian probability, you don't. In measure-theory land, the propositions that get assigned probabilities must be subsets of the space; in Bayes-land they can be anything that's true or false (or will be in the future) in the real world.

The difference between the two is not a dispute on foundations. They really are two different, but overlapping, theories. Measure-theoretic probability theory is a formal mathematical theory, like group theory or point-set topology. It's a set of theorems about mathematical objects (probability measures) that satisfy certain axioms. Those objects may be good models of something in real life, or they may not. Either way the theorems are still true. For a reference on this topic, see, well, any book on measure theory. There are lots of them. Here's one:

http://www.amazon.com/Probability-Measure-3rd-Patrick-Billingsley/dp/0471007102/ref=sr_1_13?ie=UTF8&s=books&qid=1241625621&sr=1-13

Bayesian theory is just not that formal. What are the axioms of Bayesian theory? What propositions are allowed? How do you select priors? Bayesian probability may use a lot of math, but math isn't what it is. It's more like physics than group theory.

Bayesian probability may use a lot of math, but math isn't what it is.

Yet it seems that math is what it should be. Bayesian probability, as it's used in probabilistic inference, is usually founded on the same Kolmogorov axioms as standard mathematical probability theory. I don't see any problems with the mathematical part; I dispute your characterization of Bayesian probability as an inherently informal theory (hence the quotation marks in my comment).

[-]Cyan

I think smoofra is talking about the same sorts of things Jaynes is when he writes:

The danger here is particularly great because mathematicians generally regard these limit theorems as the most important and sophisticated fruits of probability theory, and have a tendency to use language which implies that they are proving properties of the real world. Our point is that these theorems are valid properties of the abstract mathematical model that was defined and analyzed [emphasis in original]. The issue is: to what extent does that model resemble the real world? It is probably safe to say that no limit theorem is directly applicable in the real world, simply because no mathematical model captures every circumstance that is relevant in the real world.

- PT:LOS, pp 65-66.

ADBOC

Jaynes aggressively scorns abstract mathematics. I love abstract mathematics. We both agree that just because you have a model or a theorem, it doesn't necessarily apply to the real world.

edit: (ADBOC directed at Jaynes, not at Cyan)

[-]Cyan

I come to quote Jaynes, not to praise him; the scorn that men write lives after them, the good is oft interred with their bones -- let it not be thus with Jaynes.

Yep, you shouldn't wirehead yourself into developing a theory about the mathematical formalism, you should instead develop a theory about the world. But the theory that you develop should be mathematical where possible.

There's nothing wrong with doing pure Math, if you know that's what you're doing.

Arguably there may be, if it can be shown that you normatively should worry only about the real world, even if what you are doing in the real world is thinking math.

if it can be shown that you normatively should worry only about the real world,

It can't be. Not in any system of norms I would give a fig about. Art, Fiction, and Math are worthwhile. They don't have to be useful. If you disagree with that, then we simply have different utility functions, and there's no point in arguing further.

You are seeing "useful" too narrowly. I only stated that whatever you consider "useful", it's probably a statement exclusively about the real world, and "doing math" is one of the activities in the real world. I don't see how you could place Art in the same cached thought, since it was remarked many times that you shouldn't go Spock.

Any theory about the real world is inherently informal.

Do you disagree that Bayesian probability theory is about as informal as physics, or do you disagree with my characterization of physics as informal? If it's the latter, then we don't disagree on anything except the meaning of words.

A theory about the real world may be perfectly formal, it just won't have a perfectly formal applicability proof. On the other hand, if you can show that a theory is applicable with probability of 1-2^{-10000}, it's as good as formally proven to apply.

I disagree that it's correct terminology to call a theory informal just because it can't be formally proven to apply to the real world.

It's not the lack of a proof that makes it informal; it's that the elements of the theory themselves aren't precisely, formally, mathematically defined. A valid proposition in measure-theoretic probability is a subset of the measure space; nothing else will do. Propositions in Bayesian probability are written in natural language, about events in the real world.

I'm using the word "formal" in the sense that it is used in mathematics. If you're going to say that propositions written in natural language, about events in the real world are "formal" in that sense, then you're just refusing to communicate.

[-]kim0

All recursive probability spaces converge to the same probabilities, as the information increases.

Not that those people making up probabilities know anything about that.

If you want a universal probability space, just take some universal computer, run all programs on it, and keep those that output event A. Then you can see how many of those also output event B, and thus you can get p(B|A), whatever A and B are.

This is algorithmic information theory, and should be known by any black belt Bayesian.

Kim Øyhus

All recursive probability spaces converge to the same probabilities, as the information increases.

Google gives 0 hits on "recursive probability space". Blanket assertions like this need to be technically precise.

I refer interested readers to the Algorithmic probability article on Scholarpedia.

[-]kim0

The technically precise reference was this part:

"This is algorithmic information theory,.."

But if you claim my first line was too obfuscated, I can agree.

Kim Øyhus

Please specify in what sense the first line was correct, or declare it an error. Pronouncing assertions known to be incorrect and then just shrugging that off shouldn't be acceptable on this forum.

[-]kim0

O.K.

One wants a universal probability space in which one can find the probability of any event. This is possible:

One way of making such a space is to take all recursive functions of some universal computer, run them, and store the outputs, resulting in a universal probability space, because every possible set of events will be there as the results of infinitely many recursive functions, or programs as they are called. The probabilities correspond to the densities of these outputs, these events.

A counterargument is that it is too dependent on the particular universal computer chosen. However, theorems in algorithmic information theory show that this dependence washes out asymptotically as information increases, because the densities of outputs from different universal computers can differ by at most a factor of 2 to the power of the length of the shortest program that simulates one universal computer on the other.

Kim Øyhus
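To make the discussion below easier to follow, here is a toy sketch of this style of construction. It is an illustration added here, not kim0's code: it uses the standard 2^(-length) weighting of programs (the usual algorithmic-probability prior) rather than the raw output density described above, and the "universal computer" is a trivial stand-in.

```python
from itertools import product

# Toy sketch of estimating p(B | A) by enumerating programs.
# Caveats: (1) run() below just echoes its program; it is a stand-in, not a
# universal computer. A real universal machine would also force you to deal
# with non-halting programs, which is why the full construction is not
# computable. (2) Programs are weighted by 2^(-length), the standard
# algorithmic-probability prior, not the output density described above.

def run(program):
    """Stand-in interpreter: a program is a bit string and its output is itself."""
    return program

def conditional_probability(max_len, A, B):
    """Weight every program up to max_len bits by 2^-length and return
    weight(outputs satisfying A and B) / weight(outputs satisfying A)."""
    weight_A = 0.0
    weight_AB = 0.0
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            output = run("".join(bits))
            w = 2.0 ** -length
            if A(output):
                weight_A += w
                if B(output):
                    weight_AB += w
    return weight_AB / weight_A

# Example events (arbitrary, for illustration): A = "output starts with 1",
# B = "output contains at least two 1s".
A = lambda out: out.startswith("1")
B = lambda out: out.count("1") >= 2
print(conditional_probability(max_len=12, A=A, B=B))
```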

One way of making such a space is to take all recursive functions of some universal computer, run them, and store the outputs,

OK....

resulting in a universal probability space, because every possible set of events will be there

What!? You haven't yet described a probability space. The aforementioned set is infinite, so the uniform distribution is unavailable. What probability distribution will you have on this set of recursive-function runs? And in what way is the resulting probability space universal?


As far as I can tell, you are talking absolute gibberish.

If I'm wrong, please explain.

edit: if someone who downvoted me could please explain what the heck a "recursive probability space" is supposed to be, I'd appreciate it.