(Subjective Bayesianism vs. Frequentism) VS. Formalism

27 Post author: potato 26 November 2011 05:05AM

One of the core aims of the philosophy of probability is to explain the relationship between frequency and probability. The frequentist proposes identity as the relationship. This use of identity is highly dubious. We know how to check for identity between numbers, or even how to check for the weaker copula relation between particular objects; but how would we test the identity of frequency and probability? It is not immediately obvious that there is some simple value out there which is modeled by probability, like position and mass are values that are modeled by Newton's Principia. You can actually check if density * volume = mass, by taking separate measurements of mass, density and volume, but what would you measure to check a frequency against a probability?

Frequentist philosophy has certain appeals: we would like to say that if a bag has 100 balls in it, only 1 of which is white, then the probability of drawing the white ball is 1/100, and that if we take a non-white ball out, the probability of drawing the white ball is now 1/99. Frequentism would make the philosophical justification of that inference trivial. But of course, anything a frequentist can do, a Bayesian can do (better). I mean that literally: it's the stronger magic.
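The bag example can be checked two ways: by counting balls directly, and by conditioning on the first draw. The following is only a sketch; it assumes every ball is equally likely to be drawn, and the names (`BALLS`, `WHITE`, `counting_answer`, `conditioning_answer`) are made up for illustration.

```python
from fractions import Fraction
from itertools import permutations

BALLS = range(100)   # hypothetical labels for the 100 balls
WHITE = 0            # assumption: ball 0 is the single white ball

# Counting route: one white ball among the 99 balls left in the bag.
counting_answer = Fraction(1, 99)

# Conditioning route: enumerate every ordered pair of distinct draws,
# keep the pairs whose first ball is non-white, and ask how often the
# second ball is white among those.
pairs = list(permutations(BALLS, 2))
first_nonwhite = [(a, b) for a, b in pairs if a != WHITE]
second_white = [(a, b) for a, b in first_nonwhite if b == WHITE]
conditioning_answer = Fraction(len(second_white), len(first_nonwhite))

print(counting_answer, conditioning_answer)
```

Both routes give the same fraction, which is the sense in which the Bayesian can reproduce the frequentist's inference here.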

A Subjective Bayesian, more or less, says that the reason frequencies are related to probabilities is because when you learn a frequency you thereby learn a fact about the world, and one must update one's degrees of belief on every available fact. The subjective Bayesian actually uses the copula in another strange way:

Probability is subjective degree of belief.

and subjective Bayesians also claim:

Probabilities are not in the world, they are in your mind.

These two statements are brilliantly championed in Probability is Subjectively Objective. But ultimately, the formalism which I would like to suggest denies both of these statements. Formalists do not ontologically commit themselves to probabilities, just as they do not say that numbers exist; hence we don't allocate probabilities in the mind or anywhere else; we only commit ourselves to number theory, and probability theory. Mathematical theories are simply repeatable processes which construct certain sequences of squiggles called "theorems", by changing the squiggles of other theorems, according to certain rules called "inferences". Inferences always take as input certain sequences of squiggles called premises, and output a sequence of squiggles called the conclusion. The only thing an inference ever does is add squiggles to a theorem, take away squiggles from a theorem, or both. It turns out that these squiggle sequences mixed with inferences can talk about almost anything, certainly any computable thing. The formalist does not need to ontologically commit to numbers to assert that "There is a prime greater than 10000.", even though "There is x such that" is a flat assertion of existence; because for the formalist "There is a prime greater than 10000." simply means that number theory contains a theorem which is interpreted as "there is a prime greater than 10000." When you say a mathematical fact in English, you are interpreting a theorem from a formal theory. If under your suggested interpretation, all of the theorems of the theory are true, then whatever system/mechanism your interpretation of the theory talks about, is said to be modeled by the theory.

So, what is the relation between frequency and probability proposed by formalism? Theorems of probability may be interpreted as true statements about frequencies when you assign certain words to certain squiggles and assert the resulting natural-language sentence. Or for short we can say: "Probability theory models frequency." It is trivial to show that Kolmogorov's axioms model frequency, since they also model fractions; it is an algebra after all. More interestingly, probability theory models rational distributions of subjective degree of belief, and the optimal updating of degree of belief given new information. This is somewhat harder to show; Dutch-book arguments do nicely to at least provide some intuitive understanding of the relation between degree of belief, betting, and probability, but there is still work to be done here. If Bayesian probability theory really does model rational belief, which many believe it does, then that is likely the most interesting thing we are ever going to be able to model with probability. But probability theory also models spatial measurement. Why not add the position that probability is volume to the debating lines of the philosophy of probability?
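To make "models" concrete, here is a small sketch that checks the finite form of Kolmogorov's axioms (non-negativity, total measure 1, additivity over disjoint events) against two interpretations: relative frequency in a record of observations, and share of volume in a partitioned box. The data and all names are invented for the example, and only finite additivity is tested, which is all a finite outcome space requires.

```python
from fractions import Fraction
from itertools import combinations

def satisfies_kolmogorov(outcomes, measure):
    """Check the finite Kolmogorov axioms for `measure` over every subset of `outcomes`."""
    events = [frozenset(c) for r in range(len(outcomes) + 1)
              for c in combinations(outcomes, r)]
    nonnegative = all(measure(e) >= 0 for e in events)
    normalized = measure(frozenset(outcomes)) == 1
    additive = all(measure(a | b) == measure(a) + measure(b)
                   for a in events for b in events if not (a & b))
    return nonnegative and normalized and additive

# Interpretation 1: relative frequency in a (made-up) record of draws.
observations = ["red", "red", "blue", "green", "red", "blue"]
colors = sorted(set(observations))

def relative_frequency(event):
    hits = sum(1 for o in observations if o in event)
    return Fraction(hits, len(observations))

# Interpretation 2: share of total volume in a (made-up) box split into cells.
cell_volumes = {"left": Fraction(2), "middle": Fraction(5), "right": Fraction(3)}

def volume_share(event):
    return sum((cell_volumes[c] for c in event), Fraction(0)) / sum(cell_volumes.values())

# A non-example: squaring the frequency breaks additivity.
def squared_frequency(event):
    return relative_frequency(event) ** 2

print(satisfies_kolmogorov(colors, relative_frequency))
print(satisfies_kolmogorov(list(cell_volumes), volume_share))
print(satisfies_kolmogorov(colors, squared_frequency))
```

The first two checks pass and the third fails, so "being modeled by probability theory" is a testable property of an interpretation rather than a claim about what probability is.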

Why are frequentism's and subjective Bayesianism's misuses of the copula not as obvious as volumeism's? Because what the Bayesian and the frequentist are really arguing about is statistical methodology; they've just disguised the argument as an argument about what probability is. Your interpretation of probability theory will determine how you model uncertainty, and hence determine your statistical methodology. Volumeism cannot handle uncertainty in any obvious way; however, the Bayesian and frequentist interpretations of probability theory imply two radically different ways of handling uncertainty.

The easiest way to understand the philosophical dispute between the frequentist and the subjective Bayesian is to look at the classic biased coin:

A subjective Bayesian and a frequentist are at a bar, and the bartender (being rather bored) tells the two that he has a biased coin, and asks them "what is the probability that the coin will come up heads on the first flip?" The frequentist says that for the coin to be biased means for it not to have a 50% chance of coming up heads, so all we know is that it has a probability that is not equal to 50%. The Bayesian says that any evidence I have for it coming up heads is also evidence for it coming up tails, since I know nothing about one outcome that doesn't also hold for its negation, and the only value which represents that symmetry is 50%.

I ask you. What is the difference between these two, and the poor souls engaged in endless debate over realism about sound in the beginning of Making Beliefs Pay Rent?

If a tree falls in a forest and no one hears it, does it make a sound? One says, "Yes it does, for it makes vibrations in the air." Another says, "No it does not, for there is no auditory processing in any brain."

One is being asked: "Are there pressure waves in the air if we aren't around?" The other is being asked: "Are there auditory experiences if we are not around?" The problem is that "sound" is being used to stand for both "auditory experience" and "pressure waves through air". They are both giving the right answers to these respective questions. But they are failing to Replace the Symbol with the Substance and they're using one word with two different meanings in different places. In the exact same way, "probability" is being used to stand for both "frequency of occurrence" and "rational degree of belief" in the dispute between the Bayesian and the frequentist. The correct answer to the question: "If the coin is flipped an infinite number of times, how frequently would we expect it to land on heads?" is "All we know is that it wouldn't be 50%", because that is what it means for the coin to be biased. The correct answer to the question: "What is the optimal degree of belief that we should assign to the first trial being heads?" is "Precisely 50%", because of the symmetrical evidential support the results get from our background information. How we should actually model the situation as statisticians depends on our goal. But remember that Bayesianism is the stronger magic, and the only contender for perfection in the competition.
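Both answers can hold at once in a single toy model. The sketch below assumes a prior over the coin's long-run heads frequency that is symmetric about 1/2 and puts no weight on exactly 1/2; the grid of candidate biases and the function names are illustrative choices, not anything the bartender told us.

```python
from fractions import Fraction

# Candidate long-run heads frequencies: 0, 0.1, ..., 1.0, excluding 0.5
# because the coin is known to be biased.
biases = [Fraction(k, 10) for k in range(11) if k != 5]

# Symmetric ignorance: equal weight on every candidate bias.
prior = {b: Fraction(1, len(biases)) for b in biases}

def predictive_heads(dist):
    """Degree of belief that the next flip lands heads, averaging over the bias."""
    return sum(b * w for b, w in dist.items())

def update_on_heads(dist):
    """Bayes' rule after observing one head."""
    evidence = predictive_heads(dist)
    return {b: b * w / evidence for b, w in dist.items()}

first_flip = predictive_heads(prior)
posterior = update_on_heads(prior)
second_flip = predictive_heads(posterior)

print(first_flip, second_flip)
```

The prior gives zero weight to a long-run frequency of 50%, yet the degree of belief for the first flip comes out at exactly 1/2. After one observed head the symmetry is broken and the belief for the next flip moves above 1/2.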

For us formalists, probabilities are not anywhere. We do not even believe in probability, technically; we only believe in probability theory. The only coherent uses of "probability" in natural language are purely syncategorematic. We should be very careful when we colloquially use "probability" as a noun or verb, and be very clear about what we mean by this word play. Probability theory models many things, including degree of belief and frequency. Whatever we may learn about rationality, frequency, measure, or any of the other mechanisms that probability models, through the interpretation of probability theorems, we learn because probability theory is isomorphic to those mechanisms. When you use the copula like the frequentist or the subjective Bayesian, it makes it hard to notice that probability theory modeling both frequency and degree of belief is not a contradiction. If we use "is" instead of "models", it is clear that frequency is not degree of belief, so if probability is belief, then it is not frequency. Though frequency is not degree of belief, frequency does model degree of belief, so if probability models frequency, it must also model degree of belief.

Comments (106)

Comment author: Will_Newsome 13 December 2011 02:44:15PM *  4 points [-]

User:potato smartly linked to Wei Dai's post "Frequentist Magic vs. Bayesian Magic", which along with its commentary would perhaps be the next thing to read after this post. (Wei Dai and user:potato think that (algorithmic) Bayesian magic is stronger but I agree with Toby Ord's and Vladimir Slepnev's points and would caution against hasty conclusions.)

More interestingly, probability theory models rational distributions of subjective degree of belief, and the optimal updating of degree of belief given new information. This is somewhat harder to show; Dutch-book arguments do nicely to at least provide some intuitive understanding of the relation between degree of belief, betting, and probability, but there is still work to be done here.

In a sense there has been highly important and insightful work done in the vein of the Dutch book arguments, looking into e.g. how decisions should be made under what is called "indexical" or "anthropic" uncertainty, a situation where Bayesian reasoning seems like it ought to work but doesn't as such. This work has been done in large part by Less Wrong-cognizant folk. One easy-to-understand summary of some of the motivations for such research can be found in User:ata's analysis of the classic "sleeping beauty" problem, which is similar to User:potato's disambiguating approach used in the above post (and note Vladimir Nesov's similar warning against equivocation on the word "probability"). If you're interested in the foundations of justification (whether epistemic or moral, or what's the difference?), then you'd be wise to look into the big open questions in decision theory. There are a lot of 'em.

Comment author: [deleted] 25 November 2011 02:54:41PM *  7 points [-]

.

Comment author: potato 27 November 2011 05:47:23AM *  10 points [-]

In other words, the OP has mixed up the quotation and the referent (or the representation and the referent).

It seems to me that I am the one proposing a sharp distinction between probability theory (the representation), and rational degree of belief (the referent). If you say that probability is degree of belief, you destroy all the distinction between the model and the modeled. If by "probability" you mean subjective degree of belief, I don't really care what you call it. But know that "probability" has been used in ways which are not consistent with that synonymy claim. Because we do not have 100% belief that Bayes does model ideal inference with uncertainty, Bayesian probability is not identical to subjective belief given our knowledge. If X is identical to Y, then X is isomorphic-to/models Y. Because we can still conceive of Bayes not perfectly modeling rationality, without implying a contradiction, our current state of knowledge does not include that Bayes is identical to subjective degree of belief.

We learn that something is probability by looking at probability theory, not by looking at subjective belief. If rational subjective belief turned out to not be modeled by probability theory, then we would say that subjective degree of belief was not like probability, not that probability theory does not define probability.

The first person to make bayes, may have been thinking about rationality when he/she first created the system, or he/she may have been thinking about spatial measurements, or he/she may have been thinking about finite frequencies, and he/she would have made the same formal system in every case. Their interpretations would have been different, but they would all be the one identical probability theory. Which one the actual creator was thinking of, is irrelevant. What spaces, beliefs, finite frequencies all have in common is that they are modeled by probability theory. To use "probability" to refer to one of these, over another, is a completely arbitrary choice (mind you I said finite frequency).

If we lose nothing by using "models" instead of "is", why would we ever use "is"? "Is" is a much stronger claim than "models". And frankly, I know how to check whether or not a given argument is an animal, for instance; how do I check if a given argument is a probability? I see if it satisfies the probability axioms. Finite frequency, measure, and rational degree of belief all seem to follow the probability axioms and inferences under specific, though similar, interpretations of probability theory.

Comment author: [deleted] 27 November 2011 03:14:42PM *  -3 points [-]

.

Comment author: nshepperd 27 November 2011 05:32:09PM *  5 points [-]

As far as I can understand you, you seem to think that because humans aren’t always rational in the sense described by Cox’s postulates, probability theory only “models” human reasoning under uncertainty. You also seem to think that probability theory is just “squiggles on paper”.

Only models? Just squiggles on paper?

You've misunderstood the article, I think. Probability theory (the Kolmogorov Axioms) does model correct degrees of belief and describes normatively what they should be. It also models "long-term frequencies" in the sense that the Kolmogorov Axioms also apply to such things.

None of this requires the word "probability" to refer to degrees of belief. You don't even need a word at all to do the math and get the right answer. It's convenient to use the word that way though, since we already have a word "frequency" that refers to the stupider idea.

(And also I suspect that most people learned the word at school mostly by being given examples of likely and unlikely things. For them, "probability" refers to the little progress bar in their mind that goes up for more likely things and down for less likely things [ie. degrees of belief]. And thus many frequentists may commit philosophical errors when they try to define it as frequencies then use the intuitive definition to draw a conclusion in the same argument. This alone is a good reason to use "probability" for beliefs and "frequencies" for, well, frequencies.)

Comment author: potato 27 November 2011 09:56:05PM 0 points [-]

Yes, we can use "probability is degree of belief" but we have to be very careful about this sort of word play, because what that really means is that "probability models degree of belief".

Comment author: Manfred 28 November 2011 04:45:57AM 0 points [-]

Probability doesn't come from attempting to model something out in the world. It comes from attempting to find a measure of degree of belief that's consistent with certain desiderata, like "you shouldn't believe both a thing and its opposite." So the phrase "probability models degree of belief" is false.

Comment author: potato 28 November 2011 11:25:05AM *  0 points [-]

You're right, I meant to say "probability theory models theoretically optimal degree of belief updates, given other degrees of belief". Or "probability theory models ideally rational degrees of belief."

Comment author: [deleted] 27 November 2011 06:12:02PM *  -2 points [-]

.

Comment author: potato 27 November 2011 09:39:04PM *  2 points [-]

Right. So what the heck's the point of the article? Why not just continue to say that probability is degrees of belief, and is not frequency?

Because then you'll keep arguing for decades about which one it really is, to absolutely no fruitful conclusion. Why not just keep saying that sound is air pressure and not auditory experience, or vice versa? When you do that, it makes it harder to see what is really going on. Call me conservative, but I think we should use as precise a terminology as possible. Also, it seems to me that "probability is degree of belief" is an unverifiable claim, or at least I do not know what experiences I should test it with. But really, even in your own writing you don't feel comfortable using the copula as the relation between probability and degree of belief without italicizing it. Doesn't that make you think that maybe there is a better word for the relation, one which you wouldn't feel the need to italicize? How about "models"? And really we shouldn't be using probability as a noun; it's a function, not an object, but we can deal with that later.

Exactly what about my article suggests that we should change our terminology to legitimize frequentism? I am saying that frequentism and subjective Bayesianism both fail the moment they use the copula with probability as the subject; that is a stupid thing to do in philosophy. It's as bad as Hegel. "Probability" is not a noun, it is a function; it is syncategorematic like "the", "or", "sake", etc. It is not categorematic; "probability" does not have a physical extension. And there are things that volume has in common with degree of belief, which we might call probability-like behavior. Again, if we found that degree of belief wasn't modeled by probability theory, we would say that subjective Bayesianism was wrong, not that probability theory does not really describe probability. If "subjective belief" did mean probability instead, then on finding that probability theory did not model ideally rational degree of belief, we would say that Kolmogorov's axioms need to be fixed, since they don't really define probability.

Comment author: [deleted] 27 November 2011 09:53:25PM *  -1 points [-]

.

Comment author: nshepperd 28 November 2011 04:57:35AM *  2 points [-]

That isn't why there's a frequentist/Bayesian dispute. Everyone agrees they are both "interpretations". As another commenter has pointed out, the semantic argument is just a proxy for the dispute over whether one or other interpretation is preferable either philosophically or in practical terms.

Calling them interpretations seems to imply that at most one of them can be correct. "Displacement of a falling object on earth" and "kinetic energy of a 9.8 kg object" aren't competing interpretations of the math f(x) = 4.9x^2, they're just two different things the equation applies to.

If the frequentists are making any error, it's denying that beliefs must be updated according to the Kolmogorov Axioms, not asserting that frequencies can also be treated with the same laws. It's denying the former that might lead them to apply incorrect methods in inference, which is the only problem that really matters.

Comment author: jsteinhardt 27 November 2011 10:13:26PM *  2 points [-]

I think the interpretation of probability and what methods to use for inference are two separate debates. There was a really good discussion post on this a while back.

I'm also curious as to who exactly these frequentists are that you are arguing against. Perhaps I am spoiled by hanging out with people who regularly have to solve statistical problems, and therefore need to have a reasonable conception of statistics, but most frequentist sentiments that I encounter are fairly well-reasoned, sometimes even pointing out legitimate issues with Bayesian statistics. It is true that I sometimes get incorrect claims that I have to correct, but I don't think becoming a Bayesian magically protects you from this.

EDIT: To clarify, the "frequentist sentiments" I referred to did not explicitly distinguish between interpretations of probability and inference algorithms, but as the goal was engineering I think the arguments were all implicitly pragmatic.

Comment author: [deleted] 27 November 2011 10:23:47PM 1 point [-]

I'm going by what I've read of Jaynes, Yudkowsky, and books by a couple of other writers on Bayesian statistics.

I don't believe there are any legitimate issues with Bayesian statistics, because Bayes's rule is derived from basic desiderata of rationality which I find entirely convincing, and it seems to me that the maximum entropy principle is the best computable approximation to Solomonoff induction (although I'd appreciate other opinions on that).

There may be legitimate issues with people failing to apply the simple mathematical laws of probability theory correctly, because the correct application can get very complicated - but that is not an issue with Bayesian statistics per se. I'm sure that in many cases, the wisest thing to do might be to use frequentist methods, but being a Bayesian does not prohibit someone from applying frequentist methods when they are a convenient approximation.

Comment author: jsteinhardt 28 November 2011 01:04:09AM *  3 points [-]

The two issues that come to mind are the difficulty of specifying priors and the computational infeasibility of performing Bayesian updates.

I don't think anyone can reasonably dispute that if the correct prior is handed to you, together with a black box for applying Bayes' rule, then you should perform Bayesian updates based on your data to get a posterior distribution. That is simply a mathematical theorem (Bayes' theorem). And yes, it is also a theorem (Cox's theorem) that any rational agent is implicitly using a prior. But we aren't yet in a position to create a perfectly rational agent, and until we are, worrying about the specific form of consistency that is invoked for Cox's theorem seems silly.

It's possible that we don't really disagree. As a purely abstract statement about what you should do given unlimited computational resources, sure, Solomonoff induction is the way to go. I definitely agree with that. But if you need to actually solve a specific practical problem, additional considerations come into play.

By the way, what do you mean by "the maximum entropy principle is the best computable approximation to Solomonoff induction"? That sounds intriguing, so I'd be interested to have you elaborate a bit.

Comment author: Ron_Fern 27 November 2011 10:35:56PM 2 points [-]

I'm pretty sure almost all frequentist methods are derivable from Bayes, or are close approximations of Bayes. Do they have any tool which is radically un-Bayesian?

Comment author: Ron_Fern 27 November 2011 10:34:34PM 1 point [-]

I think the interpretation of probability and what methods to use for inference are two separate debates. There was a really good discussion post on this a while back.

I completely agree with this. It seems to me that we should completely throw away the question of what probability is, and look at which form of inference is optimal.

Comment author: potato 27 November 2011 10:20:49PM *  2 points [-]

The definitional dispute about sound is different in that air pressure and auditory experience are both useful concepts, and there is no competition between them.

There is a dispute; ever hear of the idealists and the realists? Luckily it is over now. But either way, it does not matter why you are using one word to stand for many things; you shouldn't do it if you can use a terminology that is more widely accepted. I still think that Bayesianism is a better interpretation, a much better interpretation than frequentism, but what is it an interpretation of? Is it an interpretation of math? Seems to me like it is an interpretation of typographical string manipulations applied to certain basic strings.

As another commenter has pointed out, the semantic argument is just a proxy for the dispute over whether one or other interpretation is preferable either philosophically or in practical terms.

That wasn't another commenter, that was in my article, I'm pretty sure.

If people switched to saying that probability models both subjective degrees of belief and imaginary long-run frequency, there would still be this argument; however, it would then be harder for the Bayesian revolution (with whom the momentum lies) to finally oust the cursed frequentists, because language would be used in such a way as to imply equal validity of the interpretations.

If bayesianism wins this argument, which it probably will, it should win because it is the ideal system of statistical inference, not because they managed to convince a bunch of people of a statement with absolutely no empirical consequences. If you argue about what probability is you argue about surface bubbles of your theory that are just irrelevant to the real dispute you are having, whether you are a realist and an idealist, or a frequentist and a bayesian.

Comment author: [deleted] 27 November 2011 10:07:37PM *  -3 points [-]

.

Comment author: potato 27 November 2011 10:22:25PM 0 points [-]

The definitional dispute about sound is different in that air pressure and auditory experience are both useful concepts, and there is no competition between them.

There is a dispute; ever hear of the idealists and the realists? Luckily it is over now. But either way, it does not matter why you are using one word to stand for many things; you shouldn't do it if you can use a terminology that is more widely accepted. I still think that Bayesianism is a better interpretation, a much better interpretation than frequentism, but what is it an interpretation of? Is it an interpretation of math? Seems to me like it is an interpretation of typographical string manipulations applied to certain basic strings.

As another commenter has pointed out, the semantic argument is just a proxy for the dispute over whether one or other interpretation is preferable either philosophically or in practical terms.

That wasn't another commenter, that was in my article, I'm pretty sure.

If people switched to saying that probability models both subjective degrees of belief and imaginary long-run frequency, there would still be this argument; however, it would then be harder for the Bayesian revolution (with whom the momentum lies) to finally oust the cursed frequentists, because language would be used in such a way as to imply equal validity of the interpretations.

If bayesianism wins this argument, which it probably will, it should win because it is the ideal system of statistical inference, not because they managed to convince a bunch of people of a statement with absolutely no empirical consequences. If you argue about what probability is you argue about surface bubbles of your theory that are just irrelevant to the real dispute you are having, whether you are a realist and an idealist, or a frequentist and a bayesian.

Comment author: potato 27 November 2011 10:23:07PM 0 points [-]

I don't think that's where I meant to put that comment.

Comment author: potato 27 November 2011 10:08:40PM 0 points [-]

Probability theory is maths, and although I agree that questions like “where is maths?” and “what is maths?” and “does maths exist?” are confusing

See, these questions are not confusing to me at all. Hofstadter's formalism deals with them perfectly. Have you ever read G.E.B.? I assumed so, but I wasn't sure, maybe you haven't.

Yes, I do think that probability theory is a repeatable process of typographical string manipulations. What do you think it is?

Ultimately language should be useful, and I don’t see the point of changing the word “is” to the word “models”. This wouldn’t change my beliefs about probability theory; I’d just be using the word “models” to mean the same thing as the word “is”. And I would then lack a means of saying that the Bayesian interpretation of probability is good, and the frequentist interpretation is stupid and counter-productive – I want to be able to say that probability is X, and probability isn't Y, because this is the most useful way of using language to talk about probability theory – why would I want to put the good interpretation and the dumb interpretation on an equal footing by saying that probability “models” X and also “models” Y?

Here I completely disagree, and almost wonder if you haven't been reading my comments. Bayesianism is stronger, more capable, perfecter, more rational, more useful than frequentism, first of all, and all of that has nothing to do with the commitment to conceptualism that subjective Bayes requires. This is all still true if you are a formalist.

Bayesianism is not righter than frequentism because probabilities are really subjective beliefs, and the frequentists were wrong, it's not frequency. Bayesians are righter than frequentists because Bayes-inferences are deductively demonstrable to win more than frequentist-inferences. Again, the argument about what probability really is is just a way to disguise the argument about whose statistical method is more successful; the only way the frequentist even has a shot at such an argument is if it is disguised as a question about what probability is instead of a question about whose inferences are theoretically ideal.

So um, platonism? Really? Why? What does it get you that formalism doesn't with less ontological commitment?

Comment author: jsteinhardt 25 November 2011 04:20:44PM 5 points [-]

The frequentist/Bayesian dispute is of real import, because ad-hoc frequentist statistical methods often break down in extreme cases, throw away useful data, only work well with Gaussian sampling distributions etc.

I think you have this backwards. Frequentist techniques typically come with adversarial guarantees (i.e., "as long as the underlying distribution has bounded variance, this method will work"), whereas Bayesian techniques, by choosing a specific prior (such as a Gaussian prior), are making an assumption that will hurt them in extreme cases or when the data is not drawn from the prior. The tradeoff is that frequentist methods tend to be much more conservative as a result (requiring more data to come to the same conclusion).
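As a rough illustration of that tradeoff, the sketch below compares how many samples are needed to pin down a mean to a given tolerance under two assumptions: only a bound on the variance (via Chebyshev's inequality), versus an exactly Gaussian sampling distribution. The second function is not a Bayesian method; it merely stands in for "a stronger distributional assumption", and the function names and numbers are chosen for the example.

```python
import math
from fractions import Fraction
from statistics import NormalDist

def chebyshev_sample_size(variance_bound, tolerance, failure_prob):
    """Samples needed so that P(|sample mean - true mean| >= tolerance) <= failure_prob,
    assuming only that the variance is at most `variance_bound` (Chebyshev's inequality)."""
    return math.ceil(variance_bound / (tolerance ** 2 * failure_prob))

def gaussian_sample_size(variance, tolerance, failure_prob):
    """Same guarantee, but assuming the data are exactly Gaussian with known variance."""
    z = NormalDist().inv_cdf(1 - float(failure_prob) / 2)
    return math.ceil(z ** 2 * float(variance) / float(tolerance) ** 2)

variance = Fraction(1)
tolerance = Fraction(1, 10)
failure_prob = Fraction(1, 20)

distribution_free_n = chebyshev_sample_size(variance, tolerance, failure_prob)
gaussian_n = gaussian_sample_size(variance, tolerance, failure_prob)

print(distribution_free_n, gaussian_n)
```

The distribution-free bound asks for roughly five times as much data here, which is the price of a guarantee that survives whatever the underlying distribution turns out to be.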

If you have a reasonable Bayesian generative model, then using it will probably give you better results with less data. But if you really can't even build the model (i.e. specify a prior that you trust) then frequentist techniques might actually be appropriate. Note that the distinction I'm drawing is between Bayesian and frequentist techniques, as opposed to Bayesian and frequentist interpretations of probability. In the former case, there are actual reasons to use both. In the latter case, I agree with you that the Bayesian interpretation is obviously correct.

Comment author: [deleted] 26 November 2011 04:42:24PM *  4 points [-]

Bayesian techniques, by choosing a specific prior (such as a Gaussian prior), are making an assumption that will hurt them in extreme cases or when the data is not drawn from the prior. The tradeoff is that frequentist methods tend to be much more conservative as a result (requiring more data to come to the same conclusion).

Bayesian methods with uninformative (possibly improper) priors agree with frequentist methods whenever the latter make sense.
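One simple instance of this kind of agreement, offered as a sketch rather than a proof of the general claim: for coin flips, the posterior mode under a flat prior over the heads probability coincides with the frequentist maximum-likelihood estimate. The grid resolution, the data (7 heads in 20 flips), and the function names are assumptions made for the example.

```python
from fractions import Fraction

def likelihood(theta, heads, flips):
    """Probability of one particular sequence with `heads` heads in `flips` flips."""
    return theta ** heads * (1 - theta) ** (flips - heads)

def map_estimate_uniform_prior(heads, flips, grid_size=1000):
    """Posterior mode on a grid of candidate biases, under a flat prior."""
    grid = [Fraction(i, grid_size) for i in range(grid_size + 1)]
    prior = Fraction(1, len(grid))  # flat: the same weight on every grid point
    return max(grid, key=lambda t: prior * likelihood(t, heads, flips))

def maximum_likelihood_estimate(heads, flips):
    """The usual frequentist point estimate for a Bernoulli parameter."""
    return Fraction(heads, flips)

bayes_mode = map_estimate_uniform_prior(7, 20)
mle = maximum_likelihood_estimate(7, 20)
print(bayes_mode, mle)
```

Because the flat prior multiplies every candidate by the same constant, maximizing the posterior is the same as maximizing the likelihood. Agreement in harder settings, such as the adversarial ones raised in the replies, is a separate question.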

Comment author: jsteinhardt 27 November 2011 12:11:48AM 1 point [-]

Are you referring to the result that every non-dominated decision procedure is either a Bayesian procedure or a limit of Bayesian procedures? If so, one could imagine a frequentist procedure that is strictly dominated by other procedures, but where finding the dominating procedures is computationally infeasible. Alternately, a procedure could be non-dominated, and thus Bayesian for the right choice of prior, but the correct choice of prior could be difficult to find (the only proof I know of the "non-dominated => Bayesian" result is non-constructive).

Comment author: paulfchristiano 26 November 2011 05:36:42PM *  1 point [-]

Can you explain further? Casually, I consider results like compressed sensing and multiplicative weights to be examples of frequentist approaches (as do people working in these areas), which achieve their results in adversarial settings where no prior is available. I would be interested in seeing how Bayesian methods with improper priors recommend similar behavior.

Comment author: [deleted] 26 November 2011 06:27:57PM 0 points [-]

I admit I'm not familiar with either of those... Can you make a simple example of an “adversarial setting where no prior is available”?

Comment author: paulfchristiano 26 November 2011 07:18:31PM *  0 points [-]

I let you choose some linear functionals, and then tell you the value of each one on some unknown sparse vector (compressed sensing).

We play an iterated game with unknown payoffs; you observe your payoff in each round, but nothing more, and want to maximize total payoff (multiplicative weights).

Put even more simply, what is the Bayesian method that plays randomly in rock-paper-scissors against an unknown adversary? Minimax play seems like a canonical example of a frequentist method; if you have any fixed model of your adversary you might as well play deterministically (at least if you are doing consequentialist loss minimization).
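The rock-paper-scissors point can be made concrete with a short Python sketch. It compares the worst-case expected payoff of the uniform random strategy with that of a deterministic one; the payoff convention (+1 win, 0 draw, -1 loss) and all names here are assumptions for illustration.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """+1 if my move beats theirs, -1 if it loses, 0 on a draw."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def worst_case(strategy):
    """Lowest expected payoff of a mixed strategy over all pure replies.

    Checking pure replies is enough: an adversary's best mixed reply
    never does better than their best pure reply.
    """
    return min(
        sum(prob * payoff(move, reply) for move, prob in strategy.items())
        for reply in MOVES
    )


uniform = {move: Fraction(1, 3) for move in MOVES}
always_rock = {"rock": Fraction(1), "paper": Fraction(0), "scissors": Fraction(0)}

print("uniform worst case:", worst_case(uniform))
print("always-rock worst case:", worst_case(always_rock))
```

The uniform strategy guarantees an expected payoff of 0 against any adversary, while any deterministic strategy can be driven to -1 by an adversary who knows it. That guarantee is what the minimax framing buys without a model of the opponent.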

Comment author: Cyan 29 November 2011 05:31:52AM *  0 points [-]

The minimax estimator can be related to Bayesian estimation through the concept of a "least-favorable prior".

Comment author: Ron_Fern 26 November 2011 06:50:34AM 5 points [-]

Why can't a frequentist say: "Bayesians are conflating probability with subjective degree of belief." ? They were here first after all.

Probability does model frequency, and it does model subjective degree of belief, and this is not a contradiction. Using the copula is the problem, obviously: if subjective degree of belief is not frequency, and probability is frequency, then probability is not subjective degree of belief. Analogously, if subjective degree of belief is not frequency, and probability is subjective degree of belief, then probability is not frequency.

The problem is that they all conflate "probability" with something else: the Bayesian conflates probability with subjective degree of belief, and the frequentist conflates probability with frequency.

The frequentist/Bayesian dispute is of real import, because ad-hoc frequentist statistical methods often break down in extreme cases, throw away useful data, only work well with Gaussian sampling distributions etc.

The debate over whether to use Bayesian methods or frequentist methods is of import. I think potato was trying to say this here:

How we should actually model the situation as a probability distribution depends on our goal. But remember that Bayesianism is the stronger magic.

But the question of whether probability is frequency, or if probability is subjective degree of belief, is just as silly as a dispute over whether numbers are quantity, or if they are orders. The answer is that numbers model both, and are neither.

Comment author: [deleted] 26 November 2011 11:49:45AM *  1 point [-]

.

Comment author: potato 26 November 2011 01:00:57PM *  7 points [-]

Probability "models" frequency in the sense that sometimes frequency data dominates all of our other knowledge about some phenomenon.

No, probability models frequency in the sense that there is an interpretation of Kolmogorov's axioms which only mentions terms from the part of our language used to talk about frequency, and all of Kolmogorov's theorems come out as true statements about frequency under this interpretation.

I mean, literally, Bayes is an arithmetic of odds and fractions, of course it models frequency. At least as well as fractions and odds do. Probability is a frequency as often as it is a fraction or an odds.

So you are saying that as long as frequentists understand that Bayesian methods are theoretically ideal and cannot be improved upon, whereas frequentist methods may be useful approximations, they shouldn't run into any real life mistakes. This is nearly true, were it not for the fact that frequentists don't actually believe this.

They don't, but could and should.

If someone manages simultaneously to believe that frequentist philosophy (probability ≡ frequency) is sound, yet frequentist methods are fallible ad-hoc methods and Bayesian methods provide the best inferences possible given our state of knowledge, then he is performing quite a feat.

I agree, this is why instead of saying that probability is identical to frequency, or that it is frequency, we should say that it models frequency.

Comment author: [deleted] 26 November 2011 01:20:36PM *  -1 points [-]

.

Comment author: potato 26 November 2011 01:22:07PM *  6 points [-]

What is this comment supposed to add? Is it an ad hominem, or are you asking for clarification? If you don't understand that comment perhaps you should try rereading my original post, I have updated it a bit since you first commented, perhaps it is clearer.

(edit) clarification:

The reason that probabilities model frequency is not because our data about some phenomena are dominated by facts of frequency. If you take 10 chips, 6 of them red, 4 of them blue, with 5 red ones and 1 blue one on the table, and the rest not on the table, you'll find that Bayes can be used to talk about the frequencies of these predicates in the population. You only need to start with theorems that when interpreted produce the assumptions I just provided, e.g., P(red and on the table) = 1/2, P(~red and on the table) = 1/10, P(red and ~on the table) = 1/10. From those basic statements we can infer using Bayes all the following results: P(red|on the table) = 5/6, P(~red|on the table) = 1/6, P( (red and on the table) or blue) = 9/10, P(red) = P(red|on the table) P(on the table) + P(red|~on the table) P(~on the table) = 6/10, etc. These are all facts about the FREQUENCY distributions of these chips' predicates, which can be reached using Bayes, and the assumptions above. We can interpret P(red) as the frequency of red chips out of all the chips, and P(red|on the table) as the frequency of red chips out of chips on the table. You'll find that anything you prove about these frequencies using Bayesian inference will be a true claim about the frequencies of these predicates within the chips. Hence, Bayes models frequency. This is all I meant by "Bayes models frequency". You'll also find that it works just as well with volume or area. (I am sorry I wasn't that concrete to begin with.)
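As a quick check of the chip example, here is a small Python sketch that counts the frequencies directly from the chip description (6 red, 4 blue, 5 red and 1 blue on the table) and compares them with what the probability rules give. The list encoding and helper names are just one hypothetical way to set it up.

```python
from fractions import Fraction

# One (colour, on_table) pair per chip: 5 red on the table, 1 red off,
# 1 blue on the table, 3 blue off.
chips = (
    [("red", True)] * 5
    + [("red", False)] * 1
    + [("blue", True)] * 1
    + [("blue", False)] * 3
)


def freq(pred, given=lambda chip: True):
    """Frequency of chips satisfying `pred` among chips satisfying `given`."""
    pool = [chip for chip in chips if given(chip)]
    return Fraction(sum(1 for chip in pool if pred(chip)), len(pool))


def is_red(chip):
    return chip[0] == "red"


def on_table(chip):
    return chip[1]


def off_table(chip):
    return not chip[1]


# Conditional frequency obtained by counting within the sub-population.
p_red_given_table = freq(is_red, on_table)

# Law of total probability, with every term read as a frequency.
p_red_total = (
    freq(is_red, on_table) * freq(on_table)
    + freq(is_red, off_table) * freq(off_table)
)

# Inclusion-exclusion for "(red and on the table) or blue".
p_union = freq(lambda c: (is_red(c) and on_table(c)) or not is_red(c))

print(p_red_given_table, p_red_total, freq(is_red), p_union)
```

The value computed through the law of total probability matches the directly counted frequency of red chips, which is the sense of "models" used here.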

In the same exact way, you can interpret probability theorems as talking about degrees of belief, and if you ask a Bayesian, all those interpreted theorems will come out as true statements about rational degree of belief. In this way Bayes models rational belief. You can also interpret probability theory as talking about Boston's night life, but not every one of those interpreted theorems will be true, so probability theory does not model Boston's night life under that interpretation. To model something means to produce only true statements about that something under a given interpretation.

Frequentists may not treat their tool box as a set of mostly unrelated approximations to perfect learning, or treat Bayes as the optimal laws of inference, but they should as far as I can tell. And if they did, they would not cease to be frequentists: they would still use the same methods, use "probability" the same way, and still focus on long run frequency over evidential support. The only difference is that rather than saying probability is frequency and that probability is not subjective degree of belief, they would say that probability models both frequency and subjective degree of belief. Subjective Bayesians should make a similar update, though I am sure they don't swing the copula around as liberally as frequentists. This is what I meant when I said that frequentists could and should believe that frequentism is just a useful approximation, and that Bayes is in some sense optimal. I was never really arguing about the practical advantages of Bayesianism over frequentism, but about how they both seem to make a similar philosophical mistake in using identity or the copula when the relation of modeling is more applicable. A properly Hofstadterish formalism seems like the best way to deal with all of this comprehensively.

You understand what I was saying now? I really want to know. That you are confused by what seem to me to be my most basic claims, and that you are also as familiar with E. T. Jaynes as your comments suggest, is worrying to me. Does this clarification make you less confused?

Comment author: [deleted] 26 November 2011 02:49:34PM *  1 point [-]

.

Comment author: potato 26 November 2011 03:06:08PM *  5 points [-]

Fine, let's make up a new frequentism, which is probably already in existence: finite frequentism. Bayes still models finite frequencies, like the example I gave of the chips.

When a normal frequentist would say "as the number of trials goes to infinity" the finite frequentist can say "on average" or "the expectation of". Rather than saying that as the number of die rolls goes to infinity the fraction of sixes is 1/6, we can just say that as the number rises the fraction stabilizes around and gets closer to 1/6. That is a fact which is finitely verifiable. If we saw that the more die rolls we added to the average, the closer the fraction of sixes approached 1/2, and the closer it hovered around 1/2, the frequentist claim would be falsified.
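The "finitely verifiable" claim can be illustrated with a simulated fair die. This is a sketch only: it uses Python's pseudo-random generator as a stand-in for a physical die, and the checkpoint sizes and seed are arbitrary choices.

```python
import random


def running_fraction_of_sixes(num_rolls, checkpoints, seed=0):
    """Roll a simulated fair die and record the fraction of sixes so far
    at each requested checkpoint."""
    rng = random.Random(seed)
    sixes = 0
    fractions = {}
    for n in range(1, num_rolls + 1):
        if rng.randint(1, 6) == 6:
            sixes += 1
        if n in checkpoints:
            fractions[n] = sixes / n
    return fractions


checkpoints = {10, 100, 1_000, 10_000, 100_000}
for n, fraction in sorted(running_fraction_of_sixes(100_000, checkpoints).items()):
    print(f"{n:>7} rolls: fraction of sixes = {fraction:.4f}")
```

Every line of output is a finite frequency. Seeing the fraction settle near 1/2 instead of 1/6 as the number of rolls grew would count against the claim that the die is fair, with no appeal to infinitely many trials.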

There may be no infinite populations. But the frequentist can still make do with finite frequencies and expected frequencies, and I am not sure what he would lose. There are certainly finite frequencies in the world, and average frequencies are at least empirically testable. What can the frequentist do with infinite populations or trials that he/she can't do with expected/average frequencies?

Also, are you a finitist when it comes to calculus? Because the differential calculus requires much more commitment to the idea of a limit, infinity, and the infinitesimal than frequentists require, if frequentists require these concepts at all. Would you find a finitist interpretation of the calculus to be more philosophically sound than the classical approach?

Comment author: shokwave 26 November 2011 03:10:45PM 0 points [-]

potato,

I don't think there's much value in replying to Phlebas' latest reply.

Comment author: Vladimir_Nesov 25 November 2011 01:40:31AM 6 points [-]

It's "vs.", not "V.S."

Comment author: potato 28 November 2011 04:17:33PM -1 points [-]

Jeez, I just understood what you were trying to tell me. Whoops.

Comment author: thomblake 28 November 2011 10:04:42PM 2 points [-]

Based on reading the comment threads here, it seems as though some folks are missing something important in this post. So, I'll try to restate it differently in case that helps. Below, double-quotes are used to refer to the word, while single-quotes are used merely to highlight and separate tokens.

There is a mathematical formalism, which we can call "probability", which does a good job of modeling various sorts of things, like 'subjective degrees of belief', 'frequencies', 'volume', and 'area'.

'Bayesians' think that "probability" should refer to 'subjective degrees of belief'. 'Frequentists' think that "probability" should refer to 'frequencies'. One alleged problem with 'Frequentists' is that they seem to carelessly slip between 'subjective degrees of belief' and 'frequencies', due to the natural-language meaning of "probability".

One suggestion proposed in the comment threads was to simply use "probability" to refer to 'subjective degrees of belief' as the 'Bayesians' would have us do, since we already have the word "frequencies" to refer to 'frequencies'. I intuit that the OP's objection to this suggestion is that it leaves us with no word for the mathematical formalism called "probability" above, which can apply just as well to 'frequencies', 'subjective degrees of belief', and many other things.

Comment author: nshepperd 29 November 2011 01:28:15AM 1 point [-]

I intuit that the OP's objection to this suggestion is that it leaves us with no word for the mathematical formalism called "probability" above, which can apply just as well to 'frequencies', 'subjective degrees of belief', and many other things.

"Probability theory"

Comment author: Metus 25 November 2011 02:52:00AM 2 points [-]

Upvoted for resolving a mysterious question I resolved a few days ago myself.

Comment author: buybuydandavis 25 November 2011 07:00:50AM 1 point [-]

I think mathematical formalism is a limited route. Mathematics gives you formal structures that might be useful for something, but demonstrating just what they are useful for is the real trick. I've found that mathematicians often quite breezily assume their formalism applies to the world.

"Probability is defined as some formal structure." Yawn. Until you show me that said mathematics actually solves the problems I'm trying to solve with concepts of probability, I'm uninterested.

we learn because probability theory is isomorphic to those mechanisms.

Showing how it is isomorphic is the real task.

This is fundamentally where Jaynes had it better than most. He starts by trying to solve the problem of which beliefs to have confidence in and rely on. He's solving a real problem, instead of creating mathematical objects and just applying real world labels to elements of them after the fact.

The easiest way to understand the philosophical dispute between the frequentist and the subjective Bayesian is to look at the classic biased coin: ... I ask you. What is the difference between these two...

The difference between the two is that the subjective bayesian has a means to cope with the problem that at least most frequentists lack.

Comment author: potato 25 November 2011 07:54:03AM 5 points [-]

I mostly agree, finding why/how it is isomorphic is the important thing. But it is still isomorphic to more than one thing, frequency and subjective degree of belief included.

The two are still locked in a debate which is ultimately the result of interpreting one question in two different ways, and then answering the two separate questions as if they were exclusive answers to one question. Exactly as the two argue about sound being there in the absence of observers.

The bayesian would give the same answer as the frequentist if he interpreted the question as the frequentist. Same goes for the sound realist, for the same reasons.

Comment author: Matt_Simpson 27 November 2011 04:31:52AM *  2 points [-]

The two are still locked in a debate which is ultimately the result of interpreting one question in two different ways, and then answering the two separate questions as if they were exclusive answers to one question. Exactly as the two argue about sound being there in the absence of observers.

Not exactly. The real argument is about what should be used for inference (both scientific and otherwise). The debate about "what probability actually is" is just another case of debating semantics as a proxy for debating what's actually at stake.

Quick edit: and your post helps make this clear.

Comment author: potato 27 November 2011 04:55:17AM *  1 point [-]

Yes, I think we agree. Except that I don't think that the fact that there is meaningful argument to be had about Bayesian inference vs. frequentist inference means that the debate has not been centered around arguing about what probability is, which is a mistake; the same class of mistake as the mistake being made by the realists and idealists arguing over sound. The Bayesian and the frequentist have proposed ways to settle their debate. And there are observations which act as evidence for Bayesian inference, or frequentist inference. But exactly what experience should I expect if I think "probability is frequency" as opposed to if I think "probability is subjective degree of belief"? Arguing about which inferences are optimal is perfectly reasonable, but arguing about what thing probability really is, is silly.

Comment author: Matt_Simpson 27 November 2011 07:07:31PM 0 points [-]

Yes, I think we agree. Except that I don't think that the fact that there is meaningful argument to be had about Bayesian inference vs. frequentist inference means that the debate has not been centered around arguing about what probability is, which is a mistake...

Did I give the impression that I thought the argument about what probability is wasn't a mistake?

Comment author: potato 27 November 2011 08:45:59PM 0 points [-]

I wasn't sure.

Comment author: Matt_Simpson 27 November 2011 10:14:03PM 0 points [-]

Oh, well in rereading my comment I could see why it was ambiguous. Yeah, I think we agree.

Comment author: buybuydandavis 25 November 2011 08:29:36AM *  1 point [-]

How isomorphic it is remains to be seen. The infinite set digressions have not been particularly helpful to real problems.

The objective Bayesian is free to estimate frequencies, and has done so, a la Jaynes. He explicitly identifies that the two are answering different questions, and answers both.

I'm not aware of anyone doing this, but I think a frequentist could just as well interpret subjective degrees of belief in frequentist terms, but the sample space would be in informational terms, looking for transformation groups in states of knowledge.

"Probability" is a word used in the interpretation of probability theory.

Sometimes. I think if we're trying to keep terms straight, you should separate probability_SubjectiveBayes, probability_Math, probability_Frequentist, and probability_HumanLanguage. You seem to conflate probability_Math and probability_HumanLanguage.

Comment author: lessdazed 25 November 2011 09:05:55AM *  4 points [-]

probabilitySubjectiveBayes, probabilityMath, probabilityFrequentist, and probabilityHumanLanguage. You seem to conflate probabilityMath and probabilityHumanLanguage.

probability\_SubjectiveBayes, probability\_Math, probability\_Frequentist, and probability\_HumanLanguage. You seem to conflate probability\_Math and probability\_HumanLanguage.

Comment author: buybuydandavis 25 November 2011 08:06:42PM 1 point [-]

Corrected. Thanks.

Comment author: CarlShulman 25 November 2011 01:13:10AM *  1 point [-]

Some sentences in the text have typos where "frequentest" replaces "frequentist."

Comment author: potato 25 November 2011 01:26:59AM *  0 points [-]

working on it

Fixed?

Comment author: shokwave 25 November 2011 02:29:43AM 0 points [-]

There are certain appeals to frequentest philosophy

Comment author: RolfAndreassen 25 November 2011 02:34:27AM *  -1 points [-]

Formalism, which appears in the title, is not mentioned anywhere in the post, as far as I can see.

Apart from that, I have to ask, what is the purpose of this post? What is the point you are making? It looks like a rehash of Sequence elements, put together somewhat at random. It's almost as though someone had coded a Markov-chain bot to imitate a typical LessWrong discussion, except that such a bot would be unlikely to mis-spell 'frequentist'.

Comment author: Metus 25 November 2011 02:50:08AM 4 points [-]

The author is resolving a mysterious question ("Is probability frequentist or bayesian?") quite nicely. Maybe it is covered in that painfully long monster "Sequences" but it is surely useful to a novice.

Comment author: Oscar_Cunningham 25 November 2011 01:06:50PM *  1 point [-]

What does "copula" mean?

EDIT: I sort-of get it from reading wikipedia, but I still don't really see what it means in the context of this post.

Comment author: komponisto 25 November 2011 02:08:07PM 3 points [-]

The "is" in "probability is degree of belief".

Comment author: DSimon 27 November 2011 03:29:41PM 3 points [-]

What does "copula" mean?

Well, when two probability theories love each other very much...

Comment author: fubarobfusco 25 November 2011 07:26:55PM -1 points [-]

Formalists do not commit themselves to probabilities, just as they do not commit themselves to numbers

And yet, if I set one apple next to one apple, there are two apples. Arithmetic predicts facts about the world with such reliability that it is perfectly reasonable to say that sentences about numbers have real-world truth values, regardless of whether numbers "exist". We come up with arithmetic because it enables us to make sense of the world, because the world actually does behave that way.

Comment author: gwern 27 November 2011 09:22:54PM *  5 points [-]

And yet, if I set one apple next to one apple, there are two apples.

And if I pour one bucket of water into another, do I now have two buckets?

(Yes, there's something being conserved in this example, but is it 'number of buckets'/'number of apples'?)

Comment author: wedrifid 28 November 2011 05:36:29AM *  0 points [-]

And if I pour one bucket of water into another, do I now have two buckets?

Yes? One empty bucket, one full bucket and a bunch of water that overflowed and went on the floor.

Comment author: gwern 28 November 2011 04:45:34PM -3 points [-]

Is that so? I never said the buckets averaged more than half full.

Comment author: wedrifid 28 November 2011 04:58:25PM *  -3 points [-]

Is that so? I never said the buckets averaged more than half full.

Yes. You definitely have two buckets.

If there is some parlance in which you can refer to "a bucket of water" and mean "less than half a bucket of water" without just being misleading to the point of falsehood, then one of those buckets will be an empty bucket and the other bucket has an unspecified amount of water in it. Unless, I suppose, you also failed to mention that after you emptied one bucket into the other bucket you replaced the water with a lemur. Then you would have two buckets, one with water in it and another with a lemur.

Regardless, if there is some argument by analogy to be made which requires something which when added to another thing produces just one thing, it is best to focus on the water, not the buckets. 'Buckets' is going to refer to either the actual object or be an ad-hoc unit for water measurement - exactly the thing that the analogy wants to avoid!

Comment author: gwern 28 November 2011 05:00:59PM -2 points [-]

If I am a pessimist and have a glass that is only half-full (or knowing this sorry world, less than that), and I throw it in your face, are you 'being misleading to the point of falsehood' when you yell that I threw a glass of water at you and then punch me?

Or would you say instead 'You threw a glass of uncertain description which may or may not have a lemur in it at me!'?

Comment author: wedrifid 29 November 2011 06:10:35AM -2 points [-]

If I am a pessimist and have a glass that is only half-full (or knowing this sorry world, less than that), and I throw it in your face, are you 'being misleading to the point of falsehood' when you yell that I threw a glass of water at you and then punch me?

I now have one glass.

Comment author: potato 25 November 2011 08:50:38PM 2 points [-]

But it takes a machine besides the universe to count apples. Namely, humans. Arithmetic is Turing complete, as is probability theory, so we should not be confused when we notice that it can practically talk about everything under the sun, including things out there in being.

Comment author: damang 26 November 2011 06:52:57AM 0 points [-]

All in all a decent post I thought. Why can't I see the score?

Comment author: Manfred 28 November 2011 04:35:06AM 2 points [-]

A score of "dot" means that the post is quite new, so it isn't showing the score yet.

Comment author: Manfred 25 November 2011 02:34:51PM 0 points [-]

They are both giving the right answers to these respective questions. But they are failing to Replace the Symbol with the Substance in their ordinary speech. In the exact same way, "probability" is being used to stand for both "frequency of occurrence" and "rational degree of belief" in the dispute between the Bayesian and the frequentist.

This is inaccurate. If frequentists stuck exactly to that definition, they could never get an answer from the real world, because we never have the infinite number of experiments required to get the limiting frequency. Definitions should be useful.

Comment author: potato 25 November 2011 08:47:27PM *  0 points [-]

The answer the frequentist is giving is that we only have enough information to know that it is not 50%. Which is correct.

edit: not 50% on average.

Comment author: Manfred 26 November 2011 09:39:04PM *  3 points [-]

The answer the frequentist is giving is that we only have enough information to know that it is not 50%. Which is correct.

Nope. We also know that it's a coin with two sides. If it was a 4-sided die that was guaranteed not to have a 50% chance of '1', the situation would be quite different, don't you think? The problem has enough information to be solved.

If our straw frequentist wants a frequency of 1/2, they should consider "if infinite independent people give me this problem and then flip their coin, with what frequency will these trials give heads?"

Comment author: potato 26 November 2011 11:09:52PM *  0 points [-]

Couldn't we just say that 1 was heads and ~1 was tails? Then it would be the same, right?

The problem has enough information to be solved.

Really? Could you explain further, what do I know besides that if infinite independent people give me this problem and then flip the coin, it will not land heads 50% of the time? Knowing that it is two sided doesn't change anything as far as I can tell. And what problem are you solving exactly? Just to make sure we're on the same page.

I always thought of the coin bayes frequentist thing as being a dispute over definitions. It didn't seem like you really derive P(heads) ≠ 1/2 as a frequentist; it is kind of already in the premises of the situation given your interpretation of "probability", and your model of bias. On the other hand, the Bayesian, using his/her interpretation of "probability", makes a one step inference using the principle of indifference that P(heads) = 1/2.

Neither of these is a deep theorem of the respective statistical disciplines; their proofs are trivial in both traditions. They are the statistical consequences of interpreting probability in different ways, i.e., modeling different things with probability to deal with uncertainty. I didn't think of this difference as showing some deep difference between the Bayesian and the frequentist; the frequentist and the Bayesian are different in terms of their most basic surface apparatus; they use their tool (probability) to model different things (frequency and degree of belief), which then gets them to two statistical methods, but still only one probability theory.

Comment author: Manfred 27 November 2011 01:53:57AM *  1 point [-]

Could you explain further, what do I know besides that if infinite independent people give me this problem and then flip the coin, it will not land heads 50% of the time?

My wording was unclear - I've edited to fix. If people flip the same coin over and over, it won't land heads 50% of the time. "Independent" means different coins. The key idea is to imagine, rather than flipping the same coin over and over, being in the same state of information over and over.
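The contrast between "the same coin over and over" and "the same state of information over and over" can be sketched in Python. The assumption doing the work, and it is only an assumption for illustration, is that each coin is known to be biased but the direction of the bias is equally likely to favour heads or tails; the specific biases 0.3 and 0.7 and the function names are made up.

```python
import random


def same_coin_frequency(bias, flips, seed=0):
    """Flip ONE biased coin many times; the frequency tracks its bias."""
    rng = random.Random(seed)
    return sum(rng.random() < bias for _ in range(flips)) / flips


def same_information_frequency(trials, seed=0):
    """Repeat the state of information: each trial uses a DIFFERENT coin
    that is known to be biased, with the direction unknown."""
    rng = random.Random(seed)
    heads = 0
    for _ in range(trials):
        bias = rng.choice((0.3, 0.7))  # never fair; direction is symmetric
        if rng.random() < bias:
            heads += 1
    return heads / trials


print("one coin, bias 0.7:", same_coin_frequency(0.7, 100_000))
print("ensemble of biased coins:", same_information_frequency(100_000))
```

No individual coin lands heads half the time, yet the frequency across the ensemble comes out close to 1/2, which is the frequency reading of the Bayesian's answer.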

I always thought of the coin bayes frequentist thing as being a dispute over definitions.

It is, in a sense. But if you want to make any decisions about the coin (say that "coin" is whether or not the next solar flare will knock out your satellite), and frequentist and bayesian estimators disagree, which should you use? If you have certain desiderata about your decisions (e.g. you won't take bets that are guaranteed to lose), this is a math problem with a right answer.
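One standard way to cash out "bets that are guaranteed to lose" is a Dutch book. The sketch below assumes an agent who will buy, at a price equal to their stated probability, a ticket paying 1 if the event happens; the numbers and the function name are hypothetical.

```python
from fractions import Fraction


def net_gain_per_outcome(price_a, price_not_a, stake=1):
    """Agent buys a ticket on A and a ticket on not-A, each paying `stake`
    if it wins, at prices equal to the agent's stated probabilities.

    Exactly one ticket pays out, so the net gain is the same either way.
    """
    cost = (price_a + price_not_a) * stake
    return {"A happens": stake - cost, "A does not happen": stake - cost}


# Incoherent beliefs: P(A) = 3/5 and P(not A) = 3/5 sum to more than 1.
print(net_gain_per_outcome(Fraction(3, 5), Fraction(3, 5)))

# Coherent beliefs: P(A) = 3/5 and P(not A) = 2/5 sum to exactly 1.
print(net_gain_per_outcome(Fraction(3, 5), Fraction(2, 5)))
```

With the incoherent prices the agent loses 1/5 whichever way the event turns out; with coherent prices this particular pair of bets breaks even. Avoiding sure losses of this kind is the sort of desideratum that turns the choice of estimator into a math problem.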

And then of course the question is, if this "frequentist probability" stuff is almost always the same as this "bayesian probability" stuff, and when it's different you shouldn't ever base decisions on it, why keep it as an alternate definition? Words should be useful.

Comment author: potato 27 November 2011 02:36:58AM 0 points [-]

Well, let's not keep frequentism as a statistical method, because Bayes almost always, if not always, does better. But it is a theoretically interesting fact, and a fact nonetheless, that Kolmogorov's theory models finite frequencies and our intuitions about infinite frequencies.

Understanding exactly what degrees of belief are becomes a lot easier (I suppose) if you know that for some reason they are isomorphic to frequencies, and that for some reason they are also isomorphic to spatial measures. This not only allows us to solve problems of degree of belief by solving problems of frequency and area; it also means that if we understand what degree of belief has in common with frequency and area, then we understand what it has in common with Bayes.

Comment author: Manfred 28 November 2011 05:02:16AM 2 points [-]

Understanding exactly what degrees of belief are, becomes a lot easier (I suppose) if you know that for some reason they are isomorphic to frequencies

If probabilities were systematically wrong about the frequency of success in independent trials, there would be some other method of reasoning from incomplete information that was better than probabilistic logic. But since the real world obeys all the requirements for probabilistic logic (basically, causality works), there is no such method, and so frequencies match probabilities.

for some reason they are also isomorphic to spatial measures

Read an introductory chapter on set theory that uses pictures to represent sets, and you will understand why.

It's certainly an interesting fact that these things behave the same. But it's not an unsolved problem. We don't have to keep a definition around that's useless in the real world because of any lurking mystery.

Comment author: [deleted] 25 November 2011 09:08:54PM 2 points [-]

That's untrue - a biased coin might well still happen to produce 50% heads and 50% tails given a certain finite number of trials.

Manfred's point is that the frequentist is not using "probability" to stand for "frequency of occurrence", but to stand for "imaginary frequency of occurrence in an infinite number of trials" - otherwise the frequentist position would be blatantly false for the reason that I pointed out.

Comment author: potato 26 November 2011 09:19:19PM *  1 point [-]

Ok, now I understand what you are saying.

I wrote my update here

Ok, so the frequentist is giving the right answer given the question he is being asked about hypothetical infinite frequencies.

Comment author: potato 26 November 2011 03:53:43PM *  -1 points [-]

How does this do in light of this comment ?