Phlebas comments on (Subjective Bayesianism vs. Frequentism) VS. Formalism - Less Wrong

27 Post author: potato 26 November 2011 05:05AM

Comments (106)

Comment author: [deleted] 25 November 2011 02:54:41PM *  7 points [-]

.

Comment author: potato 27 November 2011 05:47:23AM *  10 points [-]

In other words, the OP has mixed up the quotation and the referent (or the representation and the referent).

It seems to me that I am the one proposing a sharp distinction between probability theory (the representation) and rational degree of belief (the referent). If you say that probability is degree of belief, you destroy the distinction between the model and the modeled. If by "probability" you mean subjective degree of belief, I don't really care what you call it. But know that "probability" has been used in ways which are not consistent with that synonymy claim. Because we do not have 100% belief that Bayes models ideal inference under uncertainty, Bayesian probability is not identical to subjective belief given our knowledge. If X is identical to Y, then X is isomorphic-to/models Y. Because we can still conceive of Bayes not perfectly modeling rationality without implying a contradiction, our current state of knowledge does not include that Bayes is identical to subjective degree of belief.

We learn that something is probability by looking at probability theory, not by looking at subjective belief. If rational subjective belief turned out to not be modeled by probability theory, then we would say that subjective degree of belief was not like probability, not that probability theory does not define probability.

The first person to make Bayes may have been thinking about rationality when he/she first created the system, or about spatial measurements, or about finite frequencies, and he/she would have made the same formal system in every case. The interpretations would have been different, but they would all be the one identical probability theory. Which one the actual creator was thinking of is irrelevant. What spaces, beliefs, and finite frequencies all have in common is that they are modeled by probability theory. To use "probability" to refer to one of these over another is a completely arbitrary choice (mind you, I said finite frequency).

If we lose nothing by using "models" instead of "is", why would we ever use "is"? "Is" is a much stronger claim than "models". And frankly, I know how to check whether or not a given argument is an animal, for instance; how do I check if a given argument is a probability? I see if it satisfies the probability axioms. Finite frequency, measure, and rational degree of belief all seem to follow the probability axioms and inferences under specific, though similar, interpretations of probability theory.

Comment author: jsteinhardt 25 November 2011 04:20:44PM 5 points [-]

The frequentist/Bayesian dispute is of real import, because ad-hoc frequentist statistical methods often break down in extreme cases, throw away useful data, only work well with Gaussian sampling distributions etc.

I think you have this backwards. Frequentist techniques typically come with adversarial guarantees (i.e., "as long as the underlying distribution has bounded variance, this method will work"), whereas Bayesian techniques, by choosing a specific prior (such as a Gaussian prior), are making an assumption that will hurt them in extreme cases or when the data is not drawn from the prior. The tradeoff is that frequentist methods tend to be much more conservative as a result (requiring more data to come to the same conclusion).

If you have a reasonable Bayesian generative model, then using it will probably give you better results with less data. But if you really can't even build the model (i.e. specify a prior that you trust) then frequentist techniques might actually be appropriate. Note that the distinction I'm drawing is between Bayesian and frequentist techniques, as opposed to Bayesian and frequentist interpretations of probability. In the former case, there are actual reasons to use both. In the latter case, I agree with you that the Bayesian interpretation is obviously correct.

Comment author: [deleted] 26 November 2011 04:42:24PM *  4 points [-]

Bayesian techniques, by choosing a specific prior (such as a Gaussian prior), are making an assumption that will hurt them in extreme cases or when the data is not drawn from the prior. The tradeoff is that frequentist methods tend to be much more conservative as a result (requiring more data to come to the same conclusion).

Bayesian methods with uninformative (possibly improper) priors agree with frequentist methods whenever the latter make sense.

Comment author: jsteinhardt 27 November 2011 12:11:48AM 1 point [-]

Are you referring to the result that every non-dominated decision procedure is either a Bayesian procedure or a limit of Bayesian procedures? If so, one could imagine a frequentist procedure that is strictly dominated by other procedures, but where finding the dominating procedures is computationally infeasible. Alternately, a procedure could be non-dominated, and thus Bayesian for the right choice of prior, but the correct choice of prior could be difficult to find (the only proof I know of the "non-dominated => Bayesian" result is non-constructive).

Comment author: paulfchristiano 26 November 2011 05:36:42PM *  1 point [-]

Can you explain further? Casually, I consider results like compressed sensing and multiplicative weights to be examples of frequentist approaches (as do people working in these areas), which achieve their results in adversarial settings where no prior is available. I would be interested in seeing how Bayesian methods with improper priors recommend similar behavior.

Comment author: [deleted] 26 November 2011 06:27:57PM 0 points [-]

I admit I'm not familiar with either of those... Can you make a simple example of an “adversarial setting where no prior is available”?

Comment author: paulfchristiano 26 November 2011 07:18:31PM *  0 points [-]

I let you choose some linear functionals, and then tell you the value of each one on some unknown sparse vector (compressed sensing).

We play an iterated game with unknown payoffs; you observe your payoff in each round, but nothing more, and want to maximize total payoff (multiplicative weights).

Put even more simply, what is the Bayesian method that plays randomly in rock-paper-scissors against an unknown adversary? Minimax play seems like a canonical example of a frequentist method; if you have any fixed model of your adversary you might as well play deterministically (at least if you are doing consequentialist loss minimization).
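To make the rock-paper-scissors point concrete, here is a minimal sketch (my own illustration, not from any library; the helper names `payoff` and `worst_case` are made up). It computes the worst-case expected payoff of a mixed strategy against every pure reply the adversary could make, assuming the usual +1/0/-1 scoring.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie (assumed scoring)."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def worst_case(strategy):
    """Lowest expected payoff of a mixed strategy over every pure reply."""
    return min(
        sum(p * payoff(mine, theirs) for mine, p in strategy.items())
        for theirs in MOVES
    )

uniform = {m: Fraction(1, 3) for m in MOVES}
always_rock = {"rock": Fraction(1), "paper": Fraction(0), "scissors": Fraction(0)}

print(worst_case(uniform))      # 0
print(worst_case(always_rock))  # -1
```

The uniform strategy guarantees an expected payoff of 0 no matter what the adversary does, while any deterministic strategy can be exploited for -1. That guarantee holds without any model of the adversary.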

Comment author: Cyan 29 November 2011 05:31:52AM *  0 points [-]

The minimax estimator can be related to Bayesian estimation through the concept of a "least-favorable prior".
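As a rough illustration of the "least-favorable prior" idea, here is a sketch in the rock-paper-scissors game rather than in an estimation problem (that substitution is my assumption, as are the helper names and the grid resolution). For each prior over the opponent's moves it computes the expected payoff of the best deterministic reply, then searches a grid for the prior that makes that payoff smallest.

```python
from fractions import Fraction
from itertools import product

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie (assumed scoring)."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def bayes_value(prior):
    """Expected payoff of the best deterministic reply to a prior
    (q_rock, q_paper, q_scissors) over the opponent's moves."""
    return max(
        sum(q * payoff(mine, theirs) for theirs, q in zip(MOVES, prior))
        for mine in MOVES
    )

# Grid of priors in steps of 1/12 (hypothetical resolution; 12 is
# divisible by 3, so the uniform prior is on the grid).
N = 12
grid = [
    (Fraction(i, N), Fraction(j, N), Fraction(N - i - j, N))
    for i, j in product(range(N + 1), repeat=2)
    if i + j <= N
]

least_favorable = min(grid, key=bayes_value)
print([str(q) for q in least_favorable], bayes_value(least_favorable))
```

On this grid the least-favorable prior is the uniform one, and the Bayes-optimal payoff against it is 0, which matches the value that uniform random play guarantees.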

Comment author: Ron_Fern 26 November 2011 06:50:34AM 5 points [-]

Why can't a frequentist say: "Bayesians are conflating probability with subjective degree of belief." ? They were here first after all.

Probability does model frequency, and it does model subjective degree of belief, and this is not a contradiction. Using the copula is the problem, obviously: if subjective degree of belief is not frequency, and probability is frequency, then probability is not subjective degree of belief. Analogously, if subjective degree of belief is not frequency, and probability is subjective degree of belief, then probability is not frequency.

The problem is that both sides conflate "probability" with something it merely models: the Bayesian conflates probability with subjective degree of belief, and the frequentist conflates probability with frequency.

The frequentist/Bayesian dispute is of real import, because ad-hoc frequentist statistical methods often break down in extreme cases, throw away useful data, only work well with Gaussian sampling distributions etc.

The debate over whether to use Bayesian methods or frequentist methods is of import. I think potato was trying to say this here:

How we should actually model the situation as a probability distribution depends on our goal. But remember that Bayesianism is the stronger magic.

But the question of whether probability is frequency, or if probability is subjective degree of belief, is just as silly as a dispute over whether numbers are quantity, or if they are orders. The answer is that numbers model both, and are neither.

Comment author: [deleted] 26 November 2011 11:49:45AM *  1 point [-]

.

Comment author: potato 26 November 2011 01:00:57PM *  7 points [-]

Probability "models" frequency in the sense that sometimes frequency data dominates all of our other knowledge about some phenomenon.

No, probability models frequency in the sense that there is an interpretation of the Kolmogorov axioms which only mentions terms from the part of our language used to talk about frequency, and all Kolmogorov theorems come out as true statements about frequency under this interpretation.

I mean, literally, Bayes is an arithmetic of odds and fractions, of course it models frequency. At least as well as fractions and odds do. Probability is a frequency as often as it is a fraction or an odds.

So you are saying that as long as frequentists understand that Bayesian methods are theoretically ideal and cannot be improved upon, whereas frequentist methods may be useful approximations, they shouldn't run into any real life mistakes. This is nearly true, were it not for the fact that frequentists don't actually believe this.

They don't, but could and should.

If someone manages simultaneously to believe that frequentist philosophy (probability ≡ frequency) is sound, yet frequentist methods are fallible ad-hoc methods and Bayesian methods provide the best inferences possible given our state of knowledge, then he is performing quite a feat.

I agree, this is why instead of saying that probability is identical to frequency, or that it is frequency, we should say that it models frequency.

Comment author: [deleted] 26 November 2011 01:20:36PM *  -1 points [-]

.

Comment author: potato 26 November 2011 01:22:07PM *  6 points [-]

What is this comment supposed to add? Is it an ad hominem, or are you asking for clarification? If you don't understand that comment, perhaps you should try rereading my original post. I have updated it a bit since you first commented, so perhaps it is clearer.

(edit) clarification:

The reason that probabilities model frequency is not that our data about some phenomena are dominated by facts of frequency. If you take 10 chips, 6 of them red, 4 of them blue, with 5 red ones and 1 blue one on the table, and the rest not on the table, you'll find that Bayes can be used to talk about the frequencies of these predicates in the population. You only need to start with theorems that, when interpreted, produce the assumptions I just provided, e.g., P(red and on the table) = 1/2, P(~red and on the table) = 1/10, P(red and ~on the table) = 1/10. From those basic statements we can infer using Bayes all the following results: P(red|on the table) = 5/6, P(~red|on the table) = 1/6, P( (red and on the table) or blue) = 9/10, P(red) = P(red|on the table) P(on the table) + P(red|~on the table) P(~on the table) = 6/10, etc. These are all facts about the FREQUENCY distributions of these chips' predicates, which can be reached using Bayes and the assumptions above. We can interpret P(red) as the frequency of red chips out of all the chips, and P(red|on the table) as the frequency of red chips out of chips on the table. You'll find that anything you prove about these frequencies using Bayesian inference will be a true claim about the frequencies of these predicates within the chips. Hence, Bayes models frequency. This is all I meant by "Bayes models frequency". You'll also find that it works just as well with volume or area. (I am sorry I wasn't that concrete to begin with.)
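The chip example can be checked by brute counting. Here is a minimal sketch (the `freq` helper and the predicate names are mine, purely for illustration). The split of the off-table chips, 1 red and 3 blue, is what follows from the stated counts: 6 red in total with 5 of them on the table, and 4 blue in total with 1 of them on the table.

```python
from fractions import Fraction

# (colour, on_table) for each of the 10 chips
chips = (
    [("red", True)] * 5 + [("blue", True)] * 1
    + [("red", False)] * 1 + [("blue", False)] * 3
)

def freq(pred, given=lambda chip: True):
    """Relative frequency of chips satisfying `pred` among chips satisfying `given`."""
    pool = [c for c in chips if given(c)]
    return Fraction(sum(1 for c in pool if pred(c)), len(pool))

red = lambda c: c[0] == "red"
on_table = lambda c: c[1]
off_table = lambda c: not c[1]

# Conditional frequencies, read as P(red | on the table) and so on
print(freq(red, on_table))                                     # 5/6
print(freq(lambda c: (red(c) and on_table(c)) or not red(c)))  # 9/10

# Law of total probability, evaluated purely from counted frequencies
total = (freq(red, on_table) * freq(on_table)
         + freq(red, off_table) * freq(off_table))
print(total, freq(red))                                        # 3/5 3/5
```

Every number here comes from counting chips, yet the identities that relate them are exactly the ones probability theory proves.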

In the same exact way, you can interpret probability theorems as talking about degrees of belief, and if you ask a Bayesian, all those interpreted theorems will come out as true statements about rational degree of belief. In this way Bayes models rational belief. You can also interpret probability theory as talking about Boston's night life, but not every one of those interpreted theorems will be true, so probability theory does not model Boston's night life under that interpretation. To model something means to produce only true statements about that something under a given interpretation.

Frequentists may not treat their tool box as a set of mostly unrelated approximations to perfect learning, or treat Bayes as the optimal laws of inference, but they should as far as I can tell. And if they did, they would not cease to be frequentists: they would still use the same methods, use "probability" the same way, and still focus on long run frequency over evidential support. The only difference is that rather than saying probability is frequency and that probability is not subjective degree of belief, they would say that probability models both frequency and subjective degree of belief. Subjective Bayesians should make a similar update, though I am sure they don't swing the copula around as liberally as frequentists. This is what I meant when I said that frequentists could and should believe that frequentism is just a useful approximation, and that Bayes is in some sense optimal. I was never really arguing about the practical advantages of Bayesianism over frequentism, but about how they both seem to make a similar philosophical mistake in using identity or the copula when the relation of modeling is more applicable. A properly Hofstadterish formalism seems like the best way to deal with all of this comprehensively.

Do you understand what I was saying now? I really want to know. That you are confused by what seem to me to be my most basic claims, and that you are also as familiar with E. T. Jaynes as your comments suggest, is worrying to me. Does this clarification make you less confused?

Comment author: [deleted] 26 November 2011 02:49:34PM *  1 point [-]

.

Comment author: potato 26 November 2011 03:06:08PM *  5 points [-]

Fine, let's make up a new frequentism, which is probably already in existence: finite frequentism. Bayes still models finite frequencies, like the example I gave of the chips.

When a normal frequentist would say "as the number of trials goes to infinity", the finite frequentist can say "on average" or "the expectation of". Rather than saying that as the number of die rolls goes to infinity the fraction of sixes is 1/6, we can just say that as the number rises the fraction stabilizes around, and gets closer to, 1/6. That is a fact which is finitely verifiable. If we saw that the more die rolls we added to the average, the closer the fraction of sixes approached 1/2, and the closer it hovered around 1/2, the frequentist claim would be falsified.
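The "finitely verifiable" claim can be tried out with a simulation. Here is a minimal sketch, assuming a fair die simulated with Python's `random` module (the function name, the seed, and the number of rolls are arbitrary choices of mine). It tracks the running fraction of sixes after each roll.

```python
import random

def running_fraction_of_sixes(n_rolls, seed=0):
    """Return the fraction of sixes observed after each of n_rolls rolls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sixes = 0
    fractions = []
    for i in range(1, n_rolls + 1):
        if rng.randint(1, 6) == 6:
            sixes += 1
        fractions.append(sixes / i)
    return fractions

fracs = running_fraction_of_sixes(60_000)
for n in (10, 100, 1_000, 10_000, 60_000):
    print(n, round(fracs[n - 1], 4))
```

With a fair die the printed fractions should settle near 1/6 as n grows. If they settled near 1/2 instead, that would count against the claim, and no infinite sequence of rolls is needed to see it.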

There may be no infinite populations. But the frequentist can still make do with finite frequencies and expected frequencies, and I am not sure what he would lose. There are certainly finite frequencies in the world, and average frequencies are at least empirically testable. What can the frequentist do with infinite populations or trials that he/she can't do with expected/average frequencies?

Also, are you a finitist when it comes to calculus? Because the differential calculus requires much more commitment to the idea of a limit, infinity, and the infinitesimal than frequentists require, if frequentists require these concepts at all. Would you find a finitist interpretation of the calculus to be more philosophically sound than the classical approach?

Comment author: shokwave 26 November 2011 03:10:45PM 0 points [-]

potato,

I don't think there's much value in replying to Phlebas' latest reply.