Phlebas comments on (Subjective Bayesianism vs. Frequentism) VS. Formalism - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Calling them interpretations seems to imply that at most one of them can be correct. "Displacement of a falling object on Earth" and "kinetic energy of a 9.8 kg object" aren't competing interpretations of the equation f(x) = 4.9x^2; they're just two different things the equation applies to. If the frequentists are making any error, it's denying that beliefs must be updated according to the Kolmogorov axioms, not asserting that frequencies can also be treated with the same laws. It's denying the former that might lead them to apply incorrect methods of inference, which is the only problem that really matters.
I think the interpretation of probability and what methods to use for inference are two separate debates. There was a really good discussion post on this a while back.
I'm also curious as to who exactly these frequentists are that you are arguing against. Perhaps I am spoiled by hanging out with people who regularly have to solve statistical problems, and therefore need to have a reasonable conception of statistics, but most frequentist sentiments that I encounter are fairly well-reasoned, sometimes even pointing out legitimate issues with Bayesian statistics. It is true that I sometimes get incorrect claims that I have to correct, but I don't think becoming a Bayesian magically protects you from this.
EDIT: To clarify, the "frequentist sentiments" I referred to did not explicitly distinguish between interpretations of probability and inference algorithms, but as the goal was engineering I think the arguments were all implicitly pragmatic.
I'm going by what I've read of Jaynes, Yudkowsky, and books by a couple of other writers on Bayesian statistics.
I don't believe there are any legitimate issues with Bayesian statistics, because Bayes's rule is derived from basic desiderata of rationality which I find entirely convincing, and it seems to me that the maximum entropy principle is the best computable approximation to Solomonoff induction (although I'd appreciate other opinions on that).
There may be legitimate issues with people failing to apply the simple mathematical laws of probability theory correctly, because the correct application can get very complicated - but that is not an issue with Bayesian statistics per se. I'm sure that in many cases, the wisest thing to do might be to use frequentist methods, but being a Bayesian does not prohibit someone from applying frequentist methods when they are a convenient approximation.
The two issues that come to mind are the difficulty of specifying priors and the computational infeasibility of performing Bayesian updates.
I don't think anyone can reasonably dispute that if the correct prior is handed to you, together with a black box for applying Bayes' rule, then you should perform Bayesian updates based on your data to get a posterior distribution. That is simply a mathematical theorem (Bayes' theorem). And yes, it is also a theorem (Cox's theorem) that any rational agent is implicitly using a prior. But we aren't yet in a position to create a perfectly rational agent, and until we are, worrying about the specific form of consistency that is invoked for Cox's theorem seems silly.
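To make the uncontroversial part concrete, here is a minimal sketch of such an update, with a hypothetical two-hypothesis prior over a coin's bias (the 1/2 and 3/4 biases are assumed purely for illustration):

```python
from fractions import Fraction

# Prior over two hypotheses about a coin, and the likelihood each
# hypothesis assigns to observing heads (illustrative numbers).
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
likelihood_heads = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}

# Bayes' rule: posterior(H) = prior(H) * P(data | H) / P(data)
evidence = sum(prior[h] * likelihood_heads[h] for h in prior)
posterior = {h: prior[h] * likelihood_heads[h] / evidence for h in prior}
# After one observed heads: posterior["fair"] = 2/5, posterior["biased"] = 3/5
```

Given the prior and the likelihoods, this step is pure theorem; all the controversy lives in where `prior` comes from.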
It's possible that we don't really disagree. As a purely abstract statement about what you should do given unlimited computational resources, sure, Solomonoff induction is the way to go. I definitely agree with that. But if you need to actually solve a specific practical problem, additional considerations come into play.
By the way, what do you mean by "the maximum entropy principle is the best computable approximation to Solomonoff induction"? That sounds intriguing, so I'd be interested to have you elaborate a bit.
Regarding frequentism vs. Bayesianity in practical applications, the message I take from Yudkowsky and Jaynes is that frequentists have tended historically to lack apprehension of the fact that their methods are ad-hoc, and in general they fail to use Bayesian power when it is in fact advisable to do so - whereas Bayesians feel they can use ad-hoc approximate methods or accurate methods, whichever is appropriate to the task. This is a case in which a questionable philosophy needn't hamstring someone's thinking in principle, but appears to do so fairly predictably as a matter of fact.
Incidentally I'm surprised that there appears to be so much disagreement about this, given that LW is basically a forum brought into existence on the strength of Yudkowsky's abilities as a thinker, writer and populariser, and he clearly holds frequentism in contempt. It's not necessarily a bad thing that some people here are sympathetic to frequentism - intellectual diversity is good - I'm just surprised that there are so many on a Bayesian rationality forum!
About Maxent: I had in mind chapter 5 of this book by Li and Vitanyi.
This is the MDL (minimum description length) principle: select the hypothesis H that minimizes K(H) + K(D|H) for data D, where K is Kolmogorov complexity.
So ideal MDL, like Solomonoff induction, is also incomputable!
They go on to discuss approximations, and on page 390 (I don’t know if you have a copy of the book) they provide a usable approximation to be referred to as “MDL”. Later on page 398 they discuss Maxent, and conclude that that too is an approximation to ideal MDL.
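For intuition about how a computable two-part code trades model cost against data cost, here is a toy sketch (not Li and Vitanyi's construction; the Gaussian residual coding and the 16-bit-per-parameter cost are arbitrary modeling choices of mine):

```python
import numpy as np

def two_part_code_length(y, y_hat, n_params, param_bits=16):
    """Crude two-part code length L(H) + L(D|H), in bits."""
    # L(H): each parameter described at a fixed, arbitrary precision
    model_bits = n_params * param_bits
    # L(D|H): residuals encoded under a Gaussian noise model
    r = y - y_hat
    sigma = r.std() + 1e-12
    nll_nats = 0.5 * len(y) * np.log(2 * np.pi * sigma**2) + (r**2).sum() / (2 * sigma**2)
    return model_bits + nll_nats / np.log(2)

rng = np.random.default_rng(0)
x = np.arange(50.0)
y = 2.0 * x + rng.normal(scale=0.5, size=50)

mdl_const = two_part_code_length(y, np.full(50, y.mean()), n_params=1)
mdl_line = two_part_code_length(y, np.polyval(np.polyfit(x, y, 1), x), n_params=2)
# The line costs one extra parameter but compresses the residuals far more,
# so its total description length is shorter: the MDL reason to prefer it.
```

The extra parameter is cheap compared to the savings in encoding the residuals, which is the whole trade-off MDL formalizes.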
As far as I can see, Maxent is more useful in practical applications than their approximate MDL. I felt that Maxent needed to be defended, since Jaynes considered it to be a major element of Bayesian probability theory; and as far as I can see there is at present no clearly better practical method of generating priors, so Maxent can hardly be counted among Bayesianism's "legitimate issues" vis-à-vis frequentism.
My intuition here is that you are not observing so many people who are sympathetic to frequentism, so much as people who are unsympathetic to holding contempt.
In many of the comments here you seem to be missing a simple point about mathematics and reference, because of its entanglement with tribal signaling between the "Bayesians" and the "Frequentists".
I've yet to see anything in this article, or the resulting comments thread, to suggest that the OP has anything to say apart from "let's say 'models' instead of 'is' (but mean the same thing)". And the only consequence of this is to puff up frequentism.
I tried (and apparently failed miserably) to make the case that in the interests of sanity, we should define our terms such that probability ≡ subjective degrees of belief. That's all it is, a definition - there's no philosophical significance to this "is" beyond that. It is not a claim that the frequency interpretation doesn't fit Cox's postulates - this is a naive interpretation of how language is used on the OP's part.
The definitional dispute about sound is inapt, because there is nothing to be gained by defining sound as one thing or the other. In this case however there is a real benefit to defining our terms in one particular way.
I will however delete the downvoted posts in this thread, to honour the great disapproval with which my conception of rationality has apparently met in this case.
Generally, deleting posts with responses is impolite, as the discussion may be helpful to future readers.
I don't think you ever supplied a term other than "probability" that we should use for what the OP thought "probability" means. So we're still left with three entities and two words.
Seems like a non-problem. Just say "I am entering these frequencies into Bayes's theorem", "I am using the mathematical tools of probability theory" or something like that.
Or perhaps say "probability is a measure of subjectively objective degrees of belief", and "probability theory is the set of mathematical tools used to compute probabilities, which can also be used to compute frequencies as the case may be".
Which is pretty much what happens already! This is why I object to such an article: it's a solution looking for a problem, which creates the illusion of a problem by (a) being illiterate, making itself hard to pin down, and (b) nitpicking the use of words.
They were also steadily generating an amount of negative karma days after posting that I felt was disproportionate, considering they were a sincere attempt to reach agreement with a less-than-articulate interlocutor.
Would not retraction have served?
I did not find User:potato less-than-articulate.
I'm not sure what you mean by "illiterate" here, nor (thus) how it would make itself 'hard to pin down'.
The dispute was about the proper use of words. I did not see anything that looked like 'nitpicking' in that context.
The advantage of "Formalism" over "Bayesianism" or "Frequentism" is that it clearly marks the mathematical toolkit, makes it clear what Bayesians and Frequentists are separately talking about, gets rid of the slippage Frequentists allegedly make between "degrees of belief" and "frequencies", and removes the question of what "probability" is "really" about, all without having to raise a flag in the mind-killing tribal warfare between "Bayesians" and "Frequentists".
But then, it's been noted that "a philosopher has never met a distinction he didn't like", so perhaps I'm just biased in favor of making clearer the distinction.
FWIW, I think my three preferred terms are "Probabilities", "Frequencies", and "Normed Measure Theory". That's what Kolmogorov's formalization amounts to anyway, and as the OP said it truly need not be connected to either probabilities or frequencies in a given use.
I don't understand. Based on reading through the passages you referenced in PtLoS, maximum entropy is a way of choosing a distribution out of a family of distributions (which, by the way, is a frequentist technique, not a Bayesian one). Solomonoff induction is a choice of prior. I don't really understand in what sense these are related to each other, or in what sense Maxent generates priors at all.
I've always felt that the frequentists that Eliezer argues against are straw men. As I said earlier, I've never met a frequentist who is guilty of the accusations that you keep making, although I have met Bayesians whose philosophy interfered with their ability to do good statistical modeling / inference. Have you actually run into the people who you seem to be arguing against? If not, then I think you should restrict yourself to arguing against opinions that people are actually trying to support, although I also think that whether or not some very foolish people happen to be frequentists is irrelevant to the discussion (something Eliezer himself discussed in the "Reversed Stupidity is not Intelligence" post).
If you know nothing about a variable except that it's in the interval [a, b], your probability distribution must be from the class of distributions where p(x) = 0 for x outside of [a, b]. You pick the distribution of maximal entropy from this class as your prior, to encode ignorance of everything except that x ∈ [a, b]. That is one way Maxent may generate a prior, anyway.
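With only the support constraint, the maximum-entropy distribution on [a, b] is the uniform one. A quick numerical check of the discretized analogue (the grid size `n` and the Dirichlet draw are just illustrative choices):

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a probability vector, in nats."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Discretize [a, b] into n equal cells; a prior supported on [a, b]
# becomes a probability vector over the cells.
n = 100
uniform = np.full(n, 1.0 / n)  # entropy log(n), the maximum attainable

rng = np.random.default_rng(0)
other = rng.dirichlet(np.ones(n))  # some other distribution on the same support
# entropy(other) < entropy(uniform): any non-uniform choice encodes
# information we don't actually have.
```

Adding further constraints (a known mean, say) shrinks the class, and Maxent then picks out other familiar distributions instead of the uniform.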
We can call dibs on things now? Ooh, I call dibs on approximating a slowly varying function as a constant!
I'm pretty sure almost all frequentist methods are derivable from Bayes, or are close approximations of Bayes. Do they have any tool which is radically un-Bayesian?
See paulfchristiano's examples elsewhere in this thread.
Another example would be support vector machines, which work really well in practice but aren't Bayesian (although it's possible that they are actually Bayesian and I just can't figure out what prior they correspond to).
There are also neural networks, which are sort of Bayesian but (I think?) not really. I'm not actually that familiar with neural nets (or SVMs for that matter) so I could just be wrong.
ETA: It is the case that every non-dominated decision procedure is either a Bayesian procedure or the limit of Bayesian procedures (which I think could alternately be thought of as a Bayesian procedure with a potentially improper prior). So in that sense, for any frequentist procedure that is not Bayesian, there is another procedure that gets higher expected utility in all possible worlds, and is therefore strictly better. The only problem is that this is again an abstract statement about decision procedures, and doesn't take into account the computational difficulty of actually finding the better procedure.
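One standard illustration of a frequentist procedure coinciding with a Bayesian one (my example, not from the thread): ridge regression, a penalized least-squares estimator, is numerically identical to the MAP estimate under a zero-mean Gaussian prior whose variance is set by the penalty. The data and the penalty/noise values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

lam = 2.0            # ridge penalty strength (assumed)
sigma2 = 0.01        # assumed observation-noise variance
tau2 = sigma2 / lam  # prior variance on the weights implied by the penalty

# Frequentist ridge estimate: argmin_w ||y - Xw||^2 + lam * ||w||^2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Bayesian MAP estimate with prior w ~ N(0, tau2 * I), noise ~ N(0, sigma2)
w_map = np.linalg.solve(X.T @ X + (sigma2 / tau2) * np.eye(3), X.T @ y)
# The two solves are the same linear system, so w_ridge == w_map.
```

This is the flavor of correspondence being claimed: the "ad-hoc" penalty is exactly a prior in disguise, though examples like SVMs resist this translation much more stubbornly.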
This paper is the closest I've ever seen to a fully Bayesian interpretation of SVMs; mind you, the authors still use "pseudo-likelihood" to describe the data-dependent part of the optimization criterion.
Neural networks are just a kind of non-linear model. You can perform Bayes upon them if you want.
I completely agree with this. It seems to me that we should completely throw away the question of what probability is, and look at which form of inference is optimal.
There is a dispute; ever hear of the idealists and the realists? Luckily it is over now. But either way, it does not matter why you are using one word to stand for many things; you shouldn't do it if you can use a terminology that is more widely accepted. I still think that Bayesianism is a better interpretation, a much better interpretation than frequentism, but what is it an interpretation of? Is it an interpretation of math? Seems to me like it's an interpretation of typographical string manipulations applied to certain basic strings.
That wasn't another commenter, that was in my article, I'm pretty sure.
If Bayesianism wins this argument, which it probably will, it should win because it is the ideal system of statistical inference, not because its proponents managed to convince a bunch of people of a statement with absolutely no empirical consequences. If you argue about what probability is, you argue about surface bubbles of your theory that are irrelevant to the real dispute you are having, whether you are a realist and an idealist, or a frequentist and a Bayesian.