alex_zag_al comments on A Fervent Defense of Frequentist Statistics - Less Wrong
In a post aimed at a crowd where a lot of people read Jaynes, you're not addressing the Jaynesian perspective on where priors come from.
When E. T. Jaynes does statistics, the assumptions are made very clear.
The Jaynesian approach is this:
The assumptions are explicit because the constraints are explicit.
As an example, imagine that you're measuring something you've measured before. In a lot of situations, you'll end up with constraints on the mean and variance of your prior distribution. This is because you reason in the following way: your best estimate of the next measurement is the average of your previous measurements, and your best estimate of its squared deviation from that average is some number s^2.
If we take "best estimate" to mean "minimum expected squared error", then these are constraints on the mean and variance of the prior distribution. If you maximize entropy subject to these constraints, you get a normal distribution. And that's your prior.
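For concreteness, here's a sketch of the standard maxent calculation under those two constraints (with mu the constrained mean and s^2 the constrained variance):

```latex
\text{maximize } H[p] = -\int p(x)\ln p(x)\,dx
\quad\text{subject to}\quad
\int p(x)\,dx = 1,\qquad
\int x\,p(x)\,dx = \mu,\qquad
\int (x-\mu)^2\,p(x)\,dx = s^2.

\text{Stationarity of the Lagrangian forces the form }
p(x) = \exp\!\big(-1 - \lambda_0 - \lambda_1 x - \lambda_2 (x-\mu)^2\big),

\text{and solving for the multipliers that satisfy the constraints yields }
p(x) = \frac{1}{\sqrt{2\pi s^2}}\,\exp\!\left(-\frac{(x-\mu)^2}{2 s^2}\right).
```

So among all densities satisfying the two moment constraints, the normal distribution is the one with maximum entropy.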
The assumptions are quite explicit here. You've assumed that these measurements are measuring some unchanging quantity in an unbiased way, and that the magnitude of the error tends to stay the same. And you haven't assumed any other relationship between the next measurement and the previous ones. You definitely didn't assume any autocorrelation, for example, because you didn't use the previous measurement in any of your estimates/constraints.
But since we're not assuming that the data are generated from the prior, we don't have the corresponding guarantee. Which raises the question: what do we have? What's the point of this?
A paper I'm reading right now (Portilla & Simoncelli, 2000, "A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients") puts it this way:
Guarantee: "The maximum entropy density is optimal in the sense that it does not introduce any constraints on the [prior] besides those [we assumed]."
It's not a performance guarantee. Just a guarantee against hidden assumptions, I guess?
Despite apparently having nothing to do with performance, it makes a lot of sense from within Jaynes's perspective. To him, probability theory is logic: figuring out which conclusions are supported by your premises. It just extends deductive logic, allowing you to see to what degree each conclusion is supported by your premises.
And I think your criticism of the Dutch Book argument applies here: is your time better spent making sure you don't have any hidden assumptions, or writing more programs to make more money? But it definitely makes your assumptions explicit; that's just part of the whole "probability theory as logic" paradigm.
(But I don't really understand what it means to not introduce any constraints besides those assumed, or why maxent is the procedure that achieves this. That's why I quoted the guarantee from someone who, I assume, does understand.)
What does it mean to "assume that the prior satisfies these constraints"? As you already seem to indicate later in your comment, the notion of "a prior satisfying a constraint" is pretty nebulous. It's unclear what concrete statement about the world this would correspond to. So I still don't think this constitutes a particularly explicit assumption.
I'm responding to arguments that others have raised in the past that Bayesian methods make assumptions explicit while frequentist methods don't. If I then show that frequentist methods also make explicit assumptions, it seems like a weird response to say "oh, well who cares about explicit assumptions anyways?"
Yeah, sorry. I was getting a little off topic there. It's just that in your post, you were able to connect the explicit assumptions being true to some kind of performance guarantee. Here I was musing on the fact that I couldn't. It was meant to undermine my point, not to support it.
?? The answer to this is so obvious that I think I've misunderstood you. In my example, the constraints are on moments of the prior density. In many other cases, the constraints are symmetry constraints, which are also easy to express mathematically.
But then you bring up concrete statements about the world? Are you asking how you get from your prior knowledge about the world to constraints on the prior distribution?
EDIT: you don't "assume a constraint", a constraint follows from an assumption. Can you re-ask the question?
Ah my bad! Now I feel silly :).
So the prior is this thing you start with, and then you get a bunch of data and update it and get a posterior. In general it's pretty unclear what constraints on the prior will translate to in terms of the posterior. Or at least, I spent a while musing about this and wasn't able to make much progress. And even in retrospect it's pretty unclear how I would ever test whether my "assumption" held, if it was a constraint on the prior. I mean sure, if there's actually some random process generating my data, then I might be able to say something, but that seems like a pretty rare case... sorry if I'm being unclear, hopefully that was at least somewhat clearer than before. Or it's possible that I'm just nitpicking pointlessly.
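For what it's worth, in the conjugate special case the translation is explicit: with a normal prior and normally distributed measurement noise of known variance, the prior's moment constraints show up directly in the posterior as a precision-weighted average. A minimal sketch, where mu0, tau2, sigma2, and the data are all made-up numbers:

```python
# Normal prior N(mu0, tau2) on the unknown quantity; measurements have known
# noise variance sigma2. By conjugacy the posterior is again normal, with
# combined precision and a precision-weighted mean.
mu0, tau2 = 10.0, 0.02     # prior mean and variance (hypothetical)
sigma2 = 0.05              # assumed-known measurement noise variance
data = [10.3, 10.1, 10.2]  # hypothetical new observations

n = len(data)
xbar = sum(data) / n

# Posterior precision is the sum of prior precision and data precision.
post_var = 1 / (1 / tau2 + n / sigma2)
# Posterior mean is a precision-weighted average of prior mean and sample mean.
post_mean = post_var * (mu0 / tau2 + n * xbar / sigma2)
```

Here the posterior mean lands between the prior mean and the sample mean, pulled toward whichever has the higher precision, so at least in this case the constraint on the prior has a clear footprint on the posterior.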
Hmm. Considering that I was trying to come up with an example to illustrate how explicit the assumptions are, the assumptions aren't that explicit in my example are they?
Prior knowledge about the world --> mathematical constraints --> prior probability distribution
The assumptions I used to get the constraints are that the best estimate of your next measurement is the average of your previous ones, and that the best estimate of its squared deviation from that average is some number s^2, maybe the variance of your previous observations. But those aren't states of the world; they're assumptions about your inference behavior.
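The "prior knowledge --> constraints --> prior distribution" pipeline can at least be written down mechanically. A sketch with hypothetical measurements, which also numerically checks that the resulting maxent prior really does satisfy the moment constraints it was built from:

```python
import math
import statistics

# Hypothetical repeated measurements of the same quantity.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2]

# The constraints: best estimate of the next measurement is the sample mean,
# and the best estimate of its squared deviation from that mean is s^2
# (here taken to be the variance of the previous observations).
mu = statistics.fmean(measurements)
s2 = statistics.pvariance(measurements)

def maxent_prior(x):
    """Maxent density under the two moment constraints: a normal distribution."""
    return math.exp(-(x - mu) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)

# Numerically verify that the prior satisfies the constraints it was built from,
# integrating on a fine grid over +/- 6 standard deviations.
sigma = math.sqrt(s2)
dx = sigma / 1000
xs = [mu - 6 * sigma + i * dx for i in range(12000)]
total = sum(maxent_prior(x) * dx for x in xs)
mean_check = sum(x * maxent_prior(x) * dx for x in xs)
var_check = sum((x - mu) ** 2 * maxent_prior(x) * dx for x in xs)
```

Of course, this only makes the *constraints* mechanical; it doesn't answer the question of how facts about the world justify those constraints in the first place.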
Then I added later that the real assumptions are that you're making unbiased measurements of some unchanging quantity mu, and that the mechanism of your instrument is unchanging. These are facts about the world. But these are not the assumptions that I used to derive the constraints, and I don't show how they lead to the former assumptions. In fact, I don't think they do.
Well. Let me assure you that the assumptions that lead to the constraints are supposed to be facts about the world. But I don't see how that's supposed to work.