A Bayesian will have a probability distribution over possible outcomes, some of which give her lower scores than her probabilistic expectation of average score, and some of which give her higher scores than this expectation.

I am unable to parse your above claim, and ask for specific math on a specific example. If you know your score will be lower than you expect, you should lower your expectation. If you know something will happen less often than the probability you assign, you should assign a lower probability. This sounds like an inconsistent epistemic state for a Bayesian to be in.

Comment author: Cyan
29 July 2009 02:32:24AM
2 points

I spent some time looking up papers, trying to find accessible ones. The main paper that kicked off the matching prior program is Welch and Peers, 1963, but you need access to JSTOR.

The best I can offer is the following example. I am estimating a large number of positive estimands. I have one noisy observation for each one; the noise is Gaussian with standard deviation equal to one. I have no information relating the estimands; per Jaynes, I give them independent priors, resulting in independent posteriors*. I do not have information justifying a proper prior. Let's say I use a flat prior over the positive real line. No matter the true value of each estimand, the sampling probability of the event "my posterior 90% quantile is greater than the estimand" is less than 0.9 (see Figure 6 of this working paper by D.A.S. Fraser). So the more estimands I analyze, the more sure I am that the intervals from 0 to my posterior 90% quantiles will contain less than 90% of the estimands.

I don't know if there's an exact matching prior in this problem, but I suspect it lacks the correct structure.

* This is a place I think Jaynes goes wrong: the quantities are best modeled as exchangeable, not independent. Equivalently, I put them in a hierarchical model. But this only kicks the problem of priors guaranteeing calibration up a level.

I'm sorry, but the level of frequentist gibberish in this paper is larger than I would really like to work through.

If you could be so kind, please state:

What the Bayesian is using as a prior and likelihood function;

and what distribution the paper assumes the actual parameters are being drawn from, and what the real causal process is governing the appearance of evidence.

If the two don't match, then of course the Bayesian posterior distributions, relative to the experimenter's higher knowledge, can appear poorly calibrated.

If the two do match, then the Bayesian should be well-calibrated. Sure looks QED-ish to me.
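The matched case can be checked numerically. The sketch below is an illustration only: it assumes a hypothetical proper prior, Exponential(1) on the positive reals, which nobody in the thread proposed, and it draws each parameter from that same prior before generating one Normal(mu, 1) observation. The names `RATE`, `posterior_quantile`, and `average_coverage` are made up for this example. Averaged over the prior, the posterior 90% quantile should exceed the true parameter about 90% of the time.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
RATE = 1.0  # assumption: hypothetical Exponential(RATE) prior on mu > 0


def posterior_quantile(x, p=0.9):
    """p-quantile of the posterior for mu given one observation x.

    An Exponential(RATE) prior times a Normal(x | mu, 1) likelihood is a
    Normal(x - RATE, 1) density truncated to mu > 0, whose CDF inverts
    in closed form.
    """
    m = x - RATE
    return m - STD_NORMAL.inv_cdf((1.0 - p) * STD_NORMAL.cdf(m))


def average_coverage(p=0.9, n_draws=50_000, seed=0):
    """Fraction of (mu, x) draws from the assumed model with q_p(x) > mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        mu = rng.expovariate(RATE)  # parameter drawn from the prior itself
        x = rng.gauss(mu, 1.0)      # data drawn from the assumed likelihood
        hits += posterior_quantile(x, p) > mu
    return hits / n_draws


if __name__ == "__main__":
    print(f"average coverage of (0, q]: {average_coverage():.3f}")
```

This is coverage averaged over draws from the prior, which is a weaker property than coverage of 0.9 at every fixed parameter value.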

Comment author: Cyan
29 July 2009 05:08:56AM
6 points

The example doesn't come from the paper; I made it myself. You only need to believe the figure I cited -- don't bother with the rest of the paper.

Call the estimands mu_1 to mu_n; the data are x_1 to x_n. The prior over the mu parameters is flat in the positive subset of R^n, zero elsewhere. The sampling distribution for x_i is Normal(mu_i,1). I don't know the distribution the parameters actually follow. The causal process is irrelevant -- I'll stipulate that the sampling distribution is known exactly.
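Written out, this setup gives a closed form for the posterior and its quantiles. The notation is assumed for this sketch and does not come from the thread: \Phi is the standard normal CDF and q_i(p) is the posterior p-quantile for mu_i (p = 0.9 for a 90% quantile).

```latex
\begin{aligned}
p(\mu_i \mid x_i) &\propto \exp\!\left(-\tfrac{1}{2}(\mu_i - x_i)^2\right)\,\mathbf{1}[\mu_i > 0], \\
\Pr(\mu_i \le q \mid x_i) &= \frac{\Phi(q - x_i) - \Phi(-x_i)}{\Phi(x_i)}, \qquad q > 0, \\
q_i(p) &= x_i + \Phi^{-1}\!\bigl(1 - (1 - p)\,\Phi(x_i)\bigr).
\end{aligned}
```

The posterior is a Normal(x_i, 1) density truncated to the positive half-line, and the last line follows from setting the CDF equal to p and solving for q.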

Call the 90% quantiles of my posterior distributions q_i. From the sampling perspective, these are random quantities, being monotonic functions of the data. Their sampling distributions satisfy the inequality Pr(q_i > mu_i | mu_i) < 0.9. (This is what the figure I cited shows.) As n goes to infinity, I become more and more sure that my posterior intervals of the form (0, q_i] are undercalibrated.
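A minimal Monte Carlo sketch of this sampling-coverage calculation, using only the Python standard library. The function names, the grid of mu values, and the draw count are illustrative assumptions. The sketch reads "posterior 90% quantile" as the point with 90% of the posterior mass below it. Under that reading, truncation at zero pushes the quantile above x_i + 1.2816, so the estimated frequency of q_i > mu_i comes out above 0.9 for small mu_i and approaches 0.9 as mu_i grows. The lower-bound interval (q_0.1, infinity), which also has posterior probability 0.9, is the one that covers less often than 0.9, and that may be the convention behind the inequality quoted above. Under either reading the sampling coverage differs from the nominal 0.9.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def posterior_quantile(x, p):
    """p-quantile of the posterior for mu given one observation x.

    A flat prior on mu > 0 with x ~ Normal(mu, 1) gives a Normal(x, 1)
    posterior truncated to (0, inf); its CDF inverts in closed form.
    """
    return x - STD_NORMAL.inv_cdf((1.0 - p) * STD_NORMAL.cdf(x))


def sampling_coverage(mu, p=0.9, n_draws=50_000, seed=0):
    """Monte Carlo estimate of Pr(q_p(x) > mu | mu) for x ~ Normal(mu, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        x = rng.gauss(mu, 1.0)
        hits += posterior_quantile(x, p) > mu
    return hits / n_draws


if __name__ == "__main__":
    for mu in (0.1, 0.5, 1.0, 2.0):
        upper = sampling_coverage(mu, p=0.9)        # interval (0, q_0.9]
        lower = 1.0 - sampling_coverage(mu, p=0.1)  # interval (q_0.1, inf)
        print(f"mu={mu}: Pr(mu < q_0.9)={upper:.3f}  Pr(mu > q_0.1)={lower:.3f}")
```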

You might cite the improper prior as the source of the problem. However, if the parameter space were unrestricted and the prior flat over all of R^n, the posterior intervals would be correctly calibrated.
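The unrestricted case can be worked out in a couple of lines. With a flat prior over the whole real line, the posterior for mu_i is Normal(x_i, 1), so the 90% quantile is a fixed shift of the data. The symbols \Phi (standard normal CDF) and z_{0.9} are notation assumed for this sketch.

```latex
\begin{aligned}
q_i &= x_i + z_{0.9}, \qquad z_{0.9} = \Phi^{-1}(0.9) \approx 1.2816, \\
\Pr(q_i > \mu_i \mid \mu_i) &= \Pr(x_i - \mu_i > -z_{0.9} \mid \mu_i) = \Phi(z_{0.9}) = 0.9 .
\end{aligned}
```

The second line uses only that x_i - mu_i is standard normal under the sampling distribution, so the coverage is exactly 0.9 for every value of mu_i.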

But it really is fair to demand a proper prior. How could we determine that prior? Only by Bayesian updating from some pre-prior state of information to the prior state of information (or equivalently, by logical deduction, provided that the knowledge we update on is certain). Right away we run into the problem that Bayesian updating does not have calibration guarantees in general (and for this, you really ought to read the literature), so it's likely that any proper prior we might justify does not have a calibration guarantee.

Comment author: wedrifid
27 July 2009 01:04:01PM
1 point

How about this: a Bayesian will always predict that she is perfectly calibrated, even though she knows the theorems proving she isn't.

Wanna bet? Literally. Have a Bayesian make a whole bunch of predictions and then offer her bets with payoffs based on what apparent calibration the results will reflect. See which bets she accepts and which she refuses.

Comment author: wedrifid
27 July 2009 03:23:27PM
0 points

Find a candidate.

I was about to suggest we could just bet raw ego points by publicly posting here... but then I realised I prove my point just by playing.

It should be obvious, by the way, that if the predictions you have me make pertain to black boxes that you construct then I would only bet if the odds gave a money pump. There are few cases in which I would expect my calibration to be superior to what you could predict with complete knowledge of the distribution.

Comment author: Cyan
27 July 2009 03:33:34PM
1 point

It should be obvious, by the way, that if the predictions you have me make pertain to black boxes that you construct then I would only bet if the odds gave a money pump.

Comment author: wedrifid
27 July 2009 04:40:58PM
0 points

I am convinced in full generality that being offered the option of a bet can only provide utility >= 0. So if the punch line is 'insufficiently constrained rationality' then yes, the joke is on me!

And yes, I suspect trying to get my head around that paper would (will) be rather costly! I'm a goddam programmer. :P

## Comments (155)

How about this: a Bayesian will always predict that she is perfectly calibrated, even though she knows the theorems proving she isn't.

Wanna bet? Literally. Have a Bayesian make a whole bunch of predictions and then offer her bets with payoffs based on what apparent calibration the results will reflect. See which bets she accepts and which she refuses.

Are you volunteering?

Sure. :)

But let me warn you... I actually predict my calibration to be pretty darn awful.

We need a trusted third party.

Find a candidate.

I was about to suggest we could just bet raw ego points by publicly posting here... but then I realised I prove my point just by playing.

It should be obvious, by the way, that if the predictions you have me make pertain to black boxes that you construct then I would only bet if the odds gave a money pump. There are few cases in which I would expect my calibration to be superior to what you could predict with complete knowledge of the distribution.

Phooey. There goes plan A.

;)

Plan B involves trying to use some nasty posterior inconsistency results, so don't think you're out of the woods yet.

I am convinced in full generality that being offered the option of a bet can only provide utility >= 0. So if the punch line is 'insufficiently constrained rationality' then yes, the joke is on me!

And yes, I suspect trying to get my head around that paper would (will) be rather costly! I'm a goddam programmer. :P