I recently ran across this post, which gives a more accessible discussion of a recent paper on Bayesian inference ("On the Brittleness of Bayesian Inference"). I don't understand the paper, but I'd like to, and it seems like the sort of thing other people here might enjoy discussing.
I am not a statistician, and this summary is based on the blog post (I haven't had time to read the paper yet), so please discount it accordingly. The paper looks at how the choice of prior and underlying model affects the posterior distribution. Given a continuous distribution (or a discrete approximation of one) to be estimated from finitely many observations of sufficiently high precision, and finite priors, the range of possible posterior estimates is the whole range of the quantity being estimated. Given models that are arbitrarily close (I'm not familiar with the total variation metric, but my impression is that, at finite accuracy, they produce the same observations with arbitrarily similar probability), you can get posterior estimates that are arbitrarily far apart (within the range of the quantity being estimated) from the same data. My impression is that implicitly relying on the arbitrary precision of a prior can give updates that are diametrically opposed to the ones you'd get from different but arbitrarily similar priors.
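If my reading is right, that last point can be illustrated with a toy example (this is my own construction, not something from the paper): two priors that are vanishingly close in total variation can yield diametrically opposed posteriors after conditioning on an observation that lands where both priors put almost no mass. A minimal sketch in Python:

```python
import numpy as np

# Two priors over three hypotheses {A, B, C}. Almost all mass is on C,
# and the tiny remainder is split differently between A and B; the
# total variation distance between the priors is about 1e-9, so by any
# reasonable standard they are "arbitrarily close".
prior1 = np.array([1e-9, 1e-12, 1 - 1e-9 - 1e-12])   # [A, B, C]
prior2 = np.array([1e-12, 1e-9, 1 - 1e-12 - 1e-9])

print(0.5 * np.abs(prior1 - prior2).sum())           # ~1e-9

# A high-precision observation whose likelihood rules out C entirely
# and is equally consistent with A and B.
likelihood = np.array([1.0, 1.0, 0.0])

def posterior(prior):
    unnormalised = prior * likelihood
    return unnormalised / unnormalised.sum()

print(posterior(prior1))   # ~[0.999, 0.001, 0]: favours A 1000:1
print(posterior(prior2))   # ~[0.001, 0.999, 0]: favours B 1000:1
```

The priors differ by about one part in a billion, yet one posterior favours A a thousand to one and the other favours B a thousand to one.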
First, of course, I want to know whether my summary is accurate, misses the point, or is simply wrong.
Second, I'd be interested in hearing discussions of the paper in general and whether it might have any immediate impact on practical applications.
Some other areas of discussion that would interest me: I'm not entirely sure what 'sufficiently high precision' means here, and I have only a vague idea of the circumstances in which you'd be implicitly relying on the arbitrary precision of a prior. I'm also just generally interested in hearing what people more experienced/intelligent than I am have to say here.
I'm unclear on your terminology. I take a prior to be a distribution over distributions; in practice, usually a distribution over the parameters of a parameterised family. Let P1 and P2 be two priors of this sort, distributions over some parameter space Q. Write P1(q) for the probability density at q, and P1(x|q) for the probability density at data point x given parameter q (and similarly for P2); x varies over the data space X.
Is the distance measure you are proposing max_{q in Q} abs log( P1(q) / P2(q) )?
Or is it max_{q in Q,x in X} abs log( P1(x|q) / P2(x|q) )?
Or max_{q in Q,x in X} abs log( (P1(q)P1(x|q)) / (P2(q)P2(x|q)) )?
Or something else?
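In case it helps pin down what I'm asking, here's how the three candidates would be computed in a discretised toy model (a Bernoulli likelihood with Beta-shaped priors on a grid; the setup is entirely my construction):

```python
import numpy as np

# Toy discretised setup: parameter space Q is a grid of coin biases,
# data space X is {heads, tails}, and P(x|q) is the Bernoulli likelihood.
Q = np.linspace(0.01, 0.99, 99)

def grid_prior(a, b):
    """Beta(a, b)-shaped prior, normalised on the grid."""
    w = Q**(a - 1) * (1 - Q)**(b - 1)
    return w / w.sum()

P1_q = grid_prior(2.0, 2.0)
P2_q = grid_prior(2.1, 2.0)              # a slightly perturbed prior

# Likelihoods P(x|q): rows are x in {heads, tails}, columns are q.
# Both models share the same likelihood here, so candidate 2 is zero.
lik = np.vstack([Q, 1 - Q])

# Candidate 1: max over q of |log(P1(q)/P2(q))|.
d1 = np.max(np.abs(np.log(P1_q / P2_q)))

# Candidate 2: max over (q, x) of |log(P1(x|q)/P2(x|q))|.
d2 = np.max(np.abs(np.log(lik / lik)))   # zero by construction

# Candidate 3: max over (q, x) of |log(P1(q)P1(x|q) / P2(q)P2(x|q))|.
d3 = np.max(np.abs(np.log((P1_q * lik) / (P2_q * lik))))

print(d1, d2, d3)
```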
A distribution over distributions just becomes a distribution. Just use P(x) = integral_{q in Q} P(x|q) P(q) dq. The distance I'm proposing is max_x abs log(P1(x) / P2(x)) = max_x abs( log(integral_q P1(x|q) P1(q) dq) - log(integral_q P2(x|q) P2(q) dq) ).
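To make that concrete in the same discretised Beta-Bernoulli toy model as above (again my construction, just a sketch): marginalise out the parameter, then take the worst-case log ratio over data points.

```python
import numpy as np

Q = np.linspace(0.01, 0.99, 99)

def grid_prior(a, b):
    w = Q**(a - 1) * (1 - Q)**(b - 1)
    return w / w.sum()

lik = np.vstack([Q, 1 - Q])      # P(x|q) for x in {heads, tails}

def marginal(prior):
    # P(x) = sum_q P(x|q) P(q): the grid version of the integral above.
    return lik @ prior

P1_x = marginal(grid_prior(2.0, 2.0))
P2_x = marginal(grid_prior(2.1, 2.0))

print(np.max(np.abs(np.log(P1_x / P2_x))))
```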
I think it might be possible to make this better. If Alice and Bob both agree that x is unlikely, then their disagreeing about its probability seems like less of a problem. For example, if Alice thinks it's one-in-a-million and Bob thinks it's one-in-a-billion, then Alice would need a thousand-to-one evidence ratio t...
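Spelling out the arithmetic in that example (my own numbers, just making the gap explicit):

```python
import math

alice = 1e-6    # Alice: one-in-a-million
bob = 1e-9      # Bob: one-in-a-billion

print(alice / bob)                 # 1000.0 -- the evidence ratio separating them
print(abs(math.log(alice / bob)))  # ~6.9 -- their gap under max_x |log(P1(x)/P2(x))|
```

Under the max-log-ratio distance, this disagreement about a very unlikely event counts just as heavily as the same ratio between two likely events, which is the feature that seems worth improving.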