RichardKennaway comments on Comments on "When Bayesian Inference Shatters"? - Less Wrong

8 Post author: Crystalist 07 January 2015 10:56PM

Comment author: RichardKennaway 09 January 2015 01:31:06PM 0 points

I'm unclear on your terminology. I take a prior to be a distribution over distributions; in practice, usually a distribution over the parameters of a parameterised family. Let P1 and P2 be two priors of this sort, distributions over some parameter space Q. Write P1(q) for the probability density at q, and P1(x|q) for the probability density at x for parameter q. x varies over the data space X.

Is the distance measure you are proposing max_{q in Q} abs log( P1(q) / P2(q) )?

Or is it max_{q in Q,x in X} abs log( P1(x|q) / P2(x|q) )?

Or max_{q in Q,x in X} abs log( (P1(q)P1(x|q)) / (P2(q)P2(x|q)) )?

Or something else?

Comment author: DanielLC 09 January 2015 08:48:04PM * 1 point

A distribution over distributions just becomes a distribution. Just use P(x) = integral_{q} P(x|q) P(q) dq. The distance I'm proposing is max_x abs log(P1(x) / P2(x)) = max_x abs( log(integral_{q} P1(x|q) P1(q) dq) - log(integral_{q} P2(x|q) P2(q) dq) ).
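As a rough illustration of the distance just described, here is a minimal sketch for a discrete case, where the integral over q becomes a sum. The two-coin setup, the names `marginal` and `max_abs_log_ratio`, and all the numbers are assumptions made up for the example; both priors also share one likelihood here, which the general definition does not require.

```python
import math

def marginal(prior, likelihood):
    """Discrete analogue of P(x) = integral_q P(x|q) P(q) dq."""
    outcomes = next(iter(likelihood.values())).keys()
    return {x: sum(prior[q] * likelihood[q][x] for q in prior) for x in outcomes}

def max_abs_log_ratio(p1, p2):
    """max_x abs log(P1(x) / P2(x)) over a shared, finite data space."""
    return max(abs(math.log(p1[x] / p2[x])) for x in p1)

# Hypothetical parameter space: which coin we hold. Data space: one flip.
likelihood = {
    "fair": {"H": 0.5, "T": 0.5},
    "biased": {"H": 0.9, "T": 0.1},
}
prior1 = {"fair": 0.5, "biased": 0.5}  # assumed prior P1(q)
prior2 = {"fair": 0.9, "biased": 0.1}  # assumed prior P2(q)

m1 = marginal(prior1, likelihood)  # P1(H) = 0.70, P1(T) = 0.30
m2 = marginal(prior2, likelihood)  # P2(H) = 0.54, P2(T) = 0.46
distance = max_abs_log_ratio(m1, m2)
```

In this toy case the maximum is attained at x = T, so the distance is log(0.46 / 0.3). The point of the sketch is only that the priors over q drop out once each is collapsed to a marginal over x.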

I think it might be possible to make this better. If Alice and Bob both agree that x is unlikely, then their disagreeing about its exact probability seems like less of a problem. For example, suppose Alice thinks it's one-in-a-million and Bob thinks it's one-in-a-billion. Alice would need a thousand-to-one evidence ratio to believe what Bob believes, which means that piece of evidence has a one-in-a-thousand chance of occurring. But since it only has a one-in-a-million chance of being needed, that doesn't matter much. It seems like it would only make a one-in-a-thousand difference. If you do it this way, it would need to be additive, but the distance is still at most the metric I just gave.
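The arithmetic in the Alice and Bob example can be checked in a few lines. This is only a sketch of the numbers quoted above; the variable names are made up for illustration.

```python
import math

p_alice = 1e-6  # Alice: one-in-a-million
p_bob = 1e-9    # Bob: one-in-a-billion

# Evidence ratio Alice would need to move from her probability to Bob's.
evidence_ratio = p_alice / p_bob

# Contribution of this x to the max-abs-log-ratio distance.
log_distance = abs(math.log(p_alice / p_bob))
```

Here `evidence_ratio` comes out at about 1000 and `log_distance` at log(1000), roughly 6.9. That is a large value under the max-based distance even though both parties agree x almost never happens, which is the motivation for weighting the disagreement by how likely x is.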

The metric for this would be:

integral_x log(max(P1(x), P2(x)) max(P1(x) / P2(x), P2(x) / P1(x))) dx

= integral_x log(max(P1^2(x) / P2(x), P2^2(x) / P1(x))) dx
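The step between the two forms of the integrand relies on max(a, b) * max(a/b, b/a) = max(a^2/b, b^2/a) for positive a and b: whichever of a and b is larger also gives the larger ratio. A quick numerical sanity check, using made-up density values and hypothetical function names:

```python
import math

def integrand_product_form(p1, p2):
    """log(max(P1, P2) * max(P1/P2, P2/P1)) at a single point x."""
    return math.log(max(p1, p2) * max(p1 / p2, p2 / p1))

def integrand_squared_form(p1, p2):
    """log(max(P1^2/P2, P2^2/P1)) at a single point x."""
    return math.log(max(p1 * p1 / p2, p2 * p2 / p1))

# Illustrative (P1(x), P2(x)) pairs, including an equal pair and a tiny pair.
pairs = [(0.7, 0.54), (0.3, 0.46), (1e-6, 1e-9), (0.2, 0.2)]

max_gap = max(
    abs(integrand_product_form(a, b) - integrand_squared_form(a, b))
    for a, b in pairs
)
```

The two forms agree up to floating-point rounding on every pair, so `max_gap` is effectively zero. This checks only the algebraic simplification, not whether the integral behaves well as a metric.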