DanielLC comments on Comments on "When Bayesian Inference Shatters"? - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (31)
The maximum of the absolute value of the log of the ratio between the probability of a given hypothesis on each prior. That is the log of the highest possible odds of a piece of evidence that brings you from one prior to the other.
I'm unclear on your terminology. I take a prior to be a distribution over distributions; in practice, usually a distribution over the parameters of a parameterised family. Let P1 and P2 be two priors of this sort, distributions over some parameter space Q. Write P1(q) for the probability density at q, and P1(x|q) for the probability density at x for parameter q. x varies over the data space X.
Is the distance measure you are proposing max_{q in Q} abs log( P1(q) / P2(q) )?
Or is it max_{q in Q,x in X} abs log( P1(x|q) / P2(x|q) )?
Or max_{q in Q,x in X} abs log( (P1(q)P1(x|q)) / (P2(q)P2(x|q)) )?
Or something else?
A distribution over distributions just becomes a distribution. Just use P(x) = integral_{p} P(x|q)P(q)dq. The distance I'm proposing is max_x abs log(P1(x) / P2(x)) = max_x abs (log(integral_{p} P1(x|q) P1(q) dq) - integral_p P2(x|q) P2(q) dq)).
I think it might be possible to make this better. If Alice and Bob both agree that x is unlikely, then both disagreeing about the probability seems like less of a problem. For example, if Alice thinks it's one-in-a-million, and Bob think it's one-in-a-billion, then Alice would need a thousand-to-one evidence ratio to believe what Bob believes which means that that piece of evidence has a one-in-a-thousand chance of occurring, but since it only has a one-in-a-million chance of being needed, that doesn't matter much. It seems like it would only make a one-in-a-thousand difference. If you do it this way, it would need to be additive, but the distance is still at most the metric I just gave.
The metric for this would be:
integral_x log(max(P1(x), P2(x)) max(P1(x) / P2(x), P2(x) / P1(x)))
= integral_x log(max(P1^2(x) / P2(x), P2^2(x) / P1(x)))