RichardKennaway comments on Comments on "When Bayesian Inference Shatters"? - Less Wrong Discussion

8 Post author: Crystalist 07 January 2015 10:56PM

Comment author: RichardKennaway 09 January 2015 01:31:06PM 0 points

I'm unclear on your terminology. I take a prior to be a distribution over distributions; in practice, usually a distribution over the parameters of a parameterised family. Let P1 and P2 be two priors of this sort, distributions over some parameter space Q. Write P1(q) for the probability density at q, and P1(x|q) for the probability density at x for parameter q. x varies over the data space X.

Is the distance measure you are proposing max_{q in Q} abs log( P1(q) / P2(q) )?

Or is it max_{q in Q,x in X} abs log( P1(x|q) / P2(x|q) )?

Or max_{q in Q,x in X} abs log( (P1(q)P1(x|q)) / (P2(q)P2(x|q)) )?

Or something else?

Comment author: DanielLC 09 January 2015 08:48:04PM * 1 point

A distribution over distributions just becomes a distribution. Just use P(x) = integral_q P(x|q) P(q) dq. The distance I'm proposing is max_x abs log(P1(x) / P2(x)) = max_x abs( log(integral_q P1(x|q) P1(q) dq) - log(integral_q P2(x|q) P2(q) dq) ).
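As a rough illustration of the distance just described, here is a minimal discrete sketch in which sums stand in for the integrals. The two-point parameter space, the likelihood table, and the names `marginal` and `max_log_ratio` are assumptions made up for the example, not anything from the original post.

```python
import math

def marginal(prior, likelihood):
    """Collapse a prior over q into a distribution over x.

    prior[q] is P(q); likelihood[q][x] is P(x|q).
    Returns P(x) = sum_q P(x|q) P(q), the discrete analogue of the integral.
    """
    n_x = len(likelihood[0])
    return [
        sum(prior[q] * likelihood[q][x] for q in range(len(prior)))
        for x in range(n_x)
    ]

def max_log_ratio(p1, p2):
    """max_x abs log(P1(x) / P2(x)) over a finite data space."""
    return max(abs(math.log(a) - math.log(b)) for a, b in zip(p1, p2))

# Hypothetical toy setup: q is a coin bias in {0.2, 0.8}, x is (tails, heads).
likelihood = [[0.8, 0.2], [0.2, 0.8]]
prior1 = [0.5, 0.5]   # first prior over q
prior2 = [0.9, 0.1]   # second prior over q

m1 = marginal(prior1, likelihood)
m2 = marginal(prior2, likelihood)
distance = max_log_ratio(m1, m2)
print(m1, m2, distance)
```

Both priors share one likelihood table here, so they differ only in P(q); giving each its own P(x|q) would just mean passing a different table to `marginal`.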

I think it might be possible to make this better. If Alice and Bob both agree that x is unlikely, then their disagreeing about its exact probability seems like less of a problem. For example, if Alice thinks it's one-in-a-million and Bob thinks it's one-in-a-billion, then Alice would need a thousand-to-one evidence ratio to believe what Bob believes, which means that piece of evidence has a one-in-a-thousand chance of occurring. But since it only has a one-in-a-million chance of being needed, that doesn't matter much. It seems like it would only make a one-in-a-thousand difference. If you do it this way, it would need to be additive, but the distance is still at most the metric I just gave.
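The arithmetic in the Alice and Bob example can be checked with a few lines. This only reproduces the evidence ratio and the corresponding log gap; the variable names are made up for illustration.

```python
import math

alice = 1e-6   # Alice's probability for x (one in a million)
bob = 1e-9     # Bob's probability for x (one in a billion)

# Evidence ratio Alice would need to move from her estimate to Bob's.
evidence_ratio = alice / bob

# The same gap measured as abs log(P1(x) / P2(x)) at this single x.
log_gap = abs(math.log(alice) - math.log(bob))

print(evidence_ratio, log_gap)
```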

The metric for this would be:

integral_x log(max(P1(x), P2(x)) max(P1(x) / P2(x), P2(x) / P1(x))) dx

= integral_x log(max(P1^2(x) / P2(x), P2^2(x) / P1(x))) dx
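The equality between the two forms of the integrand can be spot-checked numerically. The sketch below takes both expressions exactly as written above and compares them at a few hand-picked pairs of values; the function names and sample values are assumptions for the example, and the check says nothing about whether the resulting quantity behaves well as a distance.

```python
import math

def integrand_product_form(p1, p2):
    """log( max(P1, P2) * max(P1/P2, P2/P1) ) at a single point x."""
    return math.log(max(p1, p2) * max(p1 / p2, p2 / p1))

def integrand_squared_form(p1, p2):
    """log( max(P1^2/P2, P2^2/P1) ) at a single point x."""
    return math.log(max(p1 ** 2 / p2, p2 ** 2 / p1))

# Hypothetical (P1(x), P2(x)) values, including the Alice/Bob case.
pairs = [(1e-6, 1e-9), (0.3, 0.2), (0.05, 0.4), (0.25, 0.25)]

checks = [
    math.isclose(
        integrand_product_form(p1, p2),
        integrand_squared_form(p1, p2),
        rel_tol=1e-9,
    )
    for p1, p2 in pairs
]
print(checks)
```

The two forms agree because whichever of P1(x), P2(x) is larger is also the numerator of the larger ratio, so the product of the two maxima is the larger value squared divided by the smaller.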