Some thoughts on meta-probabilities

iarwain1

I often like to think of my epistemic probability assignments in terms of probabilities-of-probabilities, or meta-probabilities. In other words, what probability would I assign that my probability estimate is accurate? Am I very confident, am I only mildly confident, or do I only have a vague clue?

I picture it as a sort of bell curve, with the x-axis being possible probability estimates and the y-axis being my confidence in those estimates. So if I have very low confidence in my estimate then the bell will be low and spread out, and if I have high confidence it'll be tall and narrow.

Here are a few issues and insights that have come up when discussing or thinking about this:

What would a meta-probability actually mean?

There are two ways I have for thinking about it:

1) The meta-probability is my prediction for how likely I am to change my mind (and to what extent) as I learn more information about the topic.

2) I know that I'm not even close to being an ideal Bayesian agent, and that my best shots at a probability estimate are fuzzy, imprecise, and likely mistaken anyway. The meta-probability is my prediction for what an ideal Bayesian agent would assign as the probability for the question at hand.

What's the point?

Primarily it's just useful for conveying how sure I am of the probability estimate I'm assigning. It's a way of conveying that a coin flip is 50% heads in a very different sense than me saying "I have not the slightest clue whether it'll rain tomorrow on the other side of the world, and if I need to bet on it I'd give it ~50% odds". I've seen other people convey related sentiments by saying things like, "well 90% is probably too low an estimate, and 99% is probably too high, so somewhere between those". I'd just view the 90% and 99% figures as maybe 95% confidence bounds on a bell curve.
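The coin-versus-rain contrast can be sketched with a little code. This is only an illustration under assumptions that are not in the post: it stands in for the "bell curve" with a Beta distribution (a standard curve over values between 0 and 1), and the shape numbers 500/500 and 2/2 are made up to represent "very sure" and "barely a clue". The helper name `beta_summary` is hypothetical.

```python
from math import sqrt

def beta_summary(a, b):
    """Return (mean, standard deviation) of a Beta(a, b) curve over a probability."""
    mean = a / (a + b)
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(variance)

# Assumed shapes: a tight curve for the coin, a wide one for far-away rain.
coin_mean, coin_sd = beta_summary(500, 500)
rain_mean, rain_sd = beta_summary(2, 2)

# Both curves are centered on 50%, but the spreads differ a lot.
print(f"coin: mean={coin_mean:.2f}, spread={coin_sd:.3f}")
print(f"rain: mean={rain_mean:.2f}, spread={rain_sd:.3f}")
```

Both estimates would be reported as "50%", and the spread is the extra number that tells them apart.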

Why not keep going and say how confident you are about your confidence estimates?

True, I could do this, and I sometimes will do this if needed by visualizing a bit of fuzziness in my bell curve. But in any case a single level of meta-probability is usually enough for my purposes.

Is there any use for such a view in terms of instrumental or utilitarian calculations?

Not sure. I've seen some relevant discussion by Scott Alexander and Holden Karnofsky, but I'm not sure I followed everything there. I also suspect that if you view it as a prediction of how your views might change if you learned more about the subject, then this might imply that it's useful in deciding how much time to invest in further research.

Thoughts?

[Note 1: I discussed this topic about a year ago on LessWrong, and got some insightful responses then. Some commenters disagreed with me then and I'll predict that they'll do so again here - I'd give it, oh, say an 80% chance, moderate confidence ;).]

[Note 2: If you could try to avoid complicated math in your responses that would be appreciated. I'm still on the precalculus level here.]

[Note 3: As I finished writing this I dug up some interesting LessWrong posts on the subject, with links to yet more relevant posts.]

If you make no approximations, the normatively correct approach is to carry around your current probability estimate p, and a table which contains what p would be updated to under all possible pieces of evidence you could receive. For example, I might say "I know very little about sports, so I'll assign probability 50% that the Dallas Cowboys will win their next game, but if my friend who follows football tells me they will, I'll assign probability 75%, and if I see a bookie's odds, I'll adopt the implied probability estimate." (This is, of course, an incomplete list--there are many, many other pieces of evidence I could see.) Obviously, these updates should follow the laws of probability on pain of paradox.
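One concrete consequence of "following the laws of probability" is that the current estimate has to equal the average of the possible updated estimates, weighted by how likely each piece of evidence is to show up. The sketch below checks that for the Cowboys example. Two of its numbers are assumptions added for illustration and do not come from the text above: that the friend is equally likely to predict a win or a loss, and that a predicted loss would move the estimate to 25%.

```python
from fractions import Fraction

current_estimate = Fraction(1, 2)  # 50% that the Cowboys win

# evidence -> (chance of seeing it, estimate after seeing it)
# The 1/2 chances and the 1/4 entry are assumed, not given in the text.
update_table = {
    "friend predicts a win": (Fraction(1, 2), Fraction(3, 4)),
    "friend predicts a loss": (Fraction(1, 2), Fraction(1, 4)),
}

# Weighted average of the possible updated estimates.
expected_update = sum(chance * new_p for chance, new_p in update_table.values())

print(expected_update == current_estimate)  # the table is self-consistent
```

If the table instead said a predicted loss leaves the estimate at 50%, the weighted average would come out above 50%, and the current estimate would already be too low.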

Why is this necessary to do things correctly? You can work out that my friend's prediction, because it moved me from 1:1 odds to 3:1 odds, has a likelihood ratio of three. But where did the 3 come from? It's the interaction between my knowledge and my friend's knowledge. If the same friend makes the same prediction a second time, then I shouldn't update my probability again, because the first time they give me useful info, and the second time they don't give me any new info. If a second friend also predicts that the Cowboys will win, then I need to estimate how correlated their predictions are in order to determine how to update.
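The arithmetic behind the likelihood ratio can be written out in odds form: convert the probability to odds, multiply by the likelihood ratio, and convert back. The sketch below uses the 50% and 75% figures from the example. The helper names are hypothetical, and the "independent second friend" case assumes that friend is exactly as informative as the first, which the text does not claim.

```python
from fractions import Fraction

def to_odds(p):
    """Probability -> odds in favor (0.75 -> 3, meaning 3:1)."""
    return p / (1 - p)

def to_prob(odds):
    """Odds in favor -> probability."""
    return odds / (1 + odds)

def update(p, likelihood_ratio):
    """Multiply the odds by the likelihood ratio, then convert back."""
    return to_prob(to_odds(p) * likelihood_ratio)

prior = Fraction(1, 2)      # 1:1 odds
posterior = Fraction(3, 4)  # 3:1 odds after the friend's prediction

likelihood_ratio = to_odds(posterior) / to_odds(prior)  # 3:1 over 1:1

# Same friend repeating themselves adds nothing new: ratio of 1, no change.
after_repeat = update(posterior, 1)

# A fully independent, equally informative second friend (assumed): ratio of 3 again.
after_independent = update(posterior, likelihood_ratio)

print(likelihood_ratio, after_repeat, after_independent)
```

A second friend whose views partly overlap with the first would land somewhere between those two cases, which is why the correlation has to be estimated.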

The hyperparameter approach is the clean way to do this in cases where the likelihood of incoming evidence given existing evidence is easy to determine. If I've got a coin flipped in a random fashion (but weighted in an unknown way), then I think that successive flips are independent and equally indicative of the underlying propensity of the coin to land heads when flipped randomly. But if I've got a coin flipped in a precisely controlled deterministic fashion, then I don't think that successive flips are independent and equally indicative of the underlying propensity of the coin to land heads, because that "propensity" is no longer a useful node in my model.
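For the randomly flipped coin, one common way to set up the hyperparameter approach is to keep two running counts, which amounts to a Beta distribution over the coin's propensity. The sketch below assumes that setup, a starting point of one imaginary head and one imaginary tail, and a made-up flip sequence. None of these specifics come from the comment above, and the function name `update_counts` is hypothetical.

```python
def update_counts(heads, tails, flips):
    """Add each observed flip to the running (heads, tails) hyperparameters."""
    for flip in flips:
        if flip == "H":
            heads += 1
        else:
            tails += 1
    return heads, tails

# Assumed starting point: one imaginary head and one imaginary tail.
heads, tails = update_counts(1, 1, "HHTHHHTH")  # made-up data: 6 heads, 2 tails

# Current estimate that the next random flip lands heads.
estimate = heads / (heads + tails)
print(heads, tails, estimate)
```

Because every flip is treated as independent and equally informative, the two counts summarize everything needed to handle the next flip. That shortcut is what stops working for the deterministically flipped coin.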