Some thoughts on meta-probabilities

iarwain1

I often like to think of my epistemic probability assignments in terms of probabilities-of-probabilities, or meta-probabilities. In other words, what probability would I assign that my probability estimate is accurate? Am I very confident, am I only mildly confident, or do I only have a vague clue?

I often think of it as a sort of bell curve, with the x-axis being possible probability estimates and the y-axis being my confidence in those estimates. So if I have very low confidence in my estimate then the height of the bell will be very low, and if I have high confidence it'll be pretty high.

Here are a few issues and insights that have come up when discussing or thinking about this:

What would a meta-probability actually mean?

I have two ways of thinking about it:

1) The meta-probability is my prediction for how likely I am to change my mind (and to what extent) as I learn more information about the topic.

2) I know that I'm not even close to being an ideal Bayesian agent, and that my best shots at a probability estimate are fuzzy, imprecise, and likely mistaken anyway. The meta-probability is my prediction for what an ideal Bayesian agent would assign as the probability for the question at hand.

What's the point?

Primarily it's just useful for conveying how sure I am of the probability estimate I'm assigning. It's a way of conveying that a coin flip is 50% heads in a very different sense than me saying "I have not the slightest clue whether it'll rain tomorrow on the other side of the world, and if I need to bet on it I'd give it ~50% odds". I've seen other people convey related sentiments by saying things like, "well 90% is probably too low an estimate, and 99% is probably too high, so somewhere between those". I'd just view the 90% and 99% figures as maybe 95% confidence bounds on a bell curve.
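One way to make the coin-versus-rain contrast concrete, as a rough sketch only: represent each 50% estimate as a Beta distribution over the underlying chance. The Beta form, the specific parameters (500, 500) and (1, 1), and the idea of treating rain as a repeatable event are all illustrative assumptions, not something claimed above. Both start at 50%, but they respond very differently to the same three observations.

```python
def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution: the single-number estimate."""
    return a / (a + b)

def beta_update(a, b, successes, failures):
    """Standard conjugate update: add observed counts to the parameters."""
    return a + successes, b + failures

# Hypothetical parameters: a tightly peaked 50% vs. a flat, vague 50%.
coin = (500.0, 500.0)
rain = (1.0, 1.0)

# Both estimates start at exactly 50%.
coin_before = beta_mean(*coin)
rain_before = beta_mean(*rain)

# Observe the event happen three times in a row.
coin_after = beta_mean(*beta_update(*coin, successes=3, failures=0))
rain_after = beta_mean(*beta_update(*rain, successes=3, failures=0))

print(f"coin: {coin_before:.3f} -> {coin_after:.3f}")
print(f"rain: {rain_before:.3f} -> {rain_after:.3f}")
```

In this sketch the confident estimate barely moves while the vague one jumps to 80%, which matches the "how likely am I to change my mind" reading of meta-probability.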

Why not keep going and say how confident you are about your confidence estimates?

True, I could do this, and I sometimes do when needed, by visualizing a bit of fuzziness in my bell curve. But a single level of meta-probability is usually enough for my purposes.

Is there any use for such a view in terms of instrumental or utilitarian calculations?

Not sure. I've seen some relevant discussion by Scott Alexander and Holden Karnofsky, but I'm not sure I followed everything there. I also suspect that if you view it as a prediction of how your views might change if you learned more about the subject, then this might imply that it's useful in deciding how much time to invest in further research.

Thoughts?

[Note 1: I discussed this topic about a year ago on LessWrong, and got some insightful responses then. Some commenters disagreed with me then and I'll predict that they'll do so again here - I'd give it, oh, say an 80% chance, moderate confidence ;).]

[Note 2: If you could try to avoid complicated math in your responses that would be appreciated. I'm still on the precalculus level here.]

[Note 3: As I finished writing this I dug up some interesting LessWrong posts on the subject, with links to yet more relevant posts.]

You do not need a probability distribution on your probability distribution to represent uncertainty. The uncertainty is captured by the spread (variance) of your prior. I think you are confusing the map and the map of the map.

First, I think you should think about whether the thing you are interested in knowing the truth about is a true/false proposition or something that can have more than two possible values.

Let's imagine you want to know the true value of a number X between negative and positive infinity. Scientist 1 tells you "My prior is represented by a normal distribution with mean 0 and standard deviation 1". Scientist 2 says the same thing, except his standard deviation is 10.

These two scientists have the same belief about the most likely value of X, but they have different certainties. This difference will be reflected in how they respond to data: Scientist 2 will always adjust his beliefs more in response to any new evidence. The point is that you are able to reflect the uncertainty of the beliefs in the prior itself.
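As a minimal sketch of that difference, assume both priors are normal, and that each scientist sees a single observation of X whose measurement noise is itself normal with a known standard deviation. The observation value 5 and noise standard deviation 1 are made-up numbers for illustration. Under those assumptions the posterior is a precision-weighted average of the prior mean and the observation:

```python
import math

def normal_update(prior_mean, prior_sd, observation, noise_sd):
    """Posterior mean and sd for a normal prior and one noisy normal observation."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / noise_sd ** 2
    post_precision = prior_precision + data_precision
    # Precision-weighted average of prior mean and observation.
    post_mean = (prior_mean * prior_precision
                 + observation * data_precision) / post_precision
    post_sd = math.sqrt(1.0 / post_precision)
    return post_mean, post_sd

# Same prior mean, different prior spread; identical (hypothetical) evidence.
s1_mean, s1_sd = normal_update(0.0, 1.0, observation=5.0, noise_sd=1.0)
s2_mean, s2_sd = normal_update(0.0, 10.0, observation=5.0, noise_sd=1.0)

print(f"Scientist 1 posterior mean: {s1_mean:.3f}")
print(f"Scientist 2 posterior mean: {s2_mean:.3f}")
```

Scientist 1 ends up halfway between the prior and the data (2.5), while Scientist 2 moves almost all the way to the observation, because his wider prior carries much less weight.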

Next, let's imagine you are interested in a true/false statement. Since there are only two possibilities (law of excluded middle), you can represent your beliefs with a Bernoulli distribution. This distribution has only one parameter, p, and its variance is equal to p(1-p). Therefore, your estimate p tells me everything there is to know about how certain you are.

If you claim "I believe the statement is true with probability 50%" you have committed yourself to updating that probability only by the likelihood ratio associated with future evidence, which depends only on the probability of the outcome given the hypothesis. This likelihood ratio simply cannot depend on how certain you are about the hypothesis.

The only meaningful interpretation of a probability on a probability is if you are unsure about what you actually believe. In other words, you are trying to make a map of your map. For example, you can say that "I believe with probability 1/4 that I believe that p=0.40, and I believe with probability 3/4 that I believe that p=0.60". This, however, logically implies that you believe the statement is true with p=0.55 (0.25 × 0.40 + 0.75 × 0.60), which is the only thing that determines how you update your beliefs in response to new evidence.
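The collapse from a two-level belief to a single number is just a weighted average (the law of total probability). A small sketch using the weights and values from the example above, with exact fractions to avoid floating-point noise; the variable names are illustrative only:

```python
from fractions import Fraction

# (weight on this belief, probability that belief assigns to the statement)
beliefs_about_beliefs = [
    (Fraction(1, 4), Fraction(40, 100)),  # "with probability 1/4, p = 0.40"
    (Fraction(3, 4), Fraction(60, 100)),  # "with probability 3/4, p = 0.60"
]

# Sanity check: the weights over candidate beliefs must sum to 1.
total_weight = sum(weight for weight, _ in beliefs_about_beliefs)

# Overall probability = sum of weight * p over the candidate beliefs.
overall_p = sum(weight * p for weight, p in beliefs_about_beliefs)

print(overall_p)         # 11/20
print(float(overall_p))  # 0.55
```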

Of course, if you obtain new information about what you truly believe (which is independent of whether the statement is true), you could update your prior on your prior. However, I fail to see what this represents or why this idea would be useful.

You do not need a probability distribution on your probability distribution to represent uncertainty.

I think I do.

The uncertainty is captured by the spread (variance) of your prior.

First, my prior is a probability distribution, isn't it? Second, some but not all uncertainty is captured by the variance of my prior. For example, I could be uncertain about the shape of the distribution -- say, it might be skewed but I'm not sure whether it actually is. Or I don't know whether I'm looking at a Student's-t (which e.g. has a defined mean) or I'm looking a...