Hey all,

After reading "How to Measure Anything", I've experimented a bit with calibration training using Doug Hubbard's calibration tools. Having been convinced by his data on the usefulness of calibration in real-world forecasting, I've seen a big improvement in my own calibration.

I'm wondering if anybody knows of similar tools and studies for calibrating Bayesian updating. Broadly, I imagine it would look like this:

1. Using the tools and calibration methods I already use to figure out how the feeling of "correctness" of my prior correlates to a numerical value.

2. Using similar (but probably not identical) tools to figure out how "convincing" the new data feels, and how that feeling correlates to specific numbers.

3. Combining these two numbers via Bayes' theorem, such that I know approximately how much to update the original feeling to reflect the new information (see the sketch after this list).

4. Using mnemonic or visualization techniques to pair the new feeling with the belief, so that the next time I recall the belief, I feel the slightly different calibration.
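
For concreteness, here's a minimal sketch of how steps 1-3 might fit together, assuming step 1 has already mapped the prior feeling to a probability and step 2 has mapped the "convincingness" feeling to a likelihood ratio (the function and the numbers are purely illustrative, not an established tool):

```python
def bayes_update(prior_prob, likelihood_ratio):
    """Combine a calibrated prior probability with a calibrated
    likelihood ratio ("how convincing the new data feels") using
    the odds form of Bayes' theorem."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Example: a prior feeling calibrated to 30%, plus evidence that feels
# "moderately convincing" (calibrated, say, to a 3:1 likelihood ratio).
print(bayes_update(0.30, 3.0))  # ~0.56
```

Step 4 would then be about attaching that ~56% feeling back to the belief itself.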

Anyway, I'm curious whether anyone has experimented with this kind of process, whether there's any research on it, or whether it has been discussed on LessWrong before. I'd definitely like to lock down a similar procedure for myself.

I should note that many times I already do this naturally... but my guess is that I systematically over- and under-update the feeling thanks to confirmation bias. I'd like to recalibrate my recalibration :).


It would be easier to use your calibration curve directly, I think. Does it not represent how you should feel given how you do feel?

Calibration is important, but pretty easy to fake, whether deliberately or not. I can think of one person on PredictionBook who I believe to be highly overconfident. I've caught them making postdictions on the website, which unfortunately still count towards their calibration curve. They also "jump the gun" and judge a prediction as confirming their belief before the resolution date.

Further, if you put the right confidence interval around your calibration curve, you'll see that it takes a very large number (hundreds) of predictions for each probability to actually determine if you are well or poorly calibrated.
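
To put rough numbers on that, here's a back-of-the-envelope sketch using the normal approximation to the binomial (the bucket and counts are illustrative): to tell a true hit rate of 70% apart from, say, 60%, the interval has to be narrower than that gap, which takes a few hundred predictions in that bucket alone.

```python
import math

def ci_halfwidth(p, n, z=1.96):
    """Approximate 95% confidence half-width for the observed frequency,
    given n predictions made at stated probability p (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# How tightly can the true hit rate in the "70%" bucket be pinned down?
for n in (10, 50, 100, 500):
    print(n, round(ci_halfwidth(0.7, n), 3))
# 10 predictions  -> +/- 0.284
# 50 predictions  -> +/- 0.127
# 100 predictions -> +/- 0.09
# 500 predictions -> +/- 0.04
```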

Given this, I think "calibration" should refer to a set of uniform, predetermined predictions that are precise enough that you can't wiggle out of them, and also judged by an impartial third party. And you need to make a LOT of predictions.

(Also, I imagine calibration is subject-dependent. I've tried to estimate the mass of objects, and I'm not sure whether being good or bad at that says much about my ability to make political predictions, for example.)

[anonymous]

I'm not sure what you're getting at in your first sentence. Are you saying that my feelings will be updated automatically given new evidence? This does happen to some extent, but as I've noted, I suspect that how much I change the feeling is supremely uncalibrated to an actual Bayesian update - I'd like to calibrate how much those feelings change based on evidence. There's a whole category of biases around the concept of "updating beliefs based on new information" - and I'd like to systematically reduce all those biases in one fell swoop with this training.

That's an interesting take by Nate on gaming calibration. I haven't noticed that tendency myself - mostly because I don't keep a global prediction curve for myself, but rather create a new one every time I do calibration training, and only look at it a few times a session. I'd like to use PredictionBook more, but I haven't set up a habit of recording predictions in it.

I think you can definitely hone calibration in an individual field, but in my experience (and based on the research of Doug Hubbard) there's definitely a global calibration factor. Just remembering to use the equivalent bet test and the absurdity test significantly improves my predictions. To get back to your mass example, I actually did a training session once on object weight, and another on predicting calories, and I found that even though I don't have much experience with either of those things, I was actually quite well calibrated. The only difference was that my confidence intervals were significantly wider than in fields where I'm more confident.

Perhaps I don't understand what you were suggesting in the first place. I interpreted what you wrote as incorporating your calibration curve into a Bayesian update to make better-calibrated predictions. But the calibration curve itself is basically the actual probability that something will occur, conditioned on your stated confidence. So you can use the calibration curve to get an idea of what your real confidence level should be. For example, if you say you are 70% confident in a certain prediction, look at your calibration curve. Let's say the curve shows that only 60% of the things you assign 70% confidence to actually occur. Then you have reason to believe that assigning a probability of 60% would be better.
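
In code, that lookup could be nothing more than interpolation over your recorded calibration data (the numbers below are made up to match the 70%-to-60% example, not anyone's real curve):

```python
import numpy as np

# Hypothetical calibration data: stated confidence vs. observed frequency.
stated   = np.array([0.5, 0.6, 0.7, 0.8, 0.9, 0.99])
observed = np.array([0.5, 0.55, 0.6, 0.7, 0.8, 0.9])

def calibrated(confidence):
    """Map a stated confidence onto the historically observed frequency
    via linear interpolation of the calibration curve."""
    return float(np.interp(confidence, stated, observed))

print(calibrated(0.70))  # -> 0.6, as in the example above
```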

Similar ideas are used in other contexts, e.g., here's the idea applied to scheduling.

This seems to be the most straightforward way to take your calibration information into account. Maybe I don't understand what you're suggesting, though. If you could give an example (with numbers), I'd be interested.

I suppose you're looking to make Bayesian updates more accurate, now that I think about it, and I don't know if anyone has a systematic way of doing this. I have used various probability rules, Bayes' theorem among them, to calculate conditional probabilities to base my predictions on (e.g., here I demonstrate myself to be overconfident about how likely I would be to regret getting a vasectomy), but this is not feeling-based, which seems to be what you are going for. I don't really trust my feelings here, but perhaps doing this would help calibrate them.

[anonymous]

Yes, the primary benefit of calibration training for me has been that I can now say "hmm, if I 'feel' this confident about something, it's 90% likely; if I 'feel' this confident about something after running it through my basic calibration exercises, it's 80% likely," and so on. Also, if I'm asked to give a numerical estimate, I'm very good at giving a 90% confidence interval that contains the true value 90% of the time.

If you haven't used calibration training in this way, I highly recommend it.

In terms of what I'm trying to accomplish, you're right that I want a way to make my Bayesian updates more accurate. Part of the problem in training this is that I AM a bit fuzzy on the math. I mostly get it when talking about toy problems like drawing balls from a cup or figuring out how likely a disease test is to give false positives, but it gets very confusing how all the numbers work when you're talking about a prior where you thought you were 30% likely to regret a vasectomy, and then you get new information suggesting the base rate is 10% of the population (and of course, that's still a relatively simple example where you have hard numbers).

My basic idea was to calibrate on toy problems, then use the same feelings-based approach - "OK, I know that this feeling correlates to x decibels of evidence" - but I don't really have a surefire plan beyond that.
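
If "decibels of evidence" just means 10 * log10 of the likelihood ratio, the arithmetic for attaching a number to that feeling could look like this sketch (the specific values are only illustrative):

```python
import math

def decibels(likelihood_ratio):
    """Strength of evidence in decibels: 10 * log10 of the likelihood ratio."""
    return 10 * math.log10(likelihood_ratio)

def update_with_decibels(prior_prob, evidence_db):
    """Add the evidence (in decibels) to the prior log-odds, then convert back."""
    prior_db = 10 * math.log10(prior_prob / (1 - prior_prob))
    posterior_odds = 10 ** ((prior_db + evidence_db) / 10)
    return posterior_odds / (1 + posterior_odds)

# 3:1 evidence is about 4.8 dB; applied to a 30% prior it gives roughly 56%.
print(decibels(3))                              # ~4.77
print(update_with_decibels(0.30, decibels(3)))  # ~0.56
```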

Here's my idea to get better at doing updates:

  1. Estimate the base rate.
  2. Based on your predicted base rate, estimate a conditional probability based on new information.
  3. Compare the estimated base rate against the actual base rate.
  4. Using the actual base rate, now estimate a new conditional probability.
  5. Compare both the estimated conditional probabilities against the actual conditional probability.

So, I think there are multiple levels here. You want to make sure you get the base rate part right. You also want to make sure that you get the update right. You can see how well calibrated you are for each. You might find that you're okay at estimating conditional probabilities, but bad at estimating the base rate, etc.
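
As a rough sketch of how you might score yourself on this exercise (all numbers here are hypothetical):

```python
# Each round: (estimated base rate, true base rate,
#              step-2 conditional estimate, step-4 revised estimate, true conditional).
rounds = [
    (0.20, 0.10, 0.50, 0.35, 0.30),
    (0.40, 0.45, 0.60, 0.65, 0.70),
]

for est_br, true_br, est_cond, rev_cond, true_cond in rounds:
    print("base-rate error:           ", round(abs(est_br - true_br), 2))
    print("conditional error (step 2):", round(abs(est_cond - true_cond), 2))
    print("conditional error (step 4):", round(abs(rev_cond - true_cond), 2))

# Comparing the step-2 and step-4 errors over many rounds shows whether your
# misses come mainly from bad base rates or from bad updates.
```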

I tend not to use my old estimates as a prior. I'm not an expert at Bayesian probability (so maybe I get all of this wrong!). I interpret what I'm looking for as a conditional probability, maybe with an estimated prior/base rate (which you could call your "old estimate", I guess). I prefer data whenever it is available.

The toy problems are okay, and I'm sure you can generate a lot of them.

The vasectomy example was much less straightforward than I would have expected. I spent at least 10 minutes rearranging different equations for the conditional probability before finding one I could evaluate with the data I could actually find. The problem is that the data you can find in the literature often does not fit neatly into a simple statement of Bayes' rule.
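
For what it's worth, the rearrangement that usually bridges that gap is expanding the denominator with the law of total probability, since sources tend to report P(B|A), P(B|not-A), and P(A) rather than P(B) directly:

P(A|B) = P(B|A) P(A) / [ P(B|A) P(A) + P(B|not-A) (1 - P(A)) ]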

Another example I found to be useful was computing my risk for developing a certain cancer. The base rate of this cancer is very low, but I have a family member who developed the cancer (and recovered, thankfully), and the relative risk for me is considerably higher. I had felt this gave me a probability of developing the cancer on the order of 10% or so, but doing the math showed that while it was higher than the base rate, it's still basically negligible. This sounds to me like the sort of exercise you want to do.
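
To a first approximation (valid when the base rate is small), that calculation is just base rate times relative risk. Here's a sketch with hypothetical numbers, since the actual figures aren't in the comment:

```python
# Back-of-the-envelope cancer-risk calculation; the numbers are hypothetical.
base_rate = 0.002      # lifetime risk in the general population
relative_risk = 2.5    # elevated risk from having an affected first-degree relative

my_risk = base_rate * relative_risk
print(my_risk)  # 0.005, i.e. 0.5% -- higher than the base rate, far below 10%
```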