Wow, that's a great point. We can't measure anyone's "true" calibration by asking them a specific set of questions, because we're not drawing questions from the same distribution as nature! That's up there with the obvious-in-retrospect point that the placebo effect gets stronger or weaker depending on the size of the placebo group in the experiment. Good work :-)
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.