Overconfidence is the main failure of calibration that people tend to make in the published research. If LWers are barely overconfident, then that is pretty interesting.
I used linear regression because perfect calibration is reflected by a linear relationship between subjective probability and correct answers, with a slope of 1.
If you prefer, here is a graph in the same style that Yvain used.

X-axis shows subjective probability, with responses divided into 11 bins (<5, <15, ..., <95, and 95+). Y-axis shows proportion correct in each bin, blue dots show data from all LWers on all calibration questions (after data cleaning), and the line indicates perfect calibration. Dots below the line indicate overconfidence, dots above the line indicate underconfidence. Sample size for the bins ranges from 461 to 2241.
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.