Sorry, I misread your comment at first. You were careful to say that you were talking about three different biases, whereas most people say there is one right way to orient each question.
But you weren't careful to say that calibration (the measure of over- and underconfidence) is different from bias. That makes four questions here: your three biases plus calibration. Introducing new questions that make sense at 50% is irrelevant to the fact that calibration doesn't make sense at 50%. A 50% prediction reads the same whichever way the question is worded, so it cannot reveal over- or underconfidence. If we are only measuring calibration, the predictions made at 50% are wasted. Adding a test of bias doesn't rescue them; that part of the calibration test is still wasted. Forcing predictions away from 50% does improve the calibration test, and I don't think it harms the test of bias.
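A minimal Python sketch of the wording point, under stated assumptions: each prediction is a (confidence in percent, came true) pair, and predictions are restated so their confidence is at least 50% (a 30% claim becomes a 70% claim about its negation). The names `negate`, `normalize`, and `hit_rate` and the sample data are hypothetical. Rewording every question leaves the 70% bin unchanged, but it flips the outcomes in the 50% bin, so that bin's hit rate reflects how the questions happened to be phrased rather than calibration.

```python
from fractions import Fraction
from typing import Optional

# A prediction: (confidence in percent that the statement is true, whether it came true).
Prediction = tuple[int, bool]


def negate(pred: Prediction) -> Prediction:
    """Reword the question as its negation: same belief, opposite phrasing."""
    p, came_true = pred
    return (100 - p, not came_true)


def normalize(pred: Prediction) -> Prediction:
    """Restate a prediction so its confidence is at least 50% (assumed convention)."""
    p, _ = pred
    return negate(pred) if p < 50 else pred


def hit_rate(preds: list[Prediction], level: int) -> Optional[Fraction]:
    """Share of normalized predictions at `level` that came true (None if the bin is empty)."""
    outcomes = [came_true for p, came_true in map(normalize, preds) if p == level]
    if not outcomes:
        return None
    return Fraction(sum(outcomes), len(outcomes))


# Hypothetical predictions, then the same beliefs with every question worded the other way.
preds = [(50, True), (50, True), (50, False), (70, True), (70, True), (30, False)]
reworded = [negate(pred) for pred in preds]

print(hit_rate(preds, 70), hit_rate(reworded, 70))  # prints: 1 1   (wording-independent)
print(hit_rate(preds, 50), hit_rate(reworded, 50))  # prints: 2/3 1/3   (depends on wording)
```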
Ideally, we would look at everything, but is it worth the effort? If we start with one thing, what matters most? I think overconfidence is the biggest problem, so that is where to start. In some sense the annotations you suggest are not much more work, but when the choice is between doing an exercise and not doing it at all, I think small increments of effort matter.
(While most people are overconfident and calibration exercises are mainly about reducing overconfidence, the problem of 50% is actually a problem of underconfidence.)
TL;DR: Prediction & calibration parties are an exciting way for your EA/rationality/LessWrong group to practice rationality skills and celebrate the new year.
On December 30th, Seattle Rationality held a prediction party. Around 15 people showed up, brought snacks, brewed coffee, and spent several hours making predictions for 2017 and assigning a confidence level to each one.
This was heavily inspired by Scott Alexander’s yearly predictions. (2014 results, 2015 results, 2016 predictions.) Our move was to turn this into a communal activity, with a few alterations to meet our needs and make it work better in a group.
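Scott's results posts report, for each confidence level, how many of the predictions made at that level came true. A group can score its own predictions the same way at the end of the year. The sketch below is one hypothetical way to do that tally in Python; it assumes each prediction is recorded as a (confidence in percent, came true) pair already restated to be at least 50%, and the `calibration_table` name and sample results are made up for illustration.

```python
from collections import defaultdict


def calibration_table(preds):
    """Tally predictions by stated confidence.

    `preds` is a list of (confidence in percent, came_true) pairs, assumed to be
    restated so every confidence is at least 50%. Returns
    {confidence: (number of predictions, number that came true)}, sorted by confidence.
    """
    counts = defaultdict(lambda: [0, 0])
    for level, came_true in preds:
        counts[level][0] += 1
        counts[level][1] += came_true  # True counts as 1, False as 0
    return {level: (n, hits) for level, (n, hits) in sorted(counts.items())}


# Hypothetical end-of-year results for one person's 2017 predictions.
results = [(60, True), (60, False), (60, True),
           (90, True), (90, True), (90, False), (90, True)]

for level, (n, hits) in calibration_table(results).items():
    # A hit rate well below the stated level suggests overconfidence at that level,
    # though with this few predictions per level the gap is mostly noise.
    print(f"{level}%: {hits}/{n} came true ({100 * hits / n:.0f}%)")
```

With the sample data this prints `60%: 2/3 came true (67%)` and `90%: 3/4 came true (75%)`. Keeping raw counts rather than only percentages makes it easy to pool everyone's predictions at a party by concatenating their lists.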
Procedure:
To make this work in a group, we recommend the following:
This makes a good activity for rationality/EA groups for the following reasons:
Some examples of the predictions people used:
Also relevant: