If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.
People use PredictionBook to make predictions about many heterogeneous questions, in order to train calibration. Couldn't we train calibration more efficiently by making a very large number of predictions about a fairly small, homogeneous group of questions?
For instance, at the moment people are producing a single probability for each of n questions about e.g. what will happen in HPMOR's final arc. This has a high per-question cost (people must think up individual questions, formalize them, judge edge cases, etc.) and you only get one piece of data from each question (the probability assigned to the correct outcome).
Suppose instead we get some repeatable, homogeneous question-template with a numerical answer, e.g. "what time is it?", "how many dots are in this randomly-generated picture?", or "how long is the Wikipedia article named _?". Then instead of producing only one probability for each question, you give your {1,5,10,20,...,90,95,99}-percentile estimates. Possible advantages of this approach:
Possible disadvantages of this approach:
Overall, I'd guess:
An alternative, roughly between the two groups discussed above, would be to find some repeatable way of generating questions that are at least slightly interesting. For instance, play online Mafia and privately make lots of predictions about which players have which roles, who will be lynched or murdered, etc. Or predict chess or poker. Or predict karma scores of LW/Reddit comments. Or use a spaced repetition system, but before it shows the answer, estimate the probability that you got the answer right. Any better ideas?
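For any of these probability-style predictions (Mafia roles, or "did I get this flashcard right?"), scoring could be as simple as the rough sketch below. It assumes you keep a log of (stated probability, outcome) pairs; the function names and the rounding to the nearest 10% are illustrative choices, not something PredictionBook or any SRS tool provides.

```python
from collections import defaultdict

def calibration_table(predictions):
    """predictions: iterable of (stated_probability, outcome) pairs,
    where outcome is True if the predicted event happened.
    Returns {bucket: (count, observed_frequency)}, with stated
    probabilities bucketed to the nearest 0.1 (an arbitrary choice)."""
    buckets = defaultdict(list)
    for p, outcome in predictions:
        buckets[round(p, 1)].append(1 if outcome else 0)
    return {p: (len(hits), sum(hits) / len(hits))
            for p, hits in sorted(buckets.items())}

def brier_score(predictions):
    """Mean squared gap between stated probability and outcome (0 or 1).
    Lower is better; always answering 0.5 scores 0.25."""
    preds = list(predictions)
    return sum((p - (1 if o else 0)) ** 2 for p, o in preds) / len(preds)

# Made-up log: five flashcards, each with "probability I got it right".
log = [(0.9, True), (0.9, True), (0.9, False), (0.6, True), (0.6, False)]
print(calibration_table(log))
print(brier_score(log))
```

If you are well calibrated, the observed frequency in each bucket should drift toward the bucket's stated probability as the counts grow. With only a handful of predictions per bucket the table is mostly noise, which is the argument for mass-producing predictions in the first place.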
You can predict how long tasks/projects will take you (stopwatch and/or calendar time). Even if calibration doesn't generalize, it's potentially useful on its own there. And while you can't quite mass-produce questions/predictions, it's not such a hassle to rack up a lot if you do them in batches. Malcolm Ocean wrote about doing this with a spreadsheet, and I threw together an Android todo-with-predictions app for a similar self-experiment.
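Task-time predictions also fit the percentile scheme from the parent comment: for each task, write down a few percentile estimates of the duration, then later check how often the actual time fell at or below each one. A minimal sketch, assuming a hand-kept log and a made-up set of percentiles (the names here are hypothetical and not taken from the spreadsheet or app mentioned above):

```python
# Illustrative percentiles; the parent comment suggests a finer grid.
PERCENTILES = (5, 25, 50, 75, 95)

def percentile_coverage(records, percentiles=PERCENTILES):
    """records: list of (estimates, actual) pairs, where estimates[i] is
    your guess for the percentiles[i]-th percentile of the task's duration
    and actual is how long it really took (same units).
    Returns {percentile: fraction of tasks with actual <= estimate}."""
    counts = {q: 0 for q in percentiles}
    for estimates, actual in records:
        for q, est in zip(percentiles, estimates):
            if actual <= est:
                counts[q] += 1
    n = len(records)
    return {q: counts[q] / n for q in percentiles}

# Made-up log, durations in minutes.
tasks = [
    ((10, 20, 30, 40, 60), 35),
    ((5, 10, 15, 20, 30), 12),
    ((20, 30, 40, 50, 70), 80),
    ((10, 15, 20, 25, 40), 9),
]
print(percentile_coverage(tasks))
```

For a calibrated estimator, roughly q% of actual durations should land at or below the q-th percentile estimate. If far fewer than 95% of tasks finish within your 95th-percentile guess, your intervals are too narrow.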