Cool idea! I'm not sure you'd be able to move to real-money betting, given that cheating is trivial (just google the text of the article).
Here's an alpha of the Firefox version!
If you run into any problems, it would be great to hear about them (e.g. by email).
> most potentially dangerous capabilities should be highly correlated, such that measuring any of them should be okay. Thus, I think it should be fine to mostly focus on measuring the capabilities that are most salient to policymakers and most clearly demonstrate risks.
Once labs are trying to pass capability evaluations, they will spend effort trying to suppress the specific capabilities being evaluated*, so I think we'd expect those capabilities to stop being so highly correlated.
* If they try more general methods of suppressing the kinds of capabilities that might be dangerous, I think they're likely to test those methods mostly on the capabilities being evaluated by RSPs.
We've added a new deck of questions to the calibration training app: The World, then and now.
What was the world like 200 years ago, and how has it changed? Featuring charts from Our World in Data.
Thanks to Johanna Einsiedler and Jakob Graabak for helping build this deck!
We've also split the existing questions into decks, so you can focus on the topics you're most interested in:
This should be fixed now (it was a timezone-related bug!).
I've also added the ability to import your forecasts from a spreadsheet/CSV file, which I think is useful for switching tools: fatebook.io/import-from-spreadsheet
I've now added this! You can also see your track record for questions with specific tags, e.g.:
I think it's still very useful to be able to predict your own behaviour (including when you know you've made a prediction about it).
Things can get weird if you care more about the outcome of the prediction than about the outcome of the event itself, but that should rarely be the case, and it's worth avoiding, I think.