ChristianWilliams

Comments

Metaculus is conducting its first user survey in nearly three years. If you have read analyses, consumed forecasts, or made predictions on Metaculus, we want to hear from you! Your feedback helps us better meet the needs of the forecasting community and is incredibly important to us. 


Take the short survey here — we truly appreciate it! (We'll be sure to share what we learn.) 

Hi @gwern, we are currently in the process of combing through winners' documentation of their bots and which models they used. We haven't yet encountered anyone who claims to have used one of the base models. 

We will share here if we learn a participant did indeed use one. 

Maybe "sidestep the data leakage issue" then. The series was designed with these issues in mind. (I work at Metaculus.)  

Hi @Odd anon, thanks for the feedback and questions. 

1. To your point about copying the Community Prediction (CP): It's true that if you copied the CP at all times, you would receive a high Baseline Accuracy score. The CP is generally a great forecast! That said, CP hidden periods do mitigate this issue somewhat. We are monitoring user behavior on this front and will address it if it becomes a problem. We do have some ideas in our scoring trade-offs doc for further ways to address CP copying, e.g.:

We could have a leaderboard that only considers the last prediction made before the hidden period ends to calculate Peer scores. This largely achieves the goal above: it rewards judgement, it does not require updates or tracking the news constantly. It does not reward finding stale questions.

Have a look here, and let us know what you think! (We also have some ideas we're tinkering with that are not listed in that doc, like accuracy metrics that don't include forecasts that are on the CP or +/- some delta.)
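To make the two ideas above concrete, here is a rough Python sketch. It is only an illustration, not how Metaculus computes anything: the `Forecast` record, its fields, and both helper functions are hypothetical, and it assumes the CP at the time of each forecast is recorded alongside it.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    user: str
    time: float         # hypothetical timestamp, e.g. hours since the question opened
    probability: float  # forecast probability on a binary question
    community: float    # assumed: the CP at the moment the forecast was made


def last_before_hidden_end(forecasts, hidden_end):
    """Keep each user's last forecast made before the hidden period ends."""
    latest = {}
    for f in sorted(forecasts, key=lambda f: f.time):
        if f.time < hidden_end:
            latest[f.user] = f  # later forecasts overwrite earlier ones
    return latest


def exclude_near_cp(forecasts, delta):
    """Drop forecasts that sit on the CP or within +/- delta of it."""
    return [f for f in forecasts if abs(f.probability - f.community) > delta]


forecasts = [
    Forecast("a", 1.0, 0.60, 0.50),
    Forecast("a", 3.0, 0.70, 0.50),
    Forecast("a", 6.0, 0.90, 0.80),  # made after the hidden period ended
    Forecast("b", 2.0, 0.52, 0.50),  # very close to the CP
]

scored_once = last_before_hidden_end(forecasts, hidden_end=5.0)
independent = exclude_near_cp(forecasts, delta=0.05)
```

Whatever score is used would then be computed only over `scored_once.values()` or `independent`. The choice of `delta` is a judgement call: too small and CP copying still slips through, too large and honest forecasts that happen to agree with the crowd get dropped.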

2. On indicating confidence: You'll see in the trade-offs doc that we're also considering letting users exclude a particular forecast from their Peer score (Idea #3), which could somewhat address this. (Interestingly, indicating confidence was attempted at the Good Judgment Project, but it ultimately didn't work and was abandoned.)

We're continuing to develop ideas on the above, and we'd definitely welcome further feedback!

Interesting read, thanks for writing it up. FYI the link "The report on the 2022 results is now available" leads to a private Google Drive file.