I would love to see an analysis and overview of predictions from the Dwarkesh podcast with Leopold. One for Situational Awareness would be great too.
That's good to know - transcripts from Dwarkesh's podcast are one of the things I'd be most excited about evaluating too, and I agree the one with Leopold seems like a great one to start with.
As uncertainty grows around how AI development will affect culture and society, it becomes more valuable to compare track records of predictions about technological progress.
I've recently been working on automating parts of the methodology from Arb's Scoring The Big 3's Predictive Performance report[1], and have had some promising preliminary results. I hope to automate most of the steps in the original report, making it feasible to analyse many more track records and publish the results.
I am particularly interested in the following questions:
See also the original Cold Takes post explaining why such evaluations are valuable.