As uncertainty grows around how AI development will affect culture and society, it becomes more valuable to compare track records of predictions about technological progress. 

I've recently been working on automating parts of the methodology from Arb's Scoring The Big 3's Predictive Performance report[1], and have had some promising preliminary results. I hope to automate most of the steps in the original report, which would make it feasible to analyse many more track records and publish the results.

I am particularly interested in the following questions:

  1. Which track record(s) would you find valuable to have evaluated in a similar way to Asimov, Clarke and Heinlein’s, as in the Arb report?
  2. What would you want to see from an LLM-based evaluation that would give you confidence that the results are meaningful and accurate?

 

  [1] See also the original Cold Takes post explaining why such evaluations are valuable.



George Ingebretsen


I would love to see an analysis and overview of predictions from the Dwarkesh podcast with Leopold. One for Situational Awareness would be great too.

That's good to know. Transcripts from Dwarkesh's podcast are one of the things I'd be most excited about evaluating too, and I agree the one with Leopold seems like a great place to start.


Whose track record of AI predictions would you like to see evaluated?

Whoever has the best track record :)
