The number you use from Holden says that he puts the probability of AGI by 2036 at more than 10%. But when fitting the curves you treat that as exactly 10%, which will predictably produce an underestimate. It seems better to fit the curves without that datapoint and simply check that the result is higher than 10%.
Thanks very much for catching this. We've updated the extrapolation to only consider the two datapoints that are precisely specified. With so few points, the extrapolation isn't all that trustworthy, so we've also added some language to (hopefully) make that clear.
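The fix described above can be sketched as follows. This is a hypothetical illustration, not the curve or the datapoints used in the review: it assumes a two-parameter exponential CDF `P(AGI by year t) = 1 - exp(-lam * (t - t0))`, fits it exactly through two precisely specified datapoints, and then checks (rather than enforces) that the implied probability by 2036 exceeds Holden's 10% lower bound.

```python
import math

def fit_exponential_cdf(t1, p1, t2, p2):
    """Solve for (t0, lam) so the curve passes through both datapoints.

    Uses ln(1 - p) = -lam * (t - t0), so the ratio of the two log terms
    pins down t0, and either point then gives lam.
    """
    r = math.log(1 - p1) / math.log(1 - p2)
    t0 = (t1 - r * t2) / (1 - r)
    lam = -math.log(1 - p1) / (t1 - t0)
    return t0, lam

def prob_by(t, t0, lam):
    """Cumulative probability of AGI by year t under the fitted curve."""
    return 1 - math.exp(-lam * (t - t0)) if t > t0 else 0.0

# Illustrative datapoints (NOT the review's actual numbers):
# P(AGI by 2050) = 0.5 and P(AGI by 2100) = 0.8.
t0, lam = fit_exponential_cdf(2050, 0.5, 2100, 0.8)

# Consistency check against the loosely specified bound, instead of
# fitting to it: is the implied P(AGI by 2036) above 10%?
p_2036 = prob_by(2036, t0, lam)
```

The point of the structure is that the ">10% by 2036" statement is used only as a sanity check on the fit, so it never drags the curve down toward exactly 10%.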
To improve the review, an important addition would be to account for the degree to which different methods influence one another.
E.g., Holden and Ajeya influence one another heavily through conversations. And as for Metaculus and Samotsvety, they already incorporate the other models, most notably the bioanchors framework. Perhaps you are already correcting for this in the weighted average?
Also, note that e.g., Ajeya uses her own judgment to set the weights for the different models within the bioanchors framework.
Overall, I think there is currently a severe echo-chamber effect within most of the forecasts, which leads me to weight full outside views, such as the semi-informative priors, much higher.
We summarize and compare several models and forecasts predicting when transformative AI will be developed.
Highlights
Introduction
Over the last few years, we have seen many attempts to quantitatively forecast the arrival of transformative and/or general Artificial Intelligence (TAI/AGI) using very different methodologies and assumptions. Keeping track of and assessing these models’ relative strengths can be daunting for a reader unfamiliar with the field. As such, the purpose of this review is to:
For aggregating internal weights, we split the timelines into “model-based” and “judgment-based” timelines. Model-based timelines are given by the output of an explicit model. In contrast, judgment-based timelines are either aggregates of group predictions on, e.g., prediction markets, or the timelines of some notable individuals. We decompose in this way as these two categories roughly correspond to “prior-forming” and “posterior-forming” predictions respectively.
In both cases, we elicit subjective probabilities from each Epoch team member reflective of:
respectively. Weights are normalized and linearly aggregated across the team to arrive at a summary probability. These numbers should not be interpreted too literally as exact credences, but rather as a rough approximation of how the team views the “relative trustworthiness” of each model/forecast.
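The normalize-then-linearly-aggregate step can be sketched as below. All member names, model names, and numbers are hypothetical; the procedure is simply: normalize each member's raw weights to sum to 1, then take the unweighted mean across members for each model/forecast.

```python
# Hypothetical raw weights elicited from each team member,
# one weight per model/forecast.
raw_weights = {
    "member_a": {"bioanchors": 3.0, "semi_informative_priors": 2.0, "metaculus": 1.0},
    "member_b": {"bioanchors": 1.0, "semi_informative_priors": 1.0, "metaculus": 2.0},
}

def normalize(weights):
    """Scale one member's weights so they sum to 1."""
    total = sum(weights.values())
    return {model: w / total for model, w in weights.items()}

# Normalize per member, then linearly aggregate (average) across the team.
normalized = {member: normalize(w) for member, w in raw_weights.items()}
models = next(iter(raw_weights.values())).keys()
summary = {
    model: sum(normalized[member][model] for member in normalized) / len(normalized)
    for model in models
}
```

Because each member's weights sum to 1 and the aggregation is a plain average, the summary weights also sum to 1 across models, so they can be read directly as the relative weight placed on each model/forecast.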
Caveats
Results
Read the rest of the review here