Planned summary:
One [methodology](https://www.overcomingbias.com/2012/08/ai-progress-estimate.html) for forecasting AI timelines is to ask experts how much progress toward human-level AI has been made within their subfield over the last T years. You can then extrapolate linearly to see when 100% of the problem will be solved. The post linked above collects such estimates, with a typical estimate being that 5% of the problem was solved in the twenty-year period between 1992 and 2012. Overall these estimates imply a timeline of [372 years](https://aiimpacts.org/surveys-on-fractional-progress-towards-hlai/).
This post provides a reductio argument against this pair of methodology and estimate. The core argument is that linear extrapolation effectively says “assume that business continues as usual: how long does it take then?” But “business as usual” over the last 20 years involved a ~1000x increase in the amount of compute used by AI researchers, so this effectively says that we’ll get to human-level AI after a 1000^(372/20) ≈ 10^56 increase in the amount of available compute. (The authors do a somewhat more careful calculation that separates price improvements from GDP growth, and get 10^53.)
This is a stupendously large amount of compute: it far dwarfs the amount of compute used by evolution, and even dwarfs the maximum amount of irreversible computing we could have done with all the energy that has ever hit the Earth over its lifetime (the bound comes from [Landauer’s principle](https://en.wikipedia.org/wiki/Landauer%27s_principle)).
Given that evolution _did_ produce intelligence (us), we should reject the argument. But what should we make of the expert estimates then? One interpretation is that “proportion of the problem solved” behaves more like an exponential, because the inputs are growing exponentially, and so the time taken to do the last 90% can be much less than 9x the time taken for the first 10%.
Planned opinion:
This seems like a pretty clear reductio to me, though it is possible to argue that this argument doesn’t apply because compute isn’t the bottleneck, i.e. even with infinite compute we wouldn’t know how to make AGI. (That being said, I mostly do think we could build AGI if only we had enough compute; see also <@last week’s highlight on the scaling hypothesis@>(@The Scaling Hypothesis@).)
"Overall these estimates imply a timeline of [372 years](https://aiimpacts.org/surveys-on-fractional-progress-towards-hlai/)."
That was only for Hanson's convenience sample; other surveys using the method gave much shorter timelines, as discussed in the post.
Ah, fair point. Looking back at this summary, I probably should have clarified that the methodology could be applied to other samples, and that those yield much shorter timelines.
"Extrapolating the historic 10x fall in $/FLOP every 7.7 years for 372 years yields a 10^48x increase in the amount of compute that can be purchased for that much money (we recognize that this extrapolation goes past physical limits)."
If you are aware that this extrapolation goes past physical limits, why are you using it in your models? Why not use a model where compute plateaus after it reaches those physical limits? That seems more useful than a model that knowingly breaks the laws of physics.
In truth, I expect many respondents believe that the trajectory for such things is slowing down massively as we speak. [We do appear to have passed an inflection point toward slowing down]. There are good technological reasons for this belief, and we would need extremely advanced technologies we have no current concept of to get even a small fraction of the way there. Granted, that has always worked out before, but it is quite understandable not to expect it to continue.
Believing extrapolations of trends have the status of law is very dangerous, but very tempting.
FWIW, Hanson has elsewhere promoted the idea that algorithmic progress is primarily due to hardware progress. Relevant passage:
Maybe there are always lots of decent ideas for better algorithms, but most are hard to explore because of limited computer hardware. As hardware gets better, more new ideas can be explored, and some of them turn out to improve on the prior best algorithms. This story seems to at least roughly fit what I’ve heard about the process of algorithm design.
So he presumably would endorse the claim that HLMI will likely require several tens of OOMs more compute than we currently have, but that a plateau in other inputs (such as AI researchers) won't be as relevant. (Here's also another post of Hanson's where he endorses a somewhat related claim that we should expect exponential increases in hardware to translate to roughly linear social impact and rate of automation.)
This post was written by Mark Xu based on interviews with Carl Shulman. It was paid for by Open Philanthropy but is not representative of their views. A draft was sent to Robin Hanson for review but received no response.
Summary
Introduction
Suppose that you start with $1 that grows at 10% per year. At this rate, it will take ~241 years to get to $10 billion ($10^10). When will you think that you’re ten percent of the way there?
You might say that you’re ten percent of the way to $10 billion when you have $1 billion. However, since your money is growing exponentially, it takes 217 years to go from $1 to $1 billion and only 24 more to go from $1 billion to $10 billion, even though the latter gap is larger in absolute terms. If you tried to guess when you would have $10 billion by taking 10x the amount of time to $1 billion, you would guess 2174 years, off by a factor of nine.
Instead, you might say you’re ten percent of the way to $10^10 when you have $10^1, equally spacing the percentile markers along the exponent and measuring progress in terms of log(wealth). Since your money is growing perfectly exponentially, multiplying the number of years it takes to go from $1 to $10 by ten will produce the correct amount of time it will take to go from $1 to $10^10.
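The arithmetic behind this example, as a quick sketch (not from the original post; the growth rate and dollar targets are the ones used above):

```python
import math

def years_to_reach(target, start=1.0, growth=0.10):
    """Years for `start` dollars to reach `target` at a fixed annual growth rate."""
    return math.log(target / start) / math.log(1 + growth)

print(years_to_reach(1e10))      # ~241.6 years to reach $10^10
print(years_to_reach(1e9))       # ~217.4 years to reach $10^9
print(10 * years_to_reach(1e9))  # ~2174: naive "10x the time to 10% of the dollars"
print(10 * years_to_reach(10))   # ~241.6: 10x the time to 10% of log(wealth) is exact here
```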
When employing linear extrapolations, choosing a metric that better tracks progress, such as log(wealth) rather than raw wealth in the investment example, can make an enormous difference to forecast accuracy.
Hanson’s AI timelines estimation methodology
Hanson’s preferred method for estimating AI timelines begins with asking experts what percentage of the way to human-level performance the field has come in the last n years, and whether progress has been stable, slowing, or accelerating. In Hanson’s convenience sample of his AI acquaintances, he reports typical answers of 5-10% progress at a stable rate over 20 years, and gives a similar estimate himself. He then produces an estimate of the time to human-level performance by dividing the 20-year period by the fraction of progress, yielding estimates of 200-400 years. From Age of Em:
Before we engage with the substance of this estimate, we should note that a larger, more systematic recent survey using this methodology gives much shorter timeline estimates and more reports of acceleration, as summarized by AI Impacts:
Directly asking researchers for their timeline estimates also gives much shorter estimates than using the above methodology on Hanson’s informal survey. Overall we think responses to these sorts of surveys are generally not very considered, so we will focus on the object level henceforth.
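As a minimal sketch of the division described above (using only the 20-year period and the 5-10% answers already quoted):

```python
def hanson_extrapolation(fraction_done, period_years=20):
    """Linear extrapolation: total years to reach 100% at the reported rate of progress."""
    return period_years / fraction_done

print(hanson_extrapolation(0.05))  # 400.0 years at 5% progress per 20 years
print(hanson_extrapolation(0.10))  # 200.0 years at 10% progress per 20 years
```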
Inputs to AI research have grown enormously in the past
William Nordhaus reports:
Computation available for AI has grown enormously in the decades before these surveys. If, for example, 5-10% progress is attributed to 20 years in which computation for AI research increased 1000x, then Hanson’s extrapolation method is effectively predicting many more such periods of explosive growth in compute inputs before human-level AI performance is reached.
We have a number of metrics showing this massive input growth:
Extrapolating past input growth yields ludicrously high estimates of the resource requirements for human level AI performance
Hanson’s survey was conducted in 2012, so we must use trends from the two decades prior when extrapolating. We conservatively ignore the major historical growth in spending on AI compute as a fraction of the economy, and especially the surge in investment in large models that drove Amodei and Hernandez’s finding of annual 10x growth in compute used in the largest deep learning models over several years. Accounting for that would yield even more extreme estimates in the extrapolation.
Extrapolating world GDP’s historical 3.5% growth for 372 years yields a 10^5x increase in the amount of money spent on the largest training run. Extrapolating the historic 10x fall in $/FLOP every 7.7 years for 372 years yields a 10^48x increase in the amount of compute that can be purchased for that much money (we recognize that this extrapolation goes past physical limits). Together, 372 years of world GDP growth and Moore’s law yields a 10^53x increase in the amount of available compute for a single training run. Assuming GPT-3 represents the current frontier at 10^23 floating point operations (FLOP), multiplying suggests that 10^76 FLOP of compute will be available for the largest training run in 2393.
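The same calculation as a rough back-of-the-envelope sketch (the growth rates and the 10^23 FLOP frontier figure are the ones quoted above; the function name is ours):

```python
import math

GDP_GROWTH = 0.035                # ~3.5%/yr historical world GDP growth
DOLLAR_PER_FLOP_10X_YEARS = 7.7   # historic 10x fall in $/FLOP every 7.7 years
FRONTIER_FLOP = 1e23              # ~10^23 FLOP for GPT-3, taken as the current frontier

def extrapolated_flop(years):
    """Compute available for the largest training run after `years` of these trends."""
    spending_growth = (1 + GDP_GROWTH) ** years                     # richer world, same GDP share
    price_improvement = 10 ** (years / DOLLAR_PER_FLOP_10X_YEARS)   # cheaper FLOP
    return FRONTIER_FLOP * spending_growth * price_improvement

# ~10^76.9 FLOP; the text's 10^76 comes from rounding the factors to 10^5 * 10^48 * 10^23.
print(f"10^{math.log10(extrapolated_flop(372)):.1f} FLOP")
```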
10^76 FLOP is vast relative to the evolutionary process that produced animal and human intelligence on Earth, and ludicrous overkill for existing machine learning methods to train models vastly larger than human brains:[1]
Some of Hanson’s writing suggests he is indeed endorsing these sorts of requirements. E.g. in one post he writes:
This seems to be saying that the timeline estimates he is using are indeed based on input growth rather than serial time, and so AGI requires multiplying the log input growth (referred to as doublings above) of past 20-year periods many times over.
Conclusion
So on the object level we can reject such estimates of the resource requirements for human-level performance. If extrapolating survey responses about fractional progress per Hanson yields such absurdities, we should instead believe that respondents’ subjective progress estimates will accelerate as a function of resource inputs; in particular, that they will be end-loaded, putting higher weight on the final orders of magnitude of input growth. This position is supported by the enormous differences in cognitive capabilities between humans and chimpanzees despite less than an order of magnitude difference in the quantity of brain tissue. It seems likely that ‘chimpanzee AI’ would be rated as a lot less than 90% progress towards human-level performance, yet chimpanzees only appeared after 99%+ of the timespan of evolution and 90%+ of the growth in brain size.
This view also reconciles better with the survey evidence reporting acceleration and direct timeline estimates much shorter than Hanson’s.
For the more recent survey estimate of 142 years, the same assumptions give 10^43 FLOP (more if past growth in spending is incorporated). This is more arguable but still extremely high: it suggests evolutionary levels of compute, without any of the obvious efficiencies of intentional human design, and with scaling much worse than observed for deep learning today.
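For reference, the same sketch applied to the 142-year estimate (same assumptions as above):

```python
import math

# Same trends as before: 3.5%/yr GDP growth, 10x cheaper FLOP every 7.7 years, 10^23 FLOP frontier.
flop = 1e23 * (1.035 ** 142) * 10 ** (142 / 7.7)
print(f"10^{math.log10(flop):.1f} FLOP")  # ~10^43.6; rounding the factors as before gives ~10^43
```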
Linear extrapolations of the aggregate survey responses and of the deep learning subsample, which yield timelines of several decades, still suffer a version of this problem, primarily because of the unsustainably rapid growth of expenditures: for example, it is impossible to maintain annual 10x growth in compute for the largest models for 30 years. While those respondents report much more rapid progress and acceleration than Hanson’s survey respondents, we would still expect more acceleration in subjective progress estimates as we come closer to across-the-board superhuman performance.
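As a rough illustration of why that pace is unsustainable (the ~$10M cost of today’s largest training runs is an assumption for this sketch, not a figure from the post, and it treats the ~10x/yr compute growth as coming entirely from spending):

```python
import math

LARGEST_RUN_COST = 1e7   # assumed ~$10M for today's largest training runs (illustrative)
ANNUAL_GROWTH = 10       # ~10x/yr growth, treated here as pure spending growth
WORLD_GDP = 1e14         # world GDP is on the order of $10^14 (~$100 trillion)

# Years until a single training run would cost more than the entire world economy.
years = math.log(WORLD_GDP / LARGEST_RUN_COST) / math.log(ANNUAL_GROWTH)
print(years)                                   # ~7 years
print(LARGEST_RUN_COST * ANNUAL_GROWTH ** 30)  # ~$10^37 after 30 years of 10x/yr growth
```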
[1] We don’t think that anthropic distortions conceal large amounts of additional difficulty because of evolutionary timings and convergent evolution, combined with already existing computer hardware and software’s demonstrated capabilities. See Shulman and Bostrom (2012) for more details.