I think there is a fundamental misunderstanding of the nature of software performance in this kind of argument.
Software performance, according to any metric of your choice (speed, memory usage, energy consumption, etc.) is fundamentally a measure of efficiency.
For any given task, and any given hardware architecture, there is one program that maximizes the performance metric: that's 100% efficiency.
The fact that efficiency is bounded means that you can't keep doubling it. If your program is 25% efficient, then the best you can hope for is to double its efficiency twice and then you are done.
In practice, when you try to improve the efficiency of a program, you quickly run into diminishing returns: you get the biggest gains from choosing the proper general forms of the algorithms and data structures, and then the more you fiddle with the details, down to the machine code level, the smaller the gains you get for the effort.
In fact, it can be shown that obtaining the most efficient program for a given problem is uncomputable in the general case.
Therefore, self-improving AI or not, you only get so far with software improvements. So you are left with hardware improvements, which bring us to another...
I find the use of schematic differential equations, as if they actually meant something, to be horrifically bad. Yudkowsky's original point in Hard Takeoff was that there is no a priori reason to expect that an agent that can RSI should improve at a rate that humans can react to.
Even naive dimensional analysis is enough to show that these equations don't mean anything.
Saying that "compiler technology" has only made floating point programs 8 times faster is somewhat too much of an apples-to-apples comparison. Sure, if you take the exact same Fortran program and recompile it decades later you may only see an 8x speedup (I'd have guessed 2x or 4x, myself, depending on how much the hardware benefits from vectorized instructions). But if you instead take a more modern program designed to solve the same higher-level problem, you are more likely to see a speedup of three orders of magnitude. Graph from the SCaLeS Report, Vol. 2, D. Keyes et al., eds., 2004; it specifically refers to magnetohydrodynamics simulations but it's pretty typical of a wide class of computational mathematics problems.
An AGI will presumably be able to optimize not only its own source code compilation, but also its own algorithm choices. That process will also eventually hit diminishing returns, but who knows how many orders of magnitude it could get before things plateau? The first AGI is likely to be using a lot of relatively new and suboptimal algorithms almost by definition of "first".
I've wondered about the possibility of FOOM-FLOP. Eventually, the AI is exploring unknown territory as it tries to improve itself, and it seems at least possible that it tries something plausible which breaks itself. Backups are no guarantee of safety-- the AI could have "don't use the backups" as part of the FLOP.
In effect the AI would need to be provably friendly to its past self.
A handful of the many, many problems here:
It would be trivial for even a Watson-level AI, specialized to the task, to hack into pretty much every existing computer system; almost all software is full of holes and is routinely hacked by bacterium-complexity viruses
"The world's AI researchers" aren't remotely close to a single entity working towards a single goal; a human (appropriately trained) is much more like that than Apple, which is much more like that than the US government, which is much more like that than a nebulous cluster of people
Relatedly, if intelligence is a complicated, heterogeneous process where computation is spread relatively evenly among many modules, then improving the performance of an AGI gets tougher, because upgrading an individual module does little to improve the performance of the system as a whole.
And to see orders-of-magnitude performance improvement in such a process, almost all of your AGI's components will need to be improved radically. If even a few prove troublesome, improving your AGI's thinking speed becomes difficult.
The economy is ...
Just a sidenote:
Average neuron density within that brain hardware.
There is no strictly linear correlation between neuron density and brain power:
An impressive abstract showcasing synaptic pruning:
The aim of this study was to quantify the total number of neurons and glial cells in the mediodorsal nucleus of the thalamus (MD) of 8 newborn human brains, in comparison to 8 adult human brains. (...) In the case of the adults, the total number of neurons in the entire MD was an average of 41% lower than in the newborn (...).
Human cognition is fundamentally limited by biological drives, tiredness, boredom, limited working memory, and low precision. Humans can't recursively improve their own minds and so our exponential growth rate is constant. An AI's improvement rate will not be constant and so I think it is unreasonable to estimate the rate of exponential growth of an AI based on how long it takes human researchers to develop an AI with an equivalent level of ability.
For instance, let's say that in 2080 we develop an AI capable of designing itself from scratch in exactly 8...
I wanted to talk a bit more about what biology may or may not tell us about the ease of AGI.
This OB post discusses the importance of brain hardware differences in intelligence. One of the papers mentioned writes:
...It remains open whether humans have truly unique cognitive properties. Experts recognize aspects of imitation, theory of mind, grammatical–syntactical language and consciousness in non-human primates and other large-brained mammals. This would mean that the outstanding intelligence of humans results not so much from qualitative differences, but
Let's assume AGI that's on par with the world AI research community is reached in 2080 (LW's median "singularity" estimate in 2011). We'll pretend AI research has only been going on since 2000, meaning 80 "standard research years" of progress have gone in to the AGI's software. So at the moment our shiny new AGI is fired up, u = 80, and it's doing research at the rate of one "human AGI community research year" per year, so du/dt = 1. That's an effective rate of return on...
I haven't read this paper in detail, but it seems to suggest that Moore's Law-style exponential growth may not be that far off for most technologies:
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0052669
(Which probably counts as a point for Foom proponents.)
A flurry of insights that either dies out or expands exponentially doesn't seem like a very good description of how human minds work, and I don't think it would describe an AGI well either.
That's how technological evolution works, though. If you're in olden-days-Tasmania, you get devolution. Otherwise, you get progress. There's a threshold effect involved. We have to reproduce the progress seen in cultural evolution - not just make a mind.
You can't know the difficulty of a problem until you've solved it. Look at Hilbert's problems. Some were solved immediately while others are still open today. Proving that you can color a map with five colors is easy and only takes up half a page. Proving that you can color a map with four colors is hard and takes up hundreds of pages. The same is true of science - a century ago physics was thought to soon be a dead field until the minor glitches with blackbody radiation and Mercury's orbit turned out to be more than minor and actually dictated by mathemat
A point against there being important, chunky undiscovered insights in to intelligence is that if there were such insights, they'd likely be simple, and if they'd be simple, they likely would have been discovered already. So the fact that no one has yet discovered any such brilliant, simple idea is evidence against them existing. (I'm not the first to point out the increasing difficulty of making new contributions in math/science; gwern says anything referencing Jones on this page may be relevant. Compare the breadth and applicability of the discoveries...
Summary
Takeoff models discussed in the Hanson-Yudkowsky debate
The supercritical nuclear chain reaction model
Yudkowsky alludes to this model repeatedly, starting in this post:
I don't like this model much for the following reasons:
The "differential equations folded on themselves" model
This is another model Eliezer alludes to, albeit in a somewhat handwavey fashion:
It's not exactly clear to me what the "whole chain of differential equations" is supposed to refer to... there's only one differential equation in the preceding paragraph, and it's a standard exponential (which could be scary or not, depending on the multiplier in the exponent. Rabbit populations and bank account balances both grow exponentially in a way that's slow enough for humans to understand and control.)
Maybe he's referring to the levels he describes here: metacognitive, cognitive, metaknowledge, knowledge, and object. How might we parameterize this system?
Let's say c is our AGI's cognitive ability, dc/dt is the rate of change in that ability, m is our AGI's "metaknowledge" (about cognition and metaknowledge), and dm/dt is the rate of change in metaknowledge. What I've got in mind is:
dc/dt = p * c * m
dm/dt = q * c * m
where p and q are constants.
In other words, the rate of change of cognitive ability and the rate of change of metaknowledge are each directly proportional to both cognitive ability and metaknowledge.
I don't know much about understanding systems of differential equations, so if you do, please comment! I put the above system in to Wolfram Alpha, but I'm not exactly sure how to interpret the solution provided. In any case, fooling around with this script suggests sudden, extremely sharp takeoff for a variety of different test parameters.
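For anyone who wants to poke at the numbers, here is a minimal sketch of that kind of experiment. It assumes the system is dc/dt = p * c * m and dm/dt = q * c * m (a reading of the verbal description above, not a confirmed copy of the original equations), and it uses a plain forward-Euler loop. The function name, step size, and threshold are illustrative choices and are not taken from the script mentioned above.

```python
def time_to_threshold(p, q, c0, m0, threshold=1e6, dt=1e-4, t_max=100.0):
    """Forward-Euler integration of dc/dt = p*c*m, dm/dt = q*c*m.

    Returns the first time at which c exceeds `threshold`, or None if
    that does not happen before t_max. (Assumed model; see lead-in.)
    """
    c, m, t = c0, m0, 0.0
    while t < t_max:
        growth = c * m          # both rates share the same c*m factor
        c += dt * p * growth
        m += dt * q * growth
        t += dt
        if c > threshold:
            return t
    return None


if __name__ == "__main__":
    # With p == q and c0 == m0 the system reduces to dc/dt = p*c**2,
    # whose exact solution c0 / (1 - p*c0*t) diverges at t = 1/(p*c0).
    for p in (0.5, 1.0, 2.0):
        print(p, time_to_threshold(p, p, 1.0, 1.0))
```

In the symmetric case the exact solution is hyperbolic rather than exponential: it reaches infinity in finite time, which would be consistent with the very sharp takeoff seen across test parameters. Whether that behavior says anything about real systems depends entirely on whether the assumed equations do.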
The straight exponential model
To me, the "proportionality thesis" described by David Chalmers in his singularity paper, "increases in intelligence (or increases of a certain sort) always lead to proportionate increases in the capacity to design intelligent systems", suggests a single differential equation that looks like
du/dt = s * u
where u represents the number of upgrades that have been made to an AGI's source code, and s is some constant. The solution to this differential equation is going to look like
u(t) = c1 * e^(s * t)
where the constant c1 is determined by our initial conditions.
(In Recursive Self-Improvement, Eliezer calls this a "too-obvious mathematical idiom". I'm inclined to favor it for its obviousness, or at least use it as a jumping-off point for further analysis.)
Under this model, the constant s is pretty important... if u(t) was the amount of money in a bank account, s would be the rate of return it was receiving. The parameter s will effectively determine the "doubling time" of an AGI's intelligence. It matters a lot whether this "doubling time" is on the scale of minutes or years.
So what's going to determine s? Well, if the AGI's hardware is twice as fast, we'd expect it to come up with upgrades twice as fast. If the AGI had twice as much hardware, and it could parallelize the search for upgrades perfectly (which seems like a reasonable approximation to me), we'd expect the same thing. So let's decompose s and make it the product of two parameters: h representing the hardware available to the AGI, and r representing the ease of finding additional improvements. The AGI's intelligence will be on the order of u * h, i.e. the product of the AGI's software quality and hardware capability.
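As a rough illustration of how this decomposition plays out, the sketch below assumes the model du/dt = (r * h) * u, so that s = r * h and the doubling time is ln(2) / s. The function names and the sample values of r and h are hypothetical, chosen only to show the scaling.

```python
import math


def doubling_time(r, h):
    """Doubling time of u under du/dt = (r*h) * u, i.e. ln(2) / s with s = r*h."""
    return math.log(2) / (r * h)


def software_quality(t, u0, r, h):
    """Closed-form solution u(t) = u0 * exp(r*h*t) of the same equation."""
    return u0 * math.exp(r * h * t)


if __name__ == "__main__":
    r = 0.01  # hypothetical ease-of-improvement parameter, per year
    for h in (1, 2, 4):  # hypothetical hardware multiples
        print(f"h = {h}: doubling time = {doubling_time(r, h):.1f} years")
```

Under these assumptions, doubling the hardware halves the doubling time, which is just another way of saying that s scales linearly with h.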
Considerations affecting our choice of model
Diminishing returns
The consideration here is that the initial improvements implemented by an AGI will tend to be those that are especially easy to implement and/or especially fruitful to implement, with subsequent improvements tending to deliver less intelligence bang for the implementation buck. Chalmers calls this "perhaps the most serious structural obstacle" to the proportionality thesis.
To think about this consideration, one could imagine representing a given improvement as a pair of values (u, d). u represents the factor by which existing performance will be multiplied, e.g. if u is 1.1, then implementing this improvement will improve performance by a factor of 1.1. d represents the cognitive difficulty, or amount of intellectual labor, required to implement a given improvement. If d is doubled, then at any given level of intelligence, implementing this improvement will take twice as long (because it will be harder to discover and/or harder to translate in to code).
Now let's imagine ordering our improvements in order from highest to lowest u to d ratio, so we implement those improvements that deliver the greatest bang for the buck first.
Thus ordered, let's imagine separating groups of consecutive improvements in to "tiers". Each tier's worth of improvements, when taken together, will represent the doubling of an AGI's software quality, i.e. the product of the u's in that tier will be roughly 2. For a steady doubling time, each tier's total difficulty will need to sum to approximately twice the difficulty of the tier before it, because the AGI working on it is twice as capable. If tier difficulty tends to more than double, we're likely to see sub-exponential growth. If tier difficulty tends to less than double, we're likely to see super-exponential growth. If a single improvement delivers a more-than-2x improvement, it will span multiple "tiers".
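The tier argument can be sketched numerically. The code below makes two simplifying assumptions that are not in the text: tier difficulty grows by a fixed ratio from one tier to the next, and the AGI works on each tier at the software quality it had when the tier started. The function name and parameters are hypothetical.

```python
def tier_durations(difficulty_ratio, n_tiers, first_difficulty=1.0, start_quality=1.0):
    """Time spent on each tier when work rate is proportional to software quality.

    Tier k has difficulty first_difficulty * difficulty_ratio**k and is worked
    on at the quality reached at the start of the tier (start_quality * 2**k).
    """
    durations = []
    difficulty, quality = first_difficulty, start_quality
    for _ in range(n_tiers):
        durations.append(difficulty / quality)
        difficulty *= difficulty_ratio
        quality *= 2.0  # finishing a tier doubles software quality
    return durations


if __name__ == "__main__":
    for ratio in (1.5, 2.0, 3.0):
        times = tier_durations(ratio, 6)
        print(ratio, [round(x, 3) for x in times])
```

With a ratio of exactly 2 every tier takes the same time (steady doubling). A ratio above 2 makes each doubling slower than the last, and a ratio below 2 makes each doubling faster, so the total time for all remaining tiers stays finite.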
It seems to me that the quality of fruit available at each tier represents a kind of logical uncertainty, similar to asking whether an efficient algorithm exists for some task, and if so, how efficient.
On this diminishing returns consideration, Chalmers writes:
Eliezer Yudkowsky's objection is similar:
First, hunter-gatherers can't design toys that are a thousand times as neat as the ones chimps design--they aren't programmed with the software modern humans get through the education (some may be unable to count), and educating apes has produced interesting results.
Speaking as someone who's basically clueless about neuroscience, I can think of many different factors that might contribute to intelligence differences within the human race or between humans and other apes:
It seems to me like these factors (or ones like them) may multiply together to produce intelligence, i.e. the "intelligence equation", as it were, could be something like intelligence = processing_speed * cc_abstract_hardware * neuron_density * connections_per_neuron * propensity_for_abstraction * mental_algorithms. If the ancestral environment rewarded intelligence, we should expect all of these characteristics to be selected for, and this could explain the "low acceleration factor" in human intelligence increase. (Increasing your processing speed by a factor of 1.2 does more when you're already pretty smart, so all these sources of intelligence increase would feed in to one another.)
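To make the "feeds in to one another" point concrete, here is a toy sketch of the multiplicative form. The factor names come from the equation above; the numeric values are made up purely for illustration and carry no empirical meaning.

```python
import math

FACTOR_NAMES = (
    "processing_speed",
    "cc_abstract_hardware",
    "neuron_density",
    "connections_per_neuron",
    "propensity_for_abstraction",
    "mental_algorithms",
)


def intelligence(factors):
    """Product of all factors, as in the toy 'intelligence equation'."""
    return math.prod(factors[name] for name in FACTOR_NAMES)


def gain_from_speedup(factors, multiplier=1.2):
    """Absolute intelligence gained by multiplying processing_speed alone."""
    boosted = dict(factors, processing_speed=factors["processing_speed"] * multiplier)
    return intelligence(boosted) - intelligence(factors)


if __name__ == "__main__":
    modest = {name: 1.0 for name in FACTOR_NAMES}  # made-up baseline
    strong = {name: 2.0 for name in FACTOR_NAMES}  # made-up stronger baseline
    print(gain_from_speedup(modest), gain_from_speedup(strong))
```

The same 1.2x change yields a larger absolute gain when the other factors are already high, which is the sense in which improvements to separate factors compound under this assumed form.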
In other words, it's not that clear what relevance the evolution of human intelligence has to the ease and quality of the upgrades at different "tiers" of software improvements, since evolution operates on many non-software factors, but a self-improving AI (properly boxed) can only improve its software.
Bottlenecks
In the Hanson/Yudkowsky debate, Yudkowsky declares Douglas Engelbart's plan to radically bootstrap his team's productivity through improving their computer and software tools "insufficiently recursive". I agree with this assessment. Here's my modelling of this phenomenon.
When a programmer makes an improvement to their code, their work of making the improvement requires the completion of many subtasks:
Each of those subtasks will consist of further subtasks like poking through their code, staring off in to space, typing, and talking to their rubber duck.
Now the programmer improves their development environment so they can poke through their code slightly faster. But if poking through their code takes up only 5% of their development time, even an extremely large improvement in code-poking abilities is not going to result in an especially large increase in their development speed... in the best case, where code-poking time is reduced to zero, the programmer will only work about 5% faster.
This is a reflection of Amdahl's Law-type thinking. The amount you can gain through speeding something up depends on how much it's slowing you down.
Relatedly, if intelligence is a complicated, heterogeneous process where computation is spread relatively evenly among many modules, then improving the performance of an AGI gets tougher, because upgrading an individual module does little to improve the performance of the system as a whole.
And to see orders-of-magnitude performance improvement in such a process, almost all of your AGI's components will need to be improved radically. If even a few prove troublesome, improving your AGI's thinking speed becomes difficult.
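The bottleneck reasoning above can be written down as a small calculation. This is a sketch of the standard Amdahl's Law formula plus a straightforward generalization to many modules; the function names and the 20-module example are illustrative assumptions, not figures from the debate.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the work is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def overall_speedup(module_shares, module_factors):
    """Each module i takes share_i of total time and is sped up by factor_i.

    Shares are assumed to sum to 1.
    """
    return 1.0 / sum(share / factor for share, factor in zip(module_shares, module_factors))


if __name__ == "__main__":
    # Code-poking is 5% of the work and becomes instantaneous.
    print(amdahl_speedup(0.05, float("inf")))

    # Hypothetical AGI with 20 equally weighted modules: 18 are improved
    # 1000x, 2 prove troublesome and are not improved at all.
    shares = [0.05] * 20
    factors = [1000.0] * 18 + [1.0] * 2
    print(overall_speedup(shares, factors))
```

In the second example the overall speedup comes out just under 10x even though 90% of the system got a thousandfold improvement, because the two unimproved modules now dominate the running time.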
Case studies in technological development speed
Moore's Law
From this McKinsey report. So Moore's Law is an outlier where technological development is concerned. I suspect that making transistors smaller and faster doesn't require finding ways to improve dozens of heterogeneous components. And when you zoom out to view a computer system as a whole, other bottlenecks typically appear.
(It's also worth noting that research budgets in the semiconductor industry have risen greatly since its inception, though obviously not following the same curve that chip speeds have.)
Compiler technology
This paper on "Proebsting's Law" suggests that the end result of all the compiler research done between 1970 or so and 2001 was that a typical integer-intensive program was compiled to run 3.3 times faster, and a typical floating-point-intensive program was compiled to run 8.1 times faster. When it comes to making programs run quickly, it seems that software-level compiler improvements are swamped by hardware-level chip improvements--perhaps because, like an AGI, a compiler has to deal with a huge variety of different scenarios, so improving it in the average case is tough. (This represents supertask heterogeneity, rather than subtask heterogeneity, so it's a different objection than the one mentioned above.)
Database technology
According to two analyses (full paper for that second one), it seems that improvement in database performance benchmarks has largely been due to Moore's Law.
AI (so far)
Robin Hanson's blog post "AI Progress Estimate" was the best resource I could find on this.
Why smooth exponential growth implies soft takeoff
Let's suppose we consider all of the above, deciding that the exponential model is the best, and we agree with Robin Hanson that there are few deep, chunky, undiscovered AI insights.
Under the straight exponential model, if you recall, we had
du/dt = r * u * h
where u is the degree of software quality, h is the hardware availability, and r is a parameter representing the difficulty of doing additional upgrades. Our AGI's overall intelligence is given by u * h--the quality of the software times the amount of hardware.
Now we can solve for r by substituting in human intelligence for u * h, and substituting in the rate of human AI progress for du/dt. Another way of saying this is: When the AI is as smart as all the world's AI researchers working together, it will produce new AI insights at the rate that all the world's AI researchers working together produce new insights. At some point our AGI will be just as smart as the world's AI researchers, but we can hardly expect to start seeing super-fast AI progress at that point, because the world's AI researchers haven't produced super-fast AI progress.
Let's assume AGI that's on par with the world AI research community is reached in 2080 (LW's median "singularity" estimate in 2011). We'll pretend AI research has only been going on since 2000, meaning 80 "standard research years" of progress have gone in to the AGI's software. So at the moment our shiny new AGI is fired up, u = 80, and it's doing research at the rate of one "human AGI community research year" per year, so du/dt = 1. That's an effective rate of return on AI software progress of 1 / 80 = 1.25%, giving a software quality doubling time of around 55 years (ln 2 / 0.0125).
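The arithmetic in this calibration is easy to reproduce. The sketch below plugs the numbers from the paragraph above into the exponential model, assuming growth is continuous so that the doubling time is ln(2) divided by the rate of return; the rule-of-72 figure is included only as the familiar back-of-envelope approximation. Variable names are illustrative.

```python
import math

u = 80.0        # accumulated "standard research years" of software progress
du_dt = 1.0     # research years produced per calendar year at AGI startup

s = du_dt / u                       # effective rate of return per year
doubling_exact = math.log(2) / s    # continuous-growth doubling time
doubling_rule72 = 72 / (100 * s)    # rule-of-72 approximation

print(f"s = {s:.4f}")
print(f"ln(2)/s = {doubling_exact:.1f} years")
print(f"rule of 72 = {doubling_rule72:.1f} years")
```

Both estimates land in the mid-to-high fifties of years, so the qualitative conclusion (a doubling time measured in decades rather than minutes) does not hinge on which approximation is used. It does hinge heavily on the assumed starting values of u and du/dt.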
You could also apply this kind of thinking to individual AI projects. For example, it's possible that at some point EURISKO was improving itself about as fast as Doug Lenat was improving it. You might be able to do a similar calculation to take a stab at EURISKO's insight level doubling time.
The importance of hardware
According to my model, you double your AGI's intelligence, and thereby the speed with which your AGI improves itself, by doubling the hardware available for your AGI. So if you had an AGI that was interesting, you could make it 4x as smart by giving it 4x the hardware. If an AGI that was 4x as smart could get you 4x as much money (through impressing investors, or playing the stock market, or monopolizing additional industries), that'd be a nice feedback loop. For maximum explosivity, put half your AGI's mind to the task of improving its software, and the other half to the task of making more money with which to buy more hardware.
But it seems pretty straightforward to prevent a non-superintelligent AI from gaining access to additional hardware with careful planning. (Note: One problem with AI boxing experiments thus far is that all of the AIs have been played by human beings. Human beings have innate understanding of human psychology and possess specialized capabilities for running emulations of one another. It seems pretty easy to prevent an AGI from acquiring such understanding. But there may exist box-breaking techniques that don't rely on understanding human psychology. Another note about boxing: FAI requires getting everything perfect, which is a conjunctive calculation. Given multiple safeguards, only one has to work for the box as a whole to work, which is a disjunctive calculation.)
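The conjunctive/disjunctive contrast in that last note can be illustrated with a toy calculation. It assumes the individual components succeed or fail independently, which is a strong assumption for real safeguards, and the 0.9 probabilities are made up for the example.

```python
import math


def all_must_work(probabilities):
    """Conjunctive case: success requires every component to work."""
    return math.prod(probabilities)


def at_least_one_works(probabilities):
    """Disjunctive case: success requires only one safeguard to hold."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)


if __name__ == "__main__":
    parts = [0.9] * 5  # hypothetical, independent success probabilities
    print(f"all five must work:   {all_must_work(parts):.5f}")
    print(f"any one of five works: {at_least_one_works(parts):.5f}")
```

Under independence, adding components makes the conjunctive probability fall and the disjunctive probability rise. Correlated failures (for example, one flaw that defeats several safeguards at once) would weaken the disjunctive advantage.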
AGI's impact on the economy
Is it possible that the first group to create a successful AGI might begin monopolizing different sections of the economy? Robin Hanson argues that technology insights typically leak between different companies, due to conferences and employee poaching. But we can't be confident these factors would affect the research an AGI does on itself. And if an AGI is still dumb enough that a significant portion of its software upgrades are coming from human researchers, it can hardly be considered superintelligent.
Given what looks like a winner-take-all dynamic, an important factor may be the number of serious AGI competitors. If there are only two, the #1 company may not wish to trade insights with the #2 company for fear of losing its lead. If there are more than two, all but the leading company might ally against the leading company in trading insights. If their alliance is significantly stronger than the leading company, perhaps the leading company would wish to join their alliance.
But if AI is about getting lots of details right, as Hanson suggests, improvements may not even transfer between different AI architectures.
What should we do?
I've argued that soft takeoff is a strong possibility. Should that change our strategy as people concerned with x-risk?
If we are basically screwed in the event that hard takeoff is possible, it may be that preparing for a soft takeoff is a better use of resources on the margin. Shane Legg has proposed that people concerned with friendliness become investors in AGI projects so they can affect the outcome of any that seem to be succeeding.
Concluding thoughts
Expert forecasts are famously unreliable even in the relatively well-understood field of political forecasting. So given the number of unknowns involved in the emergence of smarter-than-human intelligence, it's hard to say much with certainty. Picture a few Greek scholars speculating on the industrial revolution.
I don't have a strong background in these topics, so I fully expect that the above essay will reveal my ignorance, which I'd appreciate your pointing out in the comments. This essay should be taken as an attempt to hack away at the edges, not come to definitive conclusions. As always, I reserve the right to change my mind about anything ;)