Followup to: Cascades, Cycles, Insight, Recursion, Magic
We seem to have a sticking point at the concept of "recursion", so I'll zoom in.
You have a friend who, even though he makes plenty of money, just spends all that money every month. You try to persuade your friend to invest a little - making valiant attempts to explain the wonders of compound interest by pointing to analogous processes in nature, like fission chain reactions.
"All right," says your friend, and buys a ten-year bond for $10,000, with an annual coupon of $500. Then he sits back, satisfied. "There!" he says. "Now I'll have an extra $500 to spend every year, without my needing to do any work! And when the bond comes due, I'll just roll it over, so this can go on indefinitely. Surely, now I'm taking advantage of the power of recursion!"
"Um, no," you say. "That's not exactly what I had in mind when I talked about 'recursion'."
"But I used some of my cumulative money earned, to increase my very earning rate," your friend points out, quite logically. "If that's not 'recursion', what is? My earning power has been 'folded in on itself', just like you talked about!"
"Well," you say, "not exactly. Before, you were earning $100,000 per year, so your cumulative earnings went as 100000 * t. Now, your cumulative earnings are going as 100500 * t. That's not really much of a change. What we want is for your cumulative earnings to go as B * e^At for some constants A and B - to grow exponentially."
"Exponentially!" says your friend, shocked.
"Yes," you say, "recursification has an amazing power to transform growth curves. In this case, it can turn a linear process into an exponential one. But to get that effect, you have to reinvest the coupon payments you get on your bonds - or at least reinvest some of them, instead of just spending them all. And you must be able to do this over and over again. Only then will you get the 'folding in' transformation, so that instead of your cumulative earnings going as y = F(t) = A*t, your earnings will go as the differential equation dy/dt = F(y) = A*y whose solution is y = e^(A*t)."
(I'm going to go ahead and leave out various constants of integration; feel free to add them back in.)
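The contrast between the two habits can be checked numerically. A minimal sketch, using illustrative figures (a $100,000 bond portfolio at 5%, forty years) rather than the exact numbers from the dialogue:

```python
# Spending every coupon leaves total earnings linear in t;
# reinvesting every coupon makes them exponential.
# All figures here are illustrative assumptions.

def cumulative_earnings(years, reinvest, principal=100_000, rate=0.05):
    """Total money (still invested plus already spent) after `years`."""
    wealth, spent = float(principal), 0.0
    for _ in range(years):
        coupon = wealth * rate
        if reinvest:
            wealth += coupon   # coupon folds back into earning power
        else:
            spent += coupon    # coupon is consumed; principal is static
    return wealth + spent

linear = cumulative_earnings(40, reinvest=False)      # ~ P * (1 + 0.05*t)
exponential = cumulative_earnings(40, reinvest=True)  # ~ P * 1.05**t
print(f"spend every coupon: ${linear:,.0f}")
print(f"reinvest coupons:   ${exponential:,.0f}")
```

Over forty years the spender roughly triples his money; the reinvestor multiplies it about sevenfold, and the gap widens without limit.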
"Hold on," says your friend. "I don't understand the justification for what you just did there."
"Right now," you explain, "you're earning a steady income at your job, and you also have $500/year from the bond you bought. These are just things that go on generating money at a constant rate per unit time, in the background. So your cumulative earnings are the integral of that constant rate. If your earnings are y, then dy/dt = A, which solves to y = At. But now, suppose that instead of having these constant earning forces operating in the background, we introduce a strong feedback loop from your cumulative earnings to your earning power."
"But I bought this one bond here -" says your friend.
"That's not enough for a strong feedback loop," you say. "Future increases in your cumulative earnings aren't going to increase the value of this one bond, or your salary, any further. One unit of force transmitted back is not a feedback loop - it has to be repeatable. You need a sustained recursion, not a one-off event."
"Okay," says your friend, "how about if I buy a $100 bond every year, then? Will that satisfy the strange requirements of this ritual?"
"Still not a strong feedback loop," you say. "Suppose that next year your salary went up $10,000/year - no, an even simpler example; suppose $10,000 fell in your lap out of the sky. If you only buy $100/year of bonds, that extra $10,000 isn't going to make any long-term difference to the earning curve. But if you're in the habit of investing 50% of found money, then there's a strong feedback loop from your cumulative earnings back to your earning power - we can pump up the cumulative earnings and watch the earning power rise as a direct result."
"How about if I just invest 0.1% of all my earnings, including the coupons on my bonds?" asks your friend.
"Well..." you say slowly. "That would be a sustained feedback loop but an extremely weak one, where marginal changes to your earnings have relatively small marginal effects on future earning power. I guess it would genuinely be a recursified process, but it would take a long time for the effects to become apparent, and any stronger recursions would easily outrun it."
"Okay," says your friend, "I'll start by investing a dollar, and I'll fully reinvest all the earnings from it, and the earnings on those earnings as well -"
"I'm not really sure there are any good investments that will let you invest just a dollar without it being eaten up in transaction costs," you say, "and it might not make a difference to anything on the timescales we have in mind - though there's an old story about a king, and grains of wheat placed on a chessboard... But realistically, a dollar isn't enough to get started."
"All right," says your friend, "suppose I start with $100,000 in bonds, reinvest 80% of the coupons on those bonds, and roll over all the principal, at a 5% interest rate, and we ignore inflation for now."
"Then," you reply, "we have the differential equation dy/dt = 0.8 * 0.05 * y, with the initial condition y = $100,000 at t=0, which works out to y = $100,000 * e^(.04*t). Or if you're reinvesting discretely rather than continuously, y = $100,000 * (1.04)^t."
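As a sanity check on those two formulas, under the stated assumptions (5% coupons, 80% reinvested, so an effective 4% rate):

```python
# Comparing the continuous solution y = 100000 * e^(0.04 t) with the
# discrete-compounding version y = 100000 * 1.04^t.
import math

def continuous(t, y0=100_000, rate=0.8 * 0.05):
    return y0 * math.exp(rate * t)            # y = y0 * e^(0.04 t)

def discrete(t, y0=100_000, rate=0.8 * 0.05):
    return y0 * (1 + rate) ** t               # y = y0 * 1.04^t

for t in (10, 25, 50):
    print(f"t = {t:2d} yr: continuous ${continuous(t):>12,.0f}   "
          f"discrete ${discrete(t):>12,.0f}")
```

Continuous compounding always runs slightly ahead of annual compounding at the same nominal rate, but both are exponential; the distinction doesn't matter for the argument.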
We can similarly view the self-optimizing compiler in this light - it speeds itself up once, but never makes any further improvements, like buying a single bond; it's not a sustained recursion.
And now let us turn our attention to Moore's Law.
I am not a fan of Moore's Law. I think it's a red herring. I don't think you can forecast AI arrival times by using it, I don't think that AI (especially the good kind of AI) depends on Moore's Law continuing. I am agnostic about how long Moore's Law can continue - I simply leave the question to those better qualified, because it doesn't interest me very much...
But for our next simpler illustration of a strong recursification, we shall consider Moore's Law.
Tim Tyler does us the service of representing our strawman, repeatedly telling us, "But chip engineers use computers now, so Moore's Law is already recursive!"
To test this, we perform the equivalent of the thought experiment where we drop $10,000 out of the sky - push on the cumulative "wealth", and see what happens to the output rate.
Suppose that Intel's engineers could only work using computers of the sort available in 1998. How much would the next generation of computers be slowed down?
Suppose we gave Intel's engineers computers from 2018, in sealed black boxes (not transmitting any of 2018's knowledge). How much would Moore's Law speed up?
I don't work at Intel, so I can't actually answer those questions. I think, though, that if you said in the first case, "Moore's Law would drop way down, to something like 1998's level of improvement measured linearly in additional transistors per unit time," you would be way off base. And if you said in the second case, "I think Moore's Law would speed up by an order of magnitude, doubling every 1.8 months, until they caught up to the '2018' level," you would be equally way off base.
In both cases, I would expect the actual answer to be "not all that much happens". Seventeen instead of eighteen months, nineteen instead of eighteen months, something like that.
Yes, Intel's engineers have computers on their desks. But the serial speed or per-unit price of computing power is not, so far as I know, the limiting resource that bounds their research velocity. You'd probably have to ask someone at Intel to find out how much of their corporate income they spend on computing clusters / supercomputers, but I would guess it's not much compared to how much they spend on salaries or fab plants.
If anyone from Intel reads this, and wishes to explain to me how it would be unbelievably difficult to do their jobs using computers from ten years earlier, so that Moore's Law would slow to a crawl - then I stand ready to be corrected. But relative to my present state of partial knowledge, I would say that this does not look like a strong feedback loop.
However...
Suppose that the researchers themselves are running as uploads, software on the computer chips produced by their own factories.
Mind you, this is not the tiniest bit realistic. By my standards it's not even a very interesting way of looking at the Singularity, because it does not deal with smarter minds but merely faster ones - it dodges the really difficult and interesting part of the problem.
Just as nine women cannot gestate a baby in one month; just as ten thousand researchers cannot do in one year what a hundred researchers can do in a hundred years; so too, a chimpanzee cannot do in four years what a human can do in one year, even though the chimp has around one-fourth the human's cranial capacity. And likewise a chimp cannot do in 100 years what a human does in 95 years, even though they share 95% of our genetic material.
Better-designed minds don't scale the same way as larger minds, and larger minds don't scale the same way as faster minds, any more than faster minds scale the same way as more numerous minds. So the notion of merely faster researchers, in my book, fails to address the interesting part of the "intelligence explosion".
Nonetheless, for the sake of illustrating this matter in a relatively simple case...
Suppose the researchers and engineers themselves - and the rest of the humans on the planet, providing a market for the chips and investment for the factories - are all running on the same computer chips that are the product of these selfsame factories. Suppose also that robotics technology stays on the same curve and provides these researchers with fast manipulators and fast sensors. We also suppose that the technology feeding Moore's Law has not yet hit physical limits. And that, as human brains are already highly parallel, we can speed them up even if Moore's Law is manifesting in increased parallelism instead of faster serial speeds - we suppose the uploads aren't yet being run on a fully parallelized machine, and so their actual serial speed goes up with Moore's Law. Etcetera.
In a fully naive fashion, we just take the economy the way it is today, and run it on the computer chips that the economy itself produces.
In our world where human brains run at constant speed (and eyes and hands work at constant speed), Moore's Law for computing power s is:
s = R(t) = e^t
The function R is the Research curve that relates the amount of Time t passed, to the current Speed of computers s.
To understand what happens when the researchers themselves are running on computers, we simply suppose that R does not relate computing technology to sidereal time - the orbits of the planets, the motion of the stars - but, rather, relates computing technology to the amount of subjective time spent researching it.
Since in our world, subjective time is a linear function of sidereal time, this hypothesis fits exactly the same curve R to observed human history so far.
Our direct measurements of observables do not distinguish between the two hypotheses:
Moore's Law is exponential in the number of orbits of Mars around the Sun
and
Moore's Law is exponential in the amount of subjective time that researchers spend thinking, and experimenting and building using a proportional amount of sensorimotor bandwidth.
But our prior knowledge of causality may lead us to prefer the second hypothesis.
So to understand what happens when the Intel engineers themselves run on computers (and use robotics) subject to Moore's Law, we recursify and get:
dy/dt = s = R(y) = e^y
Here y is the total amount of elapsed subjective time, which at any given point is increasing according to the computer speed s given by Moore's Law, which is determined by the same function R that describes how Research converts elapsed subjective time into faster computers. Observed human history to date roughly matches the hypothesis that R is exponential with a doubling time of eighteen subjective months (or whatever).
Solving
dy/dt = e^y
yields
y = -ln(C - t)
One observes that this function goes to +infinity at a finite time C.
This is only to be expected, given our assumptions. After eighteen sidereal months, computing speeds double; after another eighteen subjective months, or nine sidereal months, computing speeds double again; etc.
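The same geometric halving can be tallied directly; each subjective doubling period takes half as much sidereal time as the one before, so the total converges (to 36 sidereal months, under the text's illustrative 18-month doubling time):

```python
# Summing the series 18 + 9 + 4.5 + ... to see why the blowup
# time is finite. The 18-month figure is the illustrative doubling
# time from the text.
doubling_subjective_months = 18.0
speed = 1.0              # computer speed relative to the starting chips
sidereal_elapsed = 0.0

for _ in range(50):      # 50 doublings, far beyond any plausible limit
    sidereal_elapsed += doubling_subjective_months / speed
    speed *= 2.0         # Moore's Law: speed doubles each subjective period

print(f"sidereal months consumed by 50 doublings: {sidereal_elapsed:.6f}")
```

However many doublings you add, the sidereal clock never passes 36 months; that limit plays the role of the finite blowup time C in y = -ln(C - t).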
Now, unless the physical universe works in a way that is not only different from the current standard model, but has a different character of physical law than the current standard model, you can't actually do infinite computation in finite time.
Let us suppose that if our biological world had no Singularity, and Intel just kept on running as a company, populated by humans, forever, that Moore's Law would start to run into trouble around 2020. Say, after 2020 there would be a ten-year gap where chips simply stagnated, until the next doubling occurred after a hard-won breakthrough in 2030.
This just says that R(y) is not an indefinite exponential curve. By hypothesis, from subjective years 2020 to 2030, R(y) is flat, corresponding to a constant computer speed s. So dy/dt is constant over this same time period: Total elapsed subjective time y grows at a linear rate, and as y grows, R(y) and computing speeds remain flat, until ten subjective years have passed. So the sidereal bottleneck lasts ten subjective years times the current sidereal/subjective conversion rate at 2020's computing speeds.
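This claim can be checked with a crude numerical integration of dy/dt = R(y). The shape of R below (doubling every 1.5 subjective years, plateau from y = 12 to y = 22) is an illustrative assumption, not the scenario's literal 2020-2030 dates:

```python
# Euler integration of dy/dt = R(y) with a flat stretch in R, checking
# that a ten-subjective-year stall occupies 10 / s_stall sidereal years.
# All shape parameters here are illustrative assumptions.

def R(y):
    """Computer speed as a function of cumulative subjective research time."""
    if y < 12.0:
        return 2.0 ** (y / 1.5)           # pre-stall exponential progress
    elif y < 22.0:
        return 2.0 ** (12.0 / 1.5)        # ten subjective years of stagnation
    else:
        return 2.0 ** ((y - 10.0) / 1.5)  # breakthrough: growth resumes

def sidereal_time_to(y_target, dt=1e-4):
    """Euler-integrate dy/dt = R(y) from y = 0 until y reaches y_target."""
    y, t = 0.0, 0.0
    while y < y_target:
        y += R(y) * dt
        t += dt
    return t

stall = sidereal_time_to(22.0) - sidereal_time_to(12.0)
print(f"sidereal length of the stall: {stall:.5f} years")
print(f"predicted 10 / s_stall:       {10.0 / 2.0 ** (12.0 / 1.5):.5f} years")
```

With computers running 256 times faster than at the start, the ten-subjective-year bottleneck flashes by in about two sidereal weeks.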
In short, the whole scenario behaves exactly like what you would expect - the simple transform really does describe the naive scenario of "drop the economy into the timescale of its own computers".
After subjective year 2030, things pick up again, maybe - there are ultimate physical limits on computation, but they're pretty damned high, and we've got a ways to go until there. But maybe Moore's Law is slowing down - going subexponential, and then as the physical limits are approached, logarithmic, and then simply giving out.
But whatever your beliefs about where Moore's Law ultimately goes, you can just map out the way you would expect the research function R to work as a function of sidereal time in our own world, and then apply the transformation dy/dt = R(y) to get the progress of the uploaded civilization over sidereal time t. (Its progress over subjective time is simply given by R.)
If sensorimotor bandwidth is the critical limiting resource, then we instead care about R&D on fast sensors and fast manipulators. We want R_sm(y) instead of R(y), where R_sm is the progress rate of sensors and manipulators, as a function of elapsed sensorimotor time. And then we write dy/dt = R_sm(y) and crank on the equation again to find out what the world looks like from a sidereal perspective.
We can verify that the Moore's Researchers scenario is a strong positive feedback loop by performing the "drop $10,000" thought experiment. Say, we drop in chips from another six doublings down the road - letting the researchers run on those faster chips, while holding constant their state of technological knowledge.
Lo and behold, this drop has a rather large impact, much larger than the impact of giving faster computers to our own biological world's Intel. Subjectively the impact may be unnoticeable - as a citizen, you just see the planets slow down again in the sky. But sidereal growth rates increase by a factor of 64.
So this is indeed deserving of the names, "strong positive feedback loop" and "sustained recursion".
As disclaimed before, all this isn't really going to happen. There would be effects like those Robin Hanson prefers to analyze, from being able to spawn new researchers as the cost of computing power decreased. You might be able to pay more to get researchers twice as fast. Above all, someone's bound to try hacking the uploads for increased intelligence... and then those uploads will hack themselves even further... Not to mention that it's not clear how this civilization cleanly dropped into computer time in the first place.
So no, this is not supposed to be a realistic vision of the future.
But, alongside our earlier parable of compound interest, it is supposed to be an illustration of how strong, sustained recursion has much more drastic effects on the shape of a growth curve, than a one-off case of one thing leading to another thing. Intel's engineers running on computers is not like Intel's engineers using computers.
I'll try to estimate as requested, but substituting fixed computing power for "riding the curve" (as Intel does now) is a bit of an apples to fruit cocktail comparison, so I'm not sure how useful it is. A more direct comparison would be with always having a computing infrastructure from 10 years in the future or past.
Even with this amendment, the (necessary) changes to design, test, and debugging processes make this hard to answer...
I'll think out loud a bit.
Here's the first quick guess I can make that I'm moderately sure of: The length of time to go through a design cycle (including shrinks and transitions to new processes) would scale pretty closely with computing power, keeping the other constraints pretty much constant. (Same designers, same number of bugs acceptable, etc.) So if we assume the power follows Moore's law (probably too simple as others have pointed out) cycles would run hundreds of times faster with computing power from 10 years in the future.
This more or less fits the reality, in that design cycles have stayed about the same length while chips have gotten hundreds of times more complex, and also much faster, both of which soak up computing power.
Probably more computing power would have also allowed faster process evolution (basically meaning smaller feature sizes) but I was never a process designer so I can't really generate a firm opinion on that. A lot of physical experimentation is required and much of that wouldn't go faster. So I'm going to assume very conservatively that the increased or decreased computing power would have no effect on process development.
The number of transistors on a chip is limited by process considerations, so adding computing power doesn't directly enable more complex chips. Leaving the number of devices the same and just cycling the design of chips with more or less the same architecture hundreds of times faster doesn't make much economic sense. Maybe instead Intel would create hundreds of times as many chip designs, but that implies a completely different corporate strategy so I won't pursue that.
In this scenario, experimentation via computing gets hundreds of times "cheaper" than in our world, so it would get used much more heavily. Given these cheap experiments, I'd guess Intel would have adopted much more radical designs.
Examples of more radical approaches would be self-clocked chips, much more internal parallelism (right now only about 1/10 of the devices change state on any clock), chips that directly use more of the quantum properties of the material, chips that work with values other than 0 and 1, direct use of probabilistic computing, etc. In other words, designers would have pushed much further out into the micro-architectural design space, to squeeze more function out of the devices. Some of this (e.g. probabilistic or quantum-enhanced computing) could propagate up to the instruction set level.
(This kind of weird design is exactly what we get when evolutionary search is applied directly to a gate array, which roughly approximates the situation Intel would be in.)
Conversely, if Intel had hundreds of times less computing power, they'd have to be extremely conservative. Designs would have to stay further from any possible timing bugs, new designs would appear much more slowly, they'd probably make the transition to multiple cores much sooner because scaling processor designs to large numbers of transistors would be intractable, there'd be less fine-grained internal parallelism, etc.
If we assumed that progress in process design was also more or less proportional to the computing power available, then in effect we'd just be changing the exponent on the curve; to a first approximation we could assume no qualitative changes in design. However, as I say, this is a very big "if".
Now however we have to contend with an interesting feedback issue. Suppose we start importing computing from ten years in the future in the mid-1980s. If it speeds everything up proportionally, the curve gets a lot steeper, because that future is getting faster faster than ours. Conversely if Intel had to run on ten year old technology the curve would be a lot flatter.
On the other hand if there is skew between different aspects of the development process (as above with chip design vs. process design) we could go somewhere else entirely. For example if Intel develops some way to use quantum effects in 2000 due to faster simulations from 1985 on, and then that gets imported (in a black box) back to 1990, things could get pretty crazy.
I think that's all for now. Maybe I'll have more later. Further questions welcome.