or: Why our universe has already had its one and only foom
In the late 1980s, I added half a megabyte of RAM to my Amiga 500. A few months ago, I added 2048 megabytes of RAM to my Dell PC. The latter upgrade was four thousand times larger, yet subjectively they felt about the same, and in practice they conferred about the same benefit. Why? Because each was a factor-of-two increase, and it is a general rule that each doubling tends to bring about the same increase in capability.
That's a pretty important rule, so let's test it by looking at some more examples.
How does the performance of a chess program vary with the amount of computing power you can apply to the task? The answer is that each doubling of computing power adds roughly the same number of Elo rating points. The curve must flatten off eventually (after all, the computation required to fully solve chess is finite, albeit large), yet it remains remarkably constant over a surprisingly wide range.
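To make the relationship concrete, here is a toy model of it in a few lines of Python. The 70-points-per-doubling figure is an illustrative assumption, not a measured value: the point is only that rating grows with the logarithm of compute, so a thousandfold hardware increase buys about ten doublings' worth of rating, no more.

```python
# Toy model only: assumes a constant Elo gain per doubling of compute
# (the 70-point figure is illustrative, not a measurement).
import math

def elo_gain(old_compute, new_compute, points_per_doubling=70):
    """Predicted rating gain under a constant-gain-per-doubling model."""
    return points_per_doubling * math.log2(new_compute / old_compute)

print(elo_gain(1, 2))     # one doubling   -> 70 points
print(elo_gain(1, 1024))  # ten doublings  -> 700 points
```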
Is that idiosyncratic to chess? Let's look at Go, a more difficult game that must be solved by different methods, one where the alpha-beta minimax algorithm that served chess so well breaks down. For a long time, the curve of capability also broke down: in the 90s and early 00s, the strongest Go programs were based on hand-coded knowledge, such that some of them literally did not know what to do with extra computing power; additional CPU speed resulted in zero improvement.
The breakthrough came in the second half of the last decade, with Monte Carlo tree search algorithms. It wasn't just that they provided a performance improvement; it was that they were scalable. Computer Go is now on the same curve of capability as computer chess: whether measured on the Elo or the kyu/dan scale, each doubling of power gives a roughly constant rating improvement.
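For readers who haven't seen it, here is a minimal sketch of the Monte Carlo tree search idea (UCT selection, expansion, random rollout, backpropagation), applied to a toy take-one-or-two Nim game rather than Go; the game and all the names here are invented for illustration, not taken from any real engine. The relevant property is that the only tuning knob is the playout budget: double the playouts and the search simply looks a little deeper and wider.

```python
# Minimal MCTS/UCT sketch on a toy game (take 1 or 2 stones; whoever takes
# the last stone wins). Not the engines discussed above - just the bare
# algorithm, to show that strength scales with the playout budget.
import math
import random

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state = state        # (stones_remaining, player_to_move)
        self.parent = parent
        self.move = move          # move that led from parent to this node
        self.children = []
        self.wins = 0.0           # wins for the player who moved into this node
        self.visits = 0

def legal_moves(state):
    stones, _ = state
    return [m for m in (1, 2) if m <= stones]

def apply_move(state, move):
    stones, player = state
    return (stones - move, 1 - player)

def rollout(state):
    """Play uniformly random moves to the end; return the winner."""
    while legal_moves(state):
        state = apply_move(state, random.choice(legal_moves(state)))
    return 1 - state[1]           # the player who took the last stone wins

def uct_child(node, c=1.4):
    """Pick the child maximizing exploitation plus exploration."""
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
                              + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts(root_state, playouts):
    root = Node(root_state)
    for _ in range(playouts):
        # 1. Selection: walk down while the node is fully expanded.
        node = root
        while node.children and len(node.children) == len(legal_moves(node.state)):
            node = uct_child(node)
        # 2. Expansion: add one untried move, if the node isn't terminal.
        tried = [ch.move for ch in node.children]
        untried = [m for m in legal_moves(node.state) if m not in tried]
        if untried:
            move = random.choice(untried)
            child = Node(apply_move(node.state, move), parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from here.
        winner = rollout(node.state)
        # 4. Backpropagation: credit each node from the mover's perspective.
        while node is not None:
            node.visits += 1
            if winner != node.state[1]:
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

# With enough playouts this converges on the game-theoretic answer (leave a
# multiple of three stones); the only way to buy more strength is more playouts.
print(mcts((10, 0), playouts=4000))
```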
Where do these doublings come from? Moore's Law is driven by improvements in a number of technologies, one of which is chip design. Each generation of computers is used, among other things, to design the next generation, and designing each new generation in a given amount of time takes roughly twice the computing power that designing the last one did.
Looking away from computers to one of the other big success stories of 20th-century technology, space travel - from Goddard's first crude liquid-fuel rockets, to the V2, to Sputnik, to the half a million people who worked on Apollo - we again find that successive qualitative improvements in capability required order-of-magnitude after order-of-magnitude increases in the energy a rocket could deliver to its payload, with corresponding increases in the labor input.
What about the nuclear bomb? Surely that at least was discontinuous?
At the simplest physical level it was: nuclear explosives have six orders of magnitude more energy density than chemical explosives. But what about the effects? Those are what we care about, after all.
The death tolls from the bombings of Hiroshima and Nagasaki have been estimated respectively at 90,000-166,000 and 60,000-80,000. That from the firebombing of Hamburg in 1943 has been estimated at 42,600; that from the firebombing of Tokyo on the 10th of March 1945 alone has been estimated at over 100,000. So the actual effects were in the same league as those of other major bombing raids of World War II. To be sure, the destruction was now being carried out with single bombs, but what of it? The production of those bombs took the labor of 130,000 people, the industrial infrastructure of the world's most powerful nation, and $2 billion of investment in 1945 dollars; nor did even that investment at that time gain the US the ability to produce additional nuclear weapons in large numbers at short notice. The construction of the massive nuclear arsenals of the later Cold War took additional decades.
(To digress for a moment from the curve of capability itself, we may also note that destructive power, unlike constructive power, is purely relative. The death toll from the Mongol sack of Baghdad in 1258 was several hundred thousand; the total from the Mongol invasions was several tens of millions. The raw numbers, of course, do not fully capture the effect on a world whose population was much smaller than today's.)
Does the same pattern apply to software as it does to hardware? Indeed it does. There's a significant difference between the capability of a program you can write in one day versus two days. On a larger scale, there's a significant difference between the capability of a program you can write in one year versus two years. But there is no significant difference between the capability of a program you can write in 365 days versus 366 days. Looking away from programming to the task of writing an essay or a short story, a textbook or a novel, the rule holds true: each significant increase in capability requires a doubling, not a mere linear addition. And if we look at pure science, continued progress over the last few centuries has been driven by exponentially greater inputs, both in the number of trained human minds applied and in the capabilities of the tools used.
If this is such a general law, should it not apply outside human endeavor? Indeed it does. From protozoa, which pack a minimal learning mechanism into a single cell, to C. elegans with hundreds of neurons, to insects with thousands, to vertebrates with millions and then billions, each increase in capability takes an exponential increase in brain size, not the mere addition of a constant number of neurons.
But, some readers are probably thinking at this point, what about...
... what about the elephant at the dining table? The one exception that so spectacularly broke the law?
Over the last five or six million years, our lineage upgraded computing power (brain size) by about a factor of three, and upgraded firmware to an extent that is unknown but was surely more like a percentage than an order of magnitude. The result was not a corresponding improvement in capability. It was a jump from almost no to fully general symbolic intelligence, which took us from a small niche to mastery of the world. How? Why?
To answer that question, consider what an extraordinary thing is a chimpanzee. In raw computing power, it leaves our greatest supercomputers in the dust; in perception, motor control, and spatial and social reasoning, it has performance our engineers can only dream about. Yet even chimpanzees trained in sign language cannot parse a sentence as well as the Infocom text adventures that ran on the Commodore 64. They are incapable of arithmetic that would be trivial with an abacus, let alone an early pocket calculator.
The solution to the paradox is that a chimpanzee could make an almost discontinuous jump to human-level intelligence because it wasn't developing across the board. It was filling in a missing capability - symbolic intelligence - in an otherwise already very highly developed system. In other words, its starting point was staggeringly lopsided.
(Is there an explanation why this state of affairs came about in the first place? I think there is - in a nutshell, most conscious observers should expect to live in a universe where it happens exactly once - but that would require a digression into philosophy and anthropic reasoning, so it really belongs in another post; let me know if there's interest, and I'll have a go at writing that post.)
Can such a thing happen again? In particular, is it possible for AI to go foom the way humanity did?
If such lopsidedness were to repeat itself... well, even then, the answer is probably no. After all, an essential part of what we mean by foom in the first place - why it's so scarily attractive - is that it involves a small group accelerating in power away from the rest of the world. But the reason why that happened in human evolution is that genetic innovations mostly don't transfer across species. The dolphins couldn't say: hey, these apes are on to something, let's snarf the code for this symbolic intelligence thing, oh, and the hands too, we're going to need manipulators for the toolmaking application, or maybe octopus tentacles would work better in the marine environment. Human engineers carry out exactly this sort of technology transfer on a routine basis.
But it doesn't matter, because the lopsidedness is not occurring. Obviously computer technology hasn't lagged in symbol processing - quite the contrary. Nor has it really lagged in areas like vision and pattern matching - a lot of work has gone into those, and our best efforts aren't clearly worse than would be expected given the available development effort and computing power. And some of us are making progress on actually developing AGI - very slow, as would be expected if the theory outlined here is correct, but progress nonetheless.
The only way to create the conditions for any sort of foom would be to shun a key area completely for a long time, so that ultimately it could be rapidly plugged into a system that is very highly developed in other ways. Hitherto no such shunning has occurred: every even slightly promising path has had people working on it. I advocate continuing to make progress across the board as rapidly as possible, because every year that drips away may be an irreplaceable loss; but if you believe there is a potential threat from unfriendly AI, then such continued progress becomes the one reliable safeguard.
Suppose you are contemplating the possibility of an outcome Z, and you come across a discussion between agent A and agent B about the prediction that Z is true. If agent B offers an argument X in favor of Z and you find X unconvincing, that still gives you new information about agent B and about the likelihood of Z. You might now conclude that Z is slightly more likely to be true, both because of the additional information in its favor and because of the confidence B must have in order to assert it. Agent A, however, offers an argument Y in favor of Z being false, and you find Y just as unconvincing as X. You might then conclude that the truth-value of Z is roughly unknown again, since each argument, together with the confidence of the person advancing it, roughly cancels the other.
Therefore no information is irrelevant if it is the only information you have about the outcome in question. Your own judgement may carry less weight than the confidence of an agent who may have unknown additional support for its argument. If you are unable to judge the truth-value of an exclusive disjunction yourself, then the fact that a given argument about it fails to compel you says more about you than about the agent advancing it.
Any argument on its own has to be taken into account, if only for its logical consequences. Every argument should be incorporated into your probability estimates, because the fact that it is made at all signals a certain confidence on the part of the agent making it. Yet if there exists a counterargument that points the other way, you have to take that into account as well, and it may well outweigh the original argument. So no argument entirely lacks the power to shift your beliefs, however slightly, yet arguments can outweigh and trump one another.
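A quick way to see the cancellation is in odds form, with made-up numbers: multiply your prior odds on Z by a likelihood ratio for each argument (and the confidence behind it), and two comparably weak arguments pointing in opposite directions leave the odds roughly where they started.

```python
# Illustrative numbers only: odds-form update for the scenario above.
prior_odds = 1.0        # Z initially judged as likely as not
lr_x = 1.5              # B's weak argument X (plus B's confidence) favors Z
lr_y = 1 / 1.5          # A's equally weak argument Y favors not-Z

posterior_odds = prior_odds * lr_x * lr_y
print(posterior_odds)   # ~1.0: the two arguments roughly cancel
```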
ETA: Fixed the logic, thanks Vladimir_Nesov.
B believes that X argues for Z, but you might well believe that X argues against Z. (You are considering a model of a public debate, while this comment was more about principles for an argument between two people.)