Even when contrarians win, they lose: Jeff Hawkins
Related: Even When Contrarians Win, They Lose
I had long thought that Jeff Hawkins (and the Redwood Center, and Numenta) were pursuing an idea that didn't work, and had failed to give up on it for a prolonged period of time. I formed this belief because I had not heard of any impressive results or endorsements of their research. However, I recently read an interview with Andrew Ng, a leading machine learning researcher, in which he credits Jeff Hawkins with publicizing the "one learning algorithm" hypothesis - the idea that most of the cognitive work of the brain is done by one algorithm. Ng says that, as a young researcher, this pushed him into areas that could lead to general AI. He still believes that AGI is far off, though.
I found out about Hawkins' influence on Ng after reading an old SL4 post by Eliezer and looking for further information about Jeff Hawkins. It seems that the "one learning algorithm" hypothesis was widely known in neuroscience, but not within AI until Hawkins' work. Based on Eliezer's citation of Mountcastle and his known familiarity with cognitive science, it seems that he learned of this hypothesis independently of Hawkins. The "one learning algorithm" hypothesis is important in the context of intelligence explosion forecasting, since hard takeoff is vastly more likely if it is true. I have been told that further evidence for this hypothesis has been found recently, but I don't know the details.
This all fits well with Robin Hanson's model. Hawkins had good evidence that better machine learning should be possible, but the particular approaches that he took didn't perform as well as less biologically-inspired ones, so he's not really recognized today. Deep learning would definitely have happened without him; there were already many people working in the field, and they started to attract attention because of improved performance due to a few tricks and better hardware. At least Ng's career, though, can be credited to Hawkins.
I've been thinking about Robin's hypothesis a lot recently, since many researchers in AI are starting to think about the impacts of their work (most still only think about the near-term societal impacts rather than thinking about superintelligence though). They recognize that this shift towards thinking about societal impacts is recent, but they have no idea why it is occurring. They know that many people, such as Elon Musk, have been outspoken about AI safety in the media recently, but few have heard of Superintelligence, or attribute the recent change to FHI or MIRI.
I know when the Singularity will occur
More precisely, if we suppose that sometime in the next 30 years, an artificial intelligence will begin bootstrapping its own code and explode into a super-intelligence, I can give you 2.3 bits of further information on when the Singularity will occur.
Between midnight and 5 AM, Pacific Standard Time. (Narrowing 24 hours down to a five-hour window is log2(24/5) ≈ 2.3 bits.)
The failure of counter-arguments argument
Suppose you read a convincing-seeming argument by Karl Marx, and get swept up in the beauty of the rhetoric and clarity of the exposition. Or maybe a creationist argument carries you away with its elegance and power. Or maybe you've read Eliezer's take on AI risk, and, again, it seems pretty convincing.
How could you know if these arguments are sound? OK, you could whack the creationist argument with the scientific method, and Karl Marx with the verdict of history, but what would you do if neither was available (as they aren't available when currently assessing the AI risk argument)? Even if you're pretty smart, there's no guarantee that you haven't missed a subtle logical flaw or a dubious premise or two, or haven't gotten caught up in the rhetoric.
One thing should make you believe the argument more strongly: if the argument has been repeatedly criticised, and the criticisms have failed to puncture it. Unless you have the time to become an expert yourself, this is the best way to evaluate arguments where evidence isn't available or conclusive. After all, opposing experts presumably know the subject intimately, and are motivated to identify and illuminate the argument's weaknesses.
If counter-arguments seem incisive, pointing out serious flaws, or if the main argument is being continually patched to defend it against criticisms - well, this is strong evidence that the main argument is flawed. Conversely, if the counter-arguments continually fail, then this is good evidence that the main argument is sound. Not logical evidence - a failure to find a disproof doesn't establish a proposition - but good Bayesian evidence.
In fact, the failure of counter-arguments is much stronger evidence than whatever is in the argument itself. If you can't find a flaw, that just means you can't find a flaw. If counter-arguments fail, that means many smart and knowledgeable people have thought deeply about the argument - and haven't found a flaw.
And as far as I can tell, critics have consistently failed to counter the AI risk argument. To pick just one example, Holden recently provided a cogent critique of the value of MIRI's focus on AI risk reduction. Eliezer wrote a response to it (I wrote one as well). The core of Eliezer's response and of mine wasn't anything new; both were mainly a rehash of what had been said before, with a different emphasis.
And most responses to critics of the AI risk argument take this form. Thinking for a short while, one can rephrase essentially the same argument, with a change in emphasis to take down the criticism. After a few examples, it becomes quite easy, a kind of paint-by-numbers process of showing that the ideas the critic has assumed do not actually make the AI safe.
You may not agree with my assessment of the critiques, but if you do, then you should adjust your belief in AI risk upwards. There's a kind of "conservation of expected evidence" here: if the critiques had succeeded, you'd have reduced the probability of AI risk, so their failure must push you in the opposite direction.
In my opinion, the strength of the AI risk argument derives 30% from the actual argument, and 70% from the failure of counter-arguments. This would be higher, but we haven't yet seen the most prominent people in the AI community take a really good swing at it.
Why AI may not foom
Summary
- There's a decent chance that the intelligence of a self-improving AGI will grow in a relatively smooth exponential or sub-exponential way, not super-exponentially or with large jump discontinuities.
- If this is the case, then an AGI whose effective intelligence matched that of the world's combined AI researchers would make AI progress at the rate they do, taking decades to double its own intelligence.
- The risk that the first successful AGI will quickly monopolize many industries, or quickly hack many of the computers connected to the internet, seems worth worrying about. In either case, the AGI would likely end up using the additional computing power it gained to self-modify so it was superintelligent.
- AI boxing could mitigate both of these risks greatly.
- Even if hard takeoff is possible, it might be best to assume the soft-takeoff case and concentrate our resources on ensuring a safe soft takeoff, given that the prospects for a safe hard takeoff look grim.
Takeoff models discussed in the Hanson-Yudkowsky debate
The supercritical nuclear chain reaction model
Yudkowsky alludes to this model repeatedly, starting in this post:
When a uranium atom splits, it releases neutrons - some right away, some after delay while byproducts decay further. Some neutrons escape the pile, some neutrons strike another uranium atom and cause an additional fission. The effective neutron multiplication factor, denoted k, is the average number of neutrons from a single fissioning uranium atom that cause another fission...
It might seem that a cycle, with the same thing happening over and over again, ought to exhibit continuous behavior. In one sense it does. But if you pile on one more uranium brick, or pull out the control rod another twelve inches, there's one hell of a big difference between k of 0.9994 and k of 1.0006.
I don't like this model much for the following reasons:
- The model doesn't offer much insight into the time scale over which an AI might self-improve. The "mean generation time" (time necessary for the next "generation" of neutrons to be released) of a nuclear chain reaction is short, and the doubling time for neutron activity in Fermi's experiment was just two minutes, but it hardly seems reasonable to generalize this to self-improving AIs.
- A flurry of insights that either dies out or expands exponentially doesn't seem like a very good description of how human minds work, and I don't think it would describe an AGI well either. Many people report that taking time to think about problems is key to their problem-solving process. It seems likely that an AGI unable to immediately generate insight into some problem would have a slower and more exhaustive "fallback" search process that would allow it to continue making progress. (Insight could also work via a search process in the first place--over the space of permutations in one's mental model, say.)
The "differential equations folded on themselves" model
This is another model Eliezer alludes to, albeit in a somewhat handwavey fashion:
When you fold a whole chain of differential equations in on itself like this, it should either peter out rapidly as improvements fail to yield further improvements, or else go FOOM.
It's not exactly clear to me what the "whole chain of differential equations" is supposed to refer to... there's only one differential equation in the preceding paragraph, and it's a standard exponential (which could be scary or not, depending on the multiplier in the exponent; rabbit populations and bank account balances both grow exponentially in a way that's slow enough for humans to understand and control).
Maybe he's referring to the levels he describes here: metacognitive, cognitive, metaknowledge, knowledge, and object. How might we parameterize this system?
Let's say c is our AGI's cognition ability, dc/dt is the rate of change in our AGI's cognitive ability, m is our AGI's "metaknowledge" (about cognition and metaknowledge), and dm/dt is the rate of change in metaknowledge. What I've got in mind is:

dc/dt = p * c * m
dm/dt = q * c * m

where p and q are constants.
In other words, change in cognitive ability and change in metaknowledge are each directly proportional to both cognitive ability and metaknowledge.
I don't know much about systems of differential equations, so if you do, please comment! I put the above system into Wolfram Alpha, but I'm not exactly sure how to interpret the solution provided. In any case, fooling around with this script suggests sudden, extremely sharp takeoff for a variety of different test parameters.
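Here is a minimal numerical sketch of that kind of experiment, assuming the system dc/dt = p * c * m and dm/dt = q * c * m above; the parameter values and initial conditions are arbitrary illustrative choices, not anything from the original script. For these values the analytic solution diverges in finite time (around t = 20), and the simulation shows the same sudden, extremely sharp takeoff.

```python
# Euler integration of dc/dt = p*c*m, dm/dt = q*c*m.
# Parameters and initial conditions are arbitrary illustrative choices.
def simulate(p=0.05, q=0.05, c=1.0, m=1.0, dt=0.001, t_max=30.0):
    t, history = 0.0, []
    while t < t_max:
        dc = p * c * m * dt
        dm = q * c * m * dt
        c, m, t = c + dc, m + dm, t + dt
        history.append((t, c, m))
        if c > 1e12:  # stop once the blow-up is obvious
            break
    return history

if __name__ == "__main__":
    trace = simulate()
    for t, c, m in trace[::2000] + [trace[-1]]:
        print(f"t={t:6.2f}  c={c:14.3f}  m={m:14.3f}")
```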
The straight exponential model
To me, the "proportionality thesis" described by David Chalmers in his singularity paper, "increases in intelligence (or increases of a certain sort) always lead to proportionate increases in the capacity to design intelligent systems", suggests a single differential equation that looks like

du/dt = s * u

where u represents the number of upgrades that have been made to an AGI's source code, and s is some constant. The solution to this differential equation is going to look like

u(t) = c1 * e^(s * t)

where the constant c1 is determined by our initial conditions.
(In Recursive Self-Improvement, Eliezer calls this a "too-obvious mathematical idiom". I'm inclined to favor it for its obviousness, or at least use it as a jumping-off point for further analysis.)
Under this model, the constant s is pretty important... if u(t) was the amount of money in a bank account, s would be the rate of return it was receiving. The parameter s will effectively determine the "doubling time" of an AGI's intelligence. It matters a lot whether this "doubling time" is on the scale of minutes or years.
So what's going to determine s? Well, if the AGI's hardware is twice as fast, we'd expect it to come up with upgrades twice as fast. If the AGI had twice as much hardware, and it could parallelize the search for upgrades perfectly (which seems like a reasonable approximation to me), we'd expect the same thing. So let's decompose s and make it the product of two parameters: h representing the hardware available to the AGI, and r representing the ease of finding additional improvements. The AGI's intelligence will be on the order of u * h, i.e. the product of the AGI's software quality and hardware capability.
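As a concrete (and purely illustrative) restatement of that decomposition, the model becomes du/dt = h * r * u, which gives a software-quality doubling time of ln(2) / (h * r). The function and parameter names below are my own, not from any existing model:

```python
import math

# Decomposed exponential model: du/dt = h * r * u.
# h = hardware available, r = ease of finding improvements (arbitrary units).
def doubling_time(h, r):
    """Time for software quality u to double under du/dt = h * r * u."""
    return math.log(2) / (h * r)

print(doubling_time(h=1.0, r=0.0125))  # ~55.5 (years, if r is per-year)
print(doubling_time(h=2.0, r=0.0125))  # doubling the hardware halves this: ~27.7
```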
Considerations affecting our choice of model
Diminishing returns
The consideration here is that the initial improvements implemented by an AGI will tend to be those that are especially easy to implement and/or especially fruitful to implement, with subsequent improvements tending to deliver less intelligence bang for the implementation buck. Chalmers calls this "perhaps the most serious structural obstacle" to the proportionality thesis.
To think about this consideration, one could imagine representing a given improvement as a pair of two values (u, d). u represents a factor by which existing performance will be multiplied, e.g. if u is 1.1, then implementing this improvement will improve performance by a factor of 1.1. d represents the cognitive difficulty or amount of intellectual labor required to implement a given improvement. If d is doubled, then at any given level of intelligence, implementing this improvement will take twice as long (because it will be harder to discover and/or harder to translate into code).
Now let's imagine ordering our improvements in order from highest to lowest u to d ratio, so we implement those improvements that deliver the greatest bang for the buck first.
Thus ordered, let's imagine separating groups of consecutive improvements into "tiers". Each tier's worth of improvements, when taken together, will represent a doubling of the AGI's software quality, i.e. the product of the u's in that cluster will be roughly 2. For a steady doubling time, each tier's total difficulty will need to sum to approximately twice the difficulty of the tier before it. If tier difficulty tends to more than double, we're likely to see sub-exponential growth. If tier difficulty tends to less than double, we're likely to see super-exponential growth. If a single improvement delivers a more-than-2x improvement, it will span multiple "tiers". (A toy simulation of this is sketched below.)
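Here is the toy simulation of the tier picture, assuming each tier doubles software quality and that research speed is proportional to current software quality; the difficulty-growth factor per tier is the knob that determines sub-, straight-, or super-exponential growth. All the numbers are illustrative.

```python
# Toy model of the "tiers" argument: completing tier i doubles software quality u.
# Research speed is proportional to u, and tier i has difficulty d_0 * g**i.
# g > 2 -> doublings take longer and longer (sub-exponential growth over time),
# g = 2 -> constant doubling time (straight exponential),
# g < 2 -> doublings speed up (super-exponential).
def time_per_tier(growth_factor, n_tiers=10, base_difficulty=1.0):
    u, difficulty, times = 1.0, base_difficulty, []
    for _ in range(n_tiers):
        times.append(difficulty / u)  # time = work required / current research speed
        u *= 2.0                      # completing a tier doubles software quality
        difficulty *= growth_factor
    return times

for g in (3.0, 2.0, 1.5):
    print(g, [round(t, 3) for t in time_per_tier(g)])
```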
It seems to me that the quality of fruit available at each tier represents a kind of logical uncertainty, similar to asking whether an efficient algorithm exists for some task, and if so, how efficient.
On this diminishing returns consideration, Chalmers writes:
If anything, 10% increases in intelligence-related capacities are likely to lead to all sorts of intellectual breakthroughs, leading to next-generation increases in intelligence that are significantly greater than 10%. Even among humans, relatively small differences in design capacities (say, the difference between Turing and an average human) seem to lead to large differences in the systems that are designed (say, the difference between a computer and nothing of importance).
Eliezer Yudkowsky's objection is similar:
...human intelligence does not require a hundred times as much computing power as chimpanzee intelligence. Human brains are merely three times too large, and our prefrontal cortices six times too large, for a primate with our body size.
Or again: It does not seem to require 1000 times as many genes to build a human brain as to build a chimpanzee brain, even though human brains can build toys that are a thousand times as neat.
Why is this important? Because it shows that with constant optimization pressure from natural selection and no intelligent insight, there were no diminishing returns to a search for better brain designs up to at least the human level. There were probably accelerating returns (with a low acceleration factor). There are no visible speedbumps, so far as I know.
First, hunter-gatherers can't design toys that are a thousand times as neat as the ones chimps design--they aren't programmed with the software modern humans get through education (some may be unable to count), and educating apes has produced interesting results.
Speaking as someone who's basically clueless about neuroscience, I can think of many different factors that might contribute to intelligence differences within the human race or between humans and other apes:
- Processing speed.
- Cubic centimeters brain hardware devoted to abstract thinking. (Gifted technical thinkers often seem to suffer from poor social intuition--perhaps a result of reallocation of brain hardware from social to technical processing.)
- Average number of connections per neuron within that brain hardware.
- Average neuron density within that brain hardware. This author seems to think that a large part of the human brain's remarkableness comes largely from the fact that it's the largest primate brain, and primate brains maintain the same neuron density when enlarged while other types of brains don't. "If absolute brain size is the best predictor of cognitive abilities in a primate (13), and absolute brain size is proportional to number of neurons across primates (24, 26), our superior cognitive abilities might be accounted for simply by the total number of neurons in our brain, which, based on the similar scaling of neuronal densities in rodents, elephants, and cetaceans, we predict to be the largest of any animal on Earth (28)."
- Propensity to actually use your capacity for deliberate System 2 reasoning. Richard Feynman's second wife on why she divorced him: "He begins working calculus problems in his head as soon as he awakens. He did calculus while driving in his car, while sitting in the living room, and while lying in bed at night." (By the way, does anyone know of research that's been done on getting people to use System 2 more? Seems like it could be really low-hanging fruit for improving intellectual output. Sometimes I wonder if the reason intelligent people tend to like math is because they were reinforced for the behaviour of thinking abstractly as kids (via praise, good grades, etc.) while those not at the top of the class were not so reinforced.)
- Extended neuroplasticity into "childhood".
- Increased calories to think with due to the invention of cooking.
- And finally, mental algorithms ("software"). Which are probably at least somewhat important.
It seems to me like these factors (or ones like them) may multiply together to produce intelligence, i.e. the "intelligence equation", as it were, could be something like intelligence = processing_speed * cc_abstract_hardware * neuron_density * connections_per_neuron * propensity_for_abstraction * mental_algorithms. If the ancestral environment rewarded intelligence, we should expect all of these characteristics to be selected for, and this could explain the "low acceleration factor" in human intelligence increase. (Increasing your processing speed by a factor of 1.2 does more when you're already pretty smart, so all these sources of intelligence increase would feed into one another.)
In other words, it's not that clear what relevance the evolution of human intelligence has to the ease and quality of the upgrades at different "tiers" of software improvements, since evolution operates on many non-software factors, but a self-improving AI (properly boxed) can only improve its software.
Bottlenecks
In the Hanson/Yudkowsky debate, Yudkowsky declares Douglas Engelbart's plan to radically bootstrap his team's productivity through improving their computer and software tools "insufficiently recursive". I agree with this assessment. Here's my model of this phenomenon.
When a programmer makes an improvement to their code, the work requires the completion of many subtasks:
- choosing a feature to add
- reminding themselves of how the relevant part of the code works and loading that information into their memory
- identifying ways to implement the feature
- evaluating different methods of implementing the feature according to simplicity, efficiency, and correctness
- coding their chosen implementation
- testing their chosen implementation, identifying bugs
- identifying the cause of a given bug
- figuring out how to fix the given bug
Each of those subtasks will consist of further subtasks like poking through their code, staring off into space, typing, and talking to their rubber duck.
Now the programmer improves their development environment so they can poke through their code slightly faster. But if poking through their code takes up only 5% of their development time, even an extremely large improvement in code-poking abilities is not going to result in an especially large increase in their development speed... in the best case, where code-poking time is reduced to zero, the programmer will only work about 5% faster.
This is a reflection of Amdahl's Law-type thinking. The amount you can gain through speeding something up depends on how much it's slowing you down.
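The standard Amdahl's Law formula makes the code-poking example concrete; the 5% figure is carried over from the example above, and the speedup factors are illustrative.

```python
# Amdahl's Law: speeding up a part that takes fraction f of the total time
# by a factor s gives an overall speedup of 1 / ((1 - f) + f / s).
def overall_speedup(fraction, part_speedup):
    return 1.0 / ((1.0 - fraction) + fraction / part_speedup)

print(overall_speedup(0.05, 10))    # 10x faster code-poking -> only ~4.7% overall
print(overall_speedup(0.05, 1e9))   # even a near-infinite speedup caps out at ~5.3%
```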
Relatedly, if intelligence is a complicated, heterogeneous process where computation is spread relatively evenly among many modules, then improving the performance of an AGI gets tougher, because upgrading an individual module does little to improve the performance of the system as a whole.
And to see orders-of-magnitude performance improvement in such a process, almost all of your AGI's components will need to be improved radically. If even a few prove troublesome, improving your AGI's thinking speed becomes difficult.
Case studies in technological development speed
Moore's Law
It has famously been noted that if the automotive industry had achieved similar improvements in performance [to the semiconductor industry] in the last 30 years, a Rolls-Royce would cost only $40 and could circle the globe eight times on one gallon of gas—with a top speed of 2.4 million miles per hour.
From this McKinsey report. So Moore's Law is an outlier where technological development is concerned. I suspect that making transistors smaller and faster doesn't require finding ways to improve dozens of heterogeneous components. And when you zoom out to view a computer system as a whole, other bottlenecks typically appear.
(It's also worth noting that research budgets in the semiconductor industry have risen greatly since its inception, but obviously not following the same curve that chip speeds have.)
Compiler technology
This paper on "Proebsting's Law" suggests that the end result of all the compiler research done between 1970 or so and 2001 was that a typical integer-intensive program was compiled to run 3.3 times faster, and a typical floating-point-intensive program was compiled to run 8.1 times faster. When it comes to making programs run quickly, it seems that software-level compiler improvements are swamped by hardware-level chip improvements--perhaps because, like an AGI, a compiler has to deal with a huge variety of different scenarios, so improving it in the average case is tough. (This represents supertask heterogeneity, rather than subtask heterogeneity, so it's a different objection than the one mentioned above.)
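A back-of-the-envelope comparison of those figures with hardware improvement, assuming roughly 31 years between 1970 and 2001 and the commonly quoted 18-month Moore's Law doubling time (both assumptions are mine, and the exact interval doesn't change the qualitative picture):

```python
# Rough annualized improvement rates implied by the figures above.
years = 31
int_rate = 3.3 ** (1 / years) - 1   # ~4% per year from compiler work (integer code)
fp_rate = 8.1 ** (1 / years) - 1    # ~7% per year (floating point)
moore_rate = 2 ** (1 / 1.5) - 1     # ~59% per year from an 18-month doubling time
print(f"compilers: {int_rate:.1%} / {fp_rate:.1%} per year; hardware: {moore_rate:.1%} per year")
```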
Database technology
According to two analyses (full paper for that second one), it seems that improvement in database performance benchmarks has largely been due to Moore's Law.
AI (so far)
Robin Hanson's blog post "AI Progress Estimate" was the best resource I could find on this.
Why smooth exponential growth implies soft takeoff
Let's suppose we consider all of the above, deciding that the exponential model is the best, and we agree with Robin Hanson that there are few deep, chunky, undiscovered AI insights.
Under the straight exponential model, if you recall, we had

du/dt = r * h * u

where u is the degree of software quality, h is the hardware availability, and r is a parameter representing the ease of finding additional upgrades. Our AGI's overall intelligence is given by u * h--the quality of the software times the amount of hardware.
Now we can solve for r by substituting in human intelligence for u * h, and substituting in the rate of human AI progress for du/dt. Another way of saying this is: When the AI is as smart as all the world's AI researchers working together, it will produce new AI insights at the rate that all the world's AI researchers working together produce new insights. At some point our AGI will be just as smart as the world's AI researchers, but we can hardly expect to start seeing super-fast AI progress at that point, because the world's AI researchers haven't produced super-fast AI progress.
Let's assume an AGI that's on par with the world AI research community is reached in 2080 (LW's median "singularity" estimate in 2011). We'll pretend AI research has only been going on since 2000, meaning 80 "standard research years" of progress have gone into the AGI's software. So at the moment our shiny new AGI is fired up, u = 80, and it's doing research at the rate of one "human AGI community research year" per year, so du/dt = 1. That's an effective rate of return on AI software progress of 1 / 80 ≈ 1.3%, giving a software quality doubling time of around 58 years.
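Working through that arithmetic (the dates and the one-research-year-per-year assumption are from the paragraph above; the continuous-compounding formula gives a figure slightly lower than, but in the same ballpark as, the ~58 years quoted):

```python
import math

research_years_accumulated = 2080 - 2000  # u = 80 "standard research years" of progress
research_rate = 1.0                       # du/dt = 1 research year per year at human parity
r = research_rate / research_years_accumulated
print(f"effective rate of return: {r:.2%}")           # 1.25%
print(f"doubling time: {math.log(2) / r:.0f} years")  # ~55 years
```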
You could also apply this kind of thinking to individual AI projects. For example, it's possible that at some point EURISKO was improving itself about as fast as Doug Lenat was improving it. You might be able to do a similar calculation to take a stab at EURISKO's insight level doubling time.
The importance of hardware
According to my model, you double your AGI's intelligence, and thereby the speed with which your AGI improves itself, by doubling the hardware available for your AGI. So if you had an AGI that was interesting, you could make it 4x as smart by giving it 4x the hardware. If an AGI that was 4x as smart could get you 4x as much money (through impressing investors, or playing the stock market, or monopolizing additional industries), that'd be a nice feedback loop. For maximum explosivity, put half your AGI's mind to the task of improving its software, and the other half to the task of making more money with which to buy more hardware.
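A toy simulation of that feedback loop, assuming intelligence = u * h, with half of it devoted to software improvement and half to earning money that is converted into hardware; the conversion constants are arbitrary assumptions. The interesting feature is that the money-for-hardware loop turns the plain exponential into the same kind of finite-time blow-up as the coupled system earlier (around year 160 for these parameters).

```python
# Toy model of the hardware-purchase feedback loop described above.
def simulate(u=1.0, h=1.0, improve_rate=0.0125, buy_rate=0.0125, dt=0.01, years=200):
    for step in range(int(years / dt)):
        intelligence = u * h
        u += improve_rate * (intelligence / 2) * dt  # half the mind on self-improvement
        h += buy_rate * (intelligence / 2) * dt      # half the mind on buying hardware
        if step % int(20 / dt) == 0:
            print(f"year {step * dt:5.0f}  intelligence = {u * h:14.2f}")
        if u * h > 1e12:
            break
    return u, h

simulate()
```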
But it seems pretty straightforward to prevent a non-superintelligent AI from gaining access to additional hardware with careful planning. (Note: One problem with AI boxing experiments thus far is that all of the AIs have been played by human beings. Human beings have innate understanding of human psychology and possess specialized capabilities for running emulations of one another. It seems pretty easy to prevent an AGI from acquiring such understanding. But there may exist box-breaking techniques that don't rely on understanding human psychology. Another note about boxing: FAI requires getting everything perfect, which is a conjunctive calculation. Given multiple safeguards, only one has to work for the box as a whole to work, which is a disjunctive calculation.)
AGI's impact on the economy
Is it possible that the first group to create a successful AGI might begin monopolizing different sections of the economy? Robin Hanson argues that technology insights typically leak between different companies, due to conferences and employee poaching. But we can't be confident these factors would affect the research an AGI does on itself. And if an AGI is still dumb enough that a significant portion of its software upgrades are coming from human researchers, it can hardly be considered superintelligent.
Given what looks like a winner-take-all dynamic, an important factor may be the number of serious AGI competitors. If there are only two, the #1 company may not wish to trade insights with the #2 company for fear of losing its lead. If there are more than two, all but the leading company might ally against the leading company in trading insights. If their alliance is significantly stronger than the leading company, perhaps the leading company would wish to join their alliance.
But if AI is about getting lots of details right, as Hanson suggests, improvements may not even transfer between different AI architectures.
What should we do?
I've argued that soft takeoff is a strong possibility. Should that change our strategy as people concerned with x-risk?
If we are basically screwed in the event that hard takeoff is possible, it may be that preparing for a soft takeoff is a better use of resources on the margin. Shane Legg has proposed that people concerned with friendliness become investors in AGI projects so they can affect the outcome of any that seem to be succeeding.
Concluding thoughts
Expert forecasts are famously unreliable even in the relatively well-understood field of political forecasting. So given the number of unknowns involved in the emergence of smarter-than-human intelligence, it's hard to say much with certainty. Picture a few Greek scholars speculating on the industrial revolution.
I don't have a strong background in these topics, so I fully expect that the above essay will reveal my ignorance, which I'd appreciate your pointing out in the comments. This essay should be taken as an attempt to hack away at the edges, not come to definitive conclusions. As always, I reserve the right to change my mind about anything ;)
Limits on self-optimisation
Disclaimer: I am a physicist, and in the field of computer science my scholarship is weak. It may be that what I suggest here is well known, or perhaps just wrong.
Abstract: Building a Turing machine capable of deciding whether two arbitrary Turing machines have the same output for all inputs is equivalent to solving the Halting Problem. To optimise a function it is necessary to prove that the optimised version always has the same output as the unoptimised version, which is impossible in general for Turing machines. However, real computers have finite input spaces.
Context: FOOM, Friendliness, optimisation processes.
Consider a computer program which modifies itself in an attempt to optimise for speed. A modification to some algorithm is *proper* if it results, for all inputs, in the same output; it is an optimisation if it results in a shorter running time on average for typical inputs, and a *strict* optimisation if it results in a shorter running time for all inputs.
A Friendly AI, optimising itself, must ensure that it remains Friendly after the modification; it follows that it can only make proper modifications. (When calculating a CEV it may make improper modifications, since the final answer for "How do we deal with X" may change in the course of extrapolating; but for plain optimisations the answer cannot change.)
For simplicity we may consider that the output of a function can be expressed as a single bit; the extension to many bits is obvious. However, in addition to '0' and '1' we must consider that the response to some input can be "does not terminate". The task is to prove that two functions, which we may consider as Turing machines, have the same output for all inputs.
Now, suppose you have a Turing machine that takes as input two arbitrary Turing machines and their respective tapes, and outputs "1" if the two input machines have the same output, and "0" otherwise. Then, by having one of the inputs be a Turing machine which is known not to terminate - one that executes an infinite loop - you can solve the Halting Problem. Therefore, such a machine cannot exist: You cannot build a Turing machine to prove, for arbitrary input machines, that they have the same output.
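A sketch of that reduction in Python-flavoured pseudocode. The same_output oracle is hypothetical (it is exactly the machine being argued impossible), and treating Python functions as stand-ins for Turing machines is an informal convenience:

```python
# Hypothetical oracle: returns True iff the two machines produce the same output
# (including "does not terminate") on every input. Assumed only for the reduction.
def same_output(machine_a, machine_b):
    raise NotImplementedError("cannot exist for arbitrary Turing machines")

def loop_forever(_input):
    while True:
        pass

def halts(machine, tape):
    """If same_output existed, this would decide the Halting Problem."""
    def wrapper(_input):
        machine(tape)  # ignore our own input, run the given machine on the fixed tape...
        return 0       # ...and output 0 if it ever finishes
    # wrapper behaves exactly like loop_forever on every input
    # iff machine(tape) never halts.
    return not same_output(wrapper, loop_forever)
```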
It seems to follow that you cannot build a fully general proper-optimisation detector.
However, "arbitrary Turing machines" is a strong claim, in fact stronger than we require. No physically realisable computer is a true Turing machine, because it cannot have infinite storage space, as the definition requires. The problem is actually the slightly easier (that is, not *provably* impossible) one of making a proper-optimisation detector for the space of possible inputs to an actual computer, which is finite though very large. In practice we may limit the input space still further by considering, say, optimisations to functions whose input is two 64-bit numbers, or something. Even so, the brute-force solution of running the functions on all possible inputs and comparing is already rather impractical: two 64-bit inputs alone give 2^128 ≈ 3.4 x 10^38 cases.
Drive-less AIs and experimentation
One of the things I've been thinking about is how to safely explore the nature of intelligence. I'm unconvinced of FOOMing, and would rather we didn't avoid AI entirely if we can't solve Yudkowsky-style Friendliness. So some method of experimentation is needed to determine how powerful intelligence actually is.
Requirements for AI to go FOOM
Related to: Should I believe what the SIAI claims?; What I would like the SIAI to publish
The argument that an AI can go FOOM (undergo explosive recursive self-improvement) requires various premises (P#) to be true simultaneously:
- P1: The human development of artificial general intelligence will take place quickly.
- P2: Any increase in intelligence vastly outweighs its computational cost and the expenditure of time needed to discover it.
- P3: AGI is able to create, or acquire, resources, empowering technologies or civilisatory support.
- P4: AGI can undergo explosive recursive self-improvement and reach superhuman intelligence without having to rely on slow environmental feedback.
- P5: Goal stability and self-preservation are not requirements for an AGI to undergo explosive recursive self-improvement.
- P6: AGI researchers will be smart enough and manage to get everything right, including a mathematically precise definition of the AGI's utility-function, yet fail to implement spatio-temporal scope boundaries, resource usage and optimization limits.
Therefore the probability of an AI going FOOM, P(FOOM), is the probability of the conjunction (P1∧...∧P6) of its premises:
P(FOOM) = P(P1∧P2∧P3∧P4∧P5∧P6)
Of course, there are many more premises that need to be true in order to enable an AI to go FOOM, e.g. that each level of intelligence can effectively handle its own complexity, or that most AGI designs can somehow self-modify their way up to massive superhuman intelligence. But I believe that the above points are enough to show that the case for a hard takeoff is not disjunctive, but rather strongly conjunctive.
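An illustrative calculation of how quickly such a conjunction shrinks; the individual premise probabilities are made up purely for illustration, and treating the premises as independent is itself an idealisation:

```python
# Made-up premise probabilities, chosen only to illustrate how conjunctive the argument is.
premise_probabilities = {"P1": 0.8, "P2": 0.8, "P3": 0.8, "P4": 0.8, "P5": 0.8, "P6": 0.8}

p_foom = 1.0
for p in premise_probabilities.values():
    p_foom *= p  # assumes the premises are independent

print(f"P(FOOM) = {p_foom:.2f}")  # 0.8**6 ≈ 0.26
```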
Premise 1 (P1): If the development of AGI takes place slowly, as a gradual and controllable process, we might be able to learn from small-scale mistakes, or have enough time to develop friendly AI, though we would have to face other existential risks in the meantime.
This might for example be the case if intelligence cannot be captured by a discrete algorithm, or is modular, and therefore never allows us to reach a point where we can suddenly build the smartest thing ever, one that just extends itself indefinitely.
Premise 2 (P2): If you increase intelligence you might also increase the computational cost of its further improvement and the distance to the discovery of some unknown unknown that could enable another quantum leap, by reducing the design space with every iteration.
If an AI does need to apply a lot more energy to get a bit more complexity, then it might not be instrumental for an AGI to increase its intelligence, rather than using its existing intelligence to pursue its terminal goals or investing its given resources in acquiring other means of self-improvement, e.g. more efficient sensors.
Premise 3 (P3): If artificial general intelligence is unable to seize the resources necessary to undergo explosive recursive self-improvement (FOOM), then the ability and cognitive flexibility of superhuman intelligence in and of itself, as characteristics alone, would have to be sufficient to self-modify its way up to massive superhuman intelligence within a very short time.
Without advanced real-world nanotechnology it will be considerably more difficult for an AI to FOOM. It will have to make use of existing infrastructure, e.g. buy stocks of chip manufacturers and get them to create more or better CPUs. It will have to rely on puny humans for a lot of tasks. It won't be able to create new computational substrate without the whole economy of the world supporting it. It won't be able to create an army of robot drones overnight without it either.
To do so, it would have to make use of considerable amounts of social engineering without its creators noticing it. But, more importantly, it will have to make use of its existing intelligence to do all of that. The AGI would have to acquire new resources slowly, as it couldn't just self-improve to come up with faster and more efficient solutions. In other words, self-improvement would demand resources, so the AGI could not profit from its ability to self-improve when it comes to acquiring the very resources it needs in order to self-improve in the first place.
Therefore the absence of advanced nanotechnology constitutes an immense blow to the possibility of explosive recursive self-improvement.
One might argue that an AGI will solve nanotechnology on its own and find some way to trick humans into manufacturing a molecular assembler and grant it access to it. But this might be very difficult.
There is a strong interdependence of resources and manufacturers. The AI won't be able to simply trick some humans into building a high-end factory to create computational substrate, let alone a molecular assembler. People will ask questions and shortly after get suspicious. Remember, it won't be able to coordinate a world-conspiracy; it hasn't been able to self-improve to that point yet, because it is still trying to acquire enough resources, which it has to do the hard way without nanotech.
Anyhow, you’d probably need a brain the size of the moon to effectively run and coordinate a whole world of irrational humans by intercepting their communications and altering them on the fly without anyone freaking out.
If the AI can’t make use of nanotechnology it might make use of something we haven’t even thought about. What, magic?
Premise 4 (P4): Just imagine you emulated a grown-up human mind and it wanted to become a pick-up artist. How would it do that with just an Internet connection? It would need some sort of avatar, at least, and would then have to wait for the environment to provide a lot of feedback.
So, even if we're talking about the emulation of a grown-up mind, it will be really hard to acquire some capabilities. Then how is the emulation of a human toddler going to acquire those skills? Even worse, how is some sort of abstract AGI going to do it when it misses all of the hard-coded capabilities of a human toddler?
Can we even attempt to imagine what is wrong about a boxed emulation of a human toddler, that makes it unable to become a master of social engineering in a very short time?
Can we imagine what is missing that would enable one of the existing expert systems to quickly evolve vastly superhuman capabilities in its narrow area of expertise?
Premise 5 (P5): A paperclip maximizer wants to guarantee that its goal of maximizing paperclips will be preserved when it improves itself.
By definition, a paperclip maximizer is unfriendly, does not feature inherent goal-stability (a decision theory of self-modifying decision systems), and therefore has to use its initial seed intelligence to devise a sort of paperclip-friendliness before it can go FOOM.
Premise 6 (P6): Complex goals need complex optimization parameters (the design specifications of the subject of the optimization process against which it will measure its success of self-improvement).
Even the creation of paperclips is a much more complex goal than telling an AI to compute as many digits of Pi as possible.
For an AGI that was designed to design paperclips to pose an existential risk, its creators would have to be capable enough to enable it to take over the universe on its own, yet forget, or fail, to define time, space and energy bounds as part of its optimization parameters. Therefore, given the large number of restrictions that are inevitably part of any advanced general intelligence (AGI), the nonhazardous subset of all possible outcomes might be much larger than the subset where the AGI works perfectly yet is not constrained before it could wreak havoc.
[Altruist Support] LW Go Foom
In which I worry that the Less Wrong project might go horribly right. This post belongs to my Altruist Support sequence.
Every project needs a risk assessment.
There's a feeling, just bubbling under the surface here at Less Wrong, that we're just playing at rationality. It's rationality kindergarten. The problem has been expressed in various ways:
- not a whole lot of rationality
- rationalist porn for daydreamers
- not quite as great as everyone seems to think
- shiny distraction
- only good for certain goals
And people are starting to look at fixing it. I'm not worried that their attempts - and mine - will fail. At least we'd have fun and learn something.
I'm worried that they will succeed.
What would such a Super Less Wrong community do? Its members would self-improve to the point where they had a good chance of succeeding at most things they put their mind to. They would recruit new rationalists and then optimize that recruitment process, until the community got big. They would develop methods for rapidly generating, classifying and evaluating ideas, so that the only ideas that got tried would be the best that anyone had come up with so far. The group would structure itself so that people's basic social drives - such as their desire for status - worked in the interests of the group rather than against it.
It would be pretty formidable.
What would the products of such a community be? There would probably be a self-help book that works. There would be an effective, practical guide to setting up effective communities. There would be an intuitive, practical guide to human behavior. There would be books, seminars and classes on how to really achieve your goals - and only the materials which actually got results would be kept. There would be a bunch of stuff on the Dark Arts too, no doubt. Possibly some AI research.
That's a whole lot of material that we wouldn't want to get into the hands of the wrong people.
Dangers include:
- Half-rationalists: people who pick up on enough memes to be really dangerous, but not on enough to realise that what they're doing might be foolish. For example, building an AI without adding the friendliness features.
- Rationalists with bad goals: Someone could rationally set about trying to destroy humanity, just for the lulz.
- Dangerous information discovered: e.g. the rationalist community develops a Theory of Everything that reveals a recipe for a physics disaster (e.g. a cheap way to turn the Earth into a black hole). A non-rationalist decides to exploit this.
If this is a problem we should take seriously, what are some possible strategies for dealing with it?
- Just go ahead and ignore the issue.
- The Bayesian Conspiracy: only those who can be trusted are allowed access to the secret knowledge.
- The Good Word: mix in rationalist ideas with do-good and stay-safe ideas, to the extent that they can't be easily separated. The idea being that anyone who understands rationality will also understand that it must be used for good.
- Rationality cap: we develop enough rationality to achieve our goals (e.g. friendly AI) but deliberately stop short of developing the ideas too far.
- Play at rationality: create a community which appears rational enough to distract people who are that way inclined, but which does not dramatically increase their personal effectiveness.
- Risk management: accept that each new idea has a potential payoff (in terms of helping us avoid existential threats) and a potential cost (in terms of helping "bad rationalists"). Implement the ideas which come out positive.
In the post title, I have suggested an analogy with AI takeoff. That's not entirely fair; there is probably an upper bound to how effective a community of humans can be, at least until brain implants come along. We're probably talking two orders of magnitude rather than ten. But given that humanity already has technology with slight existential threat implications (nuclear weapons, rudimentary AI research), I would be worried about a movement that aims to make all of humanity more effective at everything they do.
Ranking the "competition" based on optimization power
Most long-term users on Less Wrong understand the concept of optimization power and how a system can be called intelligent if it can restrict the future in significant ways. Now, I believe that in this world, only institutions are close to superintelligence in any significant way.
I believe it is important for us to have at least some outside idea of which institutions/systems are powerful in today's world, so that we can at least see some outlines of how the increasing optimization power will end up affecting normal people.
So, my question is - what are the present institutions or systems that you would classify as having the maximum optimization power? Please present your thinking behind it if you feel you are mentioning some unknown institution. I am presenting my guesses after the break.