Related to: Should I believe what the SIAI claims?; What I would like the SIAI to publish

The argument that an AI can go FOOM (undergo explosive recursive self-improvement) requires various premises (P#) to be true simultaneously:

  • P1: The human development of artificial general intelligence will take place quickly.
  • P2: Any increase in intelligence vastly outweighs its computational cost and the time needed to discover it.
  • P3: AGI is able to create or acquire resources, empowering technologies, or civilisational support.
  • P4: AGI can undergo explosive recursive self-improvement and reach superhuman intelligence without having to rely on slow environmental feedback.
  • P5: Goal stability and self-preservation are not requirements for an AGI to undergo explosive recursive self-improvement.
  • P6: AGI researchers will be smart enough to get everything right, including a mathematically precise definition of the AGI's utility-function, yet will fail to implement spatio-temporal scope boundaries, resource-usage limits and optimization limits.

Therefore the probability of an AI going FOOM, P(FOOM), is the probability of the conjunction (P1∧…∧P6) of its premises:

P(FOOM) = P(P1∧P2∧P3∧P4∧P5∧P6)

Of course, there are many more premises that need to be true in order to enable an AI to go FOOM, e.g. that each level of intelligence can effectively handle its own complexity, or that most AGI designs can somehow self-modify their way up to massive superhuman intelligence. But I believe that the above points are enough to show that the case for a hard takeoff is not disjunctive, but rather strongly conjunctive.
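
As a rough sense of how quickly such a conjunction shrinks, here is a minimal sketch in Python. The per-premise probabilities are made up purely for illustration, and the premises are treated as independent, which they need not be; none of these numbers come from the argument above.

    # Illustrative only: made-up premise probabilities, treated as independent.
    from math import prod

    premise_probabilities = {
        "P1": 0.8,  # quick human development of AGI
        "P2": 0.8,  # intelligence gains outweigh their cost
        "P3": 0.8,  # AGI can acquire resources and supporting technology
        "P4": 0.8,  # self-improvement without slow environmental feedback
        "P5": 0.8,  # goal stability and self-preservation not required
        "P6": 0.8,  # researchers get the utility-function right but omit limits
    }

    # Under the independence assumption, the conjunction is just the product.
    p_foom = prod(premise_probabilities.values())
    print(f"P(FOOM) = {p_foom:.3f}")  # 0.8 ** 6 = 0.262

Even with each premise at 0.8 the product is only about 0.26, and at 0.5 each it would be roughly 0.016; if the premises are strongly correlated the true figure could be much higher, so this illustrates the conjunctive structure of the argument rather than giving an actual estimate.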


Premise 1 (P1): If the development of AGI takes place slowly, as a gradual and controllable process, we might be able to learn from small-scale mistakes, or have enough time to develop friendly AI, while still having to face other existential risks in the meantime.

This might, for example, be the case if intelligence cannot be captured by a discrete algorithm, or is modular, and therefore never allows us to reach a point where we can suddenly build the smartest thing ever, which then just extends itself indefinitely.

Premise 2 (P2): If you increase intelligence, you might also increase the computational cost of its further improvement and the distance to the discovery of some unknown unknown that could enable another quantum leap, by reducing the design space with every iteration.

If an AI needs to apply a lot more energy to gain a bit more complexity, then it might not be instrumental for an AGI to increase its intelligence further, rather than using its existing intelligence to pursue its terminal goals or investing its given resources to acquire other means of self-improvement, e.g. more efficient sensors.

Premise 3 (P3): If artificial general intelligence is unable to seize the resources necessary to undergo explosive recursive self-improvement (FOOM), then the ability and cognitive flexibility of superhuman intelligence in and of itself would have to be sufficient for it to self-modify its way up to massive superhuman intelligence within a very short time.

Without advanced real-world nanotechnology it will be considerably more difficult for an AI to FOOM. It will have to make use of existing infrastructure, e.g. buy stock in chip manufacturers and get them to create more or better CPUs. It will have to rely on puny humans for a lot of tasks. It won't be able to create new computational substrate without the whole economy of the world supporting it. It won't be able to create an army of robot drones overnight without it either.

In doing so it would have to employ considerable amounts of social engineering without its creators noticing. But, more importantly, it will have to make use of its existing intelligence to do all of that. The AGI would have to acquire new resources slowly, as it couldn't just self-improve to come up with faster and more efficient solutions. In other words, self-improvement demands resources, so the AGI could not profit from its ability to self-improve when it comes to acquiring the very resources it needs in order to self-improve in the first place.

Therefore the absence of advanced nanotechnology constitutes an immense blow to the possibility of explosive recursive self-improvement.

One might argue that an AGI will solve nanotechnology on its own and find some way to trick humans into manufacturing a molecular assembler and granting it access to it. But this might be very difficult.

There is a strong interdependence of resources and manufacturers. The AI won't be able to simply trick some humans into building a high-end factory to create computational substrate, let alone a molecular assembler. People will ask questions and soon get suspicious. Remember, it won't be able to coordinate a world-wide conspiracy; it hasn't been able to self-improve to that point yet, because it is still trying to acquire enough resources, which it has to do the hard way without nanotech.

Anyhow, you’d probably need a brain the size of the moon to effectively run and coordinate a whole world of irrational humans by intercepting their communications and altering them on the fly without anyone freaking out.

If the AI can’t make use of nanotechnology it might make use of something we haven’t even thought about. What, magic?

Premise 4 (P4): Just imagine you emulated a grown-up human mind and it wanted to become a pickup artist; how would it do that with only an Internet connection? It would need some sort of avatar, at least, and would then have to wait for the environment to provide a lot of feedback.

So, even if we're talking about the emulation of a grown-up mind, it will be really hard for it to acquire some capabilities. Then how is the emulation of a human toddler going to acquire those skills? Even worse, how is some sort of abstract AGI, which lacks all of the hard-coded capabilities of a human toddler, going to do it?

Can we even attempt to imagine what is wrong with a boxed emulation of a human toddler that makes it unable to become a master of social engineering in a very short time?

Can we imagine what is missing that would enable one of the existing expert systems to quickly evolve vastly superhuman capabilities in its narrow area of expertise?

Premise 5 (P5): A paperclip maximizer wants to guarantee that its goal of maximizing paperclips will be preserved when it improves itself.

By definition, a paperclip maximizer is unfriendly and does not feature inherent goal-stability (which would require a decision theory for self-modifying decision systems), and therefore has to use its initial seed intelligence to devise a sort of paperclip-friendliness before it can go FOOM.

Premise 6 (P6): Complex goals need complex optimization parameters (the design specifications of the subject of the optimization process, against which it will measure the success of its self-improvement).

Even the creation of paperclips is a much more complex goal than telling an AI to compute as many digits of Pi as possible.

For an AGI that was designed to design paperclips to pose an existential risk, its creators would have to be capable enough to enable it to take over the universe on its own, yet forget, or fail, to define time, space and energy bounds as part of its optimization parameters. Therefore, given the large number of restrictions that are inevitably part of any advanced general intelligence (AGI), the nonhazardous subset of all possible outcomes might be much larger than the subset where the AGI works perfectly yet fails to halt before it can wreak havoc.

23 comments

I still think FOOM is underspecified. One of the few attempts to say what it means is here. This still seems terribly unsatisfactory.

The "at some point in the development of Artificial Intelligence" is vague - and unfalsifiable. The "capable of delivering in short time periods technological advancements that would take humans decades" is an expression of "capability" - not actual progress. Sure, some future intelligent agent will likely be very smart, and be capable of doing a lot of development quickly, if deprived of its tech tools. But that doesn't really seem to be saying anything terribly interesting or controversial.

More specific claims seem desirable.

At the moment, some metrics (such as serial performance) are showing signs of slowing down. Pause for thought for those who foresee an exponential climb into the cloud.

There is no reason we cannot massively parallelize algorithms on silicon; it just requires more advanced computer science than most people use. Brains have a direct connect topology; silicon uses a switch fabric topology. An algorithm that parallelizes on the former may look nothing like one that parallelizes on the latter. Most computer science people never learn how to do parallelism on a switch fabric, and it is rarely taught.

Tangentially, this is why whole brain emulation on silicon is a poor way of doing things. While you can map the wetware, the algorithm implemented in the wetware probably won't parallelize on silicon due to the fundamental topological differences.

While computer science has focused almost solely on algorithms that require a directly connected network topology to scale, there are a few organizations that know how to generally implement parallelism on switch fabrics. Most people conflate their ignorance with there being some fundamental limitation; it requires a computational model that takes the topology into account.

However, that does not address the issue of "foom". There are other topology invariant reasons to believe it is not realistic on any kind of conventional computing substrate even if everyone was using massively parallel switch fabric algorithms.

P1 is mistaken - it doesn't matter how slow humans are at AGI research, because as soon as we get something that can recursively self-improve, it won't be humans doing the research anymore.

Also, P2, while correct in spirit, is incorrect as stated - there might be some improvements that are very expensive, and give no benefit, apart from allowing other, more effective improvements in the future.

And I don't understand why P5 is there - even if they were requirements, someone could just code a stable, self-preserving paperclipper, and it'd be just as unfriendly.

...it doesn't matter how slow humans are at AGI research...

It matters, because if we only go as far as the moon, if you forgive me the space exploration metaphor, and then need thousands of years to reach the next star system, humans will adapt to cope with the long journey.

...apart from allowing other, more effective improvements in the future.

How can you tell in advance which expensive improvement will turn out to be a crane? Rational decision makers won't invest a lot of resources, just in case doing so might turn out to be useful in future.

...even if they were requirements, someone could just code a stable, self-preserving paperclipper, and it'd be just as unfriendly.

Yes, but if someone is smart enough to recognize that self-improving agents need stable utility-functions and have to be friendly with respect to the values of their lower-level selves, then it is very unlikely that the same person fails to recognize the need for human-friendliness.

It matters, because if we only go as far as the moon, if you forgive me the space exploration metaphor, and then need thousands of years to reach the next star system, humans will adapt to cope with the long journey.

I don't understand your metaphor. Do you mean that if an AI recursively improves slowly, the changes it causes in the world won't seem fast to us?

If that's the correct interpretation, it's true, but may not be very likely if the AI is run at many times human speed.

if someone is smart enough to recognize that self-improving agents need stable utility-functions and have to be friendly with respect to the values of their lower-level selves, then it is very unlikely that the same person fails to recognize the need for human-friendliness.

I'm not disputing the above statement itself, but it implies that you are counting a friendly AI recursively improving to superintelligence quickly as not being a FOOM. If you call a friendly FOOM a FOOM, then P6 is irrelevant.

...it doesn't matter how slow humans are at AGI research...

It matters, because if we only go as far as the moon, if you forgive me the space exploration metaphor, and then need thousands of years to reach the next star system, humans will adapt to cope with the long journey.

It seems pretty challenging to envisage humans "adapting" to the existence of superintelligent machines on any realistic timescale - unless you mean finding a way to upload their essences into cyberspace.

It looks like he meant something like, "if it takes 10,000 years to get to AI, then other changes like biological modification, singleton formation, cultural/values drift, stochastic risk of civilization-collapsing war, etc, are the most important areas for affecting humanity's future."

It matters, because if we only go as far as the moon, if you forgive me the space exploration metaphor, and then need thousands of years to reach the next star system, humans will adapt to cope with the long journey.

We'll have thousands of years to adapt to the journey, but events might unfold very quickly once we get there.

Rational decision makers won't invest a lot of resources, just in case doing so might turn out to be useful in future.

I can't see any obvious reason why the expected value couldn't be positive?

You can make all sorts of things sound unlikely by listing sufficiently long conjunctions.

Premise 5 (P5): A paperclip maximizer wants to guarantee that its goal of maximizing paperclips will be preserved when it improves itself.

By definition, a paperclip maximizer is unfriendly and does not feature inherent goal-stability (which would require a decision theory for self-modifying decision systems), and therefore has to use its initial seed intelligence to devise a sort of paperclip-friendliness before it can go FOOM.

The paperclip maximizer could tamper with itself with limited understanding, accidentally mutating itself into a staple maximizer. If it isn't confident in its self-modification abilities, it's incentivised to go just fast enough to stop humans from shutting it down. Maybe it suspects it will fail, but a 1% chance of turning the universe into paperclips and a 99% chance of turning it into staples is better than whatever other options are available. Maybe the first AGI is only kind of agentic: GPT5 writing AI code, where GPT5 doesn't care what the utility function is, so long as the code looks like something humans might write.

All you need is:

  • A: the ability to change/review/improve/modify its own code
  • B: someone pressing the button

So P(FOOM) = P(A∧B).

"A" has already happened a while ago, as LLMs can debug/change/review/improve/modify code in any programming language and/or human language. So P(A) is not merely 1; it has already happened.

Therefore P(FOOM) = P(A∧B) = P(B).

But since the immense competitive advantages of a continuously self-programming/improving AI for the developers who cash in on it are blatantly self-evident, I trust it's not necessary to explain why "B" will inevitably be done. Even if at first it most likely won't be technically possible and/or allowed to go on iterating itself continuously and/or uninterruptedly, it will inexorably happen, whether we like it or not.

Therefore P(B) is effectively 1 (i.e. 100%). So, although "B" is necessary in practice and thus not immaterial, in reality it is in effect just a formality.

P(FOOM) = P(A∧B) = P(B) = 1

So P(FOOM) is effectively 1. One could even say that we are already in the FOOM; we just don't realize it. It's just a matter of time before someone curious/lazy/greedy does/triggers "B" just to see what happens.

Have fun...

Suppose AI_0 designs and builds an improved AI_1, which in turn designs and builds an even more powerful AI_2, which ..., and so on. Does that count as recursive self-improvement?

If not, then I think you need to revise your definition of FOOMing.

If yes, then I think that P(P5) = 1.0.


This is an example of why technological skepticism arguments rarely are useful. The skeptic always makes unwarranted assumptions about how the technology must work. He then shows that it really can't work like that. And then, like Vizzini in the Princess Bride, he announces that it is inconceivable that it might work any other way.

This is an example of why technological skepticism arguments rarely are useful. The skeptic always makes unwarranted assumptions about how the technology must work. He then shows that it really can't work like that. And then, like Vizzini in the Princess Bride, he announces that it is inconceivable that it might work any other way.

This is to some extent a problem when talking about very broad or very vague technologies. However, technological skepticism can be warranted in specific instances. For example, if I said that soon we'll have teleportation machines to transport people you'd be right to be skeptical. On the other hand, I can confidently predict that the efficiency of solar panels will continue to go up over the next ten years, even though I have no idea how that will occur.

Moreover, when people are discussing specific paths for a technology, it isn't at all unreasonable to look at any given path and say "oh, proponents say this is a likely path. Well, it has problems X,Y and Z".

The skeptic always makes unwarranted assumptions about how the technology must work.

Oh come on... I can't take this seriously. You say someone who claims that there are design principles of intelligence that can take over the universe in a week is perfectly rational, while someone who says that such a belief is far-fetched makes up unwarranted assumptions.

I'm not disagreeing with your conclusions - only with your argument. In fact, several people have found flaws in your arguments. Technological impossibility 'proofs' are always flawed, IMHO. I'm a FOOM skeptic too. The arguments of FOOM believers are flawed. Point out those flaws. Don't build flawed counter-arguments of your own.

You say someone who claims that there are design principles of intelligence that can take over the universe in a week is perfectly rational, while someone who says that such a belief is far-fetched makes up unwarranted assumptions.

Uh, no. I didn't say that. What I said was more like "someone who claims to prove that such a belief is far-fetched is probably grounding their 'proof' on unwarranted assumptions."


Technological impossibility 'proofs' are always flawed...

That wasn't my intention. Over the past year I have asked the proponents of FOOM to be more specific by mentioning some concrete requirements for FOOM to be feasible (and also what evidence led them to make that prediction in the first place). But all they ever do is say that I am not entitled to that particular proof, as if I were even asking for a proof. And so I went to see what requirements would have to be met to allow FOOM to be possible in the first place.

If someone predicts that the world is going to end, I'll ask that person to be more specific. If that person refuses to be more specific, but continues to claim that the world will end anyway, then in order to better estimate the probability of the prediction I have to think about ways the world could end; I'll have to think about some specific requirements (circumstances) that would cause the world to end, e.g. giant alien death rays. If that person then says that all my examples of how the world could end are flawed, well, that doesn't increase my probability estimate of his being right. To claim that there are no requirements for the world to end doesn't make it more likely. You can't subtract details from a story by refusing to be specific.

What I said was more like "someone who claims to prove that such a belief is far-fetched is probably grounding their 'proof' on unwarranted assumptions."

I never claimed to prove that such a belief is far-fetched; at most I made an antiprediction. I believe that such a belief is far-fetched.

The arguments of FOOM believers are flawed. Point out those flaws.

I did, by showing that FOOM is a lot of handwaving, a label for some extraordinary assertions. Only its vagueness makes it look like the result of disjunctive reasoning. If you had to substantiate it, it would become apparent that it actually assumes a lot to be true about which we have no idea one way or the other.

...several people have found flaws in your arguments.

I don't think so. Mostly they just said that I am wrong, but how do they know that?

This is an example of why technological skepticism arguments rarely are useful. The skeptic always makes unwarranted assumptions about how the technology must work.

I'm a sceptic about uploads coming first, the usefulness of eyetaps, the potential of cyborg technology. I also doubt that we will be making chairs and tables out of diamond. It seems necessary to be sceptical about some of this stuff, otherwise you swallow too many stupid dreams. Perhaps we should celebrate the jet-pack sceptics more. After all, they were mostly right - and that should count for something.

I think it might well occur rather suddenly.

The human brain is a rather unusual kind of computer. It's incredibly highly parallelised, but the individual processing units are really slow. It's also not really a general purpose machine - nearly all of it is hardware optimised for the problem of deciding what to do in real time - in most respects it's a slow machine optimised to act as fast as possible, and that implies lots of specialised circuitry. It may not be a very good general purpose computer at all.

Add to this the fact that its function as a general purpose thinking module is apparently very recent - only the last 100k years or so - and that our logical faculties are generally slow, error prone and so forth. Although humans are smarter than other creatures, we are, for general purpose thinking, in evolutionary terms, only just past the point of not being completely useless, as chimpanzees essentially are. We are perhaps a bit like the mudskipper - the first to crawl ashore on the land of "general purpose intelligence" - and it would be unsafe to assume that something couldn't exist that could run much faster.

Perhaps when you have an inherently faster machine that's not required to do so much specialised high speed stuff, it may be possible to do spectacularly better. Perhaps all we are missing is a better algorithm.

All in all I suspect a Foom is quite likely.

P6 is not necessary for fooming. Whether or not the researchers gave a strict utility function for the intelligence should not necessarily alter whether or not it fooms.

Even the creation of paperclips is a much more complex goal than telling an AI to compute as many digits of Pi as possible.

Yet both are about as unpleasant for humans.

There's also some tension between 5 and 6. If the AI doesn't have well-defined goals, it won't necessarily have an issue with self-improvement altering apparent "goals".

I am curious, do you agree that "AI going FOOM" contains many implicit predictions about many topics, like the nature of intelligence and various unproven and unstated conjectures in fields as diverse as complexity theory, economics and rationality (e.g. the consequences of utility maximization)?

I was thinking about writing some top experts about the topic of recursive self-improvement, as I already did about the SIAI itself. You seem to know something about complexity theory and higher mathematics in general, if I remember right. Could you help me to formulate an inquiry about "recursive self-improvement" in a concise and precise way?

Or do you think it is a bad idea? I also thought about doing the same for MWI. But maybe you think the opinion of some actual physicists, AI researchers and complexity theorists, is completely worthless compared to what Eliezer Yudkowsky thinks. I am not sure...

The reason for this idea stems from my perception that quite a few people here on LW talk about "AI going FOOM" as if they knew exactly that any skepticism about it must be bullshit, "because for obvious reasons an AI can go FOOM, it's written into the laws of physics..."

There's also some tension between 5 and 6.

As far as I can tell, but maybe some AGI researcher can correct me on this, recursive self-improvement demands goal-stability (P5); otherwise a rational agent wouldn't self-improve, as it would not be instrumental to its current goals. For example, if a paperclip maximizer couldn't tell that, after improving its intelligence dramatically, it would still be paperclip-friendly, it wouldn't risk self-improvement until it was able to prove paperclip-friendliness. This means that it would be unable to benefit from its ability to self-improve while solving goal-stability and paperclip-friendliness.

Further, to compute a cost-benefit analysis of self-improvement and to measure its success, an AGI will need highly specific goal-parameters (P6), i.e. a well-defined utility-function. If, for example, you tell an AGI to calculate 10 digits of Pi rather than 10^100 digits, its cost-benefit analysis won't suggest that it is instrumental to turn the universe into computronium. If you think this is wrong, I'd like to hear your arguments. Why would a rational agent with imprecise optimization parameters, e.g. paperclips whose tolerance is far larger than a nanometer, conclude that it was economical to take over the whole planet to figure out how to design such paperclips?

The arguments I often hear are along the lines of "it will try to do it as fast as possible" or "it will be instrumental to kill all humans so that they can't destroy its precious paperclips". Well, if it wasn't told to care about how quickly the paperclips are to be produced, why wouldn't it just decide that it might as well do it slowly? If it wasn't told to care about the destruction of paperclips, why would it care about possible risks from humans?

recursive self-improvement demands goal-stability (P5); otherwise a rational agent wouldn't self-improve, as it would not be instrumental to its current goals. For example, if a paperclip maximizer couldn't tell that, after improving its intelligence dramatically, it would still be paperclip-friendly, it wouldn't risk self-improvement until it was able to prove paperclip-friendliness.

It "wouldn't risk it?" And yet one might think there's some reward that someone would be willing to take a chance for - if currently you generate 10 utility per day and you have a chance to increase that to 100, you should do it if you have a better than 1/10 chance (if the other 9/10 are 0 utility per day).

Further, to compute a cost-benefit analysis of self-improvement and to measure its success, an AGI will need highly specific goal-parameters (P6), i.e. a well-defined utility-function.

The AI could have any decision-choosing system it wants. It could calculate utilities precisely and compare that with a thorough utility function, or on the other hand it could have a list of a few thousand rules it followed as best it could, weighting rules by a time-inconsistent method like priming. If the question is "are there practical (though not necessarily safe) decision-choosing systems other than utility?" I'd say the answer is yes.

I am curious, do you agree that "AI going FOOM" contains many implicit predictions about many topics, like the nature of intelligence and various unproven and unstated conjectures in fields as diverse as complexity theory, economics and rationality (e.g. the consequences of utility maximization)?

To some extent yes, but I'm certainly not an expert on this. It seems that there are many different proposed pathways leading to a foom like situation. So, while each of them involves a fair number of premises, it is hard to tell if the end result is likely or not.

I was thinking about writing some top experts about the topic of recursive self-improvement, as I already did about the SIAI itself. You seem to know something about complexity theory and higher mathematics in general, if I remember right. Could you help me to formulate an inquiry about "recursive self-improvement" in a concise and precise way?

I'm not sure there are any real experts on recursive self-improvement out there. The closest that I'm aware of is something like compiler experts, but even that doesn't really recursively self-improve: If you run an efficient compiler on its own source code, you might end up with a faster compiler, but it will still give the same output. Expert on recursive self-improvement sounds to me to be a bit like being a xenobiologist. There's probably a field there, but there's a massive lack of data.

Or do you think it is a bad idea? I also thought about doing the same for MWI. But maybe you think the opinion of some actual physicists, AI researchers and complexity theorists, is completely worthless compared to what Eliezer Yudkowsky thinks. I am not sure...

I think here and the paragraph above you are coming across as a bit less diplomatic than you need to be. If this is directed at me at least, I think that Eliezer probably overestimates what can likely be done in terms of software improvements and doesn't appreciate how complexity issues can be a barrier. However, my own area of expertise is actually number theory not complexity theory. But at least as far as MWI is concerned, that isn't a position that is unique to Yudkowsky. A large fraction of practicing physicists support MWI. In all these cases, Eliezer has laid out his arguments so it isn't necessary to trust him in any way when evaluating them. It is possible that there are additional thought processes.

That said, I do think that there's a large fraction of LWians who take fooming as an almost definite result. This confuses me not because I consider a foom event to be intrinsically unlikely but because it seems to imply extreme certainty about events that by their very nature we will have trouble understanding and haven't happened yet. One would think that this would strongly push confidence about foom estimates lower, but it doesn't seem to do that. Yet, at the same time, it isn't that relevant: A 1% chance of fooming would still make a fooming AI one of the most likely existential risk events.

Manfred below seems to have addressed some of the P5 concerns, but I'd like to offer a more concrete counterexample. As humans learn and grow, their priorities change. Most humans don't go out of their way to avoid learning even though it will result in changing priorities.

I'm not sure there are any real experts on recursive self-improvement out there. The closest that I'm aware of is something like compiler experts, but even that doesn't really recursively self-improve: If you run an efficient compiler on its own source code, you might end up with a faster compiler, but it will still give the same output. Expert on recursive self-improvement sounds to me to be a bit like being a xenobiologist. There's probably a field there, but there's a massive lack of data.

Only if you fail to consider history so far. Classical biological systems self-improve. We have a fair bit of information about them. Cultural systems self-improve too, and we have a lot of information about them too. In both cases the self-improvement extends to increases in collective intelligence - and in the latter case the process even involves deliberative intelligent design.

Corporations like Google pretty literally rewire their own e-brains - to increase their own intelligence.

If you ignore all of that data, then you may not have much data left. However, that data is rather obviously highly relevant. If you propose ignoring it, there need to be good reasons for doing that.

The whole idea that self-modifying intelligent computer programs are a never-before seen phenomenon that changes the rules completely is a big crock of nonsense.