Disclaimer

I wrote this three months ago and abandoned it[1]. I do not plan to return to it anytime soon, but I have nonetheless found myself wanting a document I could point to that explains why I'm not particularly enthused by or overly concerned with "recursive self improvement". As such, I am publishing it anyway; I apologise for not being a more competent writer.


Epistemic Status

Unconfident on some object level details; I don't have a technical grasp of machine learning[2].

 

Author's Note

Despite appearances, this is not a rhetorical question; phrasing this as a question post was a deliberate decision. Please do provide answers to the questions I pose.

The footnotes are unusually extensive in this post; I'd recommend the curious reader not skip them[3].

 

 

Abstract

The main questions this post posits:

  1. Is "seed AI flavoured recursive self-improvement" applicable to the deep learning paradigm?
    • My current position: largely "no"
  2. In light of the answer to the above, should we revise down our estimates of the likelihood (and magnitude[4]) of fast/hard takeoff[5]?
    • My current position: largely "yes"

Introduction

Back in the day, Yudkowsky formulated recursive self improvement (RSI) as a viable[6] path to foom, under the paradigm of "seed AI". From the wiki[7] (emphasis mine):

Recursive self-improvement and AI takeoff

Recursively self-improving AI is considered to be the push behind the intelligence explosion. While any sufficiently intelligent AI will be able to improve itself, Seed AIs are specifically designed to use recursive self-improvement as their primary method of gaining intelligence. Architectures that had not been designed with this goal in mind, such as neural networks or large "hand-coded" projects like Cyc, would have a harder time self-improving.

Eliezer Yudkowsky argues that a recursively self-improvement AI seems likely to deliver a hard AI takeoff – a fast, abruptly, local increase in capability - since the exponential increase in intelligence would yield an exponential return in benefits and resources that would feed even more returns in the next step, and so on. In his view a soft takeoff scenario seems unlikely: "it should either flatline or blow up. You would need exactly the right law of diminishing returns to fly through the extremely narrow soft takeoff keyhole."1.

 

Reading "Hard Takeoff", it seems that Yudkowsky's belief that a soft takeoff is unlikely is heavily dependent on his belief in recursive self-improvement being a component on the critical path to transformative AI:

Recursive self-improvement - an AI rewriting its own cognitive algorithms - identifies the object level of the AI with a force acting on the metacognitive level; it "closes the loop" or "folds the graph in on itself".  E.g. the difference between returns on a constant investment in a bond, and reinvesting the returns into purchasing further bonds, is the difference between the equations y = f(t) = m*t, and dy/dt = f(y) = m*y whose solution is the compound interest exponential, y = e^(m*t).

When you fold a whole chain of differential equations in on itself like this, it should either peter out rapidly as improvements fail to yield further improvements, or else go FOOM.  An exactly right law of diminishing returns that lets the system fly through the soft takeoff keyhole is unlikely - far more unlikely than seeing such behavior in a system with a roughly-constant underlying optimizer, like evolution improving brains, or human brains improving technology.  Our present life is no good indicator of things to come.
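As a rough illustration of the two equations in the quoted passage, here is a minimal Python sketch. The function names, the starting value y0 = 1 and the rate m = 0.05 are assumptions made for illustration; they do not come from "Hard Takeoff".

```python
import math


def simple_returns(m: float, t: float) -> float:
    """Constant investment, returns not reinvested: y = m * t."""
    return m * t


def compounded_returns(m: float, t: float, y0: float = 1.0) -> float:
    """Reinvested returns: closed-form solution of dy/dt = m * y with y(0) = y0."""
    return y0 * math.exp(m * t)


def euler_compounded(m: float, t: float, y0: float = 1.0, steps: int = 100_000) -> float:
    """Forward-Euler integration of dy/dt = m * y, as a numerical check."""
    dt = t / steps
    y = y0
    for _ in range(steps):
        y += m * y * dt  # growth in each step is proportional to the current y
    return y


if __name__ == "__main__":
    m = 0.05  # hypothetical rate of return
    for t in (10, 50, 100):
        print(t, simple_returns(m, t), compounded_returns(m, t))
```

The linear curve grows by the same amount in every period, while the compounded curve grows by the same factor in every period; that gap is what "closing the loop" refers to in the quote.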

 

I get the impression that Yudkowsky was imagining programming a seed AI that could then look upon its own source code with greater cognitive prowess than its programmers, identify obvious flaws, inefficiencies or suboptimal features in its source code and "quickly"[8][9] rewrite itself to greater capabilities (what I'm calling "seed AI flavoured recursive self-improvement"). From "Recursive Self-Improvement" (bolded mine):

Now suppose that instead you hand the AI the problem, "Write a better algorithm than X for storing, associating to, and retrieving memories".  At first glance this may appear to be just another object-level problem that the AI solves using its current knowledge, metaknowledge, and cognitive algorithms.  And indeed, in one sense it should be just another object-level problem.  But it so happens that the AI itself uses algorithm X to store associative memories, so if the AI can improve on this algorithm, it can rewrite its code to use the new algorithm X+1.

This means that the AI's metacognitive level - the optimization process responsible for structuring the AI's cognitive algorithms in the first place - has now collapsed to identity with the AI's object level.

...

Recursion that can rewrite the cognitive level[10] is worth distinguishing.

 

The successor AI so produced would be even smarter and better able to make algorithmic/architectural improvements to its source code; this process recurses, leading to an intelligence explosion.

I don't necessarily think this was a well-founded position even within the seed AI paradigm[11][12][13], but whether that was the case is largely beside the point of this question.

I'm mostly curious if RSI is still a plausible hypothesis for deep learning systems.


Challenges

On the Feasibility of Recursive Self Improvement

Models with trillions to quadrillions of parameters, trained at a cost of tens of millions to billions of dollars, do not seem particularly amenable to the kind of RSI Yudkowsky envisioned back in the day. They don't meet any of the criteria for "Seed AI":

Properties

A Seed AI has abilities that previous approaches lack:

  • Understanding its own source code. It must understand the purpose, syntax and architecture of its own programming. This type of self-reflection enables the AGI to comprehend its utility and thus preserve it.
  • Rewriting its own source code. The AGI must be able to overhaul the very code it uses to fulfill its utility. A critical consideration is that it must remain stable under modifications, preserving its original goals.

This combination of abilities would, in theory, allow an AGI to recursively improve itself by becoming smarter within its original purpose. A Gödel machine rigorously defines a specification for such an AGI.

 

Our most powerful models aren't designed but selected for via "search like" processes. They aren't by default well factored or particularly modular[14]. Even with advanced interpretability tools/techniques, it seems like it would be hard to make the kind of modifications/improvements that are possible in well written software programs.

An AI inspecting its mind, identifying flaws/inefficiencies and making nontrivial algorithmic/architectural improvements seems to not be particularly feasible under the deep learning paradigm.

Positive feedback loops of AI development are probable, but they look more like:

In other words, we get recursive self improvement at home. 😉

 

I'm under the impression that while training more capable successor systems is feasible, doing so would incur considerable economic costs (and thus serve as a taut constraint on the "speed" of takeoff via such feedback loops)[17][18]. It does not seem to necessarily be the case (or even particularly likely) that the above kind of positive feedback loops will lead to discontinuous AI progress.

I think it may be possible that deep learning AGI would eventually be able to look upon its own mind and factorise it[19], but it seems like that level of cognitive capabilities would come well after AI has become existentially dangerous (or otherwise transformative)[20]. I currently do not expect seed AI style RSI to be a component on the critical path to existential catastrophe/"the singularity". 

 

On the Viability of Recursive Self Improvement

My current belief is that even if seed AI flavoured recursive self improvement was viable, the gains that can be eked out that way are not as radical as imagined.

To summarise my position:

Intelligence has high "intrinsic complexity": returns to cognitive investment scale (strongly?) sublinearly with investment of computational resources, and this is a fact of computer science/mathematics/optimisation. You can't self improve to cognitive algorithms that attain linear or superlinear returns on cognition because no such algorithms exist (the same way no amount of algorithmic innovation would deliver a comparison based sorting algorithm with better worst case complexity than O(n log n); no such algorithm exists).
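The sorting analogy rests on the standard decision-tree argument: a comparison sort must distinguish all n! orderings, so it needs at least ceil(log2(n!)) comparisons in the worst case, and this grows like n log n. A minimal sketch follows; the function name is hypothetical and the code only evaluates the bound, it does not sort anything.

```python
import math


def min_comparisons(n: int) -> int:
    """Information-theoretic lower bound on worst-case comparisons
    for any comparison-based sort of n distinct items: ceil(log2(n!))."""
    return math.ceil(math.log2(math.factorial(n)))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        bound = min_comparisons(n)
        # The ratio approaches 1 as n grows, i.e. the bound is Theta(n log n).
        print(n, bound, bound / (n * math.log2(n)))
```

No cleverness in the algorithm gets under this floor, which is the sense in which the post claims some limits are facts of mathematics rather than of engineering.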

 

My reasons for this belief are largely:

 

It does not seem to me like recursive self-improvement is relevant in the deep learning paradigm.


Implications

If the aforementioned objections are correct, then insomuch as one's intuitions around foom were rooted in some expectation of recursive self-improvement, and insomuch as one believes that the first AGIs will be created within the deep learning paradigm[23], the inapplicability of RSI to deep learning should update people significantly downwards on the likelihood of hard takeoff/foom[24].

However, that does not seem to have been the case. I get the sense that people are no longer necessarily expecting seed AI flavoured recursive self-improvement, but basically still posit very high confidence in foom[25]. This is prima facie pretty surprising/suspicious as RSI/seed AI was posited as the main reason to expect a hard takeoff. From the wiki page on recursive self-improvement (emphasis mine):

Recursive self-improvement and AI takeoff

Recursively self-improving AI is considered to be the push behind the intelligence explosion.

From "Hard Takeoff" (emphasis mine):

RSI is the biggest, most interesting, hardest-to-analyze, sharpest break-with-the-past contributing to the notion of a "hard takeoff" aka "AI go FOOM", but it's nowhere near being the only such factor.  The advent of human intelligence was a discontinuity with the past even without RSI...

 

Yudkowsky, talking about a situation in which your goals changed but your beliefs about the steps to take did not, said (underlines mine):

Let me guess: Yes, you admit that you originally decided you wanted to buy a million-dollar laptop by thinking, “Ooh, shiny.” Yes, you concede that this isn’t a decision process consonant with your stated goals. But since then, you’ve decided that you really ought to spend your money in such fashion as to provide laptops to as many laptopless wretches as possible. And yet you just couldn’t find any more efficient way to do this than buying a million-dollar diamond-studded laptop—because, hey, you’re giving money to a laptop store and stimulating the economy! Can’t beat that!

My friend, I am damned suspicious of this amazing coincidence. I am damned suspicious that the best answer under this lovely, rational, altruistic criterion X, is also the idea that just happened to originally pop out of the unrelated indefensible process Y. If you don’t think that rolling dice would have been likely to produce the correct answer, then how likely is it to pop out of any other irrational cognition?

It’s improbable that you used mistaken reasoning, yet made no mistakes.

I feel like the update from deep learning hasn't fully propagated through the belief pools of many in the LW sphere. I think Yudkowsky (and those influenced by his views) should be significantly less confident in hard takeoff, and give more consideration to softer takeoff scenarios; otherwise, it feels like there is too much epistemic inertia[26][27].

 

I think there are other avenues for hard takeoff that don't hinge so strongly on RSI (e.g. hardware/content overhang[28][29][30][31]), but they also seem to be somewhat weakened by the deep learning paradigm[32][33][34] (perhaps especially so if scaling maximalism is true)[35][36][37][38]. That said, my broader scepticism of foom deserves its own top level post.

I am also not persuaded by the justification for foom credulity based on AlphaGo. I don't think AlphaGo is necessarily as strong an indicator for foom as suggested: AlphaGo was able to blow past human performance in the narrow field of Go via 3 days of self-play; it does not seem that general competence in rich domains is similarly amenable to self-play[39][40].


Acknowledgements

I'm grateful to @beren, @janus, @tailcalled, @jessicata, @Alex Vermillion, @clockworklady and others for their valuable feedback on drafts of this post.

  1. ^

    I added a new section (and some footnotes) to this post before publishing, but I did not otherwise bother to extensively review the post.

  2. ^

    I have solicited feedback from people more knowledgeable on ML than myself and on reflection expect the main theses of this post to be directionally correct.

    Nonetheless, I flag statements I am noticeably not confident in.

  3. ^

    Many footnotes are more like appendices providing crucial context/elaboration on particular points and not merely authorial side comments such as this.

    I also added in relevant commentary from reviewers who are more technically grounded in relevant aspects than myself.

  4. ^

    How steep/rapid is it? Alternatively, just how big a discontinuity does the takeoff represent?

  5. ^

    It's not particularly clear what people mean by "hard"/"fast" takeoff. From the taxonomy Barnett drew, I use "hard"/"fast" takeoff to refer to (a more generous version of) Yudkowsky/Bostrom's formulations of the term. I.e. local takeoff that unravels over a timespan too short for humans to react (tentatively hours to a few weeks).

    [Where "takeoff" can be understood as the time for AI systems to transition from a par human regime to a strongly superhuman regime.]

  6. ^

    "A viable" may be an understatement; my impression/cached thoughts is that RSI was presented as the most load bearing component on the path to hard takeoff. E.g. from "Hard Takeoff" (emphasis mine):

    RSI is the biggest, most interesting, hardest-to-analyze, sharpest break-with-the-past contributing to the notion of a "hard takeoff" aka "AI go FOOM", but it's nowhere near being the only such factor.  The advent of human intelligence was a discontinuity with the past even without RSI...

    Or from "Recursive Self-Improvement":

    When you first build an AI, it's a baby - if it had to improve itself, it would almost immediately flatline.  So you push it along using your own cognition, metaknowledge, and knowledge - not getting any benefit of recursion in doing so, just the usual human idiom of knowledge feeding upon itself and insights cascading into insights.  Eventually the AI becomes sophisticated enough to start improving itself, not just small improvements, but improvements large enough to cascade into other improvements.  (Though right now, due to lack of human insight, what happens when modern researchers push on their AGI design is mainly nothing.)  And then you get what I. J. Good called an "intelligence explosion".

  7. ^

    I think the LW Wiki is a pretty accurate representation of the community's consensus at the time.

  8. ^

    The main difference between "seed AI flavoured recursive self-improvement" and the kind of self-improvement that manifests in the deep learning paradigm (covered later) is probably the resource cost (time, computational and economic resources) to each iteration of the feedback process.

    Having to spend ever greater economic and computational resources (growing at a superlinear rate) on training successors to attain constant capabilities progress in the feedback cycle will greatly change the dynamics of the flavour of self-improvement that manifests under deep learning. To borrow Yudkowsky's metaphor, such self-improvement seems much less likely to be "prompt critical".

    In my terminology: the cumulative returns to cognitive capabilities from the investment of economic/computational resources are sublinear (see:[9]).

    See:[17][32]

  9. ^

    Alternatively, the marginal returns diminish at a superlinear rate.

    To a first approximation, sublinear cumulative returns imply that marginal returns are diminishing at a superlinear rate (and vice versa).

    Though there are some subtleties here: marginal returns that diminish at an exponential rate imply bounded cumulative returns (a geometric series with ratio less than 1 converges). Meanwhile, cumulative returns that grow at a logarithmic rate are not bounded (they do not converge).

    I think whether to consider marginal returns or cumulative returns depends on the nature of the process you're considering (what exactly drives the returns).
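The subtlety in this footnote can be checked numerically. The sketch below is only an illustration: the helper names are hypothetical, and the two marginal-return schedules (geometric with ratio 0.5, and harmonic 1/k, whose partial sums grow logarithmically) are assumed examples rather than models of any real system.

```python
def cumulative(marginal, n_steps: int) -> float:
    """Sum the marginal returns of steps 1..n_steps."""
    return sum(marginal(k) for k in range(1, n_steps + 1))


def geometric_marginal(ratio: float):
    """Marginal return of step k shrinks exponentially: ratio ** k."""
    return lambda k: ratio ** k


def harmonic_marginal(k: int) -> float:
    """Marginal return of step k shrinks like 1/k; the running sum grows like ln(k)."""
    return 1.0 / k


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(
            n,
            cumulative(geometric_marginal(0.5), n),  # approaches 1 and stays there
            cumulative(harmonic_marginal, n),        # keeps growing, slowly
        )
```

Both schedules have diminishing marginal returns, but only the geometric one caps the total; the harmonic one never stops accumulating, it just takes exponentially more steps for each additional unit.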

  10. ^

    The levels Yudkowsky is referring to:

    So I shall stratify this causality into levels - the boundaries being semi-arbitrary, but you've got to draw them somewhere:

    • "Metacognitive" is the optimization that builds the brain - in the case of a human, natural selection; in the case of an AI, either human programmers or, after some point, the AI itself.
    • "Cognitive", in humans, is the labor performed by your neural circuitry, algorithms that consume large amounts of computing power but are mostly opaque to you.  You know what you're seeing, but you don't know how the visual cortex works.  The Root of All Failure in AI is to underestimate those algorithms because you can't see them...  In an AI, the lines between procedural and declarative knowledge are theoretically blurred, but in practice it's often possible to distinguish cognitive algorithms and cognitive content.
    • "Metaknowledge":  Discoveries about how to discover, "Science" being an archetypal example, "Math" being another.  You can think of these as reflective cognitive content (knowledge about how to think).
    • "Knowledge":  Knowing how gravity works.
    • "Object level":  Specific actual problems like building a bridge or something.

    Fuller context for the bolded:

    This means that the AI's metacognitive level - the optimization process responsible for structuring the AI's cognitive algorithms in the first place - has now collapsed to identity with the AI's object level.

    For some odd reason, I run into a lot of people who vigorously deny that this phenomenon is at all novel; they say, "Oh, humanity is already self-improving, humanity is already going through a FOOM, humanity is already in a Singularity" etc. etc.

    Now to me, it seems clear that - at this point in the game, in advance of the observation - it is pragmatically worth drawing a distinction between inventing agriculture and using that to support more professionalized inventors, versus directly rewriting your own source code in RAM.  Before you can even argue about whether the two phenomena are likely to be similar in practice, you need to accept that they are, in fact, two different things to be argued about.

    And I do expect them to be very distinct in practice.  Inventing science is not rewriting your neural circuitry.  There is a tendency to completely overlook the power of brain algorithms, because they are invisible to introspection.  It took a long time historically for people to realize that there was such a thing as a cognitive algorithm that could underlie thinking.  And then, once you point out that cognitive algorithms exist, there is a tendency to tremendously underestimate them, because you don't know the specific details of how your hippocampus is storing memories well or poorly - you don't know how it could be improved, or what difference a slight degradation could make.  You can't draw detailed causal links between the wiring of your neural circuitry, and your performance on real-world problems.  All you can see is the knowledge and the metaknowledge, and that's where all your causal links go; that's all that's visibly important.

    To see the brain circuitry vary, you've got to look at a chimpanzee, basically.  Which is not something that most humans spend a lot of time doing, because chimpanzees can't play our games.

    You can also see the tremendous overlooked power of the brain circuitry by observing what happens when people set out to program what looks like "knowledge" into Good-Old-Fashioned AIs, semantic nets and such.  Roughly, nothing happens.  Well, research papers happen.  But no actual intelligence happens.  Without those opaque, overlooked, invisible brain algorithms, there is no real knowledge - only a tape recorder playing back human words.  If you have a small amount of fake knowledge, it doesn't do anything, and if you have a huge amount of fake knowledge programmed in at huge expense, it still doesn't do anything.

    So the cognitive level - in humans, the level of neural circuitry and neural algorithms - is a level of tremendous but invisible power. The difficulty of penetrating this invisibility and creating a real cognitive level is what stops modern-day humans from creating AI.  (Not that an AI's cognitive level would be made of neurons or anything equivalent to neurons; it would just do cognitive labor on the same level of organization.  Planes don't flap their wings, but they have to produce lift somehow.)

    Recursion that can rewrite the cognitive level is worth distinguishing.

  11. ^

    In particular, it seems to me like the combinatorial explosion of the search space as one moves further down the concept tree may make recalcitrance higher (one needs to hit a much smaller target [at each step, we're searching for concepts that are improvements to the status quo] in a much larger search space [the search space should grow exponentially in the number of distinct laws/regularities/structures/relevant phenomena discovered]) and result in sublinear cumulative returns (see:[9]) to cognitive reinvestment. See:[12]

    Alternatively, whatever is driving the "ideas are getting harder to find" phenomenon will probably also bound the "growth rate" of returns from cognitive reinvestment via recursive self-improvement within the seed AI paradigm.

  12. ^

    Rather, I think it is a straightforward fact that the optimisation process constructing the system will need to do more "work" (measured in bits) to improve on the current status quo. One would need to argue separately that the gains to optimisation ability of the successor system more than offset the additional optimisation work that needs to be done (from both a much smaller target and much larger search space). I don't think Yudkowsky ever made a particularly persuasive argument for that claim. 

    On priors, I expect diminishing marginal returns to optimisation ability even within the seed AI paradigm. Most (all?) resource utilisation shows diminishing marginal returns eventually; I think it should be a strong default. The main question is whether the returns diminish before systems are strongly superhuman; that seems like more of an open question to me. See:[13]

  13. ^

    beren says:

    I agree. This is the main question that we need answering imho is scaling laws for RSI-like behaviours in existing models which are at or near the human-superhuman boundary. I agree that I don't think EY has strong arguments for. My prior is against as is current scaling law evidence but OTOH it seems that techniques like iterative finetuning/RLHF seems to be highly effective at current scales and may also get better with scale so I have some probability mass on positive or at least not strongly sublinear returns to scale in the current regime

  14. ^

    Compared to idiomatic software at least.

  15. ^

    tailcalled says:

    AI doing prompt engineering. Especially if future AI systems will consist of a bunch of GPT-style prompts feeding data to each other. Then the AI could change the architecture in these pipelines.

    Also, AI doing ordinary programming. (Not just discovering more efficient algorithms, there's lots and lots of ordinary programming work that can be done. Some of this can feed back into the AI systems.)

  16. ^

    I think "AI metalearning better optimisation algorithms" and "AI automating AI architecture search" come closest to seed AI flavoured recursive self improvement, but they aren't quite it, as they look more like AI creating more capable successor systems than a system improving itself directly. Succession can still induce foom (though see my [poorly raised] objections:[11][12]).

  17. ^

    Admittedly, AGI would also have positive feedback loops in available economic resources, so the economic constraints to capability amplification may be slacker than I posited. Though I still expect there to be significant economic constraints because I believe that cumulative returns to cognitive capabilities from investment of computational resources are likely strongly sublinear (though see:[18]) and computational resources are one of the main/most straightforward ways to purchase additional capabilities with economic resources.

    Prima facie, many (most?) interesting computational problems have worse than linear time (and/or space) complexity (some are much worse). I'm under the impression that neural networks don't implement correct deterministic algorithms to solve problems, but rather approximate them. As far as I'm aware, this approximation is not linear but generally polynomial with a smallish exponent (I've been told cubic, but have not independently verified/confirmed this). However, as long as the exponent is > 1, it would still translate to sublinear cumulative returns to cognitive capabilities (where cognitive capabilities are measured in terms of the size of the problem sets NNs can approximately solve) from increased computational resources.

    (This is another point I welcome those more technically grounded than me to enlighten me on.)

    Furthermore, it seems that the empirical evidence from deep learning supports the "sublinear cumulative returns to computational resources (as measured via purchased cognitive capabilities)" hypothesis. See:[32] 
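The polynomial-cost argument in this footnote can be sketched in a few lines. This is an illustration under stated assumptions, not a measurement: it assumes the cost of solving a problem of size n is exactly n ** exponent, and the function names are hypothetical. The cubic exponent is the unverified figure mentioned above.

```python
def solvable_size(compute: float, exponent: float) -> float:
    """Largest problem size n with n ** exponent <= compute (n treated as continuous)."""
    return compute ** (1.0 / exponent)


def gain_from_doubling(exponent: float) -> float:
    """Factor by which solvable problem size grows when compute doubles."""
    return 2.0 ** (1.0 / exponent)


if __name__ == "__main__":
    for exponent in (1.0, 2.0, 3.0):
        print(exponent, solvable_size(1e6, exponent), gain_from_doubling(exponent))
```

With an exponent of 1, doubling compute doubles the solvable size; with any exponent above 1, the gain per doubling falls below 2, which is what "sublinear cumulative returns" means here.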

  18. ^

    beren:

    For this what really matters is not the returns on cognitive capabilities for cognitive capabilities but the returns on cognitive capabilities for accumulating resources. I.e. even if we hit strong diminishing returns for the first, it isn't clear that the getting resources will share the same return structure. There seem to be lots of places in the economy where there is some kind of winner-take-all dynamic where being slightly better in some way (faster, slightly better execution, luckier etc) results in vastly outsized returns

  19. ^

    clockworklady:

    I'd like to express the controversial view that law of requisite variety, at minimum, renders full / reliable factorization of system's architecture/mind BY same system very nontrivial and very architecturally dependent. And yes, current approaches don't seem like they would have a good time sidestepping LRW (very large state/parameter count, enormous dependence on "scale" suggest that these systems just aren't architectures that would allow for efficient, compressed representations that would allow a system to successfully introspect all relevant aspects of itself)

     

    Current systems seem to have enormous parameter counts and require enormous amount of training data ("scale is all you need" boils down uncomfortably to "you need astronomic amounts of scale") and that does not seem like an architecture that would be amenable to compressed, succinct representations of states or clever TMTO-like modeling hackage that would allow the system itself to efficiently and reliably self-model to the point of high-quality "planned" self-improvement.

     

     If anything, current LLM seem to be very opposite of that .

     

    NB! It doesn't mean that LLMs are at all "banned" from self-improvement, but that its quality will by virtue of their architecture remain massively below "factorizing its own mind" quality benchmark 

  20. ^

    Other phrasings that convey what I'm trying to.

    Recursive self-improvement seems to not be all that likely to play a role in:

    * The path to superintelligent systems

    * The path to attaining decisive strategic advantage

    I.e. I don't think recursive self-improvement is on the critical path of humanity's future; it is in this sense that I don't think it's particularly relevant in the deep learning paradigm.

  21. ^

    There are caveats to this. They sound like:

    * Often we don't want exact solutions, and we just want good enough approximations

    * Or we do not care about deterministic correctness, and correctness with high probability is desired

    * Or we do not care about worst case complexity and it's the expected complexity on real world probability distributions that matters

    * Or ...

    My reply to all such objections of the above form is that as far as I'm aware, these relaxations generally do not take the "effective complexity" of a problem from superlinear to linear or sublinear. Some problems have exponential time complexity even after all these relaxations and most others are generally super quadratic.

  22. ^

    That said the "complexity of intelligence" hypothesis does not rule out fast takeoff. Effective intelligence level as a function of time can still grow at a superlinear rate if computational resources grows sufficiently fast. Economic feedback loops could facilitate this.

    I do have some scepticism of compute investment driven foom, but this post is about recursive self improvement in particular.
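To make the first claim in this footnote concrete, here is a worked sketch. The functional forms are assumptions chosen for illustration (power-law returns to compute and exponentially growing compute); neither is asserted by the post.

```latex
\begin{align*}
I(C) &= C^{\alpha}, \quad 0 < \alpha < 1
  && \text{(sublinear returns to compute; assumed form)} \\
C(t) &= C_0 \, e^{g t}, \quad g > 0
  && \text{(compute growing exponentially in time; assumed)} \\
I(t) &= C_0^{\alpha} \, e^{\alpha g t}
  && \text{(still superlinear in } t \text{)}
\end{align*}
```

Under the harsher assumption of logarithmic returns, I(C) = log C, the same exponential compute growth gives I(t) = log C_0 + g t, which is only linear in time; superlinear growth in that case would require compute to grow faster than exponentially.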

  23. ^
  24. ^

    A caveat to this claim (suggested by janus) is that even if the first AGI is created by deep learning, its successor (or some other relevant AI down the line) may not be. janus' model of Yudkowsky believes something similar to this. I think this caveat is sensible, and it made me revise down my estimates that LW folks whose credence in foom wasn't revised downwards after updating on the deep learning revolution were necessarily committing "epistemic malpractice".

    That said, I'm sceptical of the caveat. I think the biggest update of the last decade of machine learning is that we don't actually know how to program intelligence (even for pretty narrow domains, e.g. as far as I'm aware, no one can write a program to classify handwritten digits with high accuracy). It may be possible to construct intelligent systems using more "design like" constructive optimisation processes, but I don't necessarily expect that it would be something very analogous to idiomatic programming (or inasmuch as it is analogous to idiomatic programming, I don't expect it to manifest soon enough to matter).

    Of course, some systems may be able to hold trained quadrillion parameter ML models in their "working memory" and factorise that as one might a 10-line python script, but as with many advanced capabilities, I think this maps to a very high level of "strongly superhuman".

    I'd expect AI systems to be radically transformative or existentially dangerous well before they can "program" their successors. Insomuch as "programming intelligence" is feasible, I don't expect it to be a component on the critical path.

    Not being on the critical path to humanity's future is the main way I think seed AI flavoured recursive self-improvement is "not relevant"; if it does manifest in the deep learning paradigm, it would be too late to matter.

    That said, I don't necessarily posit high confidence in these expectations. I don't really have a technically grounded compelling story for why deep learning systems would be unable to properly factorise systems of comparable complexity (this is not to say that one does not exist, I just don't intuit it). There's not necessarily any rigorous tech tree that dictates existentially dangerous capabilities must come before such advanced factorisation abilities, mostly these are just intuitions derived from language models. Moreover, this entire objection could be thrown entirely out of the window if we successfully automate the design of intelligent systems.

  25. ^

    E.g. I was left with that impression by some of Yudkowsky's recent AI safety writing, and empirically, in AI safety conversations I've had online, many people still seem to have high confidence in foom.

  26. ^

    Again, if it's indeed the case that deep learning is not particularly amenable to seed AI flavoured recursive self-improvement and they expect AGI to arise within the deep learning paradigm.

  27. ^

    Largely, I mean that many people are not making updates I think consistency demands they make given their previously stated beliefs, their reasons for those beliefs, and the available evidence.

    Also, see:[24].

  28. ^

    Within lifetime human learning seems to be remarkably more sample efficient than our best ML models, so I'm somewhat (a lot?) more sympathetic to the content overhang thesis than the hardware overhang one.

    Even so, I don't expect the first AGIs to be anywhere near as sample efficient as the brain (in general, biology is way more resource (e.g. energy) efficient than human engineering (bird flight, animal locomotion in general, the brain's "10-12 watt" energy budget [it seems to be 1+ orders of magnitude more energy efficient than GPUs of comparable processing power and within an order of magnitude of the theoretical limit given its size], etc.)), and so I wouldn't be surprised if biology is way more data efficient than our first AIs (and I wouldn't necessarily expect AI to reach biology's data efficiency quickly). See:[29]

  29. ^

    For largely similar reasons as why we didn't reach biology's energy efficiency quickly (for some domains we still haven't reached it). Basically, energy has been a much more taut constraint for evolution than for human engineering (ATP vs electricity, humans having access to much denser energy sources (e.g. fossil fuels, nuclear power), and generally a much larger energy budget). As a constructive optimisation process, evolution was subject to much harsher energy constraints than human engineering, so any product of evolution that works at all will necessarily be very energy efficient in comparison. 

    I think a similar argument could be made about data efficiency; within-lifetime human learning simply does not have the data budget that our large language models do. Human language evolved in a much more data sparse environment (6+ orders of magnitude less available data doesn't seem to me like an overestimate). Given current data availability (and the rate at which it's growing), it seems unlikely that data budget would be anywhere near as harsh a constraint for ML models. If there aren't strong selection pressures for data efficiency, I wouldn't expect data efficiency within an order of magnitude of humans.

    A major way I could be wrong is if general intelligence turns out to be relatively compact and selection for cross-domain performance comes bundled with data efficiency. I think this is somewhat plausible/don't rule it out by default. Or rather, I think sufficiently capable AI systems should be more data efficient than humans (I expect that sufficiently advanced engineering [e.g. nanotech] would be more energy efficient than biology). The question is if human level (or better) data efficiency is something that we get on the critical path. I don't have particularly strong intuitions on that.

    Also note that unlike the case with energy, (as far as I'm aware) we don't have strong theoretical reasons to suspect that the brain is within an order of magnitude of the theoretical limit for data efficient learning for a system of its size (though see:[30]). It seems intuitively plausible (given an ample energy budget) to be much more sample efficient than the brain.

    Overall data efficiency by default for sufficiently capable systems seems more plausible than energy efficiency by default, so I think there's a stronger case for a content overhang than a hardware overhang (see:[31]).

  30. ^

    Maybe this would be true if we also condition on the brain's energy budget? Should we expect the brain to be near the theoretical limits for data efficiency given that it's near the theoretical limits for energy efficiency? I'm not quite sure. I'd appreciate commentary on this from people much more knowledgeable on the brain than me.

  31. ^

    Hardware overhang is often framed in terms of available compute not available data. Learning that the brain is very energy efficient updated my intuitions in complicated ways that I can't communicate clearly (in part because some parts are blank due to missing technical details) but roughly:

    * Conditioning on the brain's energy efficiency and grossly better sample efficiency than our current best models (I'm under the impression that it's orders of magnitude more sample efficient), I expect the brain to be very compute efficient given its energy budget. That is, I think the empirical evidence suggests that the brain is just extremely efficient in general.

    * I expect (strongly?) sublinear cumulative returns to cognitive reinvestment (see:[11][12]) so I think it'll be extremely nontrivial to attain better compute efficiency than the brain.

    * I don't have an intuitive story for why radically better compute efficiency than the brain is attainable the way we have intuitive stories for why better sample efficiency is readily attainable (and even then, I expect the much better sample efficiency to be accompanied by much higher compute utilisation).

    * I think that cumulative returns to cognitive capabilities from computational resources are sublinear (superlinearly more computational resources are required to produce the same constant progress). See:[17][32]. Thus, I don't think an abundance of computing resources is by itself likely to be powerful enough to induce an intelligence explosion without considerable energy/compute efficiency; as mentioned earlier, I believe the brain performs very well along those dimensions.

  32. ^

    Our extant scaling laws show a power law relationship between data/model size and cross-entropy loss (with training compute scaling as the square of model/dataset size). I think this suggests that cumulative returns to cognitive capabilities from increased computational resources (training data and compute) are strongly sublinear.

    I think the scaling laws are significant empirical evidence against the foom thesis within the deep learning paradigm, though see:[33][34].

    That said, I'm not technically grounded in ML and so I wouldn't be surprised if I was mistaken here. I invite those more informed on the relevant details to comment here. I'd be particularly curious to hear from ML folks who still expect foom in light of the evidence from scaling laws.
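    To make the "sublinear returns" reading concrete, here is a minimal sketch. It assumes a hypothetical power law of the form loss = a * compute^(-b); the constants `a` and `b` are illustrative assumptions, not fitted values from any published scaling law. Under that assumption, every halving of the loss costs the same multiplicative factor in compute, so constant progress in loss requires exponentially growing compute.

```python
def loss(compute: float, a: float = 10.0, b: float = 0.05) -> float:
    """Hypothetical power law (assumed constants): loss = a * compute ** (-b)."""
    return a * compute ** (-b)


def compute_multiplier_to_halve_loss(b: float = 0.05) -> float:
    """Factor by which compute must grow to halve the loss under the power law.

    Solving a * (k * C) ** (-b) == 0.5 * a * C ** (-b) for k gives k = 2 ** (1 / b).
    """
    return 2.0 ** (1.0 / b)


if __name__ == "__main__":
    k = compute_multiplier_to_halve_loss()
    compute = 1.0
    for halving in range(4):
        # Each row needs k times more compute than the previous one.
        print(f"halvings={halving} compute={compute:.3e} loss={loss(compute):.4f}")
        compute *= k
```

    The smaller the assumed exponent `b`, the larger the compute multiplier per halving; the qualitative point does not depend on the particular constants chosen here.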

  33. ^

    janus disagrees here. I quote:

    I disagree with this, because
    - empirical scaling laws describe a regime below that of RSI

    - scaling laws describe single-step predictive accuracy. Capabilities of simulacra across multiple steps, e.g. chain of thought accuracy, will scale more like exponentially with respect to single-step accuracy (see https://yaofu.notion.site/A-Closer-Look-at-Large-Language-Models-Emergent-Abilities-493876b55df5479d80686f68a1abd72f). RSI-ability is unlikely to correspond to single-step predictive accuracy, more likely to correspond to multi-step "accuracy", because RSI is presumably an activity the model does while generating trajectories.

    - In general, scaling laws measure prediction loss over the entire training set, and don't directly tell us whether the model is improving at the things we care about. You cannot tell just from reduced loss whether the model has just memorized some more arbitrary facts or if it's improved at some intellectually difficult predictions. The latter may be the result of a major capabilities gain, but be reflected in small improvements in loss because it is relevant to only a few select predictions in the training data.

  34. ^

    beren:

    I agree largely with Janus here. I want to stress though that this has moved the goalposts a bit from the pure scale-pilled 'generate FOOM by pure scaling', which I think the scaling laws are strong evidence against. I agree that a.) if there are multistep tasks, under a simple binomial model this will improve much better with scale than pure loss and b.) currently it is unclear how the effects of RSI-like things -- i.e. self-prompting, iterative finetuning etc. -- scale, and these could potentially have strong positive returns and/or show strong improvements with scale.

    We also have very little information about scaling laws as applied to things like architecture search or active data selection, where better models could potentially improve the scaling law coefficients of successor models.
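    The "simple binomial model" mentioned above can be sketched as follows. The assumption (mine, for illustration) is that a task consists of `n_steps` steps that each succeed independently with the same probability `p_step`, so the whole task succeeds with probability `p_step ** n_steps`. The function name and the example numbers are hypothetical.

```python
def multistep_success(p_step: float, n_steps: int) -> float:
    """Probability that all n_steps independent steps succeed (assumed model)."""
    return p_step ** n_steps


if __name__ == "__main__":
    # A modest gain in single-step accuracy compounds over long trajectories.
    for p_step in (0.90, 0.95, 0.99):
        for n_steps in (1, 10, 50):
            p_task = multistep_success(p_step, n_steps)
            print(f"p_step={p_step:.2f} n_steps={n_steps:2d} p_task={p_task:.4f}")
```

    Under this model, the ratio between the task success rates of two models grows geometrically with trajectory length, which is the sense in which multi-step capability can improve much faster than single-step loss.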

  35. ^

    I am particularly unconfident about this point, and especially with respect to a hardware overhang; it seems to me that scaling maximalism could make a hardware overhang much more likely (see:[36]) as training is often (much? see:[37]) more expensive than inference. But I don't have a solid grasp of how easily/readily trained models can be scaled up to use more compute at inference and how that affects their performance (see:[38]).

    I would be grateful if those more technically grounded than myself were to address this point in particular.

  36. ^

    beren:

    I disagree with this point. Almost all the RSI methods seem to require some level of running training experiments and not just inference -- i.e. the model has to fine-tune itself or train a successor. This will be primarily bottlenecked by training compute.

  37. ^

    beren:

    is it really true that training is only slightly more expensive than inference? I mean I guess it depends on how much inference you do but the approximate cost of a training run is basically dataset-size / batch_size * N_epochs * 2 (assuming 1 forward and 1 backward pass of approx equal compute). 
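    The approximate cost formula quoted above can be written out as a small sketch. The unit here is "one forward pass over one batch", which is my assumption about how to read the formula; the function name and example numbers are hypothetical.

```python
def training_cost_in_passes(dataset_size: int, batch_size: int, n_epochs: int) -> float:
    """Approximate training cost, measured in batch-sized forward-pass equivalents.

    Follows the formula in the text: dataset_size / batch_size * n_epochs * 2,
    where the factor 2 assumes one forward and one backward pass of roughly
    equal compute per training step.
    """
    steps = dataset_size / batch_size * n_epochs
    return steps * 2


if __name__ == "__main__":
    cost = training_cost_in_passes(dataset_size=1_000_000, batch_size=100, n_epochs=3)
    # Inference on one batch costs 1 pass in these units, so this is also the
    # number of inference batches that would cost as much as the training run.
    print(f"training cost in forward-pass equivalents: {cost:.0f}")
```

    Read this way, whether training is "much" more expensive than inference depends on how many inference batches are served over the model's deployment.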

  38. ^

    beren:

    This is potentially possible (i.e. adaptive compute) and the brain does something like this. But current ML systems don't really do this and so inference is just a forward pass with a fixed cost. The most probable increase in capabilities would come from finetuning, which is essentially just a very short additional training run.

  39. ^

    At least, not with amounts of compute available in the near term. But I could also be wrong on this; I'm not technically grounded enough here to feel strongly about my intuitions in this regard.

    Alternatively, it's not necessarily the case that even if we had unbounded compute, we could specify training objectives that select for general intelligence under iterated self-play. janus suspects this:

    I agree with this, though not even primarily because of compute limitations (although that's also a factor). Unlike Go, we don't have the "true name" of a win condition for AGI operating in rich domains that is amenable to iterated games.

  40. ^

    beren:

    yeah I mean alpha-go is not actually an example of foom at all and self-play doesn't really have RSI like characteristics imho. It would be great to get scaling laws for self-play. I suspect diminishing returns but don't actually know it. To me self-play is more an example of standard scaling

9 Answers

Ben Livengood

94

I think it's premature to conclude that AGI progress will be large pre-trained transformers indefinitely into the future. They are surprisingly(?) effective but for comparison they are not as effective in the narrow domains where AlphaZero and AlphaStar are using value and action networks paired with Monte-Carlo search with orders of magnitude fewer parameters. We don't know what MCTS on arbitrary domains will look like with 2-4 OOM-larger networks, which are within reach now. We haven't formulated methods of self-play for improvement with LLMs and I think that's also a potentially large overhang.

There's also a human limit to the types of RSI we can imagine and once pre-trained transformers exceed human intelligence in the domain of machine learning those limits won't apply. I think there's probably significant overhang in prompt engineering, especially when new capabilities emerge from scaling, that could be exploited by removing the serial bottleneck of humans trying out prompts by hand.

Finally I don't think GOFAI is dead; it's still in its long winter waiting to bloom when enough intelligence is put into it. We don't know the intelligence/capability threshold necessary to make substantial progress there. Generally, the bottleneck has been identifying useful mappings from the real world to mathematics and algorithms. Humans are pretty good at that, but we stalled at formalizing effective general intelligence itself. Our abstraction/modeling abilities, working memory, and time are too limited and we have no idea where those limits come from, whether LLMs are subject to the same or similar limits, or how the limits are reduced/removed with model scaling.

  1. MCTS seems difficult in "rich" (complex/high dimensional problem domains, continuous, stochastic, large state/action spaces) environments (e.g. the real world)?
  2. My conclusion was that AGI progress would be deep learning based into the indefinite future, not pretrained transformers
3Ben Livengood
Naive MCTS in the real world does seem difficult to me, but e.g. action networks constrain the actual search significantly. Imagine a value network good at seeing if solutions work (maybe executing generated code and evaluating the output) and plugging a plain old LLM in as the action network; it could theoretically explore the large solution space better than beam search or argmax+temperature[0]. 0: https://openreview.net/forum?id=Lr8cOOtYbfL is from February and I found it after writing this comment, figuring someone else probably had the same idea.
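As a rough illustration of the idea in the comment above, here is a minimal sketch of a search in which one function proposes candidate continuations (standing in for the LLM "action network") and another scores them (standing in for the "value network"). This is plain best-first search, not MCTS, and both networks are replaced by toy stand-ins; every name here (`guided_search`, `toy_propose`, `toy_value`, `TARGET`) is hypothetical.

```python
import heapq
from typing import Callable, Iterable, List, Optional


def guided_search(
    start: str,
    propose: Callable[[str], Iterable[str]],
    value: Callable[[str], float],
    is_solution: Callable[[str], bool],
    max_expansions: int = 1000,
) -> Optional[str]:
    """Best-first search: always expand the highest-value candidate next."""
    # heapq is a min-heap, so values are negated to pop the best candidate first.
    frontier = [(-value(start), start)]
    seen = {start}
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, state = heapq.heappop(frontier)
        if is_solution(state):
            return state
        for candidate in propose(state):
            if candidate not in seen:
                seen.add(candidate)
                heapq.heappush(frontier, (-value(candidate), candidate))
    return None


# Toy stand-ins (assumptions): the "action network" proposes one-character
# extensions, and the "value network" counts correct leading characters.
TARGET = "abc"


def toy_propose(state: str) -> List[str]:
    if len(state) >= len(TARGET):
        return []
    return [state + ch for ch in "abc"]


def toy_value(state: str) -> float:
    score = 0
    for got, want in zip(state, TARGET):
        if got != want:
            break
        score += 1
    return float(score)


result = guided_search("", toy_propose, toy_value, lambda s: s == TARGET)
```

In a real system the value function would be something like "execute the generated code and evaluate the output", as the comment suggests; the toy scorer above only exists to keep the sketch self-contained.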

Vladimir_Nesov

72

If deep learning yields AGI, the question is how far can its intelligence jump beyond human level before it runs out of compute available in the world, using the improvements that can be made very quickly at the current level of intelligence. In short sprints, a hoard of handmade constants can look as good as asymptotic improvement. So the latter's hypothetical impossibility doesn't put convincing bounds on how far this could be pushed before running out of steam. And if by that point procedures for bootstrapping nanotech become obvious, this keeps going, transitioning into disassembling the world for more compute without pause. All without refuting the bitter lesson.

Christopher King

51

I think you forgot one critical thing. Why does the normal argument for RSI's inevitability fail? The answer is: it doesn't.

Even though there is some research in the direction of a neural network changing each of its weights directly, this isn't important to the main argument because it is about improving source code. The weights are more like compiled code.

In the context of deep learning, the source code consists of:

  • The code defining the architecture
  • The code for collecting data (it can likely just hard code all of the training data if it is smart enough, but this isn't strictly necessary)
  • The code for training
  • The code utilizing the neural network (this includes things like prompt engineering, the interface to the outside world, sampling, quantization, etc...)

So the question is whether a deep learning model could improve any of this code. The answer to whether it can improve its "compiled code" (the weights) is also probably yes, but that isn't what the argument is based on.

This then runs into my objection that there's just not that much gain to be made from such source code improvements.

3Christopher King
This seems highly unlikely.
3DragonGod
By "not that much gain", I mean that no amount of algorithmic improvements would change the sublinear scaling of intelligence as a function of compute.
1Archimedes
Until AI is at least as sample-efficient and energy-efficient as humans are at learning, there are significant algorithmic gains that are possible. This may not be possible under the current deep-learning paradigm but we know it's possible under some paradigm since evolution has already accomplished it blindly. I do share your skepticism that something like an LLM alone could recursively improve itself quickly. Assuming FOOM, my model of how it happened has deep learning as only part of the answer. It's part of the recursive loop but is used mostly as a general heuristic module, much like the neural net of a chess engine is only a piece of the puzzle; you still need a fast search algorithm that uses the heuristics efficiently.

A misaligned model might not want to do that, though, since it would be difficult for it to ensure that the output of the new training process is aligned to its goals.

Matt Goldenberg

40

It seems pretty clear to me that AIs could get really good at understanding and predicting the results of editing model weights in the same way they can get good at predicting how proteins will fold. From there, directly creating circuits that add XYZ reasoning functionality seems at least possible.

I don't actually share this intuition.

I don't think you can get the information of computing the gradient updates to particular weights without actually running that computation (or something equivalent to it).

And presumably one would need empirical feedback (i.e. the value of the objective function we're optimising the network for on particular inputs) to compute the desired gradient updates.

The idea of the system just predicting the desired gradient updates without any ground truth supervisory signal seems fanciful.

6Matt Goldenberg
Ehh, protein folding feels equally fanciful to me, figuring out how the protein will fold without actually simulating the physical interactions. Meanwhile we have humans already editing model weights to change model behavior in desired ways: https://www.lesswrong.com/posts/gRp6FAWcQiCWkouN5/maze-solving-agents-add-a-top-right-vector-make-the-agent-go

I agree that it seems possible. I have doubts that predicting the results of editing weights is a more compute-efficient way of causing a model to exhibit the desired behavior than giving it the obvious tools and using fine-tuning / RL to make it able to use those tools though, or alternatively just doing the RL/fine-tuning directly. That's basically the heart of how I interpret the bitter lesson - it's not that you can't find more efficient ways to do what DL can do, it's that when you have a task that humans can do and computers can't, the approach of "introspect and think really hard about how to approach the task the right way" is outperformed by the approach of "lol more layers go brrrrr".

Zygi Straznickas

41

This is a solid argument inasmuch as we define RSI to be about self-modifying its own weights/other-inscrutable-reasoning-atoms. That does seem to be quite hard given our current understanding.

But there are tons of opportunities for an agent to improve its own reasoning capacity otherwise. At a very basic level, the agent can do at least two other things:

  1. Make itself faster and more energy efficient -- in the DL paradigm, techniques like quantization, distillation and pruning seem to be very effective when used by humans and keep improving, so it's likely an AGI would improve them further.
  2. Invent computational tools: wrt

Most problems in computer science have superlinear time complexity

on one hand sure, improving this is (likely) impossible in the limit because of fundamental complexity properties. On the other hand, the agent can still become vastly smarter than humans. A particular example: the human mind, without any assistance, is very bad at solving 3SAT. But we've invented computers, and then constraint solvers, and now are able to solve 3SAT much much faster, even though 3SAT is (likely) exponentially hard. So the RSI argument here is, the smarter (or faster) the model is, the more special-purpose tools it can create to efficiently solve specific problems and thus upgrade its reasoning ability. Not to infinity, but likely far beyond humans.

To be clear, the complexity theory argument is against fast takeoff, not an argument that intelligence caps at some level relative to humans.

Analogy approaches infinity, but it does so much slower than .

I.e. the sublinear asymptotics would prevent AI from progressing very quickly to a vastly superhuman level (unless the AI is able to grow its available resources sufficiently quickly to dominate the poor asymptotics).

Alternatively, each order of magnitude increase in compute buys (significantly) less intelligence; thus progress from human level to a...

1Zygi Straznickas
Thanks for clarifying. Yeah, I agree the argument is mathematically correct, but it kinda doesn't seem to apply to historic cases of intelligence increase that we have:
* Human intelligence is a drastic jump from primate intelligence but this didn't require a drastic jump in "compute resources", and took comparably little time in evolutionary terms.
* In human history, our "effective intelligence" -- capability of making decisions with the use of man-made tools -- grows at an increasing rate, not decreasing.
I'm still thinking about how best to reconcile this with the asymptotics. I think the other comments are right in that we're still at the stage where improving the constants is very viable.
5the gears to ascension
Oh man am I not convinced of this at all. Human intelligence seems to me to be only the result of 1. scaling up primate brains and 2. accumulating knowledge in the form of language, which relied on 3. humans and hominids in general being exceptional at synchronized behavior and collective action (eg, "charge!!!") - modern primates besides humans are still exceptionally smart per synapse among the animal kingdom.
3Archimedes
I agree that humans are not drastically more intelligent than all other animals. This makes the prospect of AI even scarier, in my opinion, since it shows how powerful accumulated progress is. I believe that human-level intelligence is sufficient for an AI to be extremely dangerous if it can scale while maintaining self-alignment in the form of "synchronized behavior and collective action". Imagine what a tech company could achieve if all employees had the same company-aligned goals, efficient coordination, in silico processing speeds, high-bandwidth communication of knowledge, etc. With these sorts of advantages, it's likely game over before it hits human-level intelligence across the board.
4the gears to ascension
indeed. my commentary should not be seen as reason to believe we're safe - just reason to believe the curve sharpness isn't quite as bad as it could have been imagined to be.
3DragonGod
My impression is that the human brain is a scaled up primate brain. As for humanity's effective capabilities increasing with time:
* Language allowed accumulation of knowledge across generations, plus cultural evolution
* Population growth has been (super)exponential over the history of humanity
* Larger populations afforded specialisation/division of labour, trade, economics, industry, etc.
Alternatively, our available resources have grown at a superexponential rate. The issue is takeoff being fast relative to the reaction time of civilisation. The AI would need to grow its invested resources much faster than civilisation has been to date. But resource investment seems primed to slow down if anything.
4Archimedes
Resource accumulation certainly can't grow exponentially indefinitely and I agree that RSI can't improve exponentially forever either, but it doesn't need to for AI to take over. An AI doesn't have to get far beyond human-level intelligence to control the future. If there's sufficient algorithmic overhang, current resources might even be enough. FOOM would certainly be easier if no new hardware were necessary. This would look less like an explosion and more like a quantum leap followed by slower growth as physical reality constrains rapid progress.
2DragonGod
Explain the inside view of "algorithmic overhang"?
3Archimedes
I don't have an inside view. If I did, that would be pretty powerful capabilities information. I'm pointing at the possibility that we already have more than sufficient resources for AGI and we're only separated from it by a few insights (a la transformers) and clever system architecture. I'm not predicting this is true, just that it's plausible based on existing intelligent systems (humans). Epistemic status: pondering aloud to coalesce my own fuzzy thoughts a bit. I'd speculate that the missing pieces are conceptually tricky things like self-referential "strange loops", continual learning with updateable memory, and agentic interactions with an environment. These are only vague ideas in my mind but, for some reason, they feel difficult to solve yet don't feel like things that require massive data and training resources so much as useful connections to reality and itself.

4-3

So I'm going to strong disagree here.

First of all, as it turns out in practice, scale was everything. This means that any AI idea you want to name, unless it was based on a transformer and worked on by approximately 3 labs, was never actually attempted.

We can just ignore all the other thousands of AI methods that humans tried because they were not attempted with a relevant level of scale.  

Therefore, RSI has never been tried.
Second, you can easily design a variation on RSI that works fine with current paradigms.

  It's not precisely RSI but it's functionally the same thing.  Here are the steps:
 

[1] benchmark of many tasks.  Tasks must be autogradeable, human participants must be able to 'play' the tasks so we have a control group score, tasks must push the edge of human cognitive ability (so the average human scores nowhere close to the max score, and top 1% humans do not max the bench either), there must be many tasks and with a rich permutation space.  (so it isn't possible for a model to memorize all permutations)

[2] heuristic weight score on this task intended to measure how "AGI like" a model is.  So it might be the RMSE across the benchmark.  But also have a lot of score weighting on zero shot, cross domain/multimodal tasks.  That is, the kind of model that can use information from many different previous tasks on a complex exercise it has never seen before is closer to an AGI, or closer to replicating "Leonardo da Vinci", who had exceptional human performance presumably from all this cross domain knowledge.

[3] In the computer science task set, there are tasks to design an AGI for a bench like this.  The model proposes a design, and if that design has already been tested, immediately receives detailed feedback on how it performed.  

The "design an AGI" subtask can be much simpler than "write all the boilerplate in Python", but these models will be able to do that if needed.  

 

As task scores approach human level across a broad set of tasks, you have an AGI.  You would expect it to almost immediately improve to a low superintelligence.  As AGIs get used in the real world and fail to perform well at something, you add more tasks to the bench, and/or automate creating simulated scenarios that use robotics data.

Why aren't we already doing this if it's so simple?
 

Because each AGI candidate training run has to be at least twice as large as llama-65b, so it means 2m+ in training costs per run.  And you need to explore the possibility space pretty broadly, so you would figure several thousand runs to really get to a decent AGI design which will not be optimal.  

This is one of the reasons foom cannot happen.  At least not without a lot more compute than we have now.  Each attempt is too expensive.
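The back-of-the-envelope arithmetic above can be made explicit. This sketch takes the "2m+" per-run figure from the text as a lower bound of 2,000,000 (the currency unit is not stated in the text) and reads "several thousand runs" as a range of 2,000 to 5,000, which is my assumption.

```python
def total_search_cost(cost_per_run: float, n_runs: int) -> float:
    """Total cost of exploring the design space with independent training runs."""
    return cost_per_run * n_runs


if __name__ == "__main__":
    cost_per_run = 2_000_000  # "2m+" from the text, treated as a lower bound
    for n_runs in (2_000, 3_000, 5_000):  # assumed readings of "several thousand"
        total = total_search_cost(cost_per_run, n_runs)
        print(f"runs={n_runs} total cost >= {total:,.0f}")
```

Even at the low end of these assumed ranges the total is in the billions, which is the sense in which each attempt is "too expensive" for a fast loop.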

Can we refine the above algorithm into something more compute efficient?  Yes, somewhat (by going to a modular architecture, where each "AGI candidate" is composed of hundreds of smaller networks, and we reuse most of them in between candidates), but it's going to still require a lot more compute than llama-65b took to train.  

Nathan Helm-Burger

30

I've made a related Manifold question here: 

I'm a believer in RSI-soon. I have an inside view that supports this. I expect convincing evidence of RSI to become apparent in the world before 2026. If you think I'm wrong, but this manifold question doesn't get at the root of things, I'll happily also vote on other markets related to this.

veered

21

Direct self-improvement (i.e. rewriting itself at the cognitive level) does seem much, much harder with deep learning systems than with the sort of systems Eliezer originally focused on.

In DL, there is no distinction between "code" and "data"; it's all messily packed together in the weights. Classic RSI relies on the ability to improve and reason about the code (relatively simple) without needing to consider the data (irreducibly complicated).

Any verification that a change to the weights/architecture will preserve a particular non-trivial property (e.g. avoiding value drift) is likely to be commensurate in complexity to the complexity of the weights. So... very complex.

The safest "self-improvement" changes probably look more like performance/parallelization improvements than "cognitive" changes. There are likely to be many opportunities for immediate performance improvements[1], but that could quickly asymptote. 

I think that recursive self-empowerment might now be a more accurate term than RSI for a possible source of foom. That is, the creation of accessory tools for capability increase. More like a metaphorical spider at the center of an increasingly large web. Or (more colorfully) a shoggoth spawning a multitude of extra tentacles.

The change is still recursive in the sense that marginal self-empowerment increases the ability to self-empower.

So I'd say that a "foom" is still possible in DL, but is both less likely and almost certainly slower. However, even if a foom is days or weeks rather than minutes, many of the same considerations apply. Especially if the AI has already broadly distributed itself via the internet.

Perhaps instead of just foom, we get "AI goes brrrr... boom... foom".

  1. ^

    Hypothetical examples include: more efficient matrix multiplication, faster floating point arithmetic, better techniques for avoiding memory bottlenecks, finding acceptable latency vs. throughput trade-offs, parallelization, better usage of GPU L1/L2/etc caches, NN "circuit" factoring, and many other algorithmic improvements that I'm not qualified to predict.

[anonymous]

30

What if the machine has a benchmark/training suite for performance? On the benchmark is a task for designing a better machine architecture.

The machine proposes a better architecture. The new architecture may be a brand new set of files defining the networks, topology, and training procedure, or it may reuse networks for components.

For example you might imagine an architecture that uses gpt-3.5 and -4 as subsystems but the "executive control" is from a new network defined by the architecture.

Given a very large compute budget (many billions), the company hosting...

2DragonGod
Training runs already take months. I'd expect that to take several generations of models, so double digit numbers of months in an aggressive scenario? (Barring drastic jumps in compute that cut months long training runs to hours/days).
1[anonymous]
Read paragraph 2. But yes, foom wasn't going to happen. It takes time for AI to be improved; turns out reality gets a vote.

I think takeoff from broadly human level to strongly superhuman is years, and potentially decades.

Foom in days or weeks still seems just as fanciful as before.

avturchin

20

An interesting possibility is a recursively self-improving prompt in Auto-GPT.

3 comments

The MIRI 2000s paradigm for an AI capable of self-improvement was that it would be modular code with a hierarchical organization, that would potentially engage in self-improvement at every level.

The actual path we've been on has been: deep learning, scaling, finetuning with RLHF, and now (just starting) reflective agents built on a GPT base.

A reflective GPT-based agent is certainly capable of studying itself and coming up with ideas for improvement. So we're probably at the beginning of attempts at self-improvement, right now. 

The inability of neural nets to quickly retrain new, larger versions of themselves will slow their progress, and thus many competing AIs are more likely to emerge. It is less likely that they can merge later by "merging utility functions", as NNs have no explicit utility functions. Thus a multipolar world is more likely.

Does this reasoning mean that interpretability is basically impossible?