We (Connor Leahy, Gabriel Alfour, Chris Scammell, Andrea Miotti, Adam Shimi) have just published The Compendium, which brings together in a single place the most important arguments that drive our models of the AGI race, and what we need to do to avoid catastrophe.
We felt that something like this was missing from the AI conversation. Most of these points have been shared before, but a single "comprehensive worldview" document has not. We've tried our best to fill this gap, and welcome feedback and debate about the arguments. The Compendium is a living document, and we'll keep updating it as we learn more and change our minds.
We would appreciate your feedback, whether or not you agree with us:
- If you do agree with us, please point out where you think the arguments can be made stronger, and contact us if there are ways you’d be interested in collaborating in the future.
- If you disagree with us, please let us know where our argument loses you and which points are the most significant cruxes - we welcome debate.
Here are the Twitter thread and the summary:
The Compendium aims to present a coherent worldview about the extinction risks of artificial general intelligence (AGI) - AI whose intelligence exceeds that of humans - in a way that is accessible to non-technical readers with no prior knowledge of AI. A reader should come away with an understanding of the current landscape, the race to AGI, and its existential stakes.
AI progress is rapidly converging on building AGI, driven by a brute-force paradigm that is bottlenecked by resources, not insights. Well-resourced, ideologically motivated individuals are driving a corporate race to AGI. They are now backed by Big Tech, and will soon have the support of nations.
People debate whether or not it is possible to build AGI, but most of the discourse is rooted in pseudoscience. Because humanity lacks a formal theory of intelligence, we must operate by the empirical observation that AI capabilities are increasing rapidly, surpassing human benchmarks at an unprecedented pace.
As more and more human tasks are automated, the gap between artificial and human intelligence shrinks. At the point when AI is able to do all of the tasks a human can do on a computer, it will functionally be AGI, able to conduct the same AI research that we can. Should this happen, AGI will quickly scale to superintelligence, and then to levels so powerful that AI is best described as a god compared to humans. Just as humans have catalyzed the Holocene extinction, these systems pose an extinction risk for humanity - not because they are malicious, but because we will be powerless to control them as they reshape the world, indifferent to our fate.
Coexisting with such powerful AI requires solving some of the most difficult problems that humanity has ever tackled, which demand Nobel-prize-level breakthroughs, billions or trillions of dollars of investment, and progress in fields that resist scientific understanding. We suspect that we do not have enough time to adequately address these challenges.
Current technical AI safety efforts are not on track to solve this problem, and current AI governance efforts are ill-equipped to stop the race to AGI. Many of these efforts have been co-opted by the very actors racing to AGI, who undermine regulatory efforts, cut corners on safety, and are increasingly stoking nation-state conflict in order to justify racing.
This race is propelled by the belief that AI will bring extreme power to whoever builds it first, and that the primary quest of our era is to build this technology. To survive, humanity must oppose this ideology and the race to AGI, building global governance that is mature enough to develop technology conscientiously and justly. We are far from achieving this goal, but believe it to be possible. We need your help to get there.
I am unsurprised but disappointed to read the same Catastrophe arguments rehashed here, based on an outdated Bostromian paradigm of AGI. This is the main section I disagree with.
I do not think this is obvious or true at all. Nation-states are often controlled by a small group of people, or even a single person, physiologically no different from any other human being. If it really wanted to, nothing at all would stop the US military from launching a coup against its civilian government; indeed, military coups are commonplace around the world. Yet most countries do not suffer constant coup attempts. We hold far fewer tools to "align" military leaders than we do AI models - we cannot control how generals were raised as children, cannot read their minds, cannot edit their minds.
I think you could also make a similar argument that big things control little things: with far more momentum and potential energy, large objects dominate small ones, and small objects can only push large objects to the extent that the large object is made of a material that is not very dense. Surely, then, building vehicles substantially larger than people would result in uncontrollable runaways that threaten human life and property! But in reality, runaway dump truck incidents are fairly uncommon. A tiny man can control a giant machine. Not all men can - only the one in the cockpit.
My point is that it is not at all obvious that a powerful AI would lack such a cockpit. If its goals are oriented around protecting or giving control to a set of individuals, I see no reason whatsoever why it would do a 180 and kill its commander, especially since the AI systems that we can build in practice are more than capable of understanding the nuances of their commands.
Chess is a perfectly predictable system; reality is a chaotic one. Chaotic systems - like a three-body orbital arrangement - are impossible to predict perfectly in all cases even when they are fully deterministic, because even minute inaccuracies in measurement can completely change the result. Another example is the boundary of the Mandelbrot set, which is fractal: arbitrarily small changes in the starting point can flip the outcome. Therefore, even an extremely powerful AI would be beholden to certain probabilistic barriers, even before accounting for quantum-random factors.
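To make that point concrete, here is a minimal sketch (using the logistic map, a standard toy chaotic system of my own choosing, not anything from The Compendium): two trajectories that start one part in a billion apart become completely uncorrelated within a few dozen iterations, despite the dynamics being fully deterministic.

```python
# Minimal illustration of sensitive dependence on initial conditions:
# the logistic map x_{n+1} = r * x_n * (1 - x_n) with r = 4 is fully
# deterministic, yet trajectories starting 1e-9 apart soon diverge completely.

def logistic_trajectory(x0: float, r: float = 4.0, steps: int = 50) -> list:
    """Iterate the logistic map from x0 and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.400000000)   # "true" initial condition
b = logistic_trajectory(0.400000001)   # same measurement, off by one part in a billion

for n in (0, 10, 20, 30, 40, 50):
    print(f"step {n:2d}: {a[n]:.6f} vs {b[n]:.6f}   gap = {abs(a[n] - b[n]):.2e}")
```

The gap roughly doubles each step, so a billionth-sized measurement error swallows the whole prediction after about thirty iterations - no randomness required.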
It would not be incorrect to describe someone who pursues their goals irrespective of the externalities as malevolent. Bank robbers don't want to hurt people; they want money. Yet I don't think anyone would suggest that the North Hollywood shooters were "non-hostile but misaligned". I do not like this common piece of rhetoric, and I think it is dishonest. It tries to distance these fears of misaligned AI from movie villains such as Skynet, but ultimately that is the picture being painted.
Goal divergence is a hallmark of the Bostromian paradigm - the idea that a misspecified utility function, optimized hypercompetently, would lead to disaster. Modern AI systems do not behave like this. They behave in a much more humanlike way; they do not have objective functions that they pursue doggedly. The Orthogonality Thesis states that any level of intelligence is compatible with any final goal. The unstated assumption here, I think, is that the system's initial goals must have been misaligned in the first place - but stated like this, it sounds as though you expect a superintelligent AI to suddenly diverge from its instructions for no reason at all.
Overall, this is a very vague section. I think you would benefit from explaining some of the assumptions being made here.
I'm not going to go into detail on the Alignment section, but I think many of its issues are similar to the ones listed above. I think the arguments are not compelling enough for lay people, mostly because I don't think they're correct. The definition of Alignment you have given - the ability to "steer AI systems toward a person's or group's intended goals, preferences, and ethical principles" - does not match the treatment it receives. I think it is obvious that the scope of Alignment is too vague, broad, and unverifiable for it to be a useful concept. I think that Richard Ngo's post:
https://www.lesswrong.com/posts/67fNBeHrjdrZZNDDK/defining-alignment-research
is a good summary of the issues I see with the current idea of Alignment as it is often used in Rationalist circles and how it could be adapted to suit the world in which we find ourselves.
Finally, I think the Governance section could very well be read uncharitably as a manifesto for world domination. Fewer than a dozen people attend PauseAI protests; you do not have the political power to make this happen. The ideas in this document - which resemble those in many other documents, such as a similar one produced by the PauseAI group - are not compelling enough to sway people who are not already believers, and the Rationalist language used in them is anathema to the largest ideological groups that would otherwise support your cause.
You may receive praise from Rationalist circles, but I do not think you will reach a large audience with this type of work. Leopold Aschenbrenner's essay managed to reach a fairly substantial audience, and it has similar themes to your document, so in principle people are willing to read this sort of writing. The main flaw is that it doesn't add anything to the conversation, and because of that, it won't change anyone's mind. The reason public discourse doesn't involve Alignment talk isn't a lack of awareness; it's that Alignment isn't at all compelling to most people. Writing it better, with a nicer format, will not change this.
To respond to this comment, I'll give a view on why I think coordination might be easier for AIs than for people, and also explain why the invention of AI likely breaks a lot of the social rules we are used to.
For example, one big difference that I think affects coordination for AIs is that an AI model will likely be able to copy itself millions of times, given current inference scaling, and in particular you can distribute fine-tunes to those millions of copies as though they were a single unit.
This is a huge change for coordination, because humans can't co...