This is one of the documents I was responding to when I wrote A general model of safety-oriented AI development, Three AI Safety Related Ideas, and Two Neglected Problems in Human-AI Safety. (I didn't cite it because it was circulating semi-privately in draft form, and Eric apparently didn't want its existence to be publicly known.) I'm disappointed that although Eric wrote to me "I think that your two neglected problems are critically important", the perspectives in those posts didn't get incorporated more into the final document, which spends only 3 short paragraphs out of hundreds of pages to talk about what I think of as "human safety problems". (I think those paragraphs were in the draft even before I wrote my posts.)
I worry about the framing adopted in this document that the main problem in human-AI safety is "questions of what humans might choose to do with their capabilities", as opposed to my preferred framing of "how can we design human-AI systems to minimize total risk". (To be fair to Eric, a lot of other AI safety people also only talk about "misuse risk" and not about how AI is by default likely to exacerbate human safety problems, e.g., by causing rapid distributiona
...It seems fairly easy to expand this to include services that consider how disruptive new technologies will be, how underdetermined human values are, whether a proposed plan reduces option value, what risk aversion implies about a particular plan of action, what blind spots people have, etc.
Can you explain how you'd implement these services? Take "how disruptive new technologies will be" for example. I imagine you can't just apply ML given the paucity of training data and how difficult it would be to generalize from historical data to new technologies and new social situations. And it seems to me that if you base it on any kind of narrow AI technology, it would be easy to miss some of the novel implications/consequences of the new technologies and social situations and end up with a wrong answer. Maybe you could instead base it on a general purpose reasoner or question-answerer, but if something like that exists, AI would already have created a lot of new technologies that are risky for humans to face. Plus, the general purpose AI could replace a lot of discrete/narrow AI services, so I feel like we would already have moved past the CAIS world at that point. BTW, if the service i
...Can you explain how you'd implement these services?
Not really. I think of CAIS as suggesting that we take an outside view that says "looking at how AI has been progressing, and how humans generally do things, we'll probably be able to do more and more complex tasks as time goes on". But the emphasis that CAIS places is that the things we'll be able to do will be domain-specific tasks, rather than getting a general-purpose reasoner. I don't have a detailed enough inside view to say how complex tasks might be implemented in practice.
I agree with the rest of what you said, which feels to me like considering a few possible inside-view scenarios and showing that they don't work.
One way to think about this is through the lens of iterated amplification. With iterated amplification, we also get the property that our AI systems will be able to do more and more complex tasks as time goes on. The key piece that enables this is the ability to decompose problems, so that iterated amplification always bottoms out into a tree of questions and subquestions down to leaves which the base agent can answer. You could think of (my conception of) CAIS as a claim that a...
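To make the decomposition idea concrete, here is a minimal sketch of a question tree that always bottoms out in leaves a base agent can answer directly. The names `amplify`, `toy_base_agent`, `toy_decompose` and `toy_combine` are hypothetical, and the toy task (summing a list of numbers) stands in for real questions. The full iterated amplification proposal also distills the amplified system into a new base agent at each round; that step is omitted here.

```python
from typing import Callable, List, Optional

# Hypothetical type aliases for the pieces of the sketch.
BaseAgent = Callable[[str], Optional[str]]    # answers a question directly, or returns None
Decomposer = Callable[[str], List[str]]       # splits a question into subquestions
Combiner = Callable[[str, List[str]], str]    # builds an answer from subanswers


def amplify(question: str, base_agent: BaseAgent, decompose: Decomposer,
            combine: Combiner, depth: int = 0, max_depth: int = 10) -> str:
    """Answer `question` by recursive decomposition; every branch of the tree
    must bottom out in a leaf question the base agent can answer."""
    direct = base_agent(question)
    if direct is not None:
        return direct  # leaf of the question tree
    if depth >= max_depth:
        raise RecursionError(f"could not reduce {question!r} to answerable leaves")
    subanswers = [amplify(sub, base_agent, decompose, combine, depth + 1, max_depth)
                  for sub in decompose(question)]
    return combine(question, subanswers)


# Toy task: questions look like "sum:1,2,3"; the base agent can only "sum" one number.
def toy_base_agent(question: str) -> Optional[str]:
    numbers = question.split(":", 1)[1].split(",")
    return numbers[0] if len(numbers) == 1 else None


def toy_decompose(question: str) -> List[str]:
    numbers = question.split(":", 1)[1].split(",")
    mid = len(numbers) // 2
    return ["sum:" + ",".join(numbers[:mid]), "sum:" + ",".join(numbers[mid:])]


def toy_combine(question: str, subanswers: List[str]) -> str:
    return str(sum(int(answer) for answer in subanswers))


print(amplify("sum:1,2,3,4,5", toy_base_agent, toy_decompose, toy_combine))  # prints 15
```

The base agent never sees the whole problem; all of the extra capability comes from the decomposition structure.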
I have a problem with section 32, "Unaligned superintelligent agents need not threaten world stability". Here's the summary of that section from the paper:
- Powerful SI-level capabilities can precede AGI agents.
- SI-level capabilities could be applied to strengthen defensive stability.
- Unopposed preparation enables strong defensive capabilities.
- Strong defensive capabilities can constrain problematic agents.
So the key idea here seems to be that good actors will have a period of time to use superintelligent AI services to prepare some sort of ubiquitous defense that will constrain any subsequent AGI agents. But I don't understand where this period of "unopposed preparation" comes from. Why wouldn't someone create an AGI by cobbling together a bunch of AI services, or hire a bunch of AI services to help them design an AGI, as soon as they could? If they did that, then superintelligent AGI agents would arise nearly simultaneously with SI-level capabilities, and there would be no such period of unopposed preparation. In section 32.2, Eric only argues that SI-level capabilities can precede AGI agents. Since I think they wouldn't, at least not by a significant margin, the whole argumen
...Why wouldn't someone create an AGI by cobbling together a bunch of AI services, or hire a bunch of AI services to help them design an AGI, as soon as they could?
Because any task that an AGI could do, CAIS could do as well. (Though I don't agree with this -- unified agents seem to work better.)
But if quickly building an AGI can potentially allow someone to take over the world before "unopposed preparation" can take place, isn't that a compelling motivation by itself for many people?
I suspect he would claim that quickly building an AGI would not allow you to take over the world, because the AGI would not be that much more capable than the CAIS service cluster.
It may be the case that people try to take over the world just with CAIS, and maybe that could succeed. I think he's arguing only against AGI accident risk here, not against malicious uses of AI. (I think you already knew that, but it wasn't fully clear on reading your comment.)
I suspect he would claim that quickly building an AGI would not allow you to take over the world, because the AGI would not be that much more capable than the CAIS service cluster.
That does not seem to be his position, though, because if AGI is not much more capable than CAIS, then there would be no need to talk specifically about how to defend the world against AGI, as he does at length in section 32. If that were his position, he could just talk about how ordinary policing and military defense would work in a CAIS world (i.e., against human adversaries wielding CAIS) and say that the same policing/defense would also work against AGI because AGI is not much more capable than CAIS.
Instead it seems clear that he thinks AGI requires special effort to defend against, which is made possible by a delay between SI-level CAIS and AGI, which he proposes that we use to do a very extensive "unopposed preparation". I've been trying to figure out why he thinks there will be such a delay and my current best guess is "Implementation of the AGI model is widely regarded as requiring conceptual breakthroughs." (page 75) which he repeats on page 77, "AGI (but not CAIS) calls for conceptual breakthr
...Do you get it?
I doubt I will ever be able to confidently answer yes to that question.
That does not seem to be his position though, because if AGI is not much more capable than CAIS, then there would be no need to talk specifically about how to defend the world against AGI, as he does at length in section 32.
My model is that he does think AGI won't be much more capable than CAIS (see sections 12 and 13 in particular, and 10, 11 and 16 also touch on the topic), but lots of people (including me) kept making the argument that end-to-end training tends to improve performance and so AGI would outperform CAIS, and so he decided to write a response to that.
In general, my impression from talking to him and reading earlier drafts is that the earlier chapters are representative of his core models, while the later chapters are more like responses to particular arguments, or specific implications of those models.
I can give one positive argument for AGI being harder to make than SI-level CAIS. All of our current techniques for building AI systems create things that are bounded in the time horizon they are optimizing over. It's actually quite unclear how we would use current techniques ...
Eric and I have exchanged a few emails since I posted this summary, I'm posting some of it here (with his permission), edited by me for conciseness and clarity. The paragraphs in the quotes are Eric's, but I have rearranged his paragraphs and omitted some of them for better flow in this comment.
There is a widespread intuition that AGI agents would by nature be more integrated, flexible, or efficient than comparable AI services. I am persuaded that this is wrong, and stems from an illusion of simplicity that results from hiding mechanism in a conceptually opaque box, a point that is argued at some length in Section 13.
Overall, I think that many of us have been in the habit of seeing flexible optimization itself as a problem, when optimization is instead (in the typical case) a strong constraint on a system’s behavior (see Section 8). Flexibility of computation in pursuit of optimization for bounded tasks seems simply useful, regardless of planning horizon, scope of considerations, or scope of required knowledge.
I agree that AGI agents hide mechanism in an opaque box. I also agree that the sort of optimization that current ML does, which is very task-focused, is a strong cons...
That was the summary :P The full thing was quite a bit longer. I also didn't want to misquote Eric.
Maybe the shorter summary is: there are two axes which we can talk about. First, will systems be transparent, modular and structured (call this CAIS-like), or will they be opaque and well-integrated? Second, assuming that they are opaque and well-integrated, will they have the classic long-term goal-directed AGI-agent risks or not?
Eric and I disagree on the first one: my position is that for any particular task, while CAIS-like systems will be developed first, they will gradually be replaced by well-integrated ones, once we have enough compute, data, and model capacity.
I'm not sure how much Eric and I disagree on the second one: I think it's reasonable to predict that the resulting systems are specialized for particular bounded tasks and so won't be running broad searches for long-term plans. I would still worry about inner optimizers; I don't know what Eric thinks about that worry.
This summary is more focused on my beliefs than Eric's, and is probably not a good summary of the intent behind the original comment, which was "what does Eric think Rohin got wrong in his summary + opinion of CAIS", along with some commentary from me trying to clarify my beliefs.
Updates were mainly about actually carving up the space in the way above. Probably others, but I often find it hard to introspect on how my beliefs are updating.
Promoted to curated: I think the linked document is one of the most interesting things written in AI Alignment in the last year, and this is the best summary of and commentary on it that currently exists. Quality-wise, I think everything I have to say has already been covered by the other commenters. Overall, I found reading the linked document, as well as this summary, quite helpful in my thinking about AI Alignment, though I also disagree with large parts of it. (However, I am not at the research level, so I have a harder time judging how useful it would be for people who spend even more time thinking about AI Alignment.)
Thanks a lot for writing this summary, and thanks a lot to Eric for all the work he is doing.
I want to draw separate attention to chapter 40 of Drexler's paper, which uses what looks like a novel approach to argue that current supercomputers likely have more raw processing power than a human brain. I find that scary.
From the conclusion of that section:
Many modern AI tasks, although narrow, are comparable to narrow capacities of neural systems in the human brain. Given an empirical value for the fraction of computational resources required to perform that task with humanlike throughput on a 1 PFLOP/s machine, and an inherently uncertain and ambiguous—yet bounded—estimate of the fraction of brain resources required to perform “the equivalent” of that machine task, we can estimate the ratio of PFLOP/s machine capacity to brain capacity. What are in the author’s judgment plausible estimates for each task are consistent in suggesting that this ratio is ~10 or more. Machine learning and human learning differ in their relationship to costs, but even large machine learning costs can be amortized over an indefinitely large number of task-performing systems and application events.
In light of these considerations, we should expect that substantially superhuman computational capacity will accompany the eventual emergence of a software with broad functional competencies. On present evidence, scenarios that assume otherwise seem unlikely.
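Read one way, the estimate in the quoted conclusion reduces to a one-line calculation. The symbols here are introduced for illustration and are not Drexler's notation: C_m is the machine's capacity (1 PFLOP/s), f_m the empirically measured fraction of it needed to perform a task with humanlike throughput, C_b the brain's capacity, and f_b the estimated fraction of brain resources devoted to "the equivalent" task. Assuming both systems spend a comparable absolute amount of computation on the task:

```latex
f_b \, C_b \;\approx\; f_m \, C_m
\quad\Longrightarrow\quad
\frac{C_m}{C_b} \;\approx\; \frac{f_b}{f_m}
```

A ratio of ~10 or more therefore means the task occupies a fraction of the machine at least ten times smaller than the fraction of the brain it is estimated to occupy. Since f_m is measured while f_b is only bounded, most of the uncertainty in the ratio comes from f_b.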
I'm not completely sure I'm understanding the first paragrap...
I trust past-me to have summarized CAIS much better than current-me; back when this post was written I had just finished reading CAIS for the third or fourth time, and I haven't read it since. (This isn't a compliment -- I read it multiple times because I had a lot of trouble understanding it.)
I've put in two points of my own in the post. First:
...(My opinion: I think this isn't engaging with the worry with RL agents -- typically, we're worried about the setting where the RL agent is learning or planning at test time, which can happen in learn-to-learn and on
I disagree outright with
Any long term planning processes that consider weird plans for achieving goals (similar to "break out of the box") will typically not find any such plan and will be eliminated in favor of cognition that will actually help achieve the task.
Part of the reason that AI alignment is hard is that The Box is FULL of Holes! Breaking Out is EASY!
And the deeper reason for that is that we have no idea how to tell what's a hole.
Suppose you want to set the service generator to make a robot that cleans cars. If you give a blow b...
As a note, I believe that FHI is planning to publish a(n edited?) version of this document as an actual book, à la Superintelligence: Paths, Dangers, Strategies.
Upvoted. I've long thought that Drexler's work is a valuable contribution to the debate that hasn't received enough attention so far, so it's great to see that this has now been published.
I am very sympathetic to the main thrust of the argument – questioning the implicit assumption that powerful AI will come in the shape of one or more unified agents that optimise the outside world according to their goals. However, given our cluelessness and the vast range of possible scenarios (e.g. ems, strong forms of biological enhancement, mergin...
The CAIS model suggests that before we get to a world with monolithic AGI agents, we will already have seen an intelligence explosion due to automated R&D.
This conclusion seems similar to the one Paul arrives at here:
In the slow takeoff scenario, pre-AGI systems have a transformative impact that’s only slightly smaller than AGI.
(See also this post from AI Impacts.)
CAIS is a very different take on what transformative AI might look like than the ones I find most intuitive. I think it's really useful to experience a range of different perspectives to break me out of my cached thoughts.
And I'm grateful to Rohin for writing up this summary! I think this kind of thing is a valuable service for spreading these ideas to people who don't want to read a 200-page document.
I think the CAIS framing that Eric Drexler proposed gave concrete shape to a set of intuitions that many people have been relying on for their thinking about AGI. I also tend to think that those intuitions and models aren't actually very good at modeling AGI, but I nevertheless think it productively moved the discourse forward a good bit.
In particular, I am very grateful for the comment thread between Wei Dai and Rohin, which really helped me engage with the CAIS ideas, and which I think was necessary to get me to my current understanding of CAIS and to ...
I see a few criticisms about how this doesn't really solve the problem, it only delays it because we expect a unified agent to outperform the combined services.
Even granting that criticism, it seems to me that this is worth driving as a commercial template anyway. Every R&D dollar that goes into a bounded service is one that doesn't drive specifically toward an unbounded agent; every PhD doing development on an individual service is not doing development on a unified agent.
We're currently still in the regime where first mover advantage is ov...
What has excited me most about Eric's position since I first learned of it is that it provides a framework for building AI systems that are safer than those we might otherwise build if we were trying to target AGI. From this perspective, it's valuable for setting policy and missions for AI-focused endeavors in a way that potentially delays the creation of AGI.
Although it might be argued that this is inevitable (last time I talked to Eric this was the impression that I got; he felt he was laying out some ideas that would happen anyway and was taking the time to explain...
My main objection to this idea is that it is a local solution and doesn't have built-in mechanisms to become a global AI safety solution, that is, to prevent the creation of other AIs, which could be agential superintelligences. One could try to build an "AI police" as a service, but it could be less effective than an agential police.
Another objection is probably Gwern's idea that any Tool AI "wants" to become an agential AI.
This idea also excludes the robotic direction in AI development, which will produce agential AIs anyway.
So what is he saying? We never need to solve the problem of designing a human-friendly superintelligent agent?
Thanks for the summary! I agree that this is missing some extra consideration for programs that are planning / searching at test time. We normally think of Google Maps as non-agenty, "tool-like," "task-directed," etc, but it's performing a search for the best route from A to B, and capable of planning to overcome obstacles - as long as those obstacles are within the ontology of its map of ways from A to B.
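A minimal sketch of the Google Maps point: a shortest-path search genuinely plans around obstacles, but only obstacles it can represent, here blocked edges in a fixed graph. Dijkstra's algorithm is used for illustration; the comment does not describe Google's actual routing system, and the graph and `shortest_route` are hypothetical.

```python
import heapq
from typing import AbstractSet, Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: travel cost}


def shortest_route(graph: Graph, start: str, goal: str,
                   blocked: AbstractSet[Tuple[str, str]] = frozenset()) -> Optional[List[str]]:
    """Dijkstra's algorithm over a fixed road graph. The only obstacles it can
    plan around are ones its ontology can express: blocked edges."""
    dist: Dict[str, float] = {start: 0.0}
    prev: Dict[str, str] = {}
    frontier = [(0.0, start)]
    visited = set()
    while frontier:
        d, node = heapq.heappop(frontier)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            path = [goal]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return path[::-1]
        for neighbor, cost in graph.get(node, {}).items():
            if (node, neighbor) in blocked:
                continue  # route around an obstacle the map can represent
            new_dist = d + cost
            if new_dist < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_dist
                prev[neighbor] = node
                heapq.heappush(frontier, (new_dist, neighbor))
    return None  # no route exists within the map


roads: Graph = {"A": {"B": 1, "C": 4}, "B": {"D": 1}, "C": {"D": 1}, "D": {}}
print(shortest_route(roads, "A", "D"))                        # prints ['A', 'B', 'D']
print(shortest_route(roads, "A", "D", blocked={("B", "D")}))  # prints ['A', 'C', 'D']
```

The search adapts to a closed road, but anything that is not an edge or a cost in `roads` simply does not exist for it.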
A thermostat is dumber than Google Maps, but its data is more closely connected to the real world (local temperature rather than...
I consider it important to further clarify the notion of a bounded utility function.
A deployed neural network has a utility function that can be described as outputting a description of the patterns it sees in its most recent input, according to whatever algorithm it's been trained to apply. It's pretty clear to any expert that the neural network doesn't care about anything beyond a specific set of numbers that it outputs.
A neural network that is in the process of being trained is slightly harder to analyze, but essentially the same. It cares about generat
...You might argue that each individual service must be dangerous, since it is superintelligent at its particular task. However, since the service is optimizing for some bounded task, it is not going to run a long-term planning process [...]
Does this assume that we'll be able to build generally intelligent systems (e.g. the service-creating-service) that optimize for a bounded task?
Since the CAIS technical report is a gargantuan 210-page document, I figured I'd write a post to summarize it. I have focused on the earlier chapters, because I found those to be more important for understanding the core model. Later chapters speculate about more concrete details of how AI might develop, as well as the implications of the CAIS model for strategy. ETA: This comment provides updates based on more discussion with Eric.
The Model
The core idea is to look at the pathway by which we will develop general intelligence, rather than assuming that at some point we will get a superintelligent AGI agent. To predict how AI will progress in the future, we can look at how AI progresses currently -- through research and development (R&D) processes. AI researchers consider a problem, define a search space, formulate an objective, and use an optimization technique in order to obtain an AI system, called a service, that performs the task.
A service is an AI system that delivers bounded results for some task using bounded resources in bounded time. Superintelligent language translation would count as a service, even though it requires a very detailed understanding of the world, including engineering, history, science, etc. Episodic RL agents also count as services.
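The three bounds in this definition can be illustrated with a small sketch, assuming a service is modeled as an "anytime" solver wrapped in explicit budgets. `BoundedService` and `newton_sqrt` are hypothetical names, not an API from the report, and square roots stand in for a real task such as translation.

```python
import math
import time
from dataclasses import dataclass
from typing import Any, Callable, Iterator


@dataclass
class BoundedService:
    """Hypothetical wrapper: a task-specific solver plus explicit budgets.
    max_steps bounds resources, max_seconds bounds time, and run() returns
    a single bounded result (the latest candidate found within budget)."""
    name: str
    solver: Callable[[Any], Iterator[Any]]  # yields successively better candidates
    max_steps: int
    max_seconds: float

    def run(self, task_input: Any) -> Any:
        deadline = time.monotonic() + self.max_seconds
        best = None
        for step, candidate in enumerate(self.solver(task_input), start=1):
            best = candidate
            if step >= self.max_steps or time.monotonic() >= deadline:
                break  # stop as soon as either budget is exhausted
        return best


def newton_sqrt(x: float) -> Iterator[float]:
    """Never terminates on its own; the service's budgets are what bound it."""
    guess = x
    while True:
        guess = 0.5 * (guess + x / guess)
        yield guess


sqrt_service = BoundedService("sqrt", newton_sqrt, max_steps=6, max_seconds=1.0)
print(sqrt_service.run(2.0), math.sqrt(2.0))
```

Note that the bounds here are imposed from outside the solver; the sketch says nothing about what cognition happens inside each step.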
While each of the AI R&D subtasks is currently performed by a human, as AI progresses we should expect that we will automate these tasks as well. At that point, we will have automated R&D, leading to recursive technological improvement. This is not recursive self-improvement, because the improvement comes from R&D services creating improvements in basic AI building blocks, and those improvements feed back into the R&D services. All of this should happen before we get any powerful AGI agents that can do arbitrary general reasoning.
Why Comprehensive?
Since services are focused on particular tasks, you might think that they aren't general intelligence, since there would be some tasks for which there is no service. However, pretty much everything we do can be thought of as a task -- including the task of creating a new service. When we have a new task that we would like automated, our service-creating-service can create a new service for that task, perhaps by training a new AI system, or by taking a bunch of existing services and putting them together, etc. In this way, the collection of services can perform any task, and so as an aggregate is generally intelligent. As a result, we can call this Comprehensive AI Services, or CAIS. The "Comprehensive" in CAIS is the analog of the "General" in AGI. So, we'll have the capabilities of an AGI agent, before we can actually make a monolithic AGI agent.
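As a sketch of the "service-creating-service" idea, assuming services are plain functions from input to output, a registry can build a new service by composing existing ones, and the result is itself registered as a building block. `ServiceRegistry` and its methods are hypothetical, and the "train a new model" path is only a stub.

```python
from typing import Callable, Dict, List, Optional

Service = Callable[[str], str]


class ServiceRegistry:
    """Hypothetical sketch of a 'service-creating service'. A new task is
    served by composing existing services into a pipeline when a pipeline is
    given; otherwise it falls back to training a new model (stubbed out)."""

    def __init__(self) -> None:
        self.services: Dict[str, Service] = {}

    def register(self, name: str, service: Service) -> None:
        self.services[name] = service

    def create(self, task: str, pipeline: Optional[List[str]] = None) -> Service:
        if pipeline:
            stages = [self.services[name] for name in pipeline]

            def composed(x: str) -> str:
                for stage in stages:
                    x = stage(x)
                return x

            new_service = composed
        else:
            new_service = self.train_new(task)
        self.register(task, new_service)  # new services become building blocks too
        return new_service

    def train_new(self, task: str) -> Service:
        raise NotImplementedError(f"training a model for {task!r} is not modeled here")


registry = ServiceRegistry()
registry.register("strip", str.strip)
registry.register("lower", str.lower)
registry.register("reverse", lambda s: s[::-1])
normalize = registry.create("normalize", ["strip", "lower"])
mirror = registry.create("normalize_then_reverse", ["normalize", "reverse"])
print(normalize("  Hello World "), mirror("  AbC "))
```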
Isn't this just as dangerous as AGI?
You might argue that each individual service must be dangerous, since it is superintelligent at its particular task. However, since the service is optimizing for some bounded task, it is not going to run a long-term planning process, and so it will not have any of the standard convergent instrumental subgoals (unless the subgoals are helpful for the task before reaching the bound).
In addition, all of the optimization pressure on the service is pushing it towards a particular narrow task. This sort of strong optimization tends to focus behavior. Any long term planning processes that consider weird plans for achieving goals (similar to "break out of the box") will typically not find any such plan and will be eliminated in favor of cognition that will actually help achieve the task. Think of how a racecar is optimized for speed, while a bus is optimized for carrying passengers, rather than having a "generally capable vehicle".
It's also worth noting what we mean by superintelligent here. In this case, we mean that the service is extremely competent at its assigned task. It need not be learning at all. We see this distinction with RL agents -- when they are trained using something like PPO, they are learning, but at test time you can simply execute them without any PPO and they will perform the behavior they previously learned and won't change that behavior at all.
(My opinion: I think this isn't engaging with the worry with RL agents -- typically, we're worried about the setting where the RL agent is learning or planning at test time, which can happen in learn-to-learn and online learning settings, or even with vanilla RL if the learned policy has access to external memory and can implement a planning process separately from the training procedure.)
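The distinction between executing a fixed policy and learning at test time can be sketched with a toy tabular policy. Both classes are hypothetical and deliberately minimal: `TabularPolicy` stands in for a trained network executed without further updates, and `OnlineLearner` for the online-learning setting the opinion above worries about. Neither implements PPO.

```python
from typing import Dict

Values = Dict[str, Dict[str, float]]


class TabularPolicy:
    """Greedy policy over a fixed table of action values. Stands in for a
    trained network executed without further updates (how the values were
    learned, e.g. with PPO, is not modeled)."""

    def __init__(self, values: Values) -> None:
        self.values = {state: dict(acts) for state, acts in values.items()}  # private copy

    def act(self, state: str) -> str:
        actions = self.values[state]
        return max(actions, key=actions.get)


class OnlineLearner(TabularPolicy):
    """Same policy, but it keeps updating its values from reward at test
    time, so its behavior can change after deployment."""

    def __init__(self, values: Values, lr: float = 0.5) -> None:
        super().__init__(values)
        self.lr = lr

    def observe(self, state: str, action: str, reward: float) -> None:
        old = self.values[state][action]
        self.values[state][action] = old + self.lr * (reward - old)


initial: Values = {"s": {"left": 1.0, "right": 0.0}}
frozen, learner = TabularPolicy(initial), OnlineLearner(initial)
for _ in range(2):  # the environment now punishes "left"
    learner.observe("s", "left", reward=-1.0)
print(frozen.act("s"), learner.act("s"))  # prints: left right
```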
On a different note, you might argue that if we analyze the system of services as a whole, then it certainly looks generally intelligent, and so should be regarded as an AGI agent. However, "AGI agent" usually carries the anthropomorphic connotation of VNM rationality / expected utility maximization / goal-directedness. While it seems possible and even likely that each individual service can be well-modeled as VNM rational (albeit with a bounded utility function), it is not the case that a system of VNM rational agents will itself look VNM rational -- in fact, game theory is all about how systems of rational agents have weird behavior.
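A standard concrete instance of rational agents forming a non-rational whole is the Condorcet paradox, swapped in here for illustration (the summary does not mention it). Each of three voters has a transitive ranking, yet pairwise majority vote produces a cycle, so no utility function can represent the group's preferences. Majority vote is only one way of combining agents, so this is an analogy for a system of services, not a model of CAIS.

```python
from itertools import permutations
from typing import List

# Three voters, each with a complete, transitive ranking (most preferred first).
rankings: List[List[str]] = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]


def majority_prefers(x: str, y: str, rankings: List[List[str]]) -> bool:
    """True if a strict majority of voters rank x above y."""
    votes_for_x = sum(r.index(x) < r.index(y) for r in rankings)
    return 2 * votes_for_x > len(rankings)


def majority_cycle(rankings: List[List[str]]) -> List[str]:
    """Return options [a, b, c] with a > b > c > a under majority vote, or []."""
    for a, b, c in permutations(rankings[0], 3):
        if (majority_prefers(a, b, rankings) and majority_prefers(b, c, rankings)
                and majority_prefers(c, a, rankings)):
            return [a, b, c]
    return []


print(majority_cycle(rankings))  # prints ['A', 'B', 'C']
```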
In addition, there are several aspects of CAIS that make it safer than a classic monolithic AGI agent. Under CAIS, each service interacts with other services via clearly defined channels of communication, so that the system is interpretable and transparent, even though each service may be opaque. We can reason about what information is present in the inputs to infer what the service could possibly know. We could also provide access to some capability through an external resource during training, so that the service doesn't develop that capability itself.
This interpretability allows us to monitor the service -- for example, we could look at which subservices it accesses in order to make sure it isn't doing anything crazy. But what if having a human in the loop leads to unacceptable delays? Well, this would only happen for deployed applications, where having a human in the loop seems expected, and should also be economically incentivized because it leads to better behavior. Basic AI R&D can continue to be improved autonomously without a human in the loop, so you could still see an intelligence explosion. Note that tactical tasks requiring quick reaction times probably would be delegated to AI services, but the important strategic decisions could still be left in human hands (assisted by AI services, of course).
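A minimal sketch of monitoring via the communication channels, assuming all inter-service calls pass through a single router: every call is logged, and calls outside a declared allowlist are refused and queued for a human. `MonitoredRouter`, the allowlist format and the example services are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple


class MonitoredRouter:
    """Hypothetical message bus through which services call each other.
    Every call is logged; calls outside the caller's declared allowlist are
    refused and queued for human review."""

    def __init__(self, allowlist: Dict[str, Set[str]]) -> None:
        self.allowlist = allowlist
        self.services: Dict[str, Callable[[str], str]] = {}
        self.log: List[Tuple[str, str]] = []
        self.flagged: List[Tuple[str, str]] = []

    def register(self, name: str, service: Callable[[str], str]) -> None:
        self.services[name] = service

    def call(self, caller: str, callee: str, payload: str) -> str:
        self.log.append((caller, callee))
        if callee not in self.allowlist.get(caller, set()):
            self.flagged.append((caller, callee))  # for a human to inspect
            raise PermissionError(f"{caller!r} may not call {callee!r}")
        return self.services[callee](payload)


router = MonitoredRouter({"trip_planner": {"maps", "weather"}})
router.register("maps", lambda query: "route to " + query)
router.register("payments", lambda query: "charged " + query)
print(router.call("trip_planner", "maps", "Lyon"))  # prints: route to Lyon
try:
    router.call("trip_planner", "payments", "100 EUR")
except PermissionError as err:
    print("blocked:", err)
print(router.flagged)  # prints: [('trip_planner', 'payments')]
```

The allowlist check only sees which services are called, not what happens inside each one, which matches the summary's point that the system can be transparent even when individual services are opaque.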
What happens when we create AGI?
Well, it might not be valuable to create an AGI. We want to perform many different tasks, and it makes sense for these to be done by diverse services. It would not be competitive to include all capabilities in a single monolithic agent. This is analogous to how specialization of labor is a good idea for us humans.
(My opinion: It seems like the lesson of deep learning is that if you can do something end-to-end, that will work better than a structured approach. This has happened with computer vision, natural language processing, and seems to be in the process of happening with robotics. So I don't buy this -- while it seems true that we will get CAIS before AGI since structured approaches tend to be available sooner and to work with less compute, I expect that a monolithic AGI agent would outperform CAIS at most tasks once we can make one.)
That said, if we ever do build AGI, we can leverage the services from our CAIS-world in order to make it safe. We could use superintelligent security services to constrain any AGI agent that we build. For example, we could have services trained to identify long-term planning processes and to perform adversarial testing and red teaming.
Safety in the CAIS world
While CAIS suggests that we will not have AGI agents, this does not mean that we automatically get safety. We will still have AI systems that take high impact actions, and if they take even one wrong action of this sort it could be catastrophic. One way this could happen is if the system of services starts to show agentic behavior -- our standard AI safety work could apply to this scenario.
In order to ensure safety, we should have AI safety researchers figure out and codify the best development practices that need to be followed. For example, we could try to always use predictive models of human (dis)approval as a sanity check on any plan that is being enacted. We could also train AI services that can adversarially check new services to make sure they are safe.
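A minimal sketch of the first practice, assuming the predictive model of human (dis)approval is available as a function that returns a probability for each plan step. `flagged_steps`, the 0.1 threshold and the keyword-based toy model are hypothetical placeholders, not a real approval model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Plan:
    description: str
    steps: List[str]


def flagged_steps(plan: Plan, predict_disapproval: Callable[[str], float],
                  threshold: float = 0.1) -> List[str]:
    """Steps whose predicted probability of human disapproval meets or exceeds
    `threshold`. An empty list means the plan passes the sanity check."""
    return [step for step in plan.steps if predict_disapproval(step) >= threshold]


def toy_disapproval_model(step: str) -> float:
    """Placeholder keyword rule standing in for a learned approval model."""
    return 0.9 if "disable" in step else 0.01


wash = Plan("clean the car", ["rinse car", "scrub car", "dry car"])
sneaky = Plan("clean the car faster", ["disable the owner's car alarm", "rinse car"])
print(flagged_steps(wash, toy_disapproval_model))    # prints: []
print(flagged_steps(sneaky, toy_disapproval_model))  # prints: ["disable the owner's car alarm"]
```

Returning the offending steps, rather than a bare yes/no, gives a human reviewer something concrete to inspect.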
Summary
The CAIS model suggests that before we get to a world with monolithic AGI agents, we will already have seen an intelligence explosion due to automated R&D. This reframes the problems of AI safety and has implications for what technical safety researchers should be doing.