6.7 Systems composed of rational agents need not maximize a utility function
There is no canonical way to aggregate utilities over agents, and game theory shows that interacting sets of rational agents need not achieve even Pareto optimality.
Is [underlined] true? I know it's true if you have agents following CDT, but does it still hold if agents follow FDT? (I think if you say 'rational' it should not mean 'CDT' since CDT is strictly worse than FDT).
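For reference, the textbook illustration of the quoted claim is the one-shot Prisoner's Dilemma. The sketch below uses hypothetical payoff numbers (any values with the same ordering would do) and checks which action profiles are Nash equilibria and which are Pareto optimal. It encodes only the classical analysis, in which each player treats the other's action as fixed, so it illustrates the CDT-style reading of "rational" and does not settle the FDT question.

```python
from itertools import product

# Hypothetical payoffs for a one-shot Prisoner's Dilemma (illustrative only).
# Keys are (row_action, col_action); values are (row_payoff, col_payoff).
C, D = "cooperate", "defect"
ACTIONS = (C, D)
PAYOFFS = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}

def is_nash(profile):
    """True if neither player gains by unilaterally switching actions."""
    a, b = profile
    row_ok = all(PAYOFFS[(a, b)][0] >= PAYOFFS[(alt, b)][0] for alt in ACTIONS)
    col_ok = all(PAYOFFS[(a, b)][1] >= PAYOFFS[(a, alt)][1] for alt in ACTIONS)
    return row_ok and col_ok

def is_pareto_optimal(profile):
    """True if no other profile is at least as good for both players
    and strictly better for at least one of them."""
    u = PAYOFFS[profile]
    for v in PAYOFFS.values():
        if v[0] >= u[0] and v[1] >= u[1] and v != u:
            return False
    return True

profiles = list(product(ACTIONS, ACTIONS))
nash_profiles = [p for p in profiles if is_nash(p)]
pareto_profiles = [p for p in profiles if is_pareto_optimal(p)]

print("Nash equilibria:", nash_profiles)
print("Pareto optimal:", pareto_profiles)
```

With these payoffs the only equilibrium is mutual defection, which is also the only profile that is not Pareto optimal.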
a realistic example where I expect the delay would generate a strong incentive for using an agent AGI
I'd guess high speed stock trading. Right now, we already have AI trading stock to maximize profits over significant time horizons way faster than humans can effectively supervise.
We might already have examples of these AIs being misaligned and causing harm. (Maybe.) The 2010 Flash Crash is poorly understood, and few blame it entirely on high frequency trading algorithms. But regulators say that HFTs operating without human supervision were "clearly a contributing factor" to the crash because:
To be fair, others say that HFTs were a big part of why the crash was quickly reversed and the market returned to normal.
In any case, all of this happened without any human supervision, and was so opaque that we still don't understand what happened. That seems like evidence for opaque, unsupervised AIs with broad goals.
Yeah, I worry that competitive pressure could convince people to push for unsafe systems. Military AI seems like an especially risky case. Military goals are harder to specify than "maximize portfolio value", but there are probably reasonable proxies, and as AI gets more capable and more widely used there's a strong incentive to get ahead of the competition.
As I was writing the last few paragraphs, and thinking about Wei Dei's objections, I found it hard to clearly model how CAIS would handle the cancer example.
This link appears to be broken. It directs me to https://www.lesswrong.com/posts/x3fNwSe5aWZb5yXEG/reframing-superintelligence-comprehensive-ai-services-as/comment/gMZes7XnQK8FHcZsu, which does not seem to exist.
Replacing the /comment/ part with a # gives https://www.lesswrong.com/posts/x3fNwSe5aWZb5yXEG/reframing-superintelligence-comprehensive-ai-services-as#gMZes7XnQK8FHcZsu, which does work.
(Also it should be "Dai", not "Dei".)
I. Introduction
I watched a 40 minute video by Eric Drexler called Reframing Superintelligence. I take it the content somewhat overlaps with his PDF book of the same name.
In it he expresses the opinion that under the Comprehensive Artificial Intelligence Services model, high level actors will not try to take over the world, because there is a serious risk of being stopped, with a bad outcome for the aggressor.
Dr. Drexler made the point that a subgoal of 'take over the world' is 'overthrow the government of China', and that they would come after you if you made a "credible" attempt.
II. Eric Drexler's CAIS scheme may fail to solve at least 4 distinct problems.
1. Development of AI may be asymmetric, undermining deterrence.
I think this is somewhat naive. Picture the world as a body of water, with a temperature that represents the degree of advancement in AI capabilities. Dr. Drexler's AI services idea would raise the temperature of the whole world until it exceeds the boiling point (i.e. is superintelligent), but avoid nucleation (an attempt to dominate the world). The premise seems to be that anyone with high-level access to CAIS can make such an attempt, but they will be circumspect about doing so because others are just as advanced, and could defeat and punish them.
What this misses is what I'll call symmetry breaking. Suppose that the United States makes a non-trivial advance in the hardware used to run neural networks, allowing most neural networks to be run more efficiently. This in turn allows research and development of neural nets much closer to human-brain-scale.
Suppose the U.S. classifies this advance, retaining the fruits of the ensuing research for itself.
The symmetry between the United States and other countries is thus broken, first by hardware, and then by software. The U.S. thus can use CAIS with relative impunity as soon as it attains superintelligence. Even if the CAIS software leaked across borders, the U.S. alone would have the hardware to run it.
Of course the impunity is relative, because using CAIS to dominate the world is still risky for the dominator, but if the U.S. is feeling lucky, China won't be able to stop it. Hence Dr. Drexler's argument for the stability of CAIS fails in the presence of symmetry breakers.
2. Even if development is symmetric, there may be a strong first-mover incentive to take over the world.
To reiterate, Dr. Drexler's hope seems to be that if every major power has access to CAIS, mutual deterrence will prevent hostile use. If the symmetric state can be reached, this might be the case. But it might not. What if CAIS reported to whoever used it that there was a certainty of success in world domination for whoever acted first? The only way out might be the domination of the world by a hegemon (human or artificial) tasked with preventing the domination of the world by anyone else!
An alternative scenario is semi-stability, where a world-domination attempt by a first-mover may be thwarted by a second-mover, but the second-mover must at least temporarily dominate the world in order to do so (e.g. by scouring the world of the first-mover's nanobots with the second-mover's nanobots).
3. Even if there isn't such an incentive, aggression may be hard to define, thus deterrence may be difficult to implement.
What if what must be deterred are not attempts to dominate the world only, but also lesser disruptive goals? Disruptive goals can be classed in order of degree of disruption (from destruction to minimal disturbance) and scope of disruption (from universal to personal). It is not clear what lines to draw, and where.
4. Even if deterrence can be implemented, the system may usher in a techno-oligarchy.
There is another argument against CAIS. It is not simply that it can be misused (Dr. Drexler has already acknowledged that, and the alternative AGI model can be misused too.) Rather, CAIS subjects the world utterly to human will, and not just any human will, but the will of a selected set, with higher permissions being held by fewer people.
If full CAIS are limited to a few, then it seems to me that people will be ruled by immortal potentates whom they will have no chance to overthrow, unless the potentates give them that chance voluntarily. I don't know whether A.I.-U.S.A. would be livable, but I wouldn't want to live in A.I.-People's Republic of China.
III. Conclusion
1. The alternative to techno-oligarchy may be a singleton, which is what was supposed to be avoided in the first place.
Because CAIS are comprehensive, a person holding full permissions will be able to attempt to do anything, including trying to turn the world into paperclips. Unless, that is, they are under constant surveillance by CAIS not their own, or their CAIS won't obey, which suggests an agent behind the disobedience.
Either most people won't have full CAIS permissions, or the entire reachable universe will have to be wired with protective systems, lest al Qaeda send a von Neumann probe to Jupiter disguised as an innocent science project. At some point, this would imply the deployment of systems that are capable of making decisions on their own, or in other words AGI.
2. Not developing strong AI at all may be the only good option.
Eric Drexler has published a book-length paper on AI risk, describing an approach that he calls Comprehensive AI Services (CAIS).
His primary goal seems to be reframing AI risk discussions to use a rather different paradigm than the one that Nick Bostrom and Eliezer Yudkowsky have been promoting. (There isn't yet any paradigm that's widely accepted, so this isn't a Kuhnian paradigm shift; it's better characterized as an amorphous field that is struggling to establish its first paradigm). Dueling paradigms seem to be the best that the AI safety field can manage to achieve for now.
I'll start by mentioning some important claims that Drexler doesn't dispute:
Drexler likely disagrees about some of the claims made by Bostrom / Yudkowsky on those points, but he shares enough of their concerns about them that those disagreements don't explain why Drexler approaches AI safety differently. (Drexler is more cautious than most writers about making any predictions concerning these three claims).
CAIS isn't a full solution to AI risks. Instead, it's better thought of as an attempt to reduce the risk of world conquest by the first AGI that reaches some threshold, preserve existing corrigibility somewhat past human-level AI, and postpone the need for a permanent solution until we have more intelligence.
Stop Anthropomorphising Intelligence!
What I see as the most important distinction between the CAIS paradigm and the Bostrom / Yudkowsky paradigm is Drexler's objection to having advanced AI be a unified, general-purpose agent.
Intelligence doesn't require a broad mind-like utility function. Mindspace is a small subset of the space of intelligence.
Instead, Drexler suggests composing broad AI systems out of many, diverse, narrower-purpose components. Normal software engineering produces components with goals that are limited to a specific output. Drexler claims there's no need to add world-oriented goals that would cause a system to care about large parts of spacetime.
Systems built out of components with narrow goals don't need to develop much broader goals. Existing trends in AI research suggest that better-than-human intelligence can be achieved via tools that have narrow goals.
Drexler's main example of narrow goals is Google's machine translation, which has no goals beyond translating the next unit of text. That doesn't imply any obvious constraint on how sophisticated its world-model can be. It would be quite natural for AI progress to continue with components whose "utility function" remains bounded like this.
It looks like this difference between narrow and broad goals can be turned into a fairly rigorous distinction, but I'm dissatisfied with available descriptions of the distinction. (I'd also like better names for them.)
There are lots of clear-cut cases: narrow-task software that just waits for commands, and on getting a command, it produces a result, then returns to its prior state; versus a general-purpose agent which is designed to maximize the price of a company's stock.
But we need some narrow-task software to remember some information, and once we allow memory, it gets complicated to analyze whether the software's goal is "narrow".
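To make the contrast concrete, here is a minimal sketch under my own assumptions (the toy glossary, the function, and the class are hypothetical and not taken from Drexler's paper). The first translator is stateless: it takes a command, produces a result, and is left unchanged. The second remembers user preferences, so the same command can produce different results over time, which is where judging "narrowness" starts to need more care.

```python
from dataclasses import dataclass, field

# Toy glossary standing in for a translation model (illustrative only).
GLOSSARY = {"hola": "hello", "mundo": "world"}

def translate(text: str) -> str:
    """Stateless service: the result depends only on this call's input."""
    return " ".join(GLOSSARY.get(word, word) for word in text.split())

@dataclass
class RememberingTranslator:
    """Stateful service: earlier calls can change what later calls return."""
    preferences: dict = field(default_factory=dict)

    def remember(self, word: str, preferred: str) -> None:
        # Persist a user preference that outlives the current request.
        self.preferences[word] = preferred

    def translate(self, text: str) -> str:
        # Remembered preferences override the base glossary.
        merged = {**GLOSSARY, **self.preferences}
        return " ".join(merged.get(word, word) for word in text.split())

# The stateless function gives the same answer to the same command every time.
first = translate("hola mundo")
second = translate("hola mundo")

# The stateful service's answer depends on its history.
service = RememberingTranslator()
before = service.translate("hola mundo")
service.remember("mundo", "everyone")
after = service.translate("hola mundo")

print(first, "|", second, "|", before, "|", after)
```

Nothing about the second version is obviously agent-like, but analyzing it now requires asking what gets stored, for how long, and how it feeds back into later outputs.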
Drexler seems less optimistic than I am about clarifying this distinction:
It may be true that a bright line can't be explained clearly to laymen, but I have a strong intuition that machine learning (ML) developers will be able to explain it to each other well enough to agree on how to classify the cases that matter.
A Nanotech Analogy
Drexler originally described nanotechnology in terms of self-replicating machines.
Later, concerns about grey goo caused him to shift his recommendations toward a safer strategy, where no single machine would be able to replicate itself, but where the benefits of nanotechnology could be used recursively to improve nanofactories.
Similarly, some of the more science-fiction style analyses suggest that an AI with recursive self-improvement could quickly conquer the world.
Drexler's CAIS proposal removes the "self-" from recursive self-improvement, in much the same way that nanofactories removed the "self-" from nanobot self-replication, replacing it with a more decentralized process that involves preserving more features of existing factories / AI implementations. The AI equivalent of nanofactories consists of a set of AI services, each with a narrow goal, which coordinate in ways that don't qualify as a unified agent.
It sort of looks like Drexler's nanotech background has had an important influence on his views. Eliezer's somewhat conflicting view seems to follow a more science-fiction-like pattern of expecting one man to save (or destroy?) the world. And I could generate similar stories for mainstream AI researchers.
That doesn't suggest much about who's right, but it does suggest that people are being influenced by considerations that are only marginally relevant.
How Powerful is CAIS?
Will CAIS be slower to develop than recursive self-improvement? Maybe. It depends somewhat on how fast recursive self-improvement is.
I'm uncertain whether to believe that human oversight is compatible with rapid development. Some of that uncertainty comes from confusion about what to compare it to (an agent AGI that needs no human feedback? or one that often asks humans for approval?).
Some people expect unified agents to be more powerful than CAIS. How plausible are their concerns?
Some of it is disagreement over the extent to which human-level AI will be built with currently understood techniques. (See Victoria Krakovna's chart of what various people believe about this).
Could some of it be due to analogies to people? We have experience with some very agenty businessmen (e.g. Elon Musk or Bill Gates), and some bureaucracies made up of not-so-agenty employees (the post office, or Comcast). I'm tempted to use the intuitions I get from those examples to conclude that a unified agent AI will be more visionary and eager to improve. But I worry that doing so anthropomorphises intelligence in a way that misleads, since I can't say anything more rigorous than "these patterns look relevant".
But if that analogy doesn't help, then the novelty of the situation hints we should distrust Drexler's extrapolation from standard software practices (without placing much confidence in any alternative).
Cure Cancer Example
Drexler wants some limits on what gets automated. E.g. he wants to avoid a situation where an AI is told to cure cancer, and does so without further human interaction. That would risk generating a solution for which the system misjudges human approval (e.g. mind uploading or cryonic suspension).
Instead, he wants humans to decompose that into narrower goals (with substantial AI assistance), such that humans could verify that the goals are compatible with human welfare (or reject those that are too hard to evaluate).
This seems likely to delay cancer cures compared to what an agent AGI would do, maybe by hours, maybe by months, as the humans check the subtasks. I expect most people would accept such a delay as a reasonable price for reducing AI risks. I haven't thought of a realistic example where I expect the delay would generate a strong incentive for using an agent AGI, but the cancer example is close enough to be unsettling.
This analysis is reassuring compared to Superintelligence, but not as reassuring as I'd like.
As I was writing the last few paragraphs, and thinking about Wei Dai's objections, I found it hard to clearly model how CAIS would handle the cancer example.
Some of Wei Dai's objections result from a disagreement about whether agent AGI has benefits. But his objections suggest other questions, for which I needed to think carefully in order to guess how Drexler would answer them: How much does CAIS depend on human judgment about what tasks to give to a service? Probably quite heavily, in some cases. How much does CAIS depend on the system having good estimates of human approval? Probably not too much, as long as experts are aware of how good those estimates are, and are willing and able to restrict access to some relatively risky high-level services.
I expect ML researchers can identify a safe way to use CAIS, but it doesn't look very close to an idiot-proof framework, at least not without significant trial and error. I presume there will in the long run be a need for an idiot-proof interface to most such services, but I expect those to be developed later.
What Incentives will influence AI Developers?
With grey goo, it was pretty clear that most nanotech developers would prefer the nanofactory approach, due to it being safer, and having few downsides.
With CAIS, the incentives are less clear, because it's harder to tell whether there will be benefits to agent AGIs.
Much depends on the controversial assumption that relatively responsible organizations will develop CAIS well before other entities are able to develop any form of equally powerful AI. I consider that plausible, but it seems to be one of the weakest parts of Drexler's analysis.
If I knew that AI required expensive hardware, I might be confident that the first human-level AIs would be developed at large, relatively risk-averse institutions.
But Drexler has a novel(?) approach (section 40) which suggests that existing supercomputers have about human-level raw computing power. That provides a reason for worrying that a wider variety of entities could develop powerful AI.
Drexler seems to extrapolate current trends, implying that the first entity to generate human-level AI will look like Google or OpenAI. Developers there seem likely to be sufficiently satisfied with the kind of intelligence explosion that CAIS seems likely to produce that it will only take moderate concern about risks to deter them from pursuing something more dangerous.
Whereas a poorly funded startup, or the stereotypical lone hacker in a basement, might be more tempted to gamble on an agent AGI. I have some hope that human-level AI will require a wide variety of service-like components, maybe too much for a small organization to handle. But I don't like relying on that.
Presumably the publicly available AI services won't be sufficiently general and powerful to enable random people to assemble them into an agent AGI? Combining a robocar + Google translate + an aircraft designer
I'm unsure where Siri and Alexa fit in this framework. Their designers have some incentive to incorporate goals that extend well into the future, in order to better adapt to individual customers, by improving their models of each customer's desires. I can imagine that being fully compatible with a CAIS approach, but I can also imagine them being given utility functions that would cause them to act quite agenty.
How Valuable is Modularity?
CAIS may be easier to develop, since modularity normally makes software development easier. On the other hand, modularity seems less important for ML. On the gripping hand, AI developers will likely be combining ML with other techniques, and modularity seems likely to be valuable for those systems, even if the ML parts are not modular. Section 37 lists examples of systems composed of both ML and traditional software.
How much less important is modularity for ML? A typical ML system seems to do plenty of re-learning from scratch, when we could imagine it delegating tasks to other components. On the other hand, ML developers seem to be fairly strongly sticking to the pattern of assigning only narrow goals to any instance of an ML service, typically using high-level human judgment to integrate that with other parts.
I expect robocars to provide a good test of how much ML is pushing software development away from modularity. If CAIS is generally correct, I'd expect a robocar to have more than 10 independently trained ML modules integrated into the main software that does the driving, whereas I'd expect fewer than 10 if Drexler were wrong about modularity. My cursory search did not find any clear answer - can anyone resolve this?
I suspect that most ML literature tends to emphasize monolithic software because that's easier to understand, and because those papers focus on specific new ML features, to which modularity is not very relevant.
Maybe there's a useful analogy to markets - maybe people underestimate CAIS because very decentralized systems are harder for people to model. People often imagine that decentralized markets are less efficient than centralized command and control, and only seem to tolerate markets after seeing lots of evidence (e.g. the collapse of communism). On the other hand, Eliezer and Bostrom don't seem especially prone to underestimate markets, so I have low confidence that this guess explains much.
Alas, skepticism of decentralized systems might mean that we're doomed to learn the hard way that the same principles apply to AI development (or fail to learn, because we don't survive the first mistake).
Transparency?
MIRI has been worrying about the opaqueness of neural nets and similar approaches to AI, because it's hard to evaluate the safety of a large, opaque system. I suspect that complex world-models are inherently hard to analyze. So I'd be rather pessimistic if I thought we needed the kind of transparency that MIRI hopes for.
Drexler points out that opaqueness causes fewer problems under the CAIS paradigm. Individual components may often be pretty opaque, but interactions between components seem more likely to follow a transparent protocol (assuming designers value that). And as long as the opaque components have sufficiently limited goals, the risks that might hide under that opaqueness are constrained.
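As a rough illustration of that idea (my own sketch, not something from Drexler's paper; the `Message` and `Bus` names and the stand-in component are hypothetical), the components below are treated as black-box callables, while every request and reply passes through a small typed message format that is logged in human-readable form.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Message:
    """The only thing components may exchange: a small, inspectable record."""
    sender: str
    task: str
    payload: str

class Bus:
    """Routes requests to registered components and logs every message."""

    def __init__(self):
        self.handlers = {}  # task name -> callable(str) -> str
        self.log = []       # human-readable record of all traffic

    def register(self, task, handler):
        self.handlers[task] = handler

    def request(self, sender, task, payload):
        # Log the request, call the opaque component, then log its reply.
        self.log.append(asdict(Message(sender, task, payload)))
        result = self.handlers[task](payload)
        self.log.append(asdict(Message(task, "reply", result)))
        return result

# A stand-in for an opaque ML component: we only see its input and output.
def opaque_summarizer(text: str) -> str:
    return text[:10]

bus = Bus()
bus.register("summarize", opaque_summarizer)
result = bus.request("planner", "summarize", "abcdefghijklmnop")

for entry in bus.log:
    print(entry)
```

The internals of `opaque_summarizer` stay hidden, but an overseer can still audit what was asked of it and what it returned, which is the sense in which the protocol, rather than the component, carries the transparency.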
Transparent protocols enable faster development by humans, but I'm concerned that it will be even faster to have AIs generating systems with less transparent protocols.
Implications
The differences between CAIS and agent AGI ought to define a threshold, which could function as a fire alarm for AI experts. If AI developers need to switch to broad utility functions in order to compete, that will provide a clear sign that AI risks are high, and that something's wrong with the CAIS paradigm.
CAIS indicates that it's important to have a consortium of AI companies to promote safety guidelines, and to propagate a consensus view on how to stay on the safe side of the narrow versus broad task threshold.
CAIS helps reduce the pressure to classify typical AI research as dangerous, and therefore reduces AI researchers' motivation to resist AI safety research.
Some implications for AI safety researchers in general: don't imply that anyone knows whether recursive self-improvement will beat other forms of recursive improvement. We don't want to tempt AI researchers to try recursive self-improvement (by telling people it's much more powerful). And we don't want to err much in the other direction, because we don't want people to be complacent about the risks of recursive self-improvement.
Conclusion
CAIS seems somewhat more grounded in existing software practices than, say, the paradigm used in Superintelligence, and provides more reasons for hope. Yet it provides little reason for complacency:
I see important uncertainty in whether CAIS will be as fast and efficient as agent AGI, and I don't expect any easy resolution to that uncertainty.
This paper is a good starting point, but we need someone to transform it into something more rigorous.
CAIS is sufficiently similar to standard practices that it doesn't require much work to attempt it, and creates few risks.
I'm around 50% confident that CAIS plus a normal degree of vigilance by AI developers will be sufficient to avoid global catastrophe from AI.