What if AI doesn't quite go FOOM?

Mass_Driver

LESSWRONG
LW

What if AI doesn't quite go FOOM? — LessWrong

16 What if AI doesn't quite go FOOM?

by Mass_Driver

20th Jun 2010

6 min read

191

16

Intro

This article seeks to explore possible futures in a world where artificial intelligence turns out NOT to be able to quickly, recursively self-improve so as to influence our world with arbitrarily large strength and subtlety, i.e, "go FOOM." Note that I am not arguing that AI won't FOOM. Eliezer has made several good arguments for why AI probably will FOOM, and I don't necessarily disagree. I am simply calling attention to the non-zero probability that it won't FOOM, and then asking what we might do to prepare for a world in which it doesn't.

Failure Modes

I can imagine three different ways in which AI could fail to FOOM in the next 100 years or so. Option 1 is a "human fail." Option 1 means we destroy ourselves or succumb to some other existential risk before the first FOOM-capable AI boots up. I would love to hear in the comments section about (a) which existential risks people think are most likely to seriously threaten us before the advent of AI, and (b) what, if anything, a handful of people with moderate resources (i.e., people who hang around on Less Wrong) might do to effectively combat some of those risks.

Option 2 is a "hardware fail." Option 2 means that Moore's Law turns out to have an upper bound; if physics doesn't show enough complexity beneath the level of quarks, or if quantum-sized particles are so irredeemably random as to be intractable for computational purposes, then it might not be possible for even the most advanced intelligence to significantly improve on the basic hardware design of the supercomputers of, say, the year 2020. This would limit the computing power available per dollar, and so the level of computing power required for a self-improving AI might not be affordable for generations, if ever. Nick Bostrom has some interesting thoughts along these lines, ultimately guessing (as of 2008) that the odds of a super-intelligence forming by 2033 was less than 50%.

Option 3 is a "software fail." Option 3 means that *programming* efficiency turns out to have an upper bound; if there are natural information-theoretical limits on how efficiently a set number of operations can be used to perform an arbitrary task, then it might not be possible for even the most advanced intelligence to significantly improve on its basic software design; the supercomputer would be more than 'smart' enough to understand itself and to re-write itself, but there would simply not *be* an alternate script for the source code that was actually more effective.

These three options are not necessarily exhaustive; they are just the possibilities that have immediately occurred to me, with some help from User: JoshuaZ.

"Superintelligent Enough" AI

An important point to keep in mind is that even if self-improving AI faces hard limits before becoming arbitrarily powerful, AI might still be more than powerful enough to effortlessly dominate future society. I am sure my numbers are off by many orders of magnitude, but by way of illustration only, suppose that current supercomputers run at a speed of roughly 10^20 ops/second, and that successfully completing Eliezer's coherent extrapolated volition project would require a processing speed of roughly 10^36 ops/second. There is obviously quite a lot of space here for a miniature FOOM. If one of today's supercomputers starts to go FOOM and then hits hard limits at 10^25 ops/second, it wouldn't be able to identify humankind's CEV, but it might be able to, e.g, take over every electronic device capable of receiving transmissions, such as cars, satellites, and first-world factories. If this happens around the year 2020, a mini-FOOMed AI might also be able to take over homes, medical prosthetics, robotic soldiers, and credit cards.

Sufficient investments in security and encryption might keep such an AI out of some corners of our economy, but right now, major operating systems aren't even proof against casual human trolls, let alone a dedicated AI thinking at faster-than-human speeds. I do not understand encryption well, and so it is possible that some plausible level of investment in computer security could, contrary to my assumptions, actually manage to protect human control over individual computers for the foreseeable future. Even if key industrial resources were adequately secured, though, a moderately super-intelligent AI might be capable of modeling the politics of current human leaders well enough to manipulate them into steering Earth onto a path of its choosing, as in Issac Asimov's The Evitable Conflict.

If enough superintelligences develop at close enough to the same moment in time and have different enough values, they might in theory reach some sort of equilibrium that does not involve any one of them taking over the world. As Eliezer has argued (scroll down to 2nd half of the linked page), though, the stability of a race between intelligent agents should mostly be expected to *decrease* as those agents swallow their own intellectual and physical supply chains. If a supercomputer can take over larger and larger chunks of the Internet as it gets smarter and smarter, or if a supercomputer can effectively control what happens in more and more factories as it gets smarter and smarter, then there's less and less reason to think that supercomputing empires will "grow" at roughly the same pace -- the first empire to grow to a given size is likely to grow faster than its rivals until it takes over the world. Note that this could happen even if the AI is nowhere near smart enough to start mucking about with uploaded "ems" or nanoreplicators. Even in a boringly normal near-future scenario, a computer with even modest self-improvement and self-aggrandizement capabilities might be able to take over the world. Imagine something like the ending to David Brin's Earth, stripped of the mystical symbolism and the egalitarian optimism.

Ensuring a "Nice Place to Live"

I don't know what Eliezer's timeline is for attempting to develop provably Friendly AI, but it might be worthwhile to attempt to develop a second-order stopgap. Eliezer's CEV is supposed to function as a first-order stopgap; it won't achieve all of our goals, but it will ensure that we all get to grow up in a Nice Place to Live while we figure out what those goals are. Of course, that only happens if someone develops a CEV-capable AI. Eliezer seems quite worried about the possibility that someone will develop a FOOMing unFriendly AI before Friendly AI can get off the ground, but is anything being done about this besides just rushing to finish Friendly AI?

Perhaps we need some kind of mini-FOOMing marginally Friendly AI whose only goal is to ensure that nothing seizes control of the world's computing resources until SIAI can figure out how to get CEV to work. Although no "utility function" can be specified for a general AI without risking paper-clip tiling, it might be possible to formulate a "homeostatic function" at relatively low risk. An AI that "valued" keeping the world looking roughly the way it does now, that was specifically instructed *never* to seize control of more than X number of each of several thousand different kinds of resources, and whose principal intended activity was to search for, hunt down, and destroy AIs that seemed to be growing too powerful too quickly might be an acceptable risk. Even if such a "shield AI" were not provably friendly, it might pose a smaller risk of tiling the solar system than the status quo, since the status quo is full of irresponsible people who like to tinker with seed AIs.

An interesting side question is whether this would be counterproductive in a world where Failure Mode 2 (hard limits on hardware) or Failure Mode 3 (hard limits on software) were serious concerns. Assuming that, eventually, a provably friendly AI can be developed, then, several years after that, it's likely that millions of people can be convinced that it would be really good to activate the provably friendly AI, and humans might be able to dedicate enough resources to specifically overcome the second-order stopgap "shield AI" that was knocking out other people's un-provably Friendly AIs. But if the shield AI worked too well and got too close to the hard upper bound on the power of an AI, then it might not be possible to unmake the shield, even with added resources and with no holds barred.