[xpost from my blog]

In discussions of existential risk from AI, it is often assumed that the existential catastrophe would follow an intelligence explosion, in which an AI creates a more capable AI, which in turn creates a yet more capable AI, and so on, in a feedback loop that eventually produces an AI whose cognitive power vastly surpasses that of humans. Such an AI would be able to obtain a decisive strategic advantage over humanity, allowing it to pursue its own goals without effective human interference. Victoria Krakovna points out that many arguments that AI could present an existential risk do not rely on an intelligence explosion. I want to look in slightly more detail at how that could happen. Kaj Sotala also discusses this.

An AI starts an intelligence explosion when its ability to create better AIs surpasses that of human AI researchers by a sufficient margin (provided the AI is motivated to do so). An AI attains a decisive strategic advantage when its ability to optimize the universe surpasses that of humanity by a sufficient margin. Which of these happens first depends on which skills AIs have an advantage in relative to humans. If AIs are better at programming AIs than they are at taking over the world, then an intelligence explosion will happen first, and the resulting AI will be able to get a decisive strategic advantage soon after. But if AIs are better at taking over the world than they are at programming AIs, then an AI would get a decisive strategic advantage without an intelligence explosion occurring first.

Since an intelligence explosion happening first is usually considered the default assumption, I'll just sketch a plausibility argument for the reverse. There's a lot of variation in how easy cognitive tasks are for AIs compared to humans. Since programming AIs is not yet a task that AIs can do well, it doesn't seem like it should be a priori surprising if programming AIs turned out to be an extremely difficult task for AIs to accomplish, relative to humans. Taking over the world is also plausibly especially difficult for AIs, but I don't see strong reasons for confidence that it would be harder for AIs than starting an intelligence explosion would be. It's possible that an AI with significantly but not vastly superhuman abilities in some domains could identify some vulnerability that humans would never think of, and exploit it to gain power. Or an AI could be enough better than humans at forms of engineering other than AI programming (perhaps molecular manufacturing) that it could build physical machines able to out-compete humans, though this would require it to obtain the resources necessary to produce them.

Furthermore, an AI that is capable of producing a more capable AI may refrain from doing so if it is unable to solve the AI alignment problem for itself; that is, if it can create a more intelligent AI, but not one that shares its preferences. This seems unlikely if the AI has an explicit description of its preferences. But if the AI, like humans and most contemporary AI, lacks an explicit description of its preferences, then the difficulty of the AI alignment problem could be an obstacle to an intelligence explosion occurring.

It also seems worth thinking about the policy implications of the differences between existential catastrophes from AI that follow an intelligence explosion versus those that don't. For instance, AIs that attempt to attain a decisive strategic advantage without undergoing an intelligence explosion will exceed human cognitive capabilities by a smaller margin, and thus would likely attain strategic advantages that are less decisive, and would be more likely to fail. Thus containment strategies are probably more useful for addressing risks that don't involve an intelligence explosion, while attempts to contain a post-intelligence explosion AI are probably pretty much hopeless (although it may be worthwhile to find ways to interrupt an intelligence explosion while it is beginning). Risks not involving an intelligence explosion may be more predictable in advance, since they don't involve a rapid increase in the AI's abilities, and would thus be easier to deal with at the last minute, so it might make sense far in advance to focus disproportionately on risks that do involve an intelligence explosion.

It seems likely that AI alignment would be easier for AIs that do not undergo an intelligence explosion, for two reasons: it is more likely to be possible to monitor such an AI and do something about it if it goes wrong, and lower optimization power means lower ability to exploit the difference between the goals the AI was given and the goals that were intended, if we are only able to specify our goals approximately. The first of those reasons applies to any AI that attempts to attain a decisive strategic advantage without first undergoing an intelligence explosion, whereas the second only applies to AIs that never undergo an intelligence explosion. Because of this, it might make sense to try to decrease the chance that the first AI to attain a decisive strategic advantage undergoes an intelligence explosion beforehand, as well as the chance that it ever undergoes one, though preventing the latter may be much more difficult. However, some strategies to achieve this may have undesirable side-effects; for instance, as mentioned earlier, AIs whose preferences are not explicitly described seem more likely to attain a decisive strategic advantage without first undergoing an intelligence explosion, but such AIs are probably more difficult to align with human values.

If AIs get a decisive strategic advantage over humans without an intelligence explosion, the advantage would likely be obtained much more slowly, making it much more likely that multiple, and possibly many, AIs gain decisive strategic advantages over humans, though not necessarily over each other, resulting in a multipolar outcome. Thus considerations about multipolar versus singleton scenarios also apply to decisive-strategic-advantage-first versus intelligence-explosion-first scenarios.

23 comments

You might also be interested in this article by Kaj Sotala: http://kajsotala.fi/2016/04/decisive-strategic-advantage-without-a-hard-takeoff/

Even though you are writing about the exact same subject, there is (as far as I can tell) no substantial overlap with the points you highlight. Kaj Sotala titled his blog post "(Part 1)" but never wrote a subsequent part.

Thanks for pointing out that article. I have added a reference to it.

Another big difference is that if there's no intelligence explosion, we're probably not talking about a singleton. If someone manages to create an AI that's, say, roughly human-level intelligence (probably stronger in some areas and weaker in others, but human-ish on average) and progress slows or stalls after that, then the most likely scenario is that a lot of those human-level AIs would be created and sold for different purposes all over the world. We would probably be dealing with a complex world that has a lot of different AIs and humans interacting with each other. That could create its own risks, but they would probably have to be handled in a different way.

Good point. This seems like an important oversight on my part, so I added a note about it.

Thanks.

One more point you might want to mention is that in a world with AI but no intelligence explosion, where AIs are not able to rapidly develop better AIs, augmented human intelligence through various transhuman technologies and forms of brain/computer interfaces could be a much more important factor; that kind of technology could allow humans to "keep up with" AIs (at least for a time), and it's possible that humans and AIs working together on tasks could remain competitive with pure AIs for a significant period.

There could be many ways an AI could produce human extinction without undergoing an intelligence explosion. Even a relatively simple computer program that helps a biohacker engineer deadly new biological viruses in droves could kill everybody.

I tried to list the different ways AI could kill humanity here:

http://lesswrong.com/lw/mgf/a_map_agi_failures_modes_and_levels/

and I am now working on transforming this map into a proper article. The draft is ready.

One possibility would be for the malign intelligence to take over the world by orchestrating a nuclear war, while being sufficiently hardened/advanced that it could survive and develop more quickly in the aftermath.

I personally don't think writing down a goal gives us any predictability without a lot of work, which may or may not be possible. Specifying a goal assumes that the AI's perceptual/classification systems chop the world up in the same way we would (which we don't have a formal specification of, and which changes over time). We would also need to solve the ontology identification problem.

I'm of the opinion that intelligence might need to be self-programming on a micro, subconscious level, which might make self-improvement hard on a macro level. So I think we should plan for non-fooming scenarios.

Existing AI systems are already very good at winning war-like strategy games such as chess and Go, and have reached superhuman performance in them. Military strategic planning and geopolitics could be seen as such a game, and an AI able to win at it seems imaginable even with current capabilities.

I also agree that a self-improving AI may choose not to create its next version because of the difficulty of solving the alignment problem at the new level. In that case it would choose an evolutionary development path, which means slower capability gain. I wrote a draft of a paper about levels of self-improvement, where I look at such obstacles in detail. If you are interested, I could share it with you.

AI is good at well-defined strategy games, but (so far) bad at understanding and integrating real-world constraints. I suspect that there are already significant efforts to use narrow AI to help humans with strategic planning, but that these remain secret. For an AGI to defeat that sort of human-computer combination would require considerably superhuman capabilities, which means that without an intelligence explosion it would take a great deal of time and resources.

If an AI is able to use humans as an outsourced form of intuition, as in Mechanical Turk, it may be able to play such games with much less intelligence of its own.

Such a game may resemble Trump's election campaign, where cyberweapons, fake news and internet memes were used by some algorithm. There was some speculation about it: https://scout.ai/story/the-rise-of-the-weaponized-ai-propaganda-machine

We already see superhuman performance in war-simulating games, but nothing like it in AI self-improvement.

Mildly superhuman capabilities may be reached without an intelligence explosion through the low-level accumulation of hardware, training and knowledge.

There was some speculation about it: https://scout.ai/story/the-rise-of-the-weaponized-ai-propaganda-machine

When I read "Cambridge Analytica isn’t the only company that could pull this off -- but it is the most powerful right now," I immediately think "citation needed."

Eric Schmidt funded multiple companies to provide technology to get Hillary elected.

There are many programs which play Go, but only one currently with superhuman performance.

On the Go side, the program with superhuman performance is run by Eric Schmidt's company.

What makes you think that Eric Schmidt's people aren't the best in the other domain as well?

The fact that H lost?

But in fact, I don't want to derail the discussion of AI's possible future decisive advantage into a conspiracy-looking discussion about past elections, which I mentioned only as a possible example of strategic games, not as proof that such an AI actually exists.

The fact that H lost?

That argument feels circular in nature. You believe that Trump won because of a powerful computer model, simply because Trump won and he was supported by a computer model.

On the one hand, you have a tech billionaire who's gathering top programmers to fight. On the other hand, you have a company that has to be told by the daughter of that tech billionaire what software they should use.

Whose press person said they worked for the Leave campaign, and whose CEO is currently on the record as never having worked for the Leave campaign, neither paid nor unpaid.

From a NYTimes article:

But Cambridge’s psychographic models proved unreliable in the Cruz presidential campaign, according to Rick Tyler, a former Cruz aide, and another consultant involved in the campaign. In one early test, more than half the Oklahoma voters whom Cambridge had identified as Cruz supporters actually favored other candidates. The campaign stopped using Cambridge’s data entirely after the South Carolina primary.

There's a lot of irony in the fact that Cambridge Analytica seems to be better at spinning untargeted stories about its amazing abilities of political manipulation than it is at actually helping political campaigns.

I just saw on scout.ai's about page that they see themselves as being in the science fiction business. Maybe I should be less hard on them.

I want to underline again that the fact that I discuss a possibility doesn't mean that I believe in it. The win is evidence of intelligent power behind it, but given the prior from its previous failures, it may not be strong evidence.

Geopolitical forecasting requires you to build a good model of the conflict that you care about. Once you have a model, you can feed it into a computer, as Bruce Bueno de Mesquita does, and the computer might do better at calculating the optimal move. I don't think that currently existing AI systems are up to the task of modeling a complicated geopolitical event.

I also don't think that it is currently possible to model geopolitics in full, but if some smaller yet effective model of it is created by humans, it could be used by an AI.

Bruce Bueno de Mesquita seems to be of the opinion that even 20 years ago computer models outperformed humans once the modeling was finished, but the modeling itself seems crucial.

In his 2008 book, he advocates that the best move for Israel/Palestine would be to make a treaty that requires the two countries to share tourism revenue with each other. That's not the kind of move that an AI like DeepMind would produce without a human coming up with the move beforehand.

So it looks like if the model-creation job could be at least partly automated, it would give a strategic advantage in business, politics and military planning.

and an AI able to win at it seems imaginable even with current capabilities

Not with current capabilities. For one thing, the set of possible moves is undefined or very, very large.

Sure, "dumb" AI helping humanimals to amplify the detrimental consequences of their DeepAnimalistic brain reward functions is actually THE risk to the normal evolutionary step called the Singularity (in the Grand Theatre of the Evolution of Intelligence, the only purpose of our humanimal stage is to create our successor before reaching the inevitable stage of self-destruction, with possible planet-wide consequences).

This seems like something we should talk about more.

Although, as far as I know, it shouldn't be a choice between motivation selection and capability control measures: the former is obviously the more important part, but you can also always "box" the AI in addition (insofar as that's compatible with what you want it to do).