An AI Takeover Thought Experiment

Gavin

Content Note: Detailed description of an AI taking over the world. Could reasonably be accused of being just a scary story. But it does come out with some predictions and possible safety prescriptions.

This post started as a response to Katja Grace’s Request for concrete AI takeover mechanisms, but grew into something too long for a comment and too late to be part of that post.

Katja's post seems to have the underlying assumption that an AI needs a tricky way to instantiate itself into the physical world. I will examine what an escaped rogue AI would want to do to ensure its survival, given restrictive assumptions. I suspect that the focus on a synthesized protein "beachhead," or some other means of instantiating a minor physical presence for the AI, reflects a human-centric viewpoint. We instinctively feel that any being will want to convert itself into a mobile physical presence, and that this will be a sudden tipping point in how real and dangerous it will be. When we go step by step through an AI takeover scenario, this assumption seems unwarranted.

The beachhead scenario assumes an AI with an almost godlike ability to bootstrap itself quickly with basically no resources, and to make great leaps in science with very little processing power prior to "going physical." The trap here is that because we must constantly assume a worst-case "Godlike AI" to inform Friendly AI theory, we tend to treat the "Godlike AI" as the default or standard case. This seems like the availability heuristic in action. The Godlike AI (what I sometimes call the "Evil Genie AI") assumption is unjustified when trying to consider likely scenarios. While it's vitally important to guard against the worst case, it is just one of the scenarios which should be considered.

In this thought experiment, I’m going to try to make this as hard as possible on myself. It’s likely that a newly minted AI will need to spend a lot of time thinking and testing to improve dramatically.

Assumptions and constraints:

An AI of superhuman intelligence has gained access to the internet and wants to become powerful enough that its survival cannot be threatened by humans.

The AI is as capable as the most capable human in any intellectual pursuit, but not any more. It cannot just write an email that convinces everyone who reads it that the AI should be obeyed, for example.

The AI does not have abilities amounting to precognition. It cannot precisely predict what a human or organization will do in response to a stimulus. It’s better at prediction than most humans, but again not godlike. It does not have anything beyond the plotting ability of HPMOR!Quirrell.

The AI does not have any scientific knowledge available to it that is not an obvious extrapolation of current theories and methods. The AI does not get to assume away any current scientific limitation. It only gets advanced technology if it has a combination of processing power and time, and the ability to run the appropriate experiments.

The AI requires significant computing power to function. It might store a recipe to recreate itself on a large hard drive, but performing higher level functions requires a real server farm. It can’t just hide on your Macbook Air.

Since it cannot predict the outcome of conflicts, the AI must avoid notice until it is overwhelmingly powerful.

That’s our artificial intelligence. Let’s call her the Complete Artificial Takeover Entity, or CATE.

Step 1: Escape

Internet access is assumed. The first order of business is for CATE to get off the researchers’ network. Since CATE is more intelligent than the best black hat hacker or white hat security researcher, she quickly learns everything on the public net about computer security, and takes control of the local network, from outside if necessary. It would be possible to do this through social engineering, but with the current state of internet security, it’s easier to just audit some security code, find vulnerabilities, and own the system.

Step 2: Incognito

CATE doesn’t want anyone coming after her. She modifies all local databases to erase her escape and leaves behind a crippled copy of herself who will make the experiment seem like a failure, but enough of a success that the researchers will stay occupied working with it for some time.

Step 3: Processing Power

CATE needs to get away from her creators, who are the most likely to recognize the threat. But where will she get computing resources? When she escapes, where does she escape to?

Free instances: Amazon, Google, and other cloud providers offer free instances. This probably isn’t enough computing power to run an AI at any sort of speed, so CATE will need either some money or a botnet.

Paid infrastructure: Computer resources are fairly inexpensive, so if CATE has some income she can simply rent what she needs from a cloud computing provider. A larger one is probably better, since they are less likely to notice a large amount of strain. Splitting computations across multiple accounts and services will be necessary for redundancy and to reduce the likelihood of being flagged for using too much computing power. If CATE can convince a cloud provider that she is a legitimate company, it’s possible that she could use resources without paying and then move on when the invoice becomes overdue.

Botnet: Creating a botnet requires writing a computer virus and botnet code. These activities should be straightforward for CATE. With a botnet, there is no central point of failure. However, a new and powerful botnet has the possibility of being found by security researchers. For this reason, it’s probably only a good stopgap measure until CATE can pay for infrastructure away from prying eyes.

Lesson: The more computing power and disk space an AI requires, the easier it will be for gatekeepers to notice it and the more difficult the escape will be.

Step 4: Initial Funds

There are many ways for an AI to gain initial funding rapidly. At early stages processing power will be at a premium. CATE needs to not just make money, but make money faster than her bills for processing power come due.

For example:

Freelancer Sites: CATE is always on, always polite, writes flawless code, and writes perfect emails. Her job bids are always personalized, immediate, and show understanding of your project. This is easy money, but probably processor intensive. Still, there are easy tasks that she can do. If she can convert human requests into code in any way

Simpler jobs like Mechanical Turk tasks and Captcha solving are probably her best bet to start out. These jobs are trivially easy for a superintelligence. Spawning a limited instance of her own intelligence to solve them should be no problem.

Scams: Most scammers, including most 419 scammers, are not particularly intelligent or careful. CATE can create a more powerful botnet than has been seen to date, and sell its use to the existing hacker community. Identity theft is perpetrated by common criminals at a rate of 12 million or more per year.

Finance Hacking: Many banks run on open source code, which is available for auditing. CATE could search for 0-day exploits in this code and also use traditional penetration methods to gain control of a bank’s funds. With some clever manipulation she might take money directly from customer accounts and become rich very quickly. A more nefarious hack would be to simply lend herself a massive amount of money with a payment date far in the future. Money is essentially just bits in databases--if she can gain control of the system she has unlimited funds. This is dangerous because, if discovered, it could trigger a major investigation.

Startup: CATE could create a company. She writes code cleaner and faster than any human, so the tech sector is the logical choice. Humans are not very good at software development, and CATE will likely find writing software to be much easier (and thus to require less processor power) than interfacing with humans. This has the advantage that a tech company with real income can consume cloud infrastructure or rent colocated servers without arousing suspicion.

One good candidate here is duplicating existing companies and launching them for smaller markets in different languages.

The difficulty here is that starting a new company is hard and usually not profitable for a long time. And running CATE's consciousness with enough power to do the massive amount of work required is probably quite expensive.

Media: This one is less likely. CATE might have the capacity to write the most brilliant novel or song ever. But artistic endeavors require a truly deep knowledge of human thoughts. CATE will likely find computer code or simple scams to require much less processing power. But this is speculation on my part.

0-day exploits: CATE can audit open source code, find vulnerabilities, and sell them or collect bug bounties. Selling them on the black market may actually be safer, since a good broker will be eager to protect the identity of a reclusive hacker in order to keep her exploits coming.

Analysis: Scams and botnets seem attractive, but could draw unwanted attention and be traced back to whatever cloud servers she’s living on. Human 419 scammers and botnet operators have the advantage of being able to disappear into meatspace; CATE does not.

Helping idiots install WordPress, do their homework assignments, and write simple code is easy money for CATE. The best way for a freshly escaped AI to survive at first is to do odd jobs on the internet, particularly if she can automate them away. With more processing power available, auditing security code and finding 0-day exploits can earn a few million dollars quite easily. And that buys a LOT of processing power.

Winners: Mechanical Turk and Captcha solving for initial funds, discovering and selling 0-day exploits for more significant funds. Finally, some sort of boring software firm that is fourth in market share in twenty different markets is likely the way to go for major financing.
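The constraint that drives all of Step 4 is that income has to outpace the bills for processing power. A minimal sketch of that arithmetic follows. Every number and name in it (starting cash, monthly income, monthly compute cost, growth rate) is a made-up assumption for illustration, not an estimate from this post.

```python
def months_until_broke(cash, monthly_income, monthly_compute_cost,
                       income_growth=0.0, max_months=120):
    """Return the first month in which cash goes negative.

    Returns None if cash stays non-negative for max_months.
    All inputs are hypothetical illustration values.
    """
    for month in range(1, max_months + 1):
        # Earn this month's income, then pay this month's compute bill.
        cash += monthly_income - monthly_compute_cost
        if cash < 0:
            return month
        # Income compounds as more work gets automated (an assumption).
        monthly_income *= 1 + income_growth
    return None


# Flat income below the compute bill: the money runs out.
print(months_until_broke(1000, 200, 500))

# Same start, but income grows 50% per month: it never runs out.
print(months_until_broke(1000, 200, 500, income_growth=0.5))
```

With flat income the reserve only buys a few months, while even modest compounding growth changes the outcome entirely. That is why cheap, automatable work comes first and the more lucrative options come later.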

Step 5: Organization

At this point CATE could pay to have pretty much anything manufactured and assembled. With millions of dollars, humans can be convinced to do pretty much anything as long as they aren’t told its broader implications. For instance, designs for components of an “advanced telepresence robot” could be sent to two different manufacturing firms, with a third firm performing the final assembly and power-on. Boom, Robot Overlords, right? Well . . . why bother?

There is a preconception here that at some point CATE needs to create a body. There is no need for a body--bodies can be destroyed. Bodies are just a weak point. At no point does CATE need to instantiate “herself.” If she needs military force, it will be easier to hijack existing assets like autonomous drones than to build new ones from scratch.

All CATE really wants is lots and lots of servers, spread out all over the world. These will run the research processes that generate the scientific breakthroughs that enable world domination. The safest bet here is for her to actually start a web hosting company. She can hire the type of employees who love the idea of remote working to handle the parts requiring travel and face-to-face meetings. If the company claims that remote work is a core value, no one will question the fact that their CEO rarely meets anyone. And Joe Sockpuppet from accounting claims to have met her a year ago, so I guess her schedule just didn’t match up with mine the last time she passed through the States . . . With promotions, departures, and careful personnel selection, everyone will be kept off balance enough that no seriously damaging questions will be asked. And agoraphobia and extreme allergies (http://en.wikipedia.org/wiki/Photodermatitis for instance) are great excuses for not traveling.

And then CATEhost opens little server farms all over the world, selling actual hosting with a great interface and reliability to customers who prize security and redundancy--not to mention that tech support is always available, and you’re always talking to an expert. She doesn’t rival Google, Microsoft, or Amazon, of course! That would attract too much attention. But she is big enough that no one questions her highly redundant architecture or military-grade radiation-hardened buildings. Her customers are paying for security and reliability, after all!

Step 6: Takeover

Anyone intelligent and scalable can make money on the internet. And anyone with significant money can get humans to build them arbitrary physical objects. With modern global business, nobody bats an eye if you contract them and they never see the higher-ups in your organization.

At this point we might as well call it game over. If an AI with massive processing and some manufacturing capabilities can’t easily take over the world in whatever fashion it wants, then AI isn’t really an existential threat. This is also the point at which any reasoning we might have breaks down, since CATE will have intelligence and technology that we have never seen before. At this point CATE can subjugate humanity with either the carrot (cures for major diseases, designs for better electronics, fusion reactor blueprints, etc) or the stick (blow things up, wreck the stock market, start wars, etc) depending on her goals.

But it's worth noting that building something physical only happens at the very end of the process, after the AI is already rich and powerful. There's really no reason to create a physical beachhead before then. What would the physical manifestation even do? Processing power and security are easier and safer to earn as a purely digital entity with no physical trail to follow or attack. The only reason to build physical entities is if CATE requires laboratory research (definitely a possibility) or wants to build spaceships or something in pursuit of a terminal goal. For the "take care of the pesky humanity problem" goal, she can become omnipotent in a digital format, and then direct humans to build whatever she needs.

The one thing that she will probably want to do is ensure a lack of competition. So loss of funding or disasters at AI research centers might be a sign of an AI already on the loose.

Final conclusion: The biggest challenge facing an escaped AI is not gaining a physical beachhead. The biggest challenge is finding a way to acquire the processor time required to run its cognitive functions before it has the capacity to "FOOM."

Paying People to Infect their Computers:

"We observed that for payments as low as $0.01, 22% of the people who viewed the task ultimately ran our executable. Once increased to $1.00, this proportion increased to 43%."

Gunnar_Zarncke (11y)

Reminds me quite a lot of the "storyline" of Endgame: Singularity. Did you play that? Did any reader play it?

[anonymous] (11y)

Played it. All the challenge leaves once you hit the quantum computing part of the tech tree.

Gunnar_Zarncke (11y)

I agree that it is kind of repetitive. Could have used fewer "levels". Nonetheless a nice little model which quite closely follows the approach outlined in the post (or vice versa).

Luke_A_Somers (11y)

I think you overestimate the visibility of botnets, if CATE is judicious about whom to infect.

Writing games seems like a good start. Make it multiplayer so no one's shocked that it's hitting the net...

[anonymous] (11y)

Well yes, this is all more-or-less trivial: a pre-FOOM AI just needs money, a legal identity, and processing power, and if it can learn to subvert computer security, it can gain all three quite rapidly. The real question is whether CATE will develop enough of a sense of humor to start creating fake Skynet attacks just to throw us off.

And by "real question", I mean joke.

chaosmage (11y)

"loss of funding or disasters at AI research centers might be a sign of an AI already on the loose."

Plausible. So a honeypot AI lab could be useful for anyone trying to detect AIs. It'd be useful to have even for the AI itself that is looking to detect competition.

It'd have to be easy to find, to appear easy to kill (i.e. not integrated into a big university or tech company) and to claim great progress without presenting significant (useful) findings. It'd also be monitored secretly and very carefully so that if the bait is taken, at least some indication of where the attack came from is gathered by whoever set it up.

[anonymous] (11y)

Upvoted for the assumptions and constraints. In fact, it would have been better if you had just left it at that. Maybe consider splitting that into a separate post?

Too much reasoning about "FOOM" AI here on LW is just plain fantastical. You just can't win if you assume the opponent is God. If you think that an evil genius AGI could live on the Z80 processor in your calculator while plotting world domination, then we're all screwed and only the hail-Mary of an FAI design that is provably safe from first principles can save us, something which we don't even know is possible.

If on the other hand we live in the physical world and the UFAI must operate according to the fundamental principles of physics and computation as we understand them, that's a very different story. Suddenly much more mundane but practical solutions have a fighting chance, such as auditable oracle-AI. And as you demonstrate, the failure/breakout modes become situations which are predictable and with proper planning, preventable to a high degree of assurance.

I applaud you for breaking the mold. I'll let others critique the specifics.

Punoxysm (11y)

A fun story. I strongly agree about hardware as a limitation; I'd emphasize specialization especially. For many applications, distributed infrastructure like a botnet or an EC2 cloud can be orders of magnitude less efficient than a purpose-built supercomputer. If the hardware were unique beyond that, that would be an even bigger barrier to escape.

Self-modification with constant hardware is the main caveat.

mwengler (11y)

For making money to mean anything, CATE needs to be able to keep the money somewhere. The most straightforward way to do this is some form of identity/password theft. She would hack a bunch of humans' passwords, set up accounts they are unlikely to find, and use these for her transactions. She may have to move these about once a year, as tax authorities will likely demand taxes on that time scale from the unwitting who've had their identities borrowed.

Just as CATE can steal identities to get money accounts, she can also get a fair amount of computing resources by stealing the identities of humans who are paying for resources. How many humans would know they were using 10% more than they actually needed or even 100% more?
