The subagent problem is really hard

Stuart_Armstrong

13 The subagent problem is really hard

by Stuart_Armstrong

18th Sep 2015

3 min read

10

13

A putative new idea for AI control; index here.

The first step to solving a problem is to define it. The first first step is to realise how tricky it is to define. This is a stub on a difficult problem.

Subagents and turning AIs off

One of the hardest things with designing safe AIs, is turning them off. Installing an OFF button is trivial; motivating them to let you push it can be done. No, the real problem is the subagent problem. The AI can create one or many subagents to carry out their task; once that's done, turning off the original AI is of little value, we have to control or extinguish the subagents.

A general problem

Subagents are a problem for many other control ideas. For example, if you want to program 'achieve X using only this many computer cycles', then 'create a subagent that wants to achieve X without the computer cycle restrictions' is one way of doing it (as long as that can be done within the restrictions). The same goes for energy use and other restrictions. Satisficers can satisfice by creating a maximiser. A boxed Oracle can remain in the box while cheerfully plotting to create an unboxed copy of itself. In fact, whatever the AI's motivation, "create subagents with the same motivation" is almost always a good idea. And, once created, there is no motivation to want to shut them down. Even if the AI is motivated to shut itself down, there is no reason for it to extend this to subagents.

Pre-corriged agents would at least be motivated to keep control of their subagents (unlike corrigible agents, who have no current desire to create subagents that would obey their future preferences).

What is a subagent?

It's also almost impossible to define a subagent. Is a corporation a subagent? Is a thousand pieces of code that reside on different computers and occasionally communicate a subagent? Is an employee or a seduced human a subagent? A pile of rocks with water running through could be a piece of conceptual art - or a slow computer. How about a virulent meme among the human population, making us all into its subagents?

The best idea I've found for defining subagents is that they are reactive intelligences with planning abilities, so random natural events that should have large influence in the future (say, a meteor heading towards Earth) would instead be re-diverted to have little impact. However, if this is used as a definition, then the subagents would simply be programmed to not behave "reactively" in whatever way defined. A slightly more subtle problem is that, since we don't have a definition of agent or subagent, we can't easily protect against holes such as "there is an alien agent on that asteroid, which will optimise the solar system iff it detects no subagent on Earth". Such holes could completely reverse the definition.

No subagent versus no influential subagents

Without the ability to properly define subagents, we can't restrict them directly. Even if you had an AI that was properly motivated to, say, do some action then turn itself off, there is no reason for it to want to do the same with its subagents.

Most of the methods of reduced impact can help with problematic subagents. They do not preclude the creation of subagents, nor mandate their destruction, but they do prevent them from having much influence. This is because reduced impact methods are aimed at preventing the AI itself from having much influence, either directly or via subagent creation. Even the attempt in this post to detect subagents directly, more properly detects influential decisions (of which the creation of a subagent could be one of them).

This is certainly a positive, but it may not be robust. If the definition of "not having much influence" turns out to have holes, it would be less than ideal to have any remaining subagents hanging about.

The "make sure that you cannot be detected in the future" approach is probably the most promising for controlling subagents. The AI there is motivated to erase all traces of its existence, which subagents would presumably be.

In all, it's a very tricky problem, and the core failure of many ideas for AI control.

Personal Blog

13

Mentioned in

36Selection processes for subagents

34New(ish) AI control ideas

31Subagents and impact measures, full and fully illustrated

17In theory: does building the subagent have an "impact"?

16Counterfactuals versus the laws of physics

Load More (5/11)

New Comment

10 comments, sorted by

top scoring

Click to highlight new comments since: Today at 10:03 PM

[-]Houshalter10y40

This doesn't really solve your problem, but it's interesting that humans are also trying to create subagents. The whole AI problem is us trying to create subagents. It turns out that that is very very hard. And if we want to solve FAI, making subagents that actually follow our utility function, that's even harder.

So humans are an existence proof for minds which are very powerful, but unable to make subagents. Controlling true superintelligences is a totally different issue of course. But maybe in some cases we can restrict them from being superintelligent?

Reply

[-]Stuart_Armstrong10y40

We'd be considerably better at subagent creation if we could copy our brains and modify them at will...

Reply

[-]Houshalter10y00

Well it's not impossible to restrict the AIs from accessing their own source code. Especially if they are implemented in specialized hardware like we are.

Reply

[-]Stuart_Armstrong10y40

It's not impossible, no. But it's another failure point. And the AI might deduce stuff about itself by watching how it's run. And a world that has built an AI is a world where there will be lots of tools for building AIs around...

Reply

[-]Dagon10y00

It helps to realize that "agent" and "actor" are roles, not identities. The (sub)agents you're discussing are actors to themselves: they have beliefs and desires, just like the entity who's trying to control them.

We're all agents, and we comprise subagents in the form of different mental modules and decision processes.

Reply

[-]VoiceOfRa10y00

Well one thing to keep in mind is that non-superintelligent subagents are a lot less dangerous without their controller.

Reply

[-]Stuart_Armstrong10y40

Why would they be non-superintelligent? And why would they need a controller? If the AI is under some sort of restriction, the most effective idea for it would be to create a superintelligent being with the same motives as itself, but without restrictions.

Reply