There has been much focus on controlling super intelligent artificial intelligence, yet we currently can't even control our un-agenty computers without resorting to formatting and other large-scale interventions.

Solving the normal computer control problem might help us solve the super intelligence control problem or allow us to work towards safe intelligence augmentation.

We cannot currently keep our computers doing what we want with any ease. They get infected with malware, become compromised, or receive updates that turn out to be buggy. With sufficient expertise you can go in and fix the problem, or wipe the system and start again, but this is far from ideal.

We do not have control of our computers without resorting to this kind of out-of-band manipulation.

Genes have found a way to control, somewhat, the reptilian brain and also the more powerful mammalian and human brains, as discussed in "the control problem has already been solved". And the system continues to run: our brains aren't reformatted when we pick up bad behaviours. We don't know how the genes do it, but humans tend to do what genes would want (if they wanted things!) despite our flexibility. There is some control there. Let us call the analogous problem for computers the normal computer control problem.

This problem of control has been neglected by traditional AI, as it is not trying to solve a cognitive problem. It is not like solving chess or learning to recognize faces. It is not making anything powerful; it is just weeding out the bad programs.

Comparing the normal computer control and AI control problems

The AI control problem has been defined as asking the question

What prior precautions can the programmers take to successfully prevent the superintelligence from catastrophically misbehaving?

In this language, the normal computer control problem can be defined as:

What type of automated system can we implement to stop a normal general-purpose computer system misbehaving (while carrying on with its good behaviour) if it has a malign program in it?

To make the differences explicit:

  • The normal control problem assumes a system with multiple programs, some good, some bad
  • The normal control problem assumes that there is no specific agency in the programs (especially not super-intelligent agency)
  • The normal control problem allows minor misbehaviour, but requires that it not persist over time

These assumptions make the problem more amenable to study. Systems of this sort can be seen in animals: they will stop pursuing behaviours that harm them. If a horse unknowingly walks into an electric fence while trying to get to an apple, it will stop trying to walk in that direction. This is operant conditioning, but it has not been applied to a whole computer system with arbitrary programs in it.
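
To make that concrete, here is a minimal sketch (Python, with entirely hypothetical names and numbers) of what operant conditioning over a whole system's programs might look like: each program accumulates a reinforcement score from feedback on its behaviour, and persistently punished programs lose the right to run.

```python
import random

class Program:
    """A runnable unit with a reinforcement score (hypothetical model)."""
    def __init__(self, name):
        self.name = name
        self.score = 0.0

class ConditionedScheduler:
    """Schedules programs in proportion to past reinforcement and
    suspends programs whose behaviour is persistently punished."""
    SUSPEND_THRESHOLD = -5.0  # arbitrary cut-off for this sketch

    def __init__(self, programs):
        self.programs = {p.name: p for p in programs}

    def runnable(self):
        return [p for p in self.programs.values()
                if p.score > self.SUSPEND_THRESHOLD]

    def pick(self):
        # Favour programs with a history of positive reinforcement.
        candidates = self.runnable()
        weights = [max(0.1, p.score + 1.0) for p in candidates]
        return random.choices(candidates, weights=weights, k=1)[0]

    def reinforce(self, name, reward):
        # reward > 0 for behaviour the user endorses, < 0 otherwise.
        self.programs[name].score += reward

sched = ConditionedScheduler([Program("editor"), Program("popup_adware")])
sched.reinforce("popup_adware", -6.0)       # repeated bad behaviour
print([p.name for p in sched.runnable()])   # ['editor']
```

The hard part, of course, is the `reinforce` signal itself: deciding automatically which running program a good or bad outcome should be attributed to.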

Imagine being able to remove malware from a normal computer system by training that system. That is what I am looking to produce.

This might not be the right problem definition to help us understand the control done in brains. But I think it is precise and novel enough to form one of the initial research pathways. Ideally we would have a diverse community around this problem, one that includes neuroscientists, psychologists and other relevant scientists, as well as charitable organisations like the ones trying to solve the super intelligence control problem. All this would maximize the chance that the right question is being asked.

Should we study the normal computer control problem?

I'm not going to try to argue that this is more important than the super-intelligence work; such things are probably unknowable until after the fact. I am arguing only that it is a more tractable problem to try to solve, and that it might yield insights useful for the super-intelligence work.

But as ever with the future there are trade-offs for doing this work.

Pros:

  • It might help solve the super intelligence control problem, by providing inspiration or by allowing people to show exactly where this approach would go wrong on the super intelligence side of things.
  • It might be the best that we can do towards the control problem, along with good training (if formal proofs to do with values aren't helpful for control).
  • We can use science on brains in general to give us inspiration on how this might work.
  • It can be more experimental and less theoretical than current work.
  • It might help with intelligence augmentation work (maybe a Con, depending upon pre-conceptions).
  • If deployed widely it might make computation harder for malicious actors to control. This would make taking over the internet harder (ameliorating one take-off scenario).
  • It could grow the pool of people who have heard of the control problem.
  • My current hypothesis is that the programs within my current system aren't bound to maximise utility, so they do not suffer from some of the failure modes associated with normal utility maximisation.

Cons:

  • It might speed up AI work by giving a common platform to work from (however, it is not AI in itself; there is no set of problems it is trying to solve).
  • It might distract from the large-scale super intelligence problem.

So having said all that: Is anyone with me in trying to solve this problem?

Comments

An example I like is the Knight Capital Group trading incident. Here are the parts that I consider relevant:

KCG deployed new code to a production environment, and while I assume this code was thoroughly tested in a sandbox, one of the production servers had some legacy code ("Power Peg") that wasn't in the sandbox and therefore wasn't tested with the new code. These two pieces of code used the same flag for different purposes: the new code set the flag during routine trading, but Power Peg interpreted that flag as a signal to buy and sell ~10,000 arbitrary* stocks.

*Actually not arbitrary. What matters is that the legacy algorithm was optimized for something other than making money, so it lost money on average.

They stopped this code after 45 minutes, but by then it was too late. Power Peg had already placed millions of inadvisable orders, nearly bankrupting KCG.
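
KCG's actual code is not public, so the following is only a schematic reconstruction of the failure mode: one bit, two incompatible meanings on different servers.

```python
REPURPOSED_FLAG = 0x01  # one bit, two meanings (hypothetical reconstruction)

def new_router(flags):
    """New code: sets the flag to mean 'handled by the new routing logic'."""
    return flags | REPURPOSED_FLAG

def legacy_power_peg(flags):
    """Legacy code left on one production server: the same bit used to
    mean 'run the Power Peg test algorithm against live markets'."""
    if flags & REPURPOSED_FLAG:
        return "placing live test orders!"  # the catastrophic branch
    return "idle"

flags = new_router(0)           # routine trading on the new code...
print(legacy_power_peg(flags))  # ...triggers the legacy behaviour
```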

Sometimes, corrigibility isn't enough.

This is a cool idea! My intuition says you probably can't completely solve the normal control problem without training the system to become generally intelligent, but I'm not sure. Also, I was under the impression there is already a lot of work on this front from antivirus firms (e.g. spam filters).

Also, quick nitpick: We do for the moment "control our computers" in the sense that each system is corrigible. We can pull the plug or smash it with a sledgehammer.

I think there are different aspects to the normal control problem. Stopping a system having malware that bumps it into desks is probably easier than stopping it having malware that exfiltrates sensitive data. But having a gradual progression and focusing on control seems like the safest way to build these things.

All the advancements in spam filtering I've heard of recently have been about things like DKIM and DMARC, so not based on user feedback. I'm sure Google does some things based on users clicking 'spam' on mail, but that has not filtered out into the wider world much. Most malware detection (AFAIK) is based on looking at the signatures of the binaries, not on behaviour; to judge behaviour you would have to have some idea of what the user wants the system to do.
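
To illustrate the distinction (a toy sketch, not how any real AV engine works): signature checking needs only the binary, while behavioural checking needs some model of what the user actually wants.

```python
import hashlib

KNOWN_BAD_HASHES = {"e3b0c44298fc1c149afbf4c8996fb924"}  # toy database

def signature_check(binary: bytes) -> bool:
    """Flag a program by what it *is*: hash it against known malware."""
    return hashlib.md5(binary).hexdigest() in KNOWN_BAD_HASHES

def behaviour_check(observed_actions, user_policy) -> bool:
    """Flag a program by what it *does*: this only works given some
    model (user_policy) of what the user wants the system to do."""
    return any(action not in user_policy for action in observed_actions)

policy = {"read_own_files", "draw_window"}
print(signature_check(b"some novel malware"))  # False: no signature yet
print(behaviour_check({"read_own_files", "exfiltrate_contacts"}, policy))  # True
```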

Also, quick nitpick: We do for the moment "control our computers" in the sense that each system is corrigible. We can pull the plug or smash it with a sledgehammer.

I'll update the control-of-computers section to say I'm talking about subtler control than wiping/smashing hard disks and starting again. Thanks.

Can you smash the NSA's mass-surveillance computer centre with a sledgehammer?

Oops, bug detected... and an AGI may already have been in charge.

Remember, the US military/spying community has been openly crying for years for someone to explain to them why the AI is doing what it is doing (read: please, dumb it down to our level... not gonna happen).

On the other hand... what level do you want to examine this at?

We actually have pretty good control of our web browsers. We load random untrusted programs, and they mostly behave ok.

It's far from perfect, but it's a lot better than the desktop OS case. Asking why one case seems to be so much farther along than the other might be instructive.

In some ways the browser is better, though it is also more limited. It still has things like CSRF and XSS, which can be seen as failures of the user to control their systems. Those are getting better; for CSRF, by making the server more wary about what it accepts as legitimate requests.
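
That server-side wariness often looks like the synchronizer-token pattern. A minimal sketch (framework details omitted, names hypothetical):

```python
import secrets

SESSIONS = {}  # session_id -> expected CSRF token

def render_form(session_id):
    # Embed a fresh per-session secret in every legitimate form.
    token = secrets.token_hex(16)
    SESSIONS[session_id] = token
    return f'<input type="hidden" name="csrf" value="{token}">'

def handle_post(session_id, submitted_token):
    # A forged cross-site request can't read the token, so it
    # can't echo it back; reject anything that doesn't match.
    if not secrets.compare_digest(SESSIONS.get(session_id, ""), submitted_token):
        raise PermissionError("possible CSRF")
    return "ok"
```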

I'll write an article this weekend on the two main system design patterns to avoid. *spoilers* Ambient authority, because it causes the confused deputy problem, and global namespaces. It is namespaces that browsers have improved: web pages downloaded by the browser can't interact with each other at all, so each one is a little island. That makes some things hard and leaves the user very reliant on external servers.
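
As a preview of the first pattern, a minimal sketch (hypothetical compile example): with ambient authority the deputy acts on any name using its own blanket permissions, which is what lets it be confused; handing it an explicit capability means it can only write where the caller already could.

```python
# Ambient authority: the deputy opens any path using *its own*
# permissions, so a hostile caller can point it at files the
# caller shouldn't be able to touch (the confused deputy).
def compile_with_ambient_authority(source, output_path):
    with open(output_path, "w") as f:
        f.write(f"compiled {source}")

# Capability style: the caller hands over an already-opened file
# object, proving its own right to write there; the deputy holds
# no authority beyond what it was explicitly given.
def compile_with_capability(source, output_file):
    output_file.write(f"compiled {source}")

with open("build.log", "w") as ok_file:
    compile_with_capability("prog.c", ok_file)
```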

The normal control problem assumes that no specific agency in the programs (especially not super-intelligent agency)

There seems to be a verb missing in that sentence...did you mean ...assumes that there is no specific agency in the programs...?

(Nitpicks aside, I think this is the right approach... build on current safety and control knowledge, rather than assume that all future AIs will follow some very specific decision theory).

Thanks. Edited.

however we currently can't even control our un-agenty computers very well

Hah, computers. We can't control anything very well. Take a hammer -- you might think it's amenable to driving nails in straight, but noooo... It bends the nails, leaves dents in the surface and given the slightest chance will even attack your fingers!

How about we solve the hammer control problem first?

This is operant conditioning, but it has not been applied to a whole computer system with arbitrary programs in.

Applying operant conditioning to malware is problematic for the same reason horses have difficulty learning not to walk into electric fences with a few thousand volts applied to the wires...

We've (mostly) solved the hammer control problem in a restricted domain. It looks like computer-controlled robots. With effort, we can produce an entire car or similar machine without mistakes.

Obviously we haven't solved the control problem for those computers: we don't know how to produce that car without mistakes on the first try, or with major changes. We have to be exceedingly detailed in expressing our desires. Etc.

This may seem like we've just transformed it into the normal computer control problem, but I'm not entirely sure. Air-gapped CNC machinery running embedded OSes (or none at all) is pretty well behaved. It seems to me more like "we don't know how to write programs without testing them" than the "normal computer control problem".

We've (mostly) solved the hammer control problem in a restricted domain.

The "mostly" part is important -- everyone still has QC departments which are quite busy.

Also, I'm not sure that being able to nearly perfectly replicate a fixed set of physical actions is the same thing as solving a control problem.

Air-gapped CNC machinery running embedded OSes (or none at all) is pretty well behaved.

In theory. In practice you still have cosmic rays flipping bits in memory and Stuxnet-type attacks.

However the real issue here is the distinction between "agenty" and "un-agenty". It is worth noting that the type of control that you mention (e.g. "computer-controlled robots") is all about getting as far from "agenty" as possible.

It bends the nails, leaves dents in the surface and given the slightest chance will even attack your fingers!

We've mostly solved that problem.

I'm not sure that being able to nearly perfectly replicate a fixed set of physical actions is the same thing as solving a control problem.

It's precisely what's required to solve the problem of a hammer that bends nails and leaves dents, isn't it?

Stuxnet-type attacks

I think that's outside the scope of the "hammer control problem" for the same reasons that "an unfriendly AI convinced my co-worker to sabotage my computer" is outside the scope of the "normal computer control problem" or "powerful space aliens messed with my FAI safety code" is outside the scope of the "AI control problem".

It is worth noting that the type of control that you mention (e.g. "computer-controlled robots") is all about getting as far from "agenty" as possible.

I don't think it is, or at least not exactly. Many of the hammer failures you mentioned aren't "agenty" problems, they're control problems in the most classical engineering sense: the feedback loop my brain implements between hammer state and muscle output is incorrect. The problem exists with humans, but also with shoddily-built nail guns. Solving it isn't about removing "agency" from the bad nail gun.
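
In the classical sense, that loop is just error-proportional feedback. A toy sketch (the plant and gain are made up, nothing hammer-specific):

```python
def proportional_controller(setpoint, measure, apply, gain=0.5, steps=20):
    """Classic feedback: correct in proportion to the observed error.
    The 'shoddy nail gun' failure is just a bad gain or a bad sensor."""
    for _ in range(steps):
        error = setpoint - measure()
        apply(gain * error)

# Toy plant: a one-dimensional position nudged toward a target.
state = {"pos": 0.0}
proportional_controller(
    setpoint=10.0,
    measure=lambda: state["pos"],
    apply=lambda u: state.update(pos=state["pos"] + u),
)
print(round(state["pos"], 3))  # converges on 10.0
```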

Sure, if agency gets involved in your hammer control problem you might have other problems too. But if the "hammer control problem" is to be a useful problem, you need to define it as not including all of the "normal computer control problem" or "AI control problem"! It's exactly the same situation as the original post:

  • The normal control problem assumes that there is no specific agency in the programs (especially not super-intelligent agency)

We've mostly solved that problem.

Not quite. We mostly know how to go about it, but we didn't actually solve it -- otherwise there would be no need for QC and no industrial accidents.

It's precisely what's required to solve the problem of a hammer that bends nails and leaves dents, isn't it?

Still nope. The nails come in different shapes and sizes, the materials can be of different density and hardness, the space to swing a hammer can vary, etc. Replicating a fixed set of actions does not solve the general "control of the tool" problem.

I think that's outside the scope of the "hammer control problem"

I don't think it is. If you are operating in the real world you have to deal with anything which affects the real-life outcomes, regardless of whether it fits your models and frameworks. The Iranians probably thought that malware was "outside the scope" of running the centrifuges -- it didn't work out well for them.

they're control problems in the most classical engineering sense

Yes, they are. So if you treat the whole thing as an exercise in proper engineering, it's not that hard (by making-an-AI standards :-D). However the point of "agenty" tools is to let the tool find a solution or achieve an outcome without you needing to specify precisely how to do it. In that sense, classic engineering control is all about specifying precise actions and "punishing" all deviations from them via feedback loops.

Again, I'm going to import the "normal computer control" problem assumptions by analogy:

  • The normal control problem allows minor misbehaviour, but requires that it not persist over time

Take a modern milling machine. Modern CNC mills can include a lot of QC. They can probe part locations, so that the setup can be imperfect. They can measure part features, in case a raw casting isn't perfectly consistent. They can measure the part after rough machining, so that the finish pass can account for imperfections from things like temperature variation. They can measure the finished part, and reject or warn if there are errors. They can measure their cutting tools, and respond correctly to variation in tool installation. They can measure their cutting tools to compensate for wear, detect broken tools, switch to the spare cutting bit, and stop work and wait for new tools when needed.
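
Schematically, that measure-then-correct loop looks like this (a sketch with hypothetical probe/cut hooks, not any real controller's API):

```python
TOLERANCE = 0.01  # mm; arbitrary for this sketch

def machine_part(target, probe, rough_cut, finish_cut, max_passes=5):
    """Rough cut, then repeatedly probe the actual part and let the
    finish pass compensate for the measured error."""
    rough_cut(target)
    for _ in range(max_passes):
        error = probe() - target
        if abs(error) <= TOLERANCE:
            return "accept"
        finish_cut(-error)  # remove the measured excess
    return "reject"         # persistent error: scrap the part, check tooling

# Toy part whose size the finish pass adjusts directly.
size = {"val": 10.3}
print(machine_part(
    10.0,
    probe=lambda: size["val"],
    rough_cut=lambda t: None,
    finish_cut=lambda d: size.update(val=size["val"] + d),
))  # 'accept'
```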

Again, I say: we've solved the problem, for things literally as simple as pounding a nail, and a good deal more complicated. Including variation in the nails, the wood, and the hammer. Obviously the solution doesn't look like a fixed set of voltages sent to servo motors. It does look like a fixed set of parts that get made.

How involved in the field of factory automation are you? I suspect the problem here may simply be that the field is more advanced than you give it credit for.

Yes, the solutions are expensive. We don't always use these solutions, and often it's because using the solution would cost more and take more time than not using it, especially for small quantity production. But the trend is toward more of this sort of stuff being implemented in more areas.

The "normal computer control problem" permits some defects, and a greater than 0% error rate, provided things don't completely fall apart. I think a good definition of the "hammer control problem" is similar.