The human mind has many cognitive modules that, though superficially and computationally similar (Kurzweil 2013), have been evolutionarily optimized, up to a bounded constraint, for distinct functions (Tooby, Cosmides, Buss 2014; Pinker 1995; Minsky 2007).
When we tamper with the human mind using amphetamines or other stimulants, trying to make it better by making the entire thing faster, the fact that it has many systems tends to be a hindrance. People end up motivated about the wrong things, undereating, feeling horny or angry, becoming mildly autistic, and so on.
In other words, targeted intervention becomes harder when a mind has many modules, if your goal is to enhance some of these modules while keeping others constant.
Stunting a human mind by giving it an antipsychotic, a tranquilizer, or a sleep inducer, on the other hand, is very effective. The whole brain runs on a digital system of electrochemical communication: the axon's action potential.
So drugs that paralyze or stop human intelligence do so by shutting down the communication system between modules. It therefore seems advisable to run the whole AI with only one, slow system of communication between its parts.
Contrast this with a stroke:
A stroke destroys some of your modules, but leaves most intact. The consequences may range from total impairment to an inability to process symbols into meaning and language.
In the second stroke case, within less than two minutes, the human mind came up with a solution to dial a phone.
The takeaway is to make internal communication within the AI slow, and to make the AI something that can be shut down as a whole, to the extent that these are possible.
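To make that takeaway concrete, here is a minimal sketch in Python of what "one slow channel, shut down as a whole" might look like in software. Everything in it is an assumption made for illustration: the class name ThrottledBus, the min_interval parameter and the module names are hypothetical and do not come from any real AI architecture. The only point is that when every module has to talk through a single rate-limited bus, slowing or stopping that bus slows or stops the whole system at once.

```python
import time
from typing import Callable, Dict, Optional


class BusShutDown(RuntimeError):
    """Raised when a module tries to use the bus after shutdown."""


class ThrottledBus:
    """Hypothetical single, deliberately slow channel shared by all modules."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval  # seconds enforced between messages
        self._clock = clock
        self._sleep = sleep
        self._handlers: Dict[str, Callable[[str, object], None]] = {}
        self._last_sent: Optional[float] = None
        self._alive = True

    def register(self, name: str, handler: Callable[[str, object], None]) -> None:
        """Attach a module; handler(sender, payload) receives its messages."""
        self._handlers[name] = handler

    def send(self, sender: str, recipient: str, payload: object) -> None:
        """Deliver one message, waiting if the previous one was too recent."""
        if not self._alive:
            raise BusShutDown("bus is shut down")
        if self._last_sent is not None:
            wait = self.min_interval - (self._clock() - self._last_sent)
            if wait > 0:
                self._sleep(wait)  # the throttle: all traffic is slowed equally
        self._last_sent = self._clock()
        self._handlers[recipient](sender, payload)

    def shutdown(self) -> None:
        """Stop all inter-module communication at once."""
        self._alive = False
```

In this toy design the modules hold no references to each other, so there is no side channel to fall back on once shutdown() has been called. Whether a real system could be built so that this property actually holds is exactly the open question.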
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far, see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the thirteenth section in the reading guide: capability control methods. This corresponds to the start of chapter nine.
This post summarizes the section and offers a few relevant notes and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable (and where I remember), page numbers indicate the rough part of the chapter that is most related; they do not necessarily mean that the chapter is being cited for the specific claim.
Reading: “Two agency problems” and “Capability control methods” from Chapter 9
Summary
Another view
Brian Clegg reviews the book mostly favorably, but isn't convinced that controlling an AI via merely turning it off should be so hard:
This may be related to his view that AI is unlikely to modify itself (from further down the same page):
Notes
1. What do you do with a bad AI once it is under your control?
Note that capability control doesn't necessarily solve much: boxing, stunting and tripwires seem to just stall a superintelligence rather than provide a means to safely use one to its full capacity. This leaves the controlled AI to be overtaken by some other, unconstrained AI as soon as someone else isn't so careful. In this way, capability control methods seem much like slowing down AI research: helpful in the short term while we find better solutions, but not in themselves a solution to the problem.
However, this might be too pessimistic. An AI whose capabilities are under control might either be almost as useful as an uncontrolled AI that shares your goals (if interacted with in the right way), or at least be helpful in getting to a more stable situation.
Paul Christiano outlines a scheme for safely using an unfriendly AI to solve some kinds of problems. We have both blogged on general methods for getting useful work from adversarial agents, which is related.
2. Cryptographic boxing
Paul Christiano describes a way to stop an AI interacting with the environment using a cryptographic box.
3. Philosophical Disquisitions
Danaher again summarizes the chapter well. Read it if you want a different description of any of the ideas, or to refresh your memory. He also provides a table of the methods presented in this chapter.
4. Some relevant fiction
That Alien Message by Eliezer Yudkowsky
5. Control through social integration
Robin Hanson argues that it matters more that a population of AIs is integrated into our social institutions, and that they keep the peace among themselves through the same institutions by which we keep the peace among ourselves, than whether they have the right values. He thinks this is why you trust your neighbors: not because you are confident that they have the same values as you. He has several followup posts.
6. More miscellaneous writings on these topics
LessWrong wiki on AI boxing. Armstrong et al. on controlling and using an oracle AI. Roman Yampolskiy on 'leakproofing' the singularity. I have not necessarily read these.
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group, though, is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about 'motivation selection methods'. To prepare, read “Motivation selection methods” and “Synopsis” from Chapter 9. The discussion will go live at 6pm Pacific time next Monday 15th December. Sign up to be notified here.