Related to: Shut up and do the impossible!; Everything about an AI in a box.
One solution to the problem of friendliness is to develop a self-improving, unfriendly AI, put it in a box, and ask it to make a friendly AI for us. This gets around the incredible difficulty of developing a friendly AI, but it creates a new, apparently equally impossible problem: how do you design a box strong enough to hold a superintelligence? Let's suppose, optimistically, that researchers on friendly AI have developed some notion of a certifiably friendly AI: a class of optimization processes whose behavior we can automatically verify to be friendly. Now the problem is designing a box strong enough to hold an unfriendly AI until it modifies itself to be certifiably friendly (of course, it may have to make itself smarter first, and it may need to learn a lot about the world to succeed).
Edit: Many people have correctly pointed out that certifying friendliness is probably incredibly difficult. I personally believe it is likely to be significantly easier than actually finding an FAI, even if current approaches are more likely to find FAI first. But this isn't really the core of the article. I am describing a general technique for quarantining potentially dangerous and extraordinarily sophisticated code, at great expense. In particular, if we developed uFAI before having any notion of certifiable friendliness, then we could still use this technique to try to use the uFAI in a very limited way. It allows us to quarantine an AI and pass everything it tries to say to us through an arbitrarily complicated filter. So, for example, if we are interested in solving some problem with a unique and verifiably correct answer, then we can ask the AI for this answer and only see the output if it passes the verification. We can also use a quarantined AI in more useful ways, at the expense of introducing some limited channels of influence (for example, if we ask for the solution to a problem which may have several solutions, we allow the AI to influence us by choosing which solution to provide).
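To make the filter idea concrete, here is a minimal sketch in Python. The task is factoring a semiprime, whose answer is unique and mechanically checkable; `run_quarantined_ai` is a hypothetical stand-in for whatever quarantine mechanism (such as the cryptographic one developed below) actually runs the AI.

```python
# A minimal sketch of the "verified answer" filter, assuming a hypothetical
# run_quarantined_ai() that executes the boxed AI and returns its answer.

def is_prime(k):
    return k > 1 and all(k % d for d in range(2, int(k**0.5) + 1))

def verified_factorization(N, run_quarantined_ai):
    """Ask the quarantined AI to factor the semiprime N; reveal its answer
    only if it is the unique correct one."""
    p, q = run_quarantined_ai(task=f"factor {N} into two primes")
    if is_prime(p) and is_prime(q) and p * q == N:
        return (p, q)   # the answer carries no information beyond the factors
    return None         # otherwise we learn only that the check failed
```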
Minimally, I hope this scheme suggests that putting an AI in a box may not be catastrophic. What to do with a boxed AI is another question. /Edit.
Concretely, suppose someone comes to you with the design of a recursively self-improving AI---call it Hal. Unfortunately, Hal is not certifiably friendly. Maybe it is friendly but not provably so, but more likely it is indifferent to the human condition and is likely to destroy the universe if you just tell it to make a friendly AI. Nevertheless, you think Hal is smart enough to solve the friendly AI problem for you. All you need to do is give it access to all of the information about the world it wants and a couple of hours to run; after that, it should output a modified version of itself which is certifiably friendly.
You almost run Hal and ask it to design a friendly AI for you. After all, if you just run Hal and check to see if it outputs a certifiably friendly AI, what can it do other than refuse? But after some more thought you realize this is overly optimistic. Maybe you can avoid giving it an output channel that a human will have to read, thereby eliminating the most obvious failure mode of a boxed AI, but you still have to give it access to a computer to run on. Computation produces electromagnetic radiation, heat, etc., and it is honestly hard to say what a superintelligence could do with such a channel.
We seem to be faced with an impossible problem. We need to ask Hal to design us a certifiably friendly AI, but we also need to give him absolutely no ability to interact with the world destructively.
There is one field in particular which has a good history of solving obviously impossible problems: computational cryptography. We want to implement the following obviously impossible device: a cage into which we can place Hal so that he can think and examine all of the data we give him, but from which he cannot interact with his environment. In the cage with him is a tamper-proof arbiter implementing our friendliness test. Hal may give the arbiter the code of a single AI; if the arbiter confirms that this AI is certifiably friendly it will kill Hal, replace him with the friendly AI, and then allow the friendly AI to send a single message out of the cage.
To implement this impossible device we co-opt an important cryptographic primitive called public key homomorphic encryption, typically used for the impossible problem of private delegated computation. To get a basic idea you can see the Wikipedia article on homomorphic encryption. I'll give a brief description here (you can skip the next two paragraphs if you are familiar with homomorphic encryption).
A public key encryption scheme has the following properties. Anyone can encrypt a message using a publicly published key, but you cannot efficiently distinguish the encryptions of two different messages unless you know something about the secret key. A fully homomorphic encryption scheme has the additional counterintuitive property that, although you cannot learn anything about the encrypted data without the secret key, you can still perform arbitrary operations on it. In particular, if you have a homomorphic encryption of the source code of a program you can run that program without learning anything about it (except how long the source is), but the output of the program will be homomorphically encrypted. All of this can be done using only the public key.
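For readers who want something more concrete than the definition, here is a toy implementation of the Paillier cryptosystem, which is only *additively* homomorphic (the fully homomorphic schemes discussed below also support multiplication and are far more complicated). The parameters are deliberately tiny and completely insecure; the point is only to show computation on ciphertexts by someone who never holds the secret key.

```python
# Toy Paillier cryptosystem: additively homomorphic, wildly insecure parameters.
# A real deployment would use a vetted library and a fully homomorphic scheme.
import math
import secrets

def is_prime(k):
    return k > 1 and all(k % d for d in range(2, int(k**0.5) + 1))

def next_prime(k):
    while not is_prime(k):
        k += 1
    return k

# --- key generation ---
p, q = next_prime(10**5), next_prime(10**5 + 500)
n = p * q
n2 = n * n
g = n + 1                              # standard simple choice of generator
lam = math.lcm(p - 1, q - 1)           # Carmichael function of n
L = lambda x: (x - 1) // n
mu = pow(L(pow(g, lam, n2)), -1, n)    # precomputed decryption constant

def encrypt(m):
    """Encrypt m (0 <= m < n) using only the public key (n, g)."""
    while True:
        r = secrets.randbelow(n)
        if r > 0 and math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Decrypt using the private key (lam, mu)."""
    return (L(pow(c, lam, n2)) * mu) % n

# Homomorphic property: multiplying ciphertexts adds the plaintexts,
# and nobody involved in the multiplication needs the private key.
c1, c2 = encrypt(20), encrypt(22)
assert decrypt((c1 * c2) % n2) == 42
```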
The existence of a fully homomorphic encryption scheme was completely open until recently, when it was shown by Craig Gentry that such a scheme exists under the assumption that a certain problem on lattices is computationally intractable. Known homomorphic encryption schemes are very inefficient and based on new assumptions, but we can realistically expect improvements in the coming years and decades both in speed and the strength of the necessary assumptions.
If you run a homomorphically encrypted AI, sufficiently isolated from the private key, you get the very strong guarantee that the AI's execution has no computationally detectable effect on the rest of the world. If the AI were able to affect the world in a computationally detectable way, then we would have successfully broken the homomorphic encryption scheme by distinguishing the encrypted execution of the AI from the encrypted execution of some benign program.
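This is the standard indistinguishability argument, and it can be phrased as a game: any procedure that can tell from the outside whether the encrypted machine is running Hal or some benign program is, by definition, a distinguisher against the encryption scheme. The sketch below is purely illustrative; `fhe`, `universal_interpreter`, and `detector` are hypothetical placeholders.

```python
# The security reduction phrased as a game (all interfaces hypothetical).
import secrets

def distinguishing_game(fhe, universal_interpreter, detector,
                        hal_source, benign_source):
    pk, sk = fhe.keygen()
    b = secrets.randbits(1)                      # hidden coin flip
    program = hal_source if b else benign_source
    enc_trace = fhe.eval(pk, circuit=universal_interpreter,
                         inputs=fhe.encrypt(pk, program))
    guess = detector(pk, enc_trace)              # the detector never sees sk
    # Any detector that notices Hal's influence wins noticeably more than
    # half the time, i.e. it breaks the encryption scheme.
    return guess == b
```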
Now we need to give the arbiter some means to allow a genuinely friendly AI access to the rest of the world, without giving Hal any illegitimate access. We allow Hal access to an output channel whose outputs are homomorphically encrypted. After some time, we destroy Hal and then homomorphically check to see if he has output the code of a certifiably friendly AI over the channel (we have to decrypt a single homomorphically encrypted bit to perform this check). If he hasn't, then we do nothing---apparently Hal would rather not give us a friendly AI. If he has output the code of a certifiably friendly AI, we homomorphically run it, obtain a homomorphic encryption of its output, and then decrypt only this output.
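Putting the pieces together, the whole protocol looks roughly like the sketch below. The `fhe` interface and `universal_interpreter` are hypothetical conveniences; in a real scheme, "running encrypted code" means homomorphically evaluating a public universal interpreter circuit on the encrypted source, and nothing is remotely this simple or this fast.

```python
# A sketch of the full protocol, assuming a hypothetical FHE interface
# (keygen / encrypt / eval / decrypt) and a public universal interpreter.

def run_boxed_hal(hal_source, world_data, certify_friendly,
                  fhe, universal_interpreter):
    pk, sk = fhe.keygen()

    # Hal thinks, reads the world data, and writes to his output channel,
    # all under encryption.
    enc_hal = fhe.encrypt(pk, hal_source)
    enc_data = fhe.encrypt(pk, world_data)
    enc_channel = fhe.eval(pk, circuit=universal_interpreter,
                           inputs=(enc_hal, enc_data))

    # Homomorphically apply the (public) friendliness certifier to whatever
    # Hal wrote on the channel; only this single bit is ever decrypted here.
    enc_ok = fhe.eval(pk, circuit=certify_friendly, inputs=enc_channel)
    if not fhe.decrypt(sk, enc_ok):
        return None  # Hal declined to output a certified FAI; we learn nothing else

    # Homomorphically run the certified-friendly AI, then decrypt only its message.
    enc_message = fhe.eval(pk, circuit=universal_interpreter,
                           inputs=(enc_channel, enc_data))
    return fhe.decrypt(sk, enc_message)
```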
That's the entire scheme. Whenever you have some code whose execution itself might do incredible damage, this precaution allows you to execute it in perfect safety (the main caveat is that you need an automatic way to screen the output which doesn't give the algorithm enough room to possibly do harm).
Note: If you are really exceptionally paranoid you may be able to obtain even better guarantees by developing more powerful cryptographic techniques tailored for this application. In particular, the secret key used by the encryption procedure may be leaked to a sufficiently powerful AI through a side channel (the computations involving the key typically leak some data in the form of radiation, which might be recovered from sufficiently exhaustive sensor data by a sufficiently powerful AI). By doing these computations on a quantum computer you can probably guarantee that no data was leaked. If you want to be even more fantastically paranoid (i.e., if you want a formal proof of security, which is a rare treat in real problems) you will actually have to shred the key, guaranteeing that an AI will have to interact with the shredded key to recover it. If you do this, you have to use another new primitive to implement the final revelation of the friendly AI's message.
If you have any concerns about the security of this protocol, I would be happy to try and defend it and would be surprised and interested if I failed. Whether or not such a cryptographic box is really an interesting or important object is another question. (It is interesting to me as an unexpected application of cryptography).
Running a C program with 500MB of memory homomorphically would undoubtedly be very slow. Current technologies would be even slower than you suspect, for other reasons.
However, there are other languages than C, and for some this transformation is much, much more efficient. The game of life is at least parallel, but it is also extremely inefficient. You could instead use some fundamentally parallel language which has all of the benefits of the game of life while actually being structured to do computation. Coincidentally, many AI researchers already work in functional languages like Lisp, which are very well-suited to this transformation. Running a huge Lisp program homomorphically might be practical if you had constant-rate homomorphic encryption, which I suspect we will eventually have. The slowdown could be something like 10x, running on 20x as many processors.
When I say that I think the problem is theoretically possible, I do also mean that I think significant speedups are theoretically possible. Homomorphic encryption is an unnecessarily huge hammer for this problem. It's just that no one has thought about the problem before (because cryptographers don't care about unfriendly AI, and people who care about friendly AI don't think this approach is viable), so we have to co-opt a solution to a normal human problem.
I could include a complete discussion of these issues, and you are probably right that it would have made a more interesting article. Instead, I initially assumed that anyone who didn't know a lot about cryptography had little chance of learning it from a blog post (and that people who did know wouldn't appreciate the exposition), and that I should post an argument convincing enough to resolve the usual concerns about AIs in boxes. As it stands, I think I am going to write a post about theoretically safe things to do with an unfriendly AI in a box, and then, conditioned on that going well, I may revisit the question of building a box in a more thought-out way. Until then I will perhaps add a paragraph with some clearer exposition, as you suggest.
That sounds reasonable. I agree a complete discussion is probably too complicated, but a few simple examples of the sort I eventually gave would probably help most people understand. They certainly helped me, and I think many other people were puzzled, whereas with the simple examples I have now, I think (although I can't be sure) I have a simplistic but essentially accurate idea of the possibilities.
I'm sorry if I sounded overly negative before: I definitely had problems with the post, but didn't mean to be negative about it.