Pfft comments on Cryptographic Boxes for Unfriendly AI - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I am sorry that my writing is unclear, but I think you should be a little more charitable.
In particular, note that I wrote explicitly "the output of the program will be homomorphically encrypted." This is exactly the difference between program obfuscation and homomorphic encryption: a homomorphic encryption scheme allows you to do exactly what I just did. In this case you, as the user, still hold the private key, which can be used to decrypt the homomorphically encrypted outputs once the AI has been safely dismantled.
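To make the key point concrete: here is a toy sketch of an additively homomorphic scheme (Paillier-style) in Python. This is not Gentry's fully homomorphic scheme, and the tiny primes are for illustration only, nowhere near secure parameters; but it shows the essential property I am relying on, namely that anyone can compute on ciphertexts while only the private-key holder can decrypt the result.

```python
import math
import random

# Toy Paillier cryptosystem (additively homomorphic).
# WARNING: illustrative parameters only -- these primes are trivially factorable.
p, q = 61, 53
n = p * q           # public modulus
n2 = n * n
g = n + 1           # standard choice of generator
lam = math.lcm(p - 1, q - 1)   # private key

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)   # private decryption helper

def encrypt(m):
    """Encrypt m with the public key (n, g) and fresh randomness r."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Decrypt using the private key (lam, mu)."""
    return (L(pow(c, lam, n2)) * mu) % n

# Homomorphic property: multiplying ciphertexts adds the plaintexts,
# and no private key is needed to do it.
c1, c2 = encrypt(20), encrypt(22)
print(decrypt((c1 * c2) % n2))   # 42
```

The AI's computation plays the role of the ciphertext manipulation here: it proceeds entirely over encrypted data, and the user decrypts only at the end, after the machine has been dismantled.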
The source code verifier for friendliness is basically the weakest possible thing a friendliness researcher could hope for. It doesn't verify all source code: it verifies source code of an arbitrarily special form. If you can ever prove that any AI is friendly, then you have to implement something at least this strong. I believe that the possibility of putting an AI in a box has been discussed at reasonable length here as a solution to precisely the problem I am considering. I think that trying to reduce the demands on friendliness to a minimum, and motivating the cryptographic tools we will need for that reduction, is worthwhile. After all, I am a cryptographer, not an AI researcher, so I do what I can.
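To illustrate what "source code of an arbitrarily special form" means, here is a minimal sketch (my own hypothetical example, not anything from the post): a verifier that accepts only programs written in a tiny whitelisted subset of Python, pure arithmetic expressions, and rejects everything else. A real friendliness verifier would of course check a far richer property, but structurally it has this shape: a decidable check over a restricted class of programs, rather than a verifier for arbitrary code.

```python
import ast

# Whitelist of AST node types: the "special form" is pure arithmetic
# over literals and variable names. Anything else fails verification.
ALLOWED = (ast.Module, ast.Expr, ast.BinOp, ast.Add, ast.Sub,
           ast.Mult, ast.Constant, ast.Name, ast.Load)

def verifies(source: str) -> bool:
    """Return True iff `source` lies entirely within the special form."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return all(isinstance(node, ALLOWED) for node in ast.walk(tree))

print(verifies("1 + 2 * x"))          # True: inside the special form
print(verifies("__import__('os')"))   # False: function calls are rejected
```

The point is that the verifier need not decide friendliness for arbitrary programs (which would be hopeless); it only has to soundly accept some usable class of certifiably friendly ones.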
I am sure that the uFAI is non-quantum because I am the one running it, and my computer is classical. If I were using Gentry's scheme, I could not possibly run a quantum AI in this way: the measurements I perform during the homomorphic operations destroy any quantum computation the AI tries to do, even if the underlying hardware weren't classical. I believe that, on top of all this, Gentry's scheme is conjecturally secure against quantum adversaries.
In terms of the security guarantees, I don't trust them all that much, but it's a lot better than nothing. There is some significant probability that the schemes are completely secure, in which case we are really safe. Otherwise, we lose nothing except a small amount of time. We can also hope that by the time this is an issue we will have unconditionally secure schemes (although this seems sort of hopelessly optimistic, I will admit).
In the last section I am suggesting precisely that: destroying all copies of the private key and yet still expecting to decrypt the answer myself. This requires producing a new object which allows you to decode the output of a certifiably friendly AI, but not to decode any of the homomorphic encryptions in which the uFAI is running. You are right that this is probably impossible classically, but I have reasonably good evidence that it is possible quantum mechanically. I think this probably isn't reasonable to discuss until the much more basic confusion is cleared up.
This is a good observation -- I think the article would be stronger if you added it to the intro paragraph. As it is, I was a bit amused when, after saying "this creates a problem", you mention the problem of the box, while the problem of the verifier seems immensely harder...