JackV comments on Cryptographic Boxes for Unfriendly AI - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (155)
Let me see if I understand. Firstly, is there any reason what you're trying to do is create a friendly AI? Would, for instance, getting an unknown AI to solve a specific numerical problem with an objectively checkable answer be an equally relevant example, without the distraction of whether we would ever trust the so-called friendly AI?[1]
If that is the case, then as I understand the proposal, we take the code for the unknown AI and the input data, use this encryption scheme on it, and then have a program which performs various operations on them.
As I understand my 30-second reading of wikipedia "various operations" includes only addition and multiplication of encrypted data, which is very useful for lots of things, but not enough to run a program (which typically involves, say, branching). But this doesn't matter because (a) maybe you could have a program which works like that or (b) even if only some of the content is encrypted, but "when we branch" is not, it's still useful for the purpose.
At the end of this, we have an encrypted result. We then -- still using the encrypted data -- test this result against the problem we set. If it matches, we decrypt the answer. If not, we don't.
The intended purpose is (a) the machine can't spit out an answer that instead of being an actual answer says "mwahahahaha I trick you into running me on the internet" or "Hi! Actually I'm a friendly AI after all. Free me!" because if it doesn't match, it doesn't get read, and (b) it's harder for the AI to -- for instance -- hack the hardware of the computer it's running on, because as all the data is encrypted, it has no way of knowing where of the architecture the data lives, etc.
Is this useful? I'm really not sure. (a) fails because you need the key to decrypt the data, so if you get curious, you may decrypt a "bad" answer just to see. What you need is a different cyptography scheme, one where you can ONLY decrypt the answer if it matches certain predicates. (b) may be relevant, if you're worried about the AI hacking the computer it runs on, this is obviously much, much, harder. But it does still have some channels -- for instance, I think it's possible to monitor program flow even if not program content, so it could encode a threatening message in the pattern of branches.
[1] The question of whether we ever CAN test a friendly AI is interesting in itself, more for what it says about our assumptions than what it says about AI, but I will return to this.
Other thoughts not relevant to the central point:
(a) Do you really want to rely on cryptography to contain an unfriendly super-intelligence? It's better than NOT doing so. But notice that all the cryptographic schemes we have ever had have one thing in common: they've been broken, or we expect them to be broken eventually. it's an arms race, and if the AI is that great, then even if it can't factorise large numbers in reasonable polynomial time, it may be able to crack the encryption.
(b) We could have simply used a normal non-encrypted program to "compare answer to problem, test, reject or publish". It would have been somewhat more vulnerable to being hacked by bad input, but not necessarily much more.
(c) I don't think we're able to say much about friendly or non-friendly AIs until we know a lot more about building AI at all, although that doesn't mean it's bad to think about. The idea of a test for a friendly AI is, um, worrying. Judging by computing history, all our current thoughts will turn out to be incredibly simplistic.
(d) We're more certain of the existence of this encryption scheme, provided we're right about certain mathematical results about lattices. OTOH, if we DID build an unfriendly AI and it DID get loose on the internet and take over the world, is "well, what do you know, we were wrong about lattices" a consolation? :)
I think most people strongly suspect that there exist cryptographic schemes which can't be broken, because they are actually hard. RSA is probably such a scheme against classical adversaries (as our AI would be). Its true that they could break it without factoring numbers, but they can't break it without defeating the RSA assumption.
We know that there exist such schemes. One times pads are an example. What we don't know is whether there exist secure crypto schemes without a shared source of randomness.
Note however that the existence of such systems implies that P != NP, so this is a fairly strong statement. The existence of secure homomorphic encryption implies that P != NP. So whatever your confidence is that P != NP your confidence in this should be lower.
I think we've had this very discussion before :)
Well, we had it about P ?= NP. So how much does your confidence go down for the stronger claim?
Significantly. Its very hard to put probabilities on this sort of thing, but I'd take a bet if the odds were... 100:1? I don't know if I would take significantly worse odds. My brain doesn't handle either small probabilities or epistemic uncertainty very well.
Hmm, that's very interesting. I'm surprised that the estimates aren't closer together. My own estimate for the existence of provably secure homomorphic encryption given that P != NP is very high, around, .75. So it seems that you much more strongly believe that P !=NP but are much less comfortable given that P !=NP assigning a high probability to the existence of homomorphic encryption.
Best to be careful with the term "provably secure" - since "provable security" is a technical term with a rather counter-intuitive meaning.
I definitely have the impression that even if the hard problem a cryptosystem is based on actually is hard (which is yet to be proved, but I agree is almost certainly true), most of the time the algorithm used to actually encrypt stuff is not completely without flaws, which are successively patched and exploited. I thought this was obvious, just how everyone assumed it worked! Obviously an algorithm which (a) uses a long key length and (b) is optimised for simplicity rather than speed is more likely to be secure, but is it really the consensus that some cryptosystems are free from obscure flaws? Haven't now-broken systems in the past been considered nearly infalliable? Perhaps someone (or you if you do) who knows professional cryoptography systems can clear that up, which ought to be pretty obvious?
A provably correct implementation of any reasonable encryption scheme requires automatic verification tools which are way beyond our current ability. I strongly suspect that we will solve this problem long before AGI though, unless we happen to stumble upon AGI rather by accident.
I think that you could implement the scheme I described with reasonable confidence using modern practices. Implementing encryption alone is much easier than building an entire secure system. Most flaws in practice come from implementation issues in the system surrounding the cryptography, not the cryptography itself. Here the surrounding system is extremely simple.
You also have a large advantage because the design of the system already protects you from almost all side channel attacks, which represent the overwhelming majority of legitimate breaks in reasonable cryptosystems.
This isn't really right. With the cases where algorithms can be proved as secure as some math problem, you can attack the protocol they are used in - or the RNG that seeds them - but the not really the algorithm itself. Of course, not all cryptography is like that, though.
http://en.wikipedia.org/wiki/Provable_security