orthonormal comments on Cryptographic Boxes for Unfriendly AI - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (155)
Let me see if I understand. Firstly, is there any reason the thing you're trying to box has to be a friendly AI? Would, for instance, getting an unknown AI to solve a specific numerical problem with an objectively checkable answer be an equally relevant example, without the distraction of whether we would ever trust the so-called friendly AI?[1]
If that is the case, then as I understand the proposal, we take the code for the unknown AI and the input data, apply this encryption scheme to them, and then have a program which performs various operations on the encrypted data.
As I understand from my 30-second reading of Wikipedia, "various operations" includes only addition and multiplication of encrypted data, which is very useful for lots of things, but not enough to run a program (which typically involves, say, branching). But this doesn't matter, because (a) maybe you could have a program which works that way, or (b) even if only some of the content is encrypted, and "when we branch" is not, it's still useful for the purpose.
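On point (a): branching can in fact be rewritten in terms of addition and multiplication alone, which is how homomorphic computation typically handles it. A minimal sketch (the function names here are illustrative, not any real library's API):

```python
# Sketch: expressing a data-dependent branch using only the additions and
# multiplications a homomorphic scheme supports. Both branches are always
# computed; a 0/1 condition bit arithmetically selects the result.

def select(cond_bit, if_true, if_false):
    """Branchless select: returns if_true when cond_bit == 1, else if_false.

    Uses only + and *, so the same arithmetic could in principle be
    carried out on homomorphically encrypted values.
    """
    return cond_bit * if_true + (1 - cond_bit) * if_false

def abs_branchless(x, is_negative):
    # is_negative must be supplied as a 0/1 bit; in a real encrypted
    # circuit the comparison itself would also have to be arithmetized.
    return select(is_negative, -x, x)

print(select(1, 10, 20))      # 10
print(select(0, 10, 20))      # 20
print(abs_branchless(-3, 1))  # 3
```

The cost of this trick is that every branch's work is always done, which is part of why running a whole program this way is expensive.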
At the end of this, we have an encrypted result. We then -- still using the encrypted data -- test this result against the problem we set. If it matches, we decrypt the answer. If not, we don't.
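The order of operations above is the whole point, so here is a toy sketch of it. The "encryption" is a stand-in (XOR with a fixed key), and the checker trivially compares against a known target ciphertext; a real scheme would evaluate the verification predicate homomorphically. All names are hypothetical.

```python
# Toy sketch of the quarantine workflow: check the result while it is
# still encrypted, and only decrypt on success.

import secrets

KEY = secrets.token_bytes(16)

def toy_encrypt(data: bytes) -> bytes:
    # XOR stream "cipher" -- purely illustrative, not secure.
    return bytes(b ^ KEY[i % len(KEY)] for i, b in enumerate(data))

def toy_decrypt(data: bytes) -> bytes:
    return toy_encrypt(data)  # XOR is its own inverse

def checker_passes(encrypted_result: bytes) -> bool:
    # Stand-in for a check carried out without seeing the plaintext.
    # Here we just compare against the ciphertext of the expected answer.
    return encrypted_result == toy_encrypt(b"42")

encrypted_result = toy_encrypt(b"42")  # pretend the boxed AI produced this
if checker_passes(encrypted_result):
    print(toy_decrypt(encrypted_result))  # only now is plaintext revealed
else:
    print("result discarded without being read")
```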
The intended purpose is (a) the machine can't spit out an answer that, instead of being an actual answer, says "mwahahahaha, I tricked you into running me on the internet" or "Hi! Actually I'm a friendly AI after all. Free me!", because if the output doesn't match, it doesn't get read; and (b) it's harder for the AI to, for instance, hack the hardware of the computer it's running on, because since all the data is encrypted, it has no way of knowing where in the architecture the data lives, etc.
Is this useful? I'm really not sure. (a) fails because you need the key to decrypt the data, so if you get curious, you may decrypt a "bad" answer just to see. What you need is a different cryptography scheme, one where you can ONLY decrypt the answer if it matches certain predicates. (b) may be relevant: if you're worried about the AI hacking the computer it runs on, this makes that much, much harder. But it still has some channels -- for instance, I think it's possible to monitor program flow even if not program content, so the AI could encode a threatening message in the pattern of branches.
[1] The question of whether we ever CAN test a friendly AI is interesting in itself, more for what it says about our assumptions than what it says about AI, but I will return to this.
I think Less Wrong needs a variant of Godwin's Law. Any post whose content would be just as meaningful and accessible without mentioning Friendly AI, shouldn't mention Friendly AI.
Fair enough. I am going to rework the post to describe the benefits of a provably secure quarantine in general rather than in this particular example.
The main reason I describe friendliness is that I can't believe that such a quarantine would hold up for long if the boxed AI was doing productive work for society. It would almost certainly get let out without ever saying anything at all. It seems like the only real hope is to use its power to somehow solve FAI before the existence of an uFAI becomes widely known.
LOL. Good point. Although it's a two way street: I think people did genuinely want to talk about the AI issues raised here, even though they were presented as hypothetical premises for a different problem, rather than as talking points.
Perhaps the orthonormal law of Less Wrong should be: "if your post is meaningful without FAI, but may be relevant to FAI, make the point in the least distracting example possible, and then go on to say how, if it holds, it may be relevant to FAI." Although that's not as snappy as Godwin's :)
I agree. In particular, I think there should be some more elegant way to tell people things along the lines of "OK, so you have this Great Moral Principle; now let's see you build a creature that works by it".