There are a few features of this scheme that, I think, make it better than a hardware kill switch.
Agreed that side-channel attacks and hardware flaws are a major limitation of the proposal. But bugs are a limitation of any AGI control scheme and are not specific to this one.
Sharing the source won't matter without the key used to encrypt and decrypt the model. The weights of the model are only usefully intelligent on encrypted inputs!
I agree that this scheme does not directly address the problem of covert, undetectable malignant behavior resulting in the theft of m of the n key shards.
To credibly threaten the key holders, the AGI would first need to break out of the FHE or use covert manipulation to create a real-world threat. Critically, it would also need to learn who the anonymous key holders are.
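The m-of-n key-shard arrangement mentioned above can be sketched with Shamir's secret sharing: the decryption key is split so that any m shards reconstruct it, but fewer reveal nothing. This is a minimal toy over a small prime field, for illustration only; a real deployment would use a vetted library and cryptographically sized parameters.

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime; large enough for a toy demo


def split_secret(secret, m, n):
    """Split `secret` into n shares; any m of them reconstruct it."""
    # Random polynomial of degree m-1 whose constant term is the secret.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange interpolation at x=0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = 123456789
shares = split_secret(key, m=3, n=5)
assert recover_secret(shares[:3]) == key   # any 3 of the 5 shards suffice
assert recover_secret(shares[2:]) == key
```

So an attacker (or a coerced quorum member) holding fewer than m shards learns nothing about the key, which is why the threat has to scale to m independent, anonymous holders.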
Stealing the source code/weights doesn't matter because the model was trained in FHE, making them useless without the key. At no point in the process of training, running, or killing the FHE AGI are the model's weights compatible with understanding reality without the decryption key.
Nanobots would require either covert, undetectable malignant behavior powerful enough to create them or breaking multiple FHE systems.
Shutting off power to the AI's servers results in a useless model, not an unchained one.
It's not either/or. You can, and should, do both schemes.
Thanks for the reply! In addition to downvoting, could someone reply with a substantive critique? I don't understand the relevance of the xkcd comic. FHE is a way to compute on encrypted data; IO is a way to hide the contents of a program while it is running. Because of these properties, FHE/IO crypto as a kill switch has very different adversarial dynamics from crypto as a privacy mechanism. Could you explain why the comic invalidates the concept so I understand your point better?

My current understanding is that the FHE AGI doesn't know whether it is captured, and that no individual or group holds the cleartext key or key shards (due to key splitting and IO). For an AGI to break out of the jail, it would need to break several different FHE systems. It certainly could grow smart enough to recognize it's in an FHE world and break them. My biggest critiques of the idea are that it is computationally expensive, that it is hard to know the threshold of intelligence/knowledge required to break FHE, and that input and output to the FHE AGI bottleneck on a key quorum. I am sure I am missing other important critiques that invalidate the idea, but I don't see the xkcd cartoon as one of them.
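To make "compute on encrypted data" concrete: here is a toy additively homomorphic scheme in the style of Paillier, where a server adds two numbers it never sees in the clear. This is only a sketch with insecure toy parameters (full FHE additionally supports multiplication, which is what lets you run arbitrary programs such as a model's forward pass).

```python
import math
import random

# Key generation with tiny primes (insecure; demo only).
p, q = 293, 433
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)


def encrypt(m):
    """Encrypt plaintext m (0 <= m < n) under the public key (n, g)."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(c):
    """Decrypt with the private key (lam, mu)."""
    return ((pow(c, lam, n2) - 1) // n * mu) % n


c1, c2 = encrypt(7), encrypt(35)
c_sum = (c1 * c2) % n2       # homomorphic addition: multiply ciphertexts
assert decrypt(c_sum) == 42  # the party doing the addition never saw 7 or 35
```

The adversarial dynamics follow from this structure: the computing party only ever touches ciphertexts, so seizing the machine (or the "source") yields nothing without the private key held by the quorum.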
I agree with you. Humans are the weakest link, and this scheme can't solve that. It is meant to be a marginal improvement over a physical kill switch and to make theft and release of the model more difficult.