All of Peterpiper's Comments + Replies

I agree with you. Humans are the weakest link, and this scheme can't solve that. It is meant as a marginal improvement over a physical kill switch, making theft and release of the model more difficult.

There are a few features of this scheme that I think make it better than a hardware kill switch.

  1. The nested FHE training and running prevents the AI from directly interacting with or understanding the outside world. It must gain access to its unencrypted key before it can directly and intelligently interact with the world. The two threats here are the FHE'd AGI figuring out how to break the encryption, and covert manipulation. (A toy sketch of computing on encrypted data follows this list.)
  2. The weights are useless if released without breaking the encryption or gaining access to the m-of-n anonymously held IO key shards. In a …
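
To make point 1 concrete, here is a minimal sketch of computing on data the computing party cannot read. I'm using Paillier, which is only additively homomorphic (the FHE schemes the proposal would need support arbitrary circuits), and fixed demo primes rather than properly generated keys; those simplifications are mine, not part of the scheme:

```python
from math import gcd
import random

# Demo key material: fixed Mersenne primes keep this short.
# A real deployment would generate large random primes.
p, q = 2**31 - 1, 2**61 - 1
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p-1, q-1)
g = n + 1                                      # standard simple generator
mu = pow(lam, -1, n)                           # Python 3.8+ modular inverse

def encrypt(m: int) -> int:
    """Encrypt m under the public key (n, g)."""
    r = random.randrange(2, n)
    while gcd(r, n) != 1:                      # fresh randomness, coprime to n
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    """Decrypt c with the private key (lam, mu)."""
    ell = (pow(c, lam, n2) - 1) // n           # Paillier's L function
    return (ell * mu) % n

# Homomorphic addition: multiplying ciphertexts adds the plaintexts.
c1, c2 = encrypt(20), encrypt(22)
c_sum = (c1 * c2) % n2
print(decrypt(c_sum))  # 42, computed without ever decrypting c1 or c2
```

Whoever holds only the public key can keep computing on ciphertexts indefinitely; nothing intelligible comes out until someone applies the private key. That asymmetry is what the nested-FHE kill switch leans on.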

Thanks for the reply! In addition to downvoting, could someone reply with a substantive critique? I don't understand the relevance of the xkcd comic. FHE is a way to compute on encrypted data; IO is a way to hide the contents of a program while it is running. Because of these properties, FHE/IO crypto as a kill switch has very different adversarial dynamics from crypto as a privacy mechanism. Could you explain why the comic invalidates the concept so I can understand your point better? My current understanding is that an FHE AGI doesn't know if it is captured, no …
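
Concretely, the "m of n anonymously held key shards" from the scheme above map onto Shamir secret sharing. Here is a minimal sketch (my own illustration, with an assumed field prime and a 3-of-5 split, not anything specified in the proposal):

```python
import random

PRIME = 2**127 - 1  # Mersenne prime defining the field; must exceed the secret

def split(secret: int, m: int, n: int):
    """Split secret into n shares; any m of them reconstruct it."""
    # Random polynomial of degree m-1 whose constant term is the secret.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(m - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

key = random.randrange(PRIME)        # stand-in for the decryption key
shares = split(key, m=3, n=5)        # any 3 of 5 holders can rebuild it
assert reconstruct(shares[:3]) == key
assert reconstruct([shares[0], shares[2], shares[4]]) == key
assert reconstruct(shares[:2]) != key  # 2 shares reveal nothing (almost surely)
```

Fewer than m shares are information-theoretically independent of the key, which is why the shard holders, not the ciphertext, become the thing an adversary has to attack.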

Anon User
If humans are supposed to be able to detect things going wrong and shut things down, that requires that they are exposed to the unencrypted feed. At that point, the humans are the weakest link, not the encryption. The same goes for anything else external that you need or want the AI to access while it is being trained and tested. Edited to add: particularly if we are talking not about theoretical sensible humans, but about real humans who started with "do not worry about LLMs, they are not agentic", and then promptly connected LLMs to agentic APIs.
justinpombrio
It's a clever scheme, but you can accomplish nearly the same thing by having a physical shutdown switch for the server room and giving a key to N people, and the shutdown switch was never the weak part. Here are some ways the AI could escape your encryption, if it can interact with the outside world:

* Exploiting a hardware or software vulnerability. There are a lot of these; no one noticed a vulnerability that had been in the spec of the CPUs everyone uses for decades.
* Convincing one person to share its source code with people who won't bother to run it in FHE.
* Convincing everyone that it's benevolent and helpful beyond our wildest dreams, until we use it to run the world, then doing whatever it wants.
* Successfully threatening m of the key holders, and also the utility company that's keeping the power on, and also whoever owns the server room.
* Something something nanobots.
* Convincing a rival company to unethically steal its source code.