Pointers to prior art in closely similar ideas (especially if they have more technical depth or detail than this post.)
As it happens, this is exactly one of the proposed use-cases for "k-time programs" in cryptography: https://eprint.iacr.org/2022/658.pdf
Alternatively, we could release a sensitive and proprietary program (such as a well-trained ML model) and be guaranteed that the program can be used only a limited number of times, thus potentially preventing over-use, mission-creep, or reverse engineering. Such programs can also be viewed as a commitment to a potentially exponential number of values, with a guarantee that only few of these values are ever opened.
(I don't buy their polymer idea though.)
Given this as a foundation, I wonder if it'd be possible to make systems that report potentially dangerously high concentrations of compute, places where an abnormally large amount of hardware is running abnormally hot, in an abnormally densely connected network (where members are communicating with very low latency, suggesting that they're all in the same datacenter).
Could it be argued that potentially dangerous ML projects will usually have that characteristic, and that ordinary distributed computations (EG, multiplayer gaming) will not? If so, a system like this could expose unregistered ML projects without imposing any loss of privacy on ordinary users.
I think this depends a lot on the use case. I envision for the most part this would be used in/on large known clusters of computation, as an independent check on computation usage and a failsafe. In that case it will be pretty easy to distinguish from other uses like gaming or cryptocurrency mining. If we're in the regime where we're worried about sneaky efforts to assemble lots of GPUs under the radar and do ML with them, then I'd expect there would be pattern analysis methods that could be used as you suggest, or the system could be set up to feed back more information than just computation usage.
If someone tries to mint an AI safety coin, I will go explain to as many people as I can some of the details of how cryptocurrency is an obvious, obvious scam involving rich people minting worthless tokens and selling them to poor people, who are much more likely to fall for this sort of thing.
For example, there is anecdotal evidence of poor people from 2017 getting rich, even though the vast majority of the trading volume was money moving from poor people to rich people. Rich people knew exactly when to buy and sell. Poor people didn't, because they randomly oscillated between believing false arguments that cryptocurrency could replace fiat currency without overwhelming retaliation by the government, then realizing that if it looks like a scam then it probably is, then encountering carefully selected anecdotal evidence of poor people from 2017 becoming rich even though they themselves didn't, and then oscillating back and forth from there. If you were spending cryptocurrency in 2017 and weren't a perpetrator yourself, then this almost certainly happened to you, almost exactly as I described.
All that needs to happen is zero AI safety coins get minted from this point on, and I will return to avoiding the topic unless someone else brings it up. If that's not what happens (i.e. if someone starts to mint an AI safety coin), I'll try to protect as many people as I can.
Cryptography and blockchain are fine, of course, and may be helpful for AI safety. Generating and selling "coins" to people who care about AI safety is not. There is a well-developed immune system for that here.
The purpose of the COMPUTE token and blockchain here would be to provide a publicly verifiable ledger of the computation done by the computational cores. It would not be integral to the scheme but would be useful for separating the monitoring and control, as detailed in the post. I hope it is clear that a token as a tradeable asset is not at all important to the core idea.
Overview
The idea of this post is to describe, discuss, and if warranted understand how to create, a model of crypto-fed computation.[1] The basic idea is that high-powered GPU (or other ML-specialized) hardware could be equipped with in-chip hardware locks such that the computational cores require a steady stream of cryptographic keys in order to continue performing. In the absence of such continually supplied keys the hardware would downgrade to a small or zero fraction of its nominal capability.
Deploying such hardware would allow three key things:
I'll argue that such capabilities appear to be quite technically feasible, and that there may be plausible pathways to adopting them. However, there are plenty of open questions, and many details to be nailed down before attempting such an effort.
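To make the key-gated behaviour from the overview concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a hardware design: the names `GatedCore`, `make_chain`, `ops_per_key` and `idle_fraction` are hypothetical, and the choice of a SHA-256 hash chain as the source of keys is my assumption rather than anything specified above. The core stores only the end of the chain, accepts each earlier link exactly once, and falls back to a small fraction of nominal throughput when its budget runs out.

```python
import hashlib
import hmac
from typing import List


def h(data: bytes) -> bytes:
    """One step of the hash chain (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()


def make_chain(seed: bytes, length: int) -> List[bytes]:
    """Controller side: return [H(seed), H(H(seed)), ...] with `length` links."""
    chain = []
    value = seed
    for _ in range(length):
        value = h(value)
        chain.append(value)
    return chain


class GatedCore:
    """Hypothetical computational core that only runs at full speed while fed keys."""

    def __init__(self, anchor: bytes, ops_per_key: int, idle_fraction: float = 0.0):
        self.anchor = anchor              # last link of the chain, fixed at manufacture
        self.ops_per_key = ops_per_key    # computations unlocked by one valid key
        self.idle_fraction = idle_fraction
        self.budget = 0

    def feed(self, key: bytes) -> bool:
        """Accept a key only if it hashes to the current anchor (single use)."""
        if not hmac.compare_digest(h(key), self.anchor):
            return False
        self.anchor = key                 # the next valid key is the preimage of this one
        self.budget += self.ops_per_key
        return True

    def throughput(self) -> float:
        """Fraction of nominal capability currently available."""
        return 1.0 if self.budget > 0 else self.idle_fraction

    def compute(self, ops: int) -> int:
        """Run up to `ops` computations at full speed; return how many were run."""
        done = min(ops, self.budget)
        self.budget -= done
        return done
```

In this sketch the Controller would keep the chain secret and release links in reverse order, starting from the second-to-last. A replayed or invented key is rejected because it no longer hashes to the stored anchor.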
How would it work
Let's lay out some design criteria for such a system, aiming to provide at least the three services above in situations where agents attempting to subvert the system are very capable.
Example schemas
Here are a few examples of ways this might work. As this is not my area of expertise, it is extremely likely that superior schemes could be invented or already exist. I'll denote by Controller the agent that seeks to monitor/limit the activity done by the computational cores (CCs).
Coin-fed
A very simple idea is to have a crypto wallet "belonging" to a set of CCs, i.e. the CCs have the private key to the wallet and can initiate transactions on it. The wallet contains "COMPUTE" tokens. In order to do some number N of computations, the CC must send M (which is connected to N at the hardware level) tokens to some other (perhaps null) address, which is considered "burning" them. Once the wallet is empty, this can no longer be done, and computation stops. The external agent can then "feed" the computation simply by depositing coins in this wallet.
This is a pleasing scheme in that the interface layer is very simple, and well-established by cryptocurrencies; it could run on many existing blockchains. And COMPUTE could be an actual coin that is traded etc.
What's unclear to me is how reliable we can make the hardware's verification that the "burn" transaction has taken place, i.e. eliminating the possibility of intercepting the "send to null" message and spoofing the verification to get computation for free. Perhaps good solutions to this exist.
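The burn-to-compute accounting of the coin-fed scheme can be sketched in a few lines of Python. This is a toy model under loud assumptions: `ToyLedger` is an in-memory stand-in for a blockchain, `CoinFedCore` and `ops_per_token` are hypothetical names, and the fixed ratio between N and M is just one possible choice. In particular, the core here trusts the ledger object directly, so the sketch does nothing to address the spoofing concern raised above.

```python
class ToyLedger:
    """In-memory stand-in for a public blockchain (an assumption of this sketch)."""

    BURN_ADDRESS = "null"

    def __init__(self):
        self.balances = {}
        self.log = []  # public history of (sender, receiver, amount)

    def deposit(self, address: str, amount: int) -> None:
        """The external agent 'feeds' a wallet with COMPUTE tokens."""
        self.balances[address] = self.balances.get(address, 0) + amount
        self.log.append(("controller", address, amount))

    def burn(self, address: str, amount: int) -> bool:
        """Send tokens to the null address; fail if the wallet cannot cover it."""
        if amount <= 0 or self.balances.get(address, 0) < amount:
            return False
        self.balances[address] -= amount
        self.log.append((address, self.BURN_ADDRESS, amount))
        return True

    def burned_by(self, address: str) -> int:
        """What a Monitor could read off the public log."""
        return sum(amt for src, dst, amt in self.log
                   if src == address and dst == self.BURN_ADDRESS)


class CoinFedCore:
    """Hypothetical core that must burn M tokens before doing N computations."""

    def __init__(self, ledger: ToyLedger, wallet: str, ops_per_token: int):
        self.ledger = ledger
        self.wallet = wallet
        self.ops_per_token = ops_per_token  # fixed "at the hardware level"
        self.ops_done = 0

    def run(self, n_ops: int) -> bool:
        """Burn ceil(n_ops / ops_per_token) tokens, then do the work."""
        tokens = -(-n_ops // self.ops_per_token)  # integer ceiling
        if not self.ledger.burn(self.wallet, tokens):
            return False  # wallet empty: computation stops
        self.ops_done += n_ops
        return True
```

Because every burn is written to the shared log, a separate Monitor can total a wallet's burns without any cooperation from the core.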
Continuous signed message exchange
A second method would have a hardware controller (HC) for each CC. Both the CC and the hardware controller would have a signature (private) key that is known only to them, is unreadable, and that never exists anywhere else. At initialization, the HC and CC exchange verification (public) keys.
Thereafter, the CC can send a signed message to the HC requesting permission to do N additional computations, and the HC can send permission or not. The HC side would have some software control panel that would provide an interface. It's likely that one HC could be paired with many CCs, which would be good for efficiency.
This scheme also relies on well-known and widely adopted technology, which is a big plus. One potential concern is that the communication technology on the CC feels like a nontrivial piece of software that would have to be provably secure against meddling (without extensive hardware changes) on the CC side.
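The request/permission loop of this scheme can be sketched as follows. One substitution should be stated plainly: Python's standard library has no public-key signatures, so this sketch swaps in HMAC-SHA256 with a single shared secret. That is weaker than the private/public key pairing described above, since anyone holding the shared secret could forge either side's messages. The class names, the message fields, and the request counter used to reject replayed permissions are all assumptions of the sketch.

```python
import hashlib
import hmac
import json
from typing import Optional, Tuple


def tag(key: bytes, message: dict) -> str:
    """Authenticate a message (HMAC stands in for a real signature here)."""
    payload = json.dumps(message, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(key: bytes, message: dict, mac: str) -> bool:
    return hmac.compare_digest(tag(key, message), mac)


class HardwareController:
    """Hypothetical HC: checks requests and decides whether to grant them."""

    def __init__(self, key: bytes, allow: bool = True):
        self.key = key
        self.allow = allow   # what the software control panel would toggle
        self.log = []        # ops requested, available for monitoring

    def handle(self, request: dict, mac: str) -> Optional[Tuple[dict, str]]:
        if not verify(self.key, request, mac):
            return None
        self.log.append(request["ops"])
        if not self.allow:
            return None
        grant = {"type": "grant", "counter": request["counter"],
                 "ops": request["ops"]}
        return grant, tag(self.key, grant)


class ComputeCore:
    """Hypothetical CC: may only add to its budget on a valid, fresh grant."""

    def __init__(self, key: bytes):
        self.key = key
        self.counter = 0
        self.last_accepted = 0
        self.budget = 0

    def request(self, ops: int) -> Tuple[dict, str]:
        self.counter += 1
        message = {"type": "request", "counter": self.counter, "ops": ops}
        return message, tag(self.key, message)

    def accept(self, grant: dict, mac: str) -> bool:
        if not verify(self.key, grant, mac):
            return False  # tampered or forged
        if grant.get("type") != "grant":
            return False  # a reflected request is not a permission
        if grant.get("counter") != self.counter or self.counter == self.last_accepted:
            return False  # stale or replayed permission
        self.last_accepted = self.counter
        self.budget += grant["ops"]
        return True
```

A real design would replace `tag` and `verify` with signatures under the unreadable per-device keys, so that the CC could check a permission without being able to produce one.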
Many keys
Another scheme, which is rather "low level" (i.e. feels embeddable in pretty raw form in hardware), is as follows.
This system is a bit less flexible than others (for example each CC would have a hard limit to how many computations it could ever do), but such simplicity could also be a strength. It also requires an additional "communication" layer for the CC to request additional keys from the Controller. This does not seem like a huge security issue, however, unless the keys are stolen from the Controller, since the keys are finite and single-use.
Combinations
The above schemes can be combined. For example, the following system seems pretty strong and redundant, albeit somewhat complex.
Applications
Here I describe in a bit more detail the envisaged applications of this hardware.
Monitoring/control/off-switch
A key question in terms of applications is whether the monitoring agent ("Monitor") is the same as the Controller. This is natural in the message-exchange and many-keys schemes. In these cases, the requests for keys translate directly into computation usage that can be monitored. (A possible weak point here is that the CC could "over request" keys for a time thus saving up "credits" to be used later. Mitigating this seems quite doable, however.)
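One possible mitigation for the "over request" weak point is to make granted credits expire. The sketch below is only an illustration of that idea and is not something the schemes above specify: `ExpiringBudget`, `lifetime`, and the integer `now` tick are hypothetical, and the sketch assumes the hardware has a trustworthy, monotonically increasing clock or counter.

```python
from collections import deque


class ExpiringBudget:
    """Hypothetical credit store in which unused grants lapse after `lifetime` ticks."""

    def __init__(self, lifetime: int):
        self.lifetime = lifetime
        self.grants = deque()  # [expiry_tick, ops_remaining], oldest first

    def add_grant(self, now: int, ops: int) -> None:
        """Record a grant received at tick `now`."""
        self.grants.append([now + self.lifetime, ops])

    def _drop_expired(self, now: int) -> None:
        while self.grants and self.grants[0][0] <= now:
            self.grants.popleft()

    def available(self, now: int) -> int:
        """Computations that can still be spent at tick `now`."""
        self._drop_expired(now)
        return sum(ops for _, ops in self.grants)

    def spend(self, now: int, ops: int) -> int:
        """Spend up to `ops`, oldest grants first; return how many were spent."""
        self._drop_expired(now)
        done = 0
        while self.grants and done < ops:
            grant = self.grants[0]
            take = min(grant[1], ops - done)
            grant[1] -= take
            done += take
            if grant[1] == 0:
                self.grants.popleft()
        return done
```

With a short lifetime, stockpiling buys the CC very little: credits requested early are gone by the time they would be useful, so the request rate the Monitor sees stays close to actual usage.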
If the Monitor and Controller are separate, something like the coin-fed scheme may be important. Monitoring in that scheme is simple, as the Monitor can simply watch the movements in and out of blockchain wallets.
In all cases, what is naturally monitored is computation usage. But of course other things could be required to be put into the "request" messages, as long as these are non-spoofable.
Control and the off-switch are also fairly obvious: once the tokens, or signed messages, or keys, stop being sent and run out, the hardware turns off or downcycles to a low rate. (The latter may be preferable so that processes can continue without disruption, just rather ineffectively – but this would depend upon the use case and risk assessment.)
The scheme lends itself equally well to the Monitor/Controller being part of the same organization (e.g. a security team monitoring a research team) or different organizations (e.g. an agency or hardware provider monitoring a company or lab.)
Other applications
Here are a few off-the-cuff ideas for other applications that may or may not have some market utility:
Obstacles and paths to adoption
Even if crypto-locked computational hardware could be engineered and built, this is no guarantee that it will actually be useful, i.e. either widely adopted or adopted in some critical areas.
Obstacles to adoption
Paths to adoption
Nonetheless there may be avenues to incentivize the development of such hardware, and at least scale it to the level where it is relatively commoditized and could be rolled out much more widely if needed or desired. (How to drive very widespread or universal adoption is, I think, a topic suitable for a separate study.) As some examples:
Some open questions
Help wanted
If this plan continues to look viable it is possible that FLI could invest non-negligible fiscal or other resources into getting it off the ground. But it's still embryonic. I'd love help on any of the following:
Acknowledgements
I have a vague recollection that someone, I think Connor Flexman, suggested to me a version of the "many keys" scheme.
[1] I will not apologize for the unconventional and archaic use of the noun "computation" rather than the verb "compute." But as a peace offering I've called the coins "COMPUTE" coins.