Curated. I feel like I've heard the idea of tracking GPU compute over the years, but always in the high-level abstract, with no one actually thinking hard about what it would look like and how feasible it is. I'm very in favor of people trying to flesh out proposals into something that can be discussed concretely, so kudos to this author.
There are some serious issues that need to be overcome for any scheme of this nature to be secure.
I think that from a quick read of the paper or from the summary in the post one might be led to believe that such a scheme could be implemented with a few years of engineering effort and the cooperation of chip manufacturers. In fact, substantial advances in cryptography would be required.
Policy-makers' attempts to enforce policy by requiring the use of special chips have in the past largely been broken: Clipper chip, DRM via SGX, etc.
When I first saw "save all weights to on chip hardware", I thought it would be super expensive, but actually saving like 5 times the GPU's memory to a seperate flash chip would only cost $20 (80GB*5 at 5 cents per gigabyte for flash storage). It can be way cheaper bc it's low bandwidth and slow.
It feels like the implicit message here is "And therefore we might coordinate around an alignment solution where all major actors agree to only train NNs that respect certain rules", which... really doesn't seem realistic, for a million reasons?
Like, even assuming major powers can agree to an "AI non-proliferation treaty" with specific metrics, individual people could still bypass the treaty with decentralized GPU networks. Rogue countries could buy enough GPUs to train an AGI, disable the verification hardware and go "What are you gonna do, invade us?", under the assumption that going to war over AI safety is not going to be politically palatable. Companies could technically respect the agreed-upon rules but violate the spirit in ways that can't be detected by automated hardware. Or they could train a perfectly-aligned AI on compliant hardware, then fine-tune it in non-aligned ways on non-compliant hardware for a fraction of the initial cost.
Anyway, my point is: any analysis of a "restrict all compute everywhere" strategy should start by examining what it actually looks like to implement that strategy, what the political incentives are, and how resistant that strategy will be to everyone on the internet trying to break it.
It feels like the author or this paper haven't even begun to do that work.
There's also the infohazard problem since it heavily involves geopolitical considerations, if someone or some group actually figured out a practical means, would they ever reveal it publicly?
Or is it even possible to reveal such a design without undermining the means by which it functions?
The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.
Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?
Isn't this akin to a protocol for securely monitoring private industry's experiments in thermonuclear weapons? It's better than nothing, but when something is dangerous enough, industrial regulation is never strict enough.
Some things are too dangerous to allow private competition in. The only sensible thing to do is nationalize them and have them run exclusively by extremely security-minded government agencies, if at all. And even that might not be good enough for AI, because we've never had tech whose base case scenario was "kill everyone".
The provable training part seems achievable using the existing cryptography and without requiring Confidential Computing techniques.
For example, by committing to weights at the snapshot point of time using a Hash function and also generating proof that, indeed, at least amount of T has passed between each snapshot using Verifiable Delay Functions (VDFs), we can already achieve a rate limit at how quick someone can train a model. In addition to that, to prove that we have been consistent and the next weight checkpoint was correctly produced by applying X training operations from the previous checkpoint, we can use zk-SNARKs to generate a succinct proof. As this is essentially a continuous process of proving computations, we can also apply ideas from incrementally verifiable computations (IVC), making the scheme even more efficient. I must note that SNARKs still need to be sufficiently efficient to prove extensive computations, such as training of neural networks such as GPT-4, but there are already examples of using them to prove training of GPT-2. Furthermore, as the whole scheme can be made Zero Knowledge, we can have the prover proactively submit the proofs to the verifier to ensure they are consistently compliant.
To ensure that someone cannot ignore the regulation and do training hidden, we would need to require any capable training hardware to be actively registered with the officials. Requiring an active signature from a government server to enable doing X computations could be a solution. This should be achievable as SGX and similar providers have only incremental counters or alternative solutions. Furthermore, requesting computation integrity is a far less strict property on the SGX hardware than attestation, as it does not depend on crucial extraction attacks and would require actual chip-level alterations. Considering we are already using a lot of similar gadgets around ourselves, supported by ARM TEE, this should be achievable.
Current SNARK provers are many orders of magnitude slower than the underlying computation - certainly you can prove that you took sufficient time to do the computation (e.g. using a VDF) or did it in few enough steps or according to some policy (e.g. using NARKS) but what is the point of using a modern GPU in the first place if you're going to be limited to speeds easily achievable by a 1990's CPU?
The only way this scheme becomes more useful than just banning GPU usage outright is if the proof of policy compliance can be generated only slightly slower than actually doing the computation. We don't have primitives that can do that currently.
I agree that the current state of the art does indeed very much limit the throughput of the models. However, considering the current effort in scaling the space and the amount of funding it has been receiving, I would hope that in 5-10 year time this would be a more negligible slow down. After all, it has been shown that several SNARKs and STARKs can achieve linear prover time, so right now it is all about fighting the constants and getting the schemes ever more efficient. And, considering how usually slow the governments are even with urgent matters, 5-10 year period should be just right for that. For the temporary solution we could reserve to SGX and other TEE providers if we have to, and trust that the patching they constantly release would work for now.
After all, this is better than nothing, and having some, even potentially not 100% effective solutions would be good for our cause. As long as we can gradually improve their security in parallel to growing concern over the AI threat.
Yonadav Shavit (CS PhD student at Harvard) recently released a paper titled What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring.
The paper describes a compute monitoring regime that could allow governments to monitor training runs and detect deviations from training run regulations.
I think it's one of the most detailed public write-ups about compute governance, and I recommend AI governance folks read (or skim) it. A few highlights below (bolding mine).
Abstract:
As advanced machine learning systems' capabilities begin to play a significant role in geopolitics and societal order, it may become imperative that (1) governments be able to enforce rules on the development of advanced ML systems within their borders, and (2) countries be able to verify each other's compliance with potential future international agreements on advanced ML development. This work analyzes one mechanism to achieve this, by monitoring the computing hardware used for large-scale NN training. The framework's primary goal is to provide governments high confidence that no actor uses large quantities of specialized ML chips to execute a training run in violation of agreed rules. At the same time, the system does not curtail the use of consumer computing devices, and maintains the privacy and confidentiality of ML practitioners' models, data, and hyperparameters. The system consists of interventions at three stages: (1) using on-chip firmware to occasionally save snapshots of the the neural network weights stored in device memory, in a form that an inspector could later retrieve; (2) saving sufficient information about each training run to prove to inspectors the details of the training run that had resulted in the snapshotted weights; and (3) monitoring the chip supply chain to ensure that no actor can avoid discovery by amassing a large quantity of un-tracked chips. The proposed design decomposes the ML training rule verification problem into a series of narrow technical challenges, including a new variant of the Proof-of-Learning problem [Jia et al. '21].
Solution overview:
In this section, we outline a high-level technical plan, illustrated in Figure 1, for Verifiers to monitor Provers’ ML chips for evidence that a large rule-violating training occurred. The framework revolves around chip inspections: the Verifier will inspect a sufficient random sample of the Prover’s chips (Section 3.2), and confirm that none of these chips contributed to a rule-violating training run. For the Verifier to ascertain compliance from simply inspecting a chip, we will need interventions at three stages: on the chip, at the Prover’s data-center, and in the supply chain.
These steps, put together, enable a chain of guarantees:
Thus, so long as the Prover complies with the Verifier’s steps, the Verifier will detect the Prover’s rule-violation with high probability. Just as in financial audits, a Prover’s refusal to comply with the verification steps would itself represent an indication of guilt.