Report on Frontier Model Training
Understanding what drives the rising capabilities of AI is important for those who work to forecast, regulate, or ensure the safety of AI. Regulations on the export of powerful GPUs need to be informed by an understanding of how these GPUs are used, forecasts need to be informed by bottlenecks, and safety needs to be informed by an understanding of how the models of the future might be trained. A clearer understanding would enable policymakers to target regulations so that they are difficult for companies to circumvent with merely technically compliant GPUs, forecasters to avoid focusing on unreliable metrics, and technical researchers working on mitigating the downsides of AI to understand what data models might be trained on.

This doc is built from a collection of smaller docs I wrote on different aspects of frontier model training that I consider important. I hope people can use this document as a collection of resources, drawing from it the information they find important to inform their own models. I do not expect this doc to have a substantial impact on any serious AI lab's capabilities efforts: I think my conclusions are largely discoverable in the process of attempting to scale AIs, or for substantially less money than a serious such attempt would cost. Additionally, I expect major labs already know many of the things in this report.

Acknowledgements

I'd like to thank the following people for their feedback, advice, and discussion:

* James Bradbury, Software Engineer, Google DeepMind
* Benjamin Edelman, Ph.D. Candidate, Harvard University
* Lukas Finnveden, Research Analyst, Open Philanthropy Project
* Horace He, Software Engineer, PyTorch/Meta
* Joanna Morningstar, Chief Scientific Officer, Nanotronics
* Keller Scholl, Ph.D. Candidate, Pardee RAND Graduate School
* Jaime Sevilla, Director, Epoch
* Cody Wild, Research Engineer, Google

Index

Cost Breakdown of ML Training
Estimates the costs of training a frontier (state o