Epoch AI collects key data on machine learning models from 1950 to the present to analyze historical and contemporary progress in AI. This is a major update to the website: the datasets have expanded substantially since last year.
The performance of machine learning models is closely related to the amount of training data, the compute used in training, and the number of parameters. At Epoch, we’re investigating the key inputs that enable today’s AIs to reach new heights. Our recently expanded Parameter, Compute and Data Trends database traces these details for hundreds of...
Summary: Some techniques make it possible to increase the performance of machine learning models at the cost of more expensive inference, or to reduce inference compute at the cost of lower performance. This possibility induces a tradeoff between spending more resources on training or on inference. We explore the characteristics of this tradeoff...
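The post explores this tradeoff quantitatively; as a toy illustration only (the power law, the repeated-sampling technique, the exchange-rate exponent `g`, and all constants below are hypothetical assumptions, not the post's actual model), the following sketch scans splits of a fixed compute budget between training and inference:

```python
# Toy model of the training-vs-inference compute tradeoff (all constants
# hypothetical). Loss follows a power law in training compute:
#   L(C_train) = a * C_train**(-b)
a, b = 1e3, 0.1

def loss(c_train):
    return a * c_train ** (-b)

# Assume an inference-time technique (e.g. repeated sampling) that costs
# k times more per query but acts like multiplying effective training
# compute by k**g, for some assumed exchange-rate exponent g.
g = 0.5

def loss_with_inference(c_train, k):
    return loss(c_train * k ** g)

# Serving n queries at base cost c_inf each, total compute is
# C = c_train + n * k * c_inf. Scan splits of a fixed budget.
n_queries, c_inf, budget = 1e12, 1e9, 1e24
for k in (1, 16, 256, 1024):
    c_train = budget - n_queries * k * c_inf  # compute left for training
    if c_train <= 0:
        print(f"k={k:4d}  inference alone exceeds the budget")
        continue
    print(f"k={k:4d}  train={c_train:.2e}  loss={loss_with_inference(c_train, k):.3f}")
```

Under these made-up constants, loss falls as more of the budget is shifted to inference, until the inference bill consumes the budget entirely; the real analysis in the post characterizes where this tradeoff is worthwhile.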
Summary: As part of my work at Epoch, I investigated the horizon length hypothesis - the idea that the horizon length of a task is predictive of the training compute needed to learn that task. My current (weak) conclusion is that the horizon length hypothesis can't be used in practice...
[Figure: common shape of a scaling law, taken from Hestness et al. (2017)] Executive summary: * Scaling laws are predictable relations between the scale of a model and performance or other useful properties. * I have collected a database of scaling laws for different tasks and architectures, and reviewed dozens of...
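As a minimal sketch of that common shape (a saturating power law L(N) = a·N^(−b) + c; the constants below are illustrative assumptions, not values from the database), the snippet evaluates such a law and recovers its exponent with a log-log fit:

```python
import numpy as np

# Illustrative saturating power law: L(N) = a * N**(-b) + c, where N is
# the scale variable (here, parameter count) and c is the irreducible loss.
a, b, c = 50.0, 0.076, 1.7  # assumed constants; exponents of this rough
                            # magnitude appear in the scaling-law literature

def scaling_law(n_params):
    return a * n_params ** (-b) + c

# Recover the exponent from synthetic measurements via a linear fit in
# log-log space on the reducible part of the loss (L - c).
n = np.logspace(6, 11, 20)  # 1e6 to 1e11 parameters
measured = scaling_law(n)
slope, intercept = np.polyfit(np.log(n), np.log(measured - c), 1)
print(f"fitted exponent b ~ {-slope:.3f}")  # recovers b = 0.076
```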
Summary: I illustrate the relationship between Wentworth-style causal abstractions and infradistributions, and how they both deal with nonrealizability by throwing away information. If you have a basic intuition for causal abstractions, this might help you understand infradistributions better. And if you are comfortable with infradistributions, this might help you translate...
Summary: Based on our previous analysis of trends in dataset size, we project the growth of dataset sizes in the language and vision domains. We explore the limits of this trend by estimating the total stock of available unlabeled data over the next decades. Read the full paper on arXiv....