by Jaime Sevilla, Lennart Heim, Marius Hobbhahn, Tamay Besiroglu, and Anson Ho
You can find the complete article here. We provide a short summary below.
In short:
To estimate the compute used to train a Deep Learning model, we can either: 1) directly count the number of operations needed, or 2) estimate it from GPU time.
Method 1: Counting operations in the model
Training compute = 2 × (# of connections) × 3 × (# training examples) × (# epochs)

where 2 × (# of connections) is the number of operations per forward pass, 3 is the backward-forward adjustment, and (# training examples) × (# epochs) is the number of passes over the training data.
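Method 1 can be sketched as a one-line calculation. The numbers below are illustrative assumptions, not figures from the article (a 175B-connection model trained on 300B examples for one epoch, roughly GPT-3-scale):

```python
def training_compute_from_ops(n_connections, n_examples, n_epochs):
    """Method 1: estimate training FLOP by counting operations.

    Assumes ~2 FLOP per connection per forward pass (one multiply and
    one add), and that the backward pass costs roughly twice the
    forward pass, giving the backward-forward adjustment factor of 3.
    """
    ops_per_forward_pass = 2 * n_connections
    backward_forward_adjustment = 3
    n_passes = n_examples * n_epochs
    return ops_per_forward_pass * backward_forward_adjustment * n_passes

# Hypothetical model: 175e9 connections, 300e9 examples, 1 epoch
print(training_compute_from_ops(175e9, 300e9, 1))  # ≈ 3.15e23 FLOP
```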
Method 2: GPU time
Training compute = training time × # of cores × peak FLOP/s × utilization rate
We are uncertain about what utilization rate is best, but our recommendation is 30% for Large Language Models and 40% for other models.
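Method 2 is similarly direct to compute. The hardware figures below are hypothetical assumptions for illustration (a 14-day run on 1,024 GPUs with 312 teraFLOP/s peak each), not numbers from the article:

```python
def training_compute_from_gpu_time(training_time_s, n_cores, peak_flops, utilization):
    """Method 2: estimate training FLOP from GPU time.

    utilization is the fraction of peak FLOP/s actually achieved;
    the summary suggests ~0.3 for large language models and ~0.4
    for other models.
    """
    return training_time_s * n_cores * peak_flops * utilization

# Hypothetical run: 14 days on 1024 GPUs, 312e12 peak FLOP/s each,
# 30% utilization (the recommendation for large language models)
seconds = 14 * 24 * 3600
print(training_compute_from_gpu_time(seconds, 1024, 312e12, 0.3))  # ≈ 1.16e23 FLOP
```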
You can read more about method 1 here and about method 2 here.