In that earlier section, we used smaller models trained on S4 intersect A4×2 (4,000 parameters) rather than S5 intersect A5×2 (80,000 parameters); the only reason for this was to allow a larger sample of 10,000 models within our compute budget. All subsequent sections use the S5 models.
"Utter elitism" is a nice article about this phenomenon