In that earlier section, we used smaller models trained on S4 intersect A4×2 (4,000 parameters) rather than S5 intersect A5×2 (80,000 parameters); the only reason for this was to allow a larger sample of 10,000 models within our compute budget. All subsequent sections use the S5 models.
"Utter elitism" is a nice article about this phenomenon