How do labs working at or near the frontier assess major architecture or algorithm changes before committing huge compute resources to them? For example, how do they assess stability and sample efficiency without doing full-scale runs?
Thanks for this answer! Interesting. It sounds like the process may be less systematized than I imagined.