How do labs working at or near the frontier assess major architecture and/or algorithm changes before committing huge compute resources to try them out? For example, how do they assess stability and sample efficiency without having to do full-scale runs?
Thanks! It's no problem :)
Agreed that the interview is worth watching in full for those interested in the topic. I don't think it answers your question in full detail, unless I've forgotten something they said, but it is still relevant evidence.
(Edit: Dwarkesh also posts full transcripts of his interviews to his website. They aren't obviously machine-transcribed; they read more like what you'd expect from a transcribed interview in a news publication. You'll lose some body language and tone from the video, but it may be worth it for some people, since most can probably read the whole thing in less time than it takes to watch the interview at normal speed.)