This is a special post for quick takes by Arthur Conmy. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.


Has anyone reproduced double descent [https://openai.com/blog/deep-double-descent/] on the transformers they train (or, better, on GPT-like transformers)? Since grokking can be at least partly understood through transformer interpretability [https://openreview.net/forum?id=9XFSbDPmdW], this seems like a possibly tractable direction.
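For anyone wanting to try this, the basic experiment is a model-size sweep: train models of increasing capacity on a fixed dataset and look for the characteristic test-error peak near the interpolation threshold. A transformer sweep is expensive, so here is a minimal NumPy sketch of the same protocol on a random-features regression toy (everything here, the synthetic task, the widths, the min-norm solver, is my own illustrative choice, not anything from the linked posts):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic regression task standing in for a real training set
n_train, n_test, d = 100, 200, 20
X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
w_true = rng.standard_normal(d)
y_train = X_train @ w_true + 0.5 * rng.standard_normal(n_train)
y_test = X_test @ w_true + 0.5 * rng.standard_normal(n_test)

def random_features_mse(n_features: int) -> float:
    """Fit min-norm least squares on random ReLU features; return test MSE."""
    W = rng.standard_normal((d, n_features)) / np.sqrt(d)
    phi_train = np.maximum(X_train @ W, 0.0)
    phi_test = np.maximum(X_test @ W, 0.0)
    # pinv gives the minimum-norm solution, valid in both the under- and
    # over-parameterised regimes (the latter is where double descent shows up)
    beta = np.linalg.pinv(phi_train) @ y_train
    return float(np.mean((phi_test @ beta - y_test) ** 2))

# Sweep model "width" through the interpolation threshold (n_features ~ n_train)
widths = [10, 50, 90, 100, 110, 200, 400, 800]
errors = {m: random_features_mse(m) for m in widths}
for m, err in errors.items():
    print(f"width={m:4d}  test MSE={err:.3f}")
```

For transformers the sweep would be over d_model or layer count instead of feature count, with test loss tracked per model; the shape of the experiment (fixed data, capacity sweep, watch for the non-monotone test curve) is the same.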