Thank you for the comment! Let me reply to your specific points.
First, and TL;DR: whether the NTK parameterization is "right" or "wrong" is perhaps an issue of prescriptivism vs. descriptivism. Regardless of which one is "better", the NTK parameterization is (close to what is) commonly used in practice, so if you're interested in modeling what practitioners do, it's a very useful setting to study. Additionally, one disadvantage of maximal update parameterization from the point of view of interpretability is that it's in the strong-coupling regi...
Sho and I want to thank jylin04 for this really nice post and endorse the distillation of our key results in her 8-page summary. We also agree that it would be interesting to make further connections between our work -- in particular the effective theory framework -- and interpretability, and we'd be really glad to explore and discuss that further.
I imagine that the Peekskill, New York location might resemble Princeton, NJ in setting, environment, and overall relationship to NYC. So it might be worth talking to people who've spent time at one of the universities or institutes in Princeton to understand the relative merits of such a setting and how they felt about the balance there between rural and urban.
(My disclosure is that I have spent time in such a setting and found it overly isolating, to the point of struggling to get any useful work completed there, and ended up moving t...
Thanks for your summary of the book!
I think the post and its analysis are some evidence that it may be tractable to apply tools from the book directly to transformer architectures and LLMs.