An interactive introduction to grokking and mechanistic interpretability
Our write-up largely agrees with @Quintin Pope's summary, with the addition of training trajectory visualizations and an explanation of the MLP construction that solves modular addition. A meta note that didn't make it into the article: with so many people looking into this problem over the last 18...
Lots of custom d3: https://github.com/PAIR-code/ai-explorables/tree/master/source/grokking