Asma Ghandeharioun

An interactive introduction to grokking and mechanistic interpretability

Aug 7, 2023•23

Adam Pearce, Asma Ghandeharioun

Our write-up largely agrees with @Quintin Pope's summary, with the addition of training trajectory visualizations and an explanation of the MLP construction that solves modular addition.

A meta note that didn't make it into the article: with so many people looking into this problem over the last 18 months, I'm surprised this construction took so long to find. The modular addition task with a 1-layer MLP is about as simple as you can get!^[1]

Scaling mechanistic interpretability up to more complex tasks/models seems worth continuing to try, but I'm less sure extracting crisp explanations will be possible.^[2] Even if we "solve" superposition, figuring out the construction here, where there's no superposition in...