Mechanistic Interpretability Reading Group
Greetings. This is an open invitation for folks interested in technical AI alignment literature to join our reading group, which has been running for 22 weeks now. Our meetings take place online via the Mechanistic Interpretability Group Discord server, accessible through this invite link. Each week 4-5 papers are shortlisted...
Embedding norm is a proxy with many conflated factors, so you'd wanna run ablations rather than treat it as conclusive.
Also, the unused tokens -> weight decay argument assumes the embeddings were trained with decoupled weight decay and weren't tied to the LM head (i.e. no input-output tying). Does the model card specify details on this? Otherwise we can't assume so.
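To make the assumption concrete, here's a rough sketch of what the weight decay story predicts when it does hold: with AdamW-style decoupled decay and an untied embedding row that never receives a gradient, each step just multiplies the row by (1 - lr * wd), so its norm shrinks geometrically. The learning rate, decay value, step count, and function names below are made up for illustration and aren't taken from any model card. If the embeddings are tied to the LM head, the row also gets gradient through the output softmax and this simple picture no longer applies.

```python
import math

def decoupled_decay_norm(norm0, lr, wd, steps):
    """Closed-form norm of a row that only ever sees decoupled weight decay."""
    return norm0 * (1.0 - lr * wd) ** steps

def simulate_unused_row(row, lr, wd, steps):
    """Step-by-step version: zero gradient, so only the decay term acts."""
    row = list(row)
    for _ in range(steps):
        # Decoupled (AdamW-style) decay: w <- w - lr * wd * w.
        # The Adam update term is zero because the gradient is zero (assumption).
        row = [w - lr * wd * w for w in row]
    return row

# Hypothetical values, chosen only to illustrate the shrinkage.
row = [0.3, -0.4]  # initial norm 0.5
final = simulate_unused_row(row, lr=1e-3, wd=0.1, steps=10_000)
final_norm = math.hypot(*final)
closed_form = decoupled_decay_norm(0.5, lr=1e-3, wd=0.1, steps=10_000)
print(final_norm, closed_form)
```

This only shows the norm shrinking under the stated assumptions. It doesn't show that a small norm implies an unused token, which is the part that would still need ablations.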