PaulPauls — LessWrong



PaulPauls

20

1y

Mechanistic Interpretability of Llama 3.2 with Sparse Autoencoders

I recently published a fairly large side project that attempts to replicate, on the humble but open-source Llama 3.2-3B model, the mechanistic interpretability research on proprietary and open-source LLMs that was quite popular this year and produced great research papers from Anthropic[1][2], OpenAI[3][4] and Google DeepMind[5]. The...

Nov 24, 2024•19