(Not) Explaining GPT-2-Small Forward Passes with Edge-Level Autoencoder Circuits
Repo: https://github.com/DavidUdell/sparse_circuit_discovery

TL;DR: A SPAR project from a while back. A replication of an unsupervised circuit discovery algorithm in GPT-2-small, with a negative result.

Thanks to Justis Mills for draft feedback and to Neuronpedia for interpretability data.

Introduction

I (David) first heard about sparse autoencoders at a Bay Area party....