CLIP-Scope: Inspired by Gemma Scope, we trained 8 Sparse Autoencoders, each on 1.2 billion tokens, on different layers of a Vision Transformer. These (along with more technical details) can be accessed on Hugging Face. We also released a PyPI package for ease of use. We hope that this will allow researchers to...
TL;DR A Sparse Autoencoder trained on CLIP activations detects features in YouTube thumbnails associated with high view counts, across a dataset of 524,288 thumbnails from YouTube-8M. Most of these features are interpretable and monosemantic. The most strongly associated feature has a median view count > 6.5x the dataset median. After...
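The association metric in the summary can be illustrated with a small sketch. Everything here is synthetic and hypothetical (random activations and log-normal view counts standing in for the real SAE features and YouTube-8M data); it only shows how a "median view count relative to the dataset median" number is computed for one feature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: one SAE feature's activation per thumbnail,
# and each thumbnail's view count (log-normal, as view data tends to be).
n = 10_000
feature_acts = rng.random(n)
view_counts = np.exp(rng.normal(10.0, 2.0, size=n))

# Treat the top 1% of thumbnails by activation as "feature-active".
top = feature_acts >= np.quantile(feature_acts, 0.99)

# Association metric: median view count of feature-active thumbnails,
# relative to the dataset-wide median (the article reports > 6.5x for
# its strongest feature; random data gives ~1x).
ratio = np.median(view_counts[top]) / np.median(view_counts)
print(f"median view-count ratio: {ratio:.2f}")
```

With real data, this ratio is computed per feature and the features are then ranked by it.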
This article is a footnote to the (excellent) Tokenized SAEs. To briefly summarize: the authors (Thomas Dooms and Daniel Wilhelm) start by proposing the idea of a token "unigram". Let U_t be the unigram for a specific token t. Conceptually there is a unique U_t for each layer M_i in...
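A minimal sketch of the unigram idea, using a toy numpy "layer" rather than a real transformer (the weights and dimensions below are made up for illustration): the unigram U_t for a layer is the activation that layer produces when the model sees token t on its own, so there is one U_t per (token, layer) pair.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 64, 16

# Toy stand-ins for an embedding table and one MLP layer M_i.
W_embed = rng.normal(size=(vocab_size, d_model))
W_in = rng.normal(size=(d_model, 4 * d_model))
W_out = rng.normal(size=(4 * d_model, d_model))

def layer(x: np.ndarray) -> np.ndarray:
    """One toy MLP layer (ReLU nonlinearity for simplicity)."""
    return np.maximum(x @ W_in, 0.0) @ W_out

# U_t: run every token t through the layer *in isolation* and record
# the output. Row t of `unigrams` is the unigram for token t.
unigrams = layer(W_embed)   # shape (vocab_size, d_model)
U_t = unigrams[42]          # unigram for token id 42 at this layer
```

In a real model one would cache these per layer; the point is only that U_t depends on the token alone, not on any surrounding context.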
Summary 300 million GPT2-small activations are cached on S3; we pull these very quickly onto a g4dn.8xlarge EC2 instance in the same region and use them to train a 24,576-dimensional Switch Sparse Autoencoder in 26 minutes (excluding generation time). We achieve a similar L0/reconstruction loss to Gao et al. and...
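For readers unfamiliar with the Gao et al. setup being compared against, here is a toy TopK-style SAE forward pass in numpy. This is not the Switch variant or the authors' code; the dimensions are shrunk (the article's SAE has 24,576 latents) and the weights are random, purely to show how the k-sparse bottleneck produces the L0/reconstruction trade-off mentioned.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae, k = 64, 512, 32   # toy sizes; article uses d_sae = 24,576

# Random toy SAE weights (an encoder/decoder pair).
W_enc = rng.normal(scale=0.01, size=(d_model, d_sae))
W_dec = rng.normal(scale=0.01, size=(d_sae, d_model))
b_enc = np.zeros(d_sae)

def topk_sae(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Encode, keep only the k largest latents per example, decode."""
    pre = x @ W_enc + b_enc
    kth = np.partition(pre, -k, axis=-1)[..., -k, None]  # k-th largest value
    z = np.where(pre >= kth, pre, 0.0)                   # exactly k survive
    x_hat = z @ W_dec                                    # reconstruction
    return z, x_hat

x = rng.normal(size=(4, d_model))      # a batch of cached activations
z, x_hat = topk_sae(x)
l0 = (z != 0).sum(axis=-1).mean()      # equals k by construction
```

Because the TopK nonlinearity fixes L0 at exactly k, comparisons against Gao et al. reduce to matching reconstruction loss at the same k.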
Summary Sparse Autoencoders (SAEs) have become a popular tool in Mechanistic Interpretability, but at the time of writing there is no standard, automated way to accurately assess the quality of an SAE. In practice, many different metrics are applied and interpreted to assess SAE quality in different studies. Conceptually, however...
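Two of the metrics most commonly reported in practice are L0 (mean number of active latents per example) and a reconstruction score such as the fraction of variance unexplained (FVU). A minimal sketch of both, on synthetic data (the arrays below are stand-ins, not outputs of a real SAE):

```python
import numpy as np

def l0(z: np.ndarray) -> float:
    """Mean number of non-zero SAE latents per example."""
    return float((z != 0).sum(axis=-1).mean())

def fvu(x: np.ndarray, x_hat: np.ndarray) -> float:
    """Fraction of variance unexplained: residual SS / total SS."""
    resid = ((x - x_hat) ** 2).sum()
    total = ((x - x.mean(axis=0)) ** 2).sum()
    return float(resid / total)

rng = np.random.default_rng(0)
x = rng.normal(size=(1024, 64))              # stand-in model activations
x_hat = x + 0.1 * rng.normal(size=x.shape)   # a good fake reconstruction
print(l0(x), fvu(x, x_hat))                  # FVU is ~0.01 here
```

The tension the summary alludes to is that these two numbers trade off against each other, so neither alone pins down SAE quality.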