A suite of Vision Sparse Autoencoders
CLIP-Scope: inspired by Gemma Scope, we trained eight Sparse Autoencoders, each on 1.2 billion tokens, on different layers of a Vision Transformer. These (along with more technical details) can be accessed on Hugging Face. We also released a PyPI package for ease of use. We hope that this will allow researchers to...
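As a rough illustration of what these models do, here is a minimal sketch of a sparse autoencoder forward pass in NumPy. The dimensions, ReLU activation, and loss terms are generic assumptions for illustration, not the released models' actual architecture or training configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 8, 32  # hypothetical ViT residual width and (wider) SAE feature width

# Randomly initialized weights, purely for demonstration
W_enc = rng.normal(0.0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0.0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode a token activation into sparse features, then decode a reconstruction."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU keeps only a sparse set of features active
    x_hat = f @ W_dec + b_dec               # linear decoder reconstructs the activation
    return f, x_hat

x = rng.normal(size=d_model)                # one token's activation vector from some layer
f, x_hat = sae_forward(x)
recon_loss = np.mean((x - x_hat) ** 2)      # reconstruction term
l1_penalty = np.abs(f).sum()                # sparsity term penalized during training
```

Training minimizes the reconstruction loss plus a weighted sparsity penalty, so each feature ideally captures one interpretable direction in the transformer's activation space.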
Oct 27, 2024