JumpReLU SAEs + Early Access to Gemma 2 SAEs
New paper from the Google DeepMind mechanistic interpretability team, led by Sen Rajamanoharan! We introduce JumpReLU SAEs, a new SAE architecture that replaces the standard ReLU with a discontinuous JumpReLU activation. It seems to be (narrowly) state of the art, ahead of existing methods like TopK and Gated SAEs, for achieving high...
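To make the difference between the two activations concrete, here is a minimal sketch, assuming the usual definition JumpReLU_θ(z) = z · H(z − θ), where H is the Heaviside step function and θ is a positive threshold. In a real SAE the threshold would be learned per feature; the fixed value `theta = 0.5`, the sample pre-activations, and the choice to output 0 exactly at z = θ are illustrative assumptions, not details taken from the paper.

```python
def relu(z: float) -> float:
    """Standard ReLU: zero out negative pre-activations."""
    return max(z, 0.0)


def jump_relu(z: float, theta: float) -> float:
    """JumpReLU: pass z through unchanged if it exceeds theta, else output 0.

    The output jumps from 0 to roughly theta at z = theta, which is the
    discontinuity. Returning 0 exactly at z == theta is a convention
    assumed here for illustration.
    """
    return z if z > theta else 0.0


# Hypothetical pre-activations for a single SAE feature.
pre_activations = [-0.5, 0.2, 0.8, 1.5]
theta = 0.5  # hypothetical threshold; learned per feature in practice

relu_out = [relu(z) for z in pre_activations]
jump_out = [jump_relu(z, theta) for z in pre_activations]

print(relu_out)  # [0.0, 0.2, 0.8, 1.5]
print(jump_out)  # [0.0, 0.0, 0.8, 1.5]
```

The small positive pre-activation 0.2 survives the ReLU but is zeroed by the JumpReLU, while values above the threshold pass through at full magnitude rather than being shrunk.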