Demian Till — LessWrong

Broken Latents: Studying SAEs and Feature Co-occurrence in Toy Models

Thanks to Jean Kaddour, Tomáš Dulka, and Joseph Bloom for providing feedback on earlier drafts of this post. In a previous post on Toy Models of Feature Absorption, we showed that tied SAEs seem to solve feature absorption. However, when we tried training some tied SAEs on Gemma 2...

Dec 30, 2024•24

Do sparse autoencoders find "true features"?

Thanks to Joseph Bloom and James Oldfield for giving feedback on drafts, which helped improve the post. In this post I'll discuss an apparent limitation of sparse autoencoders (SAEs) in their current formulation as they are applied to discovering the latent features within AI models such as transformer-based LLMs. In...

Feb 22, 2024•76