Sing along! https://suno.com/song/35d62e76-eac7-4733-864d-d62104f4bfd0
You might enjoy this classic: https://www.lesswrong.com/posts/9HSwh2mE3tX6xvZ2W/the-pyramid-and-the-garden
Rather than doubling down on a single single-layered decomposition for all activations, why not go with a multi-layered decomposition (ie: some combination of SAE and metaSAE, preferably as unsupervised as possible). Or alternatively, maybe the decomposition that is most useful in each case changes and what we really need is lots of different (somewhat) interpretable decompositions and an ability to quickly work out which is useful in context.
Definitely seems like multiple ways to interpret this work, as also described in SAE feature geometry is outside the superposition hypothesis. Either we need to find other methods and theory that somehow finds more atomic features, or we need to get a more complete picture of what the SAEs are learning at different levels of abstraction and composition.
Both seem important and interesting lines of work to me!
Great work! Using spelling is very clear example of how information gets absorbed in the SAE latent, and indeed in Meta-SAEs we found many spelling/sound related meta-latents.
I have been thinking a bit on how to solve this problem and one experiment that I would like to try is to train an SAE and a meta-SAE concurrently, but in an adversarial manner (kind of like a GAN), such that the SAE is incentivized to learn latent directions that are not easily decomposable by the meta-SAE.
Potentially, this would remove the "Starts-with-L"-component from the "lion"-token direction and activate the "Starts-with-L" latent instead. Although this would come at the cost of worse sparsity/reconstruction.
Having been at two LH parties, one with music and one without, I definitely ended up in the "large conversation with 2 people talking and 5 people listening"-situation much more in the party without music.
That said, I did find it much easier to meet new people at the party without music, as this also makes it much easier to join conversations that sound interesting when you walk past (being able to actually overhear them).
This might be one of the reasons why people tend to progressively increase the volume of the music during parties. First give people a chance to meet interesting people and easily join conversations. Then increase the volume to facilitate smaller conversations.
I just finished reading "Zen and the Art of Motorcycle Maintenance" yesterday, which you might enjoy reading as it explores the topic of Quality (what you call excellence). From the book:
“Care and Quality are internal and external aspects of the same thing. A person who sees Quality and feels it as he works is a person who cares. A person who cares about what he sees and does is a person who’s bound to have some characteristic of quality.”
Great work! I have been working on something very similar and will publish my results here some time next week, but can already give a sneak-peak:
I can confirm that on Gemma-2-2B Matryoshka SAEs dramatically improve the absorption score on the first-letter task from Chanin et al. as implemented in SAEBench!
Yes! My experiments with Matryoshka SAEs are using BatchTopK.
Are you planning to continue this line of research? If so, I would be interested to collaborate (or otherwise at least coordinate on not doing duplicate work).