Jakob Hansen
Jakob Hansen has not written any posts yet.

Here is the promised Colab notebook for exploring SAE features with TDA. It works on the top-k GPT2-small SAEs by default, but should be pretty easily adaptable to most SAEs available in sae_lens. The graphs will look a little different from the ones shown in the post because they are constructed directly from the decoder weight vectors rather than from feature correlations across a corpus.
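For reference, here is a minimal sketch of that decoder-weight graph construction, assuming the sae_lens `SAE.from_pretrained` API. The release and sae_id strings are placeholders (a residual-stream GPT2-small SAE), not necessarily the top-k SAEs the notebook loads by default, and the kNN-on-cosine-similarity step is just one simple way to turn decoder directions into a graph, not the notebook's exact code:

```python
# Rough sketch: build a feature graph directly from SAE decoder weights.
# Assumes the sae_lens SAE.from_pretrained API; release/sae_id are placeholders.
import networkx as nx
from sae_lens import SAE
from sklearn.neighbors import kneighbors_graph

loaded = SAE.from_pretrained(
    release="gpt2-small-res-jb",        # placeholder release name
    sae_id="blocks.8.hook_resid_pre",   # placeholder SAE id
)
sae = loaded[0] if isinstance(loaded, tuple) else loaded  # some sae_lens versions return a tuple

# One decoder row per feature: that feature's direction in the residual stream.
W_dec = sae.W_dec.detach().cpu().numpy()

# k-nearest-neighbor graph over features, using cosine distance between decoder rows.
adjacency = kneighbors_graph(W_dec, n_neighbors=10, metric="cosine", mode="distance")
G = nx.from_scipy_sparse_array(adjacency)
print(G.number_of_nodes(), "features,", G.number_of_edges(), "edges")
```

Because the graph comes straight from the decoder rows, no corpus pass is needed, which is why it won't exactly match the correlation-based graphs in the post.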
One of the interesting things I found while putting this together is a large group of "previous token" features, which are mostly misinterpreted by the LLM-generated explanations. These have been noted in attention SAEs (e.g. https://www.alignmentforum.org/posts/xmegeW5mqiBsvoaim/we-inspected-every-head-in-gpt-2-small-using-saes-so-you-don), but I haven't seen much discussion of them, although they seem very...
The CLTs themselves are available here on Hugging Face: https://huggingface.co/collections/bluelightai/qwen3-cross-layer-transcoders. They can be used with the circuit-tracer library.
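If it helps, here is a quick sketch of pulling one of the checkpoints with huggingface_hub; the repo_id below is a placeholder (the comment only links the collection), so substitute one of the actual repo names listed there, and then load it following the circuit-tracer docs:

```python
# Hedged sketch: download a CLT checkpoint from the Hub with huggingface_hub.
# The repo_id is a placeholder; use one of the repos from the collection linked above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="bluelightai/qwen3-clt")  # placeholder repo id
print("Checkpoint downloaded to:", local_dir)
```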