I think so, but expect others to object. I think many people interested in circuits are using attn and MLP SAEs and experimenting with transcoders and SAE variants for attn heads. Depends how much you care about being able to say what an attn head or MLP is doing or you're happy to just talk about features. Sam Marks at the Bau Lab is the person to ask.
Does it make sense to extract sparse feature graph for a behavior from only residual layers of gpt2 small or do we need all mlp and attention as well?