Empirical Insights into Feature Geometry in Sparse Autoencoders
Key Findings: 1. We demonstrate that subspaces with semantically opposite meanings in the GemmaScope series of Sparse Autoencoders do not point in opposite directions. 2. Furthermore, subspaces that do point in opposite directions are usually not semantically related. 3. As a set of auxiliary experiments, we experiment with the compositional...
Jan 24, 2025
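
The first two findings are claims about the angle between feature directions, so a small worked example may help make "opposite direction" concrete. The sketch below is only an illustration under stated assumptions: it assumes each feature is represented by one row of a decoder matrix and that "opposite" means a cosine similarity close to -1. The excerpt does not say how directions are compared, and the matrix `toy_decoder` and the function names are hypothetical stand-ins, not GemmaScope weights or the authors' code.

```python
import numpy as np


def cosine_similarity_matrix(decoder: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of an (n_features, d_model) matrix."""
    norms = np.linalg.norm(decoder, axis=1, keepdims=True)
    unit = decoder / norms  # normalize each feature direction to unit length
    return unit @ unit.T


def most_opposite(decoder: np.ndarray, idx: int) -> tuple[int, float]:
    """Return the index and cosine similarity of the feature most opposite to `idx`."""
    sims = cosine_similarity_matrix(decoder)[idx]
    j = int(np.argmin(sims))  # the lowest cosine similarity is the most antipodal direction
    return j, float(sims[j])


# Hypothetical toy directions (assumption: NOT real SAE decoder weights).
toy_decoder = np.array([
    [1.0, 0.0, 0.0],   # feature 0
    [-1.0, 0.0, 0.0],  # feature 1: exactly opposite to feature 0
    [0.0, 1.0, 0.0],   # feature 2: orthogonal to feature 0
    [0.6, 0.8, 0.0],   # feature 3: partially aligned with feature 0
])

j, sim = most_opposite(toy_decoder, 0)
print(f"most opposite to feature 0: feature {j} (cosine similarity {sim:.2f})")
```

With real decoder weights, one could run the same lookup for a feature of interest and then check by hand whether the returned feature is semantically related to it, which is the kind of comparison the findings describe.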