This blog post discusses a collaborative research paper on sparse autoencoders (SAEs), specifically focusing on SAE evaluations and a new training method we call p-annealing. As the first author, I primarily contributed to the evaluation portion of our work. The views expressed here are my own and do not necessarily reflect the perspectives of my co-authors. You can access our full paper here.
Key Results
In our research on evaluating SAEs using board games, we had several key findings:
- We developed two new metrics for evaluating SAEs in the context of board games: board reconstruction and coverage.
- These metrics can measure progress between SAE training approaches that existing metrics do not capture.
- These metrics