Rico Angell

Evaluating Sparse Autoencoders with Board Game Models

This blog post discusses a collaborative research paper on sparse autoencoders (SAEs), specifically focusing on SAE evaluations and a new training method we call p-annealing. As the first author, I primarily contributed to the evaluation portion of our work. The views expressed here are my own and do not necessarily...

Aug 2, 2024•38

23 karma

Member for 2 years

Rico Angell — LessWrong

Evaluating Sparse Autoencoders with Board Game Models

Adam Karvonen

Adam Karvonen, Sam Marks, Can, Benjamin Wright, Jannik Brinkmann, Logan Riggs, Rico Angell

Key Results

In our research on evaluating Sparse Autoencoders (SAEs) using board games, we reached several key findings:

We developed two new metrics for evaluating SAEs in the context of board games: board reconstruction and coverage.
These metrics can measure progress in SAE training approaches that is invisible to existing metrics.
These metrics ...
