Are SAE features from the Base Model still meaningful to LLaVA?
Shan Chen, Jack Gallifant, Kuleen Sasse, Danielle Bitterman[1]

Please read this as a work in progress: we are colleagues sharing it in a lab meeting (https://www.bittermanlab.org) to help and motivate potential parallel research.

TL;DR:
* Recent work has evaluated the generalizability of Sparse Autoencoder (SAE) features; this study examines their effectiveness...