Fengyuan Hu — LessWrong

Fengyuan Hu

Normalizing Sparse Autoencoders

TL;DR Sparse autoencoders (SAEs) present a promising direction towards automating mechanistic interpretability, but they are not without flaws. One known issue with the original sparse autoencoders is the feature suppression effect, which is caused by the conflict between the L2 and L1 loss and the unit norm constraint on the...

Apr 8, 2024