Diego Caples

Posts


Scaling Sparse Feature Circuit Finding to Gemma 9B (4mo)

Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream (8mo)

Comments

Scaling Sparse Feature Circuit Finding to Gemma 9B

Diego Caples, 4mo

Thanks! We use mean ablation because it lets us build circuits that include only the things that change between examples in a task. For example, in the code task our circuits do not need “is python” latents, since those latents are the same across all samples. If we zero ablated instead, we would need every SAE latent required for any part of the task, including latents that stay constant across every sample. Many latents we don’t really care about would then end up in our circuits.
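A toy sketch of the difference, using made-up activations rather than real SAE outputs: latent 0 plays the role of an “is python” latent that is on for every example, and the hypothetical circuit keeps only latent 1. The names `acts`, `keep`, `latent_means`, `ablate` and `perturbed_latents` are illustrative assumptions, and the sketch only tracks which latent values get overwritten; it does not model a forward pass or measure effects on the model output.

```python
from statistics import mean

# Hypothetical activations: rows are examples from one task, columns are SAE latents.
# Latent 0 stands in for an "is python" latent that is on for every example.
acts = [
    [1.0, 0.2, 0.0],
    [1.0, 0.9, 0.5],
    [1.0, 0.4, 0.1],
]
keep = {1}  # hypothetical circuit: the only latent left un-ablated


def latent_means(rows):
    """Per-latent mean activation over the task's examples."""
    return [mean(row[j] for row in rows) for j in range(len(rows[0]))]


def ablate(row, keep, fill):
    """Keep latents in `keep`; overwrite every other latent j with fill[j]."""
    return [x if j in keep else fill[j] for j, x in enumerate(row)]


def perturbed_latents(rows, keep, fill):
    """Latents whose value actually changes on at least one example."""
    changed = set()
    for row in rows:
        ablated = ablate(row, keep, fill)
        changed.update(j for j, (a, b) in enumerate(zip(row, ablated)) if a != b)
    return changed


mean_fill = latent_means(acts)    # mean ablation: replace with the task average
zero_fill = [0.0] * len(acts[0])  # zero ablation: replace with 0

print(sorted(perturbed_latents(acts, keep, mean_fill)))  # [2]
print(sorted(perturbed_latents(acts, keep, zero_fill)))  # [0, 2]
```

Under mean ablation the constant latent 0 is never disturbed, so a circuit can leave it out; under zero ablation it is knocked out on every example, so a faithful circuit would have to include it.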

Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream

Diego Caples, 8mo

If we were to start training with Adam and later switch to SGD, I would guess that the privileged basis would persist.

There is no mechanism in SGD that opposes solutions with basis-aligned features; SGD is simply indifferent to which directions features point in within the residual stream. Because there are infinitely many possible directions for features to point, an SGD-trained model lacks a privileged basis only because it is exceedingly unlikely to be randomly initialized into one.
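As a rough illustration of the initialization point, assuming a typical i.i.d. Gaussian initialization (whose distribution looks the same in every rotated basis), the sketch below draws random directions in a hypothetical 512-dimensional residual stream and measures how close each one comes to a standard basis vector. The width of 512, the sample count and the function name `max_basis_alignment` are illustrative assumptions. A perfectly basis-aligned direction would score 1.0; random directions land far below that.

```python
import math
import random


def max_basis_alignment(d, rng):
    # Draw an isotropic Gaussian direction and return its largest |cosine|
    # with any standard basis vector (1.0 would mean perfectly basis-aligned).
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(xi * xi for xi in x))
    return max(abs(xi) for xi in x) / norm


rng = random.Random(0)
d = 512  # hypothetical residual-stream width
samples = [max_basis_alignment(d, rng) for _ in range(1000)]
print(f"most basis-aligned of 1000 random directions: {max(samples):.3f}")
```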

On the other hand, Adam tracks gradient statistics (running averages of the gradient and of its square) separately for each basis dimension, which makes the basis dimensions different from every other direction. Somehow, this causes model features to align with the basis dimensions.
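To make the contrast concrete, here is a small sketch on a hypothetical 2-D quadratic loss (not the language-model setup from the post). It runs plain gradient descent and a textbook Adam update once in the standard basis and once in a rotated basis, then checks whether rotating the first trajectory reproduces the second. The loss, rotation angle, learning rate and all function names are illustrative assumptions. Gradient descent commutes with the rotation, so its gap is only floating-point noise; Adam's per-coordinate normalization does not, so its trajectories diverge. This only shows that Adam's update depends on the choice of basis; it does not show how that dependence produces basis-aligned features.

```python
import math


# Toy loss (an assumption for illustration):
# f(w) = 0.5 * (3 * (w0 - 1)^2 + (w1 + 2)^2)
def grad_f(w):
    return [3.0 * (w[0] - 1.0), w[1] + 2.0]


def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def rotated_grad(R):
    # Same problem in a rotated basis: f_R(v) = f(R^T v), gradient R @ grad_f(R^T v).
    Rt = [[R[0][0], R[1][0]], [R[0][1], R[1][1]]]
    return lambda v: matvec(R, grad_f(matvec(Rt, v)))


def gd(grad, w0, lr=0.1, steps=50):
    # Plain gradient descent: the same scalar step size for every coordinate.
    w, traj = list(w0), []
    for _ in range(steps):
        w = [wi - lr * gi for wi, gi in zip(w, grad(w))]
        traj.append(w)
    return traj


def adam(grad, w0, lr=0.1, steps=50, b1=0.9, b2=0.999, eps=1e-8):
    # Textbook Adam: bias-corrected first/second moments kept per basis coordinate.
    w, m, v, traj = list(w0), [0.0] * len(w0), [0.0] * len(w0), []
    for t in range(1, steps + 1):
        g = grad(w)
        m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
        v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
        w = [
            wi - lr * (mi / (1 - b1 ** t)) / (math.sqrt(vi / (1 - b2 ** t)) + eps)
            for wi, mi, vi in zip(w, m, v)
        ]
        traj.append(w)
    return traj


def rotation_gap(opt, R, w0):
    # Largest coordinate-wise difference between "optimize, then rotate"
    # and "rotate, then optimize", over the whole trajectory.
    plain = opt(grad_f, w0)
    rotated = opt(rotated_grad(R), matvec(R, w0))
    return max(abs(a - b) for p, q in zip(plain, rotated) for a, b in zip(matvec(R, p), q))


R, w0 = rotation(0.7), [0.5, 0.5]
gd_gap = rotation_gap(gd, R, w0)
adam_gap = rotation_gap(adam, R, w0)
print(f"GD gap:   {gd_gap:.2e}")   # floating-point noise only
print(f"Adam gap: {adam_gap:.2e}")  # on the order of the step size
```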