Comments

Thanks for the comment! We always use the pre-ReLU feature activation, which is equal to the post-ReLU activation (given that the feature is active), and is a purely linear function of z. Edited the post for clarity.
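For concreteness, here is a minimal sketch of that point, assuming a standard SAE encoder of the form f(z) = W_enc z + b_enc; the variable names and shapes below are illustrative, not taken from our code:

```python
import numpy as np

# Sketch: for an active feature, the pre-ReLU and post-ReLU activations coincide,
# and the pre-ReLU activation is a purely linear function of z.
# (Assumes a standard SAE encoder f(z) = W_enc @ z + b_enc; names are illustrative.)

rng = np.random.default_rng(0)
d_model, d_sae = 16, 64

W_enc = rng.normal(size=(d_sae, d_model))
b_enc = rng.normal(size=(d_sae,))

z = rng.normal(size=(d_model,))          # input activation vector z

pre_relu = W_enc @ z + b_enc             # linear in z
post_relu = np.maximum(pre_relu, 0.0)    # SAE feature activations

active = pre_relu > 0                    # features that fire on this input
assert np.allclose(pre_relu[active], post_relu[active])  # equal wherever the feature is active
```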

Amazing! We found your original library super useful for our Attention SAEs research, so thanks for making this!

> Code for this token filtering can be found in the appendix and the exact token list is linked.

Maybe I just missed it, but I'm not seeing this. Is the code still available?