LESSWRONG

All of Johnny Lin's Comments + Replies

Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning

Johnny Lin 4mo 40

Apologies for the issue with the Neuronpedia link. It has now been resolved.

Transcoders enable fine-grained interpretable circuit analysis for language models

Johnny Lin 1y 184

Hey Jacob + Philippe,

Hope you all don't mind, but we put layer 8 of your transcoders up on Neuronpedia, with ~22k dashboards here:

https://neuronpedia.org/gpt2-small/8-tres-dc

Each dashboard can be accessed at its own URL:

https://neuronpedia.org/gpt2-small/8-tres-dc/0 goes to feature index 0.
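The URL pattern in the two links above can be sketched in a few lines of Python. This is only an illustration inferred from those example URLs, not an official Neuronpedia API: the helper names `source_url` and `feature_url` are hypothetical, and the assumption is that every dashboard follows the `model/layer-sourceset/index` layout.

```python
from urllib.parse import quote

# Base URL taken from the links in the comment above.
BASE_URL = "https://neuronpedia.org"


def source_url(model: str, layer: int, source_set: str) -> str:
    """Build the URL for one layer of a source set (hypothetical helper).

    Assumes the layout <base>/<model>/<layer>-<source_set>, as in
    https://neuronpedia.org/gpt2-small/8-tres-dc
    """
    return f"{BASE_URL}/{quote(model)}/{layer}-{quote(source_set)}"


def feature_url(model: str, layer: int, source_set: str, index: int) -> str:
    """Build the URL for a single feature dashboard (hypothetical helper).

    Assumes the feature index is appended as the last path segment.
    """
    if index < 0:
        raise ValueError("feature index must be non-negative")
    return f"{source_url(model, layer, source_set)}/{index}"


if __name__ == "__main__":
    # Reproduces the example link for feature index 0 on layer 8.
    print(feature_url("gpt2-small", 8, "tres-dc", 0))
```

Calling `feature_url("gpt2-small", 8, "tres-dc", 0)` reproduces the link above for feature index 0. Whether other models or source sets follow the same layout is an assumption.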

You can also test each feature with custom text:

Or search all features at: https://www.neuronpedia.org/gpt2-small/tres-dc

An example search: https://www.neuronpedia.org/gpt2-small/?sourceSet=tres-dc&selectedLayers=[]&sortIndexes=[]&q=the%20cat%20sat%20on%20...

3 Jacob Dunefsky 1y

Just started playing around with this, and it's super cool! Thank you for making this available (and so fast!). I've got a lot of respect for you and Joseph and the Neuronpedia project.

4 Neel Nanda 1y

That's awesome, and insanely fast! Thanks so much, I really appreciate it.

SAE-VIS: Announcement Post

Johnny Lin 1y 40

Thanks Callum, and yep, we've been using SAE-Vis extensively at Neuronpedia. It's been extremely helpful for generating dashboards, and it's very well maintained. We'll soon be releasing a way to import directly into Neuronpedia from SAE-Vis exports.

2 CallumMcDougall 1y

Thanks!! Really appreciate it

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small

Johnny Lin 1y Ω9182

Hey Joseph (and coauthors),

Your directions are really fantastic. I hope you don't mind, but I generated the activation data for the first 3000+ directions for each of the 12 layers and uploaded your directions to Neuronpedia:

https://www.neuronpedia.org/gpt2-small/res-jb

Your directions are also linked on the home page and the model page.

They're also accessible by layer (sorted by top activation), e.g. layer 6: https://neuronpedia.org/gpt2-small/6-res-jb

I added the "Anthropic dashboard" to Neuronpedia for your dataset.

Explanations, comments, and autointe...

1 Joseph Bloom 1y

Agreed, thanks so much! Super excited about what can be done here!

Neel Nanda 1y 102

Thanks for doing this, I'm excited about Neuronpedia focusing on SAE features! I expect this to go much better than neuron interpretability.

Exploring OpenAI's Latent Directions: Tests, Observations, and Poking Around

Johnny Lin 1y 41

Apparently one or more anonymous users got really excited and ran a bunch of simultaneous searches while I was sleeping. That triggered this open tokenizer bug/issue and caused our TransformerLens server to hang/crash, which led to some downtime.

A workaround has been implemented and pushed.

Exploring OpenAI's Latent Directions: Tests, Observations, and Poking Around

Johnny Lin 1y 20

Thanks for the tip! I've added the link under "Exploration Tools" after the first mention of Neuronpedia. Let me know if that is the proper way to do it. I couldn't find a feature on LW for a special "context link", if one exists.

2 mishka 1y

I think this is a good place for this link, thanks!

New Tool: the Residual Stream Viewer

Johnny Lin 2y 40

Great work, Adam, especially on the customizability. It's fascinating to click through the various types and indexes looking for patterns, and I'm looking forward to using this to find interesting directions.