PhD student at UCL. Interested in mech interp.
This work is very exciting to me, and I'm curious to hear the authors' thoughts on whether specific predictions made by this model could be verified in real models.
I have a more detailed write-up on model organisms of superposition here: https://docs.google.com/document/d/1hwI30HNNB2MkOrtEzo7hppG9X7Cn7Xm9a-1LBqcttWc/edit?usp=sharing
Would love to discuss this more!
Hey Jacob + Philippe,
I took the liberty of making a clean, installable version of your original codebase. Hope you don't mind, and I'm happy to make any changes you request! https://github.com/dtch1997/transcoders-slim