Daniel Tan

PhD student at UCL. Interested in mechanistic interpretability (mech interp).

Comments

Hey Jacob + Philippe, 

I took the liberty of making a clean installable version of your original codebase. Hope you don't mind, and happy to make any changes that you request! https://github.com/dtch1997/transcoders-slim

This work is very exciting to me, and I'm curious to hear the authors' thoughts on whether we could verify specific predictions made by this model in real models. 

  • For example, do we expect the proposed U-AND operator to occur in real LLMs, and could we look for evidence of it by applying mech interp to carefully chosen toy models?

I have a more detailed write-up on model organisms of superposition here: https://docs.google.com/document/d/1hwI30HNNB2MkOrtEzo7hppG9X7Cn7Xm9a-1LBqcttWc/edit?usp=sharing

Would love to discuss this more!