Daniel Tan

PhD student at UCL. Interested in mech interp. 

Comments

This work is very exciting to me, and I'm curious to hear the authors' thoughts on whether specific predictions of this model could be verified in real models.

  • For example, do we expect the proposed U-AND operator to occur in real LLMs, and could we look for evidence of it by applying mech interp to carefully chosen toy models?

I have a more detailed write-up on model organisms of superposition here: https://docs.google.com/document/d/1hwI30HNNB2MkOrtEzo7hppG9X7Cn7Xm9a-1LBqcttWc/edit?usp=sharing

Would love to discuss this more!