Enabling New Applications with Today's Mechanistic Interpretability Toolkit
Working on the more applied side of Mechanistic Interpretability (MI) research, I wanted to share some more evidence on why building on top of MI tools both has valuable applications for model performance on existing tasks and enables new applications right now. Over the past few months, I’ve been working...
Oct 25, 2024