ananya_joshi — LessWrong

Enabling New Applications with Today's Mechanistic Interpretability Toolkit

As someone working on the more applied side of Mechanistic Interpretability (MI) research, I wanted to share more evidence that building on top of MI tools both has valuable applications for model performance on existing tasks and enables new applications right now. Over the past few months, I’ve been working...

Oct 25, 2024•3