LESSWRONG
200 Concrete Open Problems in Mechanistic Interpretability
Concrete Steps to Get Started in Transformer Mechanistic Interpretability (Neel Nanda, 2y; 57 karma, 7 comments)
200 Concrete Open Problems in Mechanistic Interpretability: Introduction (Neel Nanda, 2y; 106 karma, 0 comments)
200 COP in MI: The Case for Analysing Toy Language Models (Neel Nanda, 2y; 40 karma, 3 comments)
200 COP in MI: Looking for Circuits in the Wild (Neel Nanda, 2y; 16 karma, 5 comments)
200 COP in MI: Interpreting Algorithmic Problems (Neel Nanda, 2y; 33 karma, 2 comments)
200 COP in MI: Exploring Polysemanticity and Superposition (Neel Nanda, 2y; 34 karma, 6 comments)
200 COP in MI: Analysing Training Dynamics (Neel Nanda, 2y; 16 karma, 0 comments)
200 COP in MI: Techniques, Tooling and Automation (Neel Nanda, 2y; 13 karma, 0 comments)
200 COP in MI: Image Model Interpretability (Neel Nanda, 2y; 18 karma, 3 comments)
200 COP in MI: Interpreting Reinforcement Learning (Neel Nanda, 2y; 25 karma, 1 comment)
200 COP in MI: Studying Learned Features in Language Models (Neel Nanda, 2y; 24 karma, 2 comments)