Transformer Circuits
This page is a stub.
Posts tagged Transformer Circuits
3
33
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Ω
wesg, Neel Nanda
2y
Ω
6
2
135
An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2
Ω
Neel Nanda
8mo
Ω
16
2
116
Interpreting OpenAI's Whisper
EllenaR
1y
13
2
106
200 Concrete Open Problems in Mechanistic Interpretability: Introduction
Ω
Neel Nanda
2y
Ω
0
2
70
Finding Sparse Linear Connections between Features in LLMs
Ω
Logan Riggs, Sam Mitchell, Adam Kaufman
1y
Ω
5
2
50
How to Think About Activation Patching
Ω
Neel Nanda
2y
Ω
5
2
44
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Ω
Neel Nanda, Tom Lieberum, Matthew Rahtz, János Kramár, Geoffrey Irving, Rohin Shah, Vlad Mikulik
2y
Ω
3
2
36
Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy
Ω
Neel Nanda
2y
Ω
1
2
34
200 COP in MI: Exploring Polysemanticity and Superposition
Ω
Neel Nanda
2y
Ω
6
2
33
200 COP in MI: Interpreting Algorithmic Problems
Ω
Neel Nanda
2y
Ω
2
2
30
A Walkthrough of Interpretability in the Wild (w/ authors Kevin Wang, Arthur Conmy & Alexandre Variengien)
Ω
Neel Nanda
2y
Ω
15
2
20
A Walkthrough of In-Context Learning and Induction Heads (w/ Charles Frye) Part 1 of 2
Ω
Neel Nanda
2y
Ω
0
2
16
200 COP in MI: Looking for Circuits in the Wild
Ω
Neel Nanda
2y
Ω
5
2
16
Understanding the tensor product formulation in Transformer Circuits
Tom Lieberum
3y
2
2
16
200 COP in MI: Analysing Training Dynamics
Ω
Neel Nanda
2y
Ω
0