LESSWRONG
Transformer Circuits
This page is a stub.
Posts tagged Transformer Circuits
Most Relevant
3
33
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Ω
wesg, Neel Nanda
2y
Ω
6
2
135
An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2
Ω
Neel Nanda
7mo
Ω
16
2
115
Interpreting OpenAI's Whisper
EllenaR
1y
13
2
106
200 Concrete Open Problems in Mechanistic Interpretability: Introduction
Ω
Neel Nanda
2y
Ω
0
2
69
Finding Sparse Linear Connections between Features in LLMs
Ω
Logan Riggs, Sam Mitchell, Adam Kaufman
1y
Ω
5
2
50
How to Think About Activation Patching
Ω
Neel Nanda
2y
Ω
5
2
44
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Ω
Neel Nanda, Tom Lieberum, Matthew Rahtz, János Kramár, Geoffrey Irving, Rohin Shah, Vlad Mikulik
2y
Ω
3
2
36
Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy
Ω
Neel Nanda
1y
Ω
1
2
34
200 COP in MI: Exploring Polysemanticity and Superposition
Ω
Neel Nanda
2y
Ω
6
2
33
200 COP in MI: Interpreting Algorithmic Problems
Ω
Neel Nanda
2y
Ω
2
2
30
A Walkthrough of Interpretability in the Wild (w/ authors Kevin Wang, Arthur Conmy & Alexandre Variengien)
Ω
Neel Nanda
2y
Ω
15
2
20
A Walkthrough of In-Context Learning and Induction Heads (w/ Charles Frye) Part 1 of 2
Ω
Neel Nanda
2y
Ω
0
2
16
200 COP in MI: Looking for Circuits in the Wild
Ω
Neel Nanda
2y
Ω
5
2
16
Understanding the tensor product formulation in Transformer Circuits
Tom Lieberum
3y
2
2
16
200 COP in MI: Analysing Training Dynamics
Ω
Neel Nanda
2y
Ω
0