Transformer Circuits
• Applied to Finding Backward Chaining Circuits in Transformers Trained on Tree Search by abhayesian, 3d ago
• Applied to Can quantised autoencoders find and interpret circuits in language models? by charlieoneill, 3mo ago
• Applied to Sparse Autoencoders Work on Attention Layer Outputs by robertzk, 5mo ago
• Applied to Finding Sparse Linear Connections between Features in LLMs by Logan Riggs, 6mo ago
• Applied to AISC project: TinyEvals by Jett, 6mo ago
• Applied to Polysemantic Attention Head in a 4-Layer Transformer by Jett, 7mo ago
• Applied to Graphical tensor notation for interpretability by Jordan Taylor, 8mo ago
• Applied to Interpreting OpenAI's Whisper by Neel Nanda, 8mo ago
• Applied to Automatically finding feature vectors in the OV circuits of Transformers without using probing by Jacob Dunefsky, 9mo ago
• Applied to An adversarial example for Direct Logit Attribution: memory management in gelu-4l by Can, 9mo ago
• Applied to Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy by Neel Nanda, 9mo ago
• Applied to An Interpretability Illusion for Activation Patching of Arbitrary Subspaces by Alex Makelov, 9mo ago
• Applied to Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla by Neel Nanda, 10mo ago
• Applied to How to Think About Activation Patching by Neel Nanda, 1y ago
• Applied to An Analogy for Understanding Transformers by CallumMcDougall, 1y ago
• Applied to Finding Neurons in a Haystack: Case Studies with Sparse Probing by wesg, 1y ago
• Applied to Explaining the Transformer Circuits Framework by Example by RobertM, 1y ago
• Applied to Addendum: More Efficient FFNs via Attention by Robert_AIZI, 1y ago