Transformer Circuits
Applied to: Scaling Sparse Feature Circuit Finding to Gemma 9B by Diego Caples (2mo ago)
Applied to: Are SAE features from the Base Model still meaningful to LLaVA? by Shan23Chen (4mo ago)
Applied to: Concrete Methods for Heuristic Estimation on Neural Networks by Oliver Daniels (4mo ago)
Applied to: Open Source Replication of Anthropic’s Crosscoder paper for model-diffing by Connor Kissane (5mo ago)
Applied to: Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models? by Taras Kutsyk (6mo ago)
Applied to: SAEs (usually) Transfer Between Base and Chat Models by Connor Kissane (8mo ago)
Applied to: Arrakis - A toolkit to conduct, track and visualize mechanistic interpretability experiments. by Yash Srivastava (8mo ago)
Applied to: An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2 by Neel Nanda (9mo ago)
Applied to: Logit Prisms: Decomposing Transformer Outputs for Mechanistic Interpretability by ntt123 (9mo ago)
Applied to: "What the hell is a representation, anyway?" | Clarifying AI interpretability with tools from philosophy of cognitive science | Part 1: Vehicles vs. contents by IwanWilliams (9mo ago)
Applied to: Finding Backward Chaining Circuits in Transformers Trained on Tree Search by abhayesian (10mo ago)
Applied to: Can quantised autoencoders find and interpret circuits in language models? by charlieoneill (1y ago)
Applied to: Sparse Autoencoders Work on Attention Layer Outputs by robertzk (1y ago)
Applied to: Finding Sparse Linear Connections between Features in LLMs by Logan Riggs (1y ago)
Applied to: AISC project: TinyEvals by Jett Janiak (1y ago)
Applied to: Polysemantic Attention Head in a 4-Layer Transformer by Jett Janiak (1y ago)
Applied to: Graphical tensor notation for interpretability by Jordan Taylor (1y ago)
Applied to: Interpreting OpenAI's Whisper by Neel Nanda (1y ago)
Applied to: Automatically finding feature vectors in the OV circuits of Transformers without using probing by Jacob Dunefsky (2y ago)