Transformer Circuits — LessWrong

This page is a stub.

Posts tagged Transformer Circuits

33

Finding Neurons in a Haystack: Case Studies with Sparse Probing

wesg, Neel Nanda

3y

6

2

144

An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2

2y

17

2

116

Interpreting OpenAI's Whisper

2y

13

2

108

200 Concrete Open Problems in Mechanistic Interpretability: Introduction

3y

0

2

70

Finding Sparse Linear Connections between Features in LLMs

Logan Riggs, Sam Mitchell, Adam Kaufman

2y

5

2

50

How to Think About Activation Patching

3y

5

2

44

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

Neel Nanda, Tom Lieberum, Matthew Rahtz, János Kramár, Geoffrey Irving, Rohin Shah, Vlad Mikulik

3y

3

2

36

Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy

2y

1

2

34

200 COP in MI: Exploring Polysemanticity and Superposition

3y

6

2

33

200 COP in MI: Interpreting Algorithmic Problems

3y

2

2

30

A Walkthrough of Interpretability in the Wild (w/ authors Kevin Wang, Arthur Conmy & Alexandre Variengien)

3y

15

2

20

A Walkthrough of In-Context Learning and Induction Heads (w/ Charles Frye) Part 1 of 2

3y

0

2

17

Sleep peacefully: no hidden reasoning detected in LLMs. Well, at least in small ones.

Ilia Shirokov, Ilya Nachevsky

10mo

4

2

16

200 COP in MI: Looking for Circuits in the Wild

3y

5

2

16

200 COP in MI: Analysing Training Dynamics

3y

0
