Overview
One of the active research areas in interpretability involves distilling neural network activations into clean, labeled features. This is made difficult by superposition, where a neuron may fire in response to multiple disparate signals, making that neuron polysemantic. To date, research has focused on one type of superposition: compressive superposition, in which a network represents more features than it has neurons. I report on another type of superposition that can arise when a network has more neurons than features: “symmetric mixtures”. Essentially, this is a form of “favored basis” that allows a network to reinforce the magnitude of its logits via parallelism. I believe understanding this concept can help flesh...
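As a minimal sketch of the parallelism idea (my own toy construction, not taken from the analysis below): if a network has more neurons than features, it can represent one feature redundantly across k hidden neurons and sum the copies at the readout, scaling the logit magnitude by k compared to a single-neuron basis.

```python
import numpy as np

# Hypothetical toy model: one input feature copied into k hidden
# neurons, with a readout that sums all k copies. The resulting
# logit grows linearly in k -- reinforcing magnitude via parallelism.
def logit_with_redundancy(k: int, x: float = 1.0) -> float:
    W_in = np.ones((k, 1))       # each of k neurons reads the same feature
    w_out = np.ones((1, k))      # readout sums the k parallel copies
    h = W_in @ np.array([[x]])   # hidden activations, shape (k, 1)
    return float((w_out @ h)[0, 0])

print(logit_with_redundancy(1))  # 1.0 -- single-neuron basis
print(logit_with_redundancy(4))  # 4.0 -- four parallel copies, 4x the logit
```

This is the opposite regime from compressive superposition, where scarce neurons are shared across features; here, surplus neurons are spent amplifying a single feature's signal.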