Wei Shi — LessWrong

Wei Shi has not written any posts yet.

Wei Shi

1y

Replying to Open Source Replication of Anthropic’s Crosscoder paper for model-diffing

I got it, thank you very much!

Replying to Open Source Replication of Anthropic’s Crosscoder paper for model-diffing

We trained a crosscoder of width 16,384 on the residual stream activations from the middle layer of the Gemma-2 2B base and IT models.

I don't understand the training process here, or the one described in Anthropic's mini-paper. How do you train a single crosscoder on the residual stream from two different models?
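For reference, here is a minimal sketch of one way to read the quoted setup. It assumes the formulation in Anthropic's crosscoder write-up: each model gets its own encoder and decoder weights, but both share a single latent vector per token. The class name `ToyCrosscoder`, the tiny dimensions, and `l1_coeff` are hypothetical and chosen for illustration; this is not the replication's actual code, and it only computes the loss rather than running an optimizer.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def init_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; the init scheme is an arbitrary assumption."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class ToyCrosscoder:
    """Hypothetical toy crosscoder: per-model weights, one shared latent space."""

    def __init__(self, d_model, n_latents, n_models=2, seed=0):
        rng = random.Random(seed)
        # One encoder and one decoder matrix per model (e.g. base and IT).
        self.W_enc = [init_matrix(n_latents, d_model, rng) for _ in range(n_models)]
        self.W_dec = [init_matrix(d_model, n_latents, rng) for _ in range(n_models)]
        self.b_enc = [0.0] * n_latents
        self.b_dec = [[0.0] * d_model for _ in range(n_models)]

    def encode(self, acts):
        """acts holds one activation vector per model, all for the same token."""
        pre = list(self.b_enc)
        for W, a in zip(self.W_enc, acts):
            # Contributions from every model are summed before the ReLU.
            for i, v in enumerate(matvec(W, a)):
                pre[i] += v
        return [max(0.0, p) for p in pre]

    def decode(self, latents):
        """Reconstruct each model's activation from the shared latents."""
        return [
            [r + b for r, b in zip(matvec(W, latents), bias)]
            for W, bias in zip(self.W_dec, self.b_dec)
        ]

    def loss(self, acts, l1_coeff=1e-2):
        latents = self.encode(acts)
        recons = self.decode(latents)
        # Squared reconstruction error, summed over both models.
        recon_err = sum(
            (a_i - r_i) ** 2
            for a, r in zip(acts, recons)
            for a_i, r_i in zip(a, r)
        )
        # Sparsity penalty: each latent is weighted by the sum of its
        # decoder-vector norms across the models.
        penalty = 0.0
        for i, f in enumerate(latents):
            norm_sum = sum(
                math.sqrt(sum(row[i] ** 2 for row in W)) for W in self.W_dec
            )
            penalty += f * norm_sum
        return recon_err + l1_coeff * penalty


# Usage: the same token is run through both models, and the pair of
# middle-layer activations forms one training example.
crosscoder = ToyCrosscoder(d_model=4, n_latents=8)
base_act = [0.5, -1.0, 0.25, 2.0]  # made-up stand-in for a base-model activation
it_act = [0.4, -0.9, 0.30, 1.8]    # made-up stand-in for an IT-model activation
example_loss = crosscoder.loss([base_act, it_act])
```

Under this reading, "one crosscoder on two models" means the two models are run on the same text, their activations at matching token positions are paired up, and gradient descent on a loss like the one above updates all the per-model weights jointly.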
