Chris Krapu — LessWrong

Simple, visual, and it adds another data point for what many of us suspected about GPT-4 in comparison to other frontier labs' models. Even now, it still has something special that has yet to be replicated by many others.

Great work and thank you for sharing.

interpreting GPT: the logit lens

Chris Krapu1y10

Ah, got it. Thanks a ton!

interpreting GPT: the logit lens

Chris Krapu1y10

In all of this, there seems to be an implicit assumption that the ordering of the embedding dimensions is consistent across layers, in the sense that "dog" is more strongly associated with dimension 12 in layers 2, 3, 4, etc.

I don't see any reason why this should be the case from either a training or model structure perspective. How, then, does the logit lens (which should clearly not be invariant with regard to a permutation of its inputs) still produce valid results for some intermediate layers?
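
To make the permutation point concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical and for illustration only: the sizes, the random unembedding matrix `W_U`, the hidden state `h`, and the `logit_lens` helper are not taken from the post or from any real model. It only shows that projecting a hidden state through a fixed unembedding gives different logits if the hidden dimensions are reordered, unless the rows of the unembedding are reordered in the same way.

```python
import random

random.seed(0)
d_model, vocab = 8, 5  # toy sizes, chosen arbitrarily

# Hypothetical unembedding matrix (d_model x vocab) and intermediate hidden state.
W_U = [[random.gauss(0, 1) for _ in range(vocab)] for _ in range(d_model)]
h = [random.gauss(0, 1) for _ in range(d_model)]


def logit_lens(hidden, unembed):
    """Project a hidden state onto vocabulary logits, i.e. hidden @ unembed."""
    return [
        sum(hidden[i] * unembed[i][v] for i in range(len(hidden)))
        for v in range(len(unembed[0]))
    ]


def close(a, b, tol=1e-9):
    """Elementwise comparison of two equal-length lists of floats."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))


# A cyclic shift of the dimensions: a permutation that is certainly not the identity.
perm = list(range(1, d_model)) + [0]
h_perm = [h[i] for i in perm]
W_U_perm = [W_U[i] for i in perm]

logits = logit_lens(h, W_U)
logits_hidden_only = logit_lens(h_perm, W_U)   # reorder the hidden state only
logits_both = logit_lens(h_perm, W_U_perm)     # reorder hidden state and unembedding rows

lens_changes = not close(logits, logits_hidden_only)
consistent_ok = close(logits, logits_both)
print(lens_changes, consistent_ok)
```

So the lens can only give meaningful results at an intermediate layer if that layer's hidden state uses the same ordering of dimensions as the final unembedding expects, which is exactly the assumption being questioned here.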
