User Comment Replies

SAE feature geometry is outside the superposition hypothesis

This reminded me of how GPT-2-small uses a cosine/sine spiral for its learned positional embeddings, and I don't think I've seen a mechanistic/dynamical explanation for this (just the post-hoc explanation that attention can use cosine similarity to encode distance in R^n, not that it should happen this way).
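To make the post-hoc explanation concrete, here is a minimal sketch using idealized fixed sinusoidal embeddings. This is an assumption for illustration: GPT-2's embeddings are learned and only approximately spiral-shaped, and the sizes and the `sinusoidal_embeddings` helper are hypothetical. It shows that the cosine similarity between two positions depends only on their offset, which is what would let attention read off distance.

```python
import numpy as np

def sinusoidal_embeddings(n_positions, d_model):
    """Idealized sin/cos positional embeddings (not GPT-2's learned ones)."""
    positions = np.arange(n_positions)[:, None]
    # Geometric ladder of frequencies, one per sin/cos pair.
    freqs = 1.0 / (10000 ** (np.arange(0, d_model, 2) / d_model))
    angles = positions * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

emb = sinusoidal_embeddings(n_positions=64, d_model=32)

# Each row has squared norm d_model / 2, since sin^2 + cos^2 = 1 per frequency.
cos_sim = (emb @ emb.T) / (emb.shape[1] / 2)

# Similarity between positions i and i + 3, for every valid i.
offset_3 = np.array([cos_sim[i, i + 3] for i in range(60)])

# The spread is at floating-point noise level: similarity depends on the offset only.
print(offset_3.max() - offset_3.min())
```

The dot product of positions i and j reduces to a sum of cos(w_k (i - j)) over the frequencies w_k, so it is a function of i - j alone. This illustrates why such a geometry is useful, not why training finds it.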

jake_mendel 10mo 142

Yeah, this does seem like it's another good example of what I'm trying to gesture at. More generally, I think the embedding at layer 0 is a good place for thinking about the kind of structure that the superposition hypothesis is blind to. If the vocab size is smaller than the SAE dictionary size, an SAE is likely to get perfect reconstruction and $L_{0} = 1$ by just learning the vocab_size-many embeddings. But those embeddings aren't random! They have been carefully learned and contain lots of useful information. I think trying to explain the structure in...
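A minimal sketch of the degenerate solution described above, under stated assumptions: the sizes are hypothetical, `W_E` is a random stand-in for a learned embedding matrix, and the sparse codes are written down by hand as a token lookup rather than produced by a trained SAE encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen so that vocab_size < dict_size.
vocab_size, d_model, dict_size = 50, 16, 64

# Stand-in for the learned embedding matrix (random here, unlike a real model).
W_E = rng.normal(size=(vocab_size, d_model))

# Dictionary whose first vocab_size directions are the embeddings themselves.
decoder = np.zeros((dict_size, d_model))
decoder[:vocab_size] = W_E

# Layer-0 activations are just embedding rows for the observed tokens.
tokens = rng.integers(0, vocab_size, size=200)
acts = W_E[tokens]

# One-hot codes: each activation uses exactly one dictionary feature.
codes = np.zeros((len(tokens), dict_size))
codes[np.arange(len(tokens)), tokens] = 1.0

recon = codes @ decoder
l0 = (codes != 0).sum(axis=1)        # active features per activation
max_err = np.abs(recon - acts).max()  # worst-case reconstruction error

print(l0.max(), max_err)
```

Reconstruction is exact with one active feature per token, yet the dictionary says nothing about how the rows of `W_E` relate to one another, which is the structure the comment argues is left unexplained.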

Formalization as suspension of intuition

Eric Winsor 2y 98

I like this perspective! The idea of formalization as suspension of intuition reminds me of the story of the "Gruppenpest" in the development of quantum mechanics. The abstraction of groups (as well as representations and matrices) was seen by many as non-physical and unintuitive. But it turned out the resulting abstractions of gauge theories and symmetries were more fundamental objects than their predecessors.^[1]^[2]^[3]^[4]

It also reminds me of a view I've been told many times that mathematical formalization/modeling is the process of forgetting details abo...

Re-Examining LayerNorm

Eric Winsor 2y 40

Thanks for the catch!


All of Eric Winsor's Comments + Replies