I like this perspective! The idea of formalization as suspension of intuition reminds me of the story of the "Gruppenpest" in the development of quantum mechanics. The abstraction of groups (as well as representations and matrices) was seen by many as non-physical and unintuitive. But it turned out the resulting abstractions of gauge theories and symmetries were more fundamental objects than their predecessors.[1][2][3][4]
It also reminds me of a view I've been told many times that mathematical formalization/modeling is the process of forgetting details abo...
Thanks for the catch!
This reminded me of how GPT-2-small uses a cosine/sine spiral for its learned positional embeddings, and I don't think I've seen a mechanistic/dynamical explanation for this (just the post-hoc explanation that attention can use cosine similarity to encode distance in R^n, not an argument that it should happen this way).
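To make the post-hoc explanation concrete, here is a minimal sketch of an idealized spiral, not GPT-2-small's actual learned embeddings. The frequencies in `FREQS` and the helper names are made up for illustration. It shows only that when positions sit on (cos, sin) pairs, cosine similarity depends on the offset between two positions and not on where they are in the sequence.

```python
import math

# Assumed, illustrative frequencies; the real learned embeddings are not exactly this.
FREQS = [0.05, 0.2]

def spiral_embedding(pos, freqs=FREQS):
    """Hypothetical idealized positional embedding: one (cos, sin) pair per frequency."""
    vec = []
    for w in freqs:
        vec.extend([math.cos(w * pos), math.sin(w * pos)])
    return vec

def cosine_similarity(u, v):
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity(p, q):
    """Cosine similarity between the idealized embeddings of positions p and q."""
    return cosine_similarity(spiral_embedding(p), spiral_embedding(q))

# Same offset (7) at two different places in the sequence.
near_start = similarity(3, 10)
further_on = similarity(40, 47)
```

The two values agree because cos(wp)cos(wq) + sin(wp)sin(wq) = cos(w(p - q)) for each frequency, so the similarity is the average of cos(w * offset) over the frequencies. This says nothing about why training would find such a structure, which is the open question raised above.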
Yeah, this does seem like it's another good example of what I'm trying to gesture at. More generally, I think the embedding at layer 0 is a good place for thinking about the kind of structure that the superposition hypothesis is blind to. If the vocab size is smaller than the SAE dictionary size, an SAE is likely to get perfect reconstruction and L0=1 by just learning the vocab_size many embeddings. But those embeddings aren't random! They have been carefully learned and contain lots of useful information. I think trying to explain the structure in...
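A toy sketch of that degenerate solution, under loud assumptions: the "embeddings" here are random stand-ins (unlike a real model's learned ones), the sizes are tiny, and the encoder is an idealized nearest-row lookup, not a trained ReLU encoder. All names are hypothetical.

```python
import random

random.seed(0)
VOCAB_SIZE, D_MODEL = 8, 4  # assumed toy sizes

# Stand-in for a layer-0 embedding matrix (random here, purely for illustration).
embeddings = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(VOCAB_SIZE)]

# Degenerate dictionary: decoder row i is exactly the embedding of token i.
decoder = [row[:] for row in embeddings]

def encode(activation):
    """Idealized encoder: activate only the dictionary row closest to the input."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(activation, row))
    best = min(range(VOCAB_SIZE), key=lambda i: sq_dist(decoder[i]))
    code = [0.0] * VOCAB_SIZE
    code[best] = 1.0
    return code

def decode(code):
    """Linear decoder: weighted sum of dictionary rows."""
    return [sum(c * decoder[i][j] for i, c in enumerate(code)) for j in range(D_MODEL)]

def l0(code):
    """Number of active latents."""
    return sum(1 for c in code if c != 0.0)

def reconstruction_error(activation):
    """Largest absolute coordinate error after encode/decode."""
    recon = decode(encode(activation))
    return max(abs(r - x) for r, x in zip(recon, activation))

worst_error = max(reconstruction_error(e) for e in embeddings)
all_l0_one = all(l0(encode(e)) == 1 for e in embeddings)
```

Reconstruction is exact and every code has a single active latent, yet the dictionary is just a copy of the embedding table. The sparse code therefore carries no information about how the embeddings relate to each other.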