We would have loved to see more motivation for why you are making the assumptions you are making when generating the toy data.
Relatedly, it would be great to see an analysis of the distribution of the MLP activations. This could give you some info where your assumptions in the toy model fall short.
This is valid; the assumptions aren't well fleshed out above. I'll take a stab at it below, and I discussed it a bit with Ryan under his comment. Meta-question: are you primarily asking for better assumptions, or for the existing ones to be made more explicit?
RE MLP activations distrib...
Thanks!
This equation describes (almost) linear regression on a particular feature space:
This approximation isn't obvious to me. It holds if $ f(x, \theta_0) \approx 0 $ and $ \theta_0 \approx 0 $, but these aren't stated. Are they true?
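To spell out where I think those conditions come from (assuming the equation in question is the usual first-order expansion of $f$ around $\theta_0$):

$$
f(x, \theta) \;\approx\; f(x, \theta_0) + \nabla_\theta f(x, \theta_0)^\top (\theta - \theta_0)
\;=\; \underbrace{\left[ f(x, \theta_0) - \nabla_\theta f(x, \theta_0)^\top \theta_0 \right]}_{\text{offset}} \;+\; \nabla_\theta f(x, \theta_0)^\top \theta .
$$

The second term is linear regression on the features $\phi(x) = \nabla_\theta f(x, \theta_0)$; the offset term only disappears when $f(x, \theta_0) \approx 0$ and $\theta_0 \approx 0$, which is why I'm asking whether those hold.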
No, they exist in different spaces: Polytopes in our work are in activation space whereas in their work the polytopes are in the model weights (if I understand their work correctly).
Thanks for your interest in our post and your questions!
Correct me if I'm wrong, but it struck me while reading this that you can think of a neural network as learning two things at once…
That seems right!
Can the functions and classes be decoupled? … Could you come up with some other scheme for choosing between a whole bunch of different linear transformations?
It seems possible to come up with other schemes that do this; it just doesn’t seem easy to come up with something that is competitive with neural nets. If I recall correctly, there’s work in prev...
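As a purely hypothetical illustration of what "some other scheme" could look like (this is my sketch, not something from the post): a hard mixture of linear maps, where a small gating network explicitly picks which linear transformation to apply.

```python
import torch
import torch.nn as nn

class HardMixtureOfLinearMaps(nn.Module):
    """Hypothetical sketch: a gating network chooses one of K linear maps to apply,
    making the 'which linear transformation?' choice explicit instead of having it
    emerge from ReLU activation patterns."""

    def __init__(self, d_in: int, d_out: int, n_experts: int = 8):
        super().__init__()
        self.experts = nn.Parameter(torch.randn(n_experts, d_out, d_in) / d_in**0.5)
        self.gate = nn.Linear(d_in, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        idx = self.gate(x).argmax(dim=-1)        # one expert per input -> a hard partition of input space
        W = self.experts[idx]                    # [batch, d_out, d_in]
        return torch.einsum("boi,bi->bo", W, x)

out = HardMixtureOfLinearMaps(16, 8)(torch.randn(4, 16))
print(out.shape)  # torch.Size([4, 8])
```

The non-differentiable argmax is one obvious reason schemes like this struggle to be competitive with the implicit region-selection that ReLU networks get for free.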
Thanks for your comment!
However, I don't really see how you'd easily extend the polytope formulation to activation functions that aren't piecewise linear, like tanh or logits, while the functional analysis perspective can handle that pretty easily. Your functions just become smoother.
Extending the polytope lens to activation functions such as sigmoids, softmax, or GELU is the subject of a paper by Balestriero & Baraniuk (2018): https://arxiv.org/abs/1810.09274
In the case of GELU and some similar activation functions, you'd need to replace the bina...
For GPT2-small, we selected 6 of the 1024 tokens in each sequence (evenly spaced and excluding the first 100 tokens), and clustered over the entire MLP hidden dimension (4 × 768 = 3072).
For InceptionV1, we clustered the vectors corresponding to all the channel dimensions at a single fixed spatial position (i.e. one example of size [n_channels] per image).
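For concreteness, here's a minimal sketch of the GPT2-small setup (k-means, the cluster count, and the shapes below are placeholders for illustration, not our exact settings):

```python
import numpy as np
from sklearn.cluster import KMeans

seq_len, d_mlp = 1024, 4 * 768          # GPT2-small MLP hidden size = 3072
n_seqs = 200                            # placeholder number of sequences

# 6 evenly spaced token positions per sequence, excluding the first 100 tokens.
positions = np.linspace(100, seq_len - 1, num=6, dtype=int)

# In practice these would be MLP activations extracted at the selected positions;
# here they're stand-in random vectors of the right shape.
samples = np.random.randn(n_seqs * len(positions), d_mlp).astype(np.float32)

# Cluster the full 3072-dimensional hidden vectors.
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(samples)
print(np.bincount(kmeans.labels_))      # cluster sizes
```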
Thanks for your comment!
RE non-ReLU activation functions:
Extending the polytope lens to Swish or GELU activation functions is, fortunately, the subject of a paper by Balestriero & Baraniuk (2018): https://arxiv.org/abs/1810.09274
We wrote a few sentences about this at the end of Appendix C:
"In summary - smooth activation functions must be represented with a probabilistic spline code rather than a one-hot binary code. The corresponding affine transformation at the input point is then a linear interpolation of the entire set of affine transformations, weig...
This is one of the major research questions that will be important to answer before polytopes can be really useful in mechanistic descriptions.
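A minimal numerical illustration of the hard-vs-soft code distinction (just the pointwise gates, not the full spline formalism from the paper): for ReLU the output is $x$ times a binary gate, while for GELU it is $x \cdot \Phi(x)$, so the gate varies smoothly in $(0, 1)$ and interpolates between the affine pieces.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-3, 3, 7)

relu_gate = (x > 0).astype(float)   # binary code: each input is assigned exactly one affine piece
gelu_gate = norm.cdf(x)             # soft/probabilistic code: a weighting over the affine pieces

print(np.round(relu_gate, 2))       # -> [0. 0. 0. 0. 1. 1. 1.]
print(np.round(gelu_gate, 2))       # -> [0.   0.02 0.16 0.5  0.84 0.98 1.  ]
```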
By choosing clustering rather than dimensionality-reduction methods, we took a non-decompositional approach here. Clustering was motivated primarily by wanting to capture the monosemanticity of local regions in neural networks. But the 'monosemanticity' I'm talking about here refers to the fact that small regions of activation space mean one thing at one level of abstraction; this 'one thing' could be a combin...
Thanks for your interest!
Shouldn't this create strong regularisation favouring using meaningful directions over meaningful polytopes?
Yes, that seems reasonable!
One thing we want to emphasize is that it's perfectly possible to have both meaningful directions and meaningful polytopes. For instance, if all polytope boundaries intersect the origin, then all polytopes will be unbounded. In that case, polytopes will essentially be directions!
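To make that concrete: if every boundary hyperplane passes through the origin (i.e. all biases are zero), then for any input $x$ and any scale $\alpha > 0$,

$$
\operatorname{sign}\!\left(w_i^\top (\alpha x)\right) = \operatorname{sign}\!\left(w_i^\top x\right) \quad \text{for every boundary normal } w_i,
$$

so each polytope is a cone containing every positive multiple of its points, and the partition of activation space is determined entirely by directions.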
The polytope lens only becomes relevant when trying to explain what perfectly linear models can't account for. Although...
Very interesting to hear that you've been working on similar things! Excited to see results when they're ready.
RE synthetic data: I'm a bit less confident in this method of data generation after the feedback below (see Tom Lieberum's and Ryan Greenblatt's comments). It may lose some 'naturalness' compared with the way the encoder in the 'toy models of superposition' puts one-hot features in superposition. It's unclear whether that matters for the aims of this particular set of experiments, though.
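For reference, the kind of data generation I'm contrasting with is roughly the following (a simplified sketch of the 'toy models of superposition' setup, not their exact construction): sparse, individually one-hot-ish features get projected by an encoder into a lower-dimensional space, which forces their directions into superposition.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, d_hidden, sparsity = 20, 5, 0.9

# Sparse non-negative features: each feature is zero most of the time.
feats = rng.uniform(0, 1, size=(1000, n_features))
feats *= rng.uniform(0, 1, size=feats.shape) > sparsity

# An encoder (random here; learned in the original setup) maps n_features
# feature directions into only d_hidden dimensions, forcing superposition.
W = rng.normal(size=(n_features, d_hidden)) / np.sqrt(d_hidden)
hidden = feats @ W   # [1000, d_hidden] activations with features in superposition
```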
RE metrics: It's interesting to hear about your altern...