Applying SVD to neural nets in general is not a new idea. It's been used widely in the field (e.g. Saxe, Olah), but mostly in relation to some input data: you run SVD on the activations, or on an input-output correlation matrix. You generally need some data to compare against in order to understand what each vector of your factorization represents. What's interesting about this technique (imo - and this is mostly Beren's work, so not trying to toot my own horn here) is twofold:
- You don’t have to run your model over a whole eval
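To make the data-free point concrete, here's a minimal numpy sketch of the contrast: factoring a weight matrix needs no forward passes at all, and a singular direction can then be read off in token space via the unembedding. The matrices, shapes, and the projection step here are stand-ins for illustration, not the actual method's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a learned weight matrix W (d_model x d_model)
# and an unembedding matrix W_U (d_model x vocab).
d_model, vocab = 64, 1000
W = rng.normal(size=(d_model, d_model))
W_U = rng.normal(size=(d_model, vocab))

# The data-free step: factor the weights directly.
# No activations, no eval set, just the learned parameters.
U, S, Vt = np.linalg.svd(W)

# To interpret a singular direction, project it into token space
# through the unembedding and inspect the top-scoring entries.
direction = U[:, 0]             # left singular vector of the top component
logits = direction @ W_U        # scores over the vocabulary
top_tokens = np.argsort(-logits)[:10]
print(top_tokens)
```

Compare this with the activation-based approach, which would first require collecting activations by running the model over a dataset before any factorization can happen.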