Thanks Logan,

1) About re-initialization:

I think your idea of re-initializing dead features of the sparse dictionary with the input data the model struggles to reconstruct could work. It seems like a great idea! This probably implies extracting rare feature vectors out of such datapoints before using them for initialization.

I intuitively suspect that the datapoints the model is bad at predicting contain rare features, and potentially rare features common to many of those datapoints. Therefore I would bet on performing some rare feature extraction out of batches of poorly reconstructed …
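Concretely, here is a minimal sketch of what that re-initialization step could look like (PyTorch; `sae`, `W_enc`, `W_dec`, and `b_enc` are hypothetical names, and in practice you'd track dead features over many batches rather than one):

```python
import torch

@torch.no_grad()
def reinit_dead_features(sae, acts):
    """Re-initialize dead dictionary features from poorly reconstructed inputs.

    sae  : sparse autoencoder with W_enc (d_in, d_hidden), W_dec (d_hidden, d_in)
           and b_enc (d_hidden,) -- hypothetical attribute names.
    acts : (n_samples, d_in) batch of model activations.
    """
    # 1. Find dead features: hidden units that never fire on this batch
    #    (a real run would accumulate firing counts over many batches).
    hidden = torch.relu(acts @ sae.W_enc + sae.b_enc)            # (n, d_hidden)
    dead = (hidden.max(dim=0).values == 0).nonzero().squeeze(-1)

    # 2. Rank datapoints by reconstruction error, worst first.
    recon = hidden @ sae.W_dec                                   # (n, d_in)
    errors = (acts - recon).pow(2).sum(dim=-1)
    worst = errors.argsort(descending=True)[: len(dead)]
    dead = dead[: len(worst)]  # in case the batch is smaller than the dead set

    # 3. Point each dead feature at the *residual* of one badly reconstructed
    #    datapoint, so it targets exactly what the dictionary currently misses.
    residuals = (acts - recon)[worst]
    directions = residuals / residuals.norm(dim=-1, keepdim=True)
    sae.W_dec[dead] = directions
    sae.W_enc[:, dead] = directions.T
    sae.b_enc[dead] = 0.0
```

Using the residual rather than the raw datapoint is one way to bias the new feature toward the rare component instead of the common features the dictionary already captures.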
Oh no, my idea was to use the top-k worst-reconstructed datapoints when re-initializing (or alternatively, the datapoints with the worst perplexity when run through the full model). Since we'll likely be re-initializing many dead features at a time, this might pick up on the same feature multiple times.
Would you cluster & then sample uniformly from the worst-k-reconstructed clusters?
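Something like this, perhaps (a sketch only; k-means is just one easy clustering choice, and all the names here are hypothetical):

```python
import torch
from sklearn.cluster import KMeans

@torch.no_grad()
def sample_reinit_directions(acts, recon, n_dead, n_clusters=32, top_k=4096):
    """Cluster the worst-k reconstructed datapoints, then sample uniformly
    across clusters so one over-represented feature isn't picked many times.

    acts, recon : (n_samples, d_in) inputs and their SAE reconstructions.
    Returns     : (n_dead, d_in) unit-norm residual directions for re-init.
    """
    errors = (acts - recon).pow(2).sum(dim=-1)
    worst = errors.argsort(descending=True)[:top_k]
    residuals = acts[worst] - recon[worst]

    # Cluster the residual directions of the worst-k datapoints.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(
        residuals.cpu().numpy()
    )
    labels = torch.as_tensor(labels)

    # Sample a cluster uniformly, then one datapoint uniformly within it.
    picks = []
    for c in torch.randint(n_clusters, (n_dead,)):
        members = (labels == c).nonzero().squeeze(-1)
        picks.append(members[torch.randint(len(members), (1,))])
    directions = residuals[torch.cat(picks)]
    return directions / directions.norm(dim=-1, keepdim=True)
```

Sampling clusters uniformly (rather than datapoints) is what prevents a single common failure mode from claiming most of the re-initialized features.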
2) On not being compute-bottlenecked: I do assign decent probability that we will eventually be compute-bottlenecked; my point is that the bottleneck I currently see is the number of people working on this. For me personally, that means focusing on flashy, fun applications of sparse autoencoders.
[As a relative measure, we're not compute-bottlenecked enough to learn dictionaries in the smaller Pythia models.]