Finding Sparse Linear Connections between Features in LLMs
TL;DR: We use SGD to find sparse connections between features. Additionally, a large fraction of features between the residual stream and the MLP can be modeled as linearly computed, despite the non-linearity in the MLP. See the linear feature section for examples. Special thanks to fellow AISST member Adam Kaufman, who originally...