Summary by OpenAI: We use GPT-4 to automatically write explanations for the behavior of neurons in large language models and to score those explanations. We release a dataset of these (imperfect) explanations and scores for every neuron in GPT-2.
Link: https://openai.com/research/language-models-can-explain-neurons-in-language-models
Please share your thoughts in the comments!
This seems to assume the task (writing explanations that score above 0.5 for every neuron) is achievable at all, which is doubtful. Superposition and polysemanticity are well-documented phenomena: when a single neuron responds to many unrelated features, no short natural-language explanation may exist for it.