Summary by OpenAI: We use GPT-4 to automatically write explanations for the behavior of neurons in large language models and to score those explanations. We release a dataset of these (imperfect) explanations and scores for every neuron in GPT-2.
Link: https://openai.com/research/language-models-can-explain-neurons-in-language-models
Please share your thoughts in the comments!
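For readers who want a feel for the method before clicking through: the loop described in the paper is roughly that GPT-4 writes a short explanation of a neuron from (token, activation) examples, a second GPT-4 pass simulates the neuron's activations from that explanation alone, and the explanation is scored by how well the simulated activations track the real ones. The sketch below only illustrates that scoring step; `simulate` is a hypothetical stand-in for the simulator call, and plain Pearson correlation is a simplification of the paper's scoring.

```python
import numpy as np

def score_explanation(explanation: str, tokens: list[str],
                      real_activations, simulate) -> float:
    """Score an explanation by how well a simulator reproduces the
    neuron's real activations from the explanation alone.

    `simulate(explanation, tokens)` is a hypothetical helper standing in
    for a GPT-4 simulator call; it should return one predicted activation
    per token.
    """
    simulated = np.asarray(simulate(explanation, tokens), dtype=float)
    real = np.asarray(real_activations, dtype=float)
    # Guard against constant sequences, where correlation is undefined.
    if simulated.std() == 0 or real.std() == 0:
        return 0.0
    # Pearson correlation between simulated and real activations.
    return float(np.corrcoef(simulated, real)[0, 1])
```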
This feels reminiscent of:
And while it's a well-constructed, pithy quote, I don't think it's true. Can a system understand itself? Can a quining computer program exist? Where is the line between being able to recite itself and being able to understand itself?
Agreed. A quine needs some minimum complexity and/or language/environment support, but once you have one it's usually easy to expand it (a minimal example is sketched below). Things could go either way, and the question is an interesting one that needs investigation, not bare assertion.
And the answer might depend fairly strongly on whether you take steps to make the model interpretable or leave it a spaghetti-code Turing-tar-pit mess.
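On the quine point: self-reproducing programs do exist in essentially every general-purpose language. Here is a standard two-line Python quine (a textbook construction, not from the thread); it is deliberately comment-free, since any comment would also have to appear in its own output. Whether reproducing your own source bears at all on understanding yourself is exactly the open question raised above.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Running it prints the two source lines exactly: `%r` inserts the `repr` of the string into itself, and `%%` escapes the literal percent sign needed for the second line.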