Summary by OpenAI: We use GPT-4 to automatically write explanations for the behavior of neurons in large language models and to score those explanations. We release a dataset of these (imperfect) explanations and scores for every neuron in GPT-2.
Link: https://openai.com/research/language-models-can-explain-neurons-in-language-models
Please share your thoughts in the comments!
From my point of view, I could say that the opposite is rather a "non-trivial claim in need of support". My (not particularly motivated) intuition is that a larger, smarter mind employs more sophisticated cognitive algorithms, and so analyzing its workings requires proportionally more intelligence.
Example: in my experience, if I argue with someone who is less intelligent and less used to debate than I am, they are likely to perceive what I say in pieces instead of looking at the whole reasoning tree, and it is very difficult to get them to understand the "big picture". For example, if I say "A then B", they might understand "A and B", or "A or B", or "A", or "B". In the domain of argument, they are not able to understand how I put all the pieces together by looking at the pieces in isolation, and it is difficult for them to even contemplate the rules I use.
What are the intuitions that you use to feel the default case is the other way around?
Ok, I think I lacked clarity. I did not mean that doing this particular research bit was not safe. I meant that the kind of paradigm that I see here, as I extrapolate it, is not safe.