Barriers to Mechanistic Interpretability for AGI Safety — LessWrong