AGI-Automated Interpretability is Suicide
Backstory: I wrote this post about 12d ago, then a user here pointed out that this could be capability exfohazard since it could give the bad idea of having the AGI look at itself, so I took it down. Well I don’t have to worry about that anymore since now...
May 10, 202325