This is a linkpost for https://youtu.be/bpp6Dz8N2zY?si=RC20soJLynXxNOfv
This is perhaps the best interpretability work I've seen outside of Chris Olah's team.
Paper link: https://arxiv.org/abs/2407.20311
(I have neither watched the video nor read the paper yet; I'm posting the link in case someone else is looking for the non-video version.)