I greatly appreciated the time invested in coding the interactive demos, they help clarifying the insights into the underlying concepts – it reminds me of Colah's posts.
Questions:
Are you going to release tools for the interpretation of other models?
How might one visualize other modalities? like audio or web actions?
Have you considered developing a generalized interpretability framework that could scale these techniques across different architectures and modalities? A unified "interpretability platform" could help broaden access and grow a dedicated community around your work.
I greatly appreciated the time invested in coding the interactive demos, they help clarifying the insights into the underlying concepts – it reminds me of Colah's posts.
Questions: