Different attribution methods can be placed on a scale, with the X-axis reflecting ground truth (at least for image-interpretation tasks, i.e. how humans process the information) and the Y-axis reflecting how faithfully they capture how the model processes information in its own way. Attribution methods can highlight most of the truth, but they do not necessarily reflect accurately what the model has learned. The attribution method is a representation of the model, and the model is a representation of the data. Different levels of accuracy imply different levels of uncertainty ...
I have been wondering whether neural networks (or more specifically, transformers) will become the ultimate form of AGI. If not, will the existing research on interpretability become obsolete?
Hey Neel,
Great post!
I am trying to look into the code here, but the links don't work anymore! It would be great if you could update them!
I don't know if this link works for the original content: https://colab.research.google.com/github/neelnanda-io/Easy-Transformer/blob...
So far the best summary I have seen!