A Mathematical Framework for Transformer Circuits is, in my opinion, the coolest paper I've ever had the privilege of working on. But it's also very long, dense, and at times confusing, and this makes me sad! So I've run an experiment: I recorded myself reading through the paper and narrating a stream of consciousness as I went - which bits are particularly cool but under-appreciated, which bits are a bit of a waste of time, which bits I think do or do not replicate, attempts to explain the parts I find particularly confusing, etc. You can watch it here. Sadly, it turns out I have a lot to say about Transformer Circuits, and this turned into a 3 hour monologue, but I hope it's still useful! This is an experimental format for me for research communication, and I'd love to hear feedback on how well it works for you! It was much easier to make than writing an entire paper, but could easily be a total waste of time if it's not clear enough to be useful!
Disclaimer: The views in this video are entirely my personal takes - the paper was a team effort from everyone at Anthropic, especially Chris Olah, Nelson Elhage and Catherine Olsson, and I am no longer employed by Anthropic. I do not necessarily expect that any of the other authors would agree with any specific thing that I've said, but hope an unfiltered series of takes is useful!
I went through the paper for a reading group the other day, and I think the video really helped me understand what is going on in the paper. The parts I found most useful were the indications of which parts of the paper / maths were most important to understand, and which were not (tensor products).
I had made some effort to read the paper before with little success, but I now feel like I understand its overall results pretty well. I'm very positive about this video, and about similar things being made in the future!
Personal context: I also found the intro to IB video series similarly useful. I'm an AI masters student with some pre-existing knowledge of AI alignment, and I have a maths background.
Understanding Infra-Bayesianism :))