New experiment: Recording myself real-time as I do mechanistic interpretability research! I try to answer the question of what happens if you train a toy transformer without positional embeddings on the task of "predict the previous token" - turns out that a two-layer model can rederive them! You can watch me do it here, and you can follow along with my code here. This uses a transformer mechanistic interpretability library I'm writing called EasyTransformer, and this was a good excuse to test it out and create a demo!
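If you want a feel for the setup without watching, here's a minimal sketch of the experiment in plain PyTorch. To be clear, this is not the EasyTransformer code from the video - the model class, the use of nn.TransformerEncoder, and all the hyperparameters below are my own illustrative choices. The point is just the task: a small causal transformer with token embeddings but no positional embeddings, trained so that the output at position i is the token at position i - 1.

```python
# Minimal sketch (NOT the EasyTransformer code from the video): a two-layer
# causal transformer with no positional embeddings, trained to predict the
# previous token. All sizes and hyperparameters are illustrative.

import torch
import torch.nn as nn

D_VOCAB, D_MODEL, N_HEADS, N_LAYERS, N_CTX = 64, 128, 4, 2, 32

class NoPosTransformer(nn.Module):
    """Causal transformer with token embeddings only - no positional embeddings."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(D_VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(
            D_MODEL, N_HEADS, dim_feedforward=4 * D_MODEL, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, N_LAYERS)
        self.unembed = nn.Linear(D_MODEL, D_VOCAB)

    def forward(self, tokens):
        seq_len = tokens.shape[1]
        # Causal mask: position i may only attend to positions <= i.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf")), diagonal=1
        )
        x = self.embed(tokens)  # deliberately NOT adding positional embeddings
        x = self.blocks(x, mask=mask)
        return self.unembed(x)

model = NoPosTransformer()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(1000):
    tokens = torch.randint(0, D_VOCAB, (32, N_CTX))  # random token sequences
    logits = model(tokens)
    # Target at position i is the token at position i - 1, so the model must
    # work out where "one position back" is despite receiving no positional
    # information in its inputs.
    loss = loss_fn(
        logits[:, 1:].reshape(-1, D_VOCAB), tokens[:, :-1].reshape(-1)
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
```

For reference, uniform guessing over a 64-token vocabulary gives a loss of ln(64) ≈ 4.16 nats, so loss falling well below that is a sign the model has found some way to represent position despite never being given it.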
The recording is an experiment in publishing myself doing "warts and all" research - figuring out how to train the model and operationalising an experiment (including 15 mins debugging loss spikes...), real-time coding and tensor fuckery, and using my go-to toolkit. My hope is to give a flavour of what actual research can look like - how long things actually take, how often things go wrong, what my thought process is and what I'm keeping in my head as I go, what being confused looks like, and how I try to make progress. I'd love to hear whether you found this useful, and whether I should bother making a second half!
Though I don't want to overstate this - it was still a small, self-contained toy question, chosen because it made a good example task to record (and I wouldn't have published it if it was TOO much of a mess).
Thanks! Yeah, I hadn't seen that, but someone pointed it out on Twitter. Feels like fun complementary work.