Great post, thanks!
When creating material, you don't want to be starting from scratch. It's useful to have source code available to browse - bonus points if that takes the form of a Colab or something which is self-contained and has easily visible output.
FYI I put together a list of such Colab starting points early this year which may be useful to some readers.
Oh yeah this is great, thanks! For people reading this, I'll highlight SLT + developmental interp + mamba as areas which I think are large enough to have specific exercise sections but currently don't
TL;DR
In this post, I describe my methodology for building new material for ARENA. I'll mostly be referring to the exercises on IOI, Superposition and Function Vectors as case studies. I expect this to be useful for people who are interested in designing material for ARENA or ARENA-like courses, as well as people who are interested in pedagogy or ML paper replications.
The process has 3 steps:
Summary
I'm mostly basing this on the following 3 sets of exercises:
The steps I go through are listed below. I'm indexing from zero because I'm a software engineer so of course I am. The steps assume you already have an idea of what exercises you want to create; in Appendix (1) you can read some thoughts on what makes for a good exercise set.
1. Start with something concrete
When creating material, you don't want to be starting from scratch. It's useful to have source code available to browse - bonus points if that takes the form of a Colab or something which is self-contained and has easily visible output.
2. First-pass: replicate, and understand
The first thing I'd done in each of these cases was go through the material I started with, and make sure I understood what was going on. Paper replication is a deep enough topic for its own series of blog posts (many already exist), although I'll emphasise that I'm not usually talking about full paper replication here, because ideally you'll be starting from something a it further along, be that a Colab, a different tutorial, or something else. And even when you are just working directly from a paper, you shouldn't make the replication any harder for yourself than you need to. If there's code you can take from somewhere else, then do.
My replication usually takes the form of working through a notebook in VSCode. I'll either start from scratch, or from a downloaded Colab if I'm using one as a reference. This notebook will eventually become the exercises. My replication will include a lot of markdown cells explaining what's going on, between the code cells. I usually frame these as explanations to myself, in other words if I don't understand something then I'll figure it out and write it as an explanation to myself. Mostly it's fine if these are written in shorthand; they'll go through a lot of polishing in subsequent steps. For example, here's a cell from an early version of the function vectors exercises, compared to what it ended up turning into:
When it comes to actually writing code, I usually like everything to be packaged in easy-to-run functions. Ideally each cell of code that I run should only have ~1-4 lines of actual code outside of functions (although this isn't a strict rule). I try to keep my functions excessively annotated - this includes type indications, docstrings, and a large number of annotations along with plenty of space between lines. This will be helpful for the final exercises because students will need to understand what the code does when they look at the answers, but it's also helpful in the exercise-writing process because it helps me take a step back and distill the high-level things that are going on in each chunk of code. This helps me pull out modular chunks of code to turn into exercises, to make sure that students aren't being asked to do too many things at once (more discussion of this in step 2). Here's an example of the kind of documentation I usually have:
While I'm doing this replication, I'm usually thinking about how to construct exercises at the same time. It helps that the position students will be in while going through the exercises isn't totally different to the position I'm in while writing the functions in the first place. I'll save the discussion of exercise-ification for the next section, however do bear in mind that I'm doing a lot of the exercise structuring as I go rather than all at once after the replication is complete.
One last note - your mileage may vary on this, because it's more of me sharing a productivity tip which helped me - with all 3 of these case studies, this first-pass replication was done (at least in an 80/20 sense) over one very intense weekend where I focused on nothing else other than the replication. I find that framing these as exciting hackathon-type events is a pretty effective motivational tool (even though having them be an actual hackathon with multiple people would probably amplify this benefit).
3. Second-pass: exercise-ify
Once I've replicated the core results, I'll go back through the notebook and convert it into exercises. As I alluded to, some of this will already have been done in the process of the previous step, for example in notes to myself or in the docstrings I've given to functions. Here's an example, taken from an early draft of the function vector exercises:
With that said, in this section I'll still write as if it's a fully self-contained step.
When I go through my notebook in this stage, I'm trying to put myself in the mind of a student who is going through these exercises. As I've mentioned, it's helpful that the perspective of a student going through the exercises isn't totally different to the perspective I had while doing the initial replication. So often I'll be able to take a question about the exercises, and answer it by first translating it into a question about my own experience doing the replication. Some examples:
Here are some concrete examples of what this looked like, for each of the 3 exercise sets.
I'll conclude this section with a bit of an unstructured brain dump. There's probably a lot more that I'm forgetting and I might end up editing this post, but hopefully this covers a good amount!
Call to action for ARENA exercises
The development of more ARENA chapters is underway! We'd love for you to contribute to the design of ARENA curriculums and suggest content you'd want to see in ARENA using this Content Suggestion Form. If you want to be actively involved as a contributor, you can reach out via this Collaborator Interest Form or email Chloe Li at chloeli561@gmail.com.
Appendix
A1 - what makes for good exercise sets?
Should be a currently active area of research - at least, if we're talking about things like the the interpretability section rather than e.g. some sections of RL or the first chapter which are purely meant to establish theory and lay groundwork. For example, although the ideas behind causal tracing have been influential, it's not currently a particularly active area of research, which is one of the reasons I chose to not make exercises on it.[1]
A2 - what tools do I use?
<img>
element. I would not be surprised if there was a better method than this.Acknowledgements
Note - although I've written or synthesized a lot of the ARENA material, I don't want to give the impression that I created all of it, since so much of it existed before I started adding to it. I've focused on examples where I wrote most of the core exercises, but I'd also like to thank the following people who have also made invaluable contributions to the ARENA material, either directly or indirectly:
There were other reasons I chose not to, e.g. it didn't seem very satisfying for students to implement because it's high on rigour and careful execution of an exact algorithm, and doesn't really contain that many unique ideas. Also, you'd have to be doing causal tracing on some model, the obvious choice would be the bracket classifier model because Redwood has already done work on it, but their published work on it is very long and contingent on specific features of the bracket classifier model rather than generalizable ideas.