A few relevant comments for anybody trying some of the workshops...
The applied linear algebra lecture series covers some material directly relevant to the parts people found difficult in Experiment Week. Lectures 2 and 3 are particularly relevant. (Unfortunately I didn't record those lectures in time for this MATS cohort, but their confusions did inform the lecture content.)
As Rohan noticed, a lot of the exercises probably needed more time/attention than I gave for people to figure things out. Also, different workshops will connect for different people, mostly depending on what background skills/knowledge people already have and how much they've already thought about alignment. Unfortunately, if you just read through the list, there are exercises which you will probably not expect to be very relevant to you but which would in fact be high-value if you did them, so it's hard to avoid just trying them all and seeing what works.
Other than additional time/attention and some expected variation in the extent to which different workshops connect for different people, I think the conjecture workshop was the only workshop where I qualitatively messed up the implementation. I'd previously run conjecture workshops with differently-selected people, and it turns out the things they needed were very different from the things this cohort needed. In particular, I should have put much more emphasis on the fact that a conjecture usually needs two sets of properties - one set of properties is assumed, and then the other properties are derived from those. Lots of people in this cohort ended up coming up with an operationalization of some intuitive concept, but never got around to conjecturing what properties were implied by that operationalization; they didn't have an actual claim.
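As a rough schematic of the "two sets of properties" point (the symbols $A_i$, $B_j$, and $X$ are placeholders introduced here for illustration, not notation from the workshop), a conjecture has roughly this shape:

```latex
% Schematic only: A_1..A_n are the assumed properties (the operationalization),
% B_1..B_m are the properties conjectured to follow from them.
\[
  \forall X:\quad
  \underbrace{A_1(X) \wedge \dots \wedge A_n(X)}_{\text{assumed}}
  \;\Longrightarrow\;
  \underbrace{B_1(X) \wedge \dots \wedge B_m(X)}_{\text{derived}}
\]
```

An operationalization on its own only supplies the left-hand side; without some candidate $B_j$ on the right, there is nothing yet to prove or refute.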
In addition to the workshops, two of the weeks had optional bonus exercises, which I expect are high-expected-value but didn't really fit in the schedule. Experiment week:
Optional Bonus Exercise for this week: go through the code for both your MNIST classifier, and the Hessian/behavioral gradient eigenstuff calculation. First, without running the code, say what the shape is of each variable (e.g. scalar, vector of length 40k, 100 by 1000 matrix, etc), then run the code and check the actual shapes match what you expected. Second, again without running the code, do a Fermi estimate of the runtime of each part of the code, then run the code and check how close your Fermi estimates were. (For the Fermi estimates, assuming you're not running on a GPU, a reasonable estimate for your CPU's speed is 1-10 billion operations per second, and you should aim to get your estimate within a factor of 10.)
If you had trouble following what was going on in any of the coding during today's exercise, then the bonus exercise will probably help; shapes and runtime fermi estimates are things which I usually track in my head when writing numerical code. It's also a relatively fun exercise, since the feedback loop is very tight.
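For anyone who wants a feel for the bonus exercise before trying it on their own code, here is a minimal sketch in Python with NumPy. It is not the workshop's code: the two-layer network, the layer sizes, and all variable names are assumptions picked for illustration. The idea is to write down the predicted shape and a Fermi runtime estimate first, then run the code and compare.

```python
import time
import numpy as np

# Hypothetical sizes, chosen only for illustration (not the workshop's model).
batch, n_in, n_hidden, n_out = 1000, 784, 50, 10

rng = np.random.default_rng(0)
x = rng.standard_normal((batch, n_in))       # predicted shape: (1000, 784)
W1 = rng.standard_normal((n_in, n_hidden))   # predicted shape: (784, 50)
W2 = rng.standard_normal((n_hidden, n_out))  # predicted shape: (50, 10)

# Predicted parameter count: 784*50 + 50*10 = 39,700 (a "vector of length ~40k").
n_params = W1.size + W2.size

def forward(x, W1, W2):
    """Two-layer network without biases; returns logits."""
    h = np.tanh(x @ W1)  # predicted shape: (1000, 50)
    return h @ W2        # predicted shape: (1000, 10)

# Fermi estimate, made BEFORE running: a matmul of (m, k) by (k, n)
# costs about 2*m*k*n operations (one multiply and one add per term).
flops = 2 * batch * (n_in * n_hidden + n_hidden * n_out)
est_fast = flops / 1e10  # seconds at 10 billion operations per second
est_slow = flops / 1e9   # seconds at 1 billion operations per second

start = time.perf_counter()
logits = forward(x, W1, W2)
elapsed = time.perf_counter() - start

# Check the shape prediction, then compare the timing against the estimate.
assert logits.shape == (batch, n_out)
print(f"params: {n_params}, flops: {flops:.1e}")
print(f"estimated {est_fast:.4f}s to {est_slow:.4f}s, measured {elapsed:.4f}s")

# The same habit applies to the Hessian: it would be n_params by n_params,
# so estimate its memory footprint before trying to allocate it.
hessian_shape = (n_params, n_params)
hessian_bytes = n_params**2 * 8  # float64 entries
print(f"Hessian shape {hessian_shape}, about {hessian_bytes / 1e9:.1f} GB")
```

The measured time will depend on the machine and on how NumPy was built, so being off by a small factor is expected; the target in the exercise is only to land within a factor of 10.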
... and writing week:
Optional Bonus Exercise for this coming week: look up either Shannon's paper introducing information theory, Turing's paper on morphogenesis, or any of Einstein's four annus mirabilis papers. Read through it, paying attention mainly to writing style/techniques. These were all highly influential papers on complicated technical topics which nonetheless had a lot of reach. How did the author make things understandable? How does the style differ from e.g. a typical paper today? What takeaways could you incorporate into your own writing, to write more like Shannon/Turing/Einstein?
I tried the Shannon/Turing/Einstein writing style exercise in the Distillation for Alignment Practicum and didn't find it very useful. The Einstein paper I read seemed reasonably good at communicating its ideas, but I didn't find many useful techniques besides obvious things like "describe one idea per paragraph" and "define the symbols in your equations."
I bet there are some better papers for learning communication techniques? Maybe from "What is the best scientific paper you have read?", "Any fun, easy to read scientific papers you’d suggest?", or "Lists of important publications in science". (The first link has a lot of Shannon/Turing/Einstein fans, so maybe I'm crazy.)
Another thought I have is that scientific papers may be fundamentally worse for communicating ideas than other media like textbooks, videos, or more casual writing.
Thanks for putting this together. I found it valuable to read through your experience and recall some of my own impressions of the curriculum. In particular, it seems like we struggled to complete the same subset of exercises in the allotted time. Hopefully, this will be incorporated in future runs of the workshop.
Introduction
I participated in the training program for John Wentworth’s stream of SERI MATS, and overall, it was a very good experience! This post is meant to convey the content of the workshops, so that others can carry out the exercises on their own if they would like, along with my thoughts and takeaways from them. Most of these are in the day-by-day breakdown below, but here is a high-level summary of my experience:
Day-by-Day Breakdown
Participants in the training program met 4 days each week on Gathertown, usually for about 1 hour per workshop (but sometimes up to about 2), for 6 weeks. Below is a day-by-day breakdown of what the workshops involved, with comments about my experiences with them. Naturally, not everyone’s experiences were the same as mine, so take what I say with a grain of salt. Many of the exercises can be carried out individually or in small groups, so some readers of this post may want to try them out! I’m happy to try to provide more details if any of the workshop descriptions are confusing.
(Italicized text describes the content of the workshop, usually as John gave us the instructions at the start; regular text describes my thoughts and takeaways.)
Week 1 - Intro Week
Week 1, Day 1 - Intros and Alignment Disagreements
Week 1, Day 2 - Alignment Game Tree
Week 1, Day 3 - Hypercomputer Exercise
Suppose you have a hypercomputer on a USB stick. You send it a program in your favorite programming language, it sends you back the output of that program within microseconds, no matter how many steps the program runs for. (Note that I/O is still normal speed.)
Assuming access to this magical hypercomputer, write a program which… [pick one]
Notes on the exercise:
I thought this was an ok exercise at my current stage. I didn’t really have a good sense of what a hypercomputer could help with, so it wasn’t a great generator of new ideas for me. I felt like I couldn’t come up with any concrete ideas for how to start writing these programs, so I couldn’t even run into bottlenecks that arise from realistic computer constraints. If I had been able to do that, I can imagine the exercise would have been useful for getting past those bottlenecks and finding other interesting barriers. I’m not sure how to get to that stage though.
Week 1, Day 4 - Ball and Cup Exercise
Week 2 - Experiment Week
Week 2, Day 1 - Basin Broadness Discussion and Experiment Prep
Week 2, Day 2 - Main Experiment
Week 2, Day 3 - You Are Not Measuring What You Think You Are Measuring
Week 2, Day 4 - Interpreting Results
Week 3 - Writing Week
Week 3, Day 1 - Prototypical examples
Week 3, Day 2 - Distillation Workshop
Week 3, Day 3 - Reader’s Mental Picture
Week 3, Day 4 - Hook Workshop
Week 4 - Theory Week
Week 4, Day 1 - Boundaries Exercises
Before the meeting:
Task 1: In a group of 2-3, come up with a mathematical formulation for what a boundary is (we may have focused on the boundaries of agents in particular).
Task 2: Pick one of the groups’ definitions for everyone to work with now. Make that definition more precise, or find flat-out mistakes in it.
Task 3: Come up with a mathematical notion of power / control.
One potential use of the concept of boundaries is something like, “If boundaries can be formalized, maybe we could make safe AI by having it respect boundaries, and we can set those boundaries to prevent the AI from doing harm to things we care about.” “Boxed AI” is AI that only tries to interact with the world through specific I/O channels, and doesn’t try to acquire the ability to interact via other channels. What is the relationship between Boxed AI and the definition of boundaries you came up with? What does your definition of boundaries tell you about how to develop boxed AI?
What will go wrong if you rely heavily on this definition of boundaries?
Week 4, Day 2 - Framing & Active Ducking Exercises
Week 4, Day 3 - Prototypical Example Degrees of Freedom & Type Signatures
Week 4, Day 4 - Conjecture Workshop
Week 5 - Big Picture & Strategy Week
Week 5, Day 1 - Alignment Game Tree II: Problem Tree + slackness/tautness
Week 5, Day 2 - Existing Evidence Exercises
Week 5, Day 3 - X-o-scope
Week 5, Day 4 - Nate’s Giant Text File Technique
Week 6 - Wrap-up
Week 6, Day 1 - Hamming Questions
Week 6, Day 2 - Groups and Plans
Week 6, Day 3 - Idea Feedback
Week 6, Day 4 - Ask Me Anything