Generalization, from thermodynamics to statistical physics (Hoogland 2023) — A review of and introduction to generalization theory, from classical results to contemporary approaches.
Timaeus was announced in late October 2023, with the mission of making fundamental breakthroughs in technical AI alignment using deep ideas from mathematics and the sciences. This is our first progress update.
In service of the mission, our first priority has been to support and contribute to ongoing work in Singular Learning Theory (SLT) and developmental interpretability, with the aim of laying theoretical and empirical foundations for a science of deep learning and neural network interpretability.
Our main uncertainties in this research were whether SLT is useful in deep learning and whether structure in neural networks forms in phase transitions.
The research Timaeus has conducted over the past four months, in collaboration with Daniel Murfet's group at the University of Melbourne and several independent AI safety researchers, has significantly reduced these uncertainties, as we explain below. As a result, we are now substantially more confident in the research agenda.
While we view this fundamental work in deep learning science and interpretability as critical, the mission of Timaeus is to make fundamental contributions to alignment, and these investments in basic science are to be judged relative to that end goal. We are impatient to make direct contact between these ideas and central problems in alignment. This impatience has resulted in the research directions outlined at the end of this post.
Contributions
Timaeus conducts two main activities: (1) research, that is, developing new tools for interpretability, mechanistic anomaly detection, etc., and (2) outreach, that is, introducing and advocating for these techniques to other researchers and organizations.
Research Contributions
What we learned
Regarding the question of whether SLT is useful in deep learning, we have learned that:
Regarding whether structure in neural networks forms in phase transitions, we have learned that:
Though we do not predict that all structure forms in SLT-defined developmental stages, we now expect that enough important structure forms in discrete stages for developmental interpretability to be a viable approach to interpretability.
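For readers new to the theory, the quantity behind "SLT-defined developmental stages" is the (local) learning coefficient λ. The following is a rough sketch of the standard formulas, following Watanabe's free-energy asymptotics and the localized estimator of Lau et al. (2023); the presentation and notation here are ours:

```latex
% Watanabe: for n samples the Bayesian free energy expands as below,
% where the learning coefficient \lambda plays the role that d/2
% (half the parameter count) plays in regular models.
F_n = n L_n(w^*) + \lambda \log n + O(\log\log n)

% Localized estimator (Lau et al., 2023): sample from a posterior
% concentrated near the trained weights w^*, at inverse temperature
% \beta (e.g. \beta = 1/\log n), and compare the expected loss to the
% loss at w^* itself:
\hat{\lambda}(w^*) = n\beta \left( \mathbb{E}^{\beta}_{w \mid w^*}\!\left[L_n(w)\right] - L_n(w^*) \right)
```

Developmental stages are then an empirical matter: plateaus of λ̂ across training checkpoints suggest stable stages, and jumps between plateaus suggest transitions where new structure forms.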
Papers
The aforementioned work has been undertaken by members of the Timaeus core team (Jesse Hoogland), Timaeus RAs (George Wang and Zach Furman), a group at the University of Melbourne, and independent AI safety researchers:
Note that the first two papers above were mostly/entirely completed before Timaeus was launched in October.
Outreach Contributions
Posts
Posts authored by Timaeus core team members Jesse Hoogland and Stan van Wingerden:
We've developed other resources such as a list of learning materials for SLT and a list of open problems in DevInterp & SLT. Our collaborators distilled Chen et al. (2023) in Growth and Form in a Toy Model of Superposition. Also of interest (but not written or supported by Timaeus) is Joar Skalse’s criticism of Singular Learning Theory with extensive discussion in the comments.
Code
We published a devinterp repo and Python package for using the techniques we've introduced. This is supported by documentation and introductory notebooks to help newcomers get started.
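To illustrate the core technique the package implements, SGLD-based estimation of the local learning coefficient, here is a minimal self-contained PyTorch sketch. The function names (`estimate_llc`, `mean_loss`) and hyperparameter defaults (`eps`, `gamma`, step counts) are illustrative choices of ours, not the package's API; see the devinterp documentation for the supported interface.

```python
# Minimal PyTorch sketch of SGLD-based estimation of the local learning
# coefficient (LLC), the core technique the devinterp package implements.
# Names and defaults here are illustrative only, not the package API.
import copy
import math

import torch


@torch.no_grad()
def mean_loss(model, loader, loss_fn):
    """Average per-sample loss L_n(w) over the full dataset.
    loss_fn is assumed to return the mean loss over a batch."""
    total, count = 0.0, 0
    for xs, ys in loader:
        total += loss_fn(model(xs), ys).item() * len(xs)
        count += len(xs)
    return total / count


def estimate_llc(model, loader, loss_fn, n, eps=1e-4, gamma=100.0,
                 num_steps=500, burn_in=200):
    """Estimate lambda-hat = n * beta * (E[L_n(w)] - L_n(w*)), where the
    expectation is over a posterior localized at the trained weights w*."""
    beta = 1.0 / math.log(n)          # inverse temperature, e.g. 1 / log n
    model = copy.deepcopy(model)      # keep the caller's weights intact
    w_star = [p.detach().clone() for p in model.parameters()]
    baseline = mean_loss(model, loader, loss_fn)  # L_n(w*)

    draws, data_iter = [], iter(loader)
    for step in range(num_steps):
        try:
            xs, ys = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            xs, ys = next(data_iter)
        loss = loss_fn(model(xs), ys)  # minibatch estimate of L_n(w)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p, p0 in zip(model.parameters(), w_star):
                # SGLD step: tempered log-posterior gradient (beta * n * grad)
                # plus a quadratic localization term pulling w toward w*,
                # then Gaussian noise with variance eps.
                drift = beta * n * p.grad + gamma * (p - p0)
                p.add_(-0.5 * eps * drift
                       + math.sqrt(eps) * torch.randn_like(p))
        if step >= burn_in:
            draws.append(loss.item())

    return n * beta * (sum(draws) / len(draws) - baseline)
```

Run over a series of training checkpoints, an estimator like this yields the λ̂-versus-training-time curves used in the developmental interpretability workflow: plateaus indicate stages, and jumps between plateaus indicate transitions.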
Talks
We've given talks at DeepMind, Anthropic, OpenAI, 80,000 Hours, Constellation, MATS, the Topos Institute, Carnegie Mellon University, Monash University, and the University of Melbourne. We have talks planned with FAR, the Tokyo Technical AI Safety Conference, and the Foresight Institute.
Events
Organization
Our team currently consists of six people: three core team members, two research assistants, and one research lead. We continue to collaborate closely with Daniel Murfet's research group at the University of Melbourne.
What we need
What's next
Six months ago, we first put our minds to the question of what SLT could have to say about alignment. This led to developmental interpretability. But this was always a first step, not the end goal. Having now validated the basic premises of this research agenda, we are starting to work on additional points of contact with alignment.
Our current research priorities include structural generalization and the geometry of program synthesis.
We will provide an outline of this new research agenda (as we did with our original post announcing developmental interpretability) in the coming months. Structural generalization and geometry of program synthesis are ambitious multi-year research programs in their own right. However, we will move aggressively to empirically validate the theoretical ideas, maintain tight feedback loops between empirical and theoretical progress, and publish incremental progress on a regular basis. Our capacity to live up to these principles has been demonstrated in the developmental interpretability agenda over the past six months.
We will have a next round of updates and results to report in late May.
To stay up-to-date, join the DevInterp discord. If you're interested in contributing or learning more, don't hesitate to reach out.
George joined this project later and is thus not yet included in the current preprint version.