Epistemic status: This post is a synthesis of ideas that are, in my experience, widespread among researchers at frontier labs and in mechanistic interpretability, but rarely written down comprehensively in one place - different communities tend to know different pieces of evidence. The core hypothesis - that deep learning is performing something like tractable program synthesis - is not original to me (even to me, the ideas are ~3 years old), and I suspect it has been arrived at independently many times. (See the appendix on related work).
This is also far from finished research - more a snapshot of a hypothesis that seems increasingly hard to avoid, and a case for why formalization is worth pursuing. I discuss the key barriers and how tools like singular learning theory might address them towards the end of the post.
Thanks to Dan Murfet, Jesse Hoogland, Max Hennick, and Rumi Salazar for feedback on this post.
> Sam Altman: Why does unsupervised learning work?
>
> Dan Selsam: Compression. So, the ideal intelligence is called Solomonoff induction…[1]
The central hypothesis of this post is that deep learning succeeds because it's performing a tractable form of program synthesis - searching for simple, compositional algorithms that explain the data. If correct, this would reframe deep learning's success as an instance of something we understand in principle, while pointing toward what we would need to formalize to make the connection rigorous.
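To make "simple programs that explain the data" slightly more concrete, here is the standard statement of the ideal the quote alludes to - a sketch only, not the formalization this post argues for. With $U$ a universal prefix machine and $|p|$ the length in bits of program $p$, the Solomonoff prior assigns a sequence $x$ the weight

$$
M(x) \;=\; \sum_{p \,:\, U(p)\ \text{outputs a string beginning with}\ x} 2^{-|p|}.
$$

Prediction under $M$ is dominated by the shortest programs consistent with the data - compression, as in the quote above. The hypothesis is that deep learning approximates this kind of search in a tractable, restricted form.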
I first review the theoretical ideal of Solomonoff induction and the empirical surprise of deep learning's success. Next, mechanistic interpretability provides direct evidence that networks learn algorithm-like structures; I examine the cases of grokking and vision circuits in detail. Broader patterns provide indirect support: how networks evade the curse of dimensionality, generalize despite overparameterization, and converge on similar representations. Finally, I discuss what formalization would require, why it's hard, and how tools like singular learning theory might address the barriers.