Currently, we do not have a good theoretical understanding of how or why neural networks actually work. For example, we know that large neural networks are sufficiently expressive to compute almost any kind of function. Moreover, most functions that fit a given set of training data will not generalise well to new data. And yet, if we train a neural network we will usually obtain a function that gives good generalisation. What is the mechanism behind this phenomenon?
There has been some recent research which (I believe) sheds some light on this issue. I would like to call attention to this blog post:
Neural Networks Are Fundamentally Bayesian
This post summarises the research in these three papers, which together put forward a candidate theory of generalisation:
https://arxiv.org/abs/2006.15191
https://arxiv.org/abs/1909.11522
https://arxiv.org/abs/1805.08522
(You may notice that I had some involvement with this research, but the main credit should go to Chris Mingard and Guillermo Valle-Perez!)
I believe that research of this type is very relevant for AI alignment. It seems quite plausible that neural networks, or something similar to them, will be used as a component of AGI. If that is the case, then we want to be able to reliably predict and reason about how neural networks behave in new situations and how they interact with other systems, and it is hard to imagine how that would be possible without a deep understanding of the dynamics at play when neural networks learn from data. Understanding their inductive bias seems particularly important, since it is the key to understanding everything from why they work in the first place, to phenomena such as adversarial examples, to the risk of mesa-optimisation. I therefore believe that it makes sense for alignment researchers to keep an eye on what is happening in this space.
If you want some more stuff to read in this genre, I can also recommend these two posts:
Recent Progress in the Theory of Neural Networks
Understanding "Deep Double Descent"
EDIT: Here is a second post, which talks more about the "prior" of neural networks:
Deep Neural Networks are biased, at initialisation, towards simple functions
Like Rohin, I'm not impressed with the information theoretic side of this work.
Specifically, I'm wary of the focus on measuring complexity for functions between finite sets, such as binary functions.
Mostly, we care about NN generalization on problems where the input space is continuous, generally R^n. The authors argue that the finite-set results are relevant to these problems, because one can always discretize R^n to get a finite set. I don't think this captures the kinds of function complexity we care about for NNs.
Consider: measuring complexity at the level of finite sets is much too coarse a lens for distinguishing NNs from other statistical learning techniques, since all of them are generally going to involve putting a metric on the input space.
Let's see how this goes wrong in the Shannon entropy argument from this paper. The entropy of a function's output string depends only on how balanced its outputs are across the input set; it is unchanged by any relabelling of the inputs, so it cannot tell a smooth function apart from an arbitrarily scrambled one.
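Here is a minimal sketch of that point (my own illustration, not the paper's argument, and assuming the entropy in question is the Shannon entropy of the function's output string over the finite input set):

```python
# Sketch: the Shannon-entropy notion of "complexity" for a finite-set function
# depends only on its output string, so it is invariant under arbitrary
# relabellings of the input set and blind to any metric structure on it.
import itertools
import math
import random

n = 7  # inputs are bit vectors in {0,1}^n
inputs = list(itertools.product([0, 1], repeat=n))

def entropy_of_outputs(f):
    """Shannon entropy (in bits) of f's output string over all 2^n inputs."""
    outputs = [f(x) for x in inputs]
    p = sum(outputs) / len(outputs)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# A well-behaved majority function, and the same function composed with a
# random permutation of the input set (wildly non-smooth in Hamming distance).
def majority(x):
    return int(sum(x) >= n / 2)

perm = inputs[:]
random.shuffle(perm)               # an arbitrary relabelling of the input set
lookup = dict(zip(inputs, perm))

def scrambled(x):
    return majority(lookup[x])

print(entropy_of_outputs(majority))   # some entropy value H
print(entropy_of_outputs(scrambled))  # exactly the same H
```

The two functions get the same score, even though one is about as well-behaved as Boolean functions get and the other maps neighbouring inputs to essentially unrelated outputs.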
Sort of similar remarks apply to the other complexity measure used by the authors, LZ complexity. Unlike the complexity measure discussed above, this one does implicitly put a structure on the input space (by fixing an enumeration of it, where the inputs are taken to be bit vectors, and the enumeration reads them off in binary).
"Simple" functions in the LZ sense are thus ones that respond to binary vectors in (roughly) a predictable way,. What does it mean for a function to respond to binary vectors in a predictable way? It means that knowing the values of some of the bits provides information about the output, even if you don't know all of them. But since our models are encoding the inputs as binary vectors, we are already setting them up to have properties like this.
I'll write mostly about this statement, as I think it's the crux of our disagreement.
The statement may be true as long as we hold the meaning of "objects" constant as we vary the complexity measure.
However, if we translate objects from one mathematical space to another (say by discretizing, or adding/removing a metric structure), we can't simply say that the complexity measures for space A on the original A-objects will inevitably agree with those of space B on the translated objects.
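A concrete toy example of that disagreement (again just my own sketch): the function on {0,1}^n that returns the last bit of its input has the output string 0101..., which is about as LZ-simple as a string gets; but viewed as a discretisation of a function on [0,1], it flips on every cell of the grid, so a metric-sensitive notion of complexity such as total variation is as large as it can possibly be.

```python
# Sketch: the same object can be "simple" in one space and maximally "complex"
# in another. The last-bit function has an LZ-trivial output string, but as a
# step function on a uniform grid over [0,1] its total variation is maximal.
import itertools

def lz78_phrase_count(s):
    """Number of phrases in a greedy LZ78 parse of s (crude compressibility proxy)."""
    phrases, current = set(), ""
    for ch in s:
        current += ch
        if current not in phrases:
            phrases.add(current)
            current = ""
    return len(phrases) + (1 if current else 0)

n = 14
inputs = list(itertools.product([0, 1], repeat=n))

# Last-bit function, read off as an output string in the binary enumeration.
last_bit = "".join(str(x[-1]) for x in inputs)  # "010101..."

# The same function viewed as a step function on a uniform grid over [0,1]:
# total variation = number of jumps between adjacent grid cells.
values = [int(b) for b in last_bit]
total_variation = sum(abs(values[i + 1] - values[i]) for i in range(len(values) - 1))

print(lz78_phrase_count(last_bit))  # small: "01" repeated compresses very well
print(total_variation)              # 2**n - 1, the maximum possible for a binary grid function
```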