Currently, we do not have a good theoretical understanding of how or why neural networks actually work. For example, we know that large neural networks are expressive enough to compute almost any kind of function, and that most functions which fit a given set of training data will not generalise well to new data. And yet, if we train a neural network, we will usually obtain a function that generalises well. What is the mechanism behind this phenomenon?
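To make the puzzle concrete, here is a toy illustration of my own (not taken from the research below): enumerate every Boolean function on 3 bits that fits a small training set, and count how many of them also agree with the target function on the held-out inputs.

```python
# Toy illustration: the function class is all Boolean functions on 3 bits.
from itertools import product

inputs = list(product([0, 1], repeat=3))       # the 8 possible 3-bit inputs
target = {x: x[0] ^ x[1] for x in inputs}      # "true" labels: XOR of the first two bits
train, test = inputs[:5], inputs[5:]           # 5 training inputs, 3 held-out inputs

# Enumerate all 2^8 = 256 Boolean functions and keep those that fit the training data.
fitting = generalising = 0
for bits in product([0, 1], repeat=len(inputs)):
    f = dict(zip(inputs, bits))
    if all(f[x] == target[x] for x in train):
        fitting += 1
        if all(f[x] == target[x] for x in test):
            generalising += 1

print(f"{fitting} functions fit the training data, "
      f"but only {generalising} of them also match the target on the held-out inputs.")
# With this setup: 8 functions fit the training data, and only 1 of them generalises.
```

For realistic input spaces the ratio is astronomically worse, which is what makes the good generalisation we actually observe so surprising.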
There has been some recent research which (I believe) sheds some light on this issue. I would like to call attention to this blog post:
Neural Networks Are Fundamentally Bayesian
This post summarises the research in these three papers, which together provide a candidate theory of generalisation:
https://arxiv.org/abs/2006.15191
https://arxiv.org/abs/1909.11522
https://arxiv.org/abs/1805.08522
(You may notice that I had some involvement with this research, but the main credit should go to Chris Mingard and Guillermo Valle-Perez!)
I believe that research of this type is very relevant for AI alignment. It seems quite plausible that neural networks, or something similar to them, will be used as a component of AGI. If that is the case, then we want to be able to reliably predict and reason about how neural networks behave in new situations, and how they interact with other systems, and it is hard to imagine how that would be possible without a deep understanding of the dynamics at play when neural networks learn from data. Understanding their inductive bias seems particularly important, since this is the key to understanding everything from why they work in the first place, to phenomena such as adversarial examples, to the risk of mesa-optimisation. I hence believe that it makes sense for alignment researchers to keep an eye on what is happening in this space.
If you want some more stuff to read in this genre, I can also recommend these two posts:
Recent Progress in the Theory of Neural Networks
Understanding "Deep Double Descent"
EDIT: Here is a second post, which talks more about the "prior" of neural networks:
Deep Neural Networks are biased, at initialisation, towards simple functions
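To give a flavour of what that post is about, here is a rough sketch of the kind of experiment involved (a toy version of my own; the architectures, input sizes, and numbers used in the post and papers are different): sample the weights of a small ReLU network from a Gaussian, read off the Boolean function it computes on all of its inputs, and look at how concentrated the resulting distribution over functions is. If the parameter-function map were unbiased, essentially every sample would give a new function; the claim is that instead a handful of very simple functions take up most of the probability mass.

```python
# Sketch: sample random Gaussian weights and record which Boolean function
# on 7-bit inputs the network computes, then look at how skewed the
# resulting distribution over functions is.
import numpy as np
from collections import Counter
from itertools import product

rng = np.random.default_rng(0)
X = np.array(list(product([0.0, 1.0], repeat=7)))   # all 128 seven-bit inputs

def random_network_function(width=40):
    """Sample a 1-hidden-layer ReLU net with Gaussian weights; return its output pattern."""
    W1 = rng.normal(0, 1 / np.sqrt(X.shape[1]), size=(X.shape[1], width))
    b1 = rng.normal(0, 1.0, size=width)
    W2 = rng.normal(0, 1 / np.sqrt(width), size=(width, 1))
    b2 = rng.normal(0, 1.0, size=1)
    out = np.maximum(X @ W1 + b1, 0) @ W2 + b2
    return "".join("1" if v > 0 else "0" for v in out.ravel())

samples = [random_network_function() for _ in range(10_000)]
counts = Counter(samples)
print("distinct functions seen:", len(counts))
print("probability mass on the 10 most common functions:",
      sum(c for _, c in counts.most_common(10)) / len(samples))
```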
This is great! This definitely does seem to me like strong evidence that SGD is the wrong place to look for understanding neural networks' inductive biases, and that we should be focusing more on the architecture instead. I do wonder, though, to what extent that insight is likely to scale: perhaps the more gradient descent you do, the more it starts to look different from random search and the more you see its quirks.
Scott's “How does Gradient Descent Interact with Goodhart?” seems highly relevant here. Perhaps these results could serve as a partial answer to that question, in the sense that SGD doesn't seem to differ very much from random search (with a Gaussian prior on the weights) for deep neural networks on MNIST. I'm not sure how to reconcile that with the other arguments in that post for why it should be different, though, such as the experiments that Peter and I did.
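For concreteness, here is a minimal sketch of the kind of comparison I have in mind, on a toy problem rather than MNIST (full-batch gradient descent stands in for SGD, and the architecture and numbers are arbitrary choices of mine): sample Gaussian-initialised networks and keep the ones that happen to fit the training set, separately train networks by gradient descent to zero training error, and then compare the two resulting distributions over held-out predictions.

```python
# Sketch: "random search with a Gaussian prior" vs gradient descent, on a toy problem.
import numpy as np
from collections import Counter
from itertools import product

rng = np.random.default_rng(0)
X_all = np.array(list(product([-1.0, 1.0], repeat=3)))  # all 8 three-bit inputs
y_all = X_all[:, 0]                                      # target: the first coordinate
train, test = slice(0, 6), slice(6, 8)                   # 6 training inputs, 2 held out
WIDTH = 16

def sample_params():
    """Draw weights for a 1-hidden-layer tanh network from a Gaussian prior."""
    return (rng.normal(0, 1 / np.sqrt(3), (3, WIDTH)), rng.normal(0, 1.0, WIDTH),
            rng.normal(0, 1 / np.sqrt(WIDTH), (WIDTH, 1)), rng.normal(0, 1.0, 1))

def forward(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()

def fits_train(params):
    return bool(np.all(np.sign(forward(params, X_all[train])) == y_all[train]))

def test_pattern(params):
    return tuple(int(s) for s in np.sign(forward(params, X_all[test])))

def train_gd(steps=3000, lr=0.1):
    """Full-batch gradient descent on MSE loss from a fresh random initialisation."""
    W1, b1, W2, b2 = sample_params()
    Xtr, ytr = X_all[train], y_all[train]
    for _ in range(steps):
        h = np.tanh(Xtr @ W1 + b1)                 # hidden activations
        err = ((h @ W2 + b2).ravel() - ytr) / len(ytr)
        gW2 = h.T @ err[:, None]                   # gradient w.r.t. output weights
        gb2 = err.sum(keepdims=True)
        dh = err[:, None] * W2.T * (1 - h ** 2)    # backprop through tanh
        gW1 = Xtr.T @ dh
        gb1 = dh.sum(axis=0)
        W1 = W1 - lr * gW1; b1 = b1 - lr * gb1
        W2 = W2 - lr * gW2; b2 = b2 - lr * gb2
    return W1, b1, W2, b2

# (a) Random search with a Gaussian prior: keep samples that fit the training set.
bayes = Counter()
for _ in range(30_000):
    p = sample_params()
    if fits_train(p):
        bayes[test_pattern(p)] += 1

# (b) Gradient descent from random initialisations: keep runs with zero training error.
gd = Counter()
for _ in range(100):
    p = train_gd()
    if fits_train(p):
        gd[test_pattern(p)] += 1

print("held-out predictions, weight sampling  :", dict(bayes))
print("held-out predictions, gradient descent :", dict(gd))
```

The question I'm gesturing at is whether those two distributions stay close as you scale up the network, the dataset, and the amount of optimisation; that's where I'd expect gradient descent's quirks to show up.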
Yeah, exactly -- the problem is that there are some small-volume functions which are actually simple. The argument "small volume --> complex" doesn't go through, since there could be other, shorter ways of specifying the function.
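To spell that out a little: the counting-style argument only runs in one direction. Very roughly (sweeping constants, finite precision, and the choice of universal machine under the rug), if P(f) is the fraction of parameter space that maps to f, then something like this holds:

```latex
% P(f): prior probability of f under the parameter-function map
% K(f): Kolmogorov complexity of (a description of) f
% A coding-theorem-style argument gives only the one-directional bound
K(f) \;\lesssim\; -\log_2 P(f) + O(1)
% i.e. large volume forces f to be simple, but small volume only gives a weak
% upper bound: f can still have a short description unrelated to its volume.
```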
Other senses of simplicity include various circuit complexities and Levin complexity. There's no argument that parameter-space volume corresponds to either of them, AFAIK (you might think parameter-space volume would correspond to "neural net complexity", the number of neurons in a minimal-size neural net needed to compute the function, ...