Singular Learning Theory for Dummies
In this post, I will cover Jesse Hoogland's work on Singular Learning Theory. The post is mostly meant as a dummies' guide, and therefore won't add anything meaningfully new to his work. As a helpful guide, I mark the math difficulty of each section: some of it is trivial (I), some of it can be followed with enough effort (II), and some of it is out of our (my) scope, so we simply take it on faith (III).

Background

> Statistical learning theory is lying to you: "overparametrized" models actually aren't overparametrized, and generalization is not just a question of broad basins

Singular Learning Theory (SLT) tries to explain how neural networks generalize. One motivation for this is that the previous understanding of neural network generalization isn't quite correct. So what is the previous theory, and why is it wrong?

Broad Basins

Below are excerpts from "Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes"[1] that summarize the logic behind the broad-basins theory of generalization.

According to SLT there are two issues here. First, estimating the volume of a basin in the loss landscape by approximating it via the Hessian isn't accurate: the Gaussian (Laplace) approximation behind that estimate assumes the Hessian is non-degenerate, which fails for neural networks whose loss landscapes have flat, singular directions. Second, the reason models generalize has more to do with their symmetries than with the fact that initialization drops the networks into a good "broad basin" region.

Starting with SLT

A lot of the basic math for SLT is already covered in the original blog post. Here I will try to go over the small questions one might have (at least I had) to get a clearer picture.

What we know so far? (Math difficulty I)

[Screenshot from the blog]

1. We define the truth, the model, and the prior. The model is just the likelihood. These are standard Bayesian terms.
2. Based on the above, we know how to formalize the posterior and the model evidence, p(w|D_n) and p(D_n), in terms of the likelihood and prior. Usually we stop at the posterior: we are happy to get p(w|D_n) for the model. But here we are also interested in the model evidence p(D_n).
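To make the notation concrete, here is the standard Bayesian setup in the notation typically used in SLT (I write the prior as φ(w) and the truth as q(x); the exact symbols are my choice and may differ slightly from the blog post). The data D_n = {x_1, ..., x_n} are drawn i.i.d. from the truth q(x), and p(x|w) is the model:

$$
p(w \mid D_n) \;=\; \frac{\varphi(w)\,\prod_{i=1}^{n} p(x_i \mid w)}{p(D_n)},
\qquad
p(D_n) \;=\; \int \varphi(w)\,\prod_{i=1}^{n} p(x_i \mid w)\, dw .
$$

Note that the evidence p(D_n) integrates the likelihood against the prior over all of weight space, which is why the geometry of the loss landscape around degenerate parameters ends up mattering.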
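As a sanity check on these definitions, here is a minimal numerical sketch for a toy one-parameter model, computing the posterior and the evidence by brute-force integration on a grid. The toy model (a unit-variance Gaussian with unknown mean and a Gaussian prior) and all names are my own illustrative choices, not taken from the blog post.

```python
import numpy as np

# Toy setup: truth q(x) = N(x; 0.5, 1), model p(x | w) = N(x; w, 1), prior phi(w) = N(w; 0, 1).
# We compute the posterior p(w | D_n) and the model evidence p(D_n) by brute force on a grid.
rng = np.random.default_rng(0)
data = rng.normal(0.5, 1.0, size=20)          # D_n = {x_1, ..., x_n}, n = 20

w_grid = np.linspace(-5.0, 5.0, 2001)
dw = w_grid[1] - w_grid[0]

# log p(D_n | w) = sum_i log N(x_i; w, 1), evaluated for every w on the grid
log_lik = (-0.5 * np.sum((data[None, :] - w_grid[:, None]) ** 2, axis=1)
           - 0.5 * len(data) * np.log(2 * np.pi))
log_prior = -0.5 * w_grid ** 2 - 0.5 * np.log(2 * np.pi)

joint = np.exp(log_lik + log_prior)           # p(D_n | w) * phi(w)
evidence = np.sum(joint) * dw                 # p(D_n) = integral of p(D_n | w) phi(w) dw
posterior = joint / evidence                  # p(w | D_n), a density over w

print("model evidence p(D_n):", evidence)
print("posterior mean of w  :", np.sum(w_grid * posterior) * dw)
```

For a one-dimensional toy model the grid integral is trivial; for real networks this integral over high-dimensional weight space is intractable, which is where the asymptotic machinery of SLT comes in.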