We recently authored a paper titled "Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity." Below, we provide the paper's abstract along with key takeaways from our experiments.
Abstract
In some settings, neural networks exhibit a phenomenon known as grokking, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set. In this paper, we discover that grokking is not limited to neural networks but occurs in other settings such as Gaussian process (GP) classification, GP regression, and linear regression. We also uncover a mechanism by which to induce grokking on algorithmic datasets via the addition of dimensions containing...
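Although the abstract is truncated above, the phenomenon itself is easy to state operationally: log training and validation accuracy throughout training and look for validation accuracy jumping long after training accuracy saturates. The sketch below illustrates such a measurement harness on a modular-addition task of the kind the paper calls algorithmic datasets. The modulus, model, and hyperparameters here are our own illustrative assumptions, not the paper's setup.

```python
# Illustrative sketch only: a harness for observing grokking-style
# train/validation accuracy gaps. The dataset construction and the
# model choice below are assumptions, not the paper's code.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

P = 31  # modulus for the algorithmic task (hypothetical choice)

# Build the full (a, b) -> (a + b) mod P table, one-hot encode inputs.
pairs = np.array([(a, b) for a in range(P) for b in range(P)])
X = np.zeros((len(pairs), 2 * P))
X[np.arange(len(pairs)), pairs[:, 0]] = 1.0
X[np.arange(len(pairs)), P + pairs[:, 1]] = 1.0
y = (pairs[:, 0] + pairs[:, 1]) % P

# Hold out half the table; grokking concerns generalization to it.
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, train_size=0.5, random_state=0)

# warm_start with max_iter=1 lets us train one epoch per fit() call
# so we can log both accuracies after every epoch.
clf = MLPClassifier(hidden_layer_sizes=(128,), alpha=1e-3,
                    learning_rate_init=1e-2, max_iter=1, warm_start=True)

for epoch in range(500):
    clf.fit(X_tr, y_tr)
    if epoch % 50 == 0:
        # A grokking-style curve shows train accuracy near 1.0 early
        # while validation accuracy stays low, then rises much later.
        print(epoch, clf.score(X_tr, y_tr), clf.score(X_va, y_va))
```

Whether this particular configuration exhibits a long delay before generalization depends on regularization and training length; the point of the sketch is only the logging pattern used to detect grokking, not a reproduction of the paper's experiments.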