Good stuff. A few thoughts:

1. Assuming a model has memorized the training data and still has enough "spare capacity" to play the lottery-ticket-hypothesis game of searching for generalizing solutions to subsets of the memorized data, you'll eventually end up with a number of partial solutions, each of which generalizes to a subset of the memorized data (obviously assuming some form of regularization towards simplicity). So this may be where the "underparametrized" regime of past ML went wrong: that approach tried to force the model into generalization without memorization, but by being stingy with parameters, it forced the model to first and foremost memorize -- there was no spare capacity to "play / experiment...