Osher Lerner
Yes, that's precisely what I'm claiming!
Sorry if that wasn't clear. As for how to establish that, I proposed an intuitive justification:
There is no mechanism fitting the model to the linear approximation of the data around the training points.
And an outline for a proof:
Take two problems that have the same values at the training points but wildly different linear terms around them. A model fit only to the training points receives identical data in both cases, so it cannot distinguish the two.
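A minimal sketch of that setup (the target functions here are my own illustrative choices, not anything from the original discussion): two targets that agree exactly at the training inputs but whose derivatives at those inputs differ by a constant.

```python
import numpy as np

# Training inputs: the integers 0..4.
xs = np.arange(5.0)

# Two "problems" that agree exactly at the training points
# but have wildly different linear terms around them.
def f1(x):
    return x**2                      # slope 2x at each training point

def f2(x):
    return x**2 + np.sin(np.pi * x)  # adds slope pi*cos(pi*x), but sin(pi*k) = 0 at integers

# The training sets are identical, so any interpolating model
# sees exactly the same data for both problems...
assert np.allclose(f1(xs), f2(xs))

# ...yet the derivatives at the training points differ by +/- pi:
eps = 1e-6
d1 = (f1(xs + eps) - f1(xs - eps)) / (2 * eps)
d2 = (f2(xs + eps) - f2(xs - eps)) / (2 * eps)
print(d2 - d1)  # slopes disagree at every training point
```

Nothing in the training data distinguishes `f1` from `f2`, so no fitting procedure driven only by those points can recover which linear term is correct.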
Let's walk through an example:
Since we have perfectly fit the training data, the loss at each training point is zero; and since the loss is minimized there, its gradient is also zero.
The linear term here is not actually 0: there is no mechanism fitting the model to the linear approximation of the data around the training points. The model is fit only to the (0th-order) values at the training points.
To state the above correctly: at a training point the loss and its gradient are both zero, but this says nothing about whether the model's linear term matches the data's linear term there.
To prove this point:
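A sketch of the argument, under the assumption of a squared-error loss, with $f_\theta$ the model and $y$ the target (these symbols are mine, not from the original comment):

$$
\ell(x) = \bigl(f_\theta(x) - y(x)\bigr)^2
$$

At a training point $x_0$ where the model interpolates, $f_\theta(x_0) = y(x_0)$, so $\ell(x_0) = 0$, and by the chain rule

$$
\ell'(x_0) = 2\,\bigl(f_\theta(x_0) - y(x_0)\bigr)\,\bigl(f_\theta'(x_0) - y'(x_0)\bigr) = 0,
$$

regardless of the value of $f_\theta'(x_0) - y'(x_0)$: the zero factor $f_\theta(x_0) - y(x_0)$ kills the gradient no matter how badly the slopes disagree. So minimizing the loss at the training points places no constraint on the model's linear term there.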
Oh hi! I linked your video in another comment without noticing this one. Great visual explanation!