RiversHaveWings — LessWrong

Doesn't weight decay/L2 regularization tend to get rid of the "singularities", though? Once you alter your loss function to prefer weights of lower norm, there are no longer directions you can move in that change your model weights and leave the loss the same. A classic example of L2 regularization removing these "singularities" is the L2-regularized support vector machine with hinge loss, which motivated me to check it for neural nets. I ran some numerical experiments and found zero eigenvalues of the Hessian at the minimum (of a one-hidden-layer tanh net), but when I added L2 regularization to the loss they went away. We use weight decay in practice to train most things, and it improves generalization, so if your results depend on the zero eigenvalues of the Hessian, wouldn't that falsify them?
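For concreteness, here is a minimal sketch of this kind of check. It is not the experiment described above, and every specific in it is an assumption chosen for illustration: a four-parameter net f(x) = a1*tanh(w1*x) + a2*tanh(w2*x), targets generated by a one-unit teacher so that the student can fit them exactly, mean squared error, and a penalty of the form lam * ||theta||^2. At a zero-residual point the Hessian of the mean squared error is exactly (2/n) J^T J, so it can be written down without autodiff. One caveat: the sketch evaluates both Hessians at the same point, which is a minimum of the unregularized loss only. Adding weight decay moves the minimum, so this shows the 2*lam*I shift from the penalty term rather than the Hessian at the regularized minimum.

```python
import numpy as np

def jacobian(params, x):
    """Jacobian of f(x) = a1*tanh(w1*x) + a2*tanh(w2*x) w.r.t. (w1, a1, w2, a2)."""
    w1, a1, w2, a2 = params
    h1, h2 = np.tanh(w1 * x), np.tanh(w2 * x)
    return np.stack(
        [a1 * (1 - h1**2) * x, h1, a2 * (1 - h2**2) * x, h2], axis=1
    )

def hessian_at_zero_residual(params, x, weight_decay=0.0):
    """Hessian of mean((f - y)^2) + weight_decay * ||params||^2.

    Valid only where the residual f - y is zero, so the second-derivative
    term of the squared error drops out and (2/n) J^T J is exact.
    """
    J = jacobian(params, x)
    H = 2.0 / len(x) * J.T @ J
    return H + 2.0 * weight_decay * np.eye(len(params))

# Assumed setup: the teacher is 0.7*tanh(1.5*x). The student below reproduces
# it with its first unit and switches its second unit off (a2 = 0), so the
# residual is zero and w2 can be changed without changing the loss.
x = np.linspace(-2.0, 2.0, 41)
params = np.array([1.5, 0.7, -0.4, 0.0])  # (w1, a1, w2, a2)
lam = 1e-2  # hypothetical weight-decay coefficient

eig_plain = np.linalg.eigvalsh(hessian_at_zero_residual(params, x))
eig_decay = np.linalg.eigvalsh(hessian_at_zero_residual(params, x, lam))

print("smallest eigenvalue without weight decay:", eig_plain[0])
print("smallest eigenvalue with weight decay:   ", eig_decay[0])
```

Without the penalty, the smallest eigenvalue is zero up to floating-point error, corresponding to the free direction w2. With the penalty, every eigenvalue is shifted up by 2*lam, so none is zero.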

I think Eliezer's presentation of the Bayesianism vs frequentism arguments in science came from E. T. Jaynes' posthumous book Probability Theory: The Logic of Science, which was written about arguments that took place over Jaynes' lifetime, well before the Sequences were written.
