Comments
AviS10

If the ability of neural networks to generalise comes from a volume/simplicity property and not from optimiser properties, then why do different optimisers have different generalisation properties? E.g. Adam being better than SGD for transformers. (Or maybe I'm misremembering, and the reason Adam outperforms SGD for transformers is that Adam achieves better training loss, not that Adam generalises better than SGD at a given training loss value.)
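One way to make the "at a given training loss value" comparison concrete is a toy sketch: train each optimiser until training loss drops below a shared target, then compare test loss at that matched point. This is a minimal numpy illustration on linear regression, not a transformer; the task, hyperparameters, and function names are all illustrative assumptions, not anyone's actual experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: y = X @ w_true + noise (purely illustrative).
n_train, n_test, d = 200, 200, 20
w_true = rng.normal(size=d)
X_train = rng.normal(size=(n_train, d))
y_train = X_train @ w_true + 0.1 * rng.normal(size=n_train)
X_test = rng.normal(size=(n_test, d))
y_test = X_test @ w_true + 0.1 * rng.normal(size=n_test)

def loss_and_grad(w, X, y):
    r = X @ w - y
    return (r @ r) / len(y), 2 * X.T @ r / len(y)

def sgd(w, g, state, t, lr=0.05):
    # Plain gradient step; `state` and `t` are unused.
    return w - lr * g

def adam(w, g, state, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # Standard Adam update with bias-corrected first/second moments.
    m = state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g
    v = state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g**2
    mhat, vhat = m / (1 - b1**t), v / (1 - b2**t)
    return w - lr * mhat / (np.sqrt(vhat) + eps)

def train_to_loss(update, target=0.02, steps=20000):
    """Run an optimiser until training loss drops below `target`,
    then report (train loss, test loss) at that matched point."""
    w, state = np.zeros(d), {}
    for t in range(1, steps + 1):
        train_loss, g = loss_and_grad(w, X_train, y_train)
        if train_loss < target:
            break
        w = update(w, g, state, t)
    test_loss, _ = loss_and_grad(w, X_test, y_test)
    return train_loss, test_loss

for name, opt in [("SGD", sgd), ("Adam", adam)]:
    tr, te = train_to_loss(opt)
    print(f"{name}: train={tr:.4f}  test={te:.4f}")
```

On a convex toy problem like this both optimisers reach the target and generalise about equally well; the interesting question in the comment is whether, on real non-convex transformer training, the test losses would still match once the training losses do.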

AviS10

I agree that, overall, counting arguments are weak.

But even if you expect SGD to be used for TAI, generalisation is not a good counterexample, because maybe most counting arguments about SGD do work, with generalisation as the exception (which would not be surprising, because we selected SGD precisely because it generalises well).

AviS1-3

A point about counting arguments that I have not seen made elsewhere (although I may have missed it!).

The failure of the counting argument that SGD should overfit is not a valid counterexample! There is a selection bias here: the only reason we are talking about SGD is *because* it is a good learning algorithm that does not overfit. It could well still be true that almost all counting arguments hold for almost all learning algorithms. The fact that SGD generalises well is an exception *by design*.