you can solve MNIST with a LayerNorm MLP, for example
is there a paper for this or is this unpublished common knowledge result?
is there a paper for this or is this unpublished common knowledge result?