Grokking revisited: reverse engineering grokking modulo addition in LSTM
By Daniil Yurshevich, Nikita Khomich
TLDR: We train an LSTM model on the algorithmic task of modulo addition and observe grokking. We fully reverse engineer the learned algorithm and propose a much simpler equivalent model that groks as well.
Reproducibility statement: all the code is available at the repo....
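To make the task concrete, here is a minimal sketch of how a modulo addition dataset is typically built for grokking experiments: every pair (a, b) with 0 <= a, b < p is labeled with (a + b) mod p, and only a fraction of all pairs is used for training. The function name `make_modular_addition_data`, the modulus `p`, the `train_frac` split and the seed are illustrative assumptions, not the exact setup used in this post's repo.

```python
import random


def make_modular_addition_data(p, train_frac=0.3, seed=0):
    """Build all (a, b, (a + b) mod p) triples and split them into train/test.

    Hypothetical helper: p, train_frac and seed are assumptions for illustration.
    """
    # Enumerate the full p x p input space with its labels.
    triples = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]
    # Shuffle deterministically so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(triples)
    # Train on a fraction of the pairs; the held-out rest measures generalization.
    n_train = int(train_frac * len(triples))
    return triples[:n_train], triples[n_train:]


train, test = make_modular_addition_data(p=7)
print(len(train), len(test))  # 14 35
```

For a sequence model such as an LSTM, each (a, b) pair would be fed as a two-token input sequence and the label (a + b) mod p predicted from the final hidden state; grokking shows up as test accuracy jumping long after training accuracy has saturated.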