Should we kill recurrent memory in favor of self-attention ❓
Spending most of my time on time series problems, I often think about the consequences of memory and the sequential nature of our experience in the physical world.
Memory is the idea that a learning algorithm stores a representation of the system's state over time. Think of how much you remember of what you learned a week ago: memory is only partially observable, and in our conscious experience large parts of it naturally fade.
I'm going to discuss two main differentiable programming (deep learning) paradigms, Recurrent Neural Networks and Transformers, and then discuss the premise behind my question.
Recurrent differentiable programs (RNNs) enable...
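Before going further, here is a minimal sketch of what "recurrent memory" means in code, assuming a plain Elman-style update (the names and sizes are illustrative, not from any particular library): the hidden state h is the memory, a fixed-size summary of everything the model has seen so far.

```python
import numpy as np

def rnn_step(h, x, W_h, W_x, b):
    # The new state depends on the previous state and the current input.
    return np.tanh(W_h @ h + W_x @ x + b)

rng = np.random.default_rng(0)
hidden, input_dim, T = 8, 4, 10
W_h = rng.normal(size=(hidden, hidden)) * 0.1   # state-to-state weights
W_x = rng.normal(size=(hidden, input_dim)) * 0.1  # input-to-state weights
b = np.zeros(hidden)

h = np.zeros(hidden)                    # memory starts empty
for t in range(T):                      # strictly sequential: step t needs step t-1
    x_t = rng.normal(size=input_dim)    # stand-in for one time-series observation
    h = rnn_step(h, x_t, W_h, W_x, b)
```

The loop is the crux of the title question: step t cannot be computed until step t-1 is done, whereas self-attention reads all time steps in parallel instead of carrying a state forward.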