The recent publication of Gato spurred a lot of discussion on whether we may be witnessing the first example of AGI. Regardless of this debate, Gato makes use of a recent development in reinforcement learning: applying supervised learning to reinforcement learning trajectories, exploiting the ability of transformer architectures to handle sequential data proficiently.
Reading the comments, it seems this point caused some confusion for readers not familiar with these techniques. Some time ago I wrote an introductory article on how transformers can be used in reinforcement learning, which may help clarify some of these doubts: https://lorenzopieri.com/rl_transformers/
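For readers who prefer code, here is a minimal, illustrative sketch (in PyTorch) of the sequence-modelling idea behind approaches like Decision Transformer, which this line of work builds on: trajectories of (return-to-go, state, action) triples are flattened into a token sequence, and a causal transformer is trained with ordinary supervised learning to predict the logged actions. All names, dimensions, and the continuous-action MSE loss are assumptions I made for the example, not Gato's actual architecture:

```python
# Sketch only: RL framed as sequence modelling. We embed
# (return-to-go, state, action) triples as tokens, feed them to a causal
# transformer, and train it with plain supervised learning to predict the
# action taken at each timestep. Dimensions and names are illustrative.
import torch
import torch.nn as nn

class TrajectoryTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        # Separate linear embeddings for each modality in the trajectory.
        self.embed_rtg = nn.Linear(1, d_model)
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions):
        # Interleave tokens as (rtg_1, s_1, a_1, rtg_2, s_2, a_2, ...).
        B, T, _ = states.shape
        tokens = torch.stack(
            (self.embed_rtg(rtg),
             self.embed_state(states),
             self.embed_action(actions)),
            dim=2,
        ).reshape(B, 3 * T, -1)
        # Causal mask so each token only attends to the past.
        mask = nn.Transformer.generate_square_subsequent_mask(3 * T)
        h = self.encoder(tokens, mask=mask)
        # Predict each action from the hidden state at its state token,
        # which (thanks to the mask) cannot see the action itself.
        return self.predict_action(h[:, 1::3])

# Supervised training on logged trajectories: no environment interaction,
# just regression onto the recorded actions (behaviour-cloning style loss).
model = TrajectoryTransformer(state_dim=4, act_dim=2)
rtg = torch.randn(8, 10, 1)      # returns-to-go, shape (batch, time, 1)
states = torch.randn(8, 10, 4)   # observed states
actions = torch.randn(8, 10, 2)  # actions actually taken (continuous here)
pred = model(rtg, states, actions)
loss = nn.functional.mse_loss(pred, actions)
loss.backward()
```

At inference time, one would feed a desired return-to-go plus the states seen so far and read off the predicted action at each step, which is what turns this supervised model back into a policy.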
In research there are a lot of publications, but few stand the test of time. I would suggest looking at the architectures that brought significant changes and ideas; those are still very relevant as they:
- often form the building blocks of current solutions
- help you build intuition on how architectures can be improved
- are often assumed knowledge in the field
- are often still useful, especially when resources are limited
You should not need to look at more than 1-2 architectures per year in each field (computer vision, NLP, RL). Only then would I focus on SOTA.
You may want to check https://fullstackdeeplearning.com/spring2021/; it should have enough historical material to cover the foundations and let you expand from there, while also moving quickly on to modern topics.