One thing that slightly puzzles me: an obvious improvement to LLMs would be some kind of long-term memory that lets them retain more information than fits in their context window. Naively, I would imagine that even just throwing some recurrent neural net layers in there would be better than nothing?
But while I've seen LLM papers about models being multimodal or smarter than before, I don't recall any widely publicized model that extends its memory beyond the immediate context window, and that confuses me.
I think there has been a lot of research in this space. The first thing that came to mind was Retrieval-Augmented Generation (RAG): https://huggingface.co/docs/transformers/model_doc/rag
Currently, some approaches use LangChain to persist the history of a conversation in an embeddings database and then retrieve the relevant parts by running a similarity query against it for the current query or task.
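To make the retrieve-then-prompt idea concrete, here is a minimal toy sketch in plain Python. It is not LangChain's API. The names `ConversationMemory`, `embed`, and `cosine` are hypothetical. `embed` is a bag-of-words stand-in for a real learned embedding model, and an in-memory list stands in for the embeddings database. Past turns are stored with their vectors, and only the most similar ones are pulled back into the prompt.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ConversationMemory:
    """Hypothetical long-term memory: past turns stored alongside their vectors."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # Persist the turn together with its embedding (the "vector DB" insert).
        self.entries.append((text, embed(text)))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Rank every stored turn by similarity to the query and keep the top k.
        q = embed(query)
        scored = [(cosine(q, vec), text) for text, vec in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


memory = ConversationMemory()
memory.add("User: my dog is called Rex and he loves the beach")
memory.add("User: I work as a nurse on night shifts")
memory.add("Assistant: Paris is the capital of France")

# Only the most relevant past turns get prepended to the next prompt,
# so they occupy the context window instead of the whole history.
print(memory.retrieve("what is my dog called?", k=1))
# prints ['User: my dog is called Rex and he loves the beach']
```

The model itself stays unchanged. The "memory" lives outside it, and what gets recalled is limited by how well the similarity search surfaces the right turns.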