OpenAI has hired a lot of software engineers to write code for simple tasks; maybe these quines were part of the fine-tuning set?
Isn't Elo a relative metric that changes over time? I would assume that 2800 Elo in the 90s is a different level from today's 2800. Can we still draw the same conclusions with that in mind?
I think there has already been a lot of research in this space. The first thing that popped into my mind was https://huggingface.co/docs/transformers/model_doc/rag
Currently, there are some approaches that use LangChain to persist the history of a conversation into an embeddings database and then retrieve the relevant parts when performing a similar query or task.
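The pattern is roughly: embed each conversation turn, store it, and at query time rank stored turns by similarity to the new query. Here's a minimal self-contained sketch of that idea; this is not LangChain's actual API, and the bag-of-words "embedding" is a toy stand-in for a real embedding model:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": bag-of-words term counts.
    # A real system would call an embedding model here.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ConversationStore:
    """Persists conversation turns and retrieves the most relevant ones."""

    def __init__(self):
        self.turns = []  # list of (text, embedding) pairs

    def add(self, text):
        self.turns.append((text, embed(text)))

    def retrieve(self, query, k=2):
        # Rank all stored turns by similarity to the query, return top k.
        q = embed(query)
        ranked = sorted(self.turns, key=lambda t: cosine(q, t[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = ConversationStore()
store.add("We discussed deploying the model behind a REST API.")
store.add("My cat walked across the keyboard.")
store.add("The API should cache embeddings to cut latency.")

relevant = store.retrieve("how do we serve the model via the API?", k=2)
print(relevant)
```

Only the retrieved turns get injected back into the prompt, which keeps context within the model's window while still giving it access to older history.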