The aim of the Hutter Prize is to compress the first 1GB of Wikipedia (enwik9) to the smallest possible size. From the AIXI standpoint, compression is equivalent to intelligence, and if we could compress this file to the ideal size (75MB, by Shannon's lower-bound estimate of the entropy of English), then the compression algorithm would be equivalent to AIXI.
However, all the winning solutions so far are based on arithmetic coding and context mixing. These solutions hold little relevance to mainstream AGI research.
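The link between prediction and compression can be made concrete: an arithmetic coder driven by a predictive model spends about -log2 p bits per symbol, so a better predictor directly means a smaller file. Below is a toy sketch (not any actual Hutter Prize entry) comparing the ideal code length under a uniform model versus a simple bigram model with add-one smoothing; the model names and the training-on-the-same-text shortcut are illustrative assumptions.

```python
import math
from collections import defaultdict

def ideal_code_length_bits(text, model):
    """Sum of -log2 p(char | context): the size an arithmetic coder
    driven by `model` would achieve, up to small rounding overhead."""
    bits = 0.0
    context = ""
    for ch in text:
        bits += -math.log2(model(context, ch))
        context += ch
    return bits

def order0_model(alphabet):
    # Uniform model: every character equally likely.
    def p(context, ch):
        return 1.0 / len(alphabet)
    return p

def order1_model(training_text, alphabet):
    # Bigram model with add-one smoothing, learned (for illustration)
    # from the very text it will be scored on.
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(training_text, training_text[1:]):
        counts[a][b] += 1
    def p(context, ch):
        prev = context[-1] if context else ""
        total = sum(counts[prev].values())
        return (counts[prev][ch] + 1) / (total + len(alphabet))
    return p

text = "abababababababab"
alphabet = sorted(set(text))
uniform = order0_model(alphabet)
bigram = order1_model(text, alphabet)
print(ideal_code_length_bits(text, uniform))  # 16 chars * 1 bit = 16.0
print(ideal_code_length_bits(text, bigram))   # far fewer bits
```

The same accounting applies to an LLM: its cross-entropy on enwik9 is the compressed size it would achieve as the model inside an arithmetic coder.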
Current-day LLMs are really powerful models of our world. And it is very likely that most LLMs are trained on data that includes the first 1GB of Wikipedia. Therefore, with appropriate prompting, it should be possible to extract most or all of this data from the LLMs.
I am looking for ideas/papers that would help me validate whether it is possible to extract pieces of Wikipedia using prompts with any publicly available LLM.
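One simple way to test this empirically would be a greedy probe: feed the model growing prefixes of a known Wikipedia passage and measure how much of the true continuation it reproduces. The sketch below assumes a hypothetical `complete(prompt, n)` callable standing in for whatever LLM API is available; it is not a specific vendor's interface.

```python
def extraction_score(reference, complete, prefix_chars=200, step=100):
    """Feed the model a growing prefix of `reference` and count how many
    of the next `step` characters its continuation gets right.
    `complete(prompt, n)` is a placeholder for any LLM completion call."""
    hits = total = 0
    i = prefix_chars
    while i + step <= len(reference):
        guess = complete(reference[:i], step)
        target = reference[i:i + step]
        hits += sum(g == t for g, t in zip(guess, target))
        total += step
        i += step
    return hits / total if total else 0.0

# Toy stand-in model that has "memorized" the text perfectly:
article = "Anarchism is a political philosophy... " * 20
perfect = lambda prompt, n: article[len(prompt):len(prompt) + n]
print(extraction_score(article, perfect))  # 1.0 for a perfect memorizer
```

A real experiment would swap in an actual model call for `perfect` and run this over many articles; scores well above a baseline language model's would suggest memorization rather than mere fluency.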
Update: I have recently learned about GPT-4's compression abilities via prompting, and I am very keen to test this method. If anyone is willing to work with me (as I do not have access to GPT-4), that would be great.
But that's a different question from trying to generate a WP article's next token, which is what you defined as failure. You said that even with the full context, it couldn't predict the next token. I'm pointing out that you can almost certainly find an adversarial prompt (plus a shortened prefix) which programs it to predict the right next token. It's just that you would need a different adversarial prompt for the next token after that, and then the next, and so on, in what is presumably some sort of sliding window over all of Wikipedia; whereas to be interesting in any substantive sense, you'd need a single fixed prompt which elicits accurate predictions of every token. The question is whether there is one fixed prompt, not whether there is any prompt that would have correctly predicted the next token for your specific article example. The latter is almost certainly true (demonstrating a weakness in your counterexample), and the former almost certainly false.
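The per-token version of this claim is easy to illustrate with a toy: for a deterministic stand-in "model", a brute-force search usually finds some prompt suffix that elicits any given next token, even though no single prompt would work for every position. Everything here (the hash-based `toy_model`, the `#k` suffix scheme) is an invented illustration, not a statement about any real LLM.

```python
import hashlib

def toy_model(prompt):
    # Deterministic stand-in for an LLM's next-token choice.
    return hashlib.sha256(prompt.encode()).hexdigest()[0]

def find_prompt(target, context, budget=100000):
    # Per-token adversarial search: try appended suffixes until the
    # model emits `target` as its next token.
    for k in range(budget):
        prompt = context + f"#{k}"
        if toy_model(prompt) == target:
            return prompt
    return None

text = "abc123"
# A *different* adversarial prompt is found for each position,
# mirroring the sliding-window objection above.
prompts = [find_prompt(ch, text[:i]) for i, ch in enumerate(text)]
print(all(p is not None for p in prompts))
```

The search succeeds per token precisely because each position gets its own prompt; collapsing all of them into one fixed prompt is the part the comment argues is almost certainly impossible.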