The aim of the Hutter Prize is to compress the first 1GB of English Wikipedia (the enwik9 file) to the smallest possible size. From the AIXI standpoint, compression is equivalent to intelligence, and if we could compress this file to the ideal size (about 75MB, per Shannon's lower estimate of the entropy of English text), the compression algorithm would be equivalent to AIXI.
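The 75MB figure falls out of a quick back-of-the-envelope calculation (a sketch; the 0.6 bits per character rate is Shannon's lower estimate for English):

```python
# Shannon's lower estimate for English text: ~0.6 bits per character.
# enwik9 is 10^9 bytes; at roughly 1 byte per character, the ideal size is:
file_bytes = 10**9
bits_per_char = 0.6

ideal_bytes = file_bytes * bits_per_char / 8
print(f"{ideal_bytes / 1e6:.0f} MB")  # -> 75 MB
```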
However, all the winning solutions so far are based on arithmetic coding and context mixing. These techniques hold little relevance to mainstream AGI research.
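For context, a minimal sketch of the idea behind those winners: predict each symbol with several context models, mix the predictions, and charge -log2(p) bits per symbol, which is the cost an ideal arithmetic coder pays. The toy order-0/order-1 models and the fixed 50/50 mixing weights below are illustrative assumptions, not any actual Hutter Prize entry:

```python
import math
from collections import Counter, defaultdict

def prob(counts: Counter, ch: str, alphabet: int = 256) -> float:
    # Laplace-smoothed probability of `ch` under a count table.
    return (counts[ch] + 1) / (sum(counts.values()) + alphabet)

def compressed_bits(text: str) -> float:
    """Ideal arithmetic-coding cost (in bits) of `text` under a 50/50 mix
    of adaptive order-0 and order-1 character models (toy context mixing)."""
    order0 = Counter()
    order1 = defaultdict(Counter)  # previous char -> next-char counts
    bits, prev = 0.0, "\0"
    for ch in text:
        p = 0.5 * prob(order0, ch) + 0.5 * prob(order1[prev], ch)
        bits += -math.log2(p)      # an ideal arithmetic coder pays -log2(p)
        order0[ch] += 1            # adapt the models after coding each symbol
        order1[prev][ch] += 1
        prev = ch
    return bits

sample = "the quick brown fox jumps over the lazy dog " * 40
print(f"compression ratio: {compressed_bits(sample) / (8 * len(sample)):.2f}")
```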
Current-day LLMs are remarkably powerful models of our world, and it is quite likely that most of them were trained on data that includes the first 1GB of Wikipedia. Therefore, with appropriate prompting, it should be possible to extract most or all of this data from them.
I am looking for ideas/papers that would help me validate whether it is possible to extract pieces of Wikipedia from any publicly available LLM via prompting.
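The training-data-extraction literature (e.g. Carlini et al., "Extracting Training Data from Large Language Models", 2021) studies exactly this question. One cheap sanity check along those lines (a sketch, not a validated method): feed an open model the opening words of a Wikipedia article and see whether greedy decoding reproduces the known continuation. The model choice (gpt2) and the prompt are arbitrary assumptions here:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Opening words of a well-known Wikipedia article; greedy decoding asks the
# model for its single most likely continuation (no sampling noise).
prompt = "Anarchism is a political philosophy and movement that"
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=60, do_sample=False)
continuation = tok.decode(out[0][ids.shape[1]:])

print(continuation)
# To score memorization, compare `continuation` token-by-token against the
# actual article text, e.g. by counting the longest matching prefix.
```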
Update: I have recently learned about GPT-4's compression abilities via prompting, and I am very keen to test this method. If anyone is willing to work with me (as I do not have access to GPT-4), that would be great.
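A minimal sketch of what such a test could look like, assuming API access; the prompts and the round-trip check are my own guesses at a protocol, not an established benchmark:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

passage = "Anarchism is a political philosophy and movement that is skeptical of all justifications for authority..."

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Round trip: ask the model to compress a passage, then to reconstruct it.
compressed = ask("Compress the following text as tightly as you can, such that "
                 "you could later reconstruct it exactly:\n\n" + passage)
restored = ask("Reconstruct the original text from this compressed form:\n\n" + compressed)

# Lossless compression requires an exact match; anything less is lossy.
print(len(compressed), len(passage), restored == passage)
```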
Strictly speaking, it would be surprising, because it would mean that there are no adversarial examples or prompt tunings that can produce a specified arbitrary text, even text with a very high starting likelihood like a WP entry, and this despite having ~50k^2048 possible inputs to work with.
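(For scale, a quick computation of that input space, assuming a ~50k-token vocabulary and a 2048-token context window:)

```python
import math
# Number of distinct 2048-token prompts over a 50k-token vocabulary.
print(2048 * math.log10(50_000))  # ~9623, i.e. about 10^9623 possible inputs
```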
(It would be like saying that a face-generating GAN can't produce a specific face that it was trained on, no matter how hard you optimized the z or how much larger the dimensionality of the z is compared to the true face manifold. If you showed me a StyleGAN face-generator which had been trained on Obama, and showed that it completed half of Obama's face with a different face, I would not be surprised; I would be very surprised if you told me you had somehow proven that not only did it not infill Obama in that specific example (fine, unsurprising), but that there was no infill or possible z whatsoever with which it could generate Obama's face (extremely surprising).)
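The search being described is standard GAN inversion: hold the generator fixed and optimize z by gradient descent so the output matches a target image. A minimal PyTorch sketch, with a tiny untrained network standing in for a real pretrained StyleGAN (every name and shape here is a placeholder):

```python
import torch
import torch.nn as nn

# Placeholder generator: a real experiment would load pretrained StyleGAN weights.
G = nn.Sequential(
    nn.Linear(512, 64 * 64 * 3),
    nn.Tanh(),
)

target = torch.rand(64 * 64 * 3)          # stand-in for the target (e.g. Obama) image
z = torch.randn(512, requires_grad=True)  # we optimize the latent, not the network
opt = torch.optim.Adam([z], lr=0.05)

for step in range(500):
    opt.zero_grad()
    loss = ((G(z) - target) ** 2).mean()  # pixel-space reconstruction loss
    loss.backward()
    opt.step()

print(loss.item())  # a small loss means some z maps (close) to the target face
```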