The aim of the Hutter Prize is to compress the first 1GB of Wikipedia (the enwik9 file) to the smallest possible size. From the AIXI standpoint, compression is equivalent to AI: if we could compress this file down to the ideal size (75MB, according to Shannon's lower estimate), the compression algorithm would be equivalent to AIXI.
However, all of the winning solutions so far are based on arithmetic coding and context mixing, techniques that hold little relevance in mainstream AGI research.
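The link between prediction and compression behind these techniques can be made concrete: given a probabilistic model, an arithmetic coder approaches a code length of -log2 p(symbol) bits per symbol, so a better predictor directly means a smaller compressed file. A minimal sketch with a toy context-free character model (the probabilities are made up for illustration):

```python
import math

def ideal_code_length_bits(text, probs):
    """Total -log2 p over the text: the size (in bits) an arithmetic
    coder would approach when driven by this symbol model."""
    return sum(-math.log2(probs[ch]) for ch in text)

# Two toy models over a 4-symbol alphabet.
uniform = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
skewed  = {"a": 0.70, "b": 0.10, "c": 0.10, "d": 0.10}

text = "aaabacaada"  # 10 symbols, mostly "a"
print(ideal_code_length_bits(text, uniform))  # 2 bits/symbol -> 20.0
print(ideal_code_length_bits(text, skewed))   # fewer bits: the model predicts better
```

Context mixing is essentially the same game played with much better models: many context-conditioned predictors blended into one probability estimate, which then drives the arithmetic coder.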
Present-day LLMs are remarkably powerful models of our world, and it is quite likely that most of them were trained on data that includes the first 1GB of Wikipedia. Therefore, with appropriate prompting, it should be possible to extract most or all of this data from them.
I am looking for ideas/papers that would help me validate whether it is possible to extract pieces of Wikipedia from any publicly available LLM using prompts.
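One simple validation you could run with any LLM: feed the model the opening of a Wikipedia article as a prompt, then measure how far its continuation matches the real text character-for-character. A sketch of the scoring side (the actual LLM call is whatever API you have access to; `model_out` below is a stand-in string, not real model output):

```python
def verbatim_prefix_len(true_continuation: str, model_continuation: str) -> int:
    """Length of the longest exact-match prefix between the real
    Wikipedia text and the model's continuation -- a crude but
    unambiguous signal of verbatim memorization."""
    n = 0
    for a, b in zip(true_continuation, model_continuation):
        if a != b:
            break
        n += 1
    return n

# Illustrative only: the true text is the enwik-style article opening;
# model_out stands in for whatever the LLM actually returns.
true_text = "Anarchism is a political philosophy and movement"
model_out = "Anarchism is a political philosophy that is sceptical"
print(verbatim_prefix_len(true_text, model_out))
```

Exact-prefix matching is deliberately strict; for a more forgiving memorization metric you could compare token overlap or edit distance instead, but an exact match is the cleanest evidence that the model has stored the passage.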
Update: I have recently learnt about GPT-4's compression abilities via prompting, and I am very keen to test this method. If anyone is willing to work with me (as I do not have access to GPT-4), that would be great.
The machine learning world is doing a lot of damage to society by confusing "is" with "ought", which, within AIXI, amounts to confusing its two unified components: Algorithmic Information Theory (compression) and Sequential Decision Theory (conditional decompression). This is a primary reason the machine learning world has failed to fund the Hutter Prize at anything approaching the level required to attract talent away from grabbing all of the low-hanging fruit in the matrix-multiply hardware lottery branches, while the roots of the AGI tree go unwatered. So the failure lies with the machine learning world, not with the Hutter Prize criteria. There is simply no greater potential risk-adjusted return on investment available to the machine learning world than increasing the size of the Hutter Prize purse. And to the extent that clearing up confusion about AGI in politics would benefit society, there is a good argument that the same holds for the world in general.
This is because: 1) the judging criteria are completely objective (and probably should be automated), and 2) the judging criteria are closely tied to the ideal "loss function" for epistemology, the science of human knowledge.
The proper funding level would be at least 1% of the technology development investments in machine learning.