What can we say about the K-complexity of a non-random string from our universe, e.g. the text of Finnegans Wake? It contains lots of patterns making it easy to compress using a regular archiver, but can we do much better than that?
On one hand, the laws of physics in our universe seem to be simple, and generating the text is just a matter of generating our universe and then pointing to the text. On the other hand, our evolution involved a lot of quantum randomness, so pointing to humans within the universe could require a whole lot of additional bits above and beyond the laws of physics. So does anyone have good arguments about whether the K-complexity of Finnegans Wake is closer to 10% or 0.1% of its length?
If anyone is curious about regular archivers, Joyce became less compressible throughout his life. The compression ratios (compressed bytes per 100 characters) for Dubliners, Portrait, Ulysses, and Wake are 38, 38, 42, 47 with gzip -9 and 24, 24, 26, 33 with paq8l -7. LZMA and PPMd interpolate these numbers in unsurprising ways. Dubliners and Portrait seem about as compressible as other fiction in English.
Of course, I performed these calculations using a server in Australia, where Finnegans Wake is in the public domain.
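For anyone who wants to try this on their own texts, here is a rough sketch of how "bytes per 100 characters" could be measured with the compressors in Python's standard library. It is not the script used for the numbers above. paq8l and PPMd are not in the standard library, so bzip2 and xz stand in here, and the names `COMPRESSORS`, `bytes_per_100_chars` and `report` are made up for illustration. Encoding the text as UTF-8 is also an assumption. Whatever number comes out is only an upper bound on K-complexity (up to the size of the decompressor), not an estimate of it.

```python
import bz2
import gzip
import lzma

# Hypothetical table of stand-in compressors, all from the standard library.
COMPRESSORS = {
    "gzip -9": lambda data: gzip.compress(data, compresslevel=9),
    "bzip2 -9": lambda data: bz2.compress(data, compresslevel=9),
    "xz -9 (LZMA)": lambda data: lzma.compress(data, preset=9),
}


def bytes_per_100_chars(text, compress):
    """Compressed size in bytes per 100 characters of the input text."""
    if not text:
        raise ValueError("text must be non-empty")
    # Assumption: the text is stored as UTF-8 before compression.
    compressed = compress(text.encode("utf-8"))
    return 100.0 * len(compressed) / len(text)


def report(text):
    """Return {compressor name: bytes per 100 characters} for one text."""
    return {name: bytes_per_100_chars(text, fn) for name, fn in COMPRESSORS.items()}


# Example use, with "wake.txt" as a placeholder path to a plain-text file:
#   with open("wake.txt", encoding="utf-8") as f:
#       print(report(f.read()))
```

The container headers that gzip and xz add are included in the count, which matters little for a whole novel but inflates the ratio for very short inputs.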
This comment was prompted by Finnegans Wake seeming like an odd choice of a novel. War and Peace is a more prototypical novel, so you probably didn't mean anything by the choice.
Can anyone suggest other hard-to-compress novels?
Thanks for the pointer to paq8l. And it won the Hutter Prize too! That's funny because my post can be viewed as a comment on the relevance of the Hutter Prize.
Finnegans Wake was just my first idea for "novel that's hard to compress".