Misha comments on K-complexity of everyday things - Less Wrong

11 Post author: cousin_it 04 December 2011 02:54PM


Comments (16)


Comment author: [deleted] 05 December 2011 01:00:38AM 3 points

One interesting compression of Finnegans Wake that goes beyond a .zip file is a plot summary. Obviously it is lossy compression, but how lossy?

One way to think about it is to imagine you are given a plot summary of Finnegans Wake and asked to reconstruct the novel from it. What additional information would you want? A reasonably extensive knowledge of the English language and its grammar, certainly. Most likely a description of Joyce's writing style. Knowledge of human psychology and of the setting.

Obviously, we only need to include as much information as is actually used. If Finnegans Wake never contains the word "indubitably" then we don't need its definition. Also, ideally, all of this is written in some sort of natural representation with no redundancy, rather than in English, but we can think about writing the above in English as an approximation.

Knowing the algorithm, we can then add corrections. Suppose, when we take the above information, and try to reproduce the text, we end up putting a semicolon instead of a period 600 characters in (perhaps semicolons are usually more consistent with Joyce's style, but here he was feeling capricious). We could add a note to the effect of "600 characters: period, not semicolon". A bunch of these notes (which don't really take up much space) together with the information above make up our perfectly compressed string.
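The "guess plus correction notes" idea can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the function names and the example sentences are hypothetical, the reconstruction guess is taken as given rather than produced by any real model of Joyce's style, and the guess is assumed to have the same length as the real text so that a note is just a position and a character.

```python
def make_corrections(predicted: str, actual: str) -> list[tuple[int, str]]:
    """Record every position where the guess differs from the real text."""
    if len(predicted) != len(actual):
        # Assumption of this sketch: no insertions or deletions, only substitutions.
        raise ValueError("guess and text must have the same length")
    return [(i, a) for i, (p, a) in enumerate(zip(predicted, actual)) if p != a]


def apply_corrections(predicted: str, corrections: list[tuple[int, str]]) -> str:
    """Rebuild the real text from the guess plus the correction notes."""
    chars = list(predicted)
    for position, char in corrections:
        chars[position] = char
    return "".join(chars)


# Hypothetical stand-ins for "what the reconstruction produces" and "what was written".
predicted = "He walked on; the river ran."
actual = "He walked on. the river ran."

corrections = make_corrections(predicted, actual)
restored = apply_corrections(predicted, corrections)
```

Here `corrections` plays the role of the notes ("position 12: period, not semicolon"), and `restored` equals `actual`. The total description length is whatever it takes to produce `predicted` plus the notes, so the scheme only pays off if the guess is already close to the text.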

Comment author: Technoguyrob 05 December 2011 01:14:34AM * 2 points

I cannot see how you could reconstruct a novel from a plot summary, regardless of additional information provided. Do you mean a text such that if a person read it, then read the actual Finnegans Wake, say, a year later, 95% of the time he would not notice the difference? In any case, this scheme clearly has greater K-complexity than a .zip file.