
linkhyrule5 comments on Open thread, August 19-25, 2013 - Less Wrong Discussion

Post author: David_Gerard 19 August 2013 06:58AM (2 points)


Comments (325)


Comment author: linkhyrule5 21 August 2013 09:15:53PM 2 points

Has anyone done a study on redundant information in languages?

I'm just mildly curious, because a back-of-the-envelope calculation suggests that English is about 4.7x redundant, which, as a side note, explains how we can esiayl regnovze eevn hrriofclly msispled wrods.

(Actually, that would be an interesting experiment: remove or replace a fraction x of the letters in a paragraph and see at what average x participants can no longer produce a "corrected" copy.)
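Generating the stimuli for that experiment is simple. A minimal sketch, assuming three hypothetical modes ("blank" letters out with `_`, "remove" them, or "substitute" a random wrong letter) and a fixed seed so every participant can see the same corrupted paragraph; spaces and punctuation are never touched:

```python
import random
import string

def corrupt(text: str, x: float, mode: str = "blank", seed: int = 0) -> str:
    """Corrupt a fraction x of the letters in text (hypothetical helper).

    mode="blank"      -> replace each chosen letter with "_"
    mode="remove"     -> delete each chosen letter
    mode="substitute" -> swap each chosen letter for a random different letter
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be between 0 and 1")
    if mode not in ("blank", "remove", "substitute"):
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)  # fixed seed -> reproducible stimuli
    letter_positions = [i for i, c in enumerate(text) if c.isalpha()]
    k = round(x * len(letter_positions))
    chosen = set(rng.sample(letter_positions, k))
    out = []
    for i, c in enumerate(text):
        if i not in chosen:
            out.append(c)
        elif mode == "blank":
            out.append("_")
        elif mode == "substitute":
            new = rng.choice([l for l in string.ascii_lowercase if l != c.lower()])
            out.append(new.upper() if c.isupper() else new)
        # mode == "remove": append nothing
    return "".join(out)

print(corrupt("The quick brown fox jumps over the lazy dog", 0.3, mode="substitute"))
```

Sweeping x upward (say in steps of 0.05) and recording where participants' corrected copies stop matching the original would give the threshold the comment asks about.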

I'd predict that Chinese is much less redundant in its spoken form; I have no idea how to measure redundancy in its written form. (By stroke? By radical?)
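The comment doesn't show the arithmetic behind the 4.7x figure for English. One way such a number can arise (a guess at the calculation, not the commenter's confirmed method): divide the maximum entropy of a 26-letter alphabet, log2(26) ≈ 4.70 bits, by an assumed actual entropy of about 1 bit per letter. Information theory more often states redundancy as 1 − H/H_max, which gives about 0.79 under the same assumption. The unigram helper below only uses single-letter frequencies, so it overestimates the true per-letter entropy.

```python
import math
from collections import Counter

def redundancy_ratio(bits_per_letter: float, alphabet_size: int = 26) -> float:
    """Max entropy per letter (all letters equally likely) / assumed actual entropy."""
    return math.log2(alphabet_size) / bits_per_letter

def conventional_redundancy(bits_per_letter: float, alphabet_size: int = 26) -> float:
    """Redundancy as 1 - H / H_max."""
    return 1 - bits_per_letter / math.log2(alphabet_size)

def unigram_entropy(text: str) -> float:
    """Entropy in bits/letter from single-letter frequencies only (ignores context)."""
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    counts = Counter(letters)
    n = len(letters)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

# Assumption: roughly 1 bit of information per letter of English.
print(f"{redundancy_ratio(1.0):.2f}")         # 4.70
print(f"{conventional_redundancy(1.0):.2f}")  # 0.79
```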

Comment author: gwern 21 August 2013 10:05:32PM 4 points

Yes, it's been studied quite a bit by linguists. You can find some pointers in http://www.gwern.net/Notes#efficient-natural-language which may be helpful.

Comment author: linkhyrule5 21 August 2013 10:51:54PM 1 point

Thanks.

... huh. Now I'm thinking about actually doing that experiment...

Comment author: gwern 22 August 2013 09:47:32PM 3 points

I ran into another thing in that vein:

To measure the artistic merit of texts, Kolmogorov also employed a letter-guessing method to evaluate the entropy of natural language. In information theory, entropy is a measure of uncertainty or unpredictability, corresponding to the information content of a message: the more unpredictable the message, the more information it carries. Kolmogorov turned entropy into a measure of artistic originality. His group conducted a series of experiments, showing volunteers a fragment of Russian prose or poetry and asking them to guess the next letter, then the next, and so on. Kolmogorov privately remarked that, from the viewpoint of information theory, Soviet newspapers were less informative than poetry, since political discourse employed a large number of stock phrases and was highly predictable in its content. The verses of great poets, on the other hand, were much more difficult to predict, despite the strict limitations imposed on them by the poetic form. According to Kolmogorov, this was a mark of their originality. True art was unlikely, a quality probability theory could help to measure.

--The Man Who Invented Modern Probability - Issue 4: The Unlikely - Nautilus
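The excerpt doesn't say how Kolmogorov's group turned guesses into an entropy figure. For comparison, Shannon's 1951 paper "Prediction and Entropy of Printed English" derives bounds from the same kind of guessing game: if q_i is the fraction of letters the subject got right on the i-th guess, entropy per letter lies between sum_i i(q_i − q_{i+1}) log2 i and −sum_i q_i log2 q_i. A minimal sketch of those two bounds, taking a hypothetical list of per-letter guess counts as input:

```python
import math
from collections import Counter

def shannon_bounds(guess_numbers):
    """Lower and upper bounds (bits/letter) from a letter-guessing game.

    guess_numbers[j] is how many guesses were needed for the j-th letter.
    q[i - 1] holds q_i, the fraction of letters guessed right on try i.
    """
    if not guess_numbers:
        raise ValueError("need at least one guess count")
    n = len(guess_numbers)
    counts = Counter(guess_numbers)
    max_i = max(counts)
    # One extra trailing entry so q_{max_i + 1} = 0 is available.
    q = [counts[i] / n for i in range(1, max_i + 2)]
    upper = -sum(p * math.log2(p) for p in q if p > 0)
    lower = sum(i * (q[i - 1] - q[i]) * math.log2(i) for i in range(1, max_i + 1))
    return lower, upper

# Hypothetical data: a subject needed 1, 1, 2 and 3 guesses for four letters.
print(shannon_bounds([1, 1, 2, 3]))
```

On Kolmogorov's reading, predictable newspaper prose would push both bounds down, while great verse would push them up.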

Comment author: JQuinton 23 August 2013 08:41:16PM 0 points

The verses of great poets, on the other hand, were much more difficult to predict, despite the strict limitations imposed on them by the poetic form. According to Kolmogorov, this was a mark of their originality. True art was unlikely, a quality probability theory could help to measure.

This also happens to me with music. I enjoy "unpredictable" music more than predictable music. Knowing music theory, I know which notes are supposed to be played if a song is in a certain key, and when a note or chord isn't the one I predicted, it feels a bit more enjoyable. I wonder whether applying the same technique to different genres, e.g. radio-friendly pop versus non-mainstream music, would give the same result.
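A crude way to put a number on "which notes are supposed to be played" is average surprisal per note under a key-based model. The sketch below is a toy under loud assumptions: major keys only, notes given as MIDI note numbers (60 is middle C), and an arbitrary hypothetical parameter `p_in_key` that splits probability evenly among the 7 in-key pitch classes and gives the rest to the 5 out-of-key ones. A real genre comparison would need a model fitted to a corpus of each genre.

```python
import math

# Pitch classes of a major scale, in semitones above the tonic.
MAJOR_STEPS = {0, 2, 4, 5, 7, 9, 11}

def mean_surprisal(midi_notes, tonic=0, p_in_key=0.9):
    """Average surprisal (bits/note) under a toy key model.

    tonic is a pitch class (0 = C, 7 = G, ...); p_in_key is an assumed,
    not measured, probability that any given note is in the key.
    """
    if not midi_notes:
        raise ValueError("need at least one note")
    total = 0.0
    for note in midi_notes:
        pc = (note - tonic) % 12
        p = p_in_key / 7 if pc in MAJOR_STEPS else (1 - p_in_key) / 5
        total += -math.log2(p)
    return total / len(midi_notes)

c_major_scale = [60, 62, 64, 65, 67, 69, 71, 72]
with_f_sharp = [60, 62, 64, 66, 67, 69, 71, 72]  # F# is outside C major
print(mean_surprisal(c_major_scale), mean_surprisal(with_f_sharp))
```

Under this model a melody with more out-of-key notes scores higher, matching the intuition in the comment. It ignores rhythm, harmony and melodic context entirely.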

Comment author: linkhyrule5 22 August 2013 11:21:46PM 0 points

I wonder what that metric has to say about Finnigan's Wake...

Comment author: Douglas_Knight 23 August 2013 07:47:53AM 0 points

By other metrics, Joyce became less compressible throughout his life. Going closer to the original metric, you demonstrate that the title is hard to compress (especially the lack of apostrophe).
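The comment doesn't name the "other metrics". A common compressibility proxy is compressed size divided by original size under a general-purpose compressor. A minimal sketch using Python's standard `zlib` as a stand-in (our choice, not necessarily what any study of Joyce used). A lower ratio means more redundant text. Very short strings such as a book title are dominated by zlib's fixed overhead, so the ratio is only meaningful for longer passages.

```python
import random
import string
import zlib

def compression_ratio(text: str) -> float:
    """zlib-compressed size / UTF-8 size at maximum compression level.
    Lower means more redundant (more predictable) text."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("empty text")
    return len(zlib.compress(raw, 9)) / len(raw)

# Usage: formulaic text compresses far better than letters drawn at random.
formulaic = "the cat sat on the mat. " * 50
rng = random.Random(0)
noise = "".join(rng.choice(string.ascii_lowercase + " ") for _ in range(len(formulaic)))
print(compression_ratio(formulaic), compression_ratio(noise))
```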

Comment author: palladias 25 August 2013 06:37:46PM 0 points

If you do, please post about it!

Comment author: wedrifid 22 August 2013 02:33:55AM 1 point

(Actually, that would be an interesting experiment - remove or replace fraction x of the letters in a paragraph and see at what average x participants can no longer make a "corrected" copy.)

Studies of this form have been done at least for the edge case where all the removed material is at the end (i.e., tests of subjects' ability to predict the next letter in an English text). I'd be interested to see your more general test, but I'm not sure it has been done (except, perhaps, as a game show).
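That next-letter edge case is easy to run with a machine "subject" in place of a human. A minimal sketch, assuming a character n-gram model with backoff (our choice; no particular predictor is named here) over a hypothetical 27-symbol alphabet of lowercase letters plus space. For each letter of a test passage, the model guesses candidates in order of how often they followed the longest previously seen context, and we record how many guesses it needed.

```python
from collections import Counter, defaultdict

# Assumed symbol set: 26 lowercase letters plus space.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def normalize(text):
    """Lowercase and drop anything outside ALPHABET."""
    return "".join(c for c in text.lower() if c in ALPHABET)

def train(text, order):
    """Count next-character frequencies after every context of length 0..order."""
    counts = defaultdict(Counter)
    for i, ch in enumerate(text):
        for k in range(min(order, i) + 1):
            counts[text[i - k:i]][ch] += 1
    return counts

def guess_numbers(train_text, test_text, order=2):
    """Number of guesses the model needs for each letter of test_text.

    Candidates are ranked by frequency after the longest seen context,
    ties broken by overall letter frequency, then alphabetically.
    """
    train_text, test_text = normalize(train_text), normalize(test_text)
    counts = train(train_text, order)
    unigram = counts[""]
    guesses = []
    for i, ch in enumerate(test_text):
        ctx_counts = unigram  # fall back to plain letter frequencies
        for k in range(min(order, i), 0, -1):
            ctx = test_text[i - k:i]
            if ctx in counts:
                ctx_counts = counts[ctx]
                break
        ranking = sorted(ALPHABET, key=lambda c: (-ctx_counts[c], -unigram[c], c))
        guesses.append(ranking.index(ch) + 1)
    return guesses

print(guess_numbers("the cat sat on the mat", "the rat", order=2))
```

The resulting guess counts play the same role as a human subject's in a guessing-game study; a larger training corpus and higher `order` would make the machine subject far stronger.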