linkhyrule5 comments on Open thread, August 19-25, 2013 - Less Wrong Discussion
Has anyone done a study on redundant information in languages?
I'm just mildly curious, because a back-of-the-envelope calculation suggests that English is about 4.7x redundant - which, as a side note, explains how we can esiayl regnovze eevn hrriofclly msispled wrods.
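The comment does not show its arithmetic. One guess at how a figure near 4.7 could arise: a uniformly random choice among 26 letters carries log2(26) ≈ 4.70 bits, and if English text is assumed to carry roughly 1 bit per letter (an assumption here, in the range usually credited to Shannon's estimates), the ratio is about 4.7. A minimal sketch of that arithmetic, with `redundancy_ratio` as a hypothetical helper name:

```python
import math

def redundancy_ratio(alphabet_size, entropy_bits_per_letter):
    """Ratio of the maximum bits per letter to the assumed actual bits per letter."""
    max_bits = math.log2(alphabet_size)  # bits if every letter were equally likely
    return max_bits / entropy_bits_per_letter

# Assumption: 26 letters, ~1 bit of real information per letter.
ratio = redundancy_ratio(26, 1.0)
print(f"{ratio:.2f}x")
```

The result is very sensitive to the assumed entropy: at 1.3 bits per letter the same calculation gives roughly 3.6x.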
(Actually, that would be an interesting experiment - remove or replace fraction x of the letters in a paragraph and see at what average x participants can no longer make a "corrected" copy.)
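A rough sketch of how the stimulus for such an experiment might be generated, assuming that only alphabetic characters count as "letters" and that a fraction x of them is chosen uniformly at random. The names `corrupt` and `word_accuracy` are hypothetical, and scoring by exact word match is just one possible choice:

```python
import random
import string

def corrupt(text, x, mode="replace", seed=None):
    """Remove or replace a fraction x of the letters in text."""
    if mode not in ("replace", "remove"):
        raise ValueError("mode must be 'replace' or 'remove'")
    rng = random.Random(seed)
    # Only alphabetic characters are candidates; spaces and punctuation stay put.
    letter_positions = [i for i, ch in enumerate(text) if ch.isalpha()]
    k = round(x * len(letter_positions))
    chosen = set(rng.sample(letter_positions, k))
    out = []
    for i, ch in enumerate(text):
        if i not in chosen:
            out.append(ch)
        elif mode == "replace":
            # Pick a different lowercase letter so the position really changes.
            alternatives = [c for c in string.ascii_lowercase if c != ch.lower()]
            out.append(rng.choice(alternatives))
        # mode == "remove": drop the character entirely
    return "".join(out)

def word_accuracy(original, attempt):
    """Fraction of words in the original that the participant's copy matches exactly."""
    orig_words = original.split()
    attempt_words = attempt.split()
    if not orig_words:
        return 1.0
    matches = sum(1 for a, b in zip(orig_words, attempt_words) if a == b)
    return matches / len(orig_words)

paragraph = "The quick brown fox jumps over the lazy dog"
for x in (0.1, 0.3, 0.5):
    print(x, corrupt(paragraph, x, mode="replace", seed=0))
```

Sweeping x and recording `word_accuracy` for each participant's corrected copy would give the curve the comment asks about; the point where accuracy collapses is the x of interest.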
I'd predict that Chinese is much less redundant in its spoken form; I have no idea how to measure redundancy in its written form. (By stroke? By radical?)
Yes, it's been studied quite a bit by linguists. You can find some pointers in http://www.gwern.net/Notes#efficient-natural-language which may be helpful.
Thanks.
... huh. Now I'm thinking about actually doing that experiment...
I ran into another thing in that vein:
--The Man Who Invented Modern Probability - Issue 4: The Unlikely - Nautilus
This also happens to me with music. I enjoy "unpredictable" music more than predictable music. Knowing music theory, I know which notes are supposed to be played if a song is in a certain key, and when a note or chord isn't the one I predicted, it feels a bit more enjoyable. I wonder if the same measure could be applied across genres of music with the same result, e.g. radio-friendly pop music vs. non-mainstream music.
I wonder what that metric has to say about Finnigan's Wake...
By other metrics, Joyce became less compressible throughout his life. Going closer to the original metric, you demonstrate that the title is hard to compress (especially the lack of apostrophe).
If you do, please post about it!
Studies of this form have been done at least for the edge case where all the material removed is from the end (i.e., tests of subjects' ability to predict the next letter in an English text). I'd be interested to see your more general test, but I am not sure whether it has been done. (Except, perhaps, as a game show.)
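For a feel of what a next-letter prediction test measures, here is a toy sketch in which a simple bigram model stands in for the human subject. This is an assumption for illustration only: the studies mentioned used people, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each character, which characters follow it in the training text."""
    following = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        following[prev][nxt] += 1
    return following

def first_guess_accuracy(model, text):
    """Fraction of positions where the model's single best guess is the actual next character."""
    hits = 0
    total = 0
    for prev, nxt in zip(text, text[1:]):
        if prev not in model:
            continue  # no basis for a guess; skip characters never seen in training
        guess = model[prev].most_common(1)[0][0]
        hits += guess == nxt
        total += 1
    return hits / total if total else 0.0

training = "the theory is that the other letters are there to help the reader"
model = train_bigram(training)
print(first_guess_accuracy(model, "the reader is there"))
```

The more often the next letter can be guessed correctly, the less information each letter carries, which is the link between prediction tests and redundancy.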