Is there a useful heuristic for detecting rationally-challenged texts (as in Web pages, forum posts, Facebook comments) that takes relatively superficial attributes such as formatting choices, spelling errors, etc. as input? Something a casual Internet reader could use to flag possibly unworthy content, so they can suspend belief and research the matter further. Let's call them "text smells" (analogous to code smells); a rough sketch of such a detector follows the list. Examples:
- too much emphasis in text (ALL CAPS, bold, color, exclamations, etc.);
- walls of text;
- little concrete data/links/references;
- too much irrelevant data and references;
- poor spelling and grammar;
- obvious half-truths and misinformation.
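To make the idea concrete, here's a rough sketch of what such a detector might look like for plain text. Everything in it is my own guess for illustration: the metrics, the regexes, and especially the thresholds are assumptions, not values validated against real content.

```python
import re

def text_smells(text: str) -> dict:
    """Crude 'text smell' metrics; all thresholds are illustrative guesses."""
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(len(words), 1)

    # Emphasis smell: shouted words (long all-caps) and exclamation density.
    caps_words = [w for w in words if len(w) > 3 and w.isupper()]
    caps_ratio = len(caps_words) / n_words
    bang_ratio = text.count("!") / n_words

    # Wall-of-text smell: length of the longest paragraph, in words.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    longest_para = max((len(p.split()) for p in paragraphs), default=0)

    # Grounding smell: links and numerals per 100 words.
    links = len(re.findall(r"https?://\S+", text))
    numerals = len(re.findall(r"\b\d+(?:\.\d+)?\b", text))
    grounding = (links + numerals) / n_words * 100

    smells = {
        "shouting": caps_ratio > 0.05 or bang_ratio > 0.02,
        "wall_of_text": longest_para > 300,
        "ungrounded": grounding < 0.5,
    }
    smells["smell_count"] = sum(smells.values())
    return smells

# Example: text_smells("THEY are HIDING the TRUTH!!! Wake up!!!")
# flags "shouting" and "ungrounded".
```

The point isn't that these particular numbers mean anything; it's that each smell on the list above reduces to something cheap to measure.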
Since many crackpots, pseudoscientific con artists, and conspiracy theorists seem to have cleaned up their Web sites in recent years, I wonder whether these low-cost baloney-detection tools might still be of real value. Does anyone know of studies or analyses of the correlation between such basic metrics and the actual quality of the content? Can you think of other smells typical of Web baloney?
Remember, there's unlimited reading material to choose from; your not-worth-reading detector should be sensitive, because false negatives cost much more than false positives. When reading an author for the first time, unless I have a strong recommendation or other quality signal, I will stop if the first incidence of stupidity precedes the first insight, or if there are no good insights in the first 500 words or so.
I divide superficial signals like spelling and overuse of emphasis into two categories: things a good writer would do if rushed, and things a good writer would never do. Typos, missing words, few citations? You're looking at an unedited draft; whether that's okay depends on the context. Bold italic all-caps in a large font? Crackpot.
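If you wanted to operationalize that split, it might look something like the sketch below. The two tiers mirror the distinction above, but every specific pattern is something I made up for illustration, not a validated marker of anything.

```python
import re

# Hard signals: things a careful writer would never do, even in a rushed draft.
HARD_SIGNALS = {
    # Markdown bold-italic wrapped around a shouted phrase, e.g. ***WAKE UP***
    "bold_italic_shouting": re.compile(r"\*{3}[^*\n]*[A-Z]{4,}[^*\n]*\*{3}"),
    # Runs of three or more exclamation/question marks
    "punctuation_runs": re.compile(r"[!?]{3,}"),
}

# Soft signals: things a good writer might well leave in an unedited draft.
SOFT_SIGNALS = {
    # Accidentally doubled words ("the the")
    "doubled_words": re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE),
    # Missing space after a sentence-ending period
    "missing_space": re.compile(r"[a-z]\.[A-Z]"),
}

def triage(text: str) -> str:
    """Roughly sort a text into 'crackpot', 'unedited draft', or 'no smell'."""
    if any(p.search(text) for p in HARD_SIGNALS.values()):
        return "crackpot"
    if any(p.search(text) for p in SOFT_SIGNALS.values()):
        return "unedited draft"
    return "no smell"
```

Hard signals end the decision on their own; soft signals only tell you you're reading a draft, which is how I actually treat the two categories when skimming.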