Is there a useful heuristic for detecting rationally challenged texts (Web pages, forum posts, Facebook comments) that takes relatively superficial attributes, such as formatting choices and spelling errors, as input? I mean something a casual Internet reader could use to flag possibly untrustworthy content, so they can suspend belief and research the matter further. Let's call these signals "text smells" (analogous to code smells), for example:
- excessive emphasis in the text (ALL CAPS, bold, color, exclamation marks, etc.);
- walls of text;
- little concrete data/links/references;
- too much irrelevant data and references;
- poor spelling and grammar;
- obvious half-truths and misinformation.
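To make "superficial attributes as input" concrete, here is a minimal Python sketch that measures a few of the smells above that can be computed mechanically: ALL CAPS usage, exclamation-mark density, the longest paragraph (walls of text), and the number of links (a rough proxy for references). The function names `text_smells` and `smell_flags` and all thresholds are hypothetical and chosen only for illustration; they are not validated against any study. Spelling, grammar, and half-truths are not covered, since they would need a dictionary or actual fact-checking.

```python
import re

URL_RE = re.compile(r"https?://\S+")

# Hypothetical thresholds, picked for illustration only (not validated).
THRESHOLDS = {
    "caps_ratio": 0.2,                  # share of words written in ALL CAPS
    "exclamations_per_100_words": 2.0,  # "!" density
    "longest_paragraph_words": 300,     # "wall of text"
}


def text_smells(text: str) -> dict:
    """Compute a few crude, superficial 'text smell' metrics for a text."""
    words = re.findall(r"[A-Za-z]+", text)
    # Ignore one-letter words so "I" or "a" do not count as shouting.
    long_words = [w for w in words if len(w) >= 2]
    caps = sum(1 for w in long_words if w.isupper())
    caps_ratio = caps / len(long_words) if long_words else 0.0
    # Exclamation marks per 100 words.
    excl = 100 * text.count("!") / len(words) if words else 0.0
    # Paragraphs are separated by blank lines; measure the longest in words.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    longest = max((len(p.split()) for p in paragraphs), default=0)
    return {
        "caps_ratio": caps_ratio,
        "exclamations_per_100_words": excl,
        "longest_paragraph_words": longest,
        "links": len(URL_RE.findall(text)),
    }


def smell_flags(metrics: dict) -> list:
    """Return the names of metrics exceeding the hypothetical thresholds."""
    flags = [name for name, limit in THRESHOLDS.items() if metrics[name] > limit]
    if metrics["links"] == 0:
        flags.append("no_links")  # "little concrete data/links/references"
    return flags


if __name__ == "__main__":
    sample = "THIS IS THE TRUTH!!! They don't want you to know."
    print(smell_flags(text_smells(sample)))
```

Such a checker can only raise a flag for closer reading; a tidy, well-linked page will pass it regardless of whether its content is sound.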
Since many crackpots, pseudoscientific con artists, and conspiracy theorists seem to have cleaned up their Web sites in recent years, I wonder whether these low-cost baloney detection tools are still of real value. Does anyone know of any studies or analyses of the correlation between these basic metrics and the actual quality of the content? Can you think of other smells typical of Web baloney?
"Proper" spelling and grammar are some indication of the conscientiousness the writer has put into ① their education and ② the text itself. However, it's a pretty noisy signal; there are plenty of properly spelled Bible study guides out there.
Also, there are a lot of insightful people who focused on learning other things (it's amazing how little non-code writing even good CS programs let you get away with) and/or who write in English because it's common rather than because it's the language they were educated in.