Is there a useful heuristic for detecting rationally challenged texts (Web pages, forum posts, Facebook comments) that takes relatively superficial attributes, such as formatting choices and spelling errors, as input? I mean something a casual Internet reader could use to flag possibly unreliable content, so they can withhold belief and research the matter further. Let's call these "text smells" (by analogy with code smells), for example:
- too much emphasis in text (ALL CAPS, bold, color, exclamations, etc.);
- walls of text;
- little concrete data/links/references;
- too much irrelevant data and references;
- poor spelling and grammar;
- obvious half-truths and misinformation.
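Several of these smells can be measured mechanically. Below is a minimal sketch, assuming plain-text input with paragraphs separated by blank lines. The function `measure_smells`, the `TextSmells` fields, and the choice of metrics are hypothetical illustrations. It covers emphasis (ALL CAPS, exclamation marks), walls of text, and links. Spelling, grammar, and half-truths are not covered, because they need a dictionary or actual understanding of the content.

```python
import re
from dataclasses import dataclass

URL_RE = re.compile(r"https?://\S+")


@dataclass
class TextSmells:
    caps_word_ratio: float             # share of words written in ALL CAPS
    exclamations_per_100_words: float  # emphasis via "!"
    paragraph_count: int               # blocks separated by blank lines
    longest_paragraph_words: int       # large values suggest a wall of text
    link_count: int                    # crude proxy for references


def measure_smells(text: str) -> TextSmells:
    """Compute a few superficial, hypothetical 'text smell' metrics."""
    links = URL_RE.findall(text)
    # Drop URLs before counting words so "https", "com" etc. aren't counted.
    prose = URL_RE.sub(" ", text)
    words = re.findall(r"[A-Za-z]+", prose)
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    # Only words of 2+ letters count as ALL CAPS, so "I" and "A" are ignored.
    caps = sum(1 for w in words if len(w) >= 2 and w.isupper())
    exclamations = text.count("!")
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    longest = max((len(p.split()) for p in paragraphs), default=0)
    return TextSmells(
        caps_word_ratio=caps / n_words,
        exclamations_per_100_words=100 * exclamations / n_words,
        paragraph_count=len(paragraphs),
        longest_paragraph_words=longest,
        link_count=len(links),
    )


smells = measure_smells("THE TRUTH they HIDE!!! Read this!\n\nSources: https://example.org/a")
print(smells)
```

These are raw measurements, not a verdict. Whether any of them actually correlates with content quality is exactly the open question asked below.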
Since many crackpots, pseudoscientific con artists, and conspiracy theorists seem to have cleaned up their Web sites in recent years, I wonder whether these low-cost baloney-detection tools are still of real value. Does anyone know of any studies or analyses of the correlation between these basic metrics and the actual quality of the content? Can you think of other smells typical of Web baloney?
I wouldn't call Eliezer's emphasis excessive, nor would I call the sequences "walls of text". This is an example of both: http://files.abovetopsecret.com/files/img/yj5053f092.png
My question is: if you didn't know any English, could you still infer that this is more likely to be baloney?
Without knowing English, I would say that only the runs of repeated exclamation and question marks are high-value signals. The excessive ALL CAPS is probably mid-value, and the lack of paragraph breaks is low-value.
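That ranking can be turned into a language-independent score, since none of the three features requires reading the words. Here is a rough sketch. The weights (3/2/1), the "two or more consecutive `!`/`?`" definition of a run, and the 150-word threshold for a wall of text are all hypothetical, illustrative choices, not calibrated values. The name `language_free_score` is also made up for this example.

```python
import re

# Hypothetical weights mirroring the high/mid/low ranking; not calibrated.
WEIGHTS = {"punctuation_runs": 3.0, "all_caps": 2.0, "no_paragraphs": 1.0}
WALL_OF_TEXT_WORDS = 150  # hypothetical threshold


def language_free_score(text: str) -> float:
    """Return a score in [0, sum(WEIGHTS)]; higher means more 'smelly'.

    Uses only features visible without understanding the language.
    """
    # High value: runs such as "!!", "?!?", "!!!!" (2+ consecutive ! or ?),
    # normalized by a rough sentence count and capped at 1.0.
    runs = len(re.findall(r"[!?]{2,}", text))
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    punct = min(runs / sentences, 1.0)

    # Mid value: fraction of letters that are uppercase.
    letters = [c for c in text if c.isalpha()]
    caps = sum(c.isupper() for c in letters) / max(len(letters), 1)

    # Low value: 1.0 if the text is long but has no blank-line breaks.
    long_text = len(text.split()) > WALL_OF_TEXT_WORDS
    has_breaks = re.search(r"\n\s*\n", text) is not None
    wall = 1.0 if long_text and not has_breaks else 0.0

    return (WEIGHTS["punctuation_runs"] * punct
            + WEIGHTS["all_caps"] * caps
            + WEIGHTS["no_paragraphs"] * wall)


print(language_free_score("WAKE UP!!! THEY LIE?!?"))
print(language_free_score("This is a calm sentence. It has no shouting."))
```

Scripts without letter case (e.g. Chinese) never count as uppercase here, so the caps feature simply contributes nothing for them.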