Detecting Web baloney with your nose?

-3 Post author: uzalud 10 November 2012 03:50PM

Is there a useful heuristic for detecting rationally challenged texts (Web pages, forum posts, Facebook comments) that takes relatively superficial attributes, such as formatting choices and spelling errors, as input? Something a casual Internet reader could use to spot possibly unworthy content, so they can suspend judgment and research the matter further. Let's call these attributes "text smells" (analogous to code smells), such as:

  1. too much emphasis in text (ALL CAPS, bold, color, exclamations, etc.);
  2. walls of text;
  3. little concrete data/links/references;
  4. too much irrelevant data and references;
  5. poor spelling and grammar;
  6. obvious half-truths and misinformation.

Since many crackpots, pseudoscientific con artists, and conspiracy theorists seem to have cleaned up their Web sites in recent years, I wonder whether these low-cost baloney detectors are still of any real value. Does anyone know of any studies or analyses of the correlation between these basic metrics and the actual quality of the content? Can you think of other smells typical of Web baloney?

 

Comments (21)

Comment author: jimrandomh 10 November 2012 06:41:01PM 12 points

Remember, there's unlimited reading material to choose from; your not-worth-reading detector should be sensitive, because false negatives cost much more than false positives. When reading an author for the first time, unless I have a strong recommendation or other quality signal, I will stop if the first instance of stupidity precedes the first insight, or if there are no good insights in the first 500 words or so.

For superficial signals like spelling and overuse of emphasis, I divide them into two categories: things a good writer would do if they were rushed, and things a good writer wouldn't ever do. Typos, missing words, few citations? You're looking at an unedited draft; whether that's okay or not depends on the context. Bold italic all-caps large font? Crackpot.

Comment author: dbaupp 10 November 2012 10:50:55PM 3 points

There are a few other "crackpot indices" around. John Baez has a famous one, and Scott Aaronson has one in that vein (mostly specific to mathematics papers though).

Comment author: fubarobfusco 10 November 2012 05:04:34PM 5 points

"Proper" spelling and grammar are some indication of the conscientiousness the writer has put into ① their education, and ② the text itself. However, it's a pretty noisy signal; there are plenty of properly spelled Bible study guides out there.

Comment author: beoShaffer 10 November 2012 07:53:18PM 1 point

Also, there are a lot of insightful people who focused on learning other things (it's amazing how little non-code writing even good CS programs will let you get away with) and/or who write in English because it's common rather than because it's what they were educated in.

Comment author: thomblake 12 November 2012 10:01:42PM 0 points

it's amazing how little non-code writing even good CS programs will let you get away with

And yet the good ones leave one with an appreciation for syntax that transfers itself naturally to the written word.

Comment author: buybuydandavis 10 November 2012 07:27:45PM 2 points

In defense of crackpots, many of the canonical writers here would ping the crackpot meter of most people, as would most of the LW contributors.

Korzybski is a prime example. If I hadn't had a very strong prior from personal discussions, there is no way I would have made it 10 pages into Science and Sanity.

For serious reading, my priors are more important than typesetting. For blogs and forums, it's a decent way to filter out complete unknowns.

Comment author: [deleted] 10 November 2012 04:59:12PM *  1 point

  • too much emphasis in text (ALL CAPS, bold, color, exclamations, etc.);
  • walls of text;
  • little concrete data/links/references;
  • too much irrelevant data and references;
  • poor spelling and grammar;
  • obvious half-truths and misinformation.

I count three that apply to Eliezer's sequences and another that applies to lukeprog's posts. All four of these, plus a fifth (poor spelling), apply to my own posts.

Comment author: gjm 11 November 2012 03:14:35AM 2 points

Would you care to clarify how much you mean "... so Eliezer and Luke are crackpotty" and how much you mean "... so these aren't a very good guide"? (For the avoidance of doubt, I don't think either argument is obviously crazy, though I actually think that Eliezer and Luke aren't crackpots and that those are useful crackpot indicators.)

Comment author: TrE 10 November 2012 05:10:55PM *  2 points

  • two types of emphasis at once, such as underlined italic bold text
  • a product to be sold, such as a book written by a mistaken genius

Comment author: BerryPick6 10 November 2012 07:30:27PM 1 point

Which one applies to Luke?

Comment author: [deleted] 10 November 2012 07:35:22PM 3 points

Too much irrelevant data and references.

Comment author: prase 10 November 2012 06:44:47PM 0 points

Which three?

Comment author: [deleted] 10 November 2012 06:46:32PM *  2 points

Well, Eliezer was fond of italicizing words in his text, doesn't provide references for most of his statements, and wrote quite a few walls of text. I mean, the sequences are huge.

Comment author: uzalud 10 November 2012 07:23:13PM *  4 points

I wouldn't call Eliezer's emphasis excessive, nor would I call the sequences "walls of text". This is an example of both: http://files.abovetopsecret.com/files/img/yj5053f092.png

My question is: if you didn't know any English, could you still infer that this is more likely to be baloney, or not?

Comment author: [deleted] 10 November 2012 07:36:50PM 1 point

That extreme? Yes. It is evidence that the author has low competence, and that in turn is evidence that the content is false.

Comment author: Decius 10 November 2012 09:34:52PM 1 point

Without knowing English, I would suggest that only the excessive repeated bangs and question marks are a high-value signal. The excessive ALL CAPS is likely a mid-value signal, and the lack of paragraph breaks a low-value one.
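Decius's three tiers can be turned into a rough score that needs no understanding of English. The sketch below is only an illustration: the feature definitions, the hypothetical 3/2/1 tier weights, and the assumed 300-words-per-paragraph "wall of text" threshold are my own choices, not anything proposed in the thread, and nothing here has been checked against actual content quality.

```python
import re

# Hypothetical weights mirroring Decius's tiers: high, mid, low value.
WEIGHTS = {"repeated_punct": 3.0, "all_caps": 2.0, "no_paragraphs": 1.0}

# Assumption: a paragraph of this many words or more counts as a full "wall of text".
WALL_WORDS = 300


def smell_features(text: str) -> dict:
    """Surface features that need no understanding of English; each lies in [0, 1]."""
    words = re.findall(r"[A-Za-z]+", text)

    # Runs such as "!!", "?!?" or "!!!" count as repeated bangs/question marks,
    # normalised by the number of sentence-ending punctuation runs.
    punct_runs = re.findall(r"[!?]{2,}", text)
    sentence_ends = max(1, len(re.findall(r"[.!?]+", text)))
    repeated_punct = min(1.0, len(punct_runs) / sentence_ends)

    # Share of words (two letters or longer) written entirely in capitals.
    candidates = [w for w in words if len(w) >= 2]
    shouted = [w for w in candidates if w.isupper()]
    all_caps = len(shouted) / len(candidates) if candidates else 0.0

    # Average paragraph length (blank-line separated), saturating at WALL_WORDS.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    words_per_paragraph = len(words) / max(1, len(paragraphs))
    no_paragraphs = min(1.0, words_per_paragraph / WALL_WORDS)

    return {
        "repeated_punct": repeated_punct,
        "all_caps": all_caps,
        "no_paragraphs": no_paragraphs,
    }


def smell_score(text: str) -> float:
    """Weighted average of the features: 0 is calm, 1 is maximally smelly."""
    features = smell_features(text)
    total = sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return total / sum(WEIGHTS.values())


if __name__ == "__main__":
    calm = "This is a calm paragraph.\n\nAnother short one."
    loud = "WAKE UP!!! THEY ARE LYING TO YOU?!? READ THIS NOW!!!"
    print(f"calm: {smell_score(calm):.2f}")
    print(f"loud: {smell_score(loud):.2f}")
```

Dividing by the number of sentence endings keeps one stray "!!!" in a long, otherwise calm post from dominating the score. Whether any such score actually correlates with content quality is exactly the open question in the original post.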

Comment author: [deleted] 11 November 2012 02:22:56PM *  0 points

Number 7: Comic Sans.