Less Wrong is a community blog devoted to refining the art of human rationality.
Is there a useful heuristic for detecting rationally-challenged texts (Web pages, forum posts, Facebook comments) that takes relatively superficial attributes such as formatting choices, spelling errors, etc. as input? Something a casual Internet reader could use to flag possibly unworthy content, suspend judgment, and research the matter further. Let's call them "text smells" (analogous to code smells), like:
- too much emphasis in text (ALL CAPS, bold, color, exclamations, etc.);
- walls of text;
- little concrete data/links/references;
- too much irrelevant data and references;
- poor spelling and grammar;
- obvious half-truths and misinformation.
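The more superficial smells above could, in principle, be scored mechanically. Here is a toy sketch of such a scorer; the thresholds are arbitrary illustrative assumptions, not empirically validated cut-offs, and the function name is made up for this example:

```python
import re

def text_smells(text):
    """Return a list of superficial 'text smells' found in text.
    All thresholds below are arbitrary illustrative choices."""
    smells = []
    words = text.split()

    # Too much emphasis: a noticeable fraction of ALL-CAPS words
    caps = [w for w in words if len(w) > 3 and w.isupper()]
    if words and len(caps) / len(words) > 0.05:
        smells.append("too much ALL-CAPS emphasis")

    # Too much emphasis: piles of exclamation marks
    if text.count("!") > 3:
        smells.append("excessive exclamation marks")

    # Wall of text: one enormous unbroken paragraph
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    if paragraphs and max(len(p.split()) for p in paragraphs) > 300:
        smells.append("wall of text")

    # Little concrete data: a long text with no links at all
    links = re.findall(r"https?://\S+", text)
    if len(words) > 200 and not links:
        smells.append("no links or references")

    return smells
```

Of course, the harder smells (irrelevant references, half-truths) need actual reading; this only catches the formatting-level signals.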
Since many crackpots, pseudoscientific con artists, and conspiracy theorists seem to have cleaned up their Web sites in recent years, I wonder whether these low-cost baloney-detection tools might be of real value. Does anyone know of any studies or analyses of the correlation between these basic metrics and the actual quality of the content? Can you think of some other smells typical of Web baloney?
Sometimes I run into people who have rather strong opinions on some topic, and it turns out that they are basing them on quite shallow and biased information. They are aware that their knowledge is quite limited compared to mine, and they admit that they don't want to put in the effort needed to learn enough to level the playing field.
But that's not really a problem. What bothers me is that, sometimes, this declaration of ignorance is expressed with a kind of pride.
This behaviour is noticeable on other levels too, in politics or in the sciences-humanities culture clash.
I came up with several hypotheses which might account for this:
- Being opinionated on a topic you know little about is a sign of confidence and bravery. Any fool can play it safe and carefully form opinions based on solid knowledge, but it takes a real man to do it quickly and decisively, with only partial information.
- Knowing something is an identity badge. In-depth knowledge of science, or computers, or any number of other fields is a sign that you are a geek. People are proud of not being geeks, or are a proud member of some other group that does not care for that particular knowledge.
- Knowledge is relative and/or unimportant. Not caring about concrete knowledge is a sign of post-modernist sophistication, or an avant-garde, non-mainstream thinking, which is something to be proud of.
- Displaying pride overcompensates for shame one normally feels when forced to acknowledge one's ignorance.
Do you notice this behaviour too? What do you think causes it?
EDIT: formatting, style, grammar
I can't seem to get my head around a simple issue of judging probability. Perhaps someone here can point to an obvious flaw in my thinking.
Let's say we have a binary generator: a machine that outputs a sequence of ones and zeros according to some internally encapsulated rule (deterministic or probabilistic). All binary generators look alike, and you can only infer (a probability of) a rule by looking at a machine's output.
You have two binary generators: A and B. One of these is a true random generator (a fair coin tosser). The other one is a biased random generator: stateless (each digit is generated independently of those before it), with probability of outputting zero p(0) somewhere between zero and one, but NOT 0.5 - let's say p(0) is uniformly distributed over [0, 0.5) ∪ (0.5, 1]. At this point, the chance that A is the true random generator is 50%.
Now you read the first ten digits generated by each machine. Machine A outputs 0000000000. Machine B outputs 0010111101. Knowing this, is the probability of machine A being the true random generator now less than 50%?
My intuition says yes.
But the probability that a true random generator will output 0000000000 should be the same as the probability that it will output 0010111101, because all sequences of equal length are equally likely. The biased random generator is also just as likely to output 0000000000 as it is 0010111101.
So there seems to be no reason to think that a machine outputting a sequence of zeros of any size is any more likely to be a biased stateless random generator than it is to be a true random generator.
I know that you can never know for certain that a generator is truly random. But surely you can statistically distinguish between random and non-random generators?
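One way to check the intuition numerically: with a 50/50 prior over the two machines and a uniform prior on the biased machine's p(0), Bayes' rule gives the posterior directly. The marginal likelihood of a sequence under the biased machine is the integral of p^k(1-p)^(n-k) over p, which equals k!(n-k)!/(n+1)! - and, crucially, this depends on how many zeros the sequence contains, even though for the fair machine all sequences of length n are equally likely. A minimal sketch (the function name is made up; excluding the single point p = 0.5 changes nothing, since it has measure zero):

```python
from math import factorial

def posterior_fair(bits):
    """P(this machine is the fair generator | observed bits), assuming
    a 50/50 prior over the machines and a uniform prior on the biased
    machine's p(0)."""
    n = len(bits)
    k = bits.count("0")            # number of zeros observed
    like_fair = 0.5 ** n           # fair coin: every length-n sequence equally likely
    # Biased machine, marginalized over p: integral of p^k (1-p)^(n-k) dp
    like_biased = factorial(k) * factorial(n - k) / factorial(n + 1)
    return like_fair / (like_fair + like_biased)

print(posterior_fair("0000000000"))  # ~0.011: ten zeros strongly favor the biased machine
print(posterior_fair("0010111101"))  # ~0.693: a balanced sequence favors the fair machine
```

So the intuition is right, and the flaw is in comparing the sequences at a fixed p: averaged over the unknown bias, the biased machine is far more likely to produce ten zeros (1/11) than that particular balanced sequence (1/2310).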