# So You Think You're a Bayesian? The Natural Mode of Probabilistic Reasoning

**Related to:** The Conjunction Fallacy, Conjunction Controversy

*The **heuristics and biases** research program in psychology has discovered many different ways that humans fail to reason correctly under uncertainty. In experiment after experiment, its researchers show that we use heuristics to approximate probabilities rather than making the appropriate calculation, and that these heuristics are systematically biased. However, a tweak in the experimental protocols seems to remove the biases altogether and cast doubt on whether we are actually using heuristics. Instead, it appears that the errors are simply an artifact of how our brains internally store information about uncertainty. Theoretical considerations support this view.*

**EDIT**: The view presented here is controversial in the heuristics and biases literature; see Unnamed's comment on this post below.

**EDIT 2**: The author no longer holds the views presented in this post. See this comment.

A common example of the failure of humans to reason correctly under uncertainty is the conjunction fallacy. Consider the following question:

Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.

What is the probability that Linda is:

(a) a bank teller

(b) a bank teller and active in the feminist movement

In a replication by Gigerenzer (1993), 91% of subjects ranked (b) as more probable than (a), saying that it is more likely that Linda is active in the feminist movement AND a bank teller than that Linda is simply a bank teller. The conjunction rule of probability states that the probability of two things both being true is less than or equal to the probability of either one of them being true. Formally, P(A & B) ≤ P(A). So this experiment shows that people violate the conjunction rule, and thus fail to reason correctly under uncertainty. The representativeness heuristic has been proposed as an explanation for this phenomenon. To use this heuristic, you evaluate the probability of a hypothesis by judging how closely it resembles the data. Someone using the representativeness heuristic looks at the Linda question and sees that Linda's characteristics resemble those of a feminist bank teller much more closely than those of just a bank teller, and so they conclude that Linda is more likely to be a feminist bank teller than a bank teller.
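The conjunction rule can be checked by simple counting. The sketch below builds a made-up population of 100 people (the trait rates and the names `conjunction_counts` and `people` are invented purely for illustration, not taken from any experiment) and confirms that the feminist bank tellers can never outnumber the bank tellers.

```python
import random

def conjunction_counts(people):
    """Return (number of bank tellers, number of feminist bank tellers)."""
    tellers = sum(1 for p in people if p["teller"])
    feminist_tellers = sum(1 for p in people if p["teller"] and p["feminist"])
    return tellers, feminist_tellers

# Hypothetical population: the 10% and 80% rates are assumptions for illustration only.
rng = random.Random(0)
people = [
    {"teller": rng.random() < 0.1, "feminist": rng.random() < 0.8}
    for _ in range(100)
]

tellers, feminist_tellers = conjunction_counts(people)
# Every feminist bank teller is also a bank teller, so this holds for any population.
assert feminist_tellers <= tellers
print(tellers, feminist_tellers)
```

Whatever rates are plugged in, the second count is taken over a subset of the people in the first count, which is all the conjunction rule says.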

This is the standard story, but are people really using the representativeness heuristic in the Linda problem? Consider the following rewording of the question:

Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.

There are 100 people who fit the description above. How many of them are:

(a) bank tellers

(b) bank tellers and active in the feminist movement

Notice that the question is now strictly in terms of frequencies. Under this version, only 22% of subjects rank (b) above (a) (Gigerenzer, 1993). The only thing that changed is the question that is asked; the description of Linda (and the 100 people) remains unchanged, so the representativeness of the description for the two groups should remain unchanged. Thus people are *not* using the representativeness heuristic - at least not in general.

Tversky and Kahneman, champions and founders of the heuristics and biases research program, acknowledged that the conjunction fallacy can be mitigated by changing the wording of the question (1983, pg 309), but this isn't the only anomaly. Consider another problem:

If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming you know nothing about the person's symptoms or signs?

Using Bayes' theorem, the correct answer is .02, or 2%. In one replication, only 12% of subjects correctly calculated this probability. In these experiments, the most common wrong answer given is usually .95, or 95% (Gigerenzer, 1993). This is what's known as the base rate fallacy because the error comes from ignoring the "base rate" of the disease in the population. Intuitively, if absolutely no one has the disease, it doesn't matter what the test says - you still wouldn't think you had the disease.
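The 2% figure can be reproduced in a few lines. The question as worded does not say how often the test detects a true case, so this sketch assumes a sensitivity of 100%; the function and parameter names are my own.

```python
def posterior_given_positive(prevalence, false_positive_rate, sensitivity=1.0):
    """P(disease | positive test) via Bayes' theorem.

    sensitivity=1.0 is an assumption: the question does not state it.
    """
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

p = posterior_given_positive(prevalence=1 / 1000, false_positive_rate=0.05)
print(round(p, 3))  # 0.02
```

The common wrong answer of 95% is what you get by computing 1 minus the false positive rate and never bringing the prevalence into the calculation.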

Now consider the same question framed in terms of relative frequencies.

One out of 1000 Americans has disease X. A test has been developed to detect when a person has disease X. Every time the test is given to a person who has the disease, the test comes out positive. But sometimes the test also comes out positive when it is given to a person who is completely healthy. Specifically, out of every 1000 people who are perfectly healthy, 50 of them test positive for the disease.

Imagine that we have assembled a random sample of 1000 Americans. They were selected by a lottery. Those who conducted the lottery had no information about the health status of any of these people. How many people who test positive for the disease will actually have the disease?

_____ out of _____.

Using this version of the question, 76% of subjects answered correctly with 1 out of 50. Instructing subjects to visualize frequencies in graphs increases this percentage to 92% (Gigerenzer, 1993). Again, re-framing the question in terms of relative frequencies rather than (subjective) probabilities results in improved performance on the test.
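In the frequency version the same answer falls out of plain counting, with no explicit use of Bayes' theorem. A minimal sketch, using hypothetical names and the numbers from the question:

```python
def positives_in_sample(sample_size, sick_per_1000, false_pos_per_1000_healthy):
    """Count (true positives, all positives) in a sample, using whole people."""
    sick = sample_size * sick_per_1000 // 1000
    healthy = sample_size - sick
    # The question says every sick person tests positive.
    true_positives = sick
    false_positives = round(healthy * false_pos_per_1000_healthy / 1000)
    return true_positives, true_positives + false_positives

true_pos, all_pos = positives_in_sample(1000, 1, 50)
print(f"{true_pos} out of {all_pos}")  # 1 out of 51
```

Strictly, the one sick person is counted on top of the roughly 50 healthy people who test positive, so the tally is 1 out of about 51, which is close to the 1 out of 50 reported as the correct answer and agrees with the 2% from the probability version.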

Consider yet another typical question in these experiments:

Which city has more inhabitants?

(a) Hyderabad

(b) Islamabad

How confident are you that your answer is correct?

50%, 60%, 70%, 80%, 90%, 100%

According to Gigerenzer (1993),

The major finding of some two decades of research is the following: In all the cases where subjects said, "I am 100% confident that my answer is correct," the relative frequency of correct answers was only about 80%; in all the cases where subjects said, "I am 90% confident" the relative frequency of correct answers was only about 75%, when subjects said "I am 80% confident" the relative frequency of correct answers was only about 65%, and so on.

This is called overconfidence bias. A Bayesian might say that you aren't calibrated. In any case, it's generally frowned upon by both statistical camps. If you say you're 90% confident but you're only right 80% of the time, why not just say you're 80% confident? But consider a different experimental setup. Instead of only asking subjects one general knowledge question like the Hyderabad-Islamabad question above, ask them 50; and instead of asking them how confident they are that their answer is correct every time, ask them at the end how many they think they answered correctly. If people are biased in the way that overconfidence bias says they are, there should be no difference between the two experiments.

First, Gigerenzer replicated the original experiments, showing an overconfidence bias of 13.8% - that is, on average, subjects' stated confidence was 13.8 percentage points higher than the true relative frequency of correct answers. For example, if they claimed a confidence of 90%, on average they would answer correctly 76.2% of the time. Using the 50-question treatment, overconfidence bias dropped to -2.4%! In a second replication, the control was 15.4% and the treatment was -4.2% (1993). Note that -2.4% and -4.2% are likely not significantly different from 0, so don't interpret that as *under*confidence bias. Once the probability judgment was framed in terms of relative frequencies, the bias basically disappeared.
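For concreteness, here is one way the two overconfidence measures could be computed. This is a sketch under my own assumptions about the scoring (mean stated confidence minus hit rate for the per-question treatment, estimated minus actual proportion correct for the 50-question treatment), and the subject data is made up.

```python
def overconfidence(confidences, correct):
    """Mean stated confidence minus the fraction of answers that were right."""
    mean_confidence = sum(confidences) / len(confidences)
    hit_rate = sum(correct) / len(correct)
    return mean_confidence - hit_rate

def frequency_overconfidence(estimated_correct, actual_correct, n_questions):
    """Estimated minus actual number correct, as a fraction of all questions."""
    return (estimated_correct - actual_correct) / n_questions

# Hypothetical subject: ten answers, each given with 90% confidence, eight right.
confidences = [0.9] * 10
correct = [True] * 8 + [False] * 2
print(round(overconfidence(confidences, correct), 3))  # 0.1

# Hypothetical subject in the 50-question treatment: guesses 38 right, gets 40 right.
print(round(frequency_overconfidence(38, 40, 50), 3))
```

A positive value means overconfidence and a negative value means underconfidence, matching the sign convention of the figures above.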

So in all three experiments, the standard results of the heuristics and biases program fall once the problem is recast in terms of relative frequencies. Humans don't simply use heuristics; something else more complicated is going on. But the important question is, of course, what else? To answer that, we need to take a detour through information representation. Any computer - and the brain is just a very difficult-to-understand computer - has to represent its information symbolically. The problem is that there are usually many ways to represent the same information. For example, 31, 11111, and XXXI all represent the same number using different systems of representation. Aside from the obvious visual differences, systems of representation also differ in how easy they are to use for a variety of operations. If this doesn't seem obvious, try long division using Roman numerals, as Gigerenzer suggests (1993). Crucially, this difficulty is relative to the computer attempting to perform the operations. Your calculator works great in binary, but your brain works better when things are represented visually.
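To make the point about representations concrete, the sketch below decodes the three strings from the example into the same integer. The Roman numeral parser is a small helper written for this illustration, and it assumes well-formed input.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(numeral):
    """Decode a well-formed Roman numeral such as 'XXXI' or 'XLI'."""
    total = 0
    for i, symbol in enumerate(numeral):
        value = ROMAN_VALUES[symbol]
        # A smaller symbol written before a larger one is subtracted (IV, XL, ...).
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total

decimal = int("31")        # base 10
binary = int("11111", 2)   # base 2
roman = roman_to_int("XXXI")
print(decimal, binary, roman)  # 31 31 31
```

The two positional systems need only a one-line built-in call, while the Roman system needs its own parsing rules before any arithmetic can start, which is the sense in which some representations are easier to operate on than others.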

What does the representation of information have to do with the experimental results above? Well, let's take another detour - this time through the philosophy of probability. As most of you already know, the two most common positions are frequentism and Bayesianism. I won't get into the details of either position beyond what is relevant, so if you're unaware of the difference and are interested, click the links. According to the Bayesian position, all probabilities are subjective degrees of belief. Don't worry about the sense in which probabilities are subjective, just focus on the degrees of belief part. A Bayesian is comfortable assigning a probability to any proposition you can come up with. Some Bayesians don't even care if the proposition is coherent.

Frequentists are different beasts altogether. For a frequentist, the probability of an event happening is its relative frequency in some well defined reference class. A useful though not entirely accurate way to think about frequentist probability is that there must be a numerator and a denominator in order to get a probability. The reference class of events you are considering provides the denominator (the total number of events), and the particular event you are considering provides the numerator (the number of times that particular event occurs in the reference class). If you flip a coin 100 times and get 37 heads and are interested in heads, the reference class is coin flips. Then the probability of flipping a coin and getting heads is 37/100.^{1} Key to all of this is that the frequentist thinks there is no such thing as the probability of a single event happening without referring to some reference class. So returning to the Linda problem, there is no such thing as a frequentist probability that Linda is a bank teller, or a bank teller and active in the feminist movement. But there *is* a probability that, out of 100 people who have the same description as Linda, a randomly selected person is a bank teller, or a bank teller and active in the feminist movement.

In addition to the various philosophical differences between the Bayesians and frequentists, the two different schools also naturally lead to two different ways of representing the information contained in probabilities. Since all the frequentist cares about is relative frequencies, the natural way to represent probabilities in her mind is through, well, frequencies. The actual number representing the probability (e.g. p=.23) can always be calculated later as an afterthought. The Bayesian approach, on the other hand, leads to thinking in terms of percentages. If probability is just a degree of belief, why not represent it as such with, say, a number between 0 and 1? A "natural frequentist" would store all probabilistic information as frequencies, carefully counting each time an event occurs, while a "natural Bayesian" would store it as a single number - a percentage - to be updated later using Bayes' theorem as information comes in. It wouldn't be surprising if the natural frequentist had trouble operating with Bayesian probabilities. She thinks in terms of frequencies, but a single number isn't a frequency - it has to be converted to a frequency in some way that allows her to keep counting events accurately if she wants to use this information.

So if it isn't obvious by now, we're natural frequentists! How many of you thought you were Bayesians?^{2} Gigerenzer's experiments show that changing the representation of uncertainty from probabilities to frequencies drastically alters the results, making humans appear much better at statistical reasoning than previously thought. It's not that we use heuristics that are systematically biased; our native architecture for representing uncertainty is just better at working with frequencies. When uncertainty isn't represented using frequencies, our brains have trouble and fail in apparently predictable ways. To anyone who has had Bayes' theorem intuitively explained to them, it shouldn't be all that surprising that we're natural frequentists. How does Eliezer intuitively explain Bayes' theorem? By working through examples using *relative frequencies*. This is also a relatively common tactic in undergraduate statistics textbooks, though it may only be because undergraduates typically are taught only the frequentist approach to probability.

So the heuristics and biases program doesn't catalog the various ways that we fail to reason correctly under uncertainty, but it *does* catalog the various ways we reason incorrectly about probabilities *that aren't in our native representation*. This could be because of our native architecture just not handling alternate representations of probability effectively, or it could be because when our native architecture starts having trouble, our brains automatically resort to using the heuristics Tversky and Kahneman were talking about. The latter seems more plausible to me in light of the other ways the brain approximates when it is forced to, but I'm still fairly uncertain. Gigerenzer has his own explanation that unifies the two domains under a specific theory of natural frequentism and has performed further experiments to back it up. He calls his explanation a theory of probabilistic mental models.^{3} I don't completely understand Gigerenzer's theory and his extra evidence seems to equally support the hypothesis that our brains are using heuristics when probabilities aren't represented as frequencies, but I will say that Gigerenzer's theory does have elegance going for it. Capturing both groups of phenomena with a unified theory makes Occam smile.

These experiments aren't the only reason to believe that we're actually pretty good at reasoning under uncertainty or that we're natural frequentists; there are theoretical reasons as well. First, consider evolutionary theory. If lower order animals are decent at statistical reasoning, we would probably expect that humans are good as well since we all evolved from the same source. It is possible that a lower order species developed its statistical reasoning capabilities *after* its evolutionary path diverged from the ancestors of humans, or that statistical reasoning became less important for humans or their recent ancestors and thus evolution committed fewer resources to the process. But the ability to reason under uncertainty seems so useful, and if any species has the mental capacity to do it, we would expect it to be humans, with their large, adept brains. Gigerenzer summarizes the evidence across species (1993):

Bumblebees, birds, rats, and ants all seem to be good intuitive statisticians, highly sensitive to changes in frequency distributions in their environments, as recent research in foraging behavior indicates (Gallistel, 1990; Real & Caraco, 1986). From sea snails to humans, as John Staddon (1988) argued, the learning mechanisms responsible for habituation, sensitization, and classical and operant conditioning can be described in terms of statistical inference machines. Reading this literature, one wonders why humans seem to do so badly in experiments on statistical reasoning.

Indeed. Should we really expect that bumblebees, birds, rats, and ants are better intuitive statisticians than us? It's certainly possible, but it doesn't appear all that likely, a priori.

Theories of the brain from cognitive science provide another reason why we would be adept at reasoning under uncertainty and a reason why we would be natural frequentists. The connectionist approach to the study of the human mind suggests that the brain encodes information by making literal physical connections between neurons, represented on the mental level by connections between concepts. So, for example, if you see a dog and notice that it's black, a connection between the concept "dog" and the concept "black" is made in a very literal sense. If connectionism is basically correct, then probabilistic reasoning shouldn't be all that difficult for us. For example, if the brain needs to calculate the probability that any given dog is black, it can just count the number of connections between "dog" and "black" and the number of connections between "dog" and colors other than black.^{4} Voila! Relative frequencies. As Nobel Prize winning economist Vernon Smith puts it (2008, pg 208):

Hayek's theory^{5} - that mental categories are based on the experiential relative frequency of coincidence between current and past perceptions - seems to imply that our minds should be good at probability reasoning.

It also suggests that we would be natural frequentists since our brains are quite literally built on relative frequencies.

So both evidence and theory point in the same direction. The research of Tversky and Kahneman, among others, originally showed that humans were fairly bad at reasoning under uncertainty. It turns out much of this is an artifact of how their subjects were asked to think about uncertainty. Having subjects think in terms of frequencies basically eliminates biases in experiments, suggesting that humans are just natural frequentists - their minds are structured to handle probabilities in terms of frequencies rather than in proportions or percentages. Only when we are working with information represented in a form difficult for our native architecture to handle do we appear to be using heuristics. Theoretical considerations from both evolutionary biology and cognitive science buttress both claims - that humans are both natural frequentists and not so bad at handling uncertainty - at least when thinking in terms of frequencies.

**Footnotes**

1: To any of you who raised an eyebrow, I did it on purpose ;).

2: Just to be clear, I am *not* arguing that since we are natural frequentists, the frequentist approach to probability is the correct approach.

3: What seems to be the key paper is the second link in the Google search I linked to. I haven't read it yet, so I won't really get into his theory here.

4: I acknowledge that this is a very simplified example and a gross simplification of the theory.

5: Friedrich Hayek, another Nobel Prize winning economist, independently developed the connectionist paradigm of the mind culminating in his 1952 book *The Sensory Order*. I do recommend reading Hayek's book, but not without a reading group of some sort. It's short but dense and very difficult to parse - let's just say Hayek is not known for his prose.

**References**

Gigerenzer, Gerd. 1993. "The Bounded Rationality of Probabilistic Mental Models." In Manktelow, K. I., & Over, D. E., eds. *Rationality: Psychological and philosophical perspectives* (pp. 284-313). London: Routledge. Preprint available online.

Smith, Vernon L. 2008. *Rationality in Economics*. Cambridge: Cambridge UP.

Tversky, A., and D. Kahneman. 1983. "Extensional versus Intuitive Reasoning: The Conjunction Fallacy in Probability Judgment." *Psychological Review* 90(4):293-315. Available online.

## Comments (79)

This frequencies vs. probabilities issue is one of the controversies in heuristics & biases research, and Kahneman, Tversky, and others dispute Gigerenzer's take. For instance, here (pdf, see p. 8) is what Gilovich & Griffin have to say in their introduction to the book Heuristics and Biases (emphasis added):

Indeed, there seems to have been something of a feud between Kahneman and Gigerenzer. See Kahneman's response to Gigerenzer, and Gigerenzer's counter-response.

Thanks for the links!

Reading both of these papers sent me on a trawl through the literature. Kahneman's paper sparked it. He reported an experiment to test Gigerenzer's hypothesis that recasting the problem in terms of frequencies reduces or eliminates the prevalence of the conjunction fallacy (pg 586-7). The results, according to Kahneman, confirm his hypothesis that frequencies just cue people to think in terms of set relations and reject Gigerenzer's hypothesis that people are natural frequentists.

In an unpublished manuscript (that I did not read), Hertwig (1997) challenged Kahneman's interpretation of his experiment, arguing that the language Kahneman used led subjects to misinterpret "and" to be disjunctive - i.e., when Kahneman asked how many out of 1000 women fitting the Linda description are "feminists and active bank tellers," the subjects interpreted this as "how many are feminists and how many bank tellers." Hertwig ran an experiment to test this, confirming his hypothesis.

Then Hertwig, Kahneman, and Mellers wrote an "adversarial collaboration" where Mellers arbitrated the disagreement between Hertwig and Kahneman (pdf). They ran 3 experiments to test the different interpretations that Hertwig and Kahneman were giving the data. The experiments didn't resolve the disagreement completely, but both parties moved closer to the other's position.

Finally, Tentori, Bonini, and Osherson devised a way to directly test whether subjects were misinterpreting the term "and" while they gave their subjects Linda-like problems (pdf). They ran two treatments - one with the probability language and the other with the frequency language. In both treatments, the majority of subjects interpreted "and" conjunctively, and those who interpreted it conjunctively still fell prey to the conjunction fallacy. There was no difference between treatments on the interpretation, and of those who interpreted "and" conjunctively, there was a minor but statistically insignificant difference in the prevalence of conjunction errors between the frequency and probability treatments.

Mea culpa. We aren't natural frequentists. When the frequency language helps, it seems to be due to cuing the set inclusion relation, as Kahneman argued. I don't think the issue is completely settled - in particular I think the last paper doesn't quite get at the crux of the issue - but the evidence is pointing in the direction of Kahneman's hypothesis.

Pigeons are natural frequentists.

Good links. Kahneman's response is worth reading, especially if you find yourself somewhat persuaded by Gigerenzer's point of view.

Thanks for the link. I've edited the post to let people know about the controversy.

The pdf here is well worth reading! Thanks.

Thanks, Matt!

That's a nice educational post.

I want to pick a nit, not with you, but with Gigerenzer and " ... the conjunction fallacy can be mitigated by changing the wording of the question ... " Unfortunately, in real life, the problems come at you the way they do, and you need to learn to deal with it.

I say that rational thinking looks like this: pencil applied to paper. Or a spreadsheet or other decision support program in use. We can't do this stuff in our heads. At least I can't. Evolution didn't deliver arithmetic, much less rationality. We teach arithmetic to kids, slowly and painstakingly. We had better start teaching them rationality. Slowly and painstakingly, not like a 1-hour also-mentioned.

And, since I have my spreadsheet program open, I will indeed convert probabilities into frequencies and look at the world both ways, so my automatic processors can participate. But, I only trust the answers on the screen. My brain lies to me too often.

Once again, thanks Matt. Well done!

I thought that a major point of heuristics and biases program, at least for economics, was that they were systematic and in a sense "baked-in" as default. If these errors are artifacts of tweaks/wording then that really undermines hope of theoretical extension. The value of this kind of knowledge becomes lopsided towards marketers, magicians, and others trying to manipulate or trick people more effectively.

On the other hand, I think the idea of using the error data as clues as to neural architecture and functioning is great! It seems that neuroscience-clustered research is focused mostly bottom-up and rarely takes inspiration from the other direction.

This raises an interesting point. We *can* do arithmetic in our heads, some of us more spectacularly than others. Do you mean to say that there is no way to employ/train our brains to do rational thinking more effectively and intuitively?

I had always hoped that we could at least shape our intuition enough to give us a sense for situations where it would be better to calculate - though it's costly and slower. We do not always have our tools (although I guess in the future this is less and less likely).

"Do you mean to say that there is no way to employ/train our brains to do rational thinking more effectively and intuitively?"

I don't know whether RickJS meant to say that or not. But this blog post suggests to me a way forward: whenever confronted with questions about likelihood or probability, consciously step back and assess whether a frequentist analysis is possible. Use that approach if it is. If not, shift toward Bayesian views. But in either case, also ask: can I really compute this accurately, or is it too complex? Some things you can do well enough in your head, especially when perfect accuracy isn't necessary (or even possible). Some things you can't.

Maybe if you started kids in their junior year in high school, they might be pretty skilled at telling which was which (of the *four* possibilities inherent in what I outline above) by the end of their senior year.

Okay, this is silly, but I can't for the life of me figure out what that number and those systems of representation are.

You get points for being confused by fiction!

Base 10, binary, and roman numerals - in that order. (The number is 341)

EDIT: the base 10 number was wrong, it's 31.

But Roman numerals for 341 would be CCCXLI, surely? XXXI is 31.

...and this is why I should type numbers on a numpad. It's supposed to be 31. Thanks

The binary is also not 341.

Yep, it should be 31. Thanks. (see my comment to Alicorn's post above, and my comment below... sheesh, this is a lesson in double checking things)

Am I dense, or is the binary not 31? 31 in binary is 11111. What's in your post is not even binary coded decimal, as the leading zeros would seem to indicate, since there are two extra 3's. I feel like I'm missing something.

0011001100110001 in ASCII is "31".

Aha! here's the converter used the first time.

No, you aren't being dense. I used the first binary calculator I found on the net to change 31 into binary, and either I entered 31 into it incorrectly or it was just flat out wrong for some reason. Thanks again, and fixed again.

Three possibilities:

I feel a recursive loop coming on... YEEEAAAAARGH!

This post is very much in accordance with my experience. I've never been able to develop any non-frequentist intuitions about probability, and even simple problems sometimes confuse me until I translate them into explicit frequentist terms. However, once I have the reference classes clearly defined and sketched, I have no difficulty following complex arguments and solving reasonably hard problems in probability. (This includes the numerous supposed paradoxes that disappear as soon as the problem is stated in clear frequentist terms.)

Moreover, I'm still at a loss to understand what meaning the numerical values of probabilities could have except for the frequentist ratios that they imply. I raised the question in a recent discussion here, but I didn't get any satisfactory answers.

You watch someone flip a coin a hundred times. After a while, you get your frequentist sense of the probability that it will come up heads.

Then somebody takes a small, flat *square* piece of metal, writes "heads" on one side. Before flipping it, he asks you: "What's the chance it's going to come up 'heads' 100 times in a row?"

Would you say, "I have no idea?"

If you said, "Well, very unlikely, obviously", what makes it so obvious to you? What's your *degree of certainty* about each statement in your line of reasoning? And where did those degrees of certainty come from?

Sure, all sorts of past "reference classes" and analogous events might turn up as we discussed your reasoning. But the fact would still remain, most people, whether asked about that coin or asked about that small flat *square* piece of metal, will give you an answer that's pretty inaccurate if you ask them how likely it is that it will come up heads *five* times in a row, no matter whether you asked in frequentist terms or Bayesian terms.

When it comes to assessing the chance of a certain series of independent events, bias of some kind does seem to enter. This is probably (um, heh) because, although we might be fairly frequentist when it comes to notable and frequent events, we don't naturally note any given series of independent events as an *event in itself*. (For one thing, the sheer combinatorics prevent even encountering many such series.)

I wouldn't be surprised if the ultimate synthesis shows our brains are *effectively* frequentist (even almost optimally so) when it comes to the sorts of situations they evolved under, but also that these evolved optimizations break under conditions found in our increasingly artificial world. One does not find things much like coins in nature, nor much reason for people to use their computed fairness to resolve issues.

Haha, a similar argument could justify numerical "degrees of tastiness" for sandwiches.

Are you denying degrees of tastiness?!?

Here's my take on 'Linda'. Don't know if anyone else has made the same or nearly the same point, but anyway I'll try to be brief:

Let E be the background information about Linda, and imagine two scenarios:

Now obviously P(A | E) is greater than or equal to P(B | E).

However, I think it's quite reasonable for P(A | E + "someone told us A") to be less than P(B | E + "someone told us B"), because if someone merely tells us A, we don't have any particularly good reason to believe them, but if someone tells us B then it seems likely that they *know* this particular Linda, that they're thinking of the right person, and that they know she's a bank teller.

However, the 'frequentist version' of the Linda experiment cannot possibly be (mis?)-interpreted in this way, because we're *fixing* the statements A and B and considering a whole bunch of people who are obviously unrelated to the processes by which the statements were formed.

(Perhaps there's an analogous point to be made about your second example: Someone being tested at all is likely to be someone for whom there are independent reasons why they might have the disease (perhaps they exhibited some of the symptoms, got worried and went to see their doctor.)

But surely the experiment must have specified that the person being tested for the disease was picked at random from the population?)

AlephNeil:

I just skimmed through the 1983 Tversky & Kahneman paper, and the same thing occurred to me. Given the pragmatics of human natural language communication, I would say that T&K (and the people who have been subsequently citing them) are making too much of these cases. I'm not at all surprised that the rate of "fallacious" answers plummets when the question is asked in a way that suggests that it should be understood in an unnaturally literal way, free of pragmatics -- and I'd expect that even the remaining fallacious answers are mostly due to casual misunderstandings of the question, again caused by pragmatics (i.e. people casually misinterpreting "A" as "A & non-B" when it's contrasted with "A & B").

The other examples of the conjunction fallacy cited by T&K also don't sound very impressive to me when examined more closely. The US-USSR diplomatic break question sounds interesting until you realize that the probabilities actually assigned were so tiny that they can't be reasonably interpreted as anything but saying that the event is within the realm of the possible, but extremely unlikely. The increase due to the conjunction fallacy seems to me well within the noise -- I mean, what rational sense does it make to even talk about the numerical values of probabilities such as 0.0047 and 0.0014 with regard to a question like this one? The same holds for the other questions cited in the same section.

It strikes me that academics have a blind spot to one of the major weaknesses of this research because to get to their position they have had to adapt to exam questions effectively during their formative years.

One of the tricks to success in exams is to learn how to read the questions in a very particular way where you ignore all your background knowledge and focus to an unusual degree on the precise wording and context of the question. This is a terrible (and un-Bayesian) practice in most real world scenarios but is necessary to jump through the academic hoops required to get through school and university.

Most people who haven't trained themselves to deal with these types of questions will apply common sense and make 'unwarranted' assumptions. A wise strategy in the real world but not in exam-style questions.

*2 points [-]see this comment.

also:

This is covered in Tversky and Kahneman, 1983. Also in Conjunction Controversy.

*0 points [-]Matt_Simpson:

You mean the medical question? I'm not at all impressed with that one. This question and the subsequent one about its interpretation were worded in a way that takes a considerable mental effort to parse correctly, and is extremely unnatural for non-mathematicians, even highly educated ones like doctors. What I would guess happened was that the respondents skimmed the question without coming anywhere near the level of understanding that T&K assume in their interpretation of the results.

Again, when thinking about these experiments, one must imagine realistic people in a realistic setting, who are extremely unlikely to be willing to grapple with semantic subtleties that go beyond what they'll gather with effortless casual skimming of the questions and offhand answers given without much thought.

Do you mean this one (from Conjunction Controversy):

In what way is this question difficult for a doctor to parse? Give the subjects a little credit here.

Also note this about the Linda problem (also from Conjunction Controversy):

*3 points [-]Matt_Simpson:

Yes, that's the one.

In my experience, outside of some very exceptional situations like e.g. in-depth discussions of complex technical subjects, the overwhelming majority of people, including highly educated people, simply don't operate under the assumption that language should be understood in a logically precise and strict way. The standard human modus operandi is to skim the text in a very superficial manner and interpret it according to intuitive hunches and casual associations, guided by some strong preconceptions about what the writer is likely to be trying to say -- and it's up to you to structure your text so that it will be correctly understood with such an approach, or at least to give a clear and prominent warning that it should be read painstakingly in an unnaturally careful and literal way.

Some people are in the habit of always reading in a precise and literal way (stereotypically, mathematicians tend to be like that). I am also like that, and I'm sure many people here are too. But this is simply not the way the great majority of people function -- including many people whose work includes complex and precise formal reasoning, but who don't carry over this mode of thinking into the rest of their lives. In particular, from what I've seen, doctors typically don't practice rigid formal reasoning much, and it's definitely not reasonable to expect them to make such an effort in a situation where they lack any concrete incentive to do so.

I thought about this post for a while - partially because I've just been too busy for LW the past few days - and I'm still pretty skeptical. In general, I think you're right - people don't closely read much of anything, or interpret much literally. I've seen enough economic experiments to know that subjects rarely have even a basic grasp of the rules coming into the experiment, and only when the experiment begins and they start trying things out do they understand the environment we put them in.

However, in the heuristics and biases experiments what subjects are reading is only a couple of sentences long. In my experience, people tend to only skim when what they're reading is long or complicated. So I find it fairly hard to believe that most people aren't reading something like the Linda problem closely enough to understand it - especially undergraduates at high-end universities and trained doctors.

OTOH, I'm open to any evidence you have

*2 points [-]Matt_Simpson:

My thinking about this topic is strongly influenced by my experiences from situations where I was in charge of organizing something, but without any formal authority over the people involved, with things based on an honor system and voluntary enthusiasm. In such situations, when I send off an email with instructions, I often find it a non-trivial problem to word things in such a way that I'll have peace of mind that it will be properly understood by all recipients.

In my experience, even very smart people with a technical or scientific background who normally display great intelligence and precision of thought in the course of their work will often skim and misunderstand questions and instructions worded in a precise but unnatural way, unless they have an incentive to make the effort to read the message with extra care and accuracy (e.g. if it's coming from someone whose authority they fear). Maybe some bad experiences from the past have made me excessively cautious in this regard, but if I caught myself writing an email worded the same way as the doctors' question by T&K and directed at people who won't be inclined to treat it with special care -- no matter how smart, except perhaps if they're mathematicians -- I would definitely rewrite it before sending.

*2 points [-]Why would you think that subjects are working from a different state of information for the two possibilities in the Linda question? Here's the question again as the subjects read it:

After reading the question, the probabilities of (a) and (b) are evaluated - with the same state of information: the background knowledge (E in your terms), that someone told us (a) (A), and that someone told us (b) (A). So formally, the two probabilities are:

So the conjunction rule still holds. Now it's certainly *possible* that subjects are interpreting the question in the way you suggest (with different states of information for A and B), but it's also possible that they're interpreting it in any number of incorrect ways. They could think it's a thinly veiled question about how they feel about feminism, for example. So why do you think the possible interpretation you raise is plausible enough to be worrisome?

note: this comment was scrapped and rewritten immediately after it was posted

*1 point [-]Why would someone tell us "Linda is a bank teller and Linda is a bank teller and active in the feminist movement."? That would be indeed a strange sentence.

ETA: Maybe the parent comment can be formulated more clearly in the following way (using frequentist language): People parse the discussed question not as *what fraction of people from category E belong also into category A?*, but rather *what fraction of people telling us that a person (who certainly belongs to E) belongs also to A speak truth?*, or even better, *what fraction of individual statements of the described type is true?* Although B may be a proper subset of A, statements telling B about any particular Linda aren't a proper subset of statements telling A about her. Quite the contrary, they are disjoint. (That is, people tend to count frequencies of statements of a given precise formulation, i.e. don't count each occurrence of B as a simultaneous occurrence of A, even if B can be reanalysed as *A and C*. Of course, I am relying on my intuition in that and can be guilty of mind projection here.)

It is entirely possible to imagine that among real world statements about former environmental activists, the exact sentence "she is a bank teller" is less often true than the exact sentence "she is a bank teller and an active feminist". I am quite inclined to believe that more detailed information is more often true than less detailed one, since the former is more likely to be given by informed people, and this mechanism may have contributed to evolution of heuristics which produce the experimentally detected conjunction fallacy.

*9 points [-]When considering the initial probability question regarding Linda, it strikes me that it isn't really a choice between a single possibility and two conjoined possibilities.

Giving a person an exclusive choice between "bank teller" OR "bank teller and feminist" will lead people to infer that "bank teller" means "bank teller and not feminist".

So both choices are conjoined items, it's just that one of them is hidden.

Given this, people may not be so incorrect after all.

Edit: People should probably stop giving this post points, given Sniffnoy's linking of a complete destruction of this objection :)

This has already been addressed in Conjunction Controversy.

I'll stick with my upmod because, while Sniffnoy's link explains that the Linda experiment did take this ambiguity into account and played around it, it was an entirely reasonable point to raise after reading *this* post, which gives no indication that those two weren't the only options or that they weren't placed side-by-side.

*2 points [-]Agreed, I've always thought that the heuristics and biases research is less clear cut than is usually presented due to ambiguity in the question and experimental setup. People naturally read more into questions than is strictly implied because that is the normal way we deal with language. They may make not unreasonable assumptions that would normally be valid and are only ruled out by the artificial and unnatural constraints of the experiment.

For example, it has long struck me that the obvious explanation for hyperbolic discounting is people making quite reasonable assumptions about the probability of collecting the promised rewards and thus it is not good evidence for chronic time inconsistency in preferences. In looking up the Wikipedia reference for hyperbolic discounting I see that I am unsurprisingly not the first to notice this.

More evidence in favor of the hypothesis that we are natural frequentists: Even though I try to think like a Bayesian, I am mentally incapable of assigning probabilities without perfect information. The best I can do is offer a range of probabilities that seem reasonable, whereas a real Bayesian should be able to average the probabilities that it would assign given all possible sets of perfect information, weighted by likelihood.

One thing I do not like about research such as this is that they only report mean overconfidence values. How can you conclude from the mean that everyone is overconfident? Perhaps only half of the subjects are very overconfident while the other half are less overconfident.

Just give us all the data! It should be easy to visualize in a scatter plot, for example.

This is very interesting. Thanks!

When I heard about Bayesian and Frequentist, I thought Bayesianism made more intuitive sense because I was used to working with random variables. (It's the intuition of someone more used to chalkboards than lab coats.)

I wonder if people appear to be "natural frequentists" because we are better at thinking "how many" than "how likely." "How likely" is a prediction about the future; and it's easy to think that your wishes and hopes can influence the future.

Maybe what we're really bad at is internalizing the Ergodic Theorem -- understanding that if something happens a low percentage of the time, then it's unlikely to happen.

*2 points [-]One problem I have with Gigerenzer is that he often seems to be deliberately obtuse by taking philosophical positions that don't allow for the occurrence of errors.

For instance, time and time again he states that, because one-time events don't have probabilities (in a frequentist interpretation), it's incoherent to say someone's confidence judgment is "wrong". As Kahneman points out, this violates common sense - surely we should consider someone wrong if they say they are 99.99% confident that tomorrow the sun will turn into a chocolate cake.

In another example, Kahneman stresses the importance of doing between-subject experiments in addition to in-subject experiments, since on in-subject experiments subjects are given additional information (the difference between subsequent problems) that could affect the experiment. So, for instance, the Linda problem would be performed between-subject by having one random group assess the probability (or frequency) that Linda is a bank teller, and another random group that she is a bank teller and a feminist, and then comparing the two average probabilities. Gigerenzer's claim is that because, in these situations, no INDIVIDUAL subject displays internal inconsistency, this can't be seen as a bias.

I'm generally a big Gigerenzer fan, but this seems wrongheaded to me.

I'll go one further. Someone who frequently makes statements like these typically isn't reasoning at all, but simply rationalizing to justify some belief they have an emotional attachment to. I'd have to take anything else he says with a big grain of salt, because if he's capable of torturing logic this much on the obvious cases the ones that aren't obvious are probably even worse.

That's why when I try to explain Bayes' Theorem to people I draw it out as a Venn diagram *and* put a whole bunch of little stick figures inside.

It's Bayes' theorem, not Baye's theorem :).

*3 points [-]No, it's Bayes's theorem.

Or at least it should be; I'm reasonably confident that you only use *s'* for plural possessives; singular nouns that end in *s* have a possessive form ending in *s's*.

Personally, I always write "Bayes's". But since I was citing Wikipedia, I didn't want to muddy waters further by not using their convention.

I've had this claim go both ways. I had at least one history teacher in high school who strongly argued for this rule applying only to plural possessives. As with many rules, consistency matters more than anything else. I think that only applying it to plurals makes more sense (in that the distinction there actually helps you keep track of the number of the word) but it seems that nowadays it is more common for people to use the s' construction for any word ending in s.

Thanks, fixed. One day I'll internalize that ;)

Also, "You're calculator works great in binary" should be "Your calculator..."

Fixed, Thanks!

Nitpick: shouldn't the answer to the disease question be 1/50.95 (instead of 1/50)? One person has the disease, and 49.95 (5% of 999) are false positives. So there are 50.95 total positives.

Yeah, I rounded. Using Bayes' theorem, the probability is .0196 or so, so that gives .98/50.
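For anyone who wants to check the arithmetic in this exchange, here is a minimal Python sketch. It assumes what the nitpick implies: 1 person in 1,000 has the disease and the false-positive rate is 5%. It also assumes the test never misses a true case (`sensitivity=1.0`), which is not stated in this thread. The function name is made up for illustration.

```python
def posterior_given_positive(prevalence, false_positive_rate, sensitivity=1.0):
    """P(disease | positive test) via Bayes' theorem.

    sensitivity=1.0 is an assumption: every sick person tests positive.
    """
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)


# Probability format: plug the rates into Bayes' theorem.
p_bayes = posterior_given_positive(prevalence=1 / 1000, false_positive_rate=0.05)

# Frequency format: count people in a population of 1,000.
sick = 1
healthy_false_positives = 0.05 * 999  # 49.95 healthy people test positive
p_freq = sick / (sick + healthy_false_positives)  # 1 / 50.95

print(round(p_bayes, 4), round(p_freq, 4))
```

Both routes give roughly 0.0196, i.e. 1 in 50.95, which is why "1 in 50" is a fair rounding.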

*1 point [-]Somewhat relevant Op-Ed from the New York Times today on the limits of behavioural economics. The author, George Loewenstein, is considered a leading authority in the field.

I'm not sure if I buy that the "frequentist" explanations (as in the disease testing example) are best characterized by being frequentist -- it seems to me that they are just stating the problem and the data in a more relevant way to the question that's being asked. Without those extra statements, you have to decode the information down from a more abstract level.

*0 points [-]Upvoted; however, I thought the post could have been considerably shortened, with the last 13 or so paragraphs combined into just one or two.

*0 points [-]13!?

It has a bit of extra length because I made sure to explain some more elementary concepts for the benefit of newer LW users who aren't familiar with the sequences.

Matt,

I see how Gigerenzer's point is relevant to some of the biases such as the conjunction fallacy.

But what about other biases such as the anchoring bias?

Is there really a way to show that all fallacious reasoning in K&T's experiments is due to presentation of information in terms of probabilities as opposed to frequencies?

Thanks.

Anchoring is a phenomenon that occurs in more places than just estimating probabilities and thus it seems to be a pretty common method of approximation used by our brains. This is one of the reasons why I argued (Gigerenzer doesn't argue this) that we only use heuristics when probabilities are in a difficult-to-use form, but when they're in frequencies we just compute the answer.

The experiment you would run seems to be straightforward, as long as you're just considering anchoring for probability estimates. Just find a previously run experiment, replicate it (as a control) and run another treatment changing the language to frequencies. Someone may have already run this experiment, in fact.

*1 point [-]Well, to clarify, here's an example from here :

Here, the biased thinking isn't a result of thinking in terms of abstract probabilities as opposed to concrete frequencies.

I'm sympathetic to the points G makes. It's just that K&T's results don't always depend on information presented as probabilities.

*3 points [-]I must wonder whether, and to what extent, these results would replicate in a real-world situation where the question is perceived as truly important by the parties concerned.

When discussing research like this, people often imagine the subjects fully applying themselves, as if they were on an important exam or in a business situation where big money is involved. However, to get a more realistic picture, you should imagine yourself in a situation where someone is asking you obscure TV quiz-style questions about things that you don't care about in the slightest, bored to death, being there only because of some miserable incentive like getting a course credit or a few dollars of pocket money. I can easily imagine people in such a situation giving casual answers without any actual thought involved, based on random clues from the environment -- just like you might use e.g. today's date as an inspiration for choosing lottery numbers.

Therefore, the important question is: has anyone made similar observations in a situation where the subjects had a strong incentive to really give their best when thinking about the answers? If not, I think one should view these results with a strong dose of skepticism.

Your idea that the subjects are not taking the question seriously is a good one.

I had a discussion with someone about a very similar real life 'Linda'. It was finally resolved by realizing that the other person didn't think of 'and' and 'or' as defined terms that always differed and was quite put out that I thought he should know that. To put it in 'Linda' terms: he knew that Linda was a feminist and doubted that she was a teller. This being the case the 'and' should be thought of as an 'or' and b was more likely than a. Why would anyone think differently? It kind of blew my mind that I was being accused of being sloppy or illogical by using the fixed defined meaning for 'and' and 'or'. I have since that time noticed that people actually often have this vagueness about logical terms.

*2 points [-]I generally think of "and" and "or" in the strict senses, but, by the same token, I get really annoyed when I use the word "or" (which, in English, is ambiguous about whether it is meant in the exclusive or inclusive sense) and people say "yes" or "true".

English already *has* words like "both" to answer in that question, which tells you in one syllable that the "exclusive or" reading is false but the "inclusive or" reading is true. This is not a standard part of symbolic logic curriculum, and is simply *helpful* rather than a sign of having taken such a class and learned a technical jargon that borrowed the word "or" to strictly mean "inclusive or".

I'd never heard of someone generously interpreting an "and" as an "or" (or vice versa) but it makes sense to me that it would be common and helpful in the absence of assumed exposure to a technical curriculum with truth tables and quantified predicate logic and such (at least when a friendly discussion was happening, instead of a debate).

People actually do that when not trying to be annoying? That's surprising.

*0 points [-]Yeah, I would say "both" or "yes, it could be either" depending on what I meant. I also use "and/or" whenever I mean the inclusive or, though that's frowned on in formal writing.

That suggests another variant of the Linda problem: replace the "and" with "and also", and leave the rest unchanged. If this makes a big difference, it would suggest that many of the people who fail on the Linda problem fail for linguistic reasons (they have the wrong meaning for the word "and") rather than logical reasons.

*4 points [-]Many subjects fail to recognize that when a 6-sided die with 4 green faces and 2 red faces will be rolled several times, betting on the occurrence of the sequence GRRRRRG is dominated by betting on the sequence RRRRRG, when the subject is given the option to bet on either at the same payoff. This (well, something similar, I didn't bother to look up the actual sequences used) is cited as evidence that more is going on than subjects misunderstanding the meaning of "and" or "or". Sure, some subjects just don't use those words as the experimenters do, and perhaps this accounts for *some* of why "Linda" shows such a strong effect, but it is a very incomplete explanation of the effect.

Explanations of "Linda" based on linguistic misunderstandings, conversational maxims, etc., generally fail to explain other experiments that produce the same representativeness bias (though perhaps not as strongly) in contexts where there is no chance that the particular misunderstanding alleged could be present.
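The dominance claim about the die can be checked with a short, exact calculation. The sketch below uses the two sequences exactly as quoted in the comment (which, as the commenter says, may not be the ones used in the real experiment) and assumes 20 rolls, since the comment only says "several". The function name is hypothetical.

```python
def prob_sequence_appears(pattern, n_rolls, p_green=4 / 6):
    """Exact probability that `pattern` occurs as a contiguous run in n_rolls rolls.

    Tracks the distribution over the most recent len(pattern) - 1 faces
    for all roll histories that have not yet contained the pattern.
    """
    face_probs = {"G": p_green, "R": 1.0 - p_green}
    keep = len(pattern) - 1
    states = {"": 1.0}  # recent-history suffix -> probability (pattern not yet seen)
    hit = 0.0           # probability mass of histories that contain the pattern
    for _ in range(n_rolls):
        next_states = {}
        for suffix, p in states.items():
            for face, pf in face_probs.items():
                history = suffix + face
                if history.endswith(pattern):
                    hit += p * pf
                else:
                    key = history[-keep:] if keep else ""
                    next_states[key] = next_states.get(key, 0.0) + p * pf
        states = next_states
    return hit


N_ROLLS = 20  # assumed; the comment does not give a number
p_short = prob_sequence_appears("RRRRRG", N_ROLLS)
p_long = prob_sequence_appears("GRRRRRG", N_ROLLS)
print(p_short, p_long)
```

Every run of rolls containing GRRRRRG also contains RRRRRG, so the shorter sequence can never be the worse bet, whatever number of rolls is chosen.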

Good idea. At the next such situation, I'll try that. Hopefully it will not be soon but you never know.

*0 points [-]I would not say that this person replaced "and" by "or".

I guess they considered the statement "Linda is a bank teller and a feminist" to be "50%" true if Linda turns out to be a feminist but not a bank teller.

The formula used would be something like P(AB)=1/2*(P(A)+P(B))
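To see why a rule like this would reproduce the Linda result, here is a small sketch comparing it with the bound that the conjunction rule imposes. The two input probabilities are made-up numbers for illustration only, and the function names are hypothetical.

```python
def averaged_conjunction(p_a, p_b):
    """The 'half credit' rule suggested above: P(AB) = 1/2 * (P(A) + P(B))."""
    return 0.5 * (p_a + p_b)


def conjunction_upper_bound(p_a, p_b):
    """Any coherent P(A and B) is at most the smaller of P(A) and P(B)."""
    return min(p_a, p_b)


p_teller = 0.05    # hypothetical P(Linda is a bank teller)
p_feminist = 0.90  # hypothetical P(Linda is a feminist)

avg = averaged_conjunction(p_teller, p_feminist)
bound = conjunction_upper_bound(p_teller, p_feminist)
print(avg, bound)
```

Under the averaging rule the "conjunction" comes out larger than P(A) whenever P(B) is larger than P(A), which is exactly the pattern of answers in the Linda problem.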

You should read Conjunction Controversy (Or, How They Nail It Down) before proposing these sort of things.

In particular, if you haven't already, please read Extensional Versus Intuitive Reasoning: The Conjunction Fallacy in Probability Judgment in full - it contains details of 22 different experiments designed to address problems like this.

*2 points [-]These articles talk only about the conjunction fallacy. Maybe it wasn't clear enough from the context, but my above reply was to a comment about the anchoring bias, and was meant to comment on that specific finding.

But in any case, I have no doubt that these results are reproducible in the lab. What I'm interested in is how much of these patterns we can see in the real world and where exactly they tend to manifest themselves. Surely you will agree that findings about the behavior of captive undergraduates and other usual sorts of lab subjects should be generalized to human life in general only with some caution.

Moreover, if clear patterns of bias are found to occur in highly artificial experimental setups, it still doesn't mean that they are actually relevant in real-life situations. What I'd like to see are not endless lab replications of these findings, but instead examples of relevant real-life decisions where these particular biases have been identified.

Given these considerations, I think that article by Eliezer Yudkowsky shows a bit more enthusiasm for these results than is actually warranted.

I believe this has been discussed in the context of the Efficient Market Hypothesis. I view it as something akin to the feud between Islam and Christianity.

mattnewport:

I'm unable to grasp the analogy -- could you elaborate on that?

*2 points [-]Two schools of economics / religion (behavioural / neoclassical, Islam / Christianity) with many shared assumptions (similar holy texts) that have attracted followers due to offering common sense advice and a solid framework of practical value but pursue an ongoing holy war over certain doctrinal issues that are equally flawed and ungrounded in reality.

Or: what they agree on is largely wrong but has some instrumentally useful elements. What they disagree on is largely irrelevant. The priesthood considers the differences very significant but most people ignore everything but the useful bits and get on with their lives.

*2 points [-]But that example *is* probabilities. Here's how I would redesign the experiment to make the subjects think in frequencies: