Conjunction Controversy (Or, How They Nail It Down)

Eliezer Yudkowsky

20th Sep 2007

Followup to: Conjunction Fallacy

When a single experiment seems to show that subjects are guilty of some horrifying sinful bias - such as thinking that the proposition "Bill is an accountant who plays jazz" has a higher probability than "Bill is an accountant" - people may try to dismiss (not defy) the experimental data. Most commonly, by questioning whether the subjects interpreted the experimental instructions in some unexpected fashion - perhaps they misunderstood what you meant by "more probable".

Experiments are not beyond questioning; on the other hand, there should always exist some mountain of evidence which suffices to convince you. It's not impossible for researchers to make mistakes. It's also not impossible for experimental subjects to be really genuinely and truly biased. It happens. On both sides, it happens. We're all only human here.

If you think to extend a hand of charity toward experimental subjects, casting them in a better light, you should also consider thinking charitably of scientists. They're not stupid, you know. If you can see an alternative interpretation, they can see it too. This is especially important to keep in mind when you read about a bias and one or two illustrative experiments in a blog post. Yes, if the few experiments you saw were all the evidence, then indeed you might wonder. But you might also wonder if you're seeing all the evidence that supports the standard interpretation. Especially if the experiments have dates on them like "1982" and are prefaced with adjectives like "famous" or "classic".

So! This is a long post. It is a long post because nailing down a theory requires more experiments than the one or two vivid illustrations needed to merely explain. I am going to cite maybe one in twenty of the experiments that I've read about, which is maybe a hundredth of what's out there. For more information, see Tversky and Kahneman (1983) or Kahneman and Frederick (2002), both available online, from which this post is primarily drawn.

Here is (probably) the single most questioned experiment in the literature of heuristics and biases, which I reproduce here exactly as it appears in Tversky and Kahneman (1982):

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

Please rank the following statements by their probability, using 1 for the most probable and 8 for the least probable:

(5.2) Linda is a teacher in elementary school.
(3.3) Linda works in a bookstore and takes Yoga classes.
(2.1) Linda is active in the feminist movement. (F)
(3.1) Linda is a psychiatric social worker.
(5.4) Linda is a member of the League of Women Voters.
(6.2) Linda is a bank teller. (T)
(6.4) Linda is an insurance salesperson.
(4.1) Linda is a bank teller and is active in the feminist movement. (T & F)

(The numbers at the start of each line are the mean ranks of each proposition, lower being more probable.)

How do you know that subjects did not interpret "Linda is a bank teller" to mean "Linda is a bank teller and is not active in the feminist movement"? For one thing, dear readers, I offer the observation that most bank tellers, even the ones who participated in anti-nuclear demonstrations in college, are probably not active in the feminist movement. So, even so, Teller should rank above Teller & Feminist. You should be skeptical of your own objections, too; else it is disconfirmation bias. But the researchers did not stop with this observation; instead, in Tversky and Kahneman (1983), they created a between-subjects experiment in which either the conjunction or the two conjuncts were deleted. Thus, in the between-subjects version of the experiment, each subject saw either (T&F), or (T), but not both. With a total of five propositions ranked, the mean rank of (T&F) was 3.3 and the mean rank of (T) was 4.4, N=86. Thus, the fallacy is not due solely to interpreting "Linda is a bank teller" to mean "Linda is a bank teller and not active in the feminist movement."
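The arithmetic behind this argument can be sketched in a few lines. The numbers below are made up for illustration (the values of `p_teller` and `p_feminist_given_teller` are assumptions, not figures from Tversky and Kahneman); the point is only that the conjunction can never exceed its conjunct, and that even under the "bank teller and not a feminist" reading, Teller still outranks Teller & Feminist whenever fewer than half of the relevant bank tellers are feminists.

```python
def conjunction_probs(p_teller, p_feminist_given_teller):
    """Return (P(T & F), P(T & not-F)) given P(T) and P(F | T)."""
    p_t_and_f = p_teller * p_feminist_given_teller
    p_t_and_not_f = p_teller * (1.0 - p_feminist_given_teller)
    return p_t_and_f, p_t_and_not_f


# Hypothetical inputs, chosen only for illustration.
p_teller = 0.02                 # assumed P(Linda is a bank teller)
p_feminist_given_teller = 0.30  # assumed P(feminist | bank teller, Linda's description)

p_tf, p_t_not_f = conjunction_probs(p_teller, p_feminist_given_teller)

# Conjunction rule: P(T & F) can never exceed P(T).
print("P(T & F) <= P(T):", p_tf <= p_teller)

# Exclusion reading: "teller" taken to mean "teller and not feminist".
# It still beats the conjunction as long as P(F | T) < 0.5.
print("P(T & not-F) > P(T & F):", p_t_not_f > p_tf)
```

Raising `p_feminist_given_teller` above 0.5 flips the second comparison but never the first, which is why the between-subjects design was still needed to rule out the exclusion reading.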

Similarly, the experiment discussed yesterday used a between-subjects design (where each subject only saw one statement) to elicit lower probabilities for "A complete suspension of diplomatic relations between the USA and the Soviet Union, sometime in 1983" versus "A Russian invasion of Poland, and a complete suspension of diplomatic relations between the USA and the Soviet Union, sometime in 1983".

Another way of knowing whether subjects have misinterpreted an experiment is to ask the subjects directly. Also in Tversky and Kahneman (1983), a total of 103 medical internists (including 37 internists taking a postgraduate course at Harvard, and 66 internists with admitting privileges at New England Medical Center) were given problems like the following:

A 55-year-old woman had pulmonary embolism documented angiographically 10 days after a cholecystectomy. Please rank order the following in terms of the probability that they will be among the conditions experienced by the patient (use 1 for the most likely and 6 for the least likely). Naturally, the patient could experience more than one of these conditions.

Dyspnea and hemiparesis

Calf pain

Pleuritic chest pain

Syncope and tachycardia

Hemiparesis

Hemoptysis

As Tversky and Kahneman note, "The symptoms listed for each problem included one, denoted B, that was judged by our consulting physicians to be nonrepresentative of the patient's condition, and the conjunction of B with another highly representative symptom denoted A. In the above example of pulmonary embolism (blood clots in the lung), dyspnea (shortness of breath) is a typical symptom, whereas hemiparesis (partial paralysis) is very atypical."

In indirect tests, the mean ranks of A&B and B respectively were 2.8 and 4.3; in direct tests, they were 2.7 and 4.6. In direct tests, subjects ranked A&B above B between 73% and 100% of the time, with an average of 91%.

The experiment was designed to eliminate, in four ways, the possibility that subjects were interpreting B to mean "only B (and not A)". First, by carefully wording the instructions: "...the probability that they will be among the conditions experienced by the patient", plus an explicit reminder, "the patient could experience more than one of these conditions". Second, by including indirect tests as a comparison. Third, the researchers afterward administered a questionnaire:

In assessing the probability that the patient described has a particular symptom X, did you assume that (check one):
X is the only symptom experienced by the patient?
X is among the symptoms experienced by the patient?

60 of 62 physicians, asked this question, checked the second answer.

Fourth and finally, as Tversky and Kahneman write, "An additional group of 24 physicians, mostly residents at Stanford Hospital, participated in a group discussion in which they were confronted with their conjunction fallacies in the same questionnaire. The respondents did not defend their answers, although some references were made to 'the nature of clinical experience.' Most participants appeared surprised and dismayed to have made an elementary error of reasoning."

A further experiment is also discussed in Tversky and Kahneman (1983) in which 93 subjects rated the probability that Bjorn Borg, a strong tennis player, would in the Wimbledon finals "win the match", "lose the first set", "lose the first set but win the match", and "win the first set but lose the match". The conjunction fallacy was expressed: "lose the first set but win the match" was ranked more probable than "lose the first set". Subjects were also asked to verify whether various strings of wins and losses would count as an extensional example of each case, and indeed, subjects were interpreting the cases as conjuncts which were satisfied iff both constituents were satisfied, and not interpreting them as material implications, conditional statements, or disjunctions; also, constituent B was not interpreted to exclude constituent A. The genius of this experiment was that researchers could directly test what subjects thought was the meaning of each proposition, ruling out a very large class of misunderstandings.

Does the conjunction fallacy arise because subjects misinterpret what is meant by "probability"? This can be excluded by offering students bets with payoffs. In addition to the colored dice discussed yesterday, subjects have been asked which possibility they would prefer to bet $10 on in the classic Linda experiment. This did reduce the incidence of the conjunction fallacy, but only to 56% (N=60), which is still more than half the students.
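Why the bet removes any wiggle room about the word "probability" can be shown by brute force. The sketch below assumes a simplified payoff rule (you win the $10 stake if the statement you bet on turns out true, and nothing otherwise); the function name `payoff` and the two-variable "world" encoding are illustrative choices, not part of the original study. It enumerates every possible world and checks that betting on (T) never pays less than betting on (T & F).

```python
from itertools import product

STAKE = 10  # dollars won if the chosen statement is true (assumed payoff rule)


def payoff(bet, teller, feminist):
    """Payoff of a bet on 'T' or 'T&F' in a world where Linda is/isn't each thing."""
    if bet == "T":
        return STAKE if teller else 0
    if bet == "T&F":
        return STAKE if (teller and feminist) else 0
    raise ValueError(f"unknown bet: {bet}")


# All four possible worlds: (is a bank teller?, is a feminist?)
worlds = list(product([False, True], repeat=2))

# Betting on T is never worse than betting on T&F...
dominates = all(payoff("T", t, f) >= payoff("T&F", t, f) for t, f in worlds)

# ...and is strictly better in the world where Linda is a non-feminist teller.
strictly_better_somewhere = any(
    payoff("T", t, f) > payoff("T&F", t, f) for t, f in worlds
)

print(dominates, strictly_better_somewhere)
```

Because the dominance holds in every world, no assignment of probabilities to the worlds can make the conjunction the better bet.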

But the ultimate proof of the conjunction fallacy is also the most elegant. In the conventional interpretation of the Linda experiment, subjects substitute judgment of representativeness for judgment of probability: Their feelings of similarity between each of the propositions and Linda's description determine how plausible it feels that each of the propositions is true of Linda. If this central theory is true, then the way in which the conjunction fallacy follows is obvious - Linda more closely resembles a feminist than a feminist bank teller, and more closely resembles a feminist bank teller than a bank teller. Well, that is our theory about what goes on in the experimental subjects' minds, but how could we possibly know? We can't look inside their neural circuits - not yet! So how would you construct an experiment to directly test the standard model of the Linda experiment?

Very easily. You just take another group of experimental subjects, and ask them how much each of the propositions "resembles" Linda. This was done - see Kahneman and Frederick (2002) - and the correlation between representativeness and probability was nearly perfect. 0.99, in fact. Here's the (rather redundant) graph:

[Figure: Linda correlation graph]

This has been replicated for numerous other experiments. For example, in the medical experiment described above, an independent group of 32 physicians from Stanford University was asked to rank each list of symptoms "by the degree to which they are representative of the clinical condition of the patient". The correlation between probability rank and representativeness rank exceeded 0.95 on each of the five tested medical problems.
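For readers who want to see what such a check involves, here is a minimal sketch of the computation. The probability ranks are the mean ranks quoted in the Linda list above; the resemblance ranks are invented purely for illustration and are NOT data from Kahneman and Frederick (2002). The helper `pearson` is a plain-Python implementation of the Pearson correlation coefficient, written out so that nothing beyond the standard library is assumed.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Mean probability ranks from the Linda list above (lower = more probable).
probability_rank = [5.2, 3.3, 2.1, 3.1, 5.4, 6.2, 6.4, 4.1]

# HYPOTHETICAL mean resemblance ranks from a second group of subjects.
# These values are made up; they only show how the comparison is run.
resemblance_rank = [5.0, 3.5, 2.0, 3.0, 5.5, 6.5, 6.3, 4.0]

r = pearson(probability_rank, resemblance_rank)
print(f"correlation between probability and resemblance ranks: {r:.3f}")
```

Since the second list was constructed to track the first, the high value it produces demonstrates the calculation and nothing more; the evidential weight lies entirely in the published data, where the two sets of ranks came from independent groups of subjects.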

Now, a correlation near 1 does not prove that subjects are substituting judgment of representativeness for judgment of probability. But if you want to claim that subjects are doing something else, I would like to hear the explanation for why the correlation comes out so close to 1. It will really take quite a complicated story to explain, not just why the subjects have an elaborate misunderstanding that produces an innocent and blameless conjunction fallacy, but also how it comes out to a completely coincidental correlation of nearly 1 with subjects' feeling of similarity. Across multiple experimental designs.

And we all know what happens to the probability of complicated stories: They go down when you add details to them.

Really, you know, sometimes people just make mistakes. And I'm not talking about the researchers here.

The conjunction fallacy is probably the single most questioned bias ever introduced, which means that it now ranks among the best replicated. The conventional interpretation has been nearly absolutely nailed down. Questioning, in science, calls forth answers.

I emphasize this, because it seems that when I talk about biases (especially to audiences not previously familiar with the field), a lot of people want to be charitable to experimental subjects. But it is not only experimental subjects who deserve charity. Scientists can also be unstupid. Someone else has already thought of your alternative interpretation. Someone else has already devised an experiment to test it. Maybe more than one. Maybe more than twenty.

A blank map is not a blank territory; if you don't know whether someone has tested it, that doesn't mean no one has tested it. This is not a hunter-gatherer tribe of two hundred people, where if you do not know a thing, then probably no one in your tribe knows. There are six billion people in the world, and no one can say with certitude that science does not know a thing; there is too much science. Absence of such evidence is only extremely weak evidence of absence. So do not mistake your ignorance of whether an alternative interpretation has been tested, for the positive knowledge that no one has tested it. Be charitable to scientists too. Do not say, "I bet what really happened was X", but ask, "Which experiments discriminated between the standard interpretation and X?"

If it seems that I am driving this point home with a sledgehammer, well, yes, I guess I am. It does become a little frustrating, sometimes - to know about this overwhelming mountain of evidence from thousands of experiments, but other people have no clue that it exists. After all, if there are other experiments supporting the result, why haven't they heard of them? It's a small tribe, after all; surely they would have heard. By the same token, I have to make a conscious effort to remember that other people don't know about the evidence, and they aren't deliberately ignoring it in order to annoy me. Which is why it gets a little frustrating sometimes! We just aren't built for worlds of 6 billion people.

I'm not saying, of course, that people should stop asking questions. If you stop asking questions, you'll never find out about the mountains of experimental evidence. Faith is not understanding, only belief in a password. It is futile to believe in something, however fervently, when you don't really know what you're supposed to believe in. So I'm not saying that you should take it all on faith. I'm not saying to shut up. I'm not trying to make you feel guilty for asking questions.

I'm just saying, you should suspect the existence of other evidence, when a brief account of accepted science raises further questions in your mind. Not believe in that unseen evidence, just suspect its existence. The more so if it is a classic experiment with a standard interpretation. Ask a little more gently. Put less confidence in your brilliant new alternative hypothesis. Extend some charity to the researchers, too.

And above all, talk like a pirate. Arr!

Kahneman, D. and Frederick, S. 2002. Representativeness revisited: Attribute substitution in intuitive judgment. Pp 49-81 in Gilovich, T., Griffin, D. and Kahneman, D., eds. Heuristics and Biases: The Psychology of Intuitive Judgment. Cambridge University Press, Cambridge.

Tversky, A. and Kahneman, D. 1982. Judgments of and by representativeness. Pp 84-98 in Kahneman, D., Slovic, P., and Tversky, A., eds. Judgment under uncertainty: Heuristics and biases. New York: Cambridge University Press.

Tversky, A. and Kahneman, D. 1983. Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90: 293-315.

25 comments

Eliezer Yudkowsky, 17y

Science is smarter than scientists. That's why if an experiment is both important, and still accepted after 20 years, you should suspect there's followup experiments shoring it up. (If not, there's something wrong, bigtime, with that whole field.)

Felix2, 17y

Arrrr. Shiver me timbers. I shore be curious what the rank be of "Linda is active in the feminist movement and is a bank teller" would be, seeing as how its meanin' is so far diff'rent from the larboard one aloft.

A tip 'o the cap to the swabbies what found a more accurate definition of "probability" (I be meanin' "representation".) than what logicians assert the meaning o' "probability" be. Does that mean, at a score of one to zero, all psychologists are better lexicographers than all logicians?

Raemon, 12y

I read this comment, predicted the day it was posted, then looked up at the date. I was off by one.

J_Thomas, 17y

....people may try to dismiss (not defy) the experimental data. Most commonly, by questioning whether the subjects interpreted the experimental instructions in some unexpected fashion - perhaps they misunderstood what you meant by "more probable".

Which in fact turned out to be the case.

This was done - see Kahneman and Frederick (2002) - and the correlation between representativeness and probability was nearly perfect. 0.99, in fact.

So there's no reason to look for other interpretations about what people meant by "more probable". Anything else they might mean will correlate 0.99 with this, operationally it will be almost the same thing.

So this is what the public means by "more probable". And it's often what people mean in practice by "more probable" even when they've had training in probability theory and statistics.

"An additional group of 24 physicians, mostly residents at Stanford Hospital, participated in a group discussion in which they were confronted with their conjunction fallacies in the same questionnaire. The respondents did not defend their answers, although some references were made to 'the nature of clinical experience.' Most participants appeared surprised and dismayed to have made an elementary error of reasoning."

They interpreted the question the standard way, and then later they remembered they were supposed to use probability.

Probability theory is still new. Most of it is newer than calculus. People consistently make gambling mistakes like the Monty Hall problem. The general belief is that you have to be real smart to get those things right. Just like you have to be real smart to learn calculus.

Is the word "representativeness" standard jargon? It's such an ugly word, but if it's well-established we can't replace it with a better one.

So. People interpret "more probable" as "less surprising". And most of the population does it, enough that this can be exploited reliably. This is potentially a very very profitable discussion.

Senthil, 17y

The experiment was bad and felt that way since I first came across the same. But I didn't have any idea that it's the single most questioned experiment or something like that. Recently, I skimmed a book called 'Gut feelings' by Gerd something (not sure about the second name) at the bookshop. There was a blurb by Steven Pinker saying that the book was good. A chapter had a good description of where this experiment is mistaken and said how reframing it in different words made people give the correct answer. What Kahneman says here is a fallacy which people may encounter in some circumstances. That point can be taken but we need to imagine a good experiment which reflects the point well.

Bob3, 17y

Interesting that Senthil brings up this book (http://www.amazon.com/Gut-Feelings-Intelligence-Gerd-Gigerenzer/dp/0670038636) because Eliezer's recent posts have gotten me thinking whether there are any good rules for when to trust/doubt intuition. Much of the discussion about bias suggests we should question, and often reject, our instincts but at some point this hits diminishing returns and in others (e.g., the planning fallacy) could be a mistake. If Eliezer or any other contributors has thoughts about good rules of thumb for when it's reasonable (safe? better?) to use rules of thumb, I would be eager to hear them.

J_Thomas, 17y

The experiment was bad and felt that way since I first came across the same.

What's bad about it? It looks like it gets reproducible results.

It looks to me like people often read "probable" to mean "plausible".

And we know they do that in actual betting situations like the Monty Hall problem.

People can be trained to think in terms of probability, but the general culture trains them to think otherwise. Perhaps it's our evolutionary background, or perhaps it's our culture. But somehow the culture trains people to be stupid in this particular way.

So even if you aren't susceptible to that yourself, you need to deal with it carefully. On the defensive side, when you try to persuade people, don't ever depend on probability arguments that don't sound plausible because people won't believe them.

And on the offensive side, you can carefully use this generalised stupidity to exploit people, if you choose to.

Senthil, 17y

What's bad about it? It looks like it gets reproducible results.

Thomas, I'm not sure why it's bad. That's the problem. I was unable to put my finger on it. I don't remember what I answered. But even after the explanation was given, I felt that it wasn't quite convincing that this would be a problem. It sure is getting reproducible results. But what may be causing the result may not be the bias.

I think you've answered why it's bad when you said that it's because of our culture, and particularly the way we use the word 'probable' to mean something other than what the dictionary says it should. If that's the case, it doesn't throw any light on the conjunction fallacy per se. Even if the problem is framed differently, people should fall for the fallacy. But if you're able to check the section in 'Gut Feelings', you'll see that most people would answer it perfectly well. There would be no fallacy involved.

Thanks, Bob, for the Amazon link.

Eliezer Yudkowsky, 17y

Senthil, Gigerenzer is one of the primary critics of this experiment, but my understanding is that in the larger field his critiques are widely considered to have been refuted by further experiments.

More than half the subjects still committed the conjunction fallacy when they were asked to bet. If people are betting on similarities instead of probabilities, if doctors are treating similarities instead of probabilities, that is not a "misunderstanding" that explains away the experimental result. It IS the representativeness heuristic and conjunction fallacy!

Also, the conjunction fallacy has been replicated in many different formats besides the Linda experiment, as discussed today and yesterday. Why are people just ignoring this? Do they feel that if they come up with some arguable critique of the Linda experiment, that excuses them from any further work? That they don't have to explain all the other experiments?

I'm starting to get that feeling of frustration again. It doesn't excuse the subjects if they "misinterpreted" the experimental instructions, because they are misinterpreting real life the same way. More than half of them bet on the conjunction fallacy. Understanding exactly how someone makes a mistake does not mean it is not a mistake. They still lose the bet. The patient still dies. Am I making sense here?

Ed2, 17y

I certainly don't have any problem with the experiments and as important as the conjunction bias is, I think it's just as important to ask and assess why this bias exists. Once we can nail that down, it becomes easier to teach yourself and others how not to fall into that bias trap.

So the sympathy towards the subjects is part of the explanation. Same with the comments discussing the use of language and framing of the questions.

My opinion on this is that the reason the poor reasoning occurs is simply because we are comforted by one of the fitted answers sitting in conjunction with one of the unfit answers rather than an unfit answer by itself.

There may not be an easy way to teach this to a layman or ourselves so we see the correct reasoning easily but it's a start.

Keith_Elis, 17y

Do people commit the conjunction fallacy even after having been warned of the conjunction fallacy?

Eliezer Yudkowsky, 17y

Keith, of course they do. The smart ones won't commit it deliberately.

I'm sure I do it accidentally. It's hard to debias this one.

Senthil, 17y

Eliezer, thanks for the explanation. I'm sorry that you're getting frustrated to explain this again. I agree with you and understand what you're trying to explain. It makes perfect sense. But it's difficult to make the explanation clear and easy for lay people. Also, maybe I didn't make myself clear in the above post.

I was referring only to the particular experiment. I'm not at all denying that the fallacy exists. I meant that the fallacy doesn't exist in the context of that experiment alone. I just felt that there could be a better thought out experiment to demonstrate it.

I can compare this with reading a good detective story and a bad one. A good one is where you were shown the evidence and you could've predicted the murderer but didn't. A bad one introduces a character relatively late in the story or makes one who wasn't talked about much the murderer. I feel the experiments demonstrating the fallacy to be similar to the latter type of stories, kind of contrived and unnatural.

J_Thomas, 17y

What would be the opposite mistake?

I came close to committing it. When I guessed the order I ignored the description almost completely. I first estimated how many school teachers there were compared to bank tellers. I thought there were more school teachers. And how many psychiatric social workers? More bank tellers. Lots of insurance salesmen, more than bank tellers. And so on. I pretty much ignored Linda's description and just looked at my guesses about the numbers of people. Linda could have fallen into any of those categories entirely apart from those little scraps of information about her.

I did it wrong -- I didn't consider that more women than men tend to be schoolteachers, feminists, and members of the League of Women Voters and bank tellers, but not insurance salesmen. But my guesses about how many there were of each type were probably way off anyway.

I remembered from a college summer job -- when people know a few random facts about somebody else they tend to put too much emphasis on those facts. Like, if you are asked to guess whether somebody is going to be a suicide bomber and what's important is being right, then the answer is almost always no. Hardly any Arabs are suicide bombers. Hardly any Wahabi Arabs are suicide bombers. Hardly any young male Wahabi Arabs whose girlfriends have jilted them are suicide bombers. The way to bet is almost always no.

But by completely ignoring the information about Linda instead of ignoring the (guessed at) statistics about numbers of each category, haven't I made the opposite mistake? There ought to be some best amount of weight to give that information. I assumed none of it was worth anything, and actually I even ignored that Linda was a woman though I obviously shouldn't have. To do it right, wouldn't you know the actual relative numbers of each category (instead of guessing) and also know how much weight to put on the individual information about Linda?

If you knew everything you could answer the question without bias.

bigjeff5, 13y

The key mistake was not the probability numbers (though that certainly could be a mistake in real life), it was ranking bank-teller/feminist higher than bank-teller.

I think the point to bear in mind on this is that any time you add two criteria together the probabilities plummet.

When you did it yourself, you should have evaluated the bank teller and feminist part of the BT/F question separately (however you chose to evaluate it), and then examined the likelihood that both would be true. That way you should clearly see that the combination could not possibly have a higher probability than either individual criterion.

It's certainly a hard thing to do, I'm going to have to look out for this one because I'd wager I do it a ton.

komponisto, 13y

What would be the opposite mistake?

Being confused by the Gettier problem.

Eliezer Yudkowsky, 17y

J Thomas, I would guess that in real life your method would work well in this specific case, because I'm guessing the prior odds are more extreme than the likelihoods. But you're correct that, in general, it is equally fallacious to ignore evidence as to ignore priors. :)

J_Thomas, 17y

So how do we correctly blend our knowledge of comparative numbers versus the way specific circumstances bias the odds?

Mix in too much of either sort of knowledge and you have a bias. But how can you know how much of each to include? Usually you'd have to guess.

MoreOn, 13y

Maybe the experimenters missed <yet another brilliant idea proven wrong in the last century>? Just kidding. What I ask instead is, Do people ever not suffer from conjunction bias?

I read about this experiment a couple years ago, about logic and intuition. (I’m writing from memory here, so it’s likely I screwed something up). People were given logical rules, and asked to find violations. Something like:

(Rule) If you are under 21, you can’t buy alcohol.

Bob is 24 and buys alcohol. (That’s not a violation)

Tom is 18 and buys alcohol. (Most people spotted this violation).

(Rule) If you go to France, you can’t drive a car.

Bob goes to France and takes a subway. (Not a violation).

Tom goes to France and drives. (Fewer people spotted this violation).

Of course it wasn’t easy like here, with a rule and a violation right next to each other. The rules were phrased more cleverly, too.

Anyway, people were better at logic when the situation was more intuitive. I wonder if any experiments have been done in which (untrained) people demonstrated a likewise absence of conjunction bias?

Maybe something like below would work, when you’re pointing out that (T) and (F) are occurring together.

Linda is 31 years old…

Please rank…

(F) Linda is active in the feminist movement.

(T) Linda is a bank teller.

(L) Both (T) and (F)

And if that doesn’t work… well, maybe better minds than mine had ALREADY done an experiment. Any suggestions for further reading, anyone? Summaries greatly appreciated.

Sniffnoy, 13y

The experiment (actually many experiments) you're thinking of is the Wason selection task, btw.

Said Achmiz, 5y

Link to Tversky and Kahneman (1983) is broken at this time. Here are two currently working links:

Said Achmiz, 5y

Link to Kahneman and Frederick (2002) is broken at this time. Here is a currently working link:

Representativeness revisited: Attribute substitution in intuitive judgment

Sunny from QAD, 5y

I calibrated my "strong upvote" on this post. It sounds silly now, but

Scientists can also be unstupid. Someone else has already thought of your alternative interpretation.

was a revelation for me.
