I got to shadow some breast physicians last month, and although it's sort of off topic I think I gained some insight as to why so many doctors get this question wrong.
It's because the question is very different from any situation they ever come across in clinical practice. Guidelines are to screen people with mammography and examination; anyone who comes up as suspicious on those two tests then gets a biopsy. No one gets diagnosed with breast cancer from a mammogram alone. The progression from mammogram on to the next step is hard-coded into a pre-determined algorithm, and so the question of "This woman got a positive on the mammogram; does she have cancer?" never comes up. A question that does come up a lot is a woman panicking because she got a positive mammogram and demanding to know whether she has breast cancer, and the inevitable answer is "We'll need to do more tests, but don't worry too much yet because most of these things are false positives."
So the doctors involved know that most real mammogram results are false positives, they know how to diagnose breast cancer based on the combination of tests they actually do, they just can't do Bayesian math problems when given probabilities. This is kind of interesting if you're curious about their intelligence but as far as I know doesn't really affect clinical care.
As far as the take-home practical message goes, on my reading it was never about how well doctors could "diagnose cancer" per se based on mammogram results--rather, the reason we ask about P(cancer | positive) is because it ought to inform our decision about whether a biopsy is really warranted. If a healthy young woman from a population with an exceedingly low base rate for breast cancer has a positive mammogram, the probability of her having cancer may still be low enough that there might actually be negative expected value in following up with a biopsy; after all, let's not forget that a biopsy is not a trivial procedure and things do sometimes go wrong.
So I think this actually does have some implication for real-world clinical care: we ought to question whether it is wise to automatically follow up all positive mammograms with biopsies. Maybe it is, and maybe it isn't, but I don't think we should take the question for granted as appears to be the case.
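To put rough numbers on the base-rate point, here is a minimal sketch of how P(cancer | positive) moves as the base rate drops. The 80% hit rate and 9.6% false positive rate are the ones from the mammography problem discussed in this thread; the list of base rates other than 1% is purely hypothetical and only there for illustration, not real epidemiological data.

```python
def posterior_given_positive(base_rate, hit_rate=0.80, false_positive_rate=0.096):
    """P(cancer | positive test) via Bayes' theorem."""
    true_positive_mass = hit_rate * base_rate
    false_positive_mass = false_positive_rate * (1.0 - base_rate)
    return true_positive_mass / (true_positive_mass + false_positive_mass)


# Hypothetical base rates, chosen only to show the trend (1% is the textbook figure).
hypothetical_base_rates = [0.0001, 0.001, 0.01, 0.1]

for rate in hypothetical_base_rates:
    p = posterior_given_positive(rate)
    print(f"base rate {rate:.4%} -> P(cancer | positive) = {p:.2%}")
```

With the 1% base rate this gives roughly 7.8%, and the posterior shrinks quickly as the base rate falls. Whether a biopsy is still worth it at a given posterior depends on costs and benefits that this sketch does not try to model.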
If a biopsy is the next step in diagnosing breast cancer after a positive mammogram, then we shouldn't perform mammograms on anyone it still wouldn't be worth biopsying should their mammogram turn up positive.
Yes, that's exactly right.
And although I'm having a hard time finding a news article to verify this, someone informed me that the official breast cancer screening recommendations in the US (or was it a particular state, perhaps California?) were recently modified such that it is now not recommended that women younger than 40 (50?) receive regular screening. The young woman who informed me of this change in policy was quite upset about it. It didn't make any sense to her. I tried to explain to her how it actually made good sense when you think about it in terms of base rates and expected values, but of course, it was no use.
But to return to the issue of clinical implications, yes: if a woman belongs to a population where the result of a mammogram would not change our decision about whether a biopsy is necessary, then probably she shouldn't have the mammogram. I suspect that this line of reasoning would sound quite foreign to most practicing doctors.
Now that's some interesting back story. I could see that if one knew that the route from positive test through to establishing conclusively if cancer is present was a fluid path, one might not be overly concerned with the test result itself.
As to this specific problem, I just used what EY used; perhaps there are more applicable/pertinent statistics problems that could be used.
I've been trying to think of other visualization tools that might be more universal or intuitive. I get the sliding Java applets, but I think that if one can tie what they're showing to a real-world tool or process of some sort, it will help. What these are doing is no different than his. The "top bar" is the two original spheres. The "bottom bar" is the reduced amount of each sphere (0.8 x 0.01 and 0.096 x 0.99) that remains after sifting.
Just a different way to look at it.
Thanks for the post, but I think this Venn Diagram style visualization of the problem is way more intuitive.
Absolutely, and thanks for the link. That was fantastic. Using a Venn diagram occurred to me, but I hadn't worked out exactly how to use it.
The one thing I will say about this is that I like the idea that using something (e.g. a test) tangible on the individuals or objects contributes to the idea that the new evidence is adding something to the pre-existing knowledge. The Venn diagram shows quite clearly what the final result was, but I like the idea that you're gathering more information, which contributes to the body of knowledge, and that this is what alone allows you to update/slide your estimate.
In the Venn diagram, they just start drawing circles and that's that.
In this one, it's a little more clear (at least to me) that the test is doing different things depending on who is being tested and that this is what creates the end inequality between cancer and non-cancer positive tests.
Does that make sense?
Thanks again for sharing. Honestly, I don't see these visualizations as one size fits all; the more who contribute or popularize the various tools out there, the better everyone is!
I really like this sieve approach. I feel a big improvement would be to show the output of the sieve as two boxes (red and blue) as well to help emphasize visually just how many false+ pass through and the relative size of false+ to all that pass through.
Check the update. I'm not quite sure how to describe visually what's "left" in the sieve. I don't want to show both test+ and false+ as outputs, exactly, because a sieve is supposed to keep some stuff back while letting other stuff through. But I think the idea of the circles above makes things more clear than the bars in terms of what's going on as well as proportionality.
I do cover the equivalent of false+ verbally, but it would be nice to make it visual. I'll keep thinking about this. Part of it is that I was trying to fit things onto just one page. If I ditch that (it's already on two now), I could maybe spread things out even more and show what's left in the sieves for each group.
Thanks for the suggestion.
Hmm, this seems like a good approach, but I don't know that I'd understand the graphic if I didn't already know what it meant.
Yeah, I can definitely see that. I thought about two different sieves and more of a "splitting" of the groups, but tried to illustrate that the same test is being used on everyone, not different tests depending on whether you have cancer or not.
Maybe I'll try that version and see if it's clearer.
I've been reading through the sequences, and am currently working through the Intro to Bayes' Theorem (by the fact that I'm reading the Intro to Bayes (finally), you can tell that I'm pretty early in the process). It's been quite thought provoking. I'm finally getting questions right more reliably, and wanted to share one of the visualization tools that helped me, at least. There are many "applets" strewn about, written in Java, that help one to visualize what the various probability components are doing. In the mammography example, at least, the idea of a sieve popped into my head as a neat way to think about what the test is doing.
I'm planning to take fairly extensive notes (more about that in a soon-to-come post), but thought I'd share a little "re-write" of that problem with a graphic in case it's of any use, and also in case I've blundered in my understanding. Re-writing things in my own words helps make them my own -- I realize that this is probably going to come across as really, really, incredibly, simplistic, but it's where I'm at!
In case it's not intuitive... it's supposed to show 100% of women broken into their measured partitions of 1% with cancer and 99% without. Those respective groups are then "sifted," and the known reliability of the sieve for each of those groups is used to determine p(cancer|test+).
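For anyone who would rather see the sifting as arithmetic, here is a rough sketch of the same idea in Python. The 1% base rate, 80% hit rate, and 9.6% false positive rate are the figures from the mammography problem; the population of 10,000 and the function and variable names are just my own choices for illustration.

```python
def sieve(population, base_rate, hit_rate, false_positive_rate):
    """Split a population into cancer / no-cancer groups, then 'sift' each
    group through the same test and see who comes out with a positive."""
    with_cancer = population * base_rate
    without_cancer = population - with_cancer

    # The same test lets a different fraction through for each group.
    true_positives = with_cancer * hit_rate
    false_positives = without_cancer * false_positive_rate

    # p(cancer | test+): the cancer share of everything that passed the sieve.
    p_cancer_given_positive = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, p_cancer_given_positive


tp, fp, p = sieve(population=10_000, base_rate=0.01,
                  hit_rate=0.80, false_positive_rate=0.096)
print(f"true positives:  {tp:.1f}")
print(f"false positives: {fp:.1f}")
print(f"p(cancer | test+) = {p:.3f}")
```

Out of 10,000 women, about 80 with cancer and about 950 without get a positive result, so p(cancer | test+) comes out to roughly 0.078.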
I'm open to aesthetic critiques as well -- I enjoy making things like this and knowing how intuitive it is to look at is helpful. It didn't turn out how my mind visualized it, but I figured it was decent enough for a start.
This was made using emacs org-mode, LaTeX, and TikZ.
Update: per some comments, I tried to make things more clear in a redo. The original picture shown is HERE.