I've been reading through the sequences and am currently working through the Intro to Bayes' Theorem (the fact that I'm only now getting to it tells you how early in the process I am). It's been quite thought-provoking. I'm finally getting questions right more reliably, and wanted to share one of the visualization tools that helped me. There are many "applets" strewn about, written in Java, that help one visualize what the various probability components are doing. In the mammography example, the idea of a sieve popped into my head as a neat way to think about what the test is doing.
I'm planning to take fairly extensive notes (more about that in a soon-to-come post), but thought I'd share a little "re-write" of that problem with a graphic, in case it's of any use and in case I've blundered in my understanding. Re-writing things in my own words helps make them my own -- I realize this is probably going to come across as really, really simplistic, but it's where I'm at!
In case it's not intuitive: the figure shows 100% of women partitioned into the 1% with cancer and the 99% without. Each group is then "sifted" through the test, and the known reliability of the sieve for each group is used to determine p(cancer|test+).
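The sieve picture can also be run as arithmetic. A minimal sketch, using the standard figures from the essay's mammography problem (1% prevalence, 80% true-positive rate, 9.6% false-positive rate) and an illustrative population of 10,000 women:

```python
# "Sieve" arithmetic for the mammography example.
# Figures are the standard ones from the essay; the population size
# of 10,000 is just an illustrative choice to keep the counts readable.
women = 10_000
with_cancer = women * 0.01        # 100 women in the 1% partition
without_cancer = women * 0.99     # 9,900 women in the 99% partition

# Sift each partition through the test:
true_positives = with_cancer * 0.80       # 80 women: cancer AND test+
false_positives = without_cancer * 0.096  # 950.4 women: no cancer AND test+

# p(cancer | test+): of everyone caught by the sieve, what fraction has cancer?
p_cancer_given_positive = true_positives / (true_positives + false_positives)
print(round(p_cancer_given_positive, 4))  # 0.0776, i.e. about 7.8%
```

The point the sieve makes visually is the same one the division makes numerically: the 99% partition is so much larger that even a small false-positive rate contributes most of the positives.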
I'm open to aesthetic critiques as well -- I enjoy making things like this, and feedback on how intuitive it is to look at is helpful. It didn't turn out quite how my mind visualized it, but I figured it was decent enough for a start.
This was made using emacs org-mode, LaTeX, and TikZ.
Update: per some comments, I tried to make things more clear in a redo. The original picture shown is HERE.
I got to shadow some breast physicians last month, and although it's sort of off topic I think I gained some insight as to why so many doctors get this question wrong.
That's because it's very different from any situation they ever come across in clinical practice. Guidelines are to screen people with mammography and examination; anyone who comes up as suspicious on those two tests then gets a biopsy. No one gets diagnosed with breast cancer from a mammogram alone; the progression from mammogram to the next step is hard-coded into a pre-determined algorithm, so the question "This woman got a positive on the mammogram; does she have cancer?" never comes up. A question that does come up a lot is a woman panicking because she got a positive mammogram and demanding to know whether she has breast cancer, and the inevitable answer is "We'll need to do more tests, but don't worry too much yet, because most of these things are false positives."
So the doctors involved know that most positive mammogram results are false positives, and they know how to diagnose breast cancer based on the combination of tests they actually do; they just can't do Bayesian math problems when given probabilities. This is kind of interesting if you're curious about their intelligence, but as far as I know it doesn't really affect clinical care.
As far as the take-home practical message goes, on my reading it was never about how well doctors could "diagnose cancer" per se based on mammogram results -- rather, the reason we ask about P(cancer | positive) is that it ought to inform our decision about whether a biopsy is really warranted. If a healthy young woman from a population with an exceedingly low base rate for breast cancer has a positive mammogram, the prior probability of her having cancer may still be low enough that there might actually be negative expected value in following up with a biopsy.
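That decision logic can be sketched in a few lines. This is purely illustrative: the utility numbers below are made up, and real biopsy decisions weigh far more than two quantities -- the point is only that the sign of the expected value depends on the posterior, which in turn depends on the base rate.

```python
# Hypothetical expected-value sketch of the biopsy decision.
# benefit_if_cancer and cost_of_biopsy are made-up utility numbers,
# chosen only to illustrate how the posterior drives the decision.
def biopsy_expected_value(p_cancer, benefit_if_cancer=100.0, cost_of_biopsy=5.0):
    """Net expected value of biopsying, in arbitrary utility units."""
    return p_cancer * benefit_if_cancer - cost_of_biopsy

# At the essay's posterior of roughly 7.8%, the biopsy is still worth it
# under these made-up numbers:
print(biopsy_expected_value(0.078) > 0)   # True

# A much lower base rate can drive the posterior low enough that the
# expected value flips negative:
print(biopsy_expected_value(0.03) > 0)    # False
```

Nothing here depends on the specific utilities; varying them just moves the break-even posterior (here 5%), which is exactly the threshold question the comment is raising.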