This is probably going to sound utterly ridiculous, but I have a sad confession.
I've read Yudkowsky's post on Bayes' Theorem (http://yudkowsky.net/rational/bayes) five times. I've written down the equation. Tried to formulate an answer.
I still don't understand it. That being said, I've lived my entire life under the false mentality that maths is boring and painful, and it's just recently I've tried to actually understand the concepts I learn in school, and not just temporarily memorize them for the next exam.
Here's the problem, on Yudkowsky's post:
"1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?"
When Eliezer changes the percentages to real numbers:
"100 out of 10,000 women at age forty who participate in routine screening have breast cancer. 80 of every 100 women with breast cancer will get a positive mammography. 950 out of 9,900 women without breast cancer will also get a positive mammography. If 10,000 women in this age group undergo a routine screening, about what fraction of women with positive mammographies will actually have breast cancer?"
When I see this version, I can properly make the answer come out to 7.8 percent. I do this by taking the 80 women and dividing by the 80 women plus the 950 women, so 80/(80+950) (or 80/1030 = 0.078). So I get 7.8%, which should be the right answer.
But when I try to do the same with percentages, it all gets sort of screwy. I take the 80 percent of women (0.8) divided by that same 80 percent (0.8) plus the 9.6 percent of women without cancer who test positive for it (0.096). So I get 0.8/(0.8+0.096) = 89%.
I feel like I'm making a really, really stupid error. But I just don't know what it is. >_>
You're forgetting the "base rate" in your calculation: the actual rate of cancer in the population. What you should really be taking the ratio of is (the fraction of all women that have cancer and test positive) / (the fraction of all women that test positive, whether or not they have cancer). In percentages, that's
(80% of the 1% of women who have cancer, who correctly test positive) = 0.8 * 0.01.
divided by
(80% of the 1% of women who have cancer, who correctly test positive) together with (9.6% of the 99% of women who don't have cancer, who test positive anyway) = 0.8 * 0.01 + 0.096 * 0.99.
So the ratio is (0.8 * 0.01) / (0.8 * 0.01 + 0.096 * 0.99), and that does equal 0.078.
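If it helps to see the same ratio as code, here is a minimal Python sketch. The function name `posterior` and its parameter names are made up for illustration; the numbers are the ones from the problem statement.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(cancer | positive test), weighting each test rate by its base rate."""
    # Fraction of ALL women who have cancer and test positive.
    true_positives = true_positive_rate * prior
    # Fraction of ALL women who don't have cancer but test positive anyway.
    false_positives = false_positive_rate * (1 - prior)
    return true_positives / (true_positives + false_positives)


# With the base rate: 1% prior, 80% true-positive rate, 9.6% false-positive rate.
result = posterior(prior=0.01, true_positive_rate=0.8, false_positive_rate=0.096)

# Without the base rate, the two test rates are compared as if both
# groups of women were the same size.
naive = 0.8 / (0.8 + 0.096)

print(round(result, 3))  # 0.078
print(round(naive, 2))   # 0.89
```

The only difference between the two calculations is the multiplication by 0.01 and 0.99, which is exactly the base rate.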
Thanks. I'm pretty sure I understand now. Although I'm not sure why I get the correct answer with the actual numbers, but not with the percentages, when I do the math the wrong way.
When I do the math like you wrote, I get the right answer for the percentages. So I get that part. But aren't I ignoring the base rate in the actual-numbers version? Or no?
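One way to check this is to rebuild the counts from the percentages and see where the base rate goes. The sketch below is only an illustration, assuming a population of 10,000 as in the reworded problem; the variable names are invented here.

```python
population = 10_000

# The base rate enters here: 1% of the population has cancer.
with_cancer = 0.01 * population             # about 100 women
without_cancer = population - with_cancer   # about 9,900 women

# The counts in the reworded problem are the test rates applied to these groups.
true_positives = 0.8 * with_cancer          # about 80 women
false_positives = 0.096 * without_cancer    # about 950 women

from_counts = true_positives / (true_positives + false_positives)
from_percentages = (0.8 * 0.01) / (0.8 * 0.01 + 0.096 * 0.99)

print(round(from_counts, 3), round(from_percentages, 3))
```

Under these assumptions the 80 and the 950 are already "80% of the 1%" and "9.6% of the 99%" of the 10,000 women, so dividing 80 by 80 + 950 uses the base rate without stating it explicitly.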