What I'm trying to figure out is: how do I determine whether a source I'm looking at is telling the truth? As an example, let's take this page from Metamed: http://www.metamed.com/vital-facts-and-statistics
At first glance, I see some obvious things I ought to consider. It often gives numbers for how many people die in hospitals per year, but for my purposes I ought to interpret those in light of how many hospitals there are in the US, as well as how many patients are in each hospital. I also notice that, since they are trying to promote their site, they probably selected the data that would best serve that purpose.
So where do I go from here? Evaluating each source they reference seems like a waste of time. I do not think it would be wrong to trust that they are not actively lying to me. But how do I move from here to an accurate picture of general doctor competence?
It's the usual go-to frequentist test for comparing two sets of categorical data. Say you have 4 categories with 10/4/9/3 members, and you have your null hypothesis, and you're interested in how often, assuming the null, results as extreme or more extreme than your new data of 200/80/150/20 would appear. Like rolling a biased 4-sided die.
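For anyone who wants to see the arithmetic behind such a test, here is a minimal sketch in plain Python. It assumes the two sets of counts are treated as the two columns of a 4x2 contingency table and compared with Pearson's chi-square statistic. The function names are made up for illustration, and the tail-probability formula is a closed form that only holds for exactly 3 degrees of freedom, which is what a 4x2 table gives.

```python
import math

def chi_square_independence(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

old_counts = [10, 4, 9, 3]
new_counts = [200, 80, 150, 20]
table = [list(pair) for pair in zip(old_counts, new_counts)]  # 4 rows, 2 columns

statistic, df = chi_square_independence(table)
if df != 3:
    raise ValueError("the closed-form tail probability used here is only valid for df = 3")
p_value = chi_square_sf_df3(statistic)
print(f"chi-square = {statistic:.2f}, df = {df}, p = {p_value:.2f}")
```

With one expected count this small, the chi-square approximation is rough, so the p-value is best read as a ballpark figure.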
(If you're curious, that specific made-up example would be
chisq.test(matrix(c(10,4,9,3,200,80,150,20), ncol = 2))
with a p-value of 0.4.)
This seems like a really weird procedure. You should be looking at the frequencies of each of the 4 categories, not messing around with means and standard deviations. (I mean, heck, what about 2 decades of inflation or military growth or cutbacks?) What, you think that the 1995 data implies that the Pentagon had $37bn/$755K=49006 different projects?
I don't know Python or NumPy and your formatting is messed up, so I'm not sure what exactly you're doing. (One nice thing about using precanned routines like R's chisq.test: at least it's relatively clear what you're doing.)
Looking closer, I'm not sure this data makes sense. 0.02 * 9 is... 0.18. Not a whole number. 47% * 9 is 4.23. Also not a positive integer or zero. 0.29 * 9 is 2.61.
Sure, the percentages do sum to 100%, but D and E aren't even possible: 1/9 = 11%!
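A quick way to check this kind of thing is to list every percentage a sample of 9 can produce. This is a rough sketch that assumes the reported figures were rounded to the nearest whole percent. The helper name is invented for illustration, and only the percentages quoted in this thread (2%, 3%, 29%, 47%) are checked.

```python
def achievable_percentages(n):
    """Whole-number percentages that k out of n can round to, for k = 0..n."""
    return {round(100 * k / n) for k in range(n + 1)}

possible = achievable_percentages(9)
print(sorted(possible))

# Percentages quoted in the thread; none of them is reachable with n = 9.
for pct in (2, 3, 29, 47):
    print(pct, "possible" if pct in possible else "not possible")
```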
Basically, that's you saying exactly what is making me say "the coincidence is implausible". A sample of 9 will generally not contain an instance of something that comes up 2% of the time. Even more seldom will it contain that and an instance of something that comes up 3% of the time.
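To put rough numbers on that intuition, here is a small sketch. It assumes 9 independent draws and that the 2% and 3% categories are mutually exclusive, so each draw falls into at most one of them. Both function names are hypothetical.

```python
def prob_at_least_one(p, n):
    """Probability that a category with per-draw probability p appears at least once in n draws."""
    return 1 - (1 - p) ** n

def prob_both_present(p_a, p_b, n):
    """Probability that two mutually exclusive categories each appear at least once in n draws."""
    # Inclusion-exclusion: 1 - P(no A) - P(no B) + P(neither A nor B).
    return 1 - (1 - p_a) ** n - (1 - p_b) ** n + (1 - p_a - p_b) ** n

p_two_percent = prob_at_least_one(0.02, 9)
p_both = prob_both_present(0.02, 0.03, 9)
print(f"2% category shows up at least once: {p_two_percent:.3f}")
print(f"2% and 3% categories both show up:  {p_both:.3f}")
```

Under those assumptions the first probability comes out at roughly one in six and the second at only a few percent.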
So, in spite of appearances, it seems as if our respective intuitions agree on something. Which makes me even more curious as to which of us is having a clack and where.