What I'm trying to figure out is: how do I determine whether a source I'm looking at is telling the truth? As an example, let's take this page from Metamed: http://www.metamed.com/vital-facts-and-statistics
At first glance, I see some obvious things I ought to consider. It often gives numbers for how many people die in hospitals per year, but for my purposes I ought to interpret those numbers in light of how many hospitals there are in the US, as well as how many patients are in each hospital. I also notice that, since they are trying to promote their site, they probably selected the data that would best serve that purpose.
So where do I go from here? Evaluating each source they reference seems like a waste of time. I do not think it would be wrong to trust that they are not actively lying to me. But how do I move from here to an accurate picture of general doctor competence?
Thanks for the feedback. Maybe I can better understand how what's blindingly obvious to me doesn't jump out at everyone else.
I don't know how or when to use a chi-squared test. What I did was assume - for the sake of checking my intuition - that the two sets of frequencies were indeed not made up.
To work out probabilities, you need to have some kind of model. I decided to use the simplest sampling model I could think of, where in both cases any given IT project independently has a fixed probability of landing in each of the categories A, B, C, D, E.
The 1995 "study" has a sample size of $37Bn - this in fact turns out to match estimates of the entire DoD spend on IT projects in that year. So if these numbers are correct, then the frequencies must be precisely the probabilities for any given project to fall into the buckets A, B, C, D or E.
What I did next was work out some reasonable assumptions for the 1979 set of frequencies. It is drawn from a sample of 9 projects totaling $6.8M, so the mean project cost in the sample is $755K, and knowing a few other facts we can compute a lower bound for the standard deviation of the sample.
Given a mean, a standard deviation, and the assumption that costs are normally distributed in the population, we can approximate by simulation an answer to the question "how likely is it that both sets of frequencies are not made up and just happen to be within 1% of each other by chance, given the respective sizes of the samples?".
The frequencies are given in terms of the categories as a proportion of the total cost. I wrote a Python program to repeatedly draw a sample of 9 projects from a population assumed to have the above mean cost and standard deviation, compute the relative proportions of the 5 categories, and return a true result if they were within 1% of the population probabilities.
Run this program passing the number of simulation runs as an argument. You can verify that the likelihood of reproducing the same set of frequencies within 1%, assuming that this happens by chance, is vanishingly small.
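The original program is not reproduced in this comment, so what follows is only a minimal sketch of the kind of simulation described above, not the author's code. The sample size of 9 and the mean cost of about $755K come from the text; the category probabilities in PLACEHOLDER_PROBS and the standard deviation sd_cost are hypothetical placeholders, and the function names are made up for illustration. Substitute the real figures before drawing any conclusion.

```python
import random

# HYPOTHETICAL placeholder probabilities for categories A..E.
# They are NOT the published figures; replace them with the 1995 frequencies.
PLACEHOLDER_PROBS = [0.30, 0.25, 0.20, 0.15, 0.10]


def one_run(rng, probs, n_projects=9, mean_cost=755_000.0,
            sd_cost=400_000.0, tol=0.01):
    """Draw one sample of projects and report whether every category's
    share of total cost lands within `tol` of its population probability.

    sd_cost is an assumed placeholder; the comment only says a lower bound
    for the standard deviation can be computed.
    """
    totals = [0.0] * len(probs)
    for _ in range(n_projects):
        # Normally distributed cost, floored at $1 so no cost is negative.
        cost = max(rng.gauss(mean_cost, sd_cost), 1.0)
        # Each project independently falls into one category.
        category = rng.choices(range(len(probs)), weights=probs)[0]
        totals[category] += cost
    grand_total = sum(totals)
    return all(abs(t / grand_total - p) <= tol
               for t, p in zip(totals, probs))


def estimate_match_probability(runs, probs, seed=0, **kwargs):
    """Fraction of simulated samples that reproduce the population
    frequencies to within the tolerance."""
    rng = random.Random(seed)
    hits = sum(one_run(rng, probs, **kwargs) for _ in range(runs))
    return hits / runs
```

Calling estimate_match_probability(100_000, PLACEHOLDER_PROBS) returns the estimated chance of a match. With only 9 projects, each one accounts for roughly a ninth of the total cost, which is why matching five proportions to within 1% should be rare under this model.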
So, this "experiment" rejects the null hypothesis that the apparent match in both sets of frequencies is due to chance, as opposed to something else like one of them being made up.
(EDIT - removed the code I originally posted in this comment, a better version appears here.)
It's the usual go-to frequentist test for comparing two sets of categorical data. Say you have 4 categories with 10/4/9/3 members, and you have your null hypothesis, and you're interested in how often, assuming the null, results as extreme or more extreme than your new data of 200/80/150/20 would appear. Like rolling a biased 4-sided die.
(If you're curious, that specific made up e...
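For the made-up counts quoted above, the test can be sketched in a few lines of Python. This is one way to set it up, under the assumption that the null hypothesis fixes the category proportions at exactly those of the 10/4/9/3 set (a goodness-of-fit test, matching the biased-die picture). The function names are hypothetical, and the p-value formula used here is the closed form that holds only for 3 degrees of freedom.

```python
import math

# Counts quoted in the comment above.
baseline = [10, 4, 9, 3]        # assumed to define the null proportions
observed = [200, 80, 150, 20]   # the "new data"


def chi_squared_statistic(observed, null_counts):
    """Pearson's statistic: sum of (observed - expected)^2 / expected,
    with expected counts scaled from the null proportions."""
    n_obs = sum(observed)
    n_null = sum(null_counts)
    stat = 0.0
    for o, b in zip(observed, null_counts):
        expected = n_obs * b / n_null
        stat += (o - expected) ** 2 / expected
    return stat


def p_value_df3(stat):
    """Upper-tail probability of a chi-squared variable with exactly
    3 degrees of freedom (4 categories minus 1)."""
    return (math.erfc(math.sqrt(stat / 2))
            + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2))


stat = chi_squared_statistic(observed, baseline)
p = p_value_df3(stat)
print(f"chi-squared = {stat:.2f}, df = 3, p = {p:.2g}")
```

The statistic comes out at about 25.7, well above the 5% critical value of 7.81 for 3 degrees of freedom, so under this setup the new data would be very surprising if the null were true. Treating a 26-item baseline as exact ignores its own sampling error; a test on the full 2x4 table of counts would be the alternative if both sets are regarded as samples.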