What I'm trying to figure out is: how do I determine whether a source I'm looking at is telling the truth? As an example, let's take this page from Metamed: http://www.metamed.com/vital-facts-and-statistics
At first glance, I see some obvious things I ought to consider. The page often gives numbers for how many people die in hospitals per year, but for my purposes I ought to interpret those numbers in light of how many hospitals there are in the US, as well as how many patients are in each hospital. I also notice that, since they are trying to promote their site, they probably selected the data that would best serve that purpose.
So where do I go from here? Evaluating each source they reference seems like a waste of time. I do not think it would be wrong to trust that they are not actively lying to me. But how do I move from here to an accurate picture of general doctor competence?
Do you disagree that the presence in a small sample of two instances of very rare species constitutes strong prima facie evidence against the "coincidence" hypothesis?
I don't know what you mean by the above, despite doing my best to understand. My intuition is that "the most likely outcome" is one in which our 9-project sample will contain no project in either of the "very rare" categories, or at best will have a project in one of them. (If you deal me nine poker hands, I do not expect to see three-of-a-kind in two of them.)
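To put a rough number on the poker intuition, here is a minimal sketch. It assumes each of the nine hands is dealt from its own freshly shuffled 52-card deck (so the hands are independent), and it counts only hands that are exactly three-of-a-kind, excluding full houses and four-of-a-kind. The helper name `prob_at_least` is made up for illustration.

```python
from math import comb

# Count 5-card hands that are exactly three-of-a-kind:
# choose the rank of the triple, 3 of its 4 suits,
# then 2 other distinct ranks and one suit for each.
three_kind_hands = 13 * comb(4, 3) * comb(12, 2) * 4 ** 2
total_hands = comb(52, 5)
p_hand = three_kind_hands / total_hands


def prob_at_least(k, n, p):
    """Binomial probability of at least k successes in n independent trials."""
    below_k = sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below_k


# Assumption: nine independent hands, each from a fresh deck.
p_two_or_more = prob_at_least(2, 9, p_hand)

print(f"P(three-of-a-kind in one hand)      = {p_hand:.4f}")
print(f"P(at least two such hands out of 9) = {p_two_or_more:.4f}")
```

Under those assumptions a single hand is three-of-a-kind about 2% of the time, and seeing it in two or more of nine hands comes out somewhere between 1% and 2%. That is the sense in which two hits from rare categories in a small sample looks surprising under pure chance.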
I didn't understand your earlier example using chi-squared, which is what I take you to mean by "already pointed out". You made up some data, and "proved" that chi-squared failed to reject the null when you asked it about the made-up data. You assumed a sample size of 100, when the implausibility of the coincidence hypothesis comes precisely from the much smaller sample size (plus the existence of "rare" categories and the overall number of categories).
I'm experiencing it as the opposite - I already have plenty of reasons to conclude that the 1995 data set doesn't exist, I'm trying to give it the maximum benefit of doubt by assuming that it does exist and evaluating its fit with the 1979 data purely on probabilistic merits.
(ETA: what I'm saying is, forget the simulation, on which I'm willing to cop to charges of "intellectual masturbation". Instead, focus on the basic intuition. If I'm wrong about that, then I'm wrong enough that I'm looking forward to having learned something important.)
(ETA2: the fine print on the chi-square test reads "for the chi-square approximation to be valid, the expected frequency should be at least 5" - so in this case the test may not apply.)
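A minimal sketch of that fine-print check, for what it's worth. The category proportions below are placeholders I made up purely for illustration, not the 1979 figures, and the function names are hypothetical too. The only rule encoded is the rule of thumb quoted above: every expected count (sample size times category proportion) should be at least 5.

```python
from math import ceil


def expected_counts(n, probs):
    """Expected number of observations per category for a sample of size n."""
    return [n * p for p in probs]


def chi_square_rule_ok(n, probs, minimum=5.0):
    """True if every expected count meets the 'at least 5' rule of thumb."""
    return all(e >= minimum for e in expected_counts(n, probs))


def smallest_sample_size(probs, minimum=5.0):
    """Smallest n for which the rarest category still has expected count >= minimum."""
    return ceil(minimum / min(probs))


# Hypothetical proportions, including two "rare" categories (NOT real data).
hypothetical_probs = [0.40, 0.30, 0.20, 0.07, 0.03]

print(expected_counts(9, hypothetical_probs))
print(chi_square_rule_ok(9, hypothetical_probs))
print(smallest_sample_size(hypothetical_probs))
```

Note that with a sample of 9, a category needs a proportion of at least 5/9 to reach an expected count of 5, so at most one category could ever satisfy the rule. With several categories, the approximation is out of its comfort zone at this sample size regardless of the exact proportions.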
Why is coincidence a live hypothesis here? Surely we might expect there to be some connection - the numbers are ostensibly about the same government in the same country in different time periods. This is another example of what I mean when I say you are making a ton of assumptions and have not defined what parameters, distributions, or sets of models you are working with. This is simply not...