What I'm trying to figure out is: how do I determine whether a source I'm looking at is telling the truth? For example, let's take this page from Metamed: http://www.metamed.com/vital-facts-and-statistics
At first glance, I see some obvious things I ought to consider. The page often gives numbers for how many people die in hospitals per year, but for my purposes I ought to interpret those in light of how many hospitals there are in the US, as well as how many patients are in each hospital. I also notice that, since they are trying to promote their site, they probably selected the data that would best serve that purpose.
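As a rough sketch of that normalization (not anything the Metamed page itself does), a raw deaths-per-year figure can be turned into a rate per 1,000 patients. The function name, the parameter names, and the numbers in the usage lines are all made-up placeholders, not real statistics.

```python
def deaths_per_1000_patients(deaths_per_year, hospitals, patients_per_hospital_per_year):
    """Convert a raw yearly death count into a rate per 1,000 patients.

    All three inputs are assumptions the reader has to look up; nothing
    here comes from the page being evaluated.
    """
    total_patients = hospitals * patients_per_hospital_per_year
    if total_patients <= 0:
        raise ValueError("need a positive number of patients to compute a rate")
    return 1000 * deaths_per_year / total_patients


# Placeholder inputs chosen only to make the arithmetic easy to follow.
example_rate = deaths_per_1000_patients(
    deaths_per_year=10_000,
    hospitals=5_000,
    patients_per_hospital_per_year=2_000,
)
print(example_rate)  # 1.0 death per 1,000 patients for these placeholder inputs
```

The same headline count reads very differently depending on the denominator, which is why the raw number alone says little.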
So where do I go from here? Evaluating each source they reference seems like a waste of time. I do not think it would be wrong to trust that they are not actively lying to me. But how do I move from here to an accurate picture of general doctor competence?
Well, my naive first thought was to abuse the OpenCyc engine for a while until it starts making good rough guesses about which particular mathematical concepts, quantities and sets are being referred to in a given sentence, and then plug it, either directly or by mass download and conversion, into various data sources like WolframAlpha, international health / crime / population / economics databases, or various government services.
But that still means doing math (doing math with linguistics), plus tons and tons of programming to even get a working prototype that understands "30% of americans are older than 30 years old", plus way more work than I care to visualize just to get the system to respond in a sane manner rather than explode when you throw something incongruent at it ("30 of americans are 30% years old" should not make the system choke, for example). And then you've got to build something usable around that: interfaces, ways to extract and store data, and then probably some way to pack everything together. And once you're there, you probably want to turn it into a product and sell it, since you might as well cash in on all of this work. Then more work.
The whole prospect looks like a small asteroid rather than a mountain, from where I'm sitting. I am not in the business of climbing, mining, deconstructing and exporting small asteroids. I'll stick to climbing over mountains until I have a working asteroid-to-computronium converter.
My suggestion would be to go via some sort of meta-analysis or meta-meta-analysis (yes, that's a thing); if you have, for example, a meta-analysis of all results in a particular field and how often they replicate, you can infer pretty accurately how well a new result in that field will replicate. (An example use: 'So 90% of all the previous results with this sample size or smaller failed to replicate? Welp, time to ignore this new result until it does replicate.')
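A minimal sketch of that inference, assuming you already have a list of prior results from the field with their sample sizes and whether they replicated. The `PriorResult` record, the function name, and the example data are hypothetical, and the add-one smoothing is my own choice so that a tiny reference class does not give 0% or 100%.

```python
from dataclasses import dataclass


@dataclass
class PriorResult:
    sample_size: int
    replicated: bool


def replication_base_rate(priors, max_sample_size, pseudo_successes=1.0, pseudo_failures=1.0):
    """Smoothed share of prior results with this sample size or smaller that replicated."""
    relevant = [p for p in priors if p.sample_size <= max_sample_size]
    successes = sum(p.replicated for p in relevant)
    # Add-one (Laplace) smoothing: with no relevant priors this returns 0.5.
    total = len(relevant) + pseudo_successes + pseudo_failures
    return (successes + pseudo_successes) / total


# Made-up records: ten small studies, only one of which replicated,
# plus one large study that is outside the reference class.
priors = [PriorResult(sample_size=40, replicated=False) for _ in range(9)]
priors.append(PriorResult(sample_size=50, replicated=True))
priors.append(PriorResult(sample_size=5_000, replicated=True))

rate = replication_base_rate(priors, max_sample_size=50)
print(f"estimated chance a new n<=50 result replicates: {rate:.2f}")
```

With a low estimate like this one, the sensible default is to hold off on the new result until someone replicates it.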
It would of course be a ton of work to compile them all, and then any new result you were inte...