What I'm trying to figure out is how I can determine whether a source I'm looking at is telling the truth. For example, take this page from Metamed: http://www.metamed.com/vital-facts-and-statistics
At first glance, I see some obvious things I ought to consider. It often gives numbers for how many people die in hospitals per year, but for my purposes I ought to interpret those in light of how many hospitals there are in the US, as well as how many patients each hospital sees. I also notice that, since they are trying to promote their site, they probably selected the data that would best serve that purpose.
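A minimal sketch of that normalization, assuming you can find the number of hospitals and the average number of admissions per hospital per year. The function name and the round numbers in the usage note are hypothetical placeholders, not real statistics.

```python
def deaths_per_admission(annual_deaths: float, hospitals: int,
                         admissions_per_hospital: float) -> float:
    """Turn a raw yearly death count into a per-admission rate.

    Hypothetical helper: all inputs must come from sources you trust.
    """
    if hospitals <= 0 or admissions_per_hospital <= 0:
        raise ValueError("hospital and admission counts must be positive")
    # Total admissions nationwide = hospitals x admissions per hospital.
    total_admissions = hospitals * admissions_per_hospital
    return annual_deaths / total_admissions


# Placeholder inputs only: 1,000 deaths across 10 hospitals that each
# admit 10,000 patients a year gives 0.01 deaths per admission.
rate = deaths_per_admission(1_000, 10, 10_000)
print(f"{rate * 1_000:.1f} deaths per 1,000 admissions")
```

A big absolute number can look alarming on its own. Only the rate lets you compare it with other risks.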
So where do I go from here? Evaluating every source they reference seems like a waste of time. I don't think it would be wrong to trust that they are not actively lying to me. But how do I get from here to an accurate picture of general doctor competence?
Nigh-magical is the word indeed. I just realized that if my insane idea in the grandparent were made to work, it could be unleashed upon all research publications ever everywhere for mining data, figures, estimates, etc., and then output a giant belief network of "this is collective-human-science's current best guess for fact / figure / value / statistic X".
That does not sound like something that could be achieved by any developer smaller than Google. It also fails all of my incredulity and sanity checks.
(it also sounds like an awesome startup idea, whatever that means)
Or IBM-sized. But if you confined your ambitions to analyzing just meta-analyses, it would be much more doable. The narrower the domain, the better AI/NLP works, remember. There are some remarkable examples of what you can do by machine-reading a narrow domain and extracting meaningful scientific data. One of them is ChemicalTagger (demo), which reads chemistry papers describing synthesis processes and extracts the process (although it has serious problems getting papers to use). I bet you could get a lot out of reading meta-analyses - there's a good summary j...
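To make "current best guess for figure X" concrete, here is one standard way to combine several published estimates of the same quantity: fixed-effect inverse-variance pooling, as used in many meta-analyses. This is a swapped-in illustration, not what either commenter proposed. It assumes each extracted figure comes with a standard error and that all studies estimate the same underlying value. The `Estimate` type and `pool_fixed_effect` name are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Estimate:
    """Hypothetical record for one extracted figure and its uncertainty."""
    value: float
    std_error: float


def pool_fixed_effect(estimates: list[Estimate]) -> Estimate:
    """Fixed-effect inverse-variance pooling of independent estimates."""
    if not estimates:
        raise ValueError("need at least one estimate")
    if any(e.std_error <= 0 for e in estimates):
        raise ValueError("standard errors must be positive")
    # Each study is weighted by 1 / SE^2, so precise studies count more.
    weights = [1.0 / e.std_error ** 2 for e in estimates]
    total = sum(weights)
    pooled = sum(w * e.value for w, e in zip(weights, estimates)) / total
    # The pooled standard error shrinks as total weight grows.
    return Estimate(pooled, math.sqrt(1.0 / total))


# Made-up inputs: the more precise study (SE 1.0) pulls the result toward 2.0.
best_guess = pool_fixed_effect([Estimate(2.0, 1.0), Estimate(4.0, 2.0)])
print(best_guess)
```

A real system would also need to model heterogeneity between studies (e.g. a random-effects model), which this sketch deliberately leaves out.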