Does anybody know where to find a large database of statements that are each roughly 50% likely to be true? These would be used for confidence calibration / Bayesian updating exercises for CMR/HRP.
One way to make such a database would be to buy a bunch of trivia games with True/False questions, and type each statement and its negation into a computer. A problem with this might be that trivia questions are selected to have surprising/counterintuitive truth values; I'm not sure if that's true. I'd be happy to acquire an already-made database of this form, but ideally I'd like statements that are "more neutral" in terms of how counterintuitive they are.
Any thoughts on where we might find a database like this to use/buy?
Thanks for any help!
Revision: We actually want a database of two-choice questions. This way, the player won't get trained on a base rate of 50% of statements in the world being true... they'll just get trained that when there are two possible answers, exactly one is correct. In the end, the database should look something like this (warning: I made up the "correct" answers):
Question: "Which is diagnosed more often in America (2011)?";
Answers: (a) "the cold", (b) "allergies";
Correct Answer: (a)
Tags: {medical}
Question: "Which city has a higher average altitude?";
Answers: (a) "Chicago", (b) "Las Vegas";
Correct Answer: (a)
Tags: {geography}
Question: "Who sold more albums while living?";
Answers: (a) "Michael Jackson", (b) "Elvis Presley";
Correct Answer: (b)
Tags: {history, pop-culture, music}
Question: "Was the price of IBM stock higher or lower at the start of the month after the Berlin Wall fell, compared with the start of the previous month?";
Answers: (a) "higher", (b) "lower";
Correct Answer: (a)
Tags: {history, finance}
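For anyone who wants to store records like the ones above and score a player against them, here is a minimal Python sketch. The names `TwoChoiceQuestion` and `calibration_table` are made up for illustration, and the "correct" answers are still the invented ones from the examples. The table just shows, for each confidence level a player stated, how often they were actually right.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoChoiceQuestion:
    """One record in the format above (hypothetical class name)."""
    question: str
    answers: tuple            # (text of answer a, text of answer b)
    correct: int              # 0 for (a), 1 for (b)
    tags: frozenset = frozenset()


def calibration_table(responses):
    """Map each stated confidence to the fraction of answers that were right.

    `responses` is an iterable of (question, chosen_index, confidence) triples,
    where confidence is the player's stated probability of being correct.
    """
    buckets = defaultdict(list)
    for question, chosen, confidence in responses:
        buckets[confidence].append(chosen == question.correct)
    return {c: sum(hits) / len(hits) for c, hits in sorted(buckets.items())}


# Records copied from the examples above; the correct answers are made up.
QUESTIONS = [
    TwoChoiceQuestion("Which city has a higher average altitude?",
                      ("Chicago", "Las Vegas"), 0, frozenset({"geography"})),
    TwoChoiceQuestion("Who sold more albums while living?",
                      ("Michael Jackson", "Elvis Presley"), 1,
                      frozenset({"history", "pop-culture", "music"})),
]

# A player who picks (a) both times at 60% confidence gets one of two right.
responses = [(QUESTIONS[0], 0, 0.6), (QUESTIONS[1], 0, 0.6)]
table = calibration_table(responses)
```

A well-calibrated player would see each key of `table` roughly equal to its value once enough questions have been answered.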
Statements that are ~50% likely to be true... this is actually pretty hard. Maybe mine some dataset for statistical info?
Generally, I would look into RDF (Protégé and the free edition of TopBraid Composer will let you poke around without knowing the data format):
US 2000 Census in RDF
Freebase has all manner of data in RDF
http://aws.amazon.com/publicdatasets/ public data sets, not all in RDF but "it's more important that the data have structure" and all that
cancer stats
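As a rough sketch of what "mining a dataset" could look like once the numbers are out of RDF (or whatever format) and into a plain mapping: pair up entities and ask which one has the larger value. The function name `make_comparison_questions` and the values in `PLACEHOLDER_VALUES` are purely hypothetical, not real statistics. The answer order is shuffled so the correct answer isn't always (a).

```python
import itertools
import random


def make_comparison_questions(values, prompt, rng):
    """Build two-choice "which is larger?" questions from a name -> number mapping.

    Pairs with equal values are skipped because they have no correct answer.
    """
    questions = []
    for a, b in itertools.combinations(sorted(values), 2):
        if values[a] == values[b]:
            continue
        pair = [a, b]
        rng.shuffle(pair)  # randomize which entity appears as (a)
        correct = 0 if values[pair[0]] > values[pair[1]] else 1
        questions.append(
            {"question": prompt, "answers": tuple(pair), "correct": correct}
        )
    return questions


# Placeholder numbers for illustration only; NOT real data.
PLACEHOLDER_VALUES = {"Region A": 120, "Region B": 45, "Region C": 120, "Region D": 300}

questions = make_comparison_questions(
    PLACEHOLDER_VALUES, "Which has the higher value?", random.Random(0)
)
```

Comparing entities whose values are close together should give harder questions, closer to the 50/50 feel the original post asks for.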