Calibration Test with database of 150,000+ questions

37 Nanashi 14 March 2015 11:22AM

Hi all, 

I put this calibration test together this morning. It pulls from a trivia API of over 150,000 questions so you should be able to take this many, many times before you start seeing repeats.

http://www.2pih.com/caltest.php

A few notes:

1. The questions are "Jeopardy" style questions so the wording may be strange, and some of them might be impossible to answer without further context. On these just assign 0% confidence.

2. As the questions are open-ended, there is no answer-checking mechanism. You have to be honest with yourself as to whether or not you got the right answer. Because what would be the point of cheating at a calibration test?

I can't think of anything else. Please let me know if there are any features you would want to see added, or if there are any bugs, issues, etc. 

**EDIT**

As per suggestion I have moved this to the main section. Here are the changes I'll be making soon:

  • Label the axes and include an explanation of calibration curves.
  • Make it so you can reverse your last selection in the event of a misclick.

Here are changes I'll make eventually:

  • Create an account system so you can store your results online.
  • Move trivia DB over to my own server to allow for flagging of bad/unanswerable questions.

Here are the changes that are done:

  • Changed 0% to 0.1% and 99% to 99.9%.
  • Added a second graph showing the frequency of your confidence selections.
  • Color-coded the "right" and "wrong" buttons and moved them farther apart to prevent misclicks.
  • Results are now stored locally so that you can see your calibration over time.
  • Blank questions are now detected and skipped.
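
For readers curious what a calibration curve boils down to, here is a minimal sketch. It is not the code behind the test above; the function name and the sample answers are made up for illustration. It groups self-graded answers by stated confidence and reports how often each group was actually right.

```python
from collections import defaultdict

def calibration_curve(results):
    """Group (confidence, was_correct) pairs by stated confidence.

    Returns {confidence: (observed_accuracy, answer_count)}.
    """
    buckets = defaultdict(list)
    for confidence, was_correct in results:
        buckets[confidence].append(was_correct)
    return {
        confidence: (sum(outcomes) / len(outcomes), len(outcomes))
        for confidence, outcomes in sorted(buckets.items())
    }

# Hypothetical self-graded answers: (stated confidence, got it right?)
answers = [
    (0.9, True), (0.9, False), (0.9, True), (0.9, True),
    (0.5, True), (0.5, False),
    (0.001, False),
]

curve = calibration_curve(answers)
for confidence, (accuracy, count) in curve.items():
    print(f"stated {confidence:.1%}: observed {accuracy:.1%} over {count} answers")
```

A well-calibrated record has observed accuracy close to stated confidence in every group; the per-group counts correspond to the second graph's frequency of confidence selections.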

Overconfident Pessimism

25 lukeprog 24 November 2012 12:47AM

You can build a machine to draw [deductive] conclusions for you, but I think you can never build a machine that will draw [probabilistic] inferences.

George Polya, 34 years before Pearl (1988) launched the probabilistic revolution in AI

The energy produced by the breaking down of the atom is a very poor kind of thing. Anyone who expects a source of power from the transformation of these atoms is talking moonshine.

Ernest Rutherford in 1933, 18 years before the first nuclear reactor went online

I confess that in 1901 I said to my brother Orville that man would not fly for fifty years. Two years later we ourselves made flights. This demonstration of my impotence as a prophet gave me such a shock that ever since I have distrusted myself...

Wilbur Wright, in a 1908 speech

Startling insights are hard to predict.1 Polya and Rutherford couldn't have predicted when computational probabilistic reasoning and nuclear power would arrive. Their training in scientific skepticism probably prevented them from making confident predictions about what would be developed in the next few decades.

What's odd, then, is that their scientific skepticism didn't prevent them from making confident predictions about what wouldn't be developed in the next few decades.

I am blessed to occasionally chat with some of the smartest scientists in the world, especially in computer science. They generally don't make confident predictions that certain specific, difficult, insight-based technologies will be developed soon. And yet, immediately after agreeing with me that "the future is very hard to predict," they will confidently state that a specific, difficult technology is more than 50 years away!

Error. Does not compute.

Amanda Knox: post mortem

23 gwern 20 October 2011 04:10PM

Continuing my interest in tracking real-world predictions, I notice that the recent acquittal of Knox & Sollecito offers an interesting opportunity - specifically, many LessWrongers gave probabilities for guilt back in 2009 in komponisto’s 2 articles:

Both were interesting exercises, and it’s time to do a followup. Specifically, there are at least 3 new pieces of evidence to consider:

  1. the failure of any damning or especially relevant evidence to surface in the ~2 years since (see also: the hope function)
  2. the independent experts’ report on the DNA evidence
  3. the freeing of Knox & Sollecito, and continued imprisonment of Rudy Guede (with reduced sentence)

Point 2 particularly struck me (the press attributes much of the acquittal to the expert report, an acquittal I had not expected to succeed), but other people may find the other 2 points or unmentioned news more weighty.

Calibrate your self-assessments

68 Yvain 09 October 2011 11:26PM

When I moved to Ireland, I knew that their school system, and in particular their examinations, would be different from the ones I was used to. I educated myself on them and by the time I took my first exam I thought I was reasonably prepared.

I walked out of my first examination almost certain I had failed. I remember emailing my parents, apologizing to them for my failure and promising I would do better when I repeated the class.

Then I got my results back, and learned I had passed with honors.

This situation repeated itself with depressing regularity over the next few semesters. Took exam, walked out in tears certain I had failed, made angsty complaints and apologies, got results back, celebrated. Eventually I decided that I might as well skip steps two to five and go straight to the celebrations.

This was harder than I expected. Just knowing that my feelings of abject failure usually turned out all right did not change those feelings of abject failure. I still walked out of each exam with the same gut certainty of disaster I had always had. What I did learn to do was ignore it: to force myself to walk home with a smile on my face and refuse to let myself dwell on the feelings of failure or take them seriously. And in this I was successful, and now the feelings of abject failure produce only a tiny twinge of stress.

In LW terminology, I am calibrating my self-assessment of examination success1.

1001 PredictionBook Nights

51 gwern 08 October 2011 04:04PM

I explain what I've learned from creating and judging thousands of predictions on personal and real-world matters: the challenges of maintenance, the limitations of prediction markets, the interesting applications to my other essays, skepticism about pundits and unreflective persons' opinions, my own biases like optimism & planning fallacy, 3 very useful heuristics/approaches, and the costs of these activities in general.

Plus an extremely geeky parody of Fate/Stay Night.

This essay exists as a large section of my page on prediction markets on gwern.net: http://www.gwern.net/Prediction%20markets#1001-predictionbook-nights

The Bias You Didn't Expect

92 Psychohistorian 14 April 2011 04:20PM

There are few places where society values rational, objective decision making as much as it values it in judges. While there is a rather cynical discipline called legal realism that says the law is really based on quirks of individual psychology, "what the judge had for breakfast," there's a broad social belief that the decisions of judges are unbiased. And where they aren't unbiased, they're biased for Big, Important, Bad reasons, like racism or classism or politics.

It turns out that legal realism is totally wrong. It's not what the judge had for breakfast. It's how recently the judge had breakfast. A new study (media coverage) on Israeli judges shows that, when making parole decisions, they grant about 65% after meal breaks, and almost all the way down to 0% right before breaks and at the end of the day (i.e. as far from the last break as possible). There's a relatively linear decline between the two points.

Techniques for probability estimates

58 Yvain 04 January 2011 11:38PM

Utility maximization often requires determining a probability of a particular statement being true. But humans are not utility maximizers and often refuse to give precise numerical probabilities. Nevertheless, their actions reflect a "hidden" probability. For example, even someone who refused to give a precise probability for Barack Obama's re-election would probably jump at the chance to take a bet in which ey lost $5 if Obama wasn't re-elected but won $5 million if he was; such decisions demand that the decider covertly be working off of at least a vague probability.
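
To put a number on that "hidden" probability, here is a rough sketch under one loud assumption that the post does not make: the bettor is risk-neutral and cares only about expected dollars. Under that assumption, taking a bet that loses `loss` or wins `win` is worthwhile exactly when p * win - (1 - p) * loss > 0, i.e. when p > loss / (loss + win). The function name is hypothetical.

```python
def break_even_probability(loss, win):
    """Smallest probability of winning at which a risk-neutral bettor
    (an assumption of this sketch) would be indifferent to the bet."""
    return loss / (loss + win)

# The bet from the text: lose $5 if Obama isn't re-elected, win $5 million if he is.
p = break_even_probability(loss=5, win=5_000_000)
print(f"Taking the bet implies a probability above {p:.2e}")
```

So jumping at the bet only reveals a probability above roughly one in a million: a very weak bound, but a bound nonetheless.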

When untrained people try to translate vague feelings like "It seems Obama will probably be re-elected" into a precise numerical probability, they commonly fall into certain traps and pitfalls that make their probability estimates inaccurate. Calling a probability estimate "inaccurate" causes philosophical problems, but these problems can be resolved by remembering that probability is "subjectively objective" - that although a mind "hosts" a probability estimate, that mind does not arbitrarily determine the estimate, but rather calculates it according to mathematical laws from available evidence. These calculations require too much computational power to use outside the simplest hypothetical examples, but they provide a standard by which to judge real probability estimates. They also suggest tests by which one can judge probabilities as well-calibrated or poorly-calibrated: for example, a person who constantly assigns 90% confidence to eir guesses but only guesses the right answer half the time is poorly calibrated. So calling a probability estimate "accurate" or "inaccurate" has a real philosophical grounding.

There exist several techniques that help people translate vague feelings of probability into more accurate numerical estimates. Most of them translate probabilities from forms without immediate consequences (which the brain supposedly processes for signaling purposes) to forms with immediate consequences (which the brain supposedly processes while focusing on those consequences).

Bayesian Nights (Rationalist Story Time)

18 [deleted] 15 November 2010 02:20AM

Tell us a story. A tall tale for King Solamona, a yarn for the folk of Bensalem, a little nugget of wisdom, finely folded into a parable for the pages.

The game is simple:

  1. Choose a bias, a fallacy, some common error of thought.
  2. Write a short, hopefully entertaining narrative. Use the narrative to strengthen the reader against the errors you chose.
  3. Post your story in reply to this post.
  4. Give the authors positive and constructive feedback. Use rot13 if it seems appropriate.
  5. Post all discussion about this post in the designated post discussion thread, not under this top-level post.

This isn't a thread for developing new ideas. If you have a novel concept to explore, you should consider making a top-level post on LessWrong instead. This is for sharpening our wits against the mental perils we probably already agree exist. For practicing good thinking, for recognizing bad thinking, for fun! For sanity's sake, tell us a story.

Multiple Choice

10 Alicorn 17 May 2010 10:26PM

When we choose behavior, including verbal behavior, it's sometimes tempting to do what is most likely to be right without paying attention to how costly it is to be wrong in various ways or looking for a safer alternative.

If you've taken a lot of standardized tests, you know that some of them penalize guessing and some don't.  That is, leaving a question blank might be better than getting a wrong answer, or they might have the same result.  If they're the same, of course you guess, because it can't hurt and may help.  If they take off points for wrong answers, then there's some optimal threshold at which a well-calibrated test-taker will answer.  For instance, the ability to rule out one of four choices on a one-point question where a wrong answer costs a quarter point means that you should guess from the remaining three - the expected point value of this guess is positive.  If you can rule out one of four choices and a wrong answer costs half a point, leave it blank.
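
The arithmetic behind those two cases can be checked with a short sketch. It assumes each remaining choice is equally likely to be correct, and the function name is made up for illustration.

```python
from fractions import Fraction

def guess_value(remaining_choices, penalty, reward=Fraction(1)):
    """Expected points from guessing uniformly among the remaining choices."""
    p_right = Fraction(1, remaining_choices)
    return p_right * reward - (1 - p_right) * penalty

# One of four choices ruled out, so three remain.
quarter_point = guess_value(3, Fraction(1, 4))  # 1/3 - (2/3)(1/4) = 1/6
half_point = guess_value(3, Fraction(1, 2))     # 1/3 - (2/3)(1/2) = 0

print(quarter_point, half_point)
```

With a quarter-point penalty the guess is worth a sixth of a point on average. With a half-point penalty it is worth exactly zero, so guessing adds variance without adding expected points, and leaving the question blank costs nothing.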

If you have ever asked a woman who wasn't pregnant when the baby was due, you might have noticed that life penalizes guessing.

If you're risk-neutral, you still can't just do whatever has the highest chance of being right; you must also consider the cost of being wrong.  You will probably win a bet that says a fair six-sided die will come up on a number greater than 2.  But you shouldn't buy this bet for a dollar if the payoff is only $1.10, even though that purchase can be summarized as "you will probably gain ten cents".  That bet is better than a similarly-priced, similarly-paid bet on the opposite outcome; but it's not good.
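
Here is the die bet worked through as a sketch. It reads the "$1.10 payoff" as the total returned on a win, which matches the "gain ten cents" summary but is still an interpretation; the function name is hypothetical.

```python
from fractions import Fraction

def bet_expected_profit(p_win, price, payoff):
    """Expected profit of paying `price` for a bet that returns `payoff` on a win."""
    return p_win * payoff - price

p_win = Fraction(4, 6)  # a fair die shows 3, 4, 5 or 6
price = Fraction(1)
payoff = Fraction(110, 100)

profit = bet_expected_profit(p_win, price, payoff)        # -4/15, about -27 cents
opposite = bet_expected_profit(1 - p_win, price, payoff)  # -19/30, about -63 cents

print(float(profit), float(opposite))
```

You win two times out of three, yet lose about 27 cents per bet on average. The opposite bet is worse still, which is the sense in which this one is "better" without being good.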

There are a few factors at work to make guessing tempting anyway:

Deception and Self-Doubt

8 Psychohistorian 11 March 2010 02:39AM

A little while ago, I argued with a friend of mine over the efficiency of the Chinese government. I admitted he was clearly better informed on the subject than I. At one point, however, he claimed that the Chinese government executed fewer people than the US government. This statement is flat-out wrong; China executes ten times as many people as the US, if not far more. It's a blatant lie. I called him on it, and he copped to it. The outcome is beside the point. Why does it matter that he lied? In this case, it provides weak evidence that the basics of his claim were wrong, that he knew the point he was arguing was, at least on some level, incorrect.

The fact that a person is willing to lie indefensibly in order to support their side of an argument shows that they have put "winning" the argument at the top of their priorities. Furthermore, they've decided, based on the evidence they have available, that lying was a more effective way to advance their argument than telling the truth. While exceptions obviously exist, if you believe that lying to a reasonably intelligent audience is the best way of advancing your claim, this suggests that you know your claim is ill-founded, even if you don't admit this fact to yourself.
