This reminds me of an infamous chemistry exam that no longer existed at my college by the time I got there but had passed into the student lore. For each question, you would first mark your answer (multiple choice, 4 or 5 choices), and then mark a confidence option. These were "high confidence" (5 points if right, -3 if wrong), "low confidence" (3 points if right, 0 if wrong), and "I don't know, give me a point" (1 point regardless of what answer is marked).
This exam was not popular with the students.
For those who don't feel like running the numbers, "I don't know" is the best option when you think the probability of your answer being correct is between 20% and 33%, "low confidence" is the best when you think the probability is between 33% and 60%, and "high confidence" is the best when you think the probability is between 60% and 100%.
If your probability doesn't fall between 20% and 100%, you're doing something wrong.
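For anyone who does want to run the numbers, here is a minimal sketch of the comparison behind those thresholds. The function names are made up for illustration; the point values are the ones quoted above.

```python
def expected_points(p):
    """Expected score of each confidence option, given probability p
    that the marked answer is correct."""
    return {
        "high": 5 * p - 3 * (1 - p),  # 5 if right, -3 if wrong
        "low": 3 * p,                 # 3 if right, 0 if wrong
        "dont_know": 1.0,             # 1 point regardless
    }


def best_option(p):
    """Option with the highest expected score at confidence p."""
    scores = expected_points(p)
    return max(scores, key=scores.get)


for p in (0.25, 0.30, 0.40, 0.55, 0.65, 0.90):
    print(f"p={p:.2f} -> {best_option(p)}")
```

Setting the expected scores equal gives the crossover points: 3p = 1 at p = 1/3, and 8p - 3 = 3p at p = 0.6.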
I vaguely recall reading an anecdote about a similar testing scheme where you had to give an actual numerical confidence value for each answer. Saying you were 100% confident of an answer that was wrong would give you minus infinity points.
I bet that would be even less popular with students.
I've given those kinds of tests in my decision analysis and my probabilistic analysis courses (for the multiple choice questions). Four choices, logarithmic scoring rule, 100% on the correct answer gives 1 point, 25% on the correct answer gives zero points, and 0% on the correct answer gives negative infinity.
Some students loved it. Some hated it. Many hated it until they realized that e.g. they didn't need 90% of the points to get an A (I was generous on the points-to-grades part of grading).
I did have to be careful; minus infinity meant that on one question you could fail the class. I did have to be sure that it wasn't a mistake, that they actually meant to put a zero on the correct answer.
If you want to try, you might want to try the Brier scoring rule instead of the logarithmic; it has a similar flavor without the minus infinity hassle.
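A rough sketch of the two scoring rules, for the curious. The logarithmic rule is scaled to match the numbers given above (100% gives 1 point, 25% gives 0, 0% gives minus infinity). The rescaling of the Brier rule to the same 0-to-1 anchors is an assumption made here for comparison, not necessarily how anyone actually graded.

```python
import math


def log_score(p_correct, n_choices=4):
    """Log score scaled so 100% -> 1 point and a uniform guess (1/n) -> 0."""
    if p_correct <= 0:
        return float("-inf")
    return 1 + math.log(p_correct) / math.log(n_choices)


def brier_score(probs, correct_index):
    """Quadratic (Brier) penalty, rescaled (an assumption) so that full
    certainty on the right answer -> 1 point and a uniform guess -> 0."""
    n = len(probs)
    penalty = sum(
        (p - (1.0 if i == correct_index else 0.0)) ** 2
        for i, p in enumerate(probs)
    )
    uniform_penalty = (n - 1) / n  # penalty of the uniform distribution
    return 1 - penalty / uniform_penalty


print(log_score(0.001))                 # large negative, but finite
print(log_score(0.0))                   # -inf
print(brier_score([0, 1, 0, 0], 0))     # worst case stays finite
```

Under this rescaling, putting 100% on a wrong answer costs a bounded number of points instead of failing the class.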
minus infinity meant that on one question you could fail the class
...wow. Well, I guess that's one way to teach people to avoid infinite certainty. Reminiscent of Jeffreyssai. Did that happen to a lot of students?
Some students started putting zeros on the first assignment or two. However, all they needed was to see a few people get nailed putting 0.001 on the right answer (usually on the famous boy-girl probability problem) and people tended to start spreading their probability assignments. Some people never learn, though, so once in a while people would fail. I can only remember three in eight years.
My professor ran a professional course like this. One year, one of the attendees put 100% on every question on every assignment, and got every single answer correct. The next year, someone attended from the same company, and decided he was going to do the same thing. Quite early, he got minus infinity. My professor's response? "They both should be fired."
I cannot begin to say how vehemently I disagree with the idea of firing the first attendee. If I found out that your professor had fired them I would fire your professor.
Sure, it has to be an expected utility fail if you take the problem literally, because of how little it would have cost to put only 99.99% on each correct answer, and how impossible it would be to be infinitely certain of getting every answer right. But this fails to take into account the out-of-context expected utility of being AWESOME.
Firing the second guy is fine.
I think the problem is that people tend to conflate intention with effect, often with dire results (e.g. "Banning drugs == reducing harm from drug use"). Thus when they see a mechanism in place that seems intended to penalise guessing, they assume that it's the same as actually penalising guessing, and that anything that shows otherwise must be a mistake.
This may explain the "moral" objection of the one student: The test attempts to penalise guessing, so working against this intention is "cheating" by exploiting a flaw in the test. With the no-penalty multiple choice, there's no such intent, so the assumption is that the benefits of guessing are already factored in.
This may not in fact be as silly as it sounds. Suppose that the test is unrelated to mathematics, and that there is no external motive to doing well. E.g. you are taking a test on Elizabethan history with no effect on your final grade, and want to calibrate yourself against the rest of the class. Here, this kind of test is a flaw, because the test isn't measuring solely what it intends to, but will be biased towards those who spot this advantage. If you are interested solely in an accurate ...
I'm surprised that test-preparation companies haven't picked up on this. Training people to understand calibration and loss aversion could be very helpful on standardized tests like the SATs. I've never taken a Kaplan or Princeton Review course, but those who have tell me this topic isn't covered. I'd be surprised if the people involved didn't know the science, so maybe they just don't know of a reliable way to teach such things?
They were in SAT prep books 25-27 years ago. (I took the SAT's while I was still 15.) The explanation given was something along the lines of, "Most people say that the SAT penalizes you for guessing, but this is wrong. Rather, it simply makes sure that, on average, guessing won't get you any extra points if you don't know anything about the question. If you can eliminate even one wrong answer out of five, you will always come out ahead by guessing. If you can't, then you still won't lose anything by guessing." They then showed math and examples to back it up.
It was actually in a very early part of the book I read, because they wanted you to understand how important it was to be able to identify even one wrong answer, and thus why the methods you were going to learn for doing that were important.
It would be interesting to re-frame the test as "start with 50 points. Get 1/2 for a correct answer, lose 1/2 for don't know, and lose 1 for a wrong answer". I suspect your friends would accept this as equivalent scoring, and would start guessing more.
The loss aversion is probably less strong because they're already taking a loss by not guessing, so losing just a bit more isn't that painful.
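A quick check that the reframing really is the same scoring rule. This is only a sketch with hypothetical function names; the "start with 50" version matches the original totals exactly when the test has 100 questions, and differs by a constant otherwise, so rankings are unchanged either way.

```python
def score_original(correct, dont_know, wrong):
    """+1 per correct answer, 0 per "don't know", -1/2 per wrong answer."""
    return 1.0 * correct + 0.0 * dont_know - 0.5 * wrong


def score_reframed(correct, dont_know, wrong, start=50.0):
    """Start with `start` points; +1/2 correct, -1/2 don't know, -1 wrong."""
    return start + 0.5 * correct - 0.5 * dont_know - 1.0 * wrong


# On a 100-question test the two totals coincide.
print(score_original(60, 30, 10), score_reframed(60, 30, 10))

# On a 150-question test they differ by the same constant for everyone.
print(score_reframed(90, 40, 20) - score_original(90, 40, 20))
print(score_reframed(75, 0, 75) - score_original(75, 0, 75))
```

Every answer is worth half a point less in the reframed version, which is exactly what the head start pays for.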
Oh my dear lord Cthulhu. Can I ask what level of class this was? If you say it was a postgraduate course at MIT, I may gather the last sane members of the human race and move to Pluto.
Postgraduate course at a university that's not Ivy League caliber but reasonably well-respected. In contrast to ahem some of the comments below, these people are all quite smart, some consistently better able to understand difficult concepts than I and a few having good original published research. This sort of rationality stuff is just a different skill that some smart people just don't have aptitude in.
How about wording this differently? Not the "last sane members of the human race." But the "first sane members of the human race."
A year back, I encountered this kind of test: binary multiple choice, one point for a right answer, minus half a point for a wrong answer, zero points for no answer. (Multiple-choice exams of any kind are very rare in Finnish universities, so that's pretty much the only time in my life when I've been faced with a test like that.) Looking at the scoring, I came to the same conclusion as you: my expected score would be higher if I'd just try guessing each of the questions I wasn't sure on.
I didn't follow my own advice. I now wish I had, as I failed that exam. I was under a pretty heavy workload at the time, so I never ended up retaking it. I suspect I'd have passed if I'd just shut up and multiplied.
Why didn't I follow my own advice? I did have some kind of a conscious reason, but in retrospect it seems so flimsy that I have difficulty even formulating it here. It went something along the lines of "I might as well take all the questions I have absolutely no clue on and mark them all as 'true', which gives me a 50-50 chance to be right on each one assuming there are as many true as there are false questions. But what if the lecturer, foreseeing that somebody would reason this wa...
Ooh, this is interesting. Eliezer says he hopes this wasn't at MIT or somewhere, and now people are remembering the MIT reference and assuming I go to MIT. Reminds me of that bias where you try to debunk a rumor, and all people can remember is that they heard someone talking about the rumor somewhere and believe it more. What's that called? There was an OB article on it somewhere, I think.
I should hire Eliezer to come by and make offhanded MIT references during my job interviews.
I suspect that your friends were simply trying to rationalize their previous behavior or avoid admitting they were wrong. I'll bet more of them would have been sympathetic to your arguments if they'd been presented before they'd ever taken a test of that type. In fact, I'll bet a few of them would find your arguments so obvious as to barely be worth mentioning if presented in this context. (E.g. if you'd posed it as a brainteaser to your friends: "on a test of this type, do you increase your expected score by guessing or marking don't know", I'll bet some of them would have said "Guess. That's obvious.")
According to my interpretation, the only reason your outcome was superior was because you made the discovery early on under your own steam. To measure whether you would be better at admitting you were wrong than your friends, we would have to give you a test where you actually had to admit you were wrong.
Anyway, guessing does increase the variance in your answer. So maybe a more complete argument where you asked your friends how many questions they expected to know and then gave them odds for getting each of "no pass", "pass", "honors" and "high honors" using guess and no-guess strategies would have been more effective.
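Something like the following could produce those odds. It is only a sketch under loud assumptions: the student gets every question they "know" right, everything else is a pure coin flip, and the test is the 150-question one from the post (+1 right, -1/2 wrong, pass above 50%, honors above 60%, high honors above 70%). All names here are invented for illustration.

```python
from math import comb

N_QUESTIONS = 150
# Strict lower bounds in points: >70%, >60%, >50% of 150.
BANDS = [("high honors", 105), ("honors", 90), ("pass", 75)]


def band(score):
    """Grade band for a raw point total."""
    for name, cutoff in BANDS:
        if score > cutoff:
            return name
    return "no pass"


def band_odds(known, guess):
    """Probability of each band when `known` questions are answered
    correctly and the rest are either left as "don't know" or guessed."""
    unknown = N_QUESTIONS - known
    odds = {"no pass": 0.0, "pass": 0.0, "honors": 0.0, "high honors": 0.0}
    if not guess:
        odds[band(known)] = 1.0
        return odds
    for right in range(unknown + 1):
        prob = comb(unknown, right) / 2 ** unknown
        score = known + right - 0.5 * (unknown - right)
        odds[band(score)] += prob
    return odds


for strategy in (False, True):
    print("guess" if strategy else "no guess", band_odds(80, strategy))
```

With 80 known answers, not guessing locks in a plain pass, while guessing trades a very small chance of dropping below the pass line for a large chance of honors.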
Taking tests is one place where I've noticed focus on rationality can give you a boost. I had two classes with a very similar format - each semester had two exams and a final. Each exam had several multi-part questions that got progressively more difficult. The average score was expected to be about 50%, and not everyone was expected to finish any of the exams. Grade in the class was based on ranking - for example, the person with the highest cumulative score got an A.
A classmate and I realized that we could use the bias other students had for wanting to focus on the easy points to our advantage. That is, the later questions were harder, but they gave much more bang for the buck. It was kind of painful to leave questions you knew you could answer easily blank (which is where overcoming the bias comes in), but it was most certainly worth it when we got the top ranks.
This... absolutely sickens me. It's bad enough when I hear my family members argue morals/politics/economics that they subscribe to for proximate lifestyle purposes - but when University students pull this, and then ignore the eminent Yvain when he counsels them otherwise?
My only comforts are the harsh cold truth of schadenfreude, that such beings don't deserve an extra 5%, and that at least I only wasted three years and $20 000 at post-secondary.*
*(My degree was non-technical; Humanities students who don't want a PhD should drop out in second year, spend a year reading, and then lie on their resume.)
P.S. Excellent break down of the reasoning process, Yvain. I think you hit the nail on the head.
The students infer a social rule penalising guessing. In almost all cases exploiting a technicality that works around a social rule is penalised. Nobody likes munchkins.
I would expect to observe a tendency for people to rationalise reasons to not guess even if loss aversion was controlled for.
The idea of doing significantly worse than chance by guessing on the test sounds absurd, but I recall that when I was on Chemistry Team in high school, many of the teams we competed against managed it, with teams of four getting average scores of below 20% on four-option multiple choice tests.
If the students were really guessing at random, they ought to expect to do better, but answering questions to the best of their limited knowledge may cause them to do significantly worse than chance if the questions are designed to trip up people with common misunderstandings or gaps in their knowledge of the subject.
I think you got your math wrong
If you get 20 out of 30 questions wrong, you break even; therefore the probability of losing points by guessing is
$\sum_{i=21}^{30} \binom{30}{i} / 2^{30} \approx 2.14\% > 1\%$
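The figure is easy to check by brute force. A minimal sketch, assuming thirty independent 50/50 guesses and the +1 / -1/2 rule from the post (the function name is made up):

```python
from math import comb


def prob_net_loss(n_guesses=30, penalty=0.5):
    """Probability that pure coin-flip guessing on n_guesses questions
    ends with a negative total (+1 per right answer, -penalty per wrong)."""
    losing_outcomes = 0
    for wrong in range(n_guesses + 1):
        right = n_guesses - wrong
        if right - penalty * wrong < 0:
            losing_outcomes += comb(n_guesses, wrong)
    return losing_outcomes / 2 ** n_guesses


print(f"{prob_net_loss():.2%}")  # about 2.14%
```

Exactly 20 wrong out of 30 nets zero, so only 21 or more wrong answers count as a loss.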
When I took my high school's AP Calculus classes these last two years, the teacher pointed out that since, on average, guessing would give the same result as leaving questions blank, you might as well guess. As far as I know, nobody disagreed with him.
(Actually, he said it's better to guess, because leaving a question blank means running the risk of accidentally putting the next question's answer in the wrong place--which, in one case, led to a student answering practically every question in one section wrongly. But that's relatively impertinent.)
For an exam where what matters is your grade relative to other test-takers, like the SAT, probably yes, but on an exam with a hard pass/fail threshold, the utility function is discontinuous (and therefore non-linear) around the threshold, so guessing might make a difference.
It sounds like your fellow students understood the concept of a guessing penalty, but did not realise that the guessing penalty was too low in this case. One approach to convince them might have been:
Assume you get -0.0001 points for guessing an incorrect answer. Obviously, you should answer every question, because the penalty for guessing is so low. Now, assume that the guessing penalty is -20 points. Again, you obviously shouldn't guess. What would the penalty have to be where you're indifferent between guessing and not guessing? Obviously, when the pena...
Possibly part of the loss aversion is the desire not to look foolish. I mean, if the teacher is reviewing your exam results, and he sees you answered 100 questions correctly and said "don't know" for the rest, then you look like a pretty smart and modest guy compared to the schmuck who answered 125 questions correctly and 25 questions incorrectly.
Probably in our evolutionary history, if you looked foolish it was bad news.
But anyway, I am mainly posting in this thread to state that as an attorney I can attest that loss aversion is a big issue in ...
Suggested intervention if anyone finds themselves in this situation in the future: distribute copies of Ender's Game and/or Methods. Subjectively it feels to me like identifying with Ender / MoR!Harry has made me better at noticing and more motivated to take advantage of these kinds of optimization opportunities.
Regarding penalizing guessing, if you're going to penalize it you might as well go all the way. My high school math club once hosted a competition which included a round with a ridiculous guessing penalty (free response, 1 point for a correct answer, 0 for a blank, and -3 or -5 for an incorrect answer). Exactly one person out of a hundred-ish got a positive score.
This is incredible. As others have said the most likely explanation is that people could see the system was intended to dis-incentivise guessing and that this design intent shaped the way they saw the test.
Now I want an exam on "logic and probability" which uses this system. The surprise being that the grading system indicated is in fact a lie. You fail if you put a single "don't know". Otherwise you pass with 100%.
(A teacher of mine at school once set us a reading test. It had a big line at the top saying "read this entire test before starting". The...
This reminds me of a class I had as an undergraduate. To avoid taking another class with a lab, I took Ethnobotany to finish out my general science requirements. The tests were multiple choice with conjunctive answers. For example:
The parts of a flower include: a) Petal b) Seed c) Stamen d) Sepal
To which the correct answer is a, c, and d. It was computer scored such that the only correct answer was to bubble a, c, and d: bubbling a and c got you no points, nor did bubbling a, b, c, and d, for example.
Given this test format, I put the Conjuncti...
When I came across a question where I had no idea if one of the options was part of the answer, I never included it since including it would have made the resulting answer less likely to be true than the answer would have been by leaving it off.
That does not make sense. If the option you had no idea about were indeed part of the answer, then leaving it out would cause your answer to be incorrect. The choice is between answer like "a and b and c and d" or "a and b and c and not d". The Conjunction Fallacy would involve comparing these to the answer "a and b and c", which, while more likely than the previous choices because it dominates them, is not an admissible answer to the test as you described it.
What may have made this strategy beneficial is that you are more likely to recognize options that actually apply to the question, since you had been studying the subject matter, so options that you had no idea about were likely unrelated to the question.
The basic mistake seems to be loss aversion, the tendency to regret losses more than one values gains.
I tend to think of loss aversion as a preference rather than a mental error, but I agree that it probably explains a lot of the "don't know" answers. What I cannot figure out is why the test designers want an individual's level of loss aversion to affect their score.
Another possible (and probably over-charitable) explanation for the lack of guessing is that students are afraid of being drawn to the wrong answer. For example, I've heard that ...
I would say it basically comes down to the fact that abstract rationality is slow and requires lots of processing power. For the same reasons we can usually only mentally afford to employ a certain limited set of fairly abstracted terms, and can only follow the implications of this to a limited degree. If we were all Kryptonians it would probably be pretty functionally rational to stay in 'far mode' all the time, but as the squishy, dumb bugs we are a lot of our functional capacity derives from various habitual and patterned behaviour. Far mode mostly s...
College admissions and financial aid generally are gold mines of examples of non-optimizing behavior with large consequences. I fairly regularly see people losing tens of thousands of dollars or more through failure to spend a few hours doing relevant research.
FYI, the average student can expect to increase their score by about 8% by guessing, assuming the "non-guesses" were right 80% of the time. An enormous effect.
Maybe (if your goal was to get them to score higher) you could have pitched it as gambling for entertainment, i.e. record which answers you guessed on, and later compare who was luckiest in getting the biggest portion of those correct.
Most apathetic students have no qualms with guessing. It sounds like these peers of yours are either extremely diligent and motivated by fear, or unwilling to lose face by admitting that you're more clever than they.
I suggest that the students aren't as irrational as they appear. After all, why would the designer of the test incorporate a "don't know" option and a penalty for wrong answers, except to discourage guessing on questions that you're clueless on? And if I were a random student (instead of someone especially interested in the mathematics of decision theory), why should I take the trouble to second guess the test designer, instead of assuming that (with high probability) he is rational and competent at his job?
ETA: Also, you're supposed to maximize ...
Related to: Extreme Rationality: It's Not That Great
A while back, I said provocatively that the rarefied sorts of rationality we study at Less Wrong hadn't helped me in my everyday life and probably hadn't helped you either. I got a lot of controversy but not a whole lot of good clear examples of getting some use out of rationality.
Today I can share one such example.
Consider a set of final examinations based around tests with the following characteristics:
* Each test has one hundred fifty true-or-false questions.
* The test is taken on a scan-tron which allows answers of "true", "false", and "don't know".
* Students get one point for each correct answer, zero points for each "don't know", and minus one half point for each incorrect answer.
* A score of >50% is "pass", >60% is "honors", >70% is "high honors".
* The questions are correspondingly difficult, so that even a very intelligent student is not expected to get much above 70. All students are expected to encounter at least a few dozen questions which they can answer only with very low confidence, or which they can't answer at all.
At what confidence level do you guess? At what confidence level do you answer "don't know"?
I took several of these tests last month, and the first thing I did was some quick mental calculations. If I have zero knowledge of a question, my expected gain from answering is 50% probability of earning one point and 50% probability of losing one half point. Therefore, my expected gain from answering a question is .5(1)-.5(.5)= +.25 points. Compare this to an expected gain of zero from not answering the question at all. Therefore, I ought to guess on every question, even if I have zero knowledge. If I have some inkling, well, that's even better.
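The same calculation as a small sketch, generalized slightly so the break-even confidence falls out. The function names and the `reward`/`penalty` parameters are just illustrative labels for the one point gained and half point lost.

```python
def expected_gain(p_correct, reward=1.0, penalty=0.5):
    """Expected points from answering, given probability p_correct of
    being right; answering "don't know" is worth exactly 0."""
    return p_correct * reward - (1 - p_correct) * penalty


def break_even_confidence(reward=1.0, penalty=0.5):
    """Confidence at which answering and "don't know" are worth the same."""
    return penalty / (reward + penalty)


print(expected_gain(0.5))        # 0.25, the zero-knowledge case
print(break_even_confidence())   # one third
```

Since the break-even point is one third and a true-or-false guess is never worse than a coin flip, there is no confidence level at which "don't know" wins on expected score.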
You look disappointed. This isn't a very exciting application of arcane Less Wrong knowledge. Anyone with basic math skills should be able to calculate that out, right?
I attend a pretty good university, and I'm in a postgraduate class where most of us have at least a bachelor's degree in a hard science, and a few have master's degrees. And yet, talking to my classmates in the cafeteria after the first test was finished, I started to realize I was the only person in the class who hadn't answered "don't know" to any questions.
I have several friends in the class who had helped me with difficult problems earlier in the year, so I figured the least I could do for them was to point out that they could get several free points on the exam by guessing instead of putting "don't know". I got a chance to talk to a few people between tests, and I explained the argument to them using exactly the calculation I gave above. My memory's not perfect, but I think I tried it with about five friends.
Not one of them was convinced. I see that while I've been off studying and such, you've been talking about macros of absolute denial and such, and while I'm not sure I like the term, this almost felt like coming up against a macro of absolute denial.
I had people tell me there must be some flaw in my math. I had people tell me that math doesn't always map to the real world. I had people tell me that no, I didn't understand, they really didn't have any idea of the answer to that one question. I had people tell me they were so baffled by the test that they expected to consistently get significantly more than fifty percent of the (true or false!) questions they guessed on wrong. I had people tell me that although yes, on average they would do better, there was always the possibility that by chance alone they would get all thirty of the questions they guessed on wrong and end up at a huge disadvantage [1].
I didn't change a single person's mind. The next test, my friends answered just as many "don't know"s as the last one.
This floored me, because it's not one of those problems about politics or religion where people have little incentive to act rationally. These tests were the main component of the yearly grade in a very high-pressure course. My friend who put down thirty "don't know"s could easily have increased his grade in the class 5% by listening to me, maybe even moved up a whole letter grade. Nope. Didn't happen. So here's my theory.
The basic mistake seems to be loss aversion [2], the tendency to regret losses more than one values gains. This could be compounded by students' tendency to discuss answers after the test: I remember each time I heard that one of my guesses had been wrong and I'd lost points, it was a deep psychic blow. No doubt my classmates tended to remember the guesses they'd gotten wrong more than the ones they'd gotten right, leading to the otherwise inexplicable statement that they expect to get more than half of their guesses wrong. But this mistake should disappear once the correct math is explained. Why doesn't it?
In The Terrible...Truth About Morality, Roko gives a good example of the way our emotional and rational minds interact. A person starts with an emotion (in that case, a feeling of disgust about incest) and only later comes up with some reason why that emotion is the objectively correct emotion to have and why their action of condemning the relationship is rationally justified.
My final exam, thanks to loss aversion, created an emotional inclination against guessing, which most of the students taking it followed. When confronted with an argument against it, my friends tried to come up with reasons why the course they took was logical - reasons which I found very unconvincing.
It's really this last part which was so perfect I couldn't resist posting about it. One of my close friends (let's call him Larry) finally admitted, after much pestering on my part, that guessing would increase his score. But, he said, he still wasn't going to guess, because he had a moral objection to doing so. Tests were supposed to measure how much we knew, not how lucky we were, and if he really didn't know the answer, he wanted that ignorance to be reflected in his final score.
A few years ago, I would have respected that strong commitment to principle. Today, jaded as I am, I waited until the last day of exams, when our test was a slightly different format. Instead of being true-false, it was multiple-choice: choose one of eight. And there was no penalty for guessing; indeed, there wasn't even a "don't know" on the answer sheet, although you could still leave it blank if you really wanted.
"So," I asked Larry afterwards, "did you guess on any of the questions?"
"Yeah, there were quite a few I didn't know," he answered.
When I reminded him about his moral commitment, he said something about how this was different because there were more answers available so it wasn't really the same as guessing on a fifty-fifty question. At the risk of impugning my friend's subconscious motives, I think he no longer had to use moral ideals to rationalize away his fear of losing points, so he did the smart thing and guessed.
Footnotes
1: If I understand the math right, then if you guess on thirty questions using my test's scoring rule, the probability of ending up with a net penalty from guessing is less than one percent [EDIT: Actually just over two percent, thank you ArthurB]. If, after finishing all the questions of which they were "certain", a person felt confident that they were right over the cusp of a passing grade, assigned very high importance to passing, and assigned almost no importance to any increase in grade past the passing point, then it might be rational not to guess, to avoid that small chance of failure. In reality, no one could calculate their grade out this precisely.
2: Looking to see if anyone else had been thinking along the same lines [3], I found a very interesting paper describing some work of Kahneman and Tversky on this issue, and proposing a scoring rule that takes loss aversion into account. Although I didn't go through all of the math, the most interesting number in there seems to be that on a true/false test that penalizes wrong answers at the same rate it rewards correct answers (unlike my test, which rewarded guessing), a person with the empirically determined level of human loss aversion will (if I understand the stats right) need to be ~79% sure before choosing to answer (as opposed to the utility maximizing level of >50%). This also linked me to prospect theory, which is interesting.
3: I'm surprised that test-preparation companies haven't picked up on this. Training people to understand calibration and loss aversion could be very helpful on standardized tests like the SATs. I've never taken a Kaplan or Princeton Review course, but those who have tell me this topic isn't covered. I'd be surprised if the people involved didn't know the science, so maybe they just don't know of a reliable way to teach such things?