All of J.D.'s Comments + Replies

J.D.*10

[....is at the expense of rewarding knowledge of the correct answer.]

Hmm... I'm not sure that Alice really has more knowledge than Bob in your example.

[EDIT: In fact, in your example, for the quadratic scoring rule, the score of 50:50:$\epsilon$:$\epsilon$ is better than the score of 40:20:20:20 since $12/25 < 1/2 + 2\epsilon^2$, so we can indeed say that Alice has more knowledge than Bob according to this rule. The following example is, IMHO, more interesting. /EDIT]

Let me propose another perspective with the following two answers for propositions...

2Bucky
The score for the 50:50:0:0 student is: $1 - 0.5^2 - 0.5^2 - 0^2 - 0^2 = 0.5$. The score for the 40:20:20:20 student is: $1 - 0.6^2 - 0.2^2 - 0.2^2 - 0.2^2 = 0.52$. I think the way you've done it is Brier's rule, which is (1 - the score from the OP). In Brier's rule the lower value is better.
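For anyone who wants to check the arithmetic, here is a minimal sketch of both conventions. It assumes A (index 0) is the correct answer; the function names `op_score` and `brier_score` are mine, not from the OP.

```python
def op_score(probs, correct=0):
    """Quadratic score as written in the OP: 1 minus the summed squared error.

    Higher is better. `correct` is the index of the right answer
    (assumed to be A, i.e. index 0, for this illustration).
    """
    return 1 - sum(
        (p - (1.0 if i == correct else 0.0)) ** 2 for i, p in enumerate(probs)
    )


def brier_score(probs, correct=0):
    """Brier convention: the summed squared error itself. Lower is better."""
    return 1 - op_score(probs, correct)


student_5050 = [0.5, 0.5, 0.0, 0.0]
student_40 = [0.4, 0.2, 0.2, 0.2]

for name, probs in [("50:50:0:0", student_5050), ("40:20:20:20", student_40)]:
    print(name, round(op_score(probs), 4), round(brier_score(probs), 4))
```

Under the OP's convention 40:20:20:20 scores 0.52 against 0.5; under the Brier convention the same answers score 0.48 against 0.5, so the ordering only looks reversed if the two conventions are mixed.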
4gjm
Sure, 2 knows something 1 doesn't; e.g., 2 knows more about how unlikely B is. But, equally, 1 knows something 2 doesn't; e.g., 1 knows more than 2 about how unlikely C is. In the absence of any reason to think one of these is more important than the other, it seems reasonable to think that different probability assignments among the various wrong answers are equally meritorious and should result in equal scores.

...

Having said that, here's an argument (which I'm not sure I believe) for favouring more-balanced probability assignments to the wrong answers. We never really know that the right answer is 100:0:0:0. We could, conceivably, be wrong. And, by hypothesis, we don't know of any relevant differences between the "wrong" answers. So we should see all the wrong answers as equally improbable but not quite certainly wrong. And if, deep down, we believe in something like the log scoring rule, then we should notice that a candidate who assigns a super-low probability to one of those "wrong" answers is going to do super-badly in the very unlikely case that it's actually right after all.

So, suppose we believe in the log scoring rule, and we think the correct answer is the first one. But we admit a tiny probability $h$ for each of the others being right. Then a candidate who gives probabilities $a, b, c, d$ has an expected score of $(1-3h) \log a + h (\log b + \log c + \log d)$. Suppose one candidate says 0.49, 0.49, 0.01, 0.01 and the other says 0.4, 0.2, 0.2, 0.2; then we will prefer the second over the first if $h$ is bigger than about 0.0356. In a typical educational context that's unlikely, so we should prefer the first candidate.

Now suppose one says 0.49, 0.49, 0.01, 0.01 and the other says 0.49, 0.25, 0.25, 0.01; we should always prefer the second candidate.

None of this means that the Brier score is the right way to prefer the second candidate over the first; it clearly isn't, and if $h$ is small enough then of course the correction to the naive log score is also very small, provided c
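The 0.0356 threshold can be reproduced with a short sketch. The helper names `expected_log_score` and `crossover_h` are hypothetical, and natural logs are assumed; the crossover value does not depend on the log base, because the base rescales both candidates' scores equally.

```python
import math


def expected_log_score(probs, h):
    """Expected log score (1-3h) log a + h (log b + log c + log d).

    The first answer is taken to be correct with probability 1 - 3h and
    each of the other three with probability h.
    """
    a, b, c, d = probs
    return (1 - 3 * h) * math.log(a) + h * (math.log(b) + math.log(c) + math.log(d))


def crossover_h(first, second):
    """Value of h at which the two candidates tie.

    The expected score is linear in h, so two evaluations per candidate
    (at h = 0 and h = 1) are enough to find the intersection.
    """
    e0_first = expected_log_score(first, 0.0)
    e0_second = expected_log_score(second, 0.0)
    slope_first = expected_log_score(first, 1.0) - e0_first
    slope_second = expected_log_score(second, 1.0) - e0_second
    return (e0_first - e0_second) / (slope_second - slope_first)


sharp = [0.49, 0.49, 0.01, 0.01]
flat = [0.4, 0.2, 0.2, 0.2]
spread = [0.49, 0.25, 0.25, 0.01]

print(round(crossover_h(sharp, flat), 4))  # about 0.0356

# For any h > 0 the spread candidate beats the sharp one.
for h in (0.001, 0.01, 0.1):
    print(h, expected_log_score(spread, h) > expected_log_score(sharp, h))
```

The last loop illustrates the "always prefer the second candidate" claim: both candidates give 0.49 to the first answer, so they tie at $h = 0$ and differ only in the $h$ term, which favours 0.49, 0.25, 0.25, 0.01.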
J.D.20

[... why do they score more?]

I'm not sure if these are good reasons, but it seems to me that

1) The expected answer to the quiz does not just consist in identifying A as a correct answer but also in identifying the others as incorrect answers. I mean that the expected right answer is 100:0:0:0 (and not, for example, 100:50:0:0 or whatever else).

2) Giving 25:25 for B:C is better than giving 50:0 even if answer C is 0 since 25:25 is closer to 0:0 than 50:0 (for the usual Euclidean distance). In this perspective, a better answer for the 50:50:0:0's ...
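A quick check of the distance claim in 2), looking only at the B:C coordinates and the target 0:0 (the variable names are just for illustration):

```python
import math

# Euclidean distance from the B:C part of each answer to the target 0:0.
balanced = math.hypot(0.25, 0.25)  # B:C = 25:25
lopsided = math.hypot(0.50, 0.00)  # B:C = 50:0

print(round(balanced, 4), round(lopsided, 4))
print(balanced < lopsided)
```

The 25:25 split sits at distance of about 0.354 from 0:0, against 0.5 for 50:0.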

3Bucky
Maybe 1) is where I have a fundamental difference. Given evidence A, a Bayesian update considers how well evidence A was predicted. There is no additional update due to how well ¬A being false was predicted. Even if ¬A is split into sub-categories, it isn't relevant, as that evidence has already been taken into account when we updated based on A being true.

Re 2): 50:25:0:0 gives a worse expected value than 50:50:0:0: although my score increases if A is true, it decreases by more if B is true (assuming 50:50:0:0 is my true belief).

Re 3): I think it's important to note that I'm assuming that exactly one of A, B, C or D is the correct answer. Therefore the probabilities should add up to 100% to maximise your expected score (otherwise it isn't a proper scoring rule).
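The point about 2) can be illustrated with a small sketch. It assumes the quadratic score in the OP's convention (1 minus the summed squared error, higher is better) and a true belief of 50:50:0:0; the function names are hypothetical.

```python
def quadratic_score(report, correct):
    """1 minus the summed squared error against the one-hot correct answer."""
    return 1 - sum(
        (p - (1.0 if i == correct else 0.0)) ** 2 for i, p in enumerate(report)
    )


def expected_score(report, belief):
    """Expected quadratic score of `report` when answer i is correct with probability belief[i]."""
    return sum(b * quadratic_score(report, i) for i, b in enumerate(belief))


belief = [0.5, 0.5, 0.0, 0.0]
shaded = [0.5, 0.25, 0.0, 0.0]

print(quadratic_score(belief, 0), quadratic_score(shaded, 0))  # if A is true
print(quadratic_score(belief, 1), quadratic_score(shaded, 1))  # if B is true
print(expected_score(belief, belief), expected_score(shaded, belief))
```

Reporting 50:25:0:0 gains 0.1875 if A is true but loses 0.3125 if B is true, so the expected score drops from 0.5 to 0.4375.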