lukeprog comments on Calibration Test with database of 150,000+ questions - Less Wrong

37 Post author: Nanashi 14 March 2015 11:22AM

Comment author: lukeprog 13 March 2015 05:39:19PM * 8 points

Awesome!

I've been dying for something like this after I zoomed through all the questions in the CFAR calibration app.

Notes so far:
* The highest-available confidence is 99%, so the lowest-available confidence should be 1% rather than 0%. Or even better, you could add 99.9% and 0.1% as additional options.
* So far I've come across one question that was blank. It just said Category: jewelry and then had no other text. Somehow the answer was Ernest Hemingway.
* Would be great to be able to sign up for an account so I could track my calibration across multiple sessions.

Comment author: Nanashi 13 March 2015 06:09:41PM 1 point

Re: 0%, that's fair. Originally I included 0% because certain questions are unanswerable (blank, contextless, or whatnot), but even then there's still a non-zero chance of guessing the right answer out of a near-infinite number of choices.

Re: Calibration across multiple sessions. Good idea. I'll start with a locally stored solution, since that would be easiest, and eventually move to an account-based one.

Re: Blank questions. Yeah, I should probably include some kind of check to see if the question is blank and skip it if so.
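A minimal sketch of such a check, assuming each question is a dictionary with hypothetical `category`, `text`, and `answer` fields (the app's real data format is not described in this thread):

```python
def is_blank(question):
    """Return True if the question has no usable text.

    The "text" key is a hypothetical field name used for illustration.
    """
    text = question.get("text")
    return text is None or not text.strip()


def next_answerable(questions):
    """Return the first question that is not blank, or None if there is none."""
    for question in questions:
        if not is_blank(question):
            return question
    return None


# Illustrative data modelled on the blank "jewelry" question reported above.
sample = [
    {"category": "jewelry", "text": "", "answer": "Ernest Hemingway"},
    {"category": "example", "text": "A placeholder question?", "answer": "A placeholder answer"},
]
picked = next_answerable(sample)
```

Here `picked` is the second entry, since the first has an empty `text` field. A check like this would not catch the loading glitch described elsewhere in the thread, where a question appeared blank but showed the previous question's answer.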

Comment author: lukeprog 13 March 2015 09:57:50PM 2 points

Thanks! BTW, I'd prefer to have 1% and 0.1% and 99% and 99.9% as options, rather than skipping over the 1% and 99% options as you have it now.

Comment author: Nanashi 14 March 2015 11:38:13AM 1 point

I considered that, but I think at least for now it may just overcomplicate things for not a ton of benefit. Subjectively, it seems that out of 100 questions there are maybe 10 to which I would assign the highest possible confidence. Of those, I'd say only 1 would be a question where I'd pick 99% confidence if it were available instead of, say, 99.9%.

So assuming (incorrectly) that I'm perfectly calibrated, it would take about 7,000 questions to stand a >50% chance of seeing a meaningful difference between the two confidence levels.
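One way to arrive at a figure of that size, under an assumed reading of "meaningful difference" as "at least one wrong answer among the questions answered at 99%": a perfectly calibrated person misses 1% of those, and only about 1 question in 100 would get a 99% answer. The function name and parameters below are illustrative, not taken from the app.

```python
import math


def questions_needed(error_rate, one_in, target_prob=0.5):
    """Estimate how many questions are needed to probably see one miss.

    error_rate:  chance of being wrong at this confidence level (0.01 for 99%).
    one_in:      one question out of this many is answered at this level.
    target_prob: the chance of seeing at least one miss must exceed this.

    Returns (answers needed at this level, total questions needed).
    """
    # P(at least one miss in n answers) = 1 - (1 - error_rate) ** n.
    # Solve for the smallest integer n that pushes this above target_prob.
    ratio = math.log(1 - target_prob) / math.log(1 - error_rate)
    n = math.floor(ratio) + 1
    return n, n * one_in


at_level, total = questions_needed(error_rate=0.01, one_in=100)
print(at_level, total)  # 69 6900
```

Under these assumptions the answer is 69 questions at the 99% level, or about 6,900 questions overall, which is consistent with the rough figure of 7,000.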

Comment author: RowanE 13 March 2015 05:44:46PM -1 points

It's possible to be, to some extent, certain that you haven't thought of a correct answer (if not certain you don't know the answer), because you don't have any answer in mind and yet are not considering the answer "this is a trick question" or "there is no correct answer". Is this something that should be represented, making "0%" correct to include, or am I confused?

I got one blank question, which I think was an error with loading since the answer came up the same as the previous question, and the one after it took a couple seconds to appear on-screen.

Comment author: lukeprog 13 March 2015 05:59:17PM * 0 points

I'd prefer not to allow 0 and 1 as available credences. But if 0 remained as an option I would just interpret it as "very close to 0" and then keep using the app, though if a future version of the app showed me my Bayes score then the difference between what the app allows me to choose (0%) and what I'm interpreting 0 to mean ("very close to 0") could matter.
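To illustrate why the gap could matter, here is a minimal sketch that assumes "Bayes score" means the logarithmic scoring rule, i.e. the log of the probability assigned to what actually happened. The function name is illustrative; the app is not known to compute this.

```python
import math


def log_score(credence, answer_was_correct):
    """Logarithmic score for one question (0 is best, more negative is worse).

    credence is the stated probability that the answer is correct.
    """
    # Probability that was assigned to the outcome that actually occurred.
    p = credence if answer_was_correct else 1.0 - credence
    if p == 0.0:
        # math.log(0.0) raises ValueError, so the limit is returned explicitly.
        return float("-inf")
    return math.log(p)


literal_zero = log_score(0.0, True)       # chose 0%, answer turned out right
close_to_zero = log_score(0.001, True)    # meant "very close to 0", e.g. 0.1%
print(literal_zero, round(close_to_zero, 2))  # -inf -6.91
```

Under this rule, a single correct answer given at a literal 0% produces an unboundedly bad score, while the same answer given at 0.1% costs a large but finite amount.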

Comment author: owencb 13 March 2015 07:56:56PM 4 points

I think it's misleading to just drop in the statement that 0 and 1 are not probabilities.

There is a reasonable and arguably better definition of probabilities which excludes them, but it's not the standard one, and it also has costs -- for example probabilities are a useful tool in building models, and it is sometimes useful to use probabilities 0 and 1 in models.

(aside: it works as a kind of 'clickbait' in the original article title, and Eliezer doesn't actually make such a controversial statement in the post, so I'm not complaining about that)

Comment author: lukeprog 13 March 2015 09:55:55PM 1 point

Fair enough. I've edited my original comment.

(For posterity: the text for my original comment's first hyperlink originally read "0 and 1 are not probabilities".)

Comment author: owencb 13 March 2015 10:16:04PM 0 points

Perfect, thanks!