TobyBartels comments on 2014 Less Wrong Census/Survey - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (724)
I'd be much more comfortable answering the probability sections if I knew what epsilon is. I usually say 0% when the value is less than 0.5% and 100% when the value is greater than 99.5%, rounding to the nearest whole percentage, on the grounds that the whole point of using percentages is to avoid explicit fractions (common or decimal). But then you ruin this by explicitly mentioning 0.5% and 99.99% as possible answers. If you had put a hard limit on the number of digits allowed, then I could have used that. In the end, since I saw no consistent guidance, I fell back on my usual practice. The result is that I had a lot of 0s and 100s; hopefully that won't mess up your algorithms.
ETA: It is probably relevant here that I am a naturally lazy person.
I think it might have been better to ask people to estimate the odds that a given statement is true. If the probability of a statement is close to zero or close to one, odds give us better precision without having to worry about digits after the decimal point (however, if the probability is close to one half, it is probably better to ask for a probability). Although it is easy to convert odds to probabilities, how many people in this survey actually took the mental effort to calculate the odds first and only then express them as probabilities? I might be wrong, but I guess that only a minority did. An idea for next year's survey: it might be interesting to compare the answers of two groups, one of which would be asked to estimate probabilities and the other to estimate odds.
Yes, odds are good (and log-odds are even better), but people are bad at both dealing with very large absolute values and dealing with very fine precisions. I think that the survey is correct to put in a cut-off (whether an ϵ for probabilities, an N for log-odds, or one of each for odds); it should just tell us where. (Edit: put in stuff about log-odds properly.)
Are you using "odds" to refer to percentages and "probabilities" to refer to fractions? I don't think there is actually any difference in meaning between the two terms.
Colloquial language doesn't make this distinction, but by technical convention, they are different.
Specifically, ‘odds’ refers to expressions like ‘5 to 3 against’; numerically, that's the fraction 5/3, or rather (because of the ‘against’) its reciprocal, 3/5. Thus odds run from 0 (impossible) to infinity (certain), with odds of 1 being perfectly balanced between Yes and No. In contrast, probabilities run only from 0 to 1. An event with odds of 5 to 3 against, or equivalently odds of 3/5, has a probability of 3/(3+5) = 3/8. So the numbers are different. The conversion formulas are O = P/(1 − P) and P = O/(1 + O).
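For anyone who wants to check the arithmetic, here is a minimal Python sketch of the two conversion formulas, applied to the '5 to 3 against' example. The function names are made up for illustration; exact fractions are used so that the result comes out as 3/8 rather than a rounded decimal.

```python
from fractions import Fraction


def odds_from_probability(p):
    """O = P / (1 - P). Undefined at P = 1, where the odds are infinite."""
    return p / (1 - p)


def probability_from_odds(o):
    """P = O / (1 + O)."""
    return o / (1 + o)


# '5 to 3 against' is odds of 3/5 in favour.
odds = Fraction(3, 5)
p = probability_from_odds(odds)
print(p)                          # 3/8
print(odds_from_probability(p))   # 3/5
```

The two functions are inverses of each other everywhere except at a probability of exactly 1, which has no finite odds.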
Then there are log-odds; this is log₂ O bits. (You can also use other bases than 2 and correspondingly other units than bits.) Now 0 indicates perfect balance between Yes and No; a positive number means more likely Yes than No, and a negative number means less likely Yes than No. Log-odds run from negative infinity (impossible) to infinity (certain).
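A small Python sketch of the log-odds conversion, in bits (base 2), may make this concrete. The function names are hypothetical, and floating point is used here, so the results are approximate.

```python
import math


def log_odds_bits(p):
    """Log-odds in bits: log2(P / (1 - P)). Requires 0 < P < 1."""
    return math.log2(p / (1 - p))


def probability_from_log_odds_bits(bits):
    """Inverse: recover the odds O = 2**bits, then P = O / (1 + O)."""
    odds = 2.0 ** bits
    return odds / (1 + odds)


print(log_odds_bits(0.5))    # perfect balance gives 0 bits
print(log_odds_bits(0.375))  # negative: less likely Yes than No
print(log_odds_bits(0.9))    # positive: more likely Yes than No
```

Probabilities of exactly 0 or 1 have no finite log-odds, so the first function rejects them (a `ValueError` at 0, a `ZeroDivisionError` at 1) rather than returning an infinity.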
Oh right, I forgot about that definition. The main probability conversions that I was aware of involved converting between fractions and percentages, sometimes expressed instead as probabilities between 0 and 1. Theoretically, it makes sense that odds can also be converted to or from probabilities, now that I think about it. Thanks for your explanation.
Epsilon is a minuscule amount. It's vanishingly small, but it's still there.
Yes, but which minuscule amount?
To be more specific: If ϵ ≥ 5 × 10⁻ⁿ (which it must be for some n, if it is a positive real number), then I only need to figure out my probability to n + 1 digits. Upon doing so, if it's all 0s, then my probability is no more than ϵ, so I can enter 0. Otherwise, I should enter something larger. (And a similar thing holds on the other end.) Specifying ϵ serves the practical purpose of telling us how much work to put into estimating our probabilities. Since I had no guideline for that, I chose to default to ϵ = 1/2 (in percentage points), rather than try to additionally work out how small ϵ was supposed to be.
If, instead of bringing up ϵ, the survey had instructed us to use as many decimals as we need to avoid ever answering either 0 or 100, then I probably would have done more work. (There are reasons why this is bad, since the results will be increasingly unreliable, but still it could have said that.) But since I knew that at some point my work would be ignored, I didn't do any.
(Edits: minor grammar and precise phrasing of inequalities.)
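To illustrate the cut-off rule described above, here is a rough Python sketch. It assumes answers are given in percent and that anything within ϵ of either end gets entered as 0 or 100; the function name and the default of 0.5 are my own choices for illustration, not anything the survey specified.

```python
def entered_value(p_percent, epsilon=0.5):
    """Return the answer to enter, given a true estimate in percent.

    Anything below epsilon is entered as 0, anything above 100 - epsilon
    as 100, and everything else is left alone.
    """
    if p_percent < epsilon:
        return 0.0
    if p_percent > 100.0 - epsilon:
        return 100.0
    return p_percent


print(entered_value(0.3))                 # below the default cut-off: entered as 0
print(entered_value(99.7))                # above 100 - epsilon: entered as 100
print(entered_value(0.3, epsilon=0.01))   # a smaller epsilon keeps the 0.3
```

The last call shows why knowing ϵ matters: with a smaller cut-off, the same estimate has to be worked out and reported rather than collapsed to 0.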
I took epsilon to be simply 0.5, on the basis of "the survey can take decimals but I'm going to use whole numbers as suggested, so 0 means I rounded down anything less than 0.5". This is imprecise but gives me greater confidence in my answers, and (as you say), I have some tendency towards laziness.
Yes, that's what I did too (0.5%).
I don't think it will mess up the algorithms. My guess is that most people probably rounded most calibration answers to the tens place due to lack of enough confidence to be more precise, but since people are giving different values, the average across all respondents is unlikely to fall on an increment of ten, and should be a reasonably accurate measure of the respondents' collective assigned probability for a question.
It could mess them up, because in theory a single wrong answer with 100% confidence renders the entire series infinitely poorly calibrated. The survey says that this won't be done, that 100% will be treated as something slightly less than that. But how much less could depend on assumptions that the survey-makers made about how often people would answer this way, and maybe I did it too much.
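Here is a minimal Python sketch of why that happens, assuming a logarithmic scoring rule. That is an assumption on my part, since the survey does not say how it scores calibration, and the function names and ϵ values below are hypothetical.

```python
import math


def clamp(p, epsilon):
    """Pull a stated probability into the range [epsilon, 1 - epsilon]."""
    return min(max(p, epsilon), 1.0 - epsilon)


def total_penalty_bits(answers, epsilon=0.0):
    """Sum of -log2(probability assigned to what actually happened).

    `answers` is a list of (stated_probability, was_true) pairs, with
    probabilities between 0 and 1. Lower totals are better.
    """
    total = 0.0
    for stated, was_true in answers:
        p = clamp(stated, epsilon)
        p_truth = p if was_true else 1.0 - p
        if p_truth == 0.0:
            return math.inf  # one certain-but-wrong answer swamps everything
        total -= math.log2(p_truth)
    return total


# Three reasonable answers plus one wrong answer given at 100%.
answers = [(0.8, True), (0.6, True), (0.3, False), (1.0, False)]
print(total_penalty_bits(answers))                  # inf
print(total_penalty_bits(answers, epsilon=0.005))   # finite
print(total_penalty_bits(answers, epsilon=0.0001))  # finite, but larger
```

The penalty for the wrong 100% answer grows as ϵ shrinks, so the score of someone who answers 0 or 100 a lot depends heavily on which ϵ is substituted.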
I doubt it, since I'm pretty sure that they know enough about these pitfalls to avoid them. But I felt that I answered 0 and 100 quite a lot, so I thought that some warning was in order.
Even though percentages are typically used for cases where precision is less important, I'd say that in this context it would be better to err on the side of precision.