28th Nov 2011

(I wrote this post for my own blog, and given the warm reception, I figured it would also be suitable for the LW audience. It contains some nicely formatted equations/tables in LaTeX, hence I've left it as a dropbox download.)

Logarithmic probabilities have appeared previously on LW here, here, and sporadically in the comments. The first is a link to an Eliezer post which covers essentially the same material. I believe this is a better introduction/description/guide to logarithmic probabilities than anything else that's appeared on LW thus far.


Introduction:

Our conventional way of expressing probabilities has always frustrated me. For example, it is very easy to make nonsensical statements like, “110% chance of working”. Or, it is not obvious that the difference between 50% and 50.01% is trivial compared to the difference between 99.98% and 99.99%. It also fails to accommodate the math correctly when we want to say things like, “five times more likely”, because 50% * 5 overflows 100%.
Jacob and I have (re)discovered a mapping from probabilities to log-odds which addresses all of these issues. To boot, it accommodates Bayes’ theorem beautifully. For something so simple and fundamental, it certainly took a great deal of google searching/wikipedia surfing to discover that they are actually called “log-odds”, and that they were “discovered” in 1944, instead of the 1600s. Also, nobody seems to use log-odds, even though they are conceptually powerful. Thus, this primer serves to explain why we need log-odds, what they are, how to use them, and when to use them.
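As a quick taste of the mapping (a sketch in base 10, matching the updated article; the helper functions below are purely illustrative, not the article's notation):

    import math

    def to_log_odds(p):
        """Map a probability in (0, 1) to base-10 log-odds."""
        return math.log10(p / (1 - p))

    def from_log_odds(L):
        """Map base-10 log-odds back to a probability."""
        return 1 / (1 + 10 ** (-L))

    # "Five times more likely" just adds log10(5) ~ 0.7 to the log-odds,
    # so it can never overflow.
    p = 0.5
    print(from_log_odds(to_log_odds(p) + math.log10(5)))  # ~0.833, i.e. odds of 5:1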

 

Article is here (Updated 11/30 to use base 10)


Comments:

Log base ten may be more intuitive for conversion purposes. Then adding another 9 to the probability corresponds to adding 1 to the log-odds.
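For instance, a quick numerical check of that pattern:

    import math

    for p in (0.9, 0.99, 0.999, 0.9999):
        print(p, round(math.log10(p / (1 - p)), 2))
    # 0.9 -> 0.95, 0.99 -> 2.0, 0.999 -> 3.0, 0.9999 -> 4.0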

"Five times more likely" should overflow for probabilities greater than 0.2. This is because the terminology "times more likely" is usually used in the context of decision-making, so it manipulates the linear probabilities because that's what goes into the expected utility.

Yeah, I was definitely thinking about that. The mathematician in me won out in the end.

It occurs to me that a lot of people have probably thought about this, and they have alternately used base 2, base e, and base 10. Unless we get the entire LW community to standardize on one base, we won't be able to coherently communicate with one another using log-probabilities, and therefore log-probabilities will stay relegated to the dustbin.

base 2 - advantages, we can talk about N bytes' worth of evidence.

base e - mathematician's base

base 10 - the common layperson can understand it; advantages with the 9's and 0's.

Actually, I think you're right, log base 10 is probably better. If others agree, I'll rewrite the article in base 10.

base e - mathematician's base

What's the specific benefit of base e for log-odds, though? Base e has lots of special properties that make it useful in many areas of mathematics (e^x is its own derivative, de Moivre's formula, &c.), but is this one of them? (It could be; I don't know.)


To quote Jaynes, p.91 of PT:TLoS:

In many applications it is convenient to take the logarithm of the odds because of the fact that we can then add up terms. Now we could take the logarithm to any base we please, and this cost the writer some trouble. Our analytic expressions always look neater in terms of natural (base e) logarithms. But back in the 1940s and 1950s when this theory was first developed, we used base 10 logarithms because they were easier to find numerically; the four-figure tables would fit on a single page. Finding a natural logarithm was a tedious process, requiring leafing through enormous old volumes of tables.

Today, thanks to hand calculators, all such tables are obsolete and anyone can find a ten-digit natural logarithm just as easily as a base 10 logarithm. Therefore, we started happily to rewrite this section in terms of the aesthetically prettier natural logarithms. But the result taught us that there is another, even stronger, reason for using base 10 logarithms. Our minds are thoroughly conditioned to the base 10 number system, and base 10 logarithms have an immediate, clear intuitive meaning to all of us. However, we just don't know what to make of a conclusion stated in terms of natural logarithms, until it is translated back into base 10 terms. Therefore, we re-wrote this discussion, reluctantly, back into the old, ugly base 10 convention.

So to answer your question, the only advantage of base e is that "ln" looks tidier than "log10".

Apart from being more intuitively understandable to humans, using base 10 also allows us to multiply by 10 and measure evidence in the familiar unit of decibels.
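Roughly, that convention (as in Jaynes) is just 10 times the base-10 log of the odds ratio; for example:

    import math

    # 10 * log10(likelihood ratio) gives evidence in decibels.
    print(10 * math.log10(100))  # a 100:1 ratio is 20 dB of evidence
    print(10 * math.log10(2))    # doubling the odds adds ~3 dB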

The natural logarithmic unit for ratios, the neper (Np), is easy to interpret for small contributions, since near zero the derivative of exp(x) is ≈ 1 (so exp(x) ≈ 1 + x):

0.1Np = exp( 0.1) ∶ 1 ≈ 1.1 ∶ 1
-0.1Np = exp(-0.1) ∶ 1 ≈ 0.9 ∶ 1

This could make for an easy upgrade path to use of nepers or centinepers instead of percents in comparatives involving rates, which would reduce semantic confusion. "50% faster" can mean "gets 150% as far" (so .41 Np faster, or 41 cNp, or perhaps 41 Np%) or "takes 50% as much time" (so .69 Np faster, or 69 cNp, or 69 Np%). That's an argument for using nepers as a standard base outside communications of probability.
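(A quick check of those two readings:)

    import math

    # "covers 1.5x the distance" -> ln(1.5) ~ 0.41 Np
    # "takes half the time", i.e. 2x the rate -> ln(2) ~ 0.69 Np
    print(math.log(1.5))  # 0.405...
    print(math.log(2.0))  # 0.693...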

(trivia: Nepers and radians are each other turned sideways, being respectively the real and imaginary parts of eigenvalues of linear differential equation systems.)

base 2 - advantages, we can talk about N bytes' worth of evidence.

Wouldn't it be easier to talk about N bytes' worth of evidence in base 256? Bits of evidence seem like the more useful metric!

Article is rewritten in base 10, and I rewrote some of the explanation for Bayesian updates. Enjoy!

I would like to see the article in base 10.

Sorry for the necro -- the linked article is 404'd. I uploaded a backup here. I didn't find it on the author's site but did find a copy through Web Archive; still, maybe my link will save someone else the hassle.

and spuriously in the comments

I don't think this word means what you think it means.

(Also I didn't know you were on Less Wrong. I had previously plugged this summary of log-odds on my blog and was considering mentioning it here.)

Can I find the article somewhere else? Link is dead now

See Jach's reply.

Good work! You might mention that the reason why log-odds are awful for things like adding probabilities of two disjoint events is that there's not a nice formula for log(x+y). That's the price of turning multiplication into addition.


I find it interesting that you lack familiarity with log-odds. What field are you in? Statisticians will usually be familiar with them, as the logit is the canonical link function for the binomial distribution when using generalized linear modeling. To cut out (some of) the jargon: if I have a data set with binomial outcomes, and I wish to model my data as having normal errors and the predictors as having a linear effect on the outcome, I'd convert my data by using log odds. So, for instance, if I were looking at age as a predictor for diabetes (which is a yes/no outcome), I'd model the log odds of having diabetes as a linear function of age.
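A minimal sketch of that setup, with made-up numbers (this assumes statsmodels is available; the data and variable names are hypothetical):

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical data: age as a predictor of a yes/no diabetes outcome.
    age = np.array([30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
    diabetes = np.array([0, 0, 0, 0, 0, 1, 0, 1, 1, 1])

    # Logistic regression: the log-odds of the outcome are modeled as linear
    # in age, via the logit link (the canonical link for the binomial family).
    X = sm.add_constant(age)
    fit = sm.GLM(diabetes, X, family=sm.families.Binomial()).fit()
    print(fit.params)  # intercept and per-year change in log-odds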

I have a very strong competition math background from high school, but my primary field is chemistry.

Of all the weird coincidences - I rediscovered this myself the week before last. (likewise inspired by previous LW discussion of log-odds, which seemed intuitively correct but not rigorously or symmetrically defined...)

What I failed to do, shamefully in view of your example, was to write everything up concisely and clearly to share with others. Thank you for being less short-sighted or less selfish.

It's a good article for learning about log odds, but I disagree with some of the justification. Yes, it is easy to say something has a 110% chance of working, but a nonsensical lie like this is better than a plausible lie which may trick you into believing it.

It seems to me that this doesn't have any real advantage over odds ratios. If I want to do a Bayesian update, I multiply the odds by the relative likelihood. In the example in the article (1/10,000 chance of having the disease, 3% false positive, and 1% false negative), you just take 1:9999 and multiply it by 0.99/0.03 = 33:1 for each successful test. Then you have 33:9999 = 1:303, then 33:303 = 11:101, and finally 363:101 for the final test. Then to change back, you just take 363/(363+101) = 78.23%. The calculations are slower (two multiplications vs. one addition), but it's much easier and more intuitive to convert between them and traditional probabilities.

What you've described is, in fact, exactly the same thing as log-odds - they're simply separated by a logarithm/exponentiation. Thus, all the multiplications you describe are the counterpart of the additions I describe. I agree, we could work with odds ratios without taking the logarithm - but using logarithms has the benefit of linearizing the probability space. The distance between 1 L% and 5 L% is the same as the distance between 10 L% and 14 L%, but you wouldn't know it by looking at 2.72:1 and 150:1 versus 22,000:1 and 1,200,000:1.
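To make the equivalence concrete, here's a small sketch of both routes on that disease example (in base 10):

    import math

    prior_odds = 1 / 9999          # 1:9999 prior
    lr = 0.99 / 0.03               # likelihood ratio of 33:1 per positive test

    # Odds version: multiply three times.
    posterior_odds = prior_odds * lr ** 3
    print(posterior_odds / (1 + posterior_odds))   # ~0.7823

    # Log-odds version: add three times; same answer, with one log/exp at each end.
    log_odds = math.log10(prior_odds) + 3 * math.log10(lr)
    print(1 / (1 + 10 ** (-log_odds)))             # ~0.7823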

Pick up Jaynes' Probability Theory and turn to the section on decibels of evidence, an even more convenient measure. Or for a summary see Eliezer's 0 And 1 Are Not Probabilities in the sequences.

When you work in log odds, the distance between any two degrees of uncertainty equals the amount of evidence you would need to go from one to the other. That is, the log odds gives us a natural measure of spacing among degrees of confidence.

Or for a summary see Eliezer's 0 And 1 Are Not Probabilities

(Downvoted; the OP already linked to that exact post.)