brilee comments on Log-odds (or logits) - Less Wrong Discussion

Post author: brilee, 28 November 2011 01:11AM (20 points)

Comment author: brilee 28 November 2011 02:47:57AM, 5 points

Yeah, I was definitely thinking about that. The mathematician in me won out in the end.

It occurs to me that a lot of people have probably thought about this, and they have variously used base 2, base e, and base 10. Unless we get the entire LW community to standardize on one base, we won't be able to communicate coherently with one another using log-probabilities, and they will stay relegated to the dustbin.

base 2 - advantages, we can talk about N bytes' worth of evidences.

base e - mathematician's base

base 10 - the common layperson can understand it, and it has advantages with the 9s and 0s (probabilities like 99.9% and odds like 1000:1 come out as round numbers).

Actually, I think you're right: log base 10 is probably better. If others agree, I'll rewrite the article in base 10.
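A minimal sketch of the three candidate bases side by side, assuming log-odds means log(p / (1 - p)); `log_odds` is a hypothetical helper written for illustration. It shows the "9s" advantage of base 10 (each extra 9 in the probability adds roughly 1) and that switching bases only rescales every value by the same constant.

```python
import math

def log_odds(p: float, base: float = 10.0) -> float:
    """Log-odds of probability p in the given base: log_base(p / (1 - p))."""
    return math.log(p / (1.0 - p), base)

# Each extra 9 in the probability adds roughly 1 to the base-10 log-odds.
for p in (0.9, 0.99, 0.999):
    print(f"p = {p}: base 2 {log_odds(p, 2):.2f}, "
          f"base e {log_odds(p, math.e):.2f}, base 10 {log_odds(p, 10):.2f}")

# Changing base only rescales by a constant factor: log_b(x) = ln(x) / ln(b).
assert math.isclose(log_odds(0.999, 10), log_odds(0.999, math.e) / math.log(10))
```

Because the bases differ only by a constant factor, the standardization question is about readability rather than correctness.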

Comment author: Zack_M_Davis 28 November 2011 03:02:49AM, 8 points

base e - mathematician's base

What's the specific benefit of base e for log-odds, though? Base e has lots of special properties that make it useful in many areas of mathematics (e^x is its own derivative, de Moivre's formula, &c.), but is this one of them? (It could be; I don't know.)

Comment author: [deleted] 28 November 2011 09:22:34PM, 7 points

To quote Jaynes, Probability Theory: The Logic of Science (PT:TLoS), p. 91:

In many applications it is convenient to take the logarithm of the odds because of the fact that we can then add up terms. Now we could take the logarithm to any base we please, and this cost the writer some trouble. Our analytic expressions always look neater in terms of natural (base e) logarithms. But back in the 1940s and 1950s when this theory was first developed, we used base 10 logarithms because they were easier to find numerically; the four-figure tables would fit on a single page. Finding a natural logarithm was a tedious process, requiring leafing through enormous old volumes of tables.

Today, thanks to hand calculators, all such tables are obsolete and anyone can find a ten-digit natural logarithm just as easily as a base 10 logarithm. Therefore, we started happily to rewrite this section in terms of the aesthetically prettier natural logarithms. But the result taught us that there is another, even stronger, reason for using base 10 logarithms. Our minds are thoroughly conditioned to the base 10 number system, and base 10 logarithms have an immediate, clear intuitive meaning to all of us. However, we just don't know what to make of a conclusion stated in terms of natural logarithms, until it is translated back into base 10 terms. Therefore, we re-wrote this discussion, reluctantly, back into the old, ugly base 10 convention.

So to answer your question, the only advantage of base e is that "ln" looks tidier than "log10".

Apart from being more intuitively understandable to humans, using base 10 also allows us to multiply the log-odds by 10 and measure evidence in the familiar unit of decibels.
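A minimal sketch of evidence in decibels (10 × log10 of the odds or likelihood ratio), where a Bayesian update becomes addition. The helper names `to_db` and `from_db` are hypothetical, and the prior and likelihood ratios are made-up numbers; adding the two pieces of evidence assumes they are conditionally independent given the hypothesis.

```python
import math

def to_db(odds: float) -> float:
    """Express odds (or a likelihood ratio) in decibels: 10 * log10(odds)."""
    return 10.0 * math.log10(odds)

def from_db(db: float) -> float:
    """Convert decibels back to odds."""
    return 10.0 ** (db / 10.0)

# Hypothetical example: prior odds 1:99 (probability 1%), then two pieces of
# evidence, assumed conditionally independent, with likelihood ratios 10:1 and 3:1.
prior_db = to_db(1 / 99)
evidence_db = [to_db(10.0), to_db(3.0)]

# In log-odds form, Bayes' rule is addition: posterior = prior + sum of evidence.
posterior_db = prior_db + sum(evidence_db)
posterior_odds = from_db(posterior_db)
posterior_p = posterior_odds / (1.0 + posterior_odds)
print(f"prior {prior_db:.1f} dB, posterior {posterior_db:.1f} dB, p = {posterior_p:.3f}")
```

A 10:1 likelihood ratio is exactly +10 dB, which is much of the appeal of the unit.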

Comment author: Steve_Rayhawk 30 November 2011 06:43:51PM, 5 points

The natural unit of ratio, the neper (Np), is easier to interpret for small ratio contributions, where exp(x) ≈ 1 + x because the derivative of exp(x) near 0 is ≈1:

0.1Np = exp( 0.1) ∶ 1 ≈ 1.1 ∶ 1
-0.1Np = exp(-0.1) ∶ 1 ≈ 0.9 ∶ 1
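A small sketch checking the approximation above: x nepers corresponds to the ratio exp(x) : 1, which is close to (1 + x) : 1 only while x is small. The helpers `nepers_to_ratio` and `ratio_to_nepers` are hypothetical names; the 0.5 Np case is included to show where the linear reading starts to drift.

```python
import math

def nepers_to_ratio(x: float) -> float:
    """Ratio corresponding to x nepers: exp(x) : 1."""
    return math.exp(x)

def ratio_to_nepers(r: float) -> float:
    """Size of the ratio r : 1 in nepers: ln(r)."""
    return math.log(r)

# Compare the exact ratio with the linear approximation 1 + x.
for x in (0.01, 0.1, -0.1, 0.5):
    print(f"{x:+.2f} Np -> {nepers_to_ratio(x):.4f} : 1   (1 + x = {1 + x:.2f})")
```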

This could make for an easy upgrade path to using nepers or centinepers instead of percents in comparisons involving rates, which would reduce semantic confusion. "50% faster" can mean "gets 150% as far" (so 0.41 Np faster, or 41 cNp, or perhaps 41 Np%) or "takes 50% as much time" (so 0.69 Np faster, or 69 cNp, or 69 Np%). That's an argument for using nepers as a standard base outside of communicating probabilities.
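A minimal sketch reproducing the two readings of "50% faster", assuming the neper size of a ratio is its natural log; `nepers` is a hypothetical helper. It also shows the symmetry that percents lack: 1.5x faster and 1/1.5x slower are +0.41 Np and -0.41 Np, whereas in percents they read as "50% faster" and "33% slower".

```python
import math

def nepers(ratio: float) -> float:
    """Size of a ratio in nepers: the natural log of the ratio."""
    return math.log(ratio)

# "50% faster" read as "gets 150% as far in the same time": speed ratio 1.5.
as_far = nepers(1.5)
# "50% faster" read as "takes 50% as much time": speed ratio 1 / 0.5 = 2.
less_time = nepers(1 / 0.5)
print(f"gets 150% as far:   {as_far:.2f} Np = {100 * as_far:.0f} cNp")
print(f"takes 50% the time: {less_time:.2f} Np = {100 * less_time:.0f} cNp")

# Nepers compose additively: being 1.5x faster twice is the same as 2.25x faster.
assert math.isclose(nepers(1.5) + nepers(1.5), nepers(2.25))
```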

(trivia: Nepers and radians are each other turned sideways, being respectively the real and imaginary parts of eigenvalues of linear differential equation systems.)
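A sketch of the trivia, assuming a standard damped oscillator x'' + 2ζω₀x' + ω₀²x = 0 as the linear system; `char_roots` and the parameter values are hypothetical. Solutions behave like exp(λt), so over time t the real part of λ gives the change in log-amplitude (nepers) and the imaginary part gives the change in phase (radians).

```python
import cmath
import math

def char_roots(zeta: float, omega0: float):
    """Roots of lambda^2 + 2*zeta*omega0*lambda + omega0^2 = 0, the eigenvalues
    of the damped oscillator x'' + 2*zeta*omega0*x' + omega0^2 * x = 0."""
    b = 2.0 * zeta * omega0
    c = omega0 ** 2
    disc = cmath.sqrt(b * b - 4.0 * c)
    return (-b + disc) / 2.0, (-b - disc) / 2.0

lam, _ = char_roots(zeta=0.1, omega0=2.0 * math.pi)  # hypothetical parameters
t = 1.0  # seconds
# Real part: log-amplitude change in nepers; imaginary part: phase change in radians.
print(f"after {t} s: amplitude changes by {lam.real * t:.3f} Np, "
      f"phase advances by {lam.imag * t:.3f} rad")
```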

Comment author: wedrifid 28 November 2011 12:36:56PM, 7 points

base 2 - advantages, we can talk about N bytes' worth of evidences.

Wouldn't it be easier to talk about N bytes worth of evidence in base 256? Bits of evidence seems the more useful metric!
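A minimal sketch of the point: a byte is 8 bits, so byte-denominated evidence is log base 256 of the likelihood ratio, and bits are log base 2. The helper `bits_of_evidence` and the 256:1 and 20:1 likelihood ratios are hypothetical.

```python
import math

def bits_of_evidence(likelihood_ratio: float) -> float:
    """Evidence in bits: log2 of the likelihood ratio."""
    return math.log2(likelihood_ratio)

lr = 256.0  # hypothetical likelihood ratio of 256:1
bits = bits_of_evidence(lr)   # 8 bits
bytes_ = math.log(lr, 256)    # the same evidence in base 256: 1 "byte"
assert math.isclose(bytes_, bits / 8)
print(f"{lr:.0f}:1 is {bits:.0f} bits, or {bytes_:.0f} byte, of evidence")

# A more everyday ratio such as 20:1 is a few bits but only a fraction of a byte.
print(f"20:1 is {bits_of_evidence(20.0):.2f} bits")
```

Bytes are coarse enough that ordinary evidence comes out as a fraction of one, which is why bits make the more useful unit.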

Comment author: brilee 30 November 2011 04:12:23PM, 0 points

Article is rewritten in base 10, and I rewrote some of the explanation for Bayesian updates. Enjoy!

Comment author: shokwave 28 November 2011 11:36:08AM, 0 points

I would like to see the article in base 10.