brilee comments on Log-odds (or logits) - Less Wrong Discussion
Yeah, I was definitely thinking about that. The mathematician in me won out in the end.
It occurs to me that a lot of people have probably thought about this, and that they have variously used base 2, base e, and base 10. Unless we get the entire LW community to standardize on one base, we won't be able to communicate coherently with one another using log-probabilities, and log-probabilities will therefore stay relegated to the dustbin.
base 2 - advantage: we can talk about N bytes' worth of evidence.
base e - the mathematician's base.
base 10 - the common layperson can understand it; advantages with the 9's and 0's.
Actually, I think you're right, log base 10 is probably better. If others agree, I'll rewrite the article in base 10.
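For what it's worth, the choice of base only rescales the numbers by a constant factor, so converting between conventions is mechanical. A minimal Python sketch of that point follows; the function name `log_odds` and the example probability 0.9 are illustrative choices, not anything from the article.

```python
import math

def log_odds(p, base):
    """Log-odds of probability p, with the logarithm taken in the given base."""
    return math.log(p / (1 - p), base)

p = 0.9  # illustrative probability, i.e. odds of 9:1

in_base_2 = log_odds(p, 2)        # measured in bits
in_base_e = log_odds(p, math.e)   # measured in nats
in_base_10 = log_odds(p, 10)      # measured in powers of ten

# Changing base only divides by a constant: log_b(x) = ln(x) / ln(b).
assert math.isclose(in_base_2, in_base_e / math.log(2))
assert math.isclose(in_base_10, in_base_e / math.log(10))
```

So whichever base the community settles on, numbers quoted in another base can be converted with a single division.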
What's the specific benefit of base e for log-odds, though? Base e has lots of special properties that make it useful in many areas of mathematics (e^x is its own derivative, de Moivre's formula, &c.), but is this one of them? (It could be; I don't know.)
To quote Jaynes, p.91 of PT:TLoS:
So to answer your question, the only advantage of base e is that "ln" looks tidier than "log10".
Apart from being more intuitively understandable to humans, using base 10 also allows us to multiply by 10 and measure evidence in the familiar unit of decibels.
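As a rough illustration of the decibel convention, here is a minimal Python sketch, assuming evidence is defined as 10 times the base-10 log of the odds. The helper names `evidence_db`, `probability_from_db`, and `update_db` are made up for this example, and the update step assumes the standard odds form of Bayes' rule (posterior odds = prior odds times likelihood ratio).

```python
import math

def evidence_db(p):
    """Evidence for a proposition with probability p, in decibels."""
    return 10 * math.log10(p / (1 - p))

def probability_from_db(db):
    """Invert evidence_db: convert decibels of evidence back to a probability."""
    odds = 10 ** (db / 10)
    return odds / (1 + odds)

def update_db(prior_db, likelihood_ratio):
    """Bayesian update in decibels: multiplying odds becomes adding decibels."""
    return prior_db + 10 * math.log10(likelihood_ratio)

prior = evidence_db(0.5)            # even odds correspond to 0 dB
posterior = update_db(prior, 10.0)  # a 10:1 likelihood ratio adds 10 dB
```

On this scale each factor of 10 in the odds is worth 10 dB, which is where the convenient pattern of 9's and 0's in the corresponding probabilities comes from.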
The natural unit of ratio, the neper (Np), is easier to interpret for small ratio contributions, where the derivative of exp(x) is ≈1:
0.1Np = exp( 0.1) ∶ 1 ≈ 1.1 ∶ 1
-0.1Np = exp(-0.1) ∶ 1 ≈ 0.9 ∶ 1
This could make for an easy upgrade path to using nepers or centinepers instead of percents in comparatives involving rates, which would reduce semantic confusion. "50% faster" can mean "gets 150% as far in the same time" (so 0.41Np faster, or 41cNp, or perhaps 41Np%) or "takes 50% as much time" (so 0.69Np faster, or 69cNp, or 69Np%). That's an argument for using nepers as a standard unit outside communications of probability.
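The two readings of "50% faster" can be checked numerically. This is only a sketch: the helper name `nepers` is hypothetical, and it assumes a neper is simply the natural log of a ratio.

```python
import math

def nepers(ratio):
    """Express a ratio in nepers, i.e. as its natural logarithm."""
    return math.log(ratio)

# Reading 1: covers 150% of the distance in the same time.
gets_further = nepers(1.5)        # about 0.41 Np

# Reading 2: takes 50% as much time, i.e. a speed ratio of 2.
takes_less_time = nepers(1 / 0.5)  # about 0.69 Np

# For small contributions the neper value is close to the fractional change.
small_up = math.exp(0.1)     # about 1.105
small_down = math.exp(-0.1)  # about 0.905
```

Unlike percentages, nepers add when ratios compound, so a 1.5x speedup followed by a 2x speedup comes to `nepers(1.5) + nepers(2)`, which equals `nepers(3)`.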
(trivia: Nepers and radians are each other turned sideways, being respectively the real and imaginary parts of eigenvalues of linear differential equation systems.)
Wouldn't it be easier to talk about N bytes' worth of evidence in base 256? Bits of evidence seem like the more useful metric!
Article is rewritten in base 10, and I rewrote some of the explanation for Bayesian updates. Enjoy!
I would like to see the article in base 10.