# Thinking Bayesianically, with Lojban

9 24 January 2012 06:47PM

"Do not walk to the truth, but dance. On each and every step of that dance your foot comes down in exactly the right spot. Each piece of evidence shifts your beliefs by exactly the right amount, neither more nor less. What is exactly the right amount? To calculate this you must study probability theory. Even if you cannot do the math, knowing that the math exists tells you that the dance step is precise and has no room in it for your whims." -- from "Twelve Virtues of Rationality", by Eliezer Yudkowsky

One of the more useful mental tools I've found is the language Lojban ( http://www.lojban.org/tiki/Learning ), which makes explicit many of the implicit assumptions in languages. (There's also a sub-language based on Lojban, called Cniglic ( http://www.datapacrat.com/cniglic/ ), which can be added to most existing languages to offer some additional functionality.)

Among the things Lojban (and Cniglic) has are 'evidentials', words which can be used to tag other words and sentences to explain how the speaker knows them: "ja'o", meaning "I conclude", "za'a" meaning "I observe", "pe'i" meaning "It's my opinion", and more. However, there hasn't been any easy and explicit way to use this system to express Bayesian reasoning...

... until today.

Lojban not only allows for, but encourages, "experimental" words of certain sorts; and using that system, I have now created the word "bei'e" (pronounced BAY-heh), which allows a speaker to tag a word or sentence with how confident they are, in the Bayesian sense, of its truth. Taking an idea from the foundational text by E.T. Jaynes, "bei'e" is measured in decibels of evidence, that is, ten times the base-10 logarithm of the odds. This sounds complicated, but in many cases it is actually much easier to use than simple odds or probability: adding 10 decibels multiplies the odds by a factor of 10.

The current reftext for "bei'e" is at http://www.lojban.org/tiki/bei%27e , which basically amounts to adding Lojbannic digits to the front of the word:

| word | decibels | probability | odds | meaning |
|---|---|---|---|---|
| ni'uci'ibei'e | -oo | 0% | 1:oo | complete disbelief, paradox |
| ni'upabei'e | -1 | 44.3% | 4:5 | |
| ni'ubei'e | <0 | <50% | <1:1 | less than even odds, less likely than not |
| nobei'e | 0 | 50% | 1:1 | neither belief nor disbelief, agnosticism |
| ma'ubei'e | >0 | >50% | >1:1 | greater than even odds, more likely than not |
| pabei'e | 1 | 55.7% | 5:4 | preponderance of the evidence |
| rebei'e | 2 | 61.3% | 3:2 | |
| cibei'e | 3 | 66.6% | 2:1 | clear and convincing evidence |
| vobei'e | 4 | 71.5% | 5:2 | |
| mubei'e | 5 | 76.0% | 3:1 | beyond a reasonable doubt |
| xabei'e | 6 | 80.0% | 4:1 | |
| zebei'e | 7 | 83.3% | 5:1 | |
| bibei'e | 8 | 86.3% | 6:1 | |
| sobei'e | 9 | 88.8% | 8:1 | |
| panobei'e | 10 | 90.9% | 10:1 | |
| pacibei'e | 13 | 95.2% | 20:1 | |
| xarebei'e | 62 | 99.99994% | 1,500,000:1 | 5 standard deviations |
| ci'ibei'e | oo | 100% | oo:1 | complete belief, tautology |
| xobei'e | ? | ?% | ?:? | question, asking listener their level of belief |
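The decibel, odds, and percentage columns above are tied together by simple arithmetic, which the following minimal sketch tries to make concrete. The function names `decibels_to_odds` and `decibels_to_probability` are made up for this illustration and are not part of the bei'e reftext; the sketch only assumes that decibels are ten times the base-10 logarithm of the odds.

```python
def decibels_to_odds(db: float) -> float:
    """Odds in favour, expressed as 'x to 1', for a finite number of decibels."""
    return 10 ** (db / 10)


def decibels_to_probability(db: float) -> float:
    """Probability that corresponds to a finite number of decibels."""
    odds = decibels_to_odds(db)
    return odds / (1 + odds)


# Print a few rows to compare against the table (hypothetical helper loop).
for db in (-1, 0, 1, 3, 5, 10, 13, 62):
    odds = decibels_to_odds(db)
    prob = decibels_to_probability(db)
    print(f"{db:>3} dB  odds {odds:,.2f}:1  probability {prob:.5%}")
```

The odds in the table appear to be rounded to convenient ratios (1 decibel works out to roughly 1.26:1 rather than exactly 5:4), so small differences from the printed values are expected.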

By having this explicit mental tool, even if I don't use it aloud, I'm finding it much easier to remember to gauge how confident I am in any given proposition. If anyone else finds use in this idea, so much the better; and if anyone can come up with an even better mental tool after seeing this one, that would be better still.

.uo .ua .uisai .oinairo'e

Comment author: 16 March 2012 06:20:59PM 0 points [-]

In my mind beyond a reasonable doubt doesn't mean “with a posterior probability of 76%” -- more like 99.9%. (Then again, I'm not a native English speaker.)

Comment author: 06 February 2012 04:15:13PM 1 point [-]

I did try to learn a bit of Lojban when playing with Anki. I came to the conclusion that I really, really don't like trying to produce speech with a language without having lots of cached idiom to draw from. It makes me feel like my brain is locked in an autist mode completely deaf to the tone connotations of whatever I'm saying.

I might be a bit of an outlier since I also hated all the foreign language teaching in school, which went straight into making everyone speak in very broken language, instead of spending the first few years reading foreign language pulp comics and novels with a simple vocabulary where aliens eat people's intestines, which is the good way to learn a foreign language.

Comment author: 26 January 2012 08:46:19AM 1 point [-]
Comment author: 05 February 2012 07:58:45PM 0 points [-]

On what basis do you readily reject lojban?

m'ube'i Lojban is a really good mental tool that will let you escape most of the 37 ways suboptimal use of mental categories can have a negative effect on your cognition.

Comment author: 06 February 2012 01:27:45AM *  1 point [-]

1) Because my experience with language classes in high school and college has taught me that learning a language that I don't already know is damn hard, not very fun, and is completely useless unless I invest a ridiculous amount of time and effort into becoming an expert (defined as someone who can watch TV in the language without subtitles, and/or read novels for adults written in that language without resorting to a dictionary)

2) Because there isn't any media written in it that I especially want to read/watch/play which would inspire me to put in the ridiculous amount of time that becoming an expert in a language requires

3) Because nobody is going to pay me to study it.

Comment author: 06 February 2012 10:48:23AM 0 points [-]

Reasons 2 and 3 are very valid, but I will have to dispute 1.

Lojban is not hard. If you have experience with formal language/predicate logic/programming it is trivial to modify that understanding. Lojban has ~2000 words and word roots necessary to be completely fluent in it, compared to the average English speaker's vocab of ~15000. The grammar can be summarized in 11 rules, there are no irregularities, no words that change arbitrarily, etc. Lojban, compared to French, German, Russian, Spanish, Portuguese or whatever it was you studied in high school, should not be hard.

Comment author: 16 March 2012 06:25:17PM *  1 point [-]

> Lojban is not hard. If you have experience with formal language/predicate logic/programming it is trivial to modify that understanding.

How about having a conversation in it? (In normal conversations, people just don't have the time to engage in Type 2 processes -- most utterances take a few seconds at most. I've heard that lots of people tried to learn Lojban well enough to have real-time conversations in it and failed.) Also, the French, German, Russian, Spanish, or Portuguese you studied in high school are close enough to English -- not only genetically (all Indo-European languages, FWIW), but also typologically -- see http://en.wikipedia.org/wiki/Standard_Average_European. I've heard that native English speakers have an easier time learning French than Indonesian (probably the simplest non-creole natural language), FFS.

Comment author: 06 February 2012 04:23:48PM 0 points [-]

http://news.ycombinator.com/item?id=2634912 this guy seems to have tried to learn Lojban for a long time, and didn't really get going with it. Most people who fail and drop out probably don't talk that much about it.

Are long-time Lojban enthusiasts generally able to produce Lojbanic text and speech with little effort which other long-time Lojban enthusiasts can understand with little effort?

Comment author: 06 February 2012 12:32:06PM 2 points [-]

> Lojban is not hard. If you have experience with formal language/predicate logic/programming it is trivial to modify that understanding. Lojban has ~2000 words and word roots necessary to be completely fluent in it, compared to the average English speaker's vocab of ~15000.

While technically true, those 2000-some words combine in nontrivial and mostly arbitrary ways. The language is no toki pona. I think the proper comparison is with Mandarin; there one learns on the order of 4000 characters, which then combine in not-immediately-obvious ways.

> The grammar can be summarized in 11 rules,

The PEG that parses Lojban is the size of an X-Box. This claim is plainly false. There are more than 11 cmavo that substantially change the parsing of lojban in distinct ways.

> there are no irregularities, no words that change arbitrarily, etc.

1. bisli -- x1 is a quantity of/is made of/contains ice [frozen crystal] of composition/material x2
2. blaci -- x1 is a quantity of/is made of/contains glass of composition including x2

Then suddenly,

1. cakla -- x1 is made of/contains/is a quantity of chocolate/cocoa
2. canre -- x1 is a quantity of/contains/is made of sand/grit from source x2 of composition including x3
3. danmo -- x1 is made of/contains/is a quantity of smoke/smog/air pollution from source x2

These are all gismu places that have to be memorized, because there is no template rule for gismu referring to materials. While "there are no irregularities, no words that change arbitrarily" is technically true, there are also few regularities in the basic words (= gismu and cmavo) of the language. The situation is resoundingly worse once one starts forging lujvo.

Comment author: 06 February 2012 06:46:36PM *  0 points [-]

> The grammar can be summarized in 11 rules,
>
> > The PEG that parses Lojban is the size of an X-Box [...]

I think the operative phrase here is "summarized"; it is akin to the way you can write a human-readable book about English grammar even though the only known parser for it is the human brain. I have, specifically, viewed the Yacc code that can parse Lojban (with some clever use of error recovery) and it holds on the order of 600 rules. My point was that if you wrote a book on Lojban grammar it would have 11 chapters, each meticulously detailing a different category of cmavo and their use, how to construct brivla, how to construct lujvo and some other things. Then you would only need that book, a slim dictionary and a pronunciation guide.

> Then suddenly,

That is a very valid point, the amount of information is probably the same.

Comment author: 06 February 2012 07:32:32PM *  0 points [-]

> > The grammar can be summarized in 11 rules,
>
> > The PEG that parses Lojban is the size of an X-Box [...]
>
> I think the operative phrase here is "summarized"; it is akin to the way you can write a human-readable book about English grammar even though the only known parser for it is the human brain. [...] My point was that if you wrote a book on Lojban grammar it would have 11 chapters, each meticulously detailing a different category of cmavo and their use, how to construct brivla, how to construct lujvo and some other things.

I claim there is no meaningful "summary" of Lojban that constrains itself to eleven "rules", each less than a typical paragraph in length. The reference grammar covers most of the language, taking arguably 18 or 19 chapters to do so. Most of those chapters cover distinct classes of words, to boot.

There is an ancient log that mentions 11 rules in it, but that is just that -- ancient history (circa 1988! A quarter of the LW population wasn't even alive then!). It doesn't even pretend to be a reasonable catalog of the language. Perhaps they've updated since then, but a swift Googling doesn't bring up anything more recent.

In summary, lojban is a hard language mixing the worst of incompressible memorization (e.g., gismu places, lujvo, fu'ivla), archaic logic/maths (e.g., mekso), and just straight-up bad design. I liked it precisely because it was challenging and fun to hack on. At the end of the day, a person wanting to learn a new language is better served by learning a common natlang.

Comment author: 06 February 2012 11:37:24PM 0 points [-]

Good point, I guess I hadn't researched the issue sufficiently.

Comment author: 06 February 2012 12:02:03PM *  1 point [-]

Is the vocabulary of Lojban rich enough that you could translate Hamlet into it? If Lojban's vocabulary is that easy to learn, does that also render it trivial?

Comment author: 06 February 2012 04:30:10PM 2 points [-]

My first question was how you'd translate "Blood for the Blood God! Skulls for the Skull Throne!" into Lojban.

(An irc discussion got to the point of trying to translate the more logically explicated form "Let the current state of affairs be such that it contains blood that was not contained in the preceding state of affairs and that is blood that belongs to the Blood God and let the current state of affairs be such that it contains skulls that were not contained in the preceding state of affairs and that are skulls that belong in the Skull Throne.", but it sorta seemed to lose something in that translation and the snappy Lojbanic version continued to evade us.)

Comment author: 06 February 2012 08:31:07PM 0 points [-]

I am quite a green beginner but with a bit of rephrasing you could get something analogous to "To drain the blood of our enemies is the practice of the blood god, we take the skulls from our enemies for building the skull throne."

Comment author: 06 February 2012 12:17:56PM 1 point [-]

Good one.

Lojban is by design combinatorial and has an explicit indicator for metaphorical expressions. So it is like a Turing-complete programming language: you can probably translate Hamlet, but I do not know how well it would work.

In addition to paper-machine's post, there are The Christian Bible, Tao Te Ching, The Metamorphosis (Franz Kafka), Le Petit Prince (Antoine de Saint-Exupéry), as well as numerous short stories.

Source

Comment author: 06 February 2012 12:06:07PM 1 point [-]
Comment author: 06 February 2012 12:09:14PM 1 point [-]

Yeah, that's probably good enough. ;)

Comment author: 25 January 2012 01:50:59AM 9 points [-]

75% is beyond a reasonable doubt? I always thought that would be more like 99%. At the very least, 95%.

Comment author: 25 January 2012 03:01:43AM 1 point [-]

Some time ago, I went looking for numbers, and the best I found was a specialized survey which asked how certain people would have to be in order to convict someone as being guilty 'beyond a reasonable doubt'. The answers varied depending on the crime, from 75% for petty larceny to 95% for murder; so I assumed that the higher numbers were because of the significant punishments involved, and that people wanted to be extra sure that if they voted to convict, that the person really deserved it; and thus that the 'real' meaning of 'beyond a reasonable doubt' was the lower number, 75%.

A bit of Googling turns up http://www.law.northwestern.edu/faculty/fulltime/diamond/papers/conflictBetweenPrecisionAndFlexibility.pdf , which appears to be what I was looking at.

Comment author: 25 January 2012 03:48:08AM *  4 points [-]

"Beyond a reasonable doubt" seems to suggest that the chance of being wrong is small enough to be safely ignored unless the utilities involved are enormous, a standard that I would expect to require at least around 98% confidence. If people noticed that you were wrong 1 in 4 times that you say there is no reasonable doubt, they would think of you as severely overconfident. The numbers you assign to "preponderance of the evidence" and "clear and convincing evidence" also seem badly skewed, though less so.

Comment author: 25 January 2012 04:47:39AM 1 point [-]

One theory I've come up with is that the true value of the term 'beyond a reasonable doubt' is less in the specific percentage value, and more in that it makes for a significant difference in the evidence required to convict someone of a civil tort (in which they are merely required to compensate the harmed party) and the evidence required to predict that someone is likely to commit further criminal actions in the future (and thus it would be reasonable to take additional measures, beyond simple harm-compensation, to deal with the expected future threat); and that the 'reasonable doubt' standard is simply what happens to result in the right rate of convictions. (I wrote this idea up in more detail at http://www.ncc-1776.org/tle2011/tle639-20111002-05.html .)

> The numbers you assign to "preponderance of the evidence" and "clear and convincing evidence" also seem badly skewed, though less so.

'Preponderance of the evidence' simply means more likely than not - 50%+1. (As I see TimS posted.) 'Clear and convincing evidence' is a level between 'preponderance' and 'beyond reasonable doubt', so without having found any particular surveys or statistics, I put it midway between the two.

Comment author: 25 January 2012 03:51:38AM *  5 points [-]

As a lawyer, I want to say that "preponderance of the evidence" is satisfied by the theory that is most likely given the evidence presented. It's fancy talk for "we can't both lose."

Edit: 50% plus epsilon satisfies the legal standard "preponderance of the evidence"

Comment author: 24 January 2012 10:03:13PM *  0 points [-]

Experimental cmavo are almost always a giant waste of time, with the possible exception of la'oi.

I don't see how the above couldn't be accomplished with cu'o.

EDIT EX: .i lo nu mi klama cu pimucu'o

Comment author: 24 January 2012 10:26:12PM -1 points [-]

http://www.lojban.org/tiki/Criteria+for+evaluating+experimental+cmavo mentions that just because an idea can be expressed without using a cmavo doesn't necessarily mean that the cmavo itself is a bad one. E.g., I could probably express the ideas represented by all of baseline Lojban's evidentials without using those specific cmavo, but they're still a handy thing to have.

I've found some use in bei'e; if anyone else does, that's fine, too; if you don't have any use for it, that's still just fine; and in the meantime, it doesn't seem to be using up any scarce resources to have it listed.

Comment author: 24 January 2012 10:57:54PM -1 points [-]

You link to a discussion in which one member of the byfy explicitly disagrees with another member on exactly this issue. That's hardly legislative.

The argument was not from a scarcity of experimental cmavo space. It wastes everyone's time to create trivial experimental cmavo. Compare with la'oi, which was adopted (at least in the IRC channel) almost immediately because it filled an actual gap between what people wanted to say and what the language previously allowed.

Finally, it doesn't make any sense to say that a cmavo is of selma'o MAI, but is "placed like an evidential." Either something is UI or not.

Comment author: 24 January 2012 11:19:01PM -2 points [-]

Okay, so you don't like this tool, and think that the metaphorical toolbox should only contain flathead screwdrivers, not Phillips-head ones; that's fine, I'm not trying to force anyone to use this if they don't want to. But I'm not quite sure what it is you're suggesting I /do/. Do you want me to stop using this word when I think about Bayesian confidence levels? Do you think I should stop telling people about the use I've found in this word? What is the best future that you are hoping I help bring into being?

> Finally, it doesn't make any sense to say that a cmavo is of selma'o MAI, but is "placed like an evidential." Either something is UI or not.

Both UI and MAI, including evidentials, fall under the category of "free modifiers", which are supposed to be able to be placed anywhere in a bridi without "changing the meaning". However, evidentials do change the meaning of a sentence, by making it a statement of "how it is for the speaker" - and so, presumably, the idea of free modifiers "not changing the meaning" is somewhat loosely applied.

The basic idea of this word is to tag an individual sumti, or a whole sentence, with a particular number; the only way in Lojban I know of to create a free modifier which can have any number is to make it a MAI; so, technically, that's what I've assigned bei'e as. However, practically, the purpose of the number is to describe the user's belief-level in that sumti, which is very close to how evidentials are used; and so, in a non-technical sense, I describe it as being placed 'like' an evidential. So it's not a UI - it's just used in pretty much the same way that UIs are.

Comment author: 24 January 2012 08:29:45PM 3 points [-]

.a'uru'e I sort of like this because I sort of like almost any tinkering with lojban. Still, I'm not sure if using this for myself would have any more of an effect than just making sure to consciously register the probabilities of my expectations. Of course that conscious attention to it seems to be exactly the benefit you suggested it might have. It would probably take a little getting used to the logarithmic change, but after that period I feel like I would have a better feel for probability in general. I don't have a very good intuitive grasp of probabilities now.

ta'o The second column on your site about cniglic, the second column is using tengwar, isn't it?

Comment author: 24 January 2012 09:15:28PM 1 point [-]

> ta'o The second column on your site about cniglic, the second column is using tengwar, isn't it?

Indeed it is; http://www.datapacrat.com/cniglic/tengwar.html is my reference for writing Lojban in Tengwar.

Comment author: 24 January 2012 08:03:43PM *  7 points [-]

Why decibels and not bits? I prefer bits, would that fit in your system?

Why? 2 is more natural than ten, and log2(x) is much more natural than log10(x)*10. Worse, there isn't widespread agreement on how decibels are defined; some people (engineers near me) use log10(x)*20 for some reason.

Comment author: 25 January 2012 01:47:43AM *  3 points [-]

> some people (engineers near me) use log10(x)*20 for some reason.

The power of sound goes up with the square of the amplitude, so if something has a 10 decibel increase in amplitude, it has a 20 decibel increase in power. As such, I can see how someone might get it mixed up.

A decibel is defined as 10log10(x/y), where y is whatever you're comparing it to. It should never mean 20log10(x/y).
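As a hedged aside on where the factor of 20 tends to come from: when the quantity being compared is an amplitude (such as pressure or voltage) and power goes with its square, taking 20 times the log of the amplitude ratio gives the same number as 10 times the log of the power ratio. The sketch below only checks that identity; the function names are invented for illustration, and the square-law relationship is the assumption stated in the comment above.

```python
import math


def power_ratio_db(power_ratio: float) -> float:
    """Decibels for a ratio of two powers: 10 * log10(x / y)."""
    return 10 * math.log10(power_ratio)


def amplitude_ratio_db(amplitude_ratio: float) -> float:
    """Decibels for a ratio of two amplitudes, assuming power ~ amplitude**2."""
    return 20 * math.log10(amplitude_ratio)


# Tripling the amplitude multiplies the power by nine (assumed square law),
# and both formulas then report the same number of decibels.
amplitude_ratio = 3.0
power_ratio = amplitude_ratio ** 2
print(power_ratio_db(power_ratio), amplitude_ratio_db(amplitude_ratio))
```

Under that reading the two conventions describe the same change; they differ only in which physical quantity is plugged in.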

Comment author: 25 January 2012 06:08:26AM 0 points [-]

Well thanks for clearing that up. I would have loved to have that explained when we were all trying to figure out who decided that 20 was a good idea.

Do you know with sound what the baseline 0 db is?

Comment author: 25 January 2012 06:36:57AM 0 points [-]

It's supposed to be on the edge of human hearing. It's also a round number. I don't remember beyond that.

Comment author: 25 January 2012 07:15:03PM 0 points [-]

> round number

measured in what? Power density or something?

Comment author: 26 January 2012 12:36:35AM 0 points [-]

It's 20 µPa RMS, so measured in pressure.

Comment author: 26 January 2012 02:26:41AM 0 points [-]

neat, thanks!

Comment author: 24 January 2012 09:11:51PM 1 point [-]

I picked decibels because that's what I'm getting used to as I try to absorb E.T. Jaynes' book "Probability Theory: The Logic of Science".

Lojban is flexible enough to use binary rather than decimal; technically, if you say "ju'u re", everything you say after that is assumed to be in base 2, with the digits "no" (0) and "pa" (1), and the radix point "pi". Bei'e is still new enough that I can add a note about the use of binary as an option; I'm less familiar with using bits in this sense than I am decibels, so if you could give me a reference (or even just type out a representative sample) of how bits compare to a percentage probability, then I could easily add that to the current definition.

Comment author: 24 January 2012 09:23:37PM *  2 points [-]

So "ju'u re pi no pa" is 1/4?

``````
0 -> 1:1, 50%
1 -> 2:1, 67%
2 -> 4:1, 80%
3 -> 8:1, 89%
4 -> 16:1, 94%
5 -> 32:1, 97%
6 -> 64:1, 98.5%
etc
``````
Comment author: 24 January 2012 09:40:40PM 1 point [-]

> So "ju'u re pi no pa" is 1/4?

Exactly so.

> 0 -> 1:1, 50%
> 1 -> 2:1, 67%

To see if I understand this correctly; then in this approach, each increase of 1 bit is equivalent to an increase of 3 decibels?

| decibels | odds | bits |
|---|---|---|
| 0 dbs | 1:1 | 0 bits |
| 3 dbs | 2:1 | 1 bit |
| 6 dbs | 4:1 | 2 bits |
| 9 dbs | 8:1 | 3 bits |
| 12 dbs | 16:1 | 4 bits |
| 15 dbs | 32:1 | 5 bits |
| 18 dbs | 64:1 | 6 bits |

(For precise numbers for the decibels, I use the formula: decibels = 10 * log10(LevelOfBelief / (1 - LevelOfBelief)). So to check how many decibels are equivalent to 64:1 odds, I plug into Google Calculator: 10 * log ((64/65) / (1/65)) , getting 18 and change.)
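For anyone who would rather check this outside Google Calculator, here is a minimal sketch of the same formula. The function name `probability_to_decibels` is invented for this example, and the sketch assumes the logarithm is base 10 and that the probability is strictly between 0 and 1.

```python
import math


def probability_to_decibels(level_of_belief: float) -> float:
    """decibels = 10 * log10(p / (1 - p)), valid for 0 < p < 1."""
    return 10 * math.log10(level_of_belief / (1 - level_of_belief))


# 64:1 odds correspond to a probability of 64/65.
print(round(probability_to_decibels(64 / 65), 2))  # about 18.06
```

Probabilities of exactly 0 or 1 fall outside this formula (they correspond to minus or plus infinity), so a fuller version would need to treat them separately.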

Comment author: 24 January 2012 09:45:27PM *  1 point [-]

> each increase of 1 bit is equivalent to an increase of 3 decibels?

Approximately, because 3 bits is a factor of 8, which is approximately 10, and 3 decibels is approximately one third of the way to 10.

Comment author: 25 January 2012 09:03:03PM 0 points [-]

I've tried to take the 'outside view' on this, to see if I'd originally come up with the idea of using bits, whether it would be worth switching to decibels. Using decibels, only two digits brings you all the way to the billions-to-one level of odds, which seems sufficient for everyday purposes; and decibels dividing probability-space more finely allows for easy differentiation between some useful probability numbers, such as 'beyond a reasonable doubt' and 'clear and convincing evidence', which would be blurred if using bits.

So I think that I'm going to keep bei'e as being measured in decibels, add a note to the definition about conversion to bits... and, I think, add a note that anyone who really wants to have an experimental cmavo that uses bits is as free to create and use it as I was to create bei'e. Sound good to you?

Comment author: 25 January 2012 09:24:18PM 1 point [-]

sure why not.

I just find bits more natural. People talk about twice as much (+1 bit) often, and bits are the unit used in information theory and computer science.

Comment author: 24 January 2012 10:30:42PM 1 point [-]

Okay, so we don't actually have to change the base we're counting in, we'd be changing the units we're counting - instead of X number of decibels, it would be X number of bits, and X can be expressed in base 10 either way.

I'd like to take a day or so to think about the best way to approach this - it may very well be as simple as adding a note to the reftext about how to convert decibels to bits by dividing by three.

Comment author: 24 January 2012 10:44:59PM *  1 point [-]

Yeah, number of bits is still expressed in base ten (lol, I just realized that all bases are base 10 in their own base).

EDIT: log(10)/log(2) = 3.3219

Comment author: 25 January 2012 08:05:14PM *  1 point [-]

It seems like divide by 3 should be about right. 2**10 is roughly 10**3, so 30 decibels is about 10 bits. (1024 versus 1000).

Comment author: 25 January 2012 08:40:15PM 1 point [-]
``````
log10(1024) = 3.010
10/3.010 = 3.32
``````

You are right, the conversion factor is 3.01.
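The two numbers that came up in this sub-thread (about 3.01 and about 3.32) can both be derived directly, as in this small sketch. The constant and function names are invented for the illustration; the only assumption is that a bit is a factor of 2 in the odds and ten decibels is a factor of 10.

```python
import math

# One bit doubles the odds, so it is worth 10 * log10(2) decibels.
DECIBELS_PER_BIT = 10 * math.log10(2)      # about 3.0103
# Ten decibels multiply the odds by 10, which is log2(10) bits.
BITS_PER_TEN_DECIBELS = math.log2(10)      # about 3.3219


def bits_to_decibels(bits: float) -> float:
    return bits * DECIBELS_PER_BIT


def decibels_to_bits(db: float) -> float:
    return db / DECIBELS_PER_BIT


print(bits_to_decibels(10))   # 10 bits (1024:1) is a little over 30 decibels
print(decibels_to_bits(30))   # 30 decibels (1000:1) is a little under 10 bits
```

Dividing decibels by three is therefore a rule of thumb that drifts by about a third of a percent.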

Comment author: 24 January 2012 08:01:44PM 5 points [-]

We could try something similar even without Lojban, just in written English. We could include the 'evidentials' in text using some agreed upon notation. I guess it would be inconvenient at first, but later it could become easier.

I just cannot imagine that I would seriously estimate the probability of every sentence I write. That would make my writing too slow, or I would just assign some arbitrary probabilities -- like 80% or 95% -- to most things.

Comment author: 30 May 2012 02:03:09AM *  1 point [-]

Well, here's a first stab. We only need to cover 50-100% since English gives us negations: "unlikely" versus "likely", "improbable" versus "probable". (If we can express 60% and we want to express 40%, we can just negate whatever we say for 60%.) Going by the above scheme, the 50-100% range requires 13 modifiers. If I replace the >99% for 99%, which I don't think is very useful, I need 13 or so. For infinity or 100%, I think it's better to signal a discontinuity by using a pair like "certain"/"impossible" (neg infinity or 0%) - since they aren't probabilities. I originally had the range start at 50%, but then it was pointed out that negation was funny, so I realized I had to make that a special word as well, along with 0% and 100%. Half-way through, if I switch from "likely" to "probable" I can reuse the previous modifiers, so I only need to think of 5-6 modifiers.

It turns out that this is a really hard balancing act. I think I'm roughly satisfied with:

``````
db % odds English
-∞ 0% 1:∞ impossible
0 50% 1:1 possible
∞ 100% ∞:1 certain
1 55.7% 5:4 likely
2 61.3% 3:2 somewhat likely
3 66.6% 2:1 quite likely
4 71.5% 5:2 very likely
5 76.0% 3:1 highly likely
6 80.0% 4:1 extremely likely
7 83.3% 5:1 probable
8 86.3% 6:1 somewhat probable
9 88.8% 8:1 quite probable
10 90.9% 10:1 very probable
13 95.2% 20:1 highly probable
20 99% 99:1 extremely probable
``````
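To show how a table like this might be applied mechanically, here is a rough sketch that maps a probability to the nearest label. Several things here are assumptions made only for the example and are not part of the scheme above: the function name `describe`, rounding to the nearest listed decibel value, treating anything within half a decibel of even odds as "possible", and forming the negated labels by swapping in "unlikely" and "improbable".

```python
import math

# (decibels, label) rows copied from the table above, excluding the
# three special words for 0%, 50% and 100%.
LABELS = [
    (1, "likely"), (2, "somewhat likely"), (3, "quite likely"),
    (4, "very likely"), (5, "highly likely"), (6, "extremely likely"),
    (7, "probable"), (8, "somewhat probable"), (9, "quite probable"),
    (10, "very probable"), (13, "highly probable"), (20, "extremely probable"),
]


def describe(p: float) -> str:
    """Return a verbal label for probability p (hypothetical helper)."""
    if p <= 0.0:
        return "impossible"
    if p >= 1.0:
        return "certain"
    db = 10 * math.log10(p / (1 - p))
    if abs(db) < 0.5:  # assumed cutoff for "about even odds"
        return "possible"
    # Pick the row whose decibel value is closest to the magnitude of db.
    _, label = min(LABELS, key=lambda row: abs(row[0] - abs(db)))
    if db > 0:
        return label
    # Below 50%: negate the label, as suggested for the 40% vs 60% case.
    return label.replace("likely", "unlikely").replace("probable", "improbable")


print(describe(0.8), "/", describe(0.2), "/", describe(0.5))
```

Whether nearest-row rounding is the right choice is debatable; a calibrated writer might prefer explicit range boundaries instead.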

I'll think about this for a while more, and then I think I'll go through gwern.net and try to rationalize all uses of informal probability to use this scheme. I'm calibrated, so I might as well get some mileage out of it!

Comment author: 30 May 2012 05:22:05AM 5 points [-]

Translating numerical probabilities into verbal labels has been an active area of research. As an entry point into that literature, see the review article Teigen & Brun (2003, Verbal expressions of uncertainty and probability).

You might want to take a look at some of the other attempts out there to try to come up with labels that are more intuitive (I see "likely" and "probable" as equivalent, which would make this system where "somewhat probable" > "extremely likely" very unintuitive for me). Teigen and Brun cite several attempts which "have been made to construct standard lists of verbal expressions, where each phrase is coordinated with an appropriate numeric probability (Beyth-Marom, 1982; Hamm, 1991; Tavana, Kennedy & Mohebbi, 1997; Renooij & Witteman, 1999)." The full citations for those 4 papers are:

Beyth-Marom, R. (1982). How probable is probable? A numerical translation of verbal probability expressions. Journal of Forecasting, 1, 257–269.
Hamm, R. M. (1991). Selection of verbal probabilities: A solution for some problems of verbal probability expression. Organizational Behavior and Human Decision Processes, 48, 193–223.
Tavana, M., Kennedy, D. T. & Mohebbi, B. (1997). An applied study using the analytic hierarchy process to translate common verbal phrases to numerical probabilities. Journal of Behavioral Decision Making, 10, 133–150.
Renooij, S. & Witteman, C. (1999). Talking probabilities: Communicating probabilistic information with words and numbers. International Journal of Approximate Reasoning, 22, 169–194.

Comment author: 06 June 2012 09:58:29PM *  0 points [-]

After reading through those cited papers, I think the Kesselman scale is still the best of the suggestions and simpler than my own suggestion. I guess I'll just use that in the future. I've made some flashcards to help me memorize them.

Comment author: 31 May 2012 05:03:58PM *  0 points [-]

Kesselman's thesis suggests this mapping, the Kesselman List of Estimative Words:

• 100% Certainty
• 86-99% Almost Certain
• 71-85% Highly Likely
• 56-70% Likely
• 46-55% Chances a Little Better [or Less]
• 31-45% Unlikely
• 16-30% Highly Unlikely
• 1-15% Remote
• 0% Impossibility

I find the middle phrasing entirely unsatisfactory ("possible" is an obvious replacement), and the chunking is a little crude, but I do agree it should be impossible for most people to get the relative rankings wrong and invert any pairs. Not sure if it's better or not; need to read some of your cites, although the review's various PDF homes are all dead right now. (EDIT: the book is available, though.)

Comment author: 25 June 2012 01:09:30PM 0 points [-]

> I find the middle phrasing entirely unsatisfactory ("possible" is an obvious replacement)

"Possible" seems to have two distinct meanings. The first one fits your usage, but the other is more of a binary expression, used to express the fact that something is not impossible. In other words, anything whose probability is equal or greater than 1% (say) can be tagged with "possible", and using this sense of "possible" for the 46-55% range seems wrong - it would deserve a stronger word. To avoid the risk of confusion about which sense is meant, I suggest using something like "entirely possible".

Comment author: 25 June 2012 02:31:14PM 0 points [-]

To me, 'entirely possible' doesn't convey around 50-50; so why bother sticking in an entire other word?

Comment author: 31 May 2012 06:48:50PM 0 points [-]

Notes from Teigen & Brun:

The recurrent findings in these studies are (1) a reasonable degree of between-group consistency, combined with (2) a high degree of within-group variability. In other words, mean estimates of “very probable”, “doubtful” and “improbable” are reasonably similar from study to study, supporting the claim that probability words are translatable; but, at the same time, the interindividual variability of estimates is large enough to represent a potential communication problem. If, for instance, the doctor tells the patient that a cure is “possible”, she may mean a 5 per cent chance, but it may be interpreted to mean a 70 per cent chance, or vice versa. This variability is typically underestimated by the participants themselves. Brun and Teigen (1988) asked medical doctors to specify a range within which would fall 90 per cent of other doctors’ interpretations. This interval included on the average (for 14 verbal phrases) less than 65 per cent of the actual individual estimates. Amer, Hackenbrack and Nelson (1994) found that auditors’ 90 per cent ranges included, on average, only 56 per cent of the individual estimates (for 23 phrases). In other words, the problem posed by interindividual variability appears to be aggravated by a low degree of variability awareness.

...several attempts have been made to construct standard lists of verbal expressions, where each phrase is coordinated with an appropriate numeric probability (Beyth-Marom, 1982; Hamm, 1991; Tavana, Kennedy & Mohebbi, 1997; Renooij & Witteman, 1999)

...Verbal phrases are, furthermore, parts of ordinary language, and thus sensitive to conversational implicatures. So I may say that a particular outcome is somewhat uncertain, not because I think it has a low probability of occurring, but because I want to modify some actual, imagined or implied belief in its occurrence. Such modifications can go in two directions, either upwards or downwards on the probability scale. Verbal probability expressions can accordingly be categorised as having a positive or a negative directionality. They determine whether attention should be directed to the attainment or the non-attainment of the target outcome, and, in doing so, they have the ability to influence people’s judgments and decisions in an unambiguous way. Words may be denotatively vague, but they are argumentatively precise. If you tell me that success is “possible”, I know I am being encouraged, even if I do not know whether you have a probability of 30 per cent or of 70 per cent in mind. If you say it is “not certain”, I know I am advised to be careful and to think twice. But if you tell me there is a 45 per cent probability I will not know what to think. The information is precise, but its pragmatic meaning is undecided. Do you mean uncertainty (I have only a 45 per cent chance) or possibility (at least I have a 45 per cent chance)? Likelihood or doubt? Or both?

Comment author: 31 May 2012 06:21:21PM 0 points [-]

The cached HTML of the review is available.

Comment author: 24 January 2012 09:20:33PM 3 points [-]

Evidentials in Lojban are optional, not mandatory; and most of the point of this exercise, for me, is to see if I can improve the probability I assign to sentences from being simply arbitrary. I know that as I started this, many of the probabilities I assigned have been laughably inaccurate - and I haven't been able to think of a way to improve my estimation any faster than by constantly practicing my estimations.

Comment author: 24 January 2012 08:10:56PM 3 points [-]

Using probabilities assumes information you don't actually have. Probabilities depend on calibration, good math, understanding the massive difference between 98%+1% and 50%+1%, and so on.

I'd use more qualitative measures that could be mapped to quantities if need be. Tautology, strong disbelief, disbelief, weak disbelief, etc.

Comment author: 24 January 2012 09:27:49PM 5 points [-]

understanding the massive difference between 98%+1% and 50%+1%

This is one of the more valuable lessons that using logarithmic decibels, instead of linear probabilities, provides. Going from 98% to 99% adds 3 decibels; going from 50% to 51% adds 0.17 decibels.
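For anyone who wants to check those two figures, here is a minimal sketch. It assumes "decibels" means 10 times the base-10 logarithm of the odds p / (1 - p), as in Jaynes; the function names `prob_to_decibels` and `decibels_to_prob` are made up for this illustration and are not part of any Lojban tooling.

```python
import math


def prob_to_decibels(p: float) -> float:
    """Convert a probability to decibels of evidence: 10 * log10(odds)."""
    if not 0.0 < p < 1.0:
        # 0% and 100% correspond to minus/plus infinity decibels.
        raise ValueError("p must be strictly between 0 and 1")
    return 10.0 * math.log10(p / (1.0 - p))


def decibels_to_prob(db: float) -> float:
    """Inverse conversion: decibels of evidence back to a probability."""
    odds = 10.0 ** (db / 10.0)
    return odds / (1.0 + odds)


# The same one-percentage-point step, measured at two places on the scale.
step_high = prob_to_decibels(0.99) - prob_to_decibels(0.98)
step_mid = prob_to_decibels(0.51) - prob_to_decibels(0.50)

print(f"98% -> 99%: {step_high:.2f} dB")  # 3.05 dB
print(f"50% -> 51%: {step_mid:.2f} dB")   # 0.17 dB
```

The same percentage-point step is worth roughly eighteen times as much evidence near the top of the scale as it is in the middle, which is the difference the quoted line is pointing at.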

Qualitative measures are fine - Lojban even has a 'number' word meaning 'about' ("ji'i"), and even getting a rough feel for the confidence-levels involved can be a step up from not having any idea at all, and is a step closer to having a better calibration for quantitative measures.