Probability distributions and writing style

dclayh

4th Jun 2009

In his recent post, rhollerith wrote,

I am more likely than not vastly better off than I would have been if <I had made decision X>

This reminded me of the slogan for the water-filtration system my workplace uses,

We're 100% sure it's 99.9% pure!

because both sentences make a claim and give an associated probability for it. Now in this second example, the actual version is better than the expectation-value-preserving "We're 99.9% sure it's 100% pure", because the actual version implies a lower variance in outcomes (and expectation values being equal, a lower variance is nearly always better). But this leads to the question of why rhollerith didn't write something like "I am almost certainly at least somewhat better off than I would have been...".
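As a rough check on the variance claim above, here is a minimal Python sketch. It assumes, purely for illustration, that in the swapped slogan the remaining 0.1% of cases mean 0% purity; the slogan doesn't say, but that is the assumption under which the two versions share an expectation value. The helper name `mean_and_variance` is made up for this example.

```python
def mean_and_variance(outcomes):
    """Return (mean, variance) of a discrete distribution.

    `outcomes` is a list of (probability, value) pairs whose
    probabilities sum to 1.
    """
    mean = sum(p * x for p, x in outcomes)
    variance = sum(p * (x - mean) ** 2 for p, x in outcomes)
    return mean, variance


# Actual slogan: 100% sure it's 99.9% pure.
actual = [(1.0, 99.9)]

# Swapped slogan: 99.9% sure it's 100% pure.
# ASSUMPTION (not in the slogan): the other 0.1% of the time it is 0% pure.
swapped = [(0.999, 100.0), (0.001, 0.0)]

actual_mean, actual_var = mean_and_variance(actual)
swapped_mean, swapped_var = mean_and_variance(swapped)

print(f"actual:  mean = {actual_mean:.2f}, variance = {actual_var:.2f}")
print(f"swapped: mean = {swapped_mean:.2f}, variance = {swapped_var:.2f}")
```

Under that assumption both versions have an expected purity of 99.9%, but the actual slogan has zero variance while the swapped one has a variance of about 9.99 (in squared percentage points).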

So I ask: when writing nontechnically, do you prefer to give a modest conclusion with high confidence, or a strong conclusion with moderate confidence? And does this vary with whether you're trying to persuade or merely describe?

(Also feel free to post other examples of this sort of statement from LW or elsewhere; I'd search for them myself if I had any good ideas on how to do so.)

[-]Psychohistorian17y20

When writing nontechnically, I prefer to give the most accurate answer available. If I'm highly confident of a modest conclusion, I state that. If I'm moderately confident of a strong conclusion, I state that. They are two entirely different statements, and, while there are cases where both apply, by no means are they equivalent.

In the relationship context, "better off" tends to come in large, uncertain chunks, so what rhollerith said is reasonable; what you suggested he might say seems very unlikely to be the case.

[-]dclayh17y40

They are two entirely different statements, and, while there are cases where both apply, by no means are they equivalent.

They're obviously not completely equivalent, but in cases where your measurements form some Gaussian (or similar) distribution, which is very common, you have the choice of saying things like (to use the water-purifying example), "we're 85% confident it's at least 99.97% pure", "we're 97.7% confident it's at least 99.3% pure", "we're 99.9% confident it's at least 98.5% pure", etc., each of which represents a different part of the curve. Now obviously the most complete answer here would be to say "our data are described by a Gaussian of mean X and st. dev. Y", but people don't frequently do that in informal contexts, so how do you reduce it to one claim with one confidence?
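To make the "different part of the curve" point concrete, here is a small Python sketch that turns one Gaussian into several one-sided claims. The mean and standard deviation are hypothetical values chosen for illustration, not the numbers quoted above, and `lower_bound` is a made-up helper name.

```python
from statistics import NormalDist

# ASSUMPTION: hypothetical purity measurements, in percent.
# These parameters are invented for illustration only.
purity = NormalDist(mu=99.5, sigma=0.2)


def lower_bound(dist, confidence):
    """Value v such that P(X >= v) equals `confidence` under `dist`."""
    return dist.inv_cdf(1.0 - confidence)


for confidence in (0.85, 0.977, 0.999):
    bound = lower_bound(purity, confidence)
    print(f"we're {confidence:.1%} confident it's at least {bound:.2f}% pure")
```

Each printed line is a true statement about the same distribution; raising the confidence simply pushes the claimed bound further down the lower tail.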

In the relationship context, "better off" tends to come in large, uncertain chunks, so what rhollerith said is reasonable; what you suggested he might say seems very unlikely to be the case.

Would you go into why that is? It doesn't seem intuitive to me at all. Why shouldn't a relationship improve your life by just a small amount?

[-]sketerpot17y00

Now obviously the most complete answer here would be to say "our data are described by a Gaussian of mean X and st. dev. Y", but people don't frequently do that in informal contexts, so how do you reduce it to one claim with one confidence?

My rule of thumb is to say I'm about 95% sure that the true value is within two standard deviations of the mean. It's usually a pretty good compromise, easy to reason with intuitively (try it!), and if your readers actually care about this you can always tack on a little parenthetical note that says "(Gaussian distribution, mean = X, std. dev. = Y)". Or stick it in a footnote, or whatever you can manage without terrifying your readers.
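For anyone who wants to check the two-standard-deviation rule of thumb, a quick Python sketch follows. It assumes the quantity really is Gaussian; `coverage` is a made-up helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def coverage(k):
    """Probability that a Gaussian value lies within k std. devs. of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} std. dev.: {coverage(k):.4f}")
```

For k = 2 this comes out at roughly 0.954, so "about 95% sure" is a fair summary as long as the Gaussian assumption holds.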

[-]Psychohistorian17y00

Would you go into why that is? It doesn't seem intuitive to me at all. Why shouldn't a relationship improve your life by just a small amount?

From the context, it appears there are two basic outcomes different from the current status quo:

Status quo: Relationship, Utility = High

1st option: No relationship, Utility = Low, probability = probably over .5, under .9

2nd option: Relationship, Utility = from highly negative to extremely high, cumulative probability ~.4-.1.

Thus, the most likely alternative is that he's not in a relationship. If he is in a relationship, his happiness could be anywhere on the map. Since it's already high, it's unlikely (though possible) that he would be better off in the alternative (and even less likely that he would be drastically better off). There's some chance he's just a little bit better off now, if the alternative is a slightly worse relationship. And then there's a rather large chance he's much better off now, if the alternative is no relationship or a miserable one. Thus, he's probably vastly better off, but he's not almost certainly a little bit better off. At the risk of overgeneralizing, I'd say that a lot of low-certainty, high-stakes personal utility calculations tend to be non-Gaussian.

And of course the probabilities here are purely for illustrative purposes. If he thought that there was, say, a 10% chance of being single and a 45% chance of being in a miserable relationship, you'd get the same results. I'm assuming his language accurately mapped his estimates.
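A toy Python sketch of this kind of lumpy, non-Gaussian distribution is below. Every number in it is invented for illustration (the utility scale, the probabilities, and the threshold for "vastly better off"); it only shows how "probably vastly better off" and "almost certainly somewhat better off" can come apart.

```python
# ASSUMPTION: all utilities, probabilities, and the threshold are invented.
STATUS_QUO_UTILITY = 8.0        # current relationship, on an arbitrary scale
VASTLY_BETTER_THRESHOLD = 4.0   # a gain above this counts as "vastly better off"

# (description, probability, utility) of each alternative outcome.
counterfactuals = [
    ("no relationship", 0.6, 2.0),
    ("miserable relationship", 0.1, 0.0),
    ("slightly worse relationship", 0.1, 7.0),
    ("better relationship", 0.2, 9.5),
]


def probability_gain_exceeds(threshold):
    """P(status quo utility minus alternative utility > threshold)."""
    return sum(
        p for _, p, utility in counterfactuals
        if STATUS_QUO_UTILITY - utility > threshold
    )


p_somewhat_better = probability_gain_exceeds(0.0)
p_vastly_better = probability_gain_exceeds(VASTLY_BETTER_THRESHOLD)

print(f"P(at least somewhat better off) = {p_somewhat_better:.2f}")
print(f"P(vastly better off)            = {p_vastly_better:.2f}")
```

With these made-up numbers the chance of being vastly better off is about 0.7, while the chance of being better off at all is only about 0.8, because nearly all of the "better off" mass sits in the large-gain outcomes.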

[-]rhollerith17y00

dclayh, I have replied to you privately.

Specifically, the first likely google hit for "dclayh" is a Livejournal user of that name, so I used Livejournal to send a private message to that user.

Contact rhollerith

[-]dclayh17y00

Thanks for the message. Yes, I believe I'm the only "dclayh" on the internet; at least, all 77 google results are about me.

[-]PhilGoetz17y00

I believe that most people have a strong preference for a modest conclusion with high confidence over a strong conclusion with moderate confidence, and that this is a systematic and incorrect human bias.

[-]rhollerith17y00

BTW it would be great to have all my writings subjected to examination by the community to determine whether the writings use probability distributions, utility functions and the language of causality correctly and sensibly.
