
# Bayesianism for Humans

29 October 2013 11:54PM (52 points)

Recently, I completed my first systematic read-through of the sequences. One of the biggest effects this had on me was considerably warming my attitude towards Bayesianism. Not long ago, if you'd asked me my opinion of Bayesianism, I'd probably have said something like, "Bayes' theorem is all well and good when you know what numbers to plug in, but all too often you don't."

Now I realize that that objection is based on a misunderstanding of Bayesianism, or at least Bayesianism-as-advocated-by-Eliezer-Yudkowsky. "When (Not) To Use Probabilities" is all about this issue, but a cleaner expression of Eliezer's true view may be this quote from "Beautiful Probability":

No, you can't always do the exact Bayesian calculation for a problem.  Sometimes you must seek an approximation; often, indeed.  This doesn't mean that probability theory has ceased to apply, any more than your inability to calculate the aerodynamics of a 747 on an atom-by-atom basis implies that the 747 is not made out of atoms.  Whatever approximation you use, it works to the extent that it approximates the ideal Bayesian calculation - and fails to the extent that it departs.

The practical upshot of seeing Bayesianism as an ideal to be approximated, I think, is this: you should avoid engaging in any reasoning that's demonstrably nonsensical in Bayesian terms. Furthermore, Bayesian reasoning can be fruitfully mined for heuristics that are useful in the real world. That's an idea that actually has real-world applications for human beings, hence the title of this post, "Bayesianism for Humans."

Here's my attempt to make an initial list of more directly applicable corollaries to Bayesianism. Many of these corollaries are non-obvious, yet eminently sensible once you think about them, which I think makes for a far better argument for Bayesianism than Dutch Book-type arguments with little real-world relevance. Most (but not all) of the links are to posts within the sequences, which hopefully will allow this post to double as a decent introductory guide to the parts of the sequences that explain Bayesianism.

Comment author: [deleted] 29 October 2013 07:20:50PM 13 points

You should expect that, on average, a test will leave your beliefs unchanged.

Not quite. You do a test because you expect your beliefs to change. A better phrasing is "You should not expect that a test will move your beliefs in any particular direction." Of course this doesn't capture the theorem that "prior = expected posterior", but that is very hard to communicate accurately in English without referring directly to probability theory concepts. At least strive for not having alternate interpretations that are wrong.

I would add: there are two kinds of "no evidence". There's "no evidence for X" because X hasn't been tested, so there's no evidence either way, and there's "no evidence for X" because X has been tested and all the evidence points to not-X. People often use the first kind of "no evidence" as if it had the same force as the second. This is totally obvious under Bayesianism, but not widely understood among the scientifically literate.
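A minimal Python sketch of the difference, assuming a hypothetical 50% prior on X and a hypothetical test that returns a negative result 20% of the time when X is true and 90% of the time when X is false. Never testing X leaves the prior where it was; repeated negative results push it down:

```python
def update(prior, p_result_if_x, p_result_if_not_x):
    """Bayes' rule: posterior probability of X after observing one result."""
    joint_x = prior * p_result_if_x
    joint_not_x = (1 - prior) * p_result_if_not_x
    return joint_x / (joint_x + joint_not_x)


prior = 0.5  # hypothetical prior probability that X is true

# Kind 1: X has never been tested, so there is nothing to condition on.
untested = prior

# Kind 2: X was tested three times and every result was negative.
# Hypothetical test: P(negative | X) = 0.2, P(negative | not-X) = 0.9.
tested = prior
for _ in range(3):
    tested = update(tested, p_result_if_x=0.2, p_result_if_not_x=0.9)

print(f"untested: {untested:.3f}")
print(f"tested, all negative: {tested:.3f}")
```

In odds form, each negative result multiplies the odds by 0.2/0.9 = 2/9, so three of them take even odds down to 8:729, about 1%. Both situations can be described as "no evidence for X", but only the second one should move you.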

Comment author: 08 November 2013 05:26:08AM 2 points

See this for another example of confusion between the two kinds of "no evidence". I summarize:

People think that Cognitive Behavioral Therapy (CBT) is better than psychodynamic/Freudian therapies. This is because CBT has been tested and shown to be better than placebo, while Freudian therapies have not been tested at all, mainly for historical reasons. Of course, the fact that psychodynamic therapies have not been tested, and therefore have no evidence in their favor, isn't evidence against them. They simply have no evidence either way.

And when researchers went out and tested CBT, psychodynamic therapies, and placebo, they found that CBT and psychodynamic therapy were about equally better than placebo.

Comment author: [deleted] 30 October 2013 06:26:09PM 6 points

Bayesianism was recently an important boon for me, and the credit belongs entirely to LW. My newborn son has several warning signs for a genetic disease called NF-1. Almost everybody in my family panicked, but I was able to calm myself and my family members down by pointing out that even though these signs rarely appear in people without the disease, the disease itself is rare enough that it was still quite unlikely he was affected. This in part helped spare my son an expensive and painful genetic test. We've since talked to a doctor who assured us that he is very unlikely to have NF-1.

And I didn't do any fine-grained math. Bayesianism just made me aware that the incidence rate of the disease was a factor to ask about.
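The qualitative argument above is Bayes' theorem with a low base rate. Here is a small Python sketch of it; all three inputs are hypothetical illustrative numbers, not NF-1 statistics:

```python
def posterior(base_rate, p_signs_if_disease, p_signs_if_healthy):
    """P(disease | signs) via Bayes' theorem."""
    true_pos = base_rate * p_signs_if_disease
    false_pos = (1 - base_rate) * p_signs_if_healthy
    return true_pos / (true_pos + false_pos)


# Hypothetical illustrative numbers, not medical data:
base_rate = 1 / 3000        # assumed incidence of the disease
p_signs_if_disease = 0.9    # assumed: signs show up in most affected children
p_signs_if_healthy = 0.05   # assumed: signs are rare without the disease

p = posterior(base_rate, p_signs_if_disease, p_signs_if_healthy)
print(f"P(disease | signs) = {p:.2%}")
```

With these assumed numbers the signs are 18 times more likely in an affected child, yet the posterior stays below 1%, because unaffected children outnumber affected ones by roughly 3000 to 1.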

Comment author: 30 October 2013 07:06:28PM 4 points

an expensive and painful genetic test

I thought genetic testing nowadays is completely painless (requiring a cheek swab at most) and relatively inexpensive. Is that not so?

Comment author: 30 October 2013 08:07:48PM 4 points

Common NF-1 genetic tests require blood samples or cultured biopsy cells, rather than buccal swabs, and can cost over a thousand US dollars. The large size of the gene, and its fairly unusual expression methods, seem to leave that as the preferred tool in the medical literature, even for future technique discovery.

Comment author: 01 November 2013 08:24:45AM 1 point

Common NF-1 genetic tests require blood samples or cultured biopsy cells, rather than buccal swabs, and can cost over a thousand US dollars. The large size of the gene, and its fairly unusual expression methods, seem to leave that as the preferred tool in the medical literature, even for future technique discovery.

Why can't you just target a SNP? Why does the size of the gene matter?

Comment author: 01 November 2013 10:44:12AM 12 points

A large gene is a large target for spontaneous mutation. Most people with the disease did not inherit it but instead had something inside the large gene go wrong between their parents and them. For the spontaneous mutations, you likely have never seen that particular difference before.

You also have no idea where in the gene the problem could be, so you just need to sequence the whole thing. With current sequencing technology you basically need to either throw the entire genome into an Illumina sequencer for many thousands of dollars, or do a number of small custom Sanger sequencing reactions, each of which reads out about 600-800 specific base pairs at a time. These are individually not that expensive or difficult, but they add up when you need to tile them over a large area. Seeing as the gene is 350 kilobases, in this case it adds up both in terms of cost and in terms of the source DNA you need.
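The cost argument is mostly arithmetic. A rough Python sketch, assuming each Sanger read must overlap its neighbour by a hypothetical 100 bp so the reads can be stitched together, counts the reactions needed to tile the 350 kb quoted above in a single pass:

```python
import math


def reactions_needed(region_bp, read_bp, overlap_bp):
    """Minimum number of reads to tile a region when adjacent reads overlap."""
    if region_bp <= read_bp:
        return 1
    step = read_bp - overlap_bp  # new bases contributed by each additional read
    return math.ceil((region_bp - read_bp) / step) + 1


GENE_BP = 350_000  # gene size quoted in the comment
OVERLAP_BP = 100   # hypothetical overlap between neighbouring reads

for read_bp in (600, 700, 800):
    print(read_bp, reactions_needed(GENE_BP, read_bp, OVERLAP_BP))
```

Under these assumptions that comes to roughly 500 to 700 separate reactions for one pass over the gene, which is where the cost and the demand for source DNA come from.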

SNPs are only useful when there are one or a few ancestral mutant alleles that have spread through the population, so that you can look either for one known causative change or for a nearby unique SNP that gets dragged along for the ride with the disease allele because it is quite close to it.

EDIT to clear up some questions from a few layers up in the chain: These days, looking for known, relatively common genetic variants is very easy, as the success of 23andme illustrates. These tests use microarrays to look for SNPs. This process does not involve sequencing; instead, it only tests the sequence similarity (via binding affinity) of a sample to a set of short reference strands. To identify a particular allele with this technique, though, it needs to have been detected in previous work. The only way to confidently identify rare or unique variants is to sequence outright, and that gets expensive for regions larger than a few kilobases. And hilariously enough, because of the multiple forms of sequencing technology that exist, if you need to sequence an area larger than a megabase or two, it becomes cheaper to just sequence the entire genome.

Comment author: 29 October 2013 03:12:29AM 4 points

Recently, I completed my first systematic read-through of the sequences.

What was your methodology for the read-through? How much time did it take? Was it worth the time investment?

Comment author: 29 October 2013 04:41:57AM 5 points

I loaded the mobi version onto my Kindle and read it at every spare moment. (I get a lot of reading done by taking out my Kindle at every spare moment.) I didn't have a more sophisticated "methodology" than that. A substantial percentage, maybe half of it, got read on a long weekend camping trip, when I was without other electronics to distract me. I'd estimate it took ~60 hours of actual reading time in total, though I don't really know. And yeah, I'd say it was worth it.

Comment author: 29 October 2013 06:31:45AM 1 point

Just to check, do you mean the sequences, or the complete blog posts? The latter took me flipping ages to get through...

Comment author: 29 October 2013 05:01:37PM 0 points

Yup, the complete blog posts.

Comment author: 29 October 2013 08:44:50AM 4 points

Richard Carrier's Proving History has a couple of chapters of worked examples of applying Bayesian considerations to real-life historical arguments.

Comment author: 30 October 2013 04:44:26PM 3 points

Are they good examples?

Comment author: 31 October 2013 12:01:46PM -1 points

They worked for me: they showed me how I could usefully apply this stuff qualitatively (through comparisons) without working out the actual numbers.

The actual examples concern the question of the historical Jesus. You could say that's an inherently controversial example, and therefore a bad one. However, the question is then "compared to what?" If there is a better set of worked examples, then please present it.

Comment author: 29 October 2013 04:33:47PM 2 points

The post "What Bayesianism Taught Me" is similar to this one; your post has some elements that that one doesn't have, and that one has a few that you don't have. Combining the two, you end up with quite a nice list.

Comment author: 29 October 2013 04:58:20PM 2 points

I want to like that post, because the formatting is so much tidier than the formatting on my post, but I actually disagree with the first two points. I'm in favor of just rolling with the fact that "Bayesian evidence" isn't what we ordinarily mean by "evidence," as useful as the former is. Also, Eliezer's "I don't know" post misses the pragmatics of saying, "I don't know"; we say "I don't know" if we don't have any information the other person is going to care about (the other person usually won't care that there are 10-1000 apples in a tree outside).

Comment author: 29 October 2013 06:19:56PM 1 point

That's true: those points ignore the pragmatics of a social situation in which you use the phrase "I don't know" or "There's no evidence for that". But if you put yourself in the shoes of the boss instead of the employee (in the example given in "I don't know"), where you still have to make a decision even if you have "no information", then it is useful to remember that you probably DO know something that can at least give you an indication of what to do.

The points are also useful when the discussion is with a rationalist.

Comment author: 29 October 2013 06:26:47PM 1 point

The problem isn't with "I don't know", but with "I don't know anything about that." I agree that "I don't know" is useful.

Comment author: 30 October 2013 08:19:27AM -1 points

Even for an ideal reasoner, successful retrospective predictions clearly do not play the same role as prospective predictions. The former must inevitably be part of locating the hypothesis; they thus play a weaker role in confirming it. Eliezer's story you link to is about how the "traditional science" dictum about not using retrospective predictions can be just reversed stupidity; but just reversing young Eliezer's stupidity in the story one more time doesn't yield intelligence.

Edit: this comment has been downvoted, and in considering why that may be, I think there are ambiguities in both "ideal reasoner" and "play the same role". Yes, the value of evidence does not change depending on when a hypothesis was first articulated, so some limitless entity that was capable of simultaneously evaluating all possible hypotheses would not care. However, a perfectly rational but finite reasoner could reasonably consider some amount of old evidence to have been "used up" in selecting the hypothesis from an implicit background of alternative hypotheses, without having to enumerate all of those alternatives, and thus habitually avoid recounting a certain amount of retrospective evidence. Any "successful prediction" would presumably be by a hypothesis that had already passed this threshold (otherwise it's just called a "lucky wild-ass guess"). I'm speaking in simple heuristic terms here, but this could be made more rigorous and numeric, up to and including a superhuman level I'd consider "ideal".

Comment author: 31 October 2013 03:27:36PM 0 points

This was a great post. I'll use it to introduce people to key concepts in the future.

Many of these focus on the posterior's first moment. For continuous distributions, the higher moments matter, too. A test that I expected to lower the variance in my posterior would be considered "confirming" as I use the word. I can't lower the variance before the test is done because it's still possible the mean will change.
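The point about higher moments can be sketched with a Normal prior and a single Normal measurement, a conjugate model chosen here purely for illustration (all numbers are hypothetical). Averaged over the possible measurements, the posterior mean equals the prior mean, but the posterior variance is smaller than the prior variance:

```python
import random
import statistics


def normal_update(prior_mean, prior_var, obs, noise_var):
    """Conjugate update for a Normal prior and one Normal measurement."""
    post_var = 1 / (1 / prior_var + 1 / noise_var)
    post_mean = post_var * (prior_mean / prior_var + obs / noise_var)
    return post_mean, post_var


prior_mean, prior_var = 10.0, 4.0  # hypothetical prior belief about a quantity
noise_var = 1.0                    # hypothetical measurement noise

rng = random.Random(0)
post_means, post_vars = [], []
for _ in range(100_000):
    true_value = rng.gauss(prior_mean, prior_var ** 0.5)  # draw from the prior
    obs = rng.gauss(true_value, noise_var ** 0.5)         # noisy measurement
    m, v = normal_update(prior_mean, prior_var, obs, noise_var)
    post_means.append(m)
    post_vars.append(v)

print(statistics.fmean(post_means))  # close to the prior mean, 10.0
print(max(post_vars), prior_var)     # every posterior variance is 0.8 < 4.0
```

In this conjugate model the posterior variance, 1/(1/4 + 1/1) = 0.8, does not even depend on the observation. In general it can, but the law of total variance, Var(θ) = E[Var(θ | Y)] + Var(E[θ | Y]), guarantees that the expected posterior variance is never larger than the prior variance.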

Comment author: 29 October 2013 11:31:11AM -3 points

Relatedly, there's Conservation of Expected Evidence. A rational person can't seek to confirm their beliefs, only to test them. You should expect that, on average, a test will leave your beliefs unchanged. If not, you should update your beliefs now based on how you expect the test to turn out.

This appears to be wrong:
Shake a box containing a coin. What is your belief that the coin landed heads? 50%. Will your belief change if you open the box and look inside it? Sure it will.

Comment author: 29 October 2013 11:55:47AM 9 points

You should expect that, on average, a test will leave your beliefs unchanged.

Emphasis mine.

When I shake the box, my belief that the coin landed heads is 50%. When I look inside, my belief changes, yes, but to one of two options of equal probability: 0% (I see it came out tails) or 100% (I see it came out heads).

It is trivial to see that my expected posterior belief is 0% * 1/2 + 100% * 1/2 = 50%, or in other words, it's exactly equal to my prior belief.

Comment author: 30 October 2013 01:30:16PM 4 points

The question is whether 'change' signifies only a magnitude or also a direction. The average magnitude of the change in belief when doing an experiment is larger than zero. But the average change as a vector quantity, i.e. the difference between belief after and before the test, is zero.

If you drive your car to work and back, then the average velocity of your trip is 0, but the average speed is positive.
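The velocity/speed distinction can be checked exactly with Python's `fractions` module. This sketch assumes a hypothetical imperfect test with 80% sensitivity, a 10% false-positive rate, and a 30% prior on H, and computes both the expected signed change and the expected absolute change in P(H):

```python
from fractions import Fraction as F


def expected_changes(prior, p_pos_if_h, p_pos_if_not_h):
    """Return (expected signed change, expected absolute change) in P(H)."""
    p_pos = prior * p_pos_if_h + (1 - prior) * p_pos_if_not_h
    post_pos = prior * p_pos_if_h / p_pos               # P(H | positive)
    post_neg = prior * (1 - p_pos_if_h) / (1 - p_pos)   # P(H | negative)
    signed = p_pos * (post_pos - prior) + (1 - p_pos) * (post_neg - prior)
    unsigned = (p_pos * abs(post_pos - prior)
                + (1 - p_pos) * abs(post_neg - prior))
    return signed, unsigned


# Hypothetical test: 80% sensitive, 10% false-positive rate, prior 30%.
signed, unsigned = expected_changes(F(3, 10), F(8, 10), F(1, 10))
print(signed, unsigned)
```

The signed change ("velocity") averages to exactly zero, while the absolute change ("speed") averages to 147/500 = 0.294.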

Comment author: 01 November 2013 02:25:52PM 4 points

The thread descending from this comment exemplifies a pit that is easy to fall into when reading an informal moral drawn from a precise mathematical result: mistaking the former for the latter, and arguing about the former instead of going to the latter. The whole nugatory discussion would have been avoided had people gone back to the original mathematics, which is not deep, and is given in one of the Sequence posts the OP linked to.

This mathematics, which is simple and straightforward, but not a complete triviality, says precisely what is meant by the informal phrase, "Conservation of Expected Evidence", and provides an immediate answer to questions such as "but making an observation will change your belief, so you can expect your belief to change!", or "but what about a lottery ticket, you expect that to lose, don't you?"

There's no point in basing an argument on secondary sources when the primary source is right there.

Comment author: 01 November 2013 03:00:06PM 0 points

I think the problem is that people tend to derive incorrect, or at least misleading, informal beliefs from the correct math.

Comment author: 01 November 2013 02:35:38PM 2 points

http://en.wikipedia.org/wiki/Law_of_total_expectation

Expectation of your belief E(X) is not the same as your belief X.
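Written out, the mathematics is one application of the law of total expectation to the random variable P(H | E), for a hypothesis H and a test whose outcome E ranges over values e:

$$
\mathbb{E}_E\!\left[P(H \mid E)\right]
= \sum_e P(E = e)\, P(H \mid E = e)
= \sum_e P(H, E = e)
= P(H)
$$

For a binary test this reads P(H) = P(H | E) P(E) + P(H | ¬E) P(¬E), so if one outcome would raise your belief in H, the other outcome must lower it.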

Comment author: 29 October 2013 05:00:10PM 0 points

There's a sense in which what I said is true (see ygert's comment), but I agree it's confusing. Suggested re-word? Or maybe I should just cut that point.

Comment author: 29 October 2013 05:25:18PM 2 points

I think the problem is in the sentence

You should expect that, on average, a test will leave your beliefs unchanged.

That happens to be not true. A test which outputs useful information WILL change your beliefs. Especially given point 2, one can say "Any informative test will always change your beliefs".

What's tricky here is expectation. You expect your beliefs to change but you don't know in which direction. So your expectation is for zero change even though you know that you'll get some non-zero change.

This looks paradoxical, but it is the entirely standard way in which statistics (in particular, random variables) operate. Consider a toss of a fair coin. The expectation is half heads, half tails, which is guaranteed not to happen. You know you'll get either heads or tails, but not which of the two. The expectation will not match the outcome; all it can do is be equidistant (appropriately weighted) from all possible outcomes.

Comment author: 29 October 2013 07:40:07PM 1 point

I might go with:

Your expectation of the possible beliefs you could have after seeing the test results should match your current belief.

Another option is to try to illustrate both CoEE and Beliefs Pay Rent in Anticipated Experiences at the same time, since I think failing BPRiAE demonstrates an easy way to fail CoEE.

Comment author: 29 October 2013 04:20:07PM 0 points

All the passage says is that if you believe the coin is unbiased, then you expect to see a roughly 50-50 split between heads and tails. If you expect to see a 70:30 split of heads:tails, you ought to believe that the coin is so biased before you do the experiment. It looks trivial when applied to coins, but less so in other contexts. This is a statement about priors, not posteriors, hence the term "expectation". In Eliezer's example, if you are p% confident that an accused is a witch, then you should expect a definitive witch test to exonerate the accused (100-p)% of the time. If any outcome "confirms witchiness", then the test in question is not a test of witchiness.