Bayesianism for Humans

52 Post author: ChrisHallquist 29 October 2013 11:54PM

Recently, I completed my first systematic read-through of the sequences. One of the biggest effects this had on me was considerably warming my attitude towards Bayesianism. Not long ago, if you'd asked me my opinion of Bayesianism, I'd probably have said something like, "Bayes' theorem is all well and good when you know what numbers to plug in, but all too often you don't."

Now I realize that that objection is based on a misunderstanding of Bayesianism, or at least Bayesianism-as-advocated-by-Eliezer-Yudkowsky. "When (Not) To Use Probabilities" is all about this issue, but a cleaner expression of Eliezer's true view may be this quote from "Beautiful Probability":

No, you can't always do the exact Bayesian calculation for a problem.  Sometimes you must seek an approximation; often, indeed.  This doesn't mean that probability theory has ceased to apply, any more than your inability to calculate the aerodynamics of a 747 on an atom-by-atom basis implies that the 747 is not made out of atoms.  Whatever approximation you use, it works to the extent that it approximates the ideal Bayesian calculation - and fails to the extent that it departs.

The practical upshot of seeing Bayesianism as an ideal to be approximated, I think, is this: you should avoid engaging in any reasoning that's demonstrably nonsensical in Bayesian terms. Furthermore, Bayesian reasoning can be fruitfully mined for heuristics that are useful in the real world. That's an idea that actually has real-world applications for human beings, hence the title of this post, "Bayesianism for Humans."

Here's my attempt to make an initial list of more directly applicable corollaries to Bayesianism. Many of these corollaries are non-obvious, yet eminently sensible once you think about them, which I think makes for a far better argument for Bayesianism than Dutch Book-type arguments with little real-world relevance. Most (but not all) of the links are to posts within the sequences, which hopefully will allow this post to double as a decent introductory guide to the parts of the sequences that explain Bayesianism.

Comments (37)

Comment author: [deleted] 29 October 2013 07:20:50PM 13 points [-]

You should expect that, on average, a test will leave your beliefs unchanged.

Not quite. You do a test because you expect your beliefs to change. A better phrasing is "You should not expect that a test will move your beliefs in any particular direction." Of course this doesn't capture the theorem that "prior = expected posterior", but that is very hard to communicate accurately in English without referring directly to probability theory concepts. At least strive for not having alternate interpretations that are wrong.
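For readers who want to see the "prior = expected posterior" theorem with numbers, here is a minimal sketch. The prior and the two likelihoods are made-up values chosen purely for illustration, and the function names are hypothetical.

```python
def posteriors(prior, p_pos_given_h, p_pos_given_not_h):
    """Return (P(positive), posterior if positive, posterior if negative)."""
    p_pos = p_pos_given_h * prior + p_pos_given_not_h * (1 - prior)
    post_if_pos = p_pos_given_h * prior / p_pos
    post_if_neg = (1 - p_pos_given_h) * prior / (1 - p_pos)
    return p_pos, post_if_pos, post_if_neg


def expected_posterior(prior, p_pos_given_h, p_pos_given_not_h):
    """Average the two possible posteriors, weighted by how likely each result is."""
    p_pos, post_if_pos, post_if_neg = posteriors(prior, p_pos_given_h, p_pos_given_not_h)
    return p_pos * post_if_pos + (1 - p_pos) * post_if_neg


# Hypothetical inputs: P(H) = 0.3, P(positive | H) = 0.8, P(positive | not H) = 0.1.
p_pos, up, down = posteriors(0.3, 0.8, 0.1)
print("posterior if positive:", up)
print("posterior if negative:", down)
print("expected posterior:   ", expected_posterior(0.3, 0.8, 0.1))
```

Either result moves the belief (up on a positive, down on a negative), yet the probability-weighted average of the two posteriors equals the prior. That is the sense in which the test is not expected to move beliefs in any particular direction.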

I would add: there are two kinds of "no evidence". There's "no evidence for X" because there's no evidence either way, since X hasn't been tested, and "no evidence for X" where it's been tested and all the evidence points to not-X. People often use the first kind of "no evidence" as if it had the same force as the second. This is totally obvious under Bayesianism, but not widely understood among the scientifically literate.

Comment author: Stabilizer 08 November 2013 05:26:08AM *  2 points [-]

See this for another example of confusion between the two kinds of "no evidence". I summarize:

People think that Cognitive Behavioral Therapy (CBT) is better than psychodynamic/Freudian therapies. This is because CBT has been tested to be better than placebo, but Freudian therapies have not been tested at all; mainly due to historical reasons. Of course, the fact that psychodynamic therapies have not been tested and therefore have no evidence in their favor, isn't evidence against psychodynamic therapies. They simply have no evidence either way.

And when they went out and tested CBT, psychodynamic therapies, and placebo, they found that CBT and psychodynamic therapies were about equally better than placebo.

Comment author: [deleted] 30 October 2013 06:26:09PM *  6 points [-]

Bayesianism was recently an important boon for me, and the credit belongs entirely to LW. My newborn son has several warning signs for a genetic disease called NF-1. Almost everybody in my family panicked, but I was able to calm myself and my family members down by pointing out that even if these signs rarely appear in those without the disease, the disease is rare enough that it was still quite unlikely that he was afflicted. This in part helped to prevent my son from getting an expensive and painful genetic test. We've since talked to a doctor who assured us that he is very unlikely to have NF-1.

And I didn't do any fine-grained math. Bayesianism just led me to be aware of the incidence rate of the disease as a factor.
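The base-rate point above can be sketched in a few lines. Every number here is hypothetical and chosen only to show the shape of the argument; none of them are clinical figures for NF-1 or the commenter's own estimates, and the function name is made up for this example.

```python
def posterior_given_signs(prevalence, p_signs_if_disease, p_signs_if_healthy):
    """P(disease | signs) by Bayes' theorem."""
    true_pos = p_signs_if_disease * prevalence
    false_pos = p_signs_if_healthy * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs: a rare disease (1 in 3000), signs present in 95% of
# affected children and in only 1% of unaffected children.
prevalence = 1 / 3000
p = posterior_given_signs(prevalence, 0.95, 0.01)
print(f"P(disease | signs) = {p:.3f}")
```

With these assumed inputs the posterior comes out at roughly 3%: far higher than the base rate, but still quite unlikely, because the unaffected children who show the signs vastly outnumber the affected ones.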

Comment author: Lumifer 30 October 2013 07:06:28PM 4 points [-]

an expensive and painful genetic test

I thought genetic testing nowadays is completely painless (requiring a cheek swab at most) and relatively inexpensive. Is that not so?

Comment author: gattsuru 30 October 2013 08:07:48PM *  4 points [-]

Common NF-1 genetic tests require blood samples or cultured biopsy cells, rather than buccal swabs, and can cost over a thousand US dollars. The large size of the gene, and its fairly unusual expression methods, seem to leave that as the preferred tool in the medical literature, even for future technique discovery.

Comment author: ChristianKl 01 November 2013 08:24:45AM 1 point [-]

Common NF-1 genetic tests require blood samples or cultured biopsy cells, rather than buccal swabs, and can cost over a thousand US dollars. The large size of the gene, and its fairly unusual expression methods, seem to leave that as the preferred tool in the medical literature, even for future technique discovery.

Why can't you just target a SNP? Why does the size of the gene matter?

Comment author: CellBioGuy 01 November 2013 10:44:12AM *  12 points [-]

A large gene is a large target for spontaneous mutation. Most people with the disease did not inherit it but instead had something inside the large gene go wrong between their parents and them. For the spontaneous mutations, you likely have never seen that particular difference before.

You also have no idea where in the gene the problem could be, so you just need to sequence the whole thing. With current sequencing technology you basically need to either throw the entire genome into an Illumina sequencer for many thousands of dollars, or do a number of small custom Sanger sequencing reactions, each of which reads out about 600-800 specific base pairs at a time. Individually these are not that expensive or difficult, but they add up when you need to tile them over a large area. Seeing as the gene is 350 kilobases, in this case it adds up both in terms of cost and in terms of the source DNA you need.
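As a rough back-of-the-envelope check on the tiling arithmetic, here is a small sketch. It assumes the reads do not overlap and that each base is read only once, so it gives a lower bound on the number of reactions; the function name is hypothetical.

```python
GENE_LENGTH_BP = 350_000  # "350 kilobases", as stated in the comment


def min_reads(region_bp, read_length_bp):
    """Smallest number of non-overlapping reads that cover region_bp."""
    return (region_bp + read_length_bp - 1) // read_length_bp  # ceiling division


for read_length in (600, 800):
    print(read_length, "bp reads ->", min_reads(GENE_LENGTH_BP, read_length), "reactions")
```

On these assumptions, covering the gene takes somewhere between 438 and 584 reactions. Real tiling would need overlap between adjacent reads, which only pushes the count higher.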

SNPs are only useful when there is one or a few ancestral mutant alleles that have spread through the population and in which you can either look for one known causative change, or a nearby unique SNP that gets dragged along for the ride with the disease allele because it is quite close to it.

EDIT to clear up some questions from a few layers up in the chain: These days looking for known, relatively common genetic variants is very easy, as the success of 23andme illustrates. These tests use microarrays to look for SNPs. This process does not involve sequencing, though; it only tests the sequence similarity (via binding affinity) of a sample to a set of short reference strands. To identify a particular allele with this technique, that allele needs to have been detected in previous work. The only way to confidently figure out rare or unique variants is to outright sequence, and that gets expensive for regions larger than a few kilobases. And hilariously enough, due to the multiple forms of sequencing technology that exist, if you need to sequence an area larger than a megabase or two it becomes cheaper to just sequence the entire genome.

Comment author: peter_hurford 29 October 2013 03:12:29AM 4 points [-]

Recently, I completed my first systematic read-through of the sequences.

What was your methodology for the read-through? How much time did it take? Was it worth the time investment?

Comment author: ChrisHallquist 29 October 2013 04:41:57AM 5 points [-]

I loaded the mobi version onto my Kindle and read it at every spare moment. (I get a lot of reading done by taking out my Kindle at every spare moment.) I didn't have a more sophisticated "methodology" than that. A substantial percentage, maybe half of it, ended up getting read on a long weekend camping trip, when I was without other electronics to distract me. I'd estimate it took ~60 hours of actual time spent reading total, though I don't really know. And yeah, I'd say it was worth it.

Comment author: Benito 29 October 2013 06:31:45AM 1 point [-]

Just to check, do you mean the sequences, or the complete blog posts? The latter took me flipping ages to get through...

Comment author: ChrisHallquist 29 October 2013 05:01:37PM 0 points [-]

Yup, the complete blog posts.

Comment author: David_Gerard 29 October 2013 08:44:50AM 4 points [-]

Richard Carrier's Proving History has a couple of chapters of worked examples of applying Bayesian considerations to real-life historical arguments.

Comment author: Jayson_Virissimo 30 October 2013 04:44:26PM 3 points [-]

Are they good examples?

Comment author: David_Gerard 31 October 2013 12:01:46PM *  -1 points [-]

They worked for me: they showed me how I could usefully apply this stuff qualitatively (comparisons) without working out the actual numbers.

The actual examples are the question of the historical Jesus. You could say that's an inherently controversial example, therefore bad. However, the question is then "compared to what?" If there is a better set of worked examples, then please present it.

Comment author: bartimaeus 29 October 2013 04:33:47PM 2 points [-]

The post What Bayesianism Taught Me is similar to this one; your post has some elements that that one doesn't have, and that one has a few that you don't have. Combining the two, you end up with quite a nice list.

Comment author: ChrisHallquist 29 October 2013 04:58:20PM 2 points [-]

I want to like that post, because the formatting is so much tidier than the formatting on my post, but I actually disagree with the first two points. I'm in favor of just rolling with the fact that "Bayesian evidence" isn't what we ordinarily mean by "evidence," as useful as the former is. Also, Eliezer's "I don't know" post misses the pragmatics of saying, "I don't know"; we say "I don't know" if we don't have any information the other person is going to care about (the other person usually won't care that there are 10-1000 apples in a tree outside).

Comment author: bartimaeus 29 October 2013 06:19:56PM 1 point [-]

That's true, those points ignore the pragmatics of a social situation in which you use the phrase "I don't know" or "There's no evidence for that". But if you put yourself in the shoes of the boss instead of the employee (in the example given in "I don't know"), where even if you have "no information" you still have to make a decision, then remembering that you probably DO know something that can at least give you an indication of what to do, is useful.

The points are also useful when the discussion is with a rationalist.

Comment author: Tyrrell_McAllister 29 October 2013 06:26:47PM *  1 point [-]

The problem isn't with "I don't know", but with "I don't know anything about that." I agree that "I don't know" is useful.

Comment author: homunq 30 October 2013 08:19:27AM *  -1 points [-]

Even for an ideal reasoner, successful retrospective predictions clearly do not play the same role as prospective predictions. The former must inevitably be part of locating the hypothesis; they thus play a weaker role in confirming it. Eliezer's story you link to is about how the "traditional science" dictum about not using retrospective predictions can be just reversed stupidity; but just reversing young Eliezer's stupidity in the story one more time doesn't yield intelligence.

Edit: this comment has been downvoted, and in considering why that may be, I think there are ambiguities in both "ideal reasoner" and "play the same role". Yes, the value of evidence does not change depending on when a hypothesis was first articulated, so some limitless entity that was capable of simultaneously evaluating all possible hypotheses would not care. However, a perfectly rational but finite reasoner could reasonably consider some amount of old evidence to have been "used up" in selecting the hypothesis from an implicit background of alternative hypotheses, without having to enumerate all of those alternatives, and thus habitually avoid recounting a certain amount of retrospective evidence. Any "successful prediction" would presumably be by a hypothesis that had already passed this threshold (otherwise it's just called a "lucky wild-ass guess"). I'm speaking in simple heuristic terms here, but this could be made more rigorous and numeric, up to and including a superhuman level I'd consider "ideal".

Comment author: b1shop 31 October 2013 03:27:36PM 0 points [-]

This was a great post. I'll use it to introduce people to key concepts in the future.

Many of these focus on the posterior's first moment. For continuous distributions, the higher moments matter, too. A test that I expected to lower the variance in my posterior would be considered "confirming" as I use the word. I can't lower the variance before the test is done because it's still possible the mean will change.
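One way to make the point about variance concrete is a sketch under an assumed model that the comment itself does not specify: a normal prior on an unknown quantity and a single measurement with known normal noise. In that model the posterior variance is known before the test, but the posterior mean is not, and the prior variance splits exactly into those two parts (the law of total variance). The function names and numbers are illustrative only.

```python
def normal_update(prior_mean, prior_var, noise_var, x):
    """Posterior mean and variance after observing x = theta + noise."""
    total = prior_var + noise_var
    post_mean = (noise_var * prior_mean + prior_var * x) / total
    post_var = prior_var * noise_var / total  # does not depend on x
    return post_mean, post_var


def variance_of_posterior_mean(prior_var, noise_var):
    """How much the posterior mean is expected to move, as seen before the test."""
    return prior_var ** 2 / (prior_var + noise_var)


# Hypothetical inputs: prior variance 4, measurement noise variance 1.
prior_var, noise_var = 4.0, 1.0
_, post_var = normal_update(0.0, prior_var, noise_var, x=1.5)
spread_of_mean = variance_of_posterior_mean(prior_var, noise_var)
print("posterior variance:        ", post_var)
print("variance of posterior mean:", spread_of_mean)
print("sum (equals prior variance):", post_var + spread_of_mean)
```

Before the test, the uncertainty about where the mean will land makes up the difference between the prior variance and the smaller posterior variance, which is why the variance cannot be lowered in advance.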